Thursday, 2 November 2023

NLP to fight disinformation

Who did what?

The US military has decided to use Natural Language Processing (NLP) [1] to detect (and possibly combat) disinformation. The plan is for the system to "read" online articles and flag potential manipulation of public opinion.

Why should we care?

We have seen before that bots can be used to spread fear about natural disasters [2]. During a severe bushfire season in Australia, bots pushed the claim that most of the fires were caused by arson rather than climate change. As expected, most people could not tell fact from fiction, so they believed the lies. In that case, Twitter bot detection tools [3] were used to determine which posts were made by bots and which were not.

How was it done?

The system built by the US Department of Defense (DoD) uses a modified version of XLNet [4] for named entity recognition (NER): it classifies the named entities in a given article as people, places, organizations, or miscellaneous. The model was trained on CoNLL-2003, a named entity dataset built from English and German newswire text.
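The DoD model itself is not public, so the sketch below only illustrates the output side of CoNLL-2003-style NER. Such models typically label every token with a BIO tag ("B-X" begins an entity of type X, "I-X" continues it, "O" is outside any entity), and a small decoder groups the tags into whole entities. The function name `bio_to_spans` and the example sentences are hypothetical; this is not the DoD code.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity_text, entity_type) spans.

    Entity types follow CoNLL-2003: PER (people), LOC (places),
    ORG (organizations) and MISC (miscellaneous).
    An "I-X" that does not continue an open X entity starts a new entity,
    which also covers the IOB1 variant used in the original CoNLL files.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)

        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)

        # Close the open entity when we leave it or a new one begins.
        if (prefix == "O" or starts_new) and current_tokens:
            spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

        if prefix != "O":
            current_tokens.append(token)
            current_type = etype

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical tagged sentence, as an NER model might label it.
tokens = ["The", "US", "Department", "of", "Defense", "demoed", "the", "system"]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O", "O"]
print(bio_to_spans(tokens, tags))
```

Decoding to spans rather than keeping per-token labels is what makes the next step possible: whole entities such as "US Department of Defense" can be indexed and linked.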

It first builds an index of the entities it has classified, links them into a knowledge graph, and then passes them on to more specialized models. The drawback is that humans still have to find the patterns in the data. In an interview, the team said: “We are building a sensor array that analysts need to see patterns on a larger scale than humans can comprehend.” So the system is not fully automated, but it is still a big step forward from simply guessing whether posts are fake.
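The source does not say how the index or the knowledge graph is built. One simple way to picture it, under that assumption, is an index from each entity to the articles that mention it, plus a co-occurrence graph whose edge weights count how many articles mention two entities together. The function name `build_index_and_graph` and the article ids below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (text, type), e.g. ("Turkey", "LOC")


def build_index_and_graph(articles: Dict[str, List[Entity]]):
    """Index entities by article and link entities that co-occur.

    Returns (index, graph):
      index maps entity -> set of article ids that mention it;
      graph maps a sorted (entity, entity) pair -> number of articles
      that mention both entities.
    """
    index: Dict[Entity, Set[str]] = defaultdict(set)
    graph: Dict[Tuple[Entity, Entity], int] = defaultdict(int)
    for article_id, entities in articles.items():
        unique = sorted(set(entities))  # count each entity once per article
        for entity in unique:
            index[entity].add(article_id)
        for a, b in combinations(unique, 2):  # pairs are already sorted
            graph[(a, b)] += 1
    return dict(index), dict(graph)


# Hypothetical entities extracted from three articles.
articles = {
    "ru-1": [("Turkey", "LOC"), ("Azerbaijan", "LOC"), ("Nagorno-Karabakh", "LOC")],
    "ru-2": [("Turkey", "LOC"), ("Azerbaijan", "LOC")],
    "other-1": [("Azerbaijan", "LOC"), ("Armenia", "LOC")],
}
index, graph = build_index_and_graph(articles)

# An analyst might start from the heaviest edges.
for pair, weight in sorted(graph.items(), key=lambda kv: -kv[1]):
    print(weight, pair)
```

The graph only surfaces candidate connections; deciding whether a frequent pairing reflects a coordinated narrative is still the analyst's job, which matches the limitation described above.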

During a demo, the system analyzed stories about the Nagorno-Karabakh region and the rising tensions there. It concluded that Russian newspapers were trying to convince readers that Turkey (Russia's rival) was aiding Azerbaijan; only Russian newspapers reported that claim.

---

References

[1] DeepLearning.AI, The Batch, online, 4 Nov 2020. https://www.deeplearning.ai/the-batch/propaganda-watch/ (accessed 2 Nov 2023)
[3] Michael W. Kearney, tweetbotornot, online, 4 Nov 2020. https://github.com/mkearney/tweetbotornot (accessed 2 Nov 2023)
[4] Zhilin Yang et al., "XLNet: Generalized Autoregressive Pretraining for Language Understanding", arXiv:1906.08237 [cs.CL], 2019. https://doi.org/10.48550/arXiv.1906.08237
