Difference between revisions of "Página de pruebas"

From Sinfronteras
Revision as of 22:28, 17 March 2019

Fake News Detection on Social Media - A Data Mining Perspective

https://www.kdd.org/exploration_files/19-1-Article2.pdf

Fake news detection

In the previous section, we introduced the conceptual characterization of traditional fake news and fake news in social media. Based on this characterization, we further explore the problem definition and proposed approaches for fake news detection.

Problem Definition

In this subsection, we present the mathematical formulation of fake news detection on social media. Specifically, we introduce the key components of fake news and then give the formal definition of fake news detection. The basic notation is defined below:

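  • Let <math>a</math> refer to a News Article. It consists of two major components: Publisher and Content:
    • Publisher <math>\vec{p_a}</math> includes a set of profile features that describe the original author, such as name, domain, and age, among other attributes.
    • Content <math>\vec{c_a}</math> consists of a set of attributes that represent the news article, including headline, text, image, etc.
  • We also define Social News Engagements as a set of tuples <math>\mathcal{E} = \{e_{it}\}</math> that represents how news spreads over time among n users <math>U = \{u_1, u_2, \ldots, u_n\}</math> and their corresponding posts <math>P = \{p_1, p_2, \ldots, p_n\}</math> on social media regarding news article <math>a</math>. Each engagement <math>e_{it} = \{u_i, p_i, t\}</math> represents that a user <math>u_i</math> spreads news article <math>a</math> using post <math>p_i</math> at time <math>t</math>. Note that we set <math>t = \text{Null}</math> if the article <math>a</math> does not have any engagement yet, and thus <math>u_i</math> represents the publisher.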

Definition 2 (Fake News Detection) Given the social news engagements <math>\mathcal{E}</math> among n users for news article <math>a</math>, the task of fake news detection is to predict whether the news article <math>a</math> is a fake news piece or not, i.e., <math>F: \mathcal{E} \rightarrow \{0, 1\}</math> such that,

<math>
F(a) =
\begin{cases}
1, & \text{if } a \text{ is a piece of fake news} \\
0, & \text{otherwise}
\end{cases}
</math>

where <math>F</math> is the prediction function we want to learn. Note that we define fake news detection as a binary classification problem for the following reason: fake news is essentially a distortion bias on information manipulated by the publisher. According to previous research about media bias theory [26], distortion bias is usually modeled as a binary classification problem.
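The notation above can be mirrored in a few data structures. The following is a minimal sketch, not part of the original formulation: the class names, field names, and the toy rule inside `skeptical_share_detector` are assumptions chosen for illustration. The rule only stands in for the learned prediction function F.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class NewsArticle:
    """News article a with publisher features p_a and content features c_a."""
    publisher: Dict[str, str]  # e.g. name, domain, age
    content: Dict[str, str]    # e.g. headline, text, image reference


@dataclass
class Engagement:
    """Engagement e_it = {u_i, p_i, t}: user u_i spreads the article with post p_i at time t."""
    user: str
    post: str
    time: Optional[float]      # None plays the role of t = Null


# F maps the engagement set to {0, 1}: 1 for fake news, 0 otherwise.
Detector = Callable[[List[Engagement]], int]


def skeptical_share_detector(engagements: List[Engagement]) -> int:
    """Toy stand-in for F (assumption): flag the article when most posts mention 'fake'."""
    if not engagements:
        return 0
    doubtful = sum("fake" in e.post.lower() for e in engagements)
    return 1 if doubtful / len(engagements) > 0.5 else 0
```

In practice F would be learned from labeled data; the hand-written rule here only shows the input and output types that such a function has.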

Next, we propose a general data mining framework for fake news detection which includes two phases:

  • (i) Feature extraction: The feature extraction phase aims to represent news content and related auxiliary information in a formal mathematical structure.
  • (ii) Model construction: The model construction phase further builds machine learning models to better differentiate fake news and real news based on the feature representations.
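The two phases can be sketched end to end as follows. This is an illustrative example under stated assumptions: the two headline features in `toy_features` and the nearest-centroid classifier are placeholders picked for brevity, not the features or models proposed in the paper, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Article = Dict[str, str]


def toy_features(article: Article) -> Vector:
    """Phase (i): map an article to a numeric vector (hypothetical features)."""
    headline = article.get("headline", "")
    words = headline.split()
    shouted = sum(1 for w in words if len(w) > 1 and w.isupper())
    return [float(headline.count("!")), shouted / max(len(words), 1)]


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Phase (ii): learn one mean vector per class (1 = fake, 0 = real)."""
    centroids: Dict[int, Vector] = {}
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, Vector], vector: Vector) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, vector))
    return min(centroids, key=lambda label: dist(centroids[label]))


def build_detector(
    articles: Sequence[Article],
    labels: Sequence[int],
    extract: Callable[[Article], Vector] = toy_features,
) -> Callable[[Article], int]:
    """Chain both phases: extract features, fit the model, return a predictor."""
    centroids = fit_centroids([extract(a) for a in articles], labels)
    return lambda article: predict(centroids, extract(article))
```

Keeping `extract` as a parameter reflects the separation between the phases: the feature representation can be swapped without touching the model construction step.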


Feature Extraction

Fake news detection on traditional news media mainly relies on news content, while on social media, social context can serve as additional auxiliary information to help detect fake news. Thus, we present the details of how to extract and represent useful features from news content and social context.

News Content Features

News content features <math>\vec{c_a}</math> describe the meta information related to a piece of news. Representative news content attributes are listed below:

  • Source: Author or publisher of the news article
  • Headline: Short title text that aims to catch the attention of readers and describes the main topic of the article
  • Body Text: Main text that elaborates the details of the news story; there is usually a major claim that is specifically highlighted and that shapes the angle of the publisher
  • Image/Video: Part of the body content of a news article that provides visual cues to frame the story

Based on these raw content attributes, different kinds of feature representations can be built to extract discriminative characteristics of fake news. Typically, the news content features we look at are mostly linguistic-based and visual-based, as described in more detail below.

  • Linguistic-based: Since fake news pieces are intentionally created for financial or political gain rather than to report objective claims, they often contain opinionated and inflammatory language, crafted as “clickbait” (i.e., to entice users to click on the link to read the full article) or to incite confusion [13]. Thus, it is reasonable to exploit linguistic features that capture the different writing styles and sensational headlines to detect fake news.
  • Visual-based: Visual cues have been shown to be an important manipulator for fake news propaganda. As we have characterized, fake news exploits the individual vulnerabilities of people and thus often relies on sensational or even fake images to provoke anger or other emotional responses in consumers. Visual-based features are extracted from visual elements (e.g., images and videos) to capture the distinguishing characteristics of fake news.
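As a rough illustration of linguistic-based features, the sketch below computes a handful of writing-style cues from a headline and body text. The feature names and the cue phrases in `CLICKBAIT_CUES` are assumptions made for this example and are not taken from the paper. Visual-based features are omitted because they would require an image-processing library.

```python
import re
from typing import Dict

# Hypothetical clickbait cue phrases, chosen for illustration only.
CLICKBAIT_CUES = ("you won't believe", "shocking", "what happened next")


def linguistic_features(headline: str, body: str) -> Dict[str, float]:
    """Compute a few style cues from the headline and the body text."""
    body_words = re.findall(r"[A-Za-z']+", body)
    headline_words = re.findall(r"[A-Za-z']+", headline)
    # Words written entirely in capitals, ignoring single letters such as "A".
    caps = sum(1 for w in headline_words if len(w) > 1 and w.isupper())
    lowered = headline.lower()
    return {
        "headline_exclamations": float(headline.count("!")),
        "headline_caps_ratio": caps / max(len(headline_words), 1),
        "clickbait_cues": float(sum(cue in lowered for cue in CLICKBAIT_CUES)),
        "body_word_count": float(len(body_words)),
        "body_avg_word_length": sum(map(len, body_words)) / max(len(body_words), 1),
    }
```

The resulting dictionary can be flattened into a vector and passed to any classifier; which cues actually discriminate fake from real news has to be established on labeled data.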


Social Context Features

In addition to features related directly to the content of the news articles, social context features can be derived from the user-driven social engagements of news consumption on social media platforms. Social engagements represent the news proliferation process over time, which provides useful auxiliary information to infer the veracity of news articles. Note that few papers in the literature detect fake news using social context features. However, because we believe this is a critical aspect of successful fake news detection, we introduce a set of common features utilized in similar research areas, such as rumor veracity classification on social media. Generally, there are three major aspects of the social media context that we want to represent: users, generated posts, and networks. Below, we investigate how we can extract and represent social context features from these three aspects to support fake news detection.

  • User-based: As we mentioned in Section 2.3, fake news pieces are likely to be created and spread by non-human accounts, such as social bots or cyborgs. Thus, capturing users’ profiles and characteristics by user-based features can provide useful information for fake news detection.
  • Post-based: People express their emotions or opinions towards fake news through social media posts, such as skeptical opinions, sensational reactions, etc. Thus, it is reasonable to extract post-based features to help find potential fake news via reactions from the general public as expressed in posts.
  • Network-based: Users form different networks on social media in terms of interests, topics, and relations. As mentioned before, fake news dissemination processes tend to form an echo chamber cycle, highlighting the value of extracting network-based features to represent these types of network patterns for fake news detection. Network-based features are extracted via constructing specific networks among the users who published related social media posts.
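The three aspects above can be turned into one feature per aspect, as in the following sketch. Everything specific here is an assumption for illustration: the 30-day threshold for "new" accounts, the word list in `SKEPTICAL_WORDS`, the use of follow relations as the network, and the function and parameter names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical doubt markers used for the post-based feature.
SKEPTICAL_WORDS = ("fake", "hoax", "false")


def social_context_features(
    engagements: List[Tuple[str, str, float]],  # (user, post, time) tuples
    account_age_days: Dict[str, int],           # user -> account age in days
    follows: Set[Tuple[str, str]],              # (follower, followee) pairs
) -> Dict[str, float]:
    """Compute one user-based, one post-based, and one network-based feature."""
    users = {user for user, _, _ in engagements}
    n = len(users)
    # User-based: share of engaging accounts younger than 30 days.
    new_accounts = sum(
        1 for u in users if u in account_age_days and account_age_days[u] < 30
    )
    # Post-based: share of posts that contain a skeptical word.
    doubtful = sum(
        1 for _, post, _ in engagements
        if any(word in post.lower() for word in SKEPTICAL_WORDS)
    )
    # Network-based: density of the directed follow graph among engaging users.
    edges = sum(1 for a, b in follows if a in users and b in users and a != b)
    possible_edges = n * (n - 1)
    return {
        "new_account_share": new_accounts / max(n, 1),
        "skeptical_post_share": doubtful / max(len(engagements), 1),
        "follow_density": edges / possible_edges if possible_edges else 0.0,
    }
```

A dense follow graph among the users who spread an article is one simple proxy for the echo chamber pattern mentioned above. Richer network features would need a graph library.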

Model Construction

In the previous section, we introduced features extracted from different sources, i.e., news content and social context, for fake news detection. In this section, we discuss the details of the model construction process for several existing approaches. Specifically, we categorize existing methods based on their main input sources as: News Content Models and Social Context Models.

News Content Models

In this subsection, we focus on news content models, which mainly rely on news content features and existing factual sources to classify fake news. Specifically, existing approaches can be categorized as Knowledge-based and Style-based.

Knowledge-based: Knowledge-based approaches aim to use external sources to fact-check proposed claims in news content.

Existing fact-checking approaches can be categorized as expert-oriented, crowdsourcing-oriented, and computational-oriented:

Expert-oriented:


Crowdsourcing-oriented:


Computational-oriented:


Style-based: Style-based approaches try to detect fake news by capturing the manipulators in the writing style of news content. There are mainly two typical categories of style-based methods: Deception-oriented and Objectivity-oriented:

Deception-oriented:

Objectivity-oriented:

Social Context Models

Social context models include relevant user social engagements in the analysis, capturing this auxiliary information from a variety of perspectives. We can classify existing approaches for social context modeling into two categories: Stance-based and Propagation-based.

Note that very few existing fake news detection approaches have utilized social context models. Thus, we also introduce similar methods for rumor detection using social media, which have potential application for fake news detection.

Stance-based:

Propagation-based: