=Fake News Detection on Social Media - A Data Mining Perspective=
 
https://www.kdd.org/exploration_files/19-1-Article2.pdf
 
  
==Fake news detection==
 
In the previous section, we introduced the '''conceptual characterization  of  traditional  fake  news  and  fake  news  in  social media'''.  Based on this characterization,  we further explore  the  '''problem  definition  and  proposed  approaches  for fake news detection'''.
 
 
===Problem Definition===
 
In  this subsection, we present the details of '''mathematical formulation of fake news detection on social media'''.  Specifically, we will introduce the  definition of '''key components of fake news''' and then present the '''formal definition of fake news detection.'''
 
 
 
'''The basic notations are defined below:'''
 
 
*Let <math>a</math> refer to a ''News  Article''. It consists of two major components: ''Publisher'' and ''Content'':
 
**''Publisher'' <math> \vec{p_a} </math> includes a set of profile features to describe the original author, such as name, domain, and age, among other attributes.
 
**''Content'' <math> \vec{c_a} </math> consists of a set of attributes that represent the news article, including headline, text, image, etc.
 
 
*We also define ''Social News Engagements'' as a set of tuples <math> \varepsilon = \{e_{it}\} </math> to represent the process of how news spreads over time among ''n users'' <math> U=\{u_1, u_2, ..., u_n\} </math> and their corresponding ''posts'' <math> P=\{p_1, p_2, ..., p_n\} </math> on social media regarding news article <math> a </math>. Each engagement <math> e_{it}=\{u_i, p_i, t\} </math> represents a ''user'' <math> u_i </math> spreading news article <math> a </math> using post <math> p_i </math> at time <math> t </math>. Note that we set <math> t = Null </math> if the article <math> a </math> does not have any engagement yet, in which case <math> u_i </math> represents the publisher.
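The notation above can be sketched as plain data structures. This is an illustrative sketch only: the class and field names (`NewsArticle`, `Engagement`, etc.) are ours, not the paper's.

```python
from dataclasses import dataclass
from typing import Optional

# A news article a with its two components: publisher profile p_a and content c_a.
@dataclass
class NewsArticle:
    publisher: dict  # profile features: name, domain, age, ...
    content: dict    # headline, text, image, ...

# One engagement e_it = {u_i, p_i, t}: user u_i spreads article a via post p_i at time t.
@dataclass
class Engagement:
    user: str
    post: str
    time: Optional[int]  # t = None (Null) if the article has no engagements yet

# Social news engagements for one article: the set E = {e_it}
article = NewsArticle(publisher={"name": "Alice", "domain": "example.com"},
                      content={"headline": "...", "text": "..."})
engagements = [
    Engagement(user="u1", post="p1", time=3),
    Engagement(user="u2", post="p2", time=7),
]
```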
 
 
 
<span style="background:#D8BFD8">'''''Definition 2 (Fake News Detection):'''''</span> ''Given the social news engagements'' <math> \varepsilon </math> ''among'' <math> n </math> ''users for news article'' <math> a </math>, ''the task of fake news detection is to predict whether the news article'' <math> a </math> ''is a fake news piece or not, i.e.,'' <math> F: \varepsilon \rightarrow \{0, 1\} </math> ''such that:''
 
 
<math>
F(a) =
\begin{cases}
1, \text{ if } a \text{ is a piece of fake news} \\
0, \text{ otherwise}
\end{cases}
</math>
 
 
where <math> F </math> is the '''''prediction function''''' we want to learn. Note that we define fake news detection as a '''''binary classification problem''''' for the following reason: fake news is essentially a distortion bias on information manipulated by the publisher. According to previous research about media bias theory [26],  distortion bias is usually modeled as a binary classification problem.
 
 
 
Next, we propose a '''''general data mining framework for fake news  detection''''' which includes two phases:
 
 
*'''(i) Feature extraction:''' The feature extraction phase aims to represent news content and related auxiliary information in a formal mathematical structure.
 
*'''(ii) Model construction:''' The model construction phase further builds machine learning models to better differentiate fake news and real news '''''based on the feature representations'''''.
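The two-phase framework can be sketched end to end. Everything below is a toy illustration: the three features and the threshold rule stand in for real feature engineering and a real learned model.

```python
# Phase (i): feature extraction; Phase (ii): model construction.
# The feature set and the threshold classifier are illustrative placeholders.

def extract_features(article: dict) -> list:
    """Phase (i): represent news content as a numeric feature vector."""
    text = article["text"]
    words = text.split()
    return [
        len(words),                                 # total words
        sum(map(len, words)) / max(len(words), 1),  # characters per word
        text.count("!"),                            # sensational punctuation
    ]

def F(article: dict, classifier) -> int:
    """Phase (ii): F(a) = 1 if a is predicted fake, 0 otherwise."""
    return classifier(extract_features(article))

# Toy stand-in classifier: flags articles with many exclamation marks.
toy_clf = lambda x: 1 if x[2] >= 3 else 0

fake = {"text": "SHOCKING!!! You won't believe this!"}
real = {"text": "The committee published its annual report on Tuesday."}
```

In practice `toy_clf` would be replaced by a model trained on the feature representations, but the two-phase shape stays the same.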
 
 
 
===Feature Extraction===
 
'''Fake news detection in traditional news media mainly relies on news content, while in social media, extra social context can be used as auxiliary information to help detect fake news.''' Thus, we will present the details of how to extract and represent useful features from news content and social context.
 
 
 
====News Content Features====
 
News content features <math>\vec{c_a}</math> describe the meta information related to a piece of news. Representative news content attributes are listed below:
 
 
*'''Source:''' Author or publisher of the news article
 
*'''Headline:''' Short title text that aims to catch the attention of readers and describes the main topic of the article
 
*'''Body Text:''' Main text that elaborates the details of the news story; there is usually a major claim that is specifically  highlighted  and  that shapes the angle of the publisher
 
*'''Image/Video:''' Part  of  the  body  content  of  a  news article that provides visual cues to frame the story
 
 
'''Based on these raw content attributes, different kinds of feature representations can be built to extract discriminative characteristics of fake news.''' Typically, the news content features we look at are mostly linguistic-based and visual-based, described in more detail below.
 
 
 
=====Linguistic-based=====
 
Since fake news  pieces  are  intentionally created for financial or political gain rather than to report  objective  claims,  they  often  contain  opinionated  and inflammatory  language,  crafted  as  ''"clickbait"''  (i.e.,  to  entice  users  to  click  on  the  link  to  read  the  full  article)  or to  incite  confusion [13].  Thus,  it  is  reasonable  to  exploit linguistic  features  that  capture  the  different  writing  styles and  sensational  headlines  to  detect  fake  news.
 
 
Linguistic-based features are extracted from the text content in terms of document organization at different levels, such as characters, words, sentences, and documents. In order to capture the different aspects of fake news and real news, existing work has utilized both '''''common linguistic features''''' and '''''domain-specific linguistic features'''''.
 
 
======Common linguistic features======
 
''Common linguistic features'' are often used to represent documents for various tasks in natural language processing. Typical common linguistic features are:
 
 
*'''''(i) Lexical  features:'''''  Including  character-level  and  word-level  features,  such  as  total  words,  characters per word, frequency of large words, and unique words.
 
 
*'''''(ii) Syntactic features:''''' Including sentence-level features, such as frequency of function words and phrases (i.e., "n-grams" and bag-of-words approaches [24]) or punctuation and parts-of-speech  (POS)  tagging.
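These common linguistic features can be computed in a few lines of plain Python; the whitespace tokenization used here is a simplifying assumption.

```python
from collections import Counter

def lexical_features(text: str) -> dict:
    """(i) Lexical: character- and word-level statistics."""
    words = text.lower().split()
    return {
        "total_words": len(words),
        "chars_per_word": sum(map(len, words)) / max(len(words), 1),
        "unique_words": len(set(words)),
    }

def ngrams(text: str, n: int = 2) -> Counter:
    """(ii) Syntactic: word n-gram counts (bag-of-words for n = 1)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

feats = lexical_features("Fake news spreads fast and fake claims spread faster")
bigrams = ngrams("fake news spreads fast", 2)
```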
 
 
======Domain-specific linguistic features======
 
These are specifically aligned to the news domain, such as quoted words, external links, number of graphs, and the average length of graphs [62].
 
 
 
Moreover, other features can  be  specifically  designed  to  capture  the  deceptive  cues in  writing  styles  to  differentiate  fake  news,  such  as  lying-detection features [1].
 
 
 
=====Visual-based=====
 
Visual cues have been shown to be an important manipulator for fake news propaganda. As we have characterized, fake news exploits the individual vulnerabilities of people and thus often relies on sensational or even fake images to provoke anger or other emotional responses in consumers. Visual-based features are extracted from visual elements (e.g., images and videos) to capture the different characteristics of fake news.
 
 
Fake images have been identified based on various user-level and tweet-level hand-crafted features using a classification framework [28]. More recently, various '''''visual''''' and '''''statistical''''' ''features'' have been extracted for news verification [38]:
 
 
*'''''Visual features''''' include:
 
 
:*Clarity score
 
:*Coherence score
 
:*Similarity distribution histogram
 
:*Diversity score
 
:*Clustering score.
 
 
*'''''Statistical features''''' include ''count, image  ratio,  multi-image  ratio,  hot  image  ratio,  long  image ratio,'' etc.
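A minimal sketch of how such statistical features could be computed over a set of posts; the `images` field and the exact ratio definitions below are illustrative assumptions, not the paper's formulas.

```python
# Sketch of statistical image features (count, image ratio, multi-image ratio)
# over the posts about one news article. Post structure is assumed.

def image_statistics(posts: list) -> dict:
    n = len(posts)
    n_with_image = sum(1 for p in posts if p["images"])       # posts with >= 1 image
    n_multi = sum(1 for p in posts if len(p["images"]) > 1)   # posts with > 1 image
    return {
        "count": sum(len(p["images"]) for p in posts),        # total images
        "image_ratio": n_with_image / max(n, 1),
        "multi_image_ratio": n_multi / max(n, 1),
    }

posts = [
    {"images": ["a.jpg", "b.jpg"]},
    {"images": ["c.jpg"]},
    {"images": []},
    {"images": []},
]
stats = image_statistics(posts)
```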
 
 
 
====Social Context Features====
 
In addition to features related directly to the content of the news articles, '''additional social context features can also be derived from the''' '''''user-driven social engagements''''' '''of news consumption on social media platforms.''' '''''Social engagements''''' represent the news proliferation process over time, which provides useful auxiliary information to infer the veracity of news articles. <span style="color:#FF0000">Note that few papers exist in the literature that detect fake news using social context features. However, because we believe this is a critical aspect of successful fake news detection</span>, '''we introduce a set of common features utilized in similar research areas, such as:''' '''''rumor veracity classification on social media'''''.
 
 
 
Generally,  there  are  three major aspects of the social media context that we want to represent:
 
* Users,
 
* Generated posts, and
 
* Networks.
 
 
Below, we investigate how we can extract and represent social context features from these three aspects to support fake news detection:
 
 
 
=====User-based=====
 
As  we  mentioned  in  Section  2.3,  fake  news pieces  are  likely  to  be  created  and  spread  by  non-human accounts,  such as social bots or cyborgs.  Thus,  capturing users’ profiles and characteristics by user-based features can provide  useful  information  for  fake  news  detection.
 
 
=====Post-based=====
 
People express their emotions or opinions towards fake news through social media posts, such as skeptical  opinions,  sensational  reactions,  etc. Thus,  it  is  reasonable to extract post-based features to help find potential fake news via reactions from the general public as expressed in posts.
 
 
=====Network-based=====
 
Users  form  different  networks  on  social media in terms of interests, topics, and relations. As mentioned before, fake news dissemination processes tend to form  an  echo  chamber  cycle,  highlighting  the  value  of  extracting network-based features to represent these types of network  patterns for fake news detection. Network-based features are extracted via constructing specific networks among the users who published related social media posts.
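A minimal sketch of constructing such a network: link users who engaged with the same article, then read off a simple network feature (node degree). The engagement tuple shape and the co-engagement linking rule are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Build a co-engagement network: users who posted about the same article
# become neighbors; degree is a simple network-based feature per user.

def co_engagement_degrees(engagements: list) -> dict:
    by_article = defaultdict(set)
    for user, _post, article in engagements:      # (u_i, p_i, a) tuples assumed
        by_article[article].add(user)
    neighbors = defaultdict(set)
    for users in by_article.values():
        for u, v in combinations(sorted(users), 2):  # link every co-posting pair
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {u: len(vs) for u, vs in neighbors.items()}

engs = [("u1", "p1", "a1"), ("u2", "p2", "a1"), ("u3", "p3", "a1"),
        ("u1", "p4", "a2"), ("u4", "p5", "a2")]
degrees = co_engagement_degrees(engs)
```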
 
 
 
 
===Model Construction===
 
In the previous section, we introduced features extracted from different sources, i.e., '''''news content''''' and '''''social context''''', for fake news detection. In this section, we discuss the details of the model construction process for several existing approaches. Specifically, we categorize existing methods based on their main input sources as: '''''News Content Models''''' and '''''Social Context Models'''''.
 
 
 
====News Content Models====
 
In this subsection, we focus on news content models, which mainly rely on '''''news content features''''' and '''''existing factual sources''''' to classify fake news. Specifically, existing approaches can be categorized as '''''Knowledge-based''''' and '''''Style-based.'''''
 
 
 
=====Knowledge-based=====
 
Knowledge-based approaches aim to use '''external sources to''' '''''fact-check''''' '''proposed claims in news content'''. The goal of fact-checking is to assign a truth value to a claim in a particular context [83]. ''Fact-checking'' has attracted increasing attention, and many efforts have been made to develop a feasible automated fact-checking system.
 
 
Existing fact-checking approaches can be categorized as: '''''expert-oriented''''', '''''crowdsourcing-oriented''''', and '''''computational-oriented.'''''
 
 
 
======Expert-oriented approaches======
 
Expert-oriented fact-checking heavily relies on human domain experts to investigate relevant data and documents to construct the verdicts of claim veracity, for example ''PolitiFact'' [11], ''Snopes'' [12], etc.  However, expert-oriented  fact-checking  is  an  intellectually  demanding and time-consuming process, which limits the potential for high efficiency and scalability.
 
 
======Crowdsourcing-oriented approaches======
 
Crowdsourcing-oriented fact-checking exploits the ''"wisdom  of  crowd"''  to  enable  normal  people  to  annotate news  content;  these  annotations  are then  aggregated to  produce  an  overall  assessment  of  the  news  veracity.  For example, ''Fiskkit'' [13] allows users to discuss and annotate the accuracy of specific parts of a news article.  As another example, an anti-fake news bot named "For real" is a public account in the instant communication mobile application ''LINE'' [14], which allows people to report suspicious news content which is then further checked by editors.
 
 
======Computational-oriented approaches======
 
These approaches aim to provide an automatic, scalable system to classify true and false claims. Previous computational-oriented fact-checking methods try to solve two major issues: '''''(i) identifying check-worthy claims''''' (i.e., identifying which claims need to be verified) and '''''(ii) discriminating the veracity of fact claims.'''''
 
 
.
 
.
 
.
 
 
=====Style-based=====
 
Style-based approaches try to detect fake news by capturing the ''manipulators'' in the writing style of news content. There are mainly two typical categories of style-based methods: '''''Deception-oriented''''' and '''''Objectivity-oriented''''':
 
 
 
======Deception-oriented======
 
These stylometric methods capture the deceptive statements or claims from news content. The motivation for deception detection originates from forensic psychology (i.e., the Undeutsch Hypothesis) [82], and various forensic tools, including Criteria-based Content Analysis [84] and Scientific-based Content Analysis [45], have been developed.
 
 
More recently, advanced natural language processing models have been applied to spot deceptive phrases from the following perspectives: '''''Deep syntax''''' and '''''Rhetorical structure'''''.
 
 
* '''''Deep syntax models''''' have been implemented using Probabilistic context-free grammar (PCFG),  with  which  sentences  can  be transformed into rules that describe the syntax structure. Based on the PCFG, different rules can be developed for deception detection, such as ''unlexicalized/lexicalized'' production rules and grandparent rules [22].
 
 
* '''''Rhetorical structure theory''''' can be utilized to capture the  differences  between  deceptive  and  truthful  sentences [68].
 
 
* Deep  network  models,  such  as  ''convolutional neural networks (CNN)'', have also been applied to classify fake news veracity [90].
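The deep-syntax idea can be illustrated by extracting production rules from a parse tree. This sketch represents trees as nested tuples to stay self-contained; in practice the trees (and rule probabilities) would come from a PCFG parser.

```python
# Extract context-free production rules (lexicalized: terminals included)
# from a parse tree given as nested tuples (label, child, child, ...).

def productions(tree) -> list:
    if isinstance(tree, str):  # leaf: a terminal word, no rule of its own
        return []
    label, *children = tree
    heads = [c if isinstance(c, str) else c[0] for c in children]
    rules = [f"{label} -> {' '.join(heads)}"]
    for child in children:
        rules.extend(productions(child))
    return rules

# Parse of "the story spreads": S -> NP VP, NP -> DT NN, VP -> VBZ, ...
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "story")),
        ("VP", ("VBZ", "spreads")))
rules = productions(tree)
```

Dropping the terminal-word rules (e.g. `DT -> the`) would give the ''unlexicalized'' variant mentioned in [22].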
 
 
 
======Objectivity-oriented======
 
''Objectivity-oriented approaches''  capture  style  signals that can indicate a decreased objectivity of news content and thus the potential to mislead consumers, such as: ''hyperpartisan styles'' and ''yellow-journalism''.
 
 
* '''''Hyperpartisan styles''''' represent extreme behavior in favor of a particular political party, which often correlates with a strong motivation to create fake news. Linguistic-based features can be applied to detect hyperpartisan articles [62].
 
 
* '''''Yellow-journalism''''' represents those articles that do not contain well-researched news, but instead rely on eye-catching headlines (i.e., ''clickbait'') with a propensity for exaggeration, sensationalization, scare-mongering, etc.
 
 
 
====Social Context Models====
 
The nature of social media provides researchers with additional resources to supplement and enhance News Content Models. Social context models include relevant user social engagements in the analysis, capturing this auxiliary information from a variety of perspectives. We can classify existing approaches for social context modeling into two categories: '''''Stance-based''''' and '''''Propagation-based'''''. Note that very few existing fake news detection approaches have utilized social context models. Thus, we also introduce similar methods for rumor detection using social media, which have potential application for fake news detection.
 
 
 
=====Stance-based=====
 
Stance-based approaches utilize users' viewpoints  from  relevant  post  contents  to  infer  the  veracity  of original  news  articles.
 
 
The  stance  of  users'  posts  can  be represented  either explicitly or implicitly:
 
* Explicit stances are  direct  expressions  of  emotion  or  opinion,  such  as  the "''thumbs  up''"  and  "''thumbs  down''"  reactions  expressed  in Facebook.
 
 
* Implicit stances can be automatically extracted from social media posts. Stance detection is the task of automatically determining from a post whether the user is in favor of, neutral toward, or against some target entity, event, or idea [53].
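An explicit-stance signal can be aggregated into a simple article-level score; the stance labels and the averaging rule below are illustrative assumptions, not a method from the paper.

```python
# Aggregate per-post stances on an article into one support score in [-1, 1];
# strongly negative values suggest the crowd disputes the article.

STANCE_SCORE = {"favor": 1.0, "neutral": 0.0, "against": -1.0}

def support_score(stances: list) -> float:
    """Mean stance over all posts about the article (0.0 if no posts)."""
    if not stances:
        return 0.0
    return sum(STANCE_SCORE[s] for s in stances) / len(stances)

score = support_score(["against", "against", "favor", "neutral"])
```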
 
 
 
=====Propagation-based=====
 
