Analyzing Language for Persuasion Markers in Twitter Discussions

In studying the #GamerGate discussion(s) on Twitter, I’m using a variety of theories including persuasion, community action, and paranoid style in politics. I could use some help making sense of what I’m seeing, so I’ll be blogging as I go. Please contact me or use the comment functions if you have ideas.

First up, language and social influence. Why this approach? The argument about what #GamerGate is – a discussion about ethics in game journalism or a coordinated harassment effort – could be explored in part by examining the language participants use. I’m especially interested in whether users are persuasive in their language – can they convince readers that the discussion is about what they claim it’s about? [The mainstream press says, “no”, “no”, “no”, you get the idea.] I’m not advocating for that argument. As you can see from the links in this paragraph, it’s been resolved. I’m curious, rather, about whether the language used by tweeters indicates an argument was even happening, and I’m using the presence/absence of persuasion markers to figure that out.

Some Background Research

A number of language features are connected to social influence, especially in online communication [1]:

  • lexical diversity
  • powerful language
  • language intensity

Lexical diversity refers to a class of measures of the range of vocabulary a writer uses. Often, lexical diversity is measured using a type-token ratio (number of unique words [types] divided by total number of words [tokens]) [2]. Low linguistic diversity leads to low evaluations from readers [3] and “negatively impacts credibility and influence” [1, p. 598]. A number of social factors have been shown to impact lexical diversity including anxiety [4] and writing apprehension [5], both of which produced writing with low lexical diversity.

Powerful language is defined by what it’s missing, namely linguistic features such as tag questions (e.g., “isn’t it?”), hedges (e.g., “sort of”, “kinda”), hesitations (e.g., “um”), fragments, and intensifiers (e.g., “really”) [6]. Writers who use powerful language are perceived as more competent and authoritative [7], and their arguments are judged as more persuasive [6]. Several studies have found that women tend to use a less powerful language style than men [8, 9, 10].

Many researchers use Bowers’ [11, p. 345] definition of language intensity as “the quality of language which indicates the degree to which the speaker’s attitude toward a concept deviates from neutrality.” Intensity is often conveyed through emotionality [12] and is measured using some variety of a scale in which words are labeled according to their intensity. Some popular scales include Jones and Thurstone [13] and Burgoon and Miller [14]. Intense language is associated with persuasion [12], resistance to persuasion [14], perceived credibility [12], and attitude-behavior consistency [15]. Receivers have been shown to tolerate more intense messages from men than from women [16].

My Expectations

Based on this research about the relationships between linguistic style and social influence, I expect influential tweeters to have high lexical diversity and use powerful, intense language. By influential, I mean those tweeters who are able to control the message. I’m thinking retweets are decent proxies for persuasion and influence – I assume people RT persuasive arguments/authors. I also expect those most committed to the ethics-in-gaming argument to use the most intense language – they seem the least likely to be persuaded otherwise, based on mainstream media reports. I’m also wondering whether the “ethics in journalism” argument is failing because its supporters do not use persuasive language. Posts like “Actually, it’s about ethics” may simply not be persuasive. So, maybe that argument is losing because (a) there’s a crap-ton of harassment happening that makes it wrong/irrelevant, and (b) even when it’s not harassing, the language used isn’t very persuasive.

My Measures

Lexical diversity was calculated using a standard type:token ratio (unique words divided by total words).

LD = w_u / w_t    (1)
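As a rough illustration of equation (1), here is a minimal Python sketch of a type:token ratio. It is not the code from my GitHub repository. The tokenizer (lowercasing, then keeping runs of letters, digits, and apostrophes) and the function names are assumptions made for this example, and a real tweet tokenizer would need to decide how to treat hashtags, mentions, and URLs.

```python
import re


def tokenize(text):
    """Lowercase the text and return word tokens.

    Assumption for this sketch: a token is a run of letters, digits,
    or apostrophes (straight or curly). Punctuation and '#' are dropped.
    """
    return re.findall(r"[a-z0-9'’]+", text.lower())


def lexical_diversity(text):
    """Return unique words (types) divided by total words (tokens).

    Returns 0.0 for text with no tokens, to avoid dividing by zero.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "Actually, it's about ethics. It's about ethics."
    # 4 types (actually, it's, about, ethics) over 7 tokens
    print(lexical_diversity(sample))
```

One caveat about this kind of ratio: it is sensitive to text length, so it is probably safer to compare scores computed over similar amounts of text (for example, all of a user's tweets pooled together) than over single tweets.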

I created a measure for powerful language using a similar ratio approach. First, I counted common hedges (e.g., “i feel like”, “probably”) and intensifiers (e.g., “really”). I then used the inverse of the combined ratio (total hedges and intensifiers divided by total words) to measure the power of the language used.

P = 1 / (w_h / w_t)    (2)
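Equation (2) can be sketched in Python along the following lines. This is an illustration, not the repository code. The marker lists are hypothetical and contain only the examples mentioned in this post, and the choice to return None when there are no markers is my assumption, since the inverse ratio is undefined when the marker count is zero.

```python
import re

# Hypothetical marker lists for illustration; only the examples named
# in the post are included here.
HEDGES = ["i feel like", "sort of", "kinda", "probably"]
INTENSIFIERS = ["really"]


def count_markers(text, markers):
    """Count whole-word/whole-phrase occurrences of each marker."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def power_score(text):
    """Return total words divided by marker count, i.e. 1 / (markers / words).

    Returns None when the text has no words or no markers, because the
    inverse ratio is undefined in those cases (an assumption of this sketch).
    """
    total_words = len(re.findall(r"[a-z0-9'’]+", text.lower()))
    total_markers = count_markers(text, HEDGES + INTENSIFIERS)
    if total_words == 0 or total_markers == 0:
        return None
    return total_words / total_markers


if __name__ == "__main__":
    # 8 words, 3 markers ("i feel like", "probably", "really")
    print(power_score("I feel like this is probably really bad"))
```

Note that a multi-word hedge such as “i feel like” counts as one marker here but as three words in the total. Higher scores mean fewer hedges and intensifiers per word, which is the sense in which the language is more “powerful”.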

I used a ratio of high intensity markers (e.g., “very”, “strongly”) to low intensity markers (e.g., “poor”, “mildly”) to measure language intensity.

LI = i_h / i_l    (3)
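Equation (3) might look something like this in Python. Again, this is a sketch under stated assumptions rather than the repository code. The two marker lists are hypothetical and hold only the examples given above, where a fuller analysis would presumably draw on one of the published intensity scales. Returning None when no low-intensity markers appear is my choice for handling the zero denominator.

```python
import re

# Hypothetical marker lists for illustration; only the examples named
# in the post are included here.
HIGH_INTENSITY = ["very", "strongly"]
LOW_INTENSITY = ["poor", "mildly"]


def language_intensity(text):
    """Return high-intensity marker count divided by low-intensity marker count.

    Returns None when there are no low-intensity markers, because the
    ratio is undefined in that case (an assumption of this sketch).
    """
    words = re.findall(r"[a-z0-9'’]+", text.lower())
    high = sum(words.count(marker) for marker in HIGH_INTENSITY)
    low = sum(words.count(marker) for marker in LOW_INTENSITY)
    if low == 0:
        return None
    return high / low


if __name__ == "__main__":
    # 3 high markers (very, strongly, very) over 2 low markers (mildly, poor)
    print(language_intensity("I very strongly disagree with this mildly poor very bad take"))
```

Because single tweets will often contain no low-intensity markers at all, this ratio probably only makes sense over pooled text, such as all tweets from one user.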

My Code

You’re right. This project is also a great excuse for me to learn more Python. If you’ve ever talked to me about code in person, you know how that makes me feel [not awesome]. But, here we are. I’m writing some utilities for automatically analyzing tweets, and the code is available on GitHub. That code assumes you used TwitterGoggles to collect and parse the tweets.


[1] D. Huffaker, “Dimensions of Leadership and Social Influence in Online Communities,” Human Communication Research, vol. 36, no. 4, pp. 593–617, Oct. 2010.

[2] J. J. Bradac, J. W. Bowers, and J. A. Courtright, “Three Language Variables in Communication Research: Intensity, Immediacy, and Diversity,” Human Communication Research, vol. 5, no. 3, pp. 257–269, 1979.

[3] J. J. Bradac, C. W. Konsky, and R. A. Davies, “Two Studies of the Effects of Linguistic Diversity Upon Judgments of Communicator Attributes and Message Effectiveness,” Communication Monographs, vol. 43, no. 1, pp. 70–79, Mar. 1976.

[4] S. V. Kasl and G. F. Mahl, “Relationship of disturbances and hesitations in spontaneous speech to anxiety,” Journal of Personality and Social Psychology, vol. 1, no. 5, pp. 425–433, 1965.

[5] J. A. Daly, “The Effects of Writing Apprehension on Message Encoding,” Journalism Quarterly, vol. 54, no. 3, pp. 566–572, Sep. 1977.

[6] T. Holtgraves and B. Lasky, “Linguistic Power and Persuasion,” Journal of Language and Social Psychology, vol. 18, no. 2, pp. 196–205, Jun. 1999.

[7] J. J. Bradac, M. R. Hemphill, and C. H. Tardy, “Language Style on Trial: Effects of ‘Powerful’ and ‘Powerless’ Speech Upon Judgments of Victims and Villains,” Western Journal of Speech Communication: WJSC, vol. 45, no. 4, pp. 327–341, Fall 1981.

[8] F. Crosby and L. Nyquist, “The Female Register: An Empirical Study of Lakoff’s Hypotheses,” Language in Society, vol. 6, no. 3, pp. 313–322, 1977.

[9] V. Savicki, D. Lingenfelter, and M. Kelley, “Gender Language Style and Group Composition in Internet Discussion Groups,” Journal of Computer-Mediated Communication, vol. 2, no. 3, pp. 0–0, 1996.

[10] S. C. Herring, “Gender and power in online communication,” in The Handbook of Language and Gender, J. Holmes and M. Meyerhoff, Eds. 2003, pp. 202–228.

[11] J. W. Bowers, “Language intensity, social introversion, and attitude change,” Speech Monographs, vol. 30, no. 4, pp. 345–352, Nov. 1963.

[12] M. A. Hamilton and J. E. Hunter, “The effect of language intensity on receiver evaluations of message, source, and topic,” Persuasion: Advances through meta-analysis, pp. 99–138, 1998.

[13] L. V. Jones and L. L. Thurstone, “The psychophysics of semantics: an experimental investigation,” Journal of Applied Psychology, vol. 39, no. 1, pp. 31–36, 1955.

[14] M. Burgoon and G. R. Miller, “Prior attitude and language intensity as predictors of message style and attitude change following counterattitudinal advocacy,” Journal of Personality and Social Psychology, vol. 20, no. 2, p. 246, 1971.

[15] P. A. Andersen and T. R. Blackburn, “An Experimental Study of Language Intensity and Response Rate in E Mail Surveys,” Communication Reports, vol. 17, no. 2, pp. 73–82, Summer 2004.

[16] M. Burgoon, S. B. Jones, and D. Stewart, “Toward a Message-Centered Theory of Persuasion: Three Empirical Investigations of Language Intensity,” Human Communication Research, vol. 1, no. 3, pp. 240–256, 1975.
