Posts tagged with: gamergate

Analyzing Language for Persuasion Markers in Twitter Discussions

In studying the #GamerGate discussion(s) on Twitter, I’m using a variety of theories including persuasion, community action, and paranoid style in politics. I could use some help making sense of what I’m seeing, so I’ll be blogging as I go. Please contact me or use the comment functions if you have ideas.

First up, language and social influence. Why this approach? The argument about what #GamerGate is – a discussion about ethics in game journalism or a coordinated harassment effort – could be explored in part by examining the language participants use. I’m especially interested in whether users are persuasive in their language – can they convince readers that the discussion is about what they claim it’s about? [The mainstream press says, “no”, “no”, “no”, you get the idea.] I’m not advocating for that argument. As you can see from the links in this paragraph, it’s been resolved. I’m curious, rather, about whether the language used by tweeters indicates an argument was even happening, and I’m using the presence/absence of persuasion markers to figure that out.

Some Background Research

A number of language features are connected to social influence, especially in online communication [1]:

  • lexical diversity
  • powerful language
  • language intensity

Lexical diversity refers to a class of measures of the range of vocabulary a writer uses. Often, lexical diversity is measured using a type-token ratio (number of unique words [types] divided by total number of words [tokens]) [2]. Low linguistic diversity leads to low evaluations from readers [3] and “negatively impacts credibility and influence” [1, p. 598]. A number of social factors have been shown to impact lexical diversity including anxiety [4] and writing apprehension [5], both of which produced writing with low lexical diversity.

Powerful language is defined by what it’s missing, namely linguistic features such as tag questions (e.g., “isn’t it?”), hedges (e.g., “sort of”, “kinda”), hesitations (e.g., “um”), fragments, and intensifiers (e.g., “really”) [6]. Writers who use powerful language are perceived as more competent and authoritative [7], and their arguments are judged as more persuasive [6]. Several studies have found that women tend to use a less powerful language style than men [8, 9, 10].

Many researchers use Bowers’ [11, p. 345] definition of language intensity as “the quality of language which indicates the degree to which the speaker’s attitude toward a concept deviates from neutrality.” Intensity is often conveyed through emotionality [12] and is measured using some variety of a scale in which words are labeled according to their intensity. Some popular scales include Jones and Thurstone [13] and Burgoon and Miller [14]. Intense language is associated with persuasion [12], resistance to persuasion [14], perceived credibility [12], and attitude-behavior consistency [15]. Receivers have been shown to tolerate more intense messages from men than from women [16].

My Expectations

Based on this research about the relationships between linguistic style and social influence, I expect influential tweeters to have high lexical diversity and use powerful, intense language. By influential, I mean those tweeters who are able to control the message. I’m thinking retweets are decent proxies for persuasion and influence – I assume people RT persuasive arguments/authors. I also expect those most committed to the ethics in gaming argument to use the most intense language – they seem the least likely to be persuaded otherwise, based on mainstream media reports. I’m also wondering whether the “ethics in journalism” argument is failing because its supporters do not use persuasive language; posts like “Actually, it’s about ethics” are not persuasive. So, maybe that argument is losing because (a) there’s a crap-ton of harassment happening that makes it wrong/irrelevant, and (b) even when it’s not harassing, the language used isn’t very persuasive.

My Measures

Lexical diversity was calculated using a standard type:token ratio (unique words divided by total words).

LD = w_u / w_t    (1)
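As a rough sketch of equation (1) in Python: the regex tokenizer and the function names below are assumptions for illustration and are not taken from the utilities on GitHub.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def lexical_diversity(text):
    """Type-token ratio: unique words (w_u) divided by total words (w_t)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # assumption: treat empty text as zero diversity
    return len(set(tokens)) / len(tokens)

print(lexical_diversity("actually it's about ethics it's about ethics"))
```

Note that a type-token ratio is sensitive to text length, so very short texts such as single tweets will tend to score high; pooling tweets per user before computing it is one option.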

I created a measure for powerful language using a similar ratio approach. First, I calculated the ratios of common hedges (e.g., “i feel like”, “probably”) and intensifiers (e.g., “really”), and I then used the inverse of the combined ratio (total hedges and intensifiers divided by total words) to measure the power of the language used.

P = 1 / (w_h / w_t)    (2)
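Equation (2) could be sketched along these lines. The marker list is illustrative only (built from the examples mentioned in this post), and returning None when a text contains no markers is an assumption, since the inverse is undefined in that case.

```python
import re

# Illustrative list only: hedges and intensifiers named as examples in this post.
MARKERS = ["i feel like", "probably", "sort of", "kinda", "really"]

def power_score(text, markers=MARKERS):
    """Inverse of (marker count / total words); higher means more powerful language."""
    lowered = text.lower()
    total_words = len(re.findall(r"[a-z0-9']+", lowered))
    # Count whole-phrase matches so multi-word hedges like "i feel like" are found.
    marker_count = sum(
        len(re.findall(r"\b" + re.escape(m) + r"\b", lowered)) for m in markers
    )
    if marker_count == 0:
        return None  # assumption: score is undefined when no markers appear
    return 1 / (marker_count / total_words)

print(power_score("I feel like this is probably really bad"))
```

Because many tweets will contain no hedges at all, the undefined case is likely to be common, so how it is handled (dropped, capped, or smoothed) matters for any comparison between users.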

I used a ratio of high intensity markers (e.g., “very”, “strongly”) to low intensity markers (e.g., “poor”, “mildly”) to measure language intensity.

LI = i_h / i_l    (3)
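A minimal sketch of equation (3), assuming tiny word lists built only from the examples above; a real analysis would draw its markers from a published intensity scale. Returning None when there are no low-intensity markers is also an assumption.

```python
import re

# Illustrative word lists only, taken from the examples in this post.
HIGH_INTENSITY = {"very", "strongly"}
LOW_INTENSITY = {"poor", "mildly"}

def language_intensity(text, high=HIGH_INTENSITY, low=LOW_INTENSITY):
    """Ratio of high-intensity marker count (i_h) to low-intensity marker count (i_l)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    n_high = sum(1 for t in tokens if t in high)
    n_low = sum(1 for t in tokens if t in low)
    if n_low == 0:
        return None  # assumption: ratio is undefined without low-intensity markers
    return n_high / n_low

print(language_intensity("very very strongly mildly poor"))
```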

My Code

You’re right. This project is also a great excuse for me to learn more Python. If you’ve ever talked to me about code in person, you know how that makes me feel [not awesome]. But, here we are. I’m writing some utilities for automatically analyzing tweets, and the code is available on GitHub. That code assumes you used TwitterGoggles to collect and parse the tweets.


[1] D. Huffaker, “Dimensions of Leadership and Social Influence in Online Communities,” Human Communication Research, vol. 36, no. 4, pp. 593–617, Oct. 2010.

[2] J. J. Bradac, J. W. Bowers, and J. A. Courtright, “Three Language Variables in Communication Research: Intensity, Immediacy, and Diversity,” Human Communication Research, vol. 5, no. 3, pp. 257–269, 1979.

[3] J. J. Bradac, C. W. Konsky, and R. A. Davies, “Two Studies of the Effects of Linguistic Diversity Upon Judgments of Communicator Attributes and Message Effectiveness,” Communication Monographs, vol. 43, no. 1, pp. 70–79, Mar. 1976.

[4] S. V. Kasl and G. F. Mahl, “Relationship of disturbances and hesitations in spontaneous speech to anxiety,” Journal of Personality and Social Psychology, vol. 1, no. 5, pp. 425–433, 1965.

[5] J. A. Daly, “The Effects of Writing Apprehension on Message Encoding,” Journalism Quarterly, vol. 54, no. 3, pp. 566–572, Sep. 1977.

[6] T. Holtgraves and B. Lasky, “Linguistic Power and Persuasion,” Journal of Language and Social Psychology, vol. 18, no. 2, pp. 196–205, Jun. 1999.

[7] J. J. Bradac, M. R. Hemphill, and C. H. Tardy, “Language Style on Trial: Effects of ‘Powerful’ and ‘Powerless’ Speech Upon Judgments of Victims and Villains,” Western Journal of Speech Communication: WJSC, vol. 45, no. 4, pp. 327–341, Fall 1981.

[8] F. Crosby and L. Nyquist, “The Female Register: An Empirical Study of Lakoff’s Hypotheses,” Language in Society, vol. 6, no. 3, pp. 313–322, 1977.

[9] V. Savicki, D. Lingenfelter, and M. Kelley, “Gender Language Style and Group Composition in Internet Discussion Groups,” Journal of Computer-Mediated Communication, vol. 2, no. 3, pp. 0–0, 1996.

[10] S. C. Herring, “Gender and power in online communication,” in The Handbook of Language and Gender, J. Holmes and M. Meyerhoff, Eds. 2003, pp. 202–228.

[11] J. W. Bowers, “Language intensity, social introversion, and attitude change,” Speech Monographs, vol. 30, no. 4, pp. 345–352, Nov. 1963.

[12] M. A. Hamilton and J. E. Hunter, “The effect of language intensity on receiver evaluations of message, source, and topic,” Persuasion: Advances through meta-analysis, pp. 99–138, 1998.

[13] L. V. Jones and L. L. Thurstone, “The psychophysics of semantics: an experimental investigation,” Journal of Applied Psychology, vol. 39, no. 1, pp. 31–36, 1955.

[14] M. Burgoon and G. R. Miller, “Prior attitude and language intensity as predictors of message style and attitude change following counterattitudinal advocacy,” Journal of Personality and Social Psychology, vol. 20, no. 2, p. 246, 1971.

[15] P. A. Andersen and T. R. Blackburn, “An Experimental Study of Language Intensity and Response Rate in E-Mail Surveys,” Communication Reports, vol. 17, no. 2, pp. 73–82, Summer 2004.

[16] M. Burgoon, S. B. Jones, and D. Stewart, “Toward a Message-Centered Theory of Persuasion: Three Empirical Investigations of Language Intensity,” Human Communication Research, vol. 1, no. 3, pp. 240–256, 1975.

#GamerGate vs #StopGamerGate2014 By the Numbers – 10/20 edition

Edited on 10/20: Added info about specific users, more numbers.

Carly Kocurek, one of my smart and savvy IIT colleagues, pointed out that the #GamerGate and #StopGamerGate2014 discussions on Twitter are worth examining. So, I fired up a TwitterGoggles instance to track those hashtags and these others she recommended:


I saw @Gaming_Sparrow‘s tweet comparing the popularity of the two main hashtags. @ybika asked for some response, so I ran a couple quick queries on the data I’d collected and found totally different numbers.

From 10/17/2014 – 10/20/2014, I see


33,039 users

278,548 tweets



6,303 users

16,099 tweets

A few things could be happening:

  • Keyhole may be using a case-sensitive search, and mine is case-insensitive
  • Keyhole and I are getting different data from Twitter
  • #GamerGate has gained popularity and is now used by every side of the argument
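To illustrate how much the first possibility alone could change a count, here is a small sketch with made-up tweets. The function name and sample data are hypothetical; this is not how Keyhole or TwitterGoggles actually match hashtags.

```python
import re

def count_hashtag(tweets, tag, case_sensitive=False):
    """Count tweets containing the exact hashtag, optionally ignoring case."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # The trailing \b keeps "#GamerGate" from matching longer tags that start the same way.
    pattern = re.compile(r"(?<!\w)" + re.escape(tag) + r"\b", flags)
    return sum(1 for tweet in tweets if pattern.search(tweet))

# Made-up sample tweets for illustration only.
sample = [
    "#GamerGate is about ethics",
    "#gamergate again",
    "#StopGamerGate2014 please",
    "#GAMERGATE!!",
]

print(count_hashtag(sample, "#GamerGate", case_sensitive=True))
print(count_hashtag(sample, "#GamerGate", case_sensitive=False))
```

With this sample, the case-sensitive search finds one tweet and the case-insensitive search finds three, so a difference in matching rules could plausibly account for part of a gap between two tools.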

No more time to process this today, but I’ll come back to it. What do you think is going on?

More info added later on 10/20:

I’ve seen a couple other tweets or posts about the number of users and the distribution of #gamergate tweets (e.g., Waxpancake: 100 people post 24% of the tweets). It’s difficult to compare my data to his/theirs because I don’t know how Keyhole and Waxpancake are collecting their data. I contribute to TwitterGoggles on GitHub and know it much better. Of course, it still relies on the Twitter Search API, so there’s lots I don’t know about what’s not in my data. Anyway, here are some things I noticed while looking at my data.

#gamergate dominates other tags

Here’s a quick graph I made using Tableau. In this chart, the x-axis represents time where each bar is an hour, and the y-axis represents the number of tweets posted. The colors of the stacked bars map to the hashtags that appear in the tweet: blue for #gamergate only, orange for #stopgamergate2014 only, and green for tweets with both tags. This graph isn’t designed to make detailed comparisons easy – it’s just to show how incredibly popular #gamergate is compared to other tags. I also found it interesting that tweets contain both tags since they’re mostly at odds. Of course, one of the tweets with both is my own because I wanted it to show up in both conversations. Though, I may regret posting at all. Isn’t that the problem?

@mfreema55 asked me to post a higher-res image and explain the time info. So, here you go. The hours are GMT – so the graph says when people in the U.S. get off work, they start tweeting about this stuff.

Some of the most active voices change their names

Twitter assigns accounts unique user IDs, but users can change their full names if they’d like. A few accounts in the #gamergate conversation (I use the term broadly to refer to all the data associated with the hashtags above) have changed their names while tweeting. For instance, @nahalennia changed zir* name from “You Didn’t Listen” (160 tweets) to “The Future You Choose” (590 tweets) at some point in the last 3 days. So did @PsychokineticEX. Ze changed names from ADMIRALOF#GAMERGATE (528 tweets) to THE ADMIRAL (174 tweets). Both accounts are among the top 25 most active.

Users can also change their handles (the part after the @), but that seems far less common in this group. User #2815636153 is an interesting exception. Ze used the names “and_next_name,” “my_next_name,” “need_next_name,” “the_next_name,” “their_next_name,” and “your_next_name” this weekend.

Skewed distribution of tweets/user

Like much of online activity, a few people are responsible for most of the content. This isn’t the most skewed distribution I’ve ever seen, but it’s definitely skewed. Or, it has a long tail. Depends on how you look at it. I haven’t normalized this (for anything, including how many tweets each account usually posts), but that would be interesting too. I.e., maybe @SomeKindaBoogin just tweets constantly, so it’s not surprising that ze tweeted in this conversation a lot. Again, this graph isn’t about details. It’s unreadable at that level because I wanted to show you how incredibly long this tail is. Even if just a few people are incredibly active, there are still thousands of people engaging at some level. That’s exciting.
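One simple way to put a number on that skew is the share of all tweets posted by the N most active accounts. The sketch below assumes the data is available as a flat list with one user id per tweet; the function name and toy data are hypothetical.

```python
from collections import Counter

def top_share(user_ids, n):
    """Fraction of all tweets posted by the n most active users."""
    counts = Counter(user_ids)  # tweets per user
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total

# Toy data: one entry per tweet, holding the posting user's id.
tweets_by_user = ["a", "a", "a", "b", "c"]
print(top_share(tweets_by_user, 1))
```

Running this with n set to 100 on the full collection would give a figure that can be set beside the "100 people, 24% of tweets" claim, keeping in mind that the underlying datasets may differ.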

Tweets per user


* I’m using gender-neutral pronouns since I don’t know who these accounts belong to, whether they are owned by a person or a group, and since it makes sense to use gender-neutral pronouns when talking about harassment and safety.