Qualitatively Coding Tweets

In studying politicians on Twitter, one of my goals is to understand what they’re talking about. The trouble is, tweets are incredibly difficult to code. Researchers at Maryland (Golbeck et al.) claimed success with a coding scheme for Congress’ tweets, but my colleagues, students, and I were never able to reach acceptable inter-rater reliability using their scheme (see our new scheme after the jump). We tried a few times, and even met to discuss and resolve disagreements, and now I’m suspicious about the reliability of Golbeck’s scheme. The authors don’t provide their kappas, just percent agreement. The problem is that percent agreement isn’t a good measure of reliability, because it doesn’t correct for the agreement coders would reach by chance alone. Especially when the categories are numerous, broad, or incredibly narrow, high percent agreement can be misleading. Matthew Lombard has an excellent guide to inter-rater reliability where you can learn more.
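
To make the problem with percent agreement concrete, here is a minimal sketch in Python that compares it with Cohen’s kappa for a single rare code. The data are invented for illustration (100 tweets, two hypothetical coders who each apply the code five times but overlap on only one tweet), and the function names are my own, not from any particular library.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items on which the two coders gave the same label."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    labels = set(coder_a) | set(coder_b)
    # Chance agreement: probability both coders pick the same label
    # if each labels independently at their own observed rates.
    p_expected = sum(
        (coder_a.count(label) / n) * (coder_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        return None  # kappa is undefined when chance agreement is total
    return (p_observed - p_expected) / (1 - p_expected)


# Invented data: 1 = code applied, 0 = code not applied.
coder_a = [1] * 5 + [0] * 95
coder_b = [0] * 4 + [1] * 5 + [0] * 91

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

In this made-up case the coders agree on 92 of 100 tweets, almost entirely because they both leave most tweets uncoded, yet kappa comes out around 0.16.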

Our Process

We used three rounds of coding to develop a robust coding scheme for the action taken in tweets. The resulting scheme used six codes – narrating, positioning, directing to information, requesting action, thanking, and other – to categorize the kind of action taken in a tweet. Codes were not mutually exclusive, meaning a tweet could be coded as exhibiting more than one action. For example, “With massive debt, why are taxpayers funding wine tasting? Washington’s spending addiction continues http://t.co/2QaYJmo”, a tweet from Jim DeMint, was coded as both positioning and directing to information. We calculated Cohen’s kappa for each code and found strong agreement between coders, with kappas ranging from 0.70 to 0.90. The code definitions, examples, and kappas are in the table below. Positioning and directing to information were by far the most common actions exhibited on Twitter. Most of the differences between our results and Golbeck et al.’s lie in our distinctions between positioning statements and information statements.

| Code | Definition | Example | N | Cohen’s kappa |
|---|---|---|---|---|
| Narrating | Telling a story about their day, describing activities | headed up to the Fox News camera for an interview (Ron Paul) | 173 | 0.83 |
| Positioning | Situating one’s self in relation to another politician or political issue, may be implied rather than explicit | A9: Theoretically, not realistically. HC spending is growing 4x inflation and driving our debt. Let’s tackle the real threat. #ryanttv (Paul Ryan) | 405 | 0.87 |
| Directing to information | Pointing to a resource URL, telling you where you can get more info | Harkin Announces More Than $300,000 for Housing in Tama County http://1.usa.gov/lf6Aem (Tom Harkin) | 465 | 0.70 |
| Requesting action | Explicitly telling followers to go do something online or in person (not just visiting a link but asking them to do something like sign a petition, apply, vote) – look for action verbs | RSVP to my Immigration Forum with Rep. Luis Gutierrez this Saturday in Brooklyn http://t.co/qTcWugs (Yvette Clark) | 15 | 0.70 |
| Thanking | Says nice things about or thanks someone else, e.g. congratulations, compliments | @rmartindc Thanks. MoC’s handwriting is probably on par with M.D.’s. Glad I could make your job easier. (John Shimkus) | 57 | 0.90 |
| Other | Doesn’t fit in any other Action category, or one can’t tell what they’re doing | @jfor441 Will do! (Jason Chaffetz) | 20 | |
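
Because a tweet can carry more than one code, agreement has to be assessed one code at a time. The sketch below shows one way that could be done in Python: each code becomes a yes/no decision per tweet, and kappa is computed on those binary decisions. This is an illustration under my own assumptions, not our actual analysis script; the six example tweets, both coders’ labels, and the function names are all hypothetical.

```python
CODES = [
    "narrating",
    "positioning",
    "directing to information",
    "requesting action",
    "thanking",
    "other",
]


def binary_kappa(a, b):
    """Cohen's kappa for two parallel lists of True/False decisions."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    rate_a = sum(a) / n  # share of tweets coder A gave this code
    rate_b = sum(b) / n  # share of tweets coder B gave this code
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_expected == 1.0:
        return None  # undefined, e.g. neither coder ever used the code
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_per_code(coder_a, coder_b, codes):
    """Turn each coder's sets of codes into per-code indicators, then score."""
    results = {}
    for code in codes:
        a = [code in tweet_codes for tweet_codes in coder_a]
        b = [code in tweet_codes for tweet_codes in coder_b]
        results[code] = binary_kappa(a, b)
    return results


# Hypothetical labels: one set of codes per tweet, per coder.
coder_a = [
    {"positioning", "directing to information"},
    {"narrating"},
    {"directing to information"},
    {"thanking"},
    {"positioning"},
    {"narrating"},
]
coder_b = [
    {"positioning", "directing to information"},
    {"narrating"},
    {"positioning", "directing to information"},
    {"thanking"},
    {"positioning"},
    {"directing to information"},
]

kappas = kappa_per_code(coder_a, coder_b, CODES)
for code, value in kappas.items():
    shown = "undefined" if value is None else f"{value:.2f}"
    print(f"{code}: {shown}")
```

Codes that neither coder ever applies come back as undefined rather than as perfect agreement, since kappa cannot be computed when chance agreement is already total.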

What’s Next

We have a couple of working papers about the results of our action coding; please email me if you’d like to read them. Next, we’re coding the manner of tweets in order to understand the tones tweeters use and whether those tones relate to other aspects of the tweeters’ communication or offline behaviors.


    • libbyh |

      These codes are for a subset of tweets posted by members of the House and Senate during summer 2011. I have larger datasets that we’re now using to test our coding scheme. We’re also exploring machine learning algorithms for clustering and classifying, but so far, the tweets have proven too short for the available algorithms.

