
Paper, Panel, and Workshop at CSCW 2014

I’ll be jumping back into work next semester. What better way to kick off my return than a trip to CSCW 2014 in Baltimore?! I’m organizing the Feminism and Social Media Research Workshop on Sunday, participating in the panel The Ethos and Pragmatics of Data Sharing, and presenting a paper called Tweet Acts: How Constituents Lobby Congress via Twitter. I’ll post specifics for the panel and paper when the program’s available. In the meantime, apply to join us for the workshop! No position paper required, just a short abstract about your work and a couple questions about your interests in feminism and social media research.

Abstract for Tweet Acts: How Constituents Lobby Congress via Twitter

Twitter is increasingly becoming a medium through which constituents can lobby their elected representatives in Congress about issues that matter to them. Past research has focused on how citizens communicate with each other or how members of Congress (MOCs) use social media in general; our research examines how citizens communicate with MOCs. We contribute to existing literature through the careful examination of hundreds of citizen-authored tweets and the development of a categorization scheme to describe common strategies of lobbying on Twitter. Our findings show that contrary to past research that assumed citizens used Twitter to merely shout out their opinions on issues, citizens utilize a variety of sophisticated techniques to impact political outcomes.

Get the PDF from IIT’s institutional repository

Scale and the Analysis of Large Text Databases

I gave a talk Friday as part of IIT’s Social Networks and Innovation conference, and here are the slides:

Basically, my talk was an overview of three projects that analyze subsets of my big Twitter data. You can read Framing in Social Media now, and papers from the others are waiting for publishers’ OK to share. In each project, I analyzed tens or hundreds of thousands of tweets.

In the mentioning networks paper, I examined the social network that results from members of Congress mentioning one another on Twitter to see if that network is structurally different from other Congressional networks like roll call votes or shared press releases. The structure of the network would determine who has influence, and I determined that influence offline is the best predictor of influence online. Sadly, social media is not a new route to influence.

The framing paper also analyzes Congress’ tweets, this time to see whether and how Congress frames political issues when they don’t have to go through traditional media to get their message out. We found that Congress does frame when talking directly to the public, especially about issues such as energy policy, gender equality, and healthcare. We created a measure of polarization based on framing efforts that, when combined with DW-Nominate voting data, gives a more nuanced and complete picture of political polarization in the 113th Congress.

The last project analyzes tweets aimed at Congress, authored by citizens. In this project, we investigated whether citizens use social media to try to impact political outcomes, and to understand how they do so. We found 16 different strategies citizens employ to lobby their representatives and grouped those into 5 different speech acts. It turns out we do more than shout and moan on social media; we do actually try to understand where our representatives stand and to get them to change their minds about issues that are important to us.

Setting up an EC2 instance for TwitterGoggles

TwitterGoggles requires Python 3.3. I’m new to Python, and 3.3 is (relatively) new to everyone. So, getting help is both necessary and challenging. I want to run TwitterGoggles on Amazon EC2 instances, so I’m setting up an AMI that has all of the requirements:

  • gcc 4.6.3
  • git
  • mlocate 0.22.2
  • MySQL 5.5
  • Python 3.3

I started with an Amazon Linux AMI and installed the stuff I needed. You can save yourself some trouble by launching an instance with my AMI: ami-e73b558e.

Install Dependencies

  1. Update the system
    sudo yum update
  2. Install C compiler so we can install Python
    sudo yum install gcc
  3. Install software yum can take care of for us
    sudo yum groupinstall "Development tools"
    sudo yum install -y mysql git mlocate
  4. Update the DB locate uses to find your stuff
    sudo updatedb

Install Python 3.3.1

Here’s the best guide: http://www.unixmen.com/howto-install-python-3-x-in-ubuntu-debian-fedora-centos/

Basically you have to

  1. Download the release you want. In my case
    wget http://www.python.org/ftp/python/3.3.1/Python-3.3.1.tgz
  2. Extract the compressed files and switch to the directory
    gunzip Python-3.3.1.tgz
    tar xf Python-3.3.1.tar
    cd Python-3.3.1
  3. Configure, compile, and install
    sudo ./configure --prefix=/opt/python3
    sudo make
    sudo make install
  4. Add python3 to your path
    export PATH=$PATH:/opt/python3/bin
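That export only lasts for the current shell session. A small addition, assuming the /opt/python3 prefix used above, makes the change survive logins:

```shell
# Persist the PATH change for future shells (prefix matches the
# --prefix=/opt/python3 used during configure above)
echo 'export PATH=$PATH:/opt/python3/bin' >> ~/.bashrc
source ~/.bashrc

# Sanity check: should report Python 3.3.1
python3 --version
```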

Install easy_install-3.3

I ran into some errors related to a missing “zlib” module. I reinstalled zlib from source, then reconfigured and reinstalled Python 3.3.1. Once that worked, I was able to install and use easy_install-3.3 for module management.

wget http://pypi.python.org/packages/source/d/distribute/distribute-0.6.39.tar.gz
tar xf distribute-0.6.39.tar.gz
cd distribute-0.6.39
sudo python3 setup.py install

Two Python scripts for gathering Twitter data

Anyone who has talked to me about my research in the last year and a half knows I’m constantly frustrated by the challenges of capturing and storing Twitter data (not to mention sharing – that’s another blog post). I hired a couple of undergrads to help me write scripts to automatically collect data and store it in a relational MySQL database where I can actually use it. We chose to use the streaming API because we limit data by person rather than by content. The Twitter Search API can handle only about 10 names at a time in the “from” or “mentions” query parameters. Since we’re studying over 1500 people, we’d have to run 150 different searches to get data for everyone. Using the Streaming API has its problems too – most notably that any time the script fails, we miss some data.
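To make the arithmetic above concrete, here’s a minimal sketch of how a roster could be split into Search API-sized queries. The function name build_queries and the batch size of 10 reflect the limits described above; neither comes from the actual scripts.

```python
def build_queries(screen_names, batch_size=10):
    """Return one 'from:a OR from:b ...' search query per batch of names."""
    queries = []
    for i in range(0, len(screen_names), batch_size):
        batch = screen_names[i:i + batch_size]
        queries.append(" OR ".join("from:" + name for name in batch))
    return queries

# 1500 tracked accounts at ~10 names per query -> 150 separate searches
roster = ["user%d" % n for n in range(1500)]
print(len(build_queries(roster)))  # 150
```

The Streaming API avoids all of this by accepting the whole follow list at once, which is why it made sense for a fixed population of 1500 accounts.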

Below, I provide some info and links to two different scripts for collecting data from Twitter. Both are written in Python. One uses the Streaming API and one uses the Search API. Depending on your needs, one will be better than the other. The two store data slightly differently as well. They both parse tweets into relational MySQL databases, but the structure of those databases differs. You’ll have to decide which API gets you the data you need and how you want your data stored.

Both options come with all the caveats of open-source software developed within academia. We can’t provide much support, and the software will probably have bugs. Both scripts are still in development though, so chances are your issue will get addressed (or at least noticed) if you add it to the Issues on GitHub. If you know Python and MySQL and are comfortable setting and managing cron jobs and maybe shell scripts, you should be able to get one or both of them to work for you.

Option 1: pyTwitterCollector and the Streaming API

When to use this option:

  • You want to collect data from Twitter Lists (e.g., Senators in the 113th Congress)
  • You want data from large groups of specific users
  • You want data in real-time and aren’t worried about the past
  • You need to run Python 2.7
  • You want to cache the raw JSON to re-parse later

What to watch out for:

  • Twitter allows only one standing connection per IP, so running multiple collectors is complicated
  • You need to anticipate events since the script doesn’t look back in time

Originally written in my lab, pyTwitterCollector uses the streaming API to capture tweets in real time. You can get the pyTwitterCollector code from GitHub.

Option 2: TwitterGoggles and the Search API

When to use this option:

  • You want data about specific terms (e.g., Obamacare)
  • You want data from before the script starts (how far back you can go changes on Twitter’s whim)
  • You can run Python 3.3

What to watch out for:

  • Complex queries may need to be broken into more than one job (what counts as complicated is up to Twitter – if a query is too complicated, the search just fails with no feedback)

Originally written by Phil Maconi and Sean Goggins, TwitterGoggles uses the search API to gather previously posted tweets. You can get the TwitterGoggles code from GitHub.


CSCW paper and poster about Congress on Twitter

My colleagues and I will present a paper and a poster at CSCW 2013 in San Antonio in February. Both submissions are based on data we collected from Twitter around politicians and their use of social media.

What’s Congress Doing on Twitter? (paper)

Written with Jahna Otterbacher and Matt Shapiro, this paper reports our first summary stats about who’s using Twitter and what they’re accomplishing. Using data from 380 members of Congress’ Twitter activity during the winter of 2012, we found that officials frequently use Twitter to advertise their political positions and to provide information but rarely to request political action from their constituents or to recognize the good work of others. We highlight a number of differences in communication frequency between men and women, Senators and Representatives, and Republicans and Democrats. We provide groundwork for future research examining the behavior of public officials online and testing the predictive power of officials’ social media behavior.

Read the paper

Read my guest post at Follow the Crowd about the paper

“I’d Have to Vote Against You”: Issue Campaigning via Twitter (poster)

Written with Andrew Roback, one of my great graduate students, this poster focuses on the citizen side of the Twitter conversation. Specifically, using tweets posted with #SOPA and #PIPA hashtags and directed at members of Congress, we identify six strategies constituents employ when using Twitter to lobby their elected officials. In contrast to earlier research, we found that constituents do use Twitter to try to engage their officials and not just as a “soapbox” to express their opinions.

Read the Extended Abstract

Calculating Geometric Mean in NodeXL

My social networks reading group read De Choudhury et al.’s WWW ’10 paper about inferring social networks from email (citation’s below) this week, and I was inspired by our discussion to calculate geometric means for Twitter mentions. You can read elsewhere about the public officials and social media project, but basically I have a bunch of data from Twitter including how often members of Congress mention one another. De Choudhury and colleagues point out that using a binary decision to evaluate the weight of edges (reciprocated or not) doesn’t make much sense when you’re talking about personal communication. They use an alternative weight they call “geometric mean” that I thought might be useful for Twitter mentions. Their equation for geometric mean is

ws_ij = sqrt(w_ij * w_ji)

where two individuals i and j exchange messages: w_ij is the number of messages from i to j, and w_ji is the number of messages from j to i. In my data, these are mentions. My data is already in a NodeXL workbook, and I’ve already collapsed duplicate edges to create an “Edge Weight” column. Here’s what my data look like now:

To calculate the geometric mean, I used a helper column “MergedVertices” with the formula

=[@[Vertex 1]]&[@[Vertex 2]]

Then the formula for “Geometric Mean” is:

=SQRT(SUMIF([MergedVertices],[@[Vertex 1]]&[@[Vertex 2]],[Edge Weight])*(SUMIF([MergedVertices],[@[Vertex 2]]&[@[Vertex 1]],[Edge Weight])))

That solution is courtesy of Sean Cheshire who answered my StackOverflow question. Thanks, Sean!
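The same weight is easy to compute outside a spreadsheet. Here’s a minimal Python sketch under the assumption that the directed mention counts are in a dict keyed by (source, target) pairs, like the collapsed edge list above; the data values and the helper name geometric_mean are mine for illustration.

```python
import math

# Directed mention counts keyed by (source, target),
# analogous to the collapsed NodeXL edge list
mentions = {("A", "B"): 9, ("B", "A"): 4, ("A", "C"): 5}

def geometric_mean(weights, i, j):
    """ws_ij = sqrt(w_ij * w_ji); 0 if either direction is missing."""
    return math.sqrt(weights.get((i, j), 0) * weights.get((j, i), 0))

print(geometric_mean(mentions, "A", "B"))  # 6.0 = sqrt(9 * 4)
print(geometric_mean(mentions, "A", "C"))  # 0.0 (unreciprocated)
```

Note how an unreciprocated edge drops to zero, which is exactly the behavior that makes this weight better than a binary reciprocity flag.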

Citation: De Choudhury, M., Mason, W. A., Hofman, J. M., & Watts, D. J. (2010). Inferring relevant social networks from interpersonal communication. Proceedings of the 19th international conference on World wide web – WWW ’10 (p. 301). New York, New York, USA: ACM Press. Retrieved from http://dl.acm.org/citation.cfm?id=1772690.1772722

RE: Social Media Is Bullshit

BJ Mendelson wrote a book. He called it Social Media is Bullshit so people would buy it and those who didn’t would get his point anyway. This isn’t a review of the book. You can read lots of those. Instead, this is a note-style post about stuff from BJ’s book that comes up in my research (and teaching). When I quote the book, I’m quoting the Kindle Edition and giving you Kindle locations in (): Mendelson, B.J. (2012-09-04). Social Media Is Bullshit. Macmillan. Kindle Edition. If you make it to the end of the post, you’ll get a little insight into my own work. This is a draft post and may change when I have time to edit.

Continue Reading

#drought12 and #climategate Twitter Explorers

My brother sent me an interesting story from NPR about how farmers are using social media to keep track of what’s happening to others’ crops, and it got me thinking about whether I could get some sort of data explorer running on the fly when a new hashtag pops up. For the Public Officials and Social Media project we’re gathering so much data every day that it’s hard to get a sense of it overall. We’re always looking for easier ways to mine that data and to get a quick sense of what’s happening. Martin Hawksey has created TAGSExplorer, a set of free tools based on Google Spreadsheets, that lets you do just that. In about 5 minutes, I set up spreadsheets for #drought12 and #climategate and made publicly accessible explorers for each collection of tweets. Check them out:

drought12 Explorer

Screenshot of the #drought12 explorer

Congress Hashtag Networks

This morning, I led an hour of WebShop 2012. At the beginning of the talk, I asked the audience, especially students, to brainstorm questions about public officials and Twitter, specifically. You can see the list we generated as a Google doc. Many of those questions my colleagues and I are already investigating, but like I said this morning, we can’t do it all. That list alone will keep me busy for a long, long time. One question that did come up both last night and this morning was about how polarized or divided the conversation is. So, last night, I processed some hashtag data from my public officials on social media project so we could investigate.

Towards the end of the talk, I used that hashtag data as a live demo to show my work process, and luckily, Bernie Hogan, Marc Smith, Ben Shneiderman, and others jumped in with ideas for this very graph. Without their input, it would have been hard for me to demonstrate how social and contentious analyzing this data is.

U.S. Congress Shared Hashtag Network

Network of members of Congress based on their usage of the same Twitter hashtags from 12/22/2011 to 03/15/2012

To get this graph, I downloaded the raw JSON of tweets by members of Congress from Twitter’s streaming API, parsed it into a normalized MySQL database, and then queried to pull just hashtags and the users who posted them. I used UCINet to transform that two-mode tweeter-hashtag network into a one-mode tweeter network and imported that one-mode network into NodeXL. That means each tie exists because those two tweeters used the same hashtag. I used some existing data about the followers, friends, tweets, chamber of Congress, political party, sex, and state they represent as attributes for each member of Congress. The timeframe for these tweets is December 22, 2011 – March 15, 2012.
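The two-mode-to-one-mode projection step can be sketched in a few lines of Python. This is an illustration only, not the UCINet procedure I actually used: the sample rows and names are hypothetical stand-ins for the (tweeter, hashtag) pairs the MySQL query returns.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows as returned by the hashtag query: (tweeter, hashtag)
rows = [("repA", "jobs"), ("repB", "jobs"), ("repC", "sopa"), ("repA", "sopa")]

# Group tweeters by hashtag, then tie every pair that shares a hashtag
by_tag = defaultdict(set)
for tweeter, tag in rows:
    by_tag[tag].add(tweeter)

edges = set()
for tweeters in by_tag.values():
    for pair in combinations(sorted(tweeters), 2):
        edges.add(pair)

print(sorted(edges))  # [('repA', 'repB'), ('repA', 'repC')]
```

Each resulting edge means exactly what the paragraph above says: the two tweeters used at least one hashtag in common.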

Marc predicted that we’d see a partisan divide. What do you think? Take to the comments to tell me what you see (or what you’d like to see in a new graph)!

You can see a bigger, better version of this graph, one with labels, and their metadata at NodeXL Graph Gallery: graph without labels, graph with labels


Summer of NSF Workshops

I’ve been lucky enough this summer to be invited to two NSF workshops. The first, the Consortium for the Science of Sociotechnical Systems (CSST) Summer Institute wrapped up last Thursday and was an incredible experience. CSST’s summer institute for doctoral students and pre-tenure faculty covers a range of topics from community-building to getting tenure to balancing work and life. I’m thankful to the organizers, mentors, and participants for all their hard work and advice. I feel much more prepared to face the next steps in my career, and I’m thrilled to have been included.

In a couple weeks, I’ll head to Maryland to be a speaker in Summer Social Webshop 2012 (Webshop). Webshop is a workshop for graduate students around technology-mediated social participation. I’ll be talking about my work on elected officials’ social media use. See the schedule for a list of other speakers. Looks like it will be a great workshop!

Thanks, National Science Foundation, for supporting these kinds of intensive community- and expertise-building workshops.