I gave a talk at the DIMACS Workshop on Building Communities for Transforming Social Media Research Through New Approaches for Collecting, Analyzing, and Exploring Social Media Data at Rutgers University last week. Here are my slides and roughly what I said:
Many of today’s talks are about gathering big social data or automating its analysis. I’m going to focus instead on connecting disparate data sources to increase the impact of social media research.
Most of my work is about public policy, civic action, and how social media plays a role in each. Today, I’ll talk mainly about a study of how Congress uses Twitter and how that use influences public discussion of policy.
I’m in a department of humanities but have degrees in information and work experience in web development. My position allows me to witness divides in how different disciplines think about data and research, and I’ve included a few comments in my talk about why those divides matter.
Often when we study social media, we’re looking at trends or trying to generalize about or understand whole populations, but I’m interested in a specific subset of people, what they do online, and how that online activity influences the offline world. This focus allows me to connect what we know about people offline with what they do online quite reliably. Because these values are known for members of Congress, I can connect data such as party affiliation, geolocation, gender, tenure in Congress, chamber, voting record, and campaign contributions; the list goes on. Getting their data from Twitter is trickier. GovTrack does a nice job of keeping track of official accounts, but politicians often have 2 or 3 accounts – for campaign messaging, for personal use – and there are many bogus accounts out there. Once you find their accounts, you can use a variety of tools to capture their Twitter data. I wrote my own in Python and MySQL because most other free tools focus on hashtags and the Search API, and those don’t return the Twitter data I need.
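For anyone curious what "my own" tool involves, here is a minimal sketch of the storage side. It is not my actual code. It makes a few assumptions:
- Python’s built-in sqlite3 stands in for MySQL.
- Tweets have already been fetched as dicts shaped like Twitter’s v1.1 tweet JSON, with `id_str`, `user.id_str` and `entities.user_mentions`.
- The table names and the `store_tweet` function are hypothetical.

The key design choice is to store each mention as its own row when the tweet arrives. That way the mention network can later be built with a simple query.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the MySQL database used in the study.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id  TEXT PRIMARY KEY,
    author_id TEXT NOT NULL,
    text      TEXT
);
CREATE TABLE IF NOT EXISTS mentions (
    tweet_id     TEXT NOT NULL,
    author_id    TEXT NOT NULL,
    mentioned_id TEXT NOT NULL,
    PRIMARY KEY (tweet_id, mentioned_id)
);
"""


def store_tweet(conn: sqlite3.Connection, tweet: dict) -> None:
    """Store one v1.1-style tweet dict plus one row per user it mentions.

    INSERT OR IGNORE makes re-collecting the same tweet harmless.
    """
    author = tweet["user"]["id_str"]
    conn.execute(
        "INSERT OR IGNORE INTO tweets (tweet_id, author_id, text) VALUES (?, ?, ?)",
        (tweet["id_str"], author, tweet.get("text", "")),
    )
    # Mentions live under entities.user_mentions in v1.1 tweet JSON.
    for mention in tweet.get("entities", {}).get("user_mentions", []):
        conn.execute(
            "INSERT OR IGNORE INTO mentions (tweet_id, author_id, mentioned_id) "
            "VALUES (?, ?, ?)",
            (tweet["id_str"], author, mention["id_str"]),
        )
    conn.commit()
```

To use it, open a connection, run `conn.executescript(SCHEMA)` once, and then call `store_tweet(conn, tweet)` for each tweet you collect.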
So, back to the study. There’s plenty of hype in the popular press about how politicians wield social media influence to impact policy. Look at how Obama used the internet to become President! I wondered whether these claims are true – was social media providing a new route to influence for members of Congress?
Turns out, not so much. The people positioned to exert the most influence online occupy similar positions of power offline.
How can I be sure? Network theory, especially social capital theory, provides tools for making judgments about the relative power of people as a function of their positions in a network. And luckily, I’m not the only one who thinks network theory is a good way to interrogate power in Congress. Other researchers have used network theory to analyze relationships among bill cosponsors, roll call votes, congressional committees, and press events. I’ll focus just on cosponsorship because it’s the most widely used measure of legislative influence. I compared legislative influence with a person’s ability to control the spread of information.
To do so, I used Jim Fowler’s cosponsorship network data and his measure of influence, connectedness, which (for the network analysts among us) is a weighted closeness centrality measure.
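For readers who haven’t met closeness centrality, here is a minimal sketch of one common weighted variant. It is not Fowler’s exact definition of connectedness, which is laid out in his papers. It assumes the following:
- Each directed tie carries a strength, and stronger ties count as shorter distances, using distance = 1 / strength.
- A node’s score is the number of nodes it can reach divided by the sum of its shortest-path distances to them. This is the inverse of its mean distance.

For an undirected network like cosponsorship, you would list each tie in both directions. The function names are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_distances(graph, source):
    """Dijkstra's algorithm; graph maps node -> {neighbor: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def weighted_closeness(strengths):
    """strengths: {(u, v): tie strength > 0} for directed ties u -> v.

    Assumed conversion: distance = 1 / strength. A node's score is
    (reachable nodes) / (sum of distances to them), or 0.0 if it reaches none.
    """
    graph = defaultdict(dict)
    nodes = set()
    for (u, v), s in strengths.items():
        graph[u][v] = 1.0 / s
        nodes.update((u, v))
    scores = {}
    for n in nodes:
        dist = shortest_distances(graph, n)
        others = [d for m, d in dist.items() if m != n]
        scores[n] = len(others) / sum(others) if others else 0.0
    return scores
```

A member with short, strong paths to many others scores high. This is what lets closeness stand in for "how quickly can this person's information reach everyone else."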
My data is a network of mentions among members of Congress. I have a few hundred thousand tweets from 2008 to the present in which members of Congress mention one another about 75,000 times. Every mention creates an explicit, directed link between two members, and these links form the network I’m interested in.
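Turning raw mentions into that network mostly comes down to counting. Here is a minimal sketch, not my actual pipeline. It assumes the following:
- Mentions arrive as (author ID, mentioned ID) pairs, one pair per mention.
- Repeated mentions add weight to the same directed link.
- Mentions of accounts outside Congress, and self-mentions, are dropped.

The function name is hypothetical.

```python
from collections import Counter


def mention_network(mentions, congress_ids):
    """Count directed mention links between known Congress accounts.

    mentions: iterable of (author_id, mentioned_id) pairs, one per mention.
    congress_ids: set of Twitter IDs known to belong to members of Congress.
    Returns a Counter {(author_id, mentioned_id): number of mentions}.
    """
    edges = Counter()
    for author, mentioned in mentions:
        if author == mentioned:
            continue  # assumption: self-mentions are not links between members
        if author in congress_ids and mentioned in congress_ids:
            edges[(author, mentioned)] += 1
    return edges
```

Direction matters here. A link runs from the member who tweets to the member who is mentioned. So "A mentions B" and "B mentions A" are separate edges.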
Fowler’s data is an undirected, implicit network: members are connected through their affiliation with legislation. To use Fowler’s data, I needed a way to connect members of Congress on Twitter to members in his data, a sort of key, if you will. Keith Poole’s ICPSR ID is a widely used unique identifier for individual members of Congress, and Fowler also uses it, so I developed a mapping of Twitter IDs to ICPSR IDs.
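The mapping does double duty. It joins the two data sets, and it merges a member’s official, campaign and personal accounts into a single node. Here is a minimal sketch of that second step, under these assumptions:
- The mapping is a plain dict from Twitter ID to ICPSR ID.
- Accounts missing from the mapping are skipped.
- Mentions between two accounts of the same member are dropped.

The function name and every ID in the usage below are made up for illustration.

```python
from collections import Counter


def collapse_to_icpsr(edges, twitter_to_icpsr):
    """Re-key a {(twitter_src, twitter_dst): weight} edge list by ICPSR ID.

    Several accounts belonging to one member collapse into one node, and their
    weights are summed. Unmapped accounts are skipped, and mentions between a
    member's own accounts are dropped (an assumption, not a rule from the study).
    """
    collapsed = Counter()
    for (src, dst), weight in edges.items():
        a = twitter_to_icpsr.get(src)
        b = twitter_to_icpsr.get(dst)
        if a is None or b is None or a == b:
            continue
        collapsed[(a, b)] += weight
    return collapsed
```

For example, with the mapping `{"t1": 1001, "t1_campaign": 1001, "t2": 1002}`, mentions of `t2` from both of member 1001’s accounts add up on one edge, `(1001, 1002)`. Once the network is keyed by ICPSR ID, each member’s online score can sit next to their score in Fowler’s cosponsorship data.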
So Fowler used bill cosponsorship networks to figure out who wields influence over what legislation gets addressed and eventually passed. Turns out, members with high connectedness are more effective at convincing their peers to vote with them. I used the same algorithm to measure who wields influence over the spread of information online. Being able to spread information quickly allows politicians to control the conversation. We know from studies of framing, for instance, that the first frame is likely to get traction and essentially constrain future discussions of an issue. What I found was that people with legislative influence also control information. While this correlation isn’t necessarily causal in either direction, what matters is that the same people wield influence both online and off. I can tell you a little more about those people too, because I was able to connect online behavior data with offline demographic and political data.
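One way to check whether legislative influence and online influence line up is a rank correlation between the two scores. The talk doesn’t say which statistic the study used. This sketch uses Spearman’s rho as an illustrative choice, because it only asks whether members are ordered the same way on both measures. It assumes both scores are dicts keyed by ICPSR ID, and only members present in both are compared. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_by_member(legislative, online):
    """Spearman's rho over members (ICPSR IDs) present in both score dicts."""
    shared = sorted(set(legislative) & set(online))
    x = average_ranks([legislative[m] for m in shared])
    y = average_ranks([online[m] for m in shared])
    return pearson(x, y)
```

A rho near 1 means the members ranked highest for legislative influence also rank highest online. As the talk says, that describes co-occurrence, not cause.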
Members of Congress who control the conversation, or at least are in a position to, are male House Republicans. If you’re a female Democrat and feminist scholar like me, that’s scary. What are some of the implications of that information control? Well, for starters, male House Republicans almost never talk about issues facing marginalized groups such as pay inequality, discrimination, and poverty. Instead, the online conversation is about the ACA, gun control, and the debt. Whether you think we should be talking more about poverty or the ACA, I expect you’d agree that some diversity in both topics and talkers would be welcome. But that’s not what I’m seeing. Instead, I’m seeing male House Republicans controlling the spread of information and attention through the Congressional network and to all of its followers.
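Once online scores and offline attributes share the ICPSR key, a first descriptive look is just averaging scores within groups. The talk doesn’t describe the study’s actual statistical tests, so treat this as a minimal sketch. It assumes attributes are stored as one dict per member with keys like "chamber", "party" and "gender". The function name is hypothetical, and the numbers in the usage below are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(scores, attributes, keys):
    """Average a centrality score within groups defined by member attributes.

    scores: {icpsr_id: score}
    attributes: {icpsr_id: {"chamber": ..., "party": ..., "gender": ...}}
    keys: attribute names that define a group, e.g. ("chamber", "party", "gender").
    Members without attribute data are skipped.
    """
    groups = defaultdict(list)
    for member, score in scores.items():
        attrs = attributes.get(member)
        if attrs is None:
            continue
        groups[tuple(attrs[k] for k in keys)].append(score)
    return {group: mean(vals) for group, vals in groups.items()}
```

For instance, grouping by `("chamber", "party", "gender")` gives one average per combination, such as `("House", "R", "M")`. Sorting those averages shows which kinds of members sit closest to the center of the network. A group difference found this way is descriptive; testing it properly is a separate step.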
We’ve now taken a whirlwind tour through one of my studies of political communication on Twitter. What did we find that matters? First, social media data is most useful when we connect it with other data. Second, social media is not providing an alternate route to power for members of Congress. Third, maleness and Republicanness are the most reliable routes to influence online. From my view, these results paint a pretty bleak picture where social media doesn’t actually challenge the status quo, and groups that wield disproportionate, and often oppressive, influence offline do so online as well. This isn’t quite the democratizer or equalizer I was hoping for, but as I work to understand what’s happening among citizens, maybe I’ll see something different.
I’d rather not end on a depressing note, so let me end on a call to action instead. The technical expertise required to do this study – to collect Fowler’s data, to collect Twitter data, to do the statistical and network analyses – may seem second nature to many of us here, but it is not second nature to the people best equipped to interrogate this data. Let’s not measure our impact by the size of our dataset or the lines of code we had to write to get it. For instance, I have a colleague in political science at IIT who knows much more than I do about legislative influence and political communication. He shouldn’t also be asked to learn R and Python to contribute to the discussion about social media’s role in influencing public policy discussions. I hope we can remove, or at least diminish, the technical barriers for subject matter experts and scientists of other stripes to use [big] [open] [social] data, and that we can change graduate education to train students in both social theory and technical tools.
Access Fowler’s papers and data: http://jhfowler.ucsd.edu/cosponsorship.htm
Access Poole’s ICPSR ID data and information: http://www.voteview.com/icpsr.htm