Background, Design, and Motivation
On May 10, 2018, Democrats on the United States House Intelligence Committee released 3500+ Facebook and Instagram ads created by the Internet Research Agency (IRA) between 2015 and 2017. The IRA is believed to have created these ads to influence the outcome of the 2016 United States presidential election and, more generally, to influence Americans' political views.
Probably the worst advertisement from the IRA dataset.
I first read articles about this dataset the day after it was released, and I was intrigued. The story was straightforwardly interesting. To me, the ads featured on sites like Wired and the Washington Post were a look into how non-Americans perceived America. The ads featured in these news reports were direct appeals to wildly different sectors of the American populace (conservative, liberal, LGBTQ, gun-owners, Native Americans, incarcerated) and had a bluntness in their messaging that other American political ads I saw lacked.
I thought of the ad release as a great sociological dataset: published results from an unethical experiment designed to maximize American advertising response. This dataset helped to answer questions like "what ideas, in our present political environment, drive different groups of Americans to share ideas with their peers?" While limited, it could also answer questions like "to what extent, if any, did these ads result in concrete changes to Americans' political views, or the American political system?" I am sure there are dozens of other questions this dataset could answer that would not have even occurred to me.
Unfortunately for anyone interested in answering these questions, the US House Intelligence Committee and/or Facebook made the data about as difficult to analyze as possible. Each ad was contained in an individual PDF file with unorganized images, free text, and sometimes incomplete or absent metadata. While I did not (and do not) have the time to actually study these ads full-time, I felt bad that the people who did would have to waste their time cleaning up this data. I make terribly messy datasets usable for a living these days, so I thought that maybe I could do this data preparation work, and then make that data public for others who had the inclination to dive deeper. I also strongly believe that interactive and "fun" visualizations can make trends in data pop out to users, so I wanted to try and stretch some of my web development skills on a data explorer for the ads.
Example PDF released by Congress for the IRA ad dataset.
After a month or so of nights after work, the result was the Russian Ad Explorer, and the accompanying Russian Ads dataset hosted on GitHub. The dataset contains the images extracted into .png files, and text/metadata extracted into JSON format and .csv files. Additionally, because the provided audience tags were for the most part hyper-specific and non-overlapping, I editorialized and hand-labeled audience tags like "Incarcerated", "Latinx", and "Above Age 30" (this process is definitely subject to error). I then made a data explorer in d3.js based on this data, so that others could easily page through the data and get a sense of its significance.
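For anyone who wants to audit or redo the hand-labeling, the aggregation step boils down to a lookup from hyper-specific interest tags to broad audience labels. Here is a minimal Python sketch of that idea. The mapping entries and the function name are made up for illustration; they are not the actual mapping behind the dataset.

```python
# Hypothetical mapping from raw interest tags to broad audience labels.
# These entries are illustrative only, not the mapping used for the dataset.
BROAD_LABELS = {
    "mass incarceration": "Incarcerated",
    "prison reform": "Incarcerated",
    "hispanic culture": "Latinx",
    "latin music": "Latinx",
}


def broad_audience_tags(raw_interests):
    """Return the sorted, de-duplicated broad labels for one ad's raw interest tags."""
    labels = set()
    for interest in raw_interests:
        # Normalize case and whitespace before the lookup.
        label = BROAD_LABELS.get(interest.strip().lower())
        if label is not None:
            labels.add(label)
    return sorted(labels)


print(broad_audience_tags(["Prison Reform", "Latin music", "Unknown tag"]))
```

Tags that are missing from the mapping are silently dropped here, which is one place where labeling errors can creep in.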
The project ended up being fun and, in my view, pretty successful. Probably the hardest part was writing this blog post. I originally wanted to make the post a grand analysis of the dataset, where I would tell you all what it all really meant. But mostly, I have no idea. This dataset is a small piece or a starting point in what certainly must be a much larger project. That project would really dig into what the aims of the IRA were, to what extent they were successful, and what their successes can tell us about political communication in the sphere of social media. Hopefully, some other people see this project and feel motivated to research further (at least one person has!). If one of those people is you, send me a message!
All that aside though, I didn't want to leave you with nothing. It's hard to pore over a dataset for a month and not have a few takeaways. Check below for a few of my biggest ones, listed in bullet-point format.
Not Just Conservatives, Because Progressives Get Clicks
When news first hit of Russians using Facebook and Instagram ads to politically influence Americans, many people I talked to assumed that they would be targeting the stereotypical conservative voter: an older internet novice who is easily swept up by conspiratorial thinking. Even when evidence of IRA ads clearly appealing to Black voters was published, at least some assumed that these ads must have been shown to conservative white voters, in order to make them feel reactionary anger towards the concerns of Black voters.
Neither of these viewpoints is supported by the IRA data release. Only 18% of the ads appealed to viewers with what I labeled as obviously conservative interests (Gun Rights, Police, Patriotism, Anti-Immigrant, Christianity, Army, Texas, The South, Conservative), compared to 64% of ads targeted towards obviously progressive interests (Progressive, African American, Islam, Prison, Native American, Latinx, LGBTQ+). Furthermore, ads aimed towards conservatives performed worse in average number of clicks per ad (959 vs 1,290 clicks), despite the IRA paying more money for conservative ads ($51 vs $21). Digging deeper, we can see that even though Progressive and Conservative ads were, on average, shown to the same number of people (~13,000 for both), the percentage of people who clicked after viewing was higher for Progressive ads (9% vs. 6%).
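A note for anyone recomputing these rates: the "percentage of people who clicked" can be calculated in two ways that give different answers when a few ads dominate the impressions. The Python sketch below shows both on made-up numbers. The data and function names are hypothetical, and nothing here reproduces the figures quoted above.

```python
from statistics import mean

# Made-up (impressions, clicks) pairs for illustration; not from the dataset.
toy_ads = [(10_000, 1_500), (2_000, 20), (500, 0), (40_000, 3_600)]


def pooled_click_rate(ads):
    """Total clicks divided by total impressions across all ads."""
    total_impressions = sum(impressions for impressions, _ in ads)
    total_clicks = sum(clicks for _, clicks in ads)
    return total_clicks / total_impressions if total_impressions else 0.0


def mean_per_ad_click_rate(ads):
    """Average of each ad's own click rate, skipping ads that were never shown."""
    rates = [clicks / impressions for impressions, clicks in ads if impressions > 0]
    return mean(rates) if rates else 0.0


print(f"pooled click rate:      {pooled_click_rate(toy_ads):.1%}")
print(f"mean per-ad click rate: {mean_per_ad_click_rate(toy_ads):.1%}")
```

The pooled rate is dominated by the ads with the most impressions, while the per-ad mean gives every ad equal weight, so it is worth stating which one a comparison uses.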
The IRA purchased many more ads aimed at progressives, but paid more for the ads they aimed at conservatives.
There is some reason to be skeptical of these statistics. As described earlier, I base my analysis on qualitative "interest categories" that I personally have determined. The IRA's original interest targets numbered in the hundreds, and were too specific to facilitate this kind of analysis. In a process detailed in the about page of the Explorer, I aggregated them into broader categories, but mistakes could have been made. The more important reason to think harder about these statistics is that the distribution of ad clicks, impressions, and cost looks more exponential than normal. A few extremely successful or extremely expensive ads could have skewed the averages either way. But this is at least a start ¯\_(ツ)_/¯.
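To see why a long-tailed distribution makes averages fragile, here is a small Python sketch comparing the mean and the median on made-up click counts. The numbers are invented for illustration and do not come from the dataset.

```python
from statistics import mean, median

# Made-up click counts for illustration: many near-zero ads and one runaway hit.
toy_clicks = [0, 0, 1, 2, 3, 5, 8, 12, 40, 9_000]

# The mean is pulled far upward by the single outlier.
print("mean clicks:  ", mean(toy_clicks))

# The median stays close to what a typical ad received.
print("median clicks:", median(toy_clicks))
```

Reporting the median (or the full distribution) next to the mean is one way to check whether a handful of ads is driving a category's average.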
Progressive ads were better at converting ad viewers ("impressions") into user interactions ("clicks").
As far as I can tell with the data explorer, there were no ads highlighting Black empowerment aimed towards conservative voters. The IRA targeted all sides of the political spectrum, and some of the most creative ads come from campaigns aimed towards Black people and queer people [link].
The Data We Don't Have
If we accept that one of the IRA's main goals was to elect Trump, then what were the aims of these ads appealing to progressive groups? Some think that these ads were also created to increase polarization in American politics by creating echo chambers and otherwise decreasing empathy for the other "side" (I cover my own view on this idea in the "Sowing Division" section). However, I think that the most compelling theory motivating these ads is that the IRA was simply trying everything and anything they could to attract as many likes and subscribers as possible. Conservative, progressive, it didn't really matter, as long as the IRA were able to increase their audience for further efforts.
This leads to a logical next question: after the likes and subscribes, what happened next? We can make a few guesses. One assumes that workers in the IRA were posting Facebook comments on all these posts, possibly of a political nature, and that the groups responsible for these posts themselves had non-sponsored posts, comments, events, and group descriptions that could all lend insight into what the IRA's true aims were. That data could also tell us how successful those aims were. Congress says they may release some of this data soon, but have not given an indication of when.
Memes — Not Even Political Memes, Just Memes
Meme images had an 11.5% conversion rate (the rate at which ad views turned into ad clicks), third best of all the ad categories I specified.
Most people who spend some time with the Russian Ad Explorer soon stumble upon the IRA's meme posts, which were shown mostly from April to July 2016. Most have almost no political content (at least one exception: link), and some are actually kind of funny (sue me: link). Many people who used the explorer assumed that these memes were honey-traps for pro-Trump pages: that, at some point down the line, Memeopolis would start advocating for Trump. This line of thinking is backed up somewhat by my cost-efficiency statistics, which show that Meme ads were the fourth most cost-effective category for viewer clicks.
As I described in the "Data We Don't Have" section, this claim is difficult to immediately verify with this dataset. For example, Memeopolis, the IRA's main meme-sharing page, never released a directly pro-Trump advertisement despite the success of its meme advertisements. If it was re-directing users to pro-Trump pages or viewpoints, it did so via non-advertised posts on the Memeopolis page itself, or otherwise through Facebook comments or other user behaviors. Congress says it will release such data at some point, but until then it is hard to know what to make of Memeopolis.
Vote for Bernie, or Jill Stein
I have assumed that the IRA worked to get Trump elected, but they also worked at times to get Bernie the primary nomination over Hillary, and in rare instances even suggested voting for Jill Stein. Notably, however, they never put in one positive word for Hillary, despite a wealth of ads aimed at conservatives, except for one time.
Someone, at some point, made an ad called "Muslims for Hillary" aimed towards Progressives and "The Muslim Brotherhood", a frequent target audience that I assume was a mistake based on a misunderstanding of American culture. One assumes that this ad was meant to scare people who are progressive but fear Muslim people. But what was most notable to me was that despite the wide reporting that one of the IRA's principal goals was to sow chaos, and that this strategy often included appealing to progressive voters, they only once created an ad that said anything positive about Hillary. They had their eye on the prize: getting Donald Trump elected.
Post-Election Goals of the IRA
The IRA continued to function after the election of Trump, which many presumed was their primary goal. So what goals did they chase afterwards? Here are some of the changes I have noticed:
- A new campaign called "Black Guns Matter"
- New campaigns targeted towards the formerly imprisoned
- A self-defense class marketed towards Black Americans
- Campaigns aimed towards Native Americans began
- Campaigns aimed towards Latinx Americans began
- LGBTQ+ campaigns ended
- Seemingly fewer advertisements for conservatives than before the election
I do not yet know why (if there is any specific reason) these campaigns started when they did. I think many of them probably coincide with events in the news (approval of the Dakota Access Pipeline, persecution of Latinx immigrants, etc.). I also think that some of the urgency of the Trump-based ad campaigns went away after he got elected.
One thing I did notice is that campaigns launched after 2017 seemed to perform better, on average, than campaigns beforehand. For example, as noted in the "Memes" section, campaigns targeted towards Latinx people had the best cost efficiency for ad clicks of any group, by a wide margin. This is probably due either to Facebook finding better ways to target users with their ads, or to the IRA figuring out how to create clickable advertisements at low cost. Browsing the "Latinx" targeted ads shows that they lack the long tail of zero-click ads that makes up most of the other categories; almost every ad hit the mark.
IRA ads have a long tail when it comes to impressions and clicks. Some ad categories, like the 'Latinx' category, seemed better able to beat this power law than others.
Self-Defense Classes, and Other Events
One of the most well-reported aspects of the IRA ads was the way in which its representatives coerced unknowing Americans into holding protests, rallies, and other events for them. Probably one of the most bizarre examples is how a mixed martial arts fighter was paid to host a successful series of self-defense classes for Black Americans, detailed in this article. You can find these ads in the Explorer by clicking the "Self-Defense" tag.
What struck me most about these ads is that in terms of budget, they received much more funding than campaigns that did not lead to real-world actions. This suggests that the IRA was, maybe obviously, willing to throw down more money on events that had an impact outside of social media. Unfortunately, exactly how much more takes a bit longer to suss out, as no metadata was included with the ads denoting whether they were a "Facebook Event" or not (my broken brain says: maybe a convolutional neural network trained on the images can figure it out...).
Not Just Americans
Target Location: Germany, France, United Kingdom
This is a relatively minor point, but some ads released in this dataset were not aimed at Americans at all. Specifically, some of the anti-immigration ads (link) were also targeted at European countries, probably because of these countries' increasing antipathy towards immigrants.
It is not clear to me from information in the initial release whether this ad dataset was supposed to comprise the sum total of the IRA's efforts, or just the American portion. Regardless, it is worth noting that the purpose of the IRA was not only to influence American voters but also, in rare cases, people in other countries.
Sowing Division
Many have assumed that the IRA's goals included not only getting Trump elected to the presidency, but also generally sowing chaos and stoking partisan divisions between liberal and conservative Americans. This is generally the reaction of people I show the explorer to in person: "they're trying to turn us against each other!"
This reaction has bothered me a lot. It has been a common refrain in the past few years, especially among political moderates and conservatives, that one of the prime problems facing the country is political polarization as an evil unto itself. Democrats' and Republicans' (and by extension conservatives' and progressives') inability to discuss and compromise on disagreements is sometimes blamed for many of the wildly unpopular conservative laws and policies that shape daily life in America.
Does turning progressives and conservatives against each other actually work as a strategy to weaken the United States with regard to Russia? Personally, I'm skeptical: I think that directly advocating for privatized health insurance, privatized schools, regressive tax systems, prison expansion, racist laws, and environmental deregulation, among other things, is a much more effective way of crippling the American system. While more information is needed to state it conclusively, I would guess that the IRA understands this too. We'll see soon, either way.
Regardless, one thing I hope everyone takes away from the Explorer and its datasets is that while sowing division was sometimes a goal, the IRA was also motivated by the concrete goal of supporting Donald Trump and conservative policies. Almost no advertisements supported Hillary Clinton; ads were taken out for her Democratic challengers purely in the service of weakening her political bid. Ads that appealed to progressives were often honey-pots that redirected to Trump-supporting efforts, or just thinly veiled Trump-supporting pages themselves. The IRA spent, on average, 2.5 times as much on ads I classified as conservative, even when they did not perform as well as progressive ads.
There is loads more to see in this dataset. Seriously! Play around with the explorer, and send me an email or a tweet if you spot anything interesting.
I started working on some automatic bar chart / scatter plot / histogram visualizations of the data using d3.js, but haven't had much chance to finish them yet. Keep track of my Twitter for that; I might post some updates.
Most interesting, perhaps, is a new visualization project using IRA data. Twitter just released a gargantuan dataset of tweets composed by the IRA at this link. There are new challenges with this dataset, both technical and visual. On the technical side, the dataset is too large to be operated on in memory like in the Russian Ad Explorer, so I will likely have to learn new frameworks (React?) to query a database on a server hosted somewhere. Fun, honestly. But the other challenge is how to query and search networked, textual data in a way that makes it easy to extract patterns from data. There are millions of tweets, so how do you find something interesting? Maybe once I am done with grad school applications, we will find out...
Coming soon, more d3.js, whatever project I do.