#Celebgate: Searching for Illicit Content During the 2014 Celebrity Photo Leaks
On August 31st, 2014, nearly 500 sensitive images captured from the mobile phones of various celebrities were released onto 4chan.com. With alarming alacrity, these stolen personal photographs made their way to slightly more mainstream content sites, including Reddit, Tumblr and Twitter. Internet users and media commentators have termed the phenomenon “Celebgate” or, more popularly and vulgarly, “The Fappening” (a portmanteau of “happening” and a slang term for masturbation). The leak raised numerous questions about privacy rights online, iCloud security, and the responsibilities of host sites. Taking the takedown of these photos as an exemplar, how do we search for information that makes itself hard to find?
The spread of viral content generally lends itself well to analysis because the content leaves a long trail. However, in the case of #Celebgate, the sensitivity of the content demanded its censorship on some of the most prominent sites to host it. Consequently, the trail these photos leave is full of (literal) missing links.
Different platforms took different stances towards removing the leaked content. Most of the celebrity-initiated requests for removal cited the Digital Millennium Copyright Act rather than privacy rights, as it is some of the toughest legislation available for forcing content removal. Following suit, sites had to adopt the language of property rather than privacy in justifying their policies. Reddit, which quickly became a principal hub for hosting the stolen images, shut down the dominant forum (or subreddit) centralizing them about a week after their initial upload. The company stated publicly that, “In accordance with our legal obligations, we expeditiously removed content hosted on our servers as soon as we received DMCA requests from the lawful owners of that content, and in cases where the images were not hosted on our servers, we promptly directed them to the hosts of those services.” In removing the central hub for these images, Reddit decentralized the ways in which Internet users could find them.
Google faced similar legal pressure to remove links to the stolen content. The company stated, “We’re removing these photos for community guidelines and policy violations (e.g. nudity and privacy violation) on YouTube, Blogger and Google+. For search we have historically taken a different approach as we reflect what’s online — but we remove these images when we receive valid copyright (DMCA) notices.” Yet despite these assertions, in practice, Google has allowed many links to stolen content to remain findable in its results. For instance, Justin Verlander, the boyfriend of hacking victim Kate Upton, filed a copyright claim requesting the removal of 444 URLs hosting Upton’s stolen photos. In a transparency report, Google states that no action was taken in 41% of these cases; in other words, 41% of the URLs requested for removal still appear in Google search results. While Google did remove the majority of links to the most central sites, it allowed many of the more peripheral domains to remain. Thus, though Reddit removed The Fappening subreddit, Google could still reveal links to Upton’s photos on subreddits like /r/Celebs or /r/ledzepplin.
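To put the transparency-report percentage in absolute terms, the short Python sketch below converts the two figures quoted above (444 URLs, 41% with no action) into an approximate count. The variable names are illustrative, and the result is only approximate because the 41% figure is itself rounded. The sketch makes no assumption about what happened to the remaining URLs.

```python
# Figures quoted in the text from Google's transparency report.
urls_requested = 444      # URLs named in the copyright claim
no_action_share = 0.41    # share of URLs on which Google took no action

# Approximate count; the published percentage is already rounded.
urls_no_action = round(urls_requested * no_action_share)

print(f"Roughly {urls_no_action} of {urls_requested} URLs saw no action")
```

By this estimate, somewhere around 180 of the requested links stayed in the search index.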
Bearing in mind this discrepancy in censorship, as well as the knowledge that Google is typically the first port of call for enquiries online, both for specific host websites and for detached content, we arrive at an interesting question: How do the centralization and decentralization of content shift search terms? Specifically, when Reddit shut down /r/TheFappening, how did individuals adapt their Google search terms to locate the decentralized content?
Figure I. Searches for “fappening,” “Jennifer Lawrence photos,” “4chan” and “reddit” between August 1 and September 30, 2014.
While searches for “reddit,” “4chan,” and “Jennifer Lawrence photos” spiked immediately during the first outbreak of photos, searches for “fappening” peaked only after a short lag. This lag may reflect the newness of the term as a label for the phenomenon. However, during the second outbreak of photos on September 21st, the now-established term “fappening” rose at the same rate as “4chan” and “reddit.” This could indicate that once the leak had been given a name, the images began to be searched for collectively rather than by specific victims of the theft: searches for Jennifer Lawrence’s photos drop below searches for the phenomenon as a whole. Moreover, while 4chan was the first port of call during the first leak, Reddit was more highly searched during the second wave. There could be many reasons for this, one of which is the mainstreaming of the phenomenon; while 4chan prides itself on its obscurity, Reddit proclaims itself “The Front Page of the Internet.”
Figure II. Searches for “fappening,” “Jennifer Lawrence photos,” “Kate Upton photos,” “Kim Kardashian” and “Rihanna” between August 1 and October 31, 2014.
Searches for both “Jennifer Lawrence” and “Kim Kardashian” are notably higher than searches for “fappening,” and the specific peaks for each celebrity correspond with the release of their own private photos. The graph suggests that while certain individuals generated interest on their own, “The Fappening” as a phenomenon gained notable interest as well. These findings could have interesting implications for how victims of the cyber-thefts are viewed: as individuals or as part of a class. An analysis of search terms provides insight not only into how a viral phenomenon diversifies or concentrates, but also into how search users must conceptualize (or reconceptualize) the phenomenon in order to locate it.