In 2021, journalist Aroon Deep broke a story on MediaNama, a tech and policy news site, detailing how the Indian government had requested the removal of hundreds of tweets critical of Prime Minister Narendra Modi’s handling of the COVID-19 crisis. The story made global news, driving a new wave of requests for access to Lumen — the internet takedown request database housed at Harvard Law School, which Deep had used to uncover the Modi administration’s efforts to silence its critics. “The [takedown] notices made available on Lumen have stand-alone value,” Deep said at the time. “More often than not, they are the genesis of our stories about censorship.”
In the 22 years since Lumen’s founding, its impact has grown at an accelerated pace alongside the scope of the internet itself. Like so many start-ups, it began in a very different place and time. Back in 1998, the internet was beginning to flex, find form, and take off. Peer-to-peer file-sharing platforms sprouted like mushrooms, making it possible to enjoy music and other creative content without paying for it. Copyright holders were freaking out.
In response, President Bill Clinton signed the Digital Millennium Copyright Act, or DMCA, in October 1998. The act offers legal recourse when copyright is violated online and creates a safe harbor for the platforms and internet service providers that host allegedly infringing material. The copyright holder simply files a takedown notice with the provider, which removes the material from the internet. Problem solved. Big exhale.
Or not. Soon, a new problem emerged: There is little incentive for hosts — Google, for example, or YouTube — to question the validity of a takedown notice. From a legal perspective, it is in their best interest to simply remove the material on behalf of the party making the claim. Yes, they enjoy a safe harbor — but only if they respond expeditiously to takedown notices. And investigating the legitimacy of every claim would take time and money from other, potentially more lucrative, projects. For companies, it is simply safer and less costly to hit the delete button. (Over the course of its history, Google has received more than 9 billion takedown notices related to copyright alone.)
In time, that reality opened the Clinton-era copyright law to abuse. If someone didn’t like what they saw, it was easy enough to claim copyright infringement, file a takedown notice, and have the material removed.
The year the DMCA passed, Wendy Seltzer ’99 was a student in Jonathan Zittrain’s new course, Internet & Society. “There are ways in which law shapes behavior, even if it never sees the inside of a courtroom,” Zittrain ’95, George Bemis Professor of International Law, says of the law. “Those takedown requests have an impact, and if there’s no court case, there’s no means of tracking that effect.” During a classic water cooler conversation in Pound Hall between Zittrain and Seltzer, a seed was planted: What if there were a neutral, centralized, searchable database of takedown notices to address the information gap?
The seed of that idea wouldn’t germinate until 2002, when Seltzer returned to Harvard Law as a fellow at what is now known as the Berkman Klein Center for Internet & Society, which Zittrain helped found. The resulting project, now called Lumen, has since collected approximately 35 million takedown notices and counting. This year, the database is on track to receive some 8 million notices, according to Adam Holland, Lumen’s project manager since 2012.
“Lumen’s goal is to bring transparency and knowledge to what information is available online, or not, and why,” Holland says. “We don’t have a policy stance; we simply believe that good data makes good policy, and therefore what we are strongly in favor of is the idea that takedown notices should be studied.”
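That study often starts with Lumen’s public, searchable interface, which also exposes a notice-search API. As a minimal sketch of how a researcher might query it programmatically — assuming the `notices/search` endpoint and `term`/`page` parameters described in Lumen’s published API documentation, which should be verified against the current docs — one could build search URLs like this:

```python
# Minimal sketch of constructing a Lumen notice-search query.
# The endpoint path and the `term`/`page` parameter names follow Lumen's
# public API documentation, but are assumptions here; confirm them (and any
# authentication requirements) at lumendatabase.org before use.
from urllib.parse import urlencode

LUMEN_SEARCH = "https://lumendatabase.org/notices/search"

def build_search_url(term: str, page: int = 1) -> str:
    """Return a notice-search URL for a keyword query against Lumen."""
    query = urlencode({"term": term, "page": page})
    return f"{LUMEN_SEARCH}?{query}"

# A researcher studying DMCA notices might page through results like so:
url = build_search_url("DMCA", page=2)
print(url)  # https://lumendatabase.org/notices/search?term=DMCA&page=2
```

The URL could then be fetched with any HTTP client, with the JSON response parsed into whatever analysis pipeline the researcher prefers.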
Uncovering dirty deeds done dirt cheap
Christopher Bavitz, WilmerHale Clinical Professor of Law, supports Lumen’s work as principal investigator. He also uses the database for his own research in IP and media law. Before coming to Harvard Law in 2008, Bavitz served as senior director of legal affairs for EMI Music North America.
“We couldn’t have a user-generated content internet without the DMCA,” he comments. “Some copyright holders might say that would be a great thing, but I think on balance we want to allow people to express themselves by uploading material to these platforms — and we also want to respect the rights of copyright owners who can send these notices.” Lumen, he says, brings clarity and neutrality to that process.
Researchers come to the database with a range of questions. For example, a researcher might want to analyze how takedown notices originating in Brazil impact the revenue of that country’s music industry. A broader agenda — with eye-opening results — might examine how many notices are fraudulent.
In 2016, research by Professor Eugene Volokh of the UCLA School of Law leveraged Lumen’s database to reveal that over a four-year period, nearly 200 out of 700 court orders submitted to Google were highly suspect, with 80 confirmed as forgeries. The forgeries ranged from amateur Photoshop jobs that altered the names on real court documents to the output of a company whose entire business model was built on creating fake court orders. The company was later prosecuted by the Texas attorney general as a result of Volokh’s research.
Research has also revealed tactics exploiting the DMCA itself. In 2020, a Wall Street Journal article uncovered hundreds of instances of fraudulent filers plagiarizing material from the post they wanted removed and publishing it on a different site, backdated — thus opening the door to a seemingly valid allegation of copyright infringement and a swift removal of the material.
In one example, a Colorado woman’s long-abandoned LiveJournal site was commandeered to publish backdated Russian-language posts about a businessman with alleged ties to organized crime. Those fake posts were then used as the basis for a takedown notice for a valid investigative piece about the businessman published by a Ukrainian affiliate of the Global Investigative Journalism Network. (Google restored 52,000 deleted links after the Journal shared its findings.)
Without Lumen, it’s highly unlikely that any such malfeasance would have been discovered. Yet, smoking out illegal activity was not part of the project’s original intention. At its launch in 2002, the database was called Chilling Effects, reflecting the potential negative impact on freedom of expression caused by overzealous copyright enforcement. Seltzer, who had just taught herself to code, built a site with an FAQ section and annotated examples of cease-and-desist letters to help people better understand their rights relative to the law.
Submission of takedown notices to Lumen is voluntary in the United States. Yet, as word of the database spread, the number of takedown notices the site received grew, and other institutions and companies got involved. Seltzer cites Professors Laura Quilter and Jennifer Urban, both graduates of UC Berkeley Law, for their role in connecting Google with Chilling Effects.
“After we started getting notices from Google, and then a bit later from Twitter, the project started to feel almost infrastructural — that this was something that helps the internet work better,” Seltzer says. Thanks to Lumen, a user’s Google search might now include a message noting that one or more results have been removed due to a complaint received under the DMCA, with a link back to the complaint in the Lumen database.
By 2015, the project’s evolution and growing prominence required a new name: “We wanted to move from the unstated implication that many of these requests had the effect of chilling legitimate speech to a more neutral suggestion of illumination,” says Seltzer. (The new name also made more sense to non-native English speakers, an important consideration, given the database’s global reach.) Today, Lumen’s roster of participating companies has expanded to include a host of household names, including Reddit, Medium, Wikipedia, GitHub, Vimeo, and WordPress.
“It’s in everyone’s interest to be part of something larger,” says Holland. “YouTube no doubt has internal analytics on takedown notices, but with group participation, you get the benefit of network effects and a much richer, more complex picture.”