Through the Caselaw Access Project, Harvard Law School has made millions of legal decisions more accessible to researchers than ever before. On campus last week, the inaugural Caselaw Research Summit, hosted by the Harvard Library Innovation Lab, brought to light the diversity of research that the project is making possible.

The Caselaw Access Project (CAP) was the result of five years of work by the Harvard Library Innovation Lab at Harvard Law School. Between 2013 and 2018, the HLS Library digitized more than 40 million pages of data covering 6.5 million individual cases, creating the most comprehensive database of American law available anywhere outside the Library of Congress. Unlike the Library of Congress, it gives researchers nationwide free, immediate access to judicial decisions from each of the 50 states, dating back to their founding. CAP is still being refined; a notable addition is a new Historical Trends app that can trace the number of times a word was used in legal cases over the years, with a timeline pointing to the relevant cases.

The day-long summit at Milstein West brought together research teams from as far away as Oxford, England. Their presentations showed how the dataset, with its faster and more comprehensive access to data, has enhanced research that was already underway. Presenters explored the contents of court opinions and the evolution of language, and examined themes like link rot and connecting legal data with other digital collections.

The CAP will allow researchers “to go where no legal researcher has gone before, at least not behind a paywall,” said Professor Jonathan Zittrain ’95, vice dean for Library and Information Resources, at the start of the summit. Invoking the movie “Field of Dreams,” Zittrain said that CAP has its own version of “if you build it, they will come.” As he said, “We had to have the same faith that if we were going through the trouble to scan this material, people would find entirely new ways of making use of the data. I wasn’t always so sure about this, until today.”

A series of presentations over the day showed how CAP is shedding new light on ongoing questions. A research team led by Benjamin Nyblade, of the UCLA School of Law, explored the question of “What Harms are Irreparable?” Their work hinged on the changing use of the term since 1991, when a landmark study by Douglas Laycock concluded that the irreparable injury rule (in which no relief can be granted unless there is harm which monetary compensation cannot cure) was essentially dead. However, Nyblade said, in recent decades the Supreme Court has reinforced the importance of judges evaluating whether harms are irreparable before enjoining laws or actions.

The project was already underway before CAP was introduced, and the new dataset re-energized the team’s research. Originally they had followed Laycock’s method of studying the results of specific cases, using LexisNexis and other available sources. Using CAP, they were able to pull out all cases using the word “irreparable” and to analyze their content systematically. Specific trends, including a recent rise of the term “irreparable harm” rather than “irreparable injury,” could now be studied in context.
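
The kind of query described here can be illustrated with a minimal Python sketch. It assumes a hypothetical list of case records, each with a decision year and opinion text; the sample records and the function name are invented for illustration and are not drawn from CAP or from the UCLA team's own code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records, invented for illustration only.
# Real opinions would be loaded from CAP's data instead.
CASES = [
    {"year": 1975, "text": "Plaintiff failed to show irreparable injury."},
    {"year": 1988, "text": "The irreparable injury rule bars relief here."},
    {"year": 2009, "text": "Movant must demonstrate irreparable harm."},
    {"year": 2015, "text": "No irreparable harm was shown; irreparable injury is not presumed."},
    {"year": 2016, "text": "Damages are an adequate remedy."},
]

# Match "irreparable harm" or "irreparable injury", ignoring case.
PHRASE = re.compile(r"\birreparable\s+(harm|injury)\b", re.IGNORECASE)


def phrase_counts_by_decade(cases):
    """Count uses of 'harm' versus 'injury' after 'irreparable', per decade."""
    counts = defaultdict(Counter)
    for case in cases:
        decade = case["year"] // 10 * 10
        for match in PHRASE.finditer(case["text"]):
            counts[decade][match.group(1).lower()] += 1
    # Decades with no matching case are simply absent from the result.
    return {decade: dict(c) for decade, c in sorted(counts.items())}


if __name__ == "__main__":
    for decade, tally in phrase_counts_by_decade(CASES).items():
        print(decade, tally)
```

A count like this only shows how often each phrase appears; studying the terms in context, as the team describes, would still require reading and coding the matching opinions.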

The study is still ongoing, and Nyblade noted that challenges remain in coding all that data, particularly since the word occurs in 90,000 cases, half of which are post-1980. Yet, he said, this was a case where CAP can “take traditional questions that law professors and scholars have long been interested in, and help find better answers.”

Linguistic questions were also explored in a presentation, “Estimating Historical Trends in Legal Semantics via Aligned Textual Embeddings” by Ilya Akdemir of UC Berkeley.

This talk showed how the database was applied to an ongoing legal problem: the change in the meanings of certain words over the years. Linguists have long been fascinated by the transformations that certain words have undergone. In a famous example, King James II of England described St. Paul’s Cathedral as “amusing awful and artificial,” faint praise indeed, until you consider that “awful” was then closer to the modern “awesome.” Likewise, the Eighth Amendment has prompted much legal debate over what “cruel and unusual punishment” would have meant in 1791. And until very recently, a “cell phone” was something only found in jail.

Recordings of the presentations and lightning talks from the June 21 Caselaw Access Project Research Summit are available to watch.

The Berkeley team used the database to chart the evolution of a word by the company it keeps. In some cases, the context of a word spoke to sweeping cultural changes: Before 1900, for instance, the word “marriage” was associated with words like “intermarriage,” reflecting social norms of the time. After 1970 it was more likely to be found alongside words like “dissolution” and “divorce.” Likewise, the concept of a “usurious” loan was declining by 1970, when that word nearly disappeared, perhaps an indication that high interest was less likely to be prosecuted.
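
The idea of judging a word “by the company it keeps” can be sketched in a few lines of Python. This is not the aligned-embedding method named in the talk's title; it is a much simpler stand-in that counts which words appear near a target word in each period. The snippets, the stopword list, and the function name are all hypothetical and chosen only to mirror the example above.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for opinions from two periods.
CORPUS = {
    "before 1900": [
        "the marriage was void as an intermarriage prohibited by statute",
        "validity of the marriage and the alleged intermarriage",
    ],
    "after 1970": [
        "petition for dissolution of the marriage",
        "the marriage ended in divorce and dissolution was granted",
    ],
}

# A tiny, illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "as", "an", "by", "in", "was", "for"}


def neighbors(texts, target, window=5):
    """Count non-stopword tokens within `window` positions of `target`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i and tokens[j] not in STOPWORDS:
                    counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    for period, texts in CORPUS.items():
        print(period, neighbors(texts, "marriage").most_common(3))
```

Raw co-occurrence counts like these are sensitive to corpus size and window length, which is one reason embedding-based approaches are often preferred for comparing word usage across periods.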

Another use for linguistics was offered by Keyon Vafa of Columbia University, whose talk suggested that judges’ or candidates’ ideology can be grasped through their use of language. Traditionally, Vafa said, this would be determined through how they voted, a method that may be less reliable because of the idiosyncrasies of different courts, and because some decisions may be unavailable. A better method, he suggested, would be “text-based ideal points.” When discussing abortion, for example, words like “unborn” and “babies” would distinguish a right-leaning legislator; “woman” and “choice” a left-leaning one.

Tina Ching, of the University of Oregon, addressed the nuts-and-bolts problem of link rot (links to dead websites) in legal cases. In some cases this happens because a link is corrupted when it is copied, for example with spaces or punctuation inserted. CAP, she said, can be used to trace any citations of the faulty link, which can then be fixed or deleted, a small step that will make life easier for many future researchers.
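
One way to picture the cleanup step is the short Python sketch below. It is an assumption-laden illustration, not Ching's method: the function name, the punctuation set, and the example strings are hypothetical, and the sketch handles only the two kinds of corruption mentioned above (inserted spaces and stray trailing punctuation).

```python
import re
from urllib.parse import urlsplit

# Punctuation that often gets glued to the end of a link in running text.
# This set is an illustrative assumption, not an established standard.
TRAILING_PUNCTUATION = ".,;:)]}\"'"


def candidate_repair(cited):
    """Return a cleaned-up URL candidate, or None if it does not look like a URL."""
    cleaned = re.sub(r"\s+", "", cited)            # drop spaces or line breaks inside the link
    cleaned = cleaned.rstrip(TRAILING_PUNCTUATION)  # drop punctuation stuck to the end
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        # urlsplit rejects some malformed inputs, such as unbalanced brackets.
        return None
    if parts.scheme in ("http", "https") and "." in parts.netloc:
        return cleaned
    return None


if __name__ == "__main__":
    print(candidate_repair("http://www.example. com/report.pdf."))
    print(candidate_repair("see note 12"))
```

The result is only a candidate: removing a space can itself produce a wrong address, so each repaired link would still need to be checked before a citation is corrected.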

The summit was the law school’s first attempt to bring researchers together in person to meet and learn, and to share how the Library Innovation Lab can better support researchers’ work.

“I think everyone at the summit was really impressed,” said HLS Lecturer Adam Ziegler, director of the Harvard Library Innovation Lab. “It puts into context how much work they’ve done in just six months’ time. We won’t be changing the law itself, the law is what is written on the page. But in terms of using the data to draw new connections between ideas, between people and between organizations … that is all really powerful. And it can only be done when the data is available the way we have made it available.”


In 2015, Harvard Law School announced that it was digitizing its entire collection of U.S. case law, one of the largest collections of legal materials in the world, making the collection available online, for free, to anyone with an Internet connection.