With the rise of artificial intelligence, reams of content are being used and appropriated in ways their creators never imagined. Does this practice fall under fair use, or are thousands of copyright cases soon to take place? At Harvard Law School’s Rappaport Forum, “Intellectual Property and the Dawn of Generative AI,” intellectual property experts Pamela Samuelson and Justin Hughes ’86 dug into the implications of the new technology.
Moderator Ruth Okediji LL.M. ’91 S.J.D. ’96, Jeremiah Smith Jr. Professor of Law at Harvard, opened the Oct. 30 program by posing a question to the two panelists: “What is new about AI with respect to copyright? Why is this just not another wave of what we saw with the internet?”
In some ways, AI isn’t entirely dissimilar from past inventions, said Samuelson, a professor at Berkeley Law. “There have been disruptive technologies before, and sometimes they’ve been subject to lawsuits. So, this is not entirely new.”
A pioneer in digital copyright law, Samuelson pointed out that “moral panics” over intellectual property go all the way back to the vogue for player pianos in the early 1900s, and then the photocopy machine, which was “the bane of publishers’ existence” for a time. Then came the home video recorder — seen as such a great threat to American film producers and the public, she said, that Motion Picture Association of America president Jack Valenti once likened it to the risk presented by “the Boston Strangler to a woman alone.”
But in those cases, she pointed out, exact copies were made of the original work. AI is trickier because it makes a new creation out of many previous works. The first step in developing and deploying these systems, she said, is “gathering large quantities of information,” most of which is available on the open internet. The technology then makes “an effort to understand through statistical modeling what things tend to be next to what things, and what things tend not to be next to things.” For example, an image generator may learn to draw generic dogs and cats from many works depicting particular dogs and cats, she said. But it doesn’t exactly replicate the dogs and cats in its data set. “That’s actually a thing that we find difficult to parse.”
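Samuelson’s description of “statistical modeling [of] what things tend to be next to what things” can be illustrated, in drastically simplified form, with a toy bigram model. The sketch below is a hypothetical illustration, not the architecture of any system discussed at the forum: modern generative models use neural networks rather than raw adjacency counts, but the underlying idea — learning statistics from a training set and recombining them into new output rather than replaying any one source — is the same.

```python
from collections import defaultdict
import random

def train_bigrams(corpus):
    # Count how often each word follows each other word -- the
    # "what tends to be next to what" statistics Samuelson describes.
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=6):
    # Produce new text by repeatedly sampling a statistically likely
    # next word; the output recombines learned patterns rather than
    # replaying any single source sentence verbatim.
    word, out = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        word = random.choices(list(followers),
                              weights=list(followers.values()))[0]
        out.append(word)
    return " ".join(out)

# A toy "training set": the model retains only adjacency counts from it.
corpus = ["the dog chased the cat",
          "the cat watched the dog",
          "a dog barked at the cat"]
model = train_bigrams(corpus)
print(generate(model, "the"))  # e.g., "the dog chased the cat watched ..."
```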
She noted that new rules were being written even as she spoke: Earlier that day, the White House had issued an executive order calling for stronger standards in AI safety and security. She also cited a report issued by Goldman Sachs in March 2023, claiming that 300 million jobs could be lost due to generative AI. “Usually, it’s only the recording industry or the motion picture industry or the publishers who are upset,” she said. “Now, everybody is upset.”
Hughes, a Loyola Law School professor who previously worked in the Obama administration as senior advisor on intellectual property, argued that creating the “large language models” used in generative AI involves copying material in ways that violate Section 106 of U.S. copyright law. Further, he said, many of these models draw from Books3, a widely used generative AI training set, which was built from a “shadow library” of nearly 200,000 pirated copyrighted books.
Image generation, he added, has its own gray areas.
“There’s extensive reporting on the digital sweatshops around the world … in which images are taken off the internet and they are useless” for use by generative AI unless they are annotated, he said. “So, you have to have someone annotating the image, producing a copy, and then producing a data set from which the model is trained. And our best speculation is that all the major companies keep copies of these training sets.”
Okediji asked whether such copying should be actionable. In response, Hughes noted that when existing material is repurposed through AI, many scholars consider this a “non-expressive” use of the original works. “Many scholars would like to push it over into concepts we already have. I think we should be ready for a complete re-think on everything. Because one thing that happens in this conversation … is this relentlessly anthropomorphic language: ‘It learns. It trains. It remembers. It has a knowledge set.’”
“If we do think that it [AI language] is learning,” he added, “then I think it may be a kind of ingestion that is quasi-expressive.”
So far, Samuelson noted, courts have tended to look favorably on fair use claims for the source material that feeds AI systems. “I think the shadow library issue is going to be one of the tough ones for the courts to deal with,” she said. “But the courts have been pretty clear that if you are making copies of things in order to extract information, especially if you then enable more things to happen with that information that advances the public interest, that actually is something which doesn’t run afoul. And in terms of the training data sets, a big consideration has been whether the second use of the work, does it have the same purpose as the original?”
Thus, a fair use defense might apply if a programmer uses a book for a substantially different reason (to extract statistical correlations for a productivity tool) than that for which the author originally wrote it (to sell copies).
Samuelson also suggested that it may be too late for regulation, asking audience members to raise their hands if they use ChatGPT. “You and 180 million other people are users of these technologies,” she said to the room full of uplifted arms.
“If these generative AI companies are infringers, then potentially all the users are infringers too. And I don’t think courts are going to necessarily think that’s a really cool idea.” That’s one of the reasons, she said, that the Supreme Court sided with Sony in a 1984 copyright lawsuit [Sony Corp. of America v. Universal City Studios, Inc.] brought by entertainment companies over the popular Betamax video recorder, which consumers were using to record television programs for later viewing.
But the courts did shut down Napster, a service that also had millions of users, Hughes pointed out in response. “I say to you [that] you shouldn’t care about the amounts at stake because that’s the deep pockets of Microsoft and Google, and you shouldn’t care. … What you should care about is what’s best for all of us.”
He cited a recent comment by Senate Majority Leader Chuck Schumer labeling innovation the north star of his approach to AI. “I wanted to strangle him. … Because the north star for a senator or the leader of the Senate is the welfare of the American people, not innovation. I am tired of this monolithic god innovation getting in the way of serious discussions about what’s best for all of us.”
To illustrate his point that innovation isn’t always to everyone’s benefit, Hughes recalled the very first Rappaport Forum, at which speakers agreed, he said, that “Generation Z, Millennials are statistically more stressed out and more depressed than any generation before them, and it’s because of social media.”
Jerome “Jerry” Rappaport ’49, M.P.A. ’63, helped launch the Harvard Law School Forum in 1946. In 2020, he and his wife, Phyllis, supported the establishment of the Rappaport Forum at Harvard Law to promote and model rigorous, open, and respectful discussion of vital issues facing the world.