Against the backdrop of vast amounts of data being harnessed for Artificial Intelligence (AI), Harvard Law School Library’s Institutional Data Initiative (IDI) convened a conversation between Ruth Okediji, Harvard Law School professor of intellectual property, and Greg Leppert, IDI’s Executive Director. They explored how equitable access to and participation in AI could be advanced, with a particular focus on African representation and data sovereignty.

At the outset of the conversation, Professor Okediji emphasized the foundational challenge for African nations and peoples in this shift toward AI: that their representation in the data that trains these systems is minimal.  

“We have this mass amount of information and pressure to release information and to make it more accessible and usable and secure, and then we have huge amounts of the world’s population that just are not showing up in any of that data at all…Africa in particular has been historically so badly misrepresented, not represented, partially represented.”  

Professor Okediji further emphasized the challenges surrounding information gathering, dissemination, and even creation, observing that “the vast majority of the world is not actually a literary culture.” Relying primarily on written texts for AI training means that enormous swaths of knowledge from oral traditions, lived experience, and non-written cultural practices may be left out of these systems entirely.

Meanwhile, data that has been used for AI training has often been extracted “without the knowledge or permission of individuals who are engaged on social media, or the internet,” she said. This raises questions about consent, ownership, and the ethics of building AI systems on data taken from people who may not be aware of its use.

“The idea of a privacy jurisprudence in Sub-Saharan Africa is pretty shallow, so it’s a moment to begin to create ideas and visions of what it looks like for privacy regulation to define what technology can and cannot do,” Professor Okediji said.  

To address the ethical issues around data extraction, both Professor Okediji and Leppert push for “normative ecosystems that can affect the design of AI.”  

One way forward, Professor Okediji suggests, is to reimagine how authorship and copyright can empower communities and ensure more ethical participation. “What we should be most concerned about is the permission and the authorization of these communities to capture their stories and their data, in whatever medium we choose.”  

Norms such as attribution, particularly for oral traditions passed down within particular communities, can help ensure that proper credit is given to those originating communities.  

For Professor Okediji, viewing data as the result of partnership between communities and technology, rather than as something to be extracted, is a crucial step toward breaking patterns of inequity. “It is a moment for Sub-Saharan African countries to reimagine their legislation, to reconceptualize things like privacy and property in ways that are more attentive to our technological moment.”  

When thinking about the development of AI, Professor Okediji is mindful of the risk that a desire for local control of data might lead to data silos. But she is hopeful that shared goals of representation will “spur some multinational convergence, at least on basic principles, because we all have an interest in getting the best information possible from these large language models.” 

Professor Okediji finds inspiration in the role of libraries, emphasizing the unique value they bring as information ecosystems. As she explains, libraries “allow you to understand [a] topic, not in [an] insular silo, but in this network of information that makes the learning of the average user much richer and much more nuanced.” Librarians are trained to create interconnected webs of knowledge. Bringing their expertise into the development of large language models could encourage a richer and more nuanced delivery of information.  

Building on the role libraries play as preservers of cultural knowledge, organizations like the Institutional Data Initiative work to ensure that libraries and other knowledge institutions can participate in the development of AI. When these vast and diverse repositories are thoughtfully incorporated into AI training, they can help ensure that AI systems reflect the full spectrum of the world’s voices, experiences, and histories. 

Ruth L. Okediji is the Jeremiah Smith, Jr. Professor of Law at Harvard Law School and Faculty Co-Director at the Berkman Klein Center for Internet and Society.

The Institutional Data Initiative is a research initiative at Harvard Law School Library that works with knowledge institutions—from libraries and museums to cultural groups and government agencies—to refine and publish their collections as data to facilitate responsible AI training. 

 

