Internet Archive weighs in on Artificial Intelligence at the Copyright Office

All too often, the formulation of copyright policy in the United States has been dominated by incumbent copyright industries. As Professor Jessica Litman explained in a recent Internet Archive book talk, copyright laws in the 20th century were largely “worked out by the industries that were the beneficiaries of copyright” to favor their economic interests. In these circumstances, Professor Litman has written, the Copyright Office “plays a crucial role in managing the multilateral negotiations and interpreting their results to Congress.” And at various times in history, the Office has had the opportunity to use this role to add balance to the policymaking process.

We at the Internet Archive are always pleased to see the Copyright Office invite a broad range of voices to discussions of copyright policy and to participate in such discussions ourselves. We did just that earlier this month, participating in a session at the United States Copyright Office on Copyright and Artificial Intelligence. This was the first in a series of sessions the Office will be hosting throughout the first half of 2023, as it works through its “initiative to examine the copyright law and policy issues raised by artificial intelligence (AI) technology.”

As we explained at the event, innovative machine learning and artificial intelligence technology is already helping us build our library. For example, our process for digitizing texts–including never-before-digitized government documents–has been significantly improved by the introduction of LSTM technology. And state-of-the-art AI tools have helped us improve our collection of 100-year-old 78 rpm records. Policymakers dazzled by the latest developments in consumer-facing AI should not forget that there are other uses of this general purpose technology–many of them outside the commercial context of traditional copyright industries–which nevertheless serve the purpose of copyright: “to increase and not to impede the harvest of knowledge.”

Traditional copyright policymaking also frequently excludes or overlooks the world of open licensing. But in this new space, many of the tools come from the open source community, and much of the data comes from openly licensed sources like Wikipedia or Flickr Commons. Industry groups that claim to represent the voice of authors typically do not represent such creators, and their proposed solutions–usually, demands that payment be made to corporate publishers or to collective rights management organizations–often don’t benefit the open world and are inconsistent with its thinking.

Moreover, even aside from openly licensed material, there are vast troves of technically copyrighted but not actively rights-managed content on the open web; these are also used to train AI models. Millions, if not billions, of individuals have contributed to these data sources, and because none of them are required to register their work for copyright to arise, it does not seem possible or sensible to try to identify all of the relevant copyright owners–let alone negotiate with each of them–before development can continue. Recognizing these and a variety of other concerns, the European Union has already codified copyright exceptions which permit the use of copyright-protected material as training data for generative AI models, subject to an opt-out in commercial situations and potential new transparency obligations.

To be sure, there are legitimate concerns over how generative AI could impact creative workers and cause other kinds of harm. But it is important for copyright policymakers to recognize that artificial intelligence technology has the potential to promote the progress of science and the useful arts on a tremendous scale. It is both sensible and lawful as a matter of US copyright law to let the robots read. Let’s make sure that the process described by Professor Litman does not get in the way of building AI tools that work for everyone.

National Library Week 2023: Brenton, user experience

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

Brenton Cheng learned to program in BASIC on an Apple II Plus at age 9. His mother was one of the earliest computer programmers and his dad was a marketing consultant for technology products in Portola Valley, California. By age 12, Cheng had written a series of animated games that he put together in a hand-assembled software package. It sold about four copies.

Now, Cheng is a senior engineer at the Internet Archive, where he leads the user experience (UX) team. “Our goal is to give our patrons a great experience on the Archive.org website while making sure that under the hood, our technologies are as simple, robust and maintainable as possible,” said Cheng, who has been at the organization for seven years.

Despite his early computer exposure, Cheng wanted to study something more tangible in college. He pursued mechanical engineering and earned a bachelor’s degree from Princeton University and a master’s from Stanford University. Along the way, he developed a love of contemporary dance and improvisation. Inspired by the creativity of movement, he veered toward biomechanical engineering in graduate school. 

Entering the job market, Cheng said he wanted a flexible schedule so he would be able to take workshops and occasionally go on tour with dance companies. He was a freelance computer programmer for about a decade, then worked at Astrology.com and NBCUniversal for another 10 years. 

In 2016, Cheng said he was drawn to the Internet Archive by its mission, reputation and people. “Being in the dance world, I was constantly surrounded with all kinds of eclectic, eccentric, fascinating, brilliant people,” he said. “There were certain common elements in the way the Archive embraces and benefits from diversity. I found many artists and engineers working in novel ways. That felt very much at home.”

From his experience working with improvisation in dance, Cheng said he loves trying to create the conditions within which people contribute their best work and feel good about what they’re doing. His team is focused on fighting for users and constantly making the website better for the public. “I also serve the digital librarians who are collecting and providing content for our patrons,” Cheng said. “I am giving them the tools, platform and environment to do their magic.” 

Tell us something about your role at the Internet Archive that most people wouldn’t know about.
While supporting the Archive’s mission and helping our patrons, I am always holding in the back of my mind the subtext of “small team, long term.” These ideas guide choices around process, technologies and architecture. We regularly discard choices that would entail too much complexity or require too much ongoing, hands-on maintenance. And we try to resist rushing features out the door that will only add to our technical debt later.

What is the most interesting project you’ve worked on at the Internet Archive?
I set up a wiki to allow scholars to submit transcriptions of scanned Balinese palm leaves.

What has been your greatest achievement (so far) at the Internet Archive?
Creating a team that likes working together, is resilient through conflicts and pushes each other to keep getting better.

What are you reading?
The Sense of Style by Steven Pinker. It’s a contemporary writing style manual that incorporates cognitive science and linguistics and acknowledges the evolving nature of language.

National Library Week 2023: Caitlin, events

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

If there’s an event at the Internet Archive, there’s a good chance Caitlin Olson had her hand in it. And with about 80 events last year, including 40 in-person, that keeps her plenty busy. 

“I’m a helper by nature and my role involves wearing a lot of hats,” said Olson, senior executive assistant for seven years. 

While not a librarian by training, Olson said she enjoys supporting librarians and their work. Olson provides support for webinars online and parties at the Internet Archive’s headquarters in San Francisco. She also assists Internet Archive’s founder Brewster Kahle in his work, helps staff with IT issues (including migrating to remote work during the pandemic), and pinch hits when needed. 

“I’m the go-to person for most questions because if I don’t know the answer, I likely know who will,” said Olson, who prefers working behind the scenes and is known as a fixer who keeps a calm head. “Brewster says I help soothe the organization. I often can jump in and solve a problem.”

After graduating from high school in a small town in northern California, Olson said she gravitated to the Bay Area for college, so has both the “country mouse” and “city mouse” experience. After a stint in journalism, she was drawn to the Internet Archive. “I wanted to work for a place where people felt passionate about what they were doing—and I found that here,” Olson said.

What’s an aspect of your job that you especially like?
I work with our ceramicist who creates all of our statues for the Archive. Fun Fact: after you work here for three years, you get a statue made in your likeness (if you want).

What is the most interesting project you’ve worked on at the Internet Archive? 
Our annual Public Domain Day events and the book talks we host in collaboration with The Booksmith.

Favorite collection at the Internet Archive?
The Attention K-Mart Shoppers collection 

What are you reading?
Fuzz: When Nature Breaks the Law by Mary Roach, which is about what happens when animals commit crimes, and From Here to Eternity: Traveling the World to Find the Good Death by Caitlin Doughty, which is a book that explores death-care in different cultures and it’s written by a badass mortician. 

Book Talk: The Apple II Age

Join author Laine Nooney for an IN-PERSON reading from their new book, followed by a conversation with historian Finn Brunton.

REGISTER NOW

“The Apple II Age is a joy to read and an extraordinary achievement in computer history. A rigorous thinker and a bright and witty writer, Nooney offers a compelling account of the initial attempts to make computers inviting to the public. The Apple II Age, like the old microcomputer itself, is bound to intrigue both experts and newcomers to the subject.” ―JOANNE MCNEIL, author of ‘Lurking: How a Person Became a User’

Join us for an engrossing origin story for the personal computer—showing how the Apple II’s software helped a machine transcend from hobbyists’ plaything to essential home appliance.

6:00 PM — Reception
6:30 PM — Book Talk: The Apple II Age
7:30 PM — Book Signing 

Please note that this event will be held in person at the Internet Archive.

If you want to understand how Apple Inc. became an industry behemoth, look no further than the 1977 Apple II. It was a versatile piece of hardware, but its most compelling story isn’t found in the feat of its engineering, the personalities of Apple’s founders, or the way it set the stage for the company’s multibillion-dollar future. Instead, historian Laine Nooney shows, what made the Apple II iconic was its software. The story of personal computing in the United States is not about the evolution of hackers—it’s about the rise of everyday users.

Recounting a constellation of software creation stories, Nooney offers a new understanding of how the hobbyists’ microcomputers of the 1970s became the personal computer we know today. From iconic software products like VisiCalc and The Print Shop to historic games like Mystery House and Snooper Troops to long-forgotten disk-cracking utilities, The Apple II Age offers an unprecedented look at the people, the industry, and the money that built the microcomputing milieu—and why so much of it converged around the pioneering Apple II.

Laine Nooney is assistant professor of media and information industries at New York University. Their research has been featured by outlets such as The Atlantic, Motherboard, and NPR. They live in New York City, where their hobbies include motorcycles, tugboats, and Texas hold ’em.

Book Talk: The Apple II Age
May 11 @ 6pm
IN-PERSON @ 300 Funston Ave., San Francisco
Register now for the free, in-person event

National Library Week 2023: Liz, donations

To celebrate National Library Week 2023, we are introducing readers to four staff members who work behind the scenes at the Internet Archive, helping connect patrons with our collections, services and programs.

Liz Rosenberg first worked with the Internet Archive in the early days of the Great 78 Project. She helped design the digitization workflow for 78rpm records and estimates that she transferred 30,000 sides of records herself.

The self-described “record lady,” Rosenberg said the project was the perfect entrée to the organization. She graduated from Drexel University with a degree in music industry technology, with a specialty in audio recording and production.

In 2020, Rosenberg was officially hired by the Internet Archive in patron services and later asked to lead the organization’s physical donation program. She continues with the Great 78 Project, overseeing monthly uploads, resolving metadata issues and coordinating digitization of donated collections with partners at George Blood LP.

“The Internet Archive is a place that I had always dreamed of working,” Rosenberg said. “I really looked up to the mission of the Internet Archive, so when the opportunity came up to work for them directly, I couldn’t have said yes faster.”

As donations manager, Rosenberg receives inquiries from individuals and librarians about donating their physical media to the Internet Archive for preservation and digitization, from single items to collections of millions of objects. She has overseen the donations of small folk music collections, individual collectors’ passion projects, and college libraries including Bowling Green State University and the University of Hawaii. 

The individual collector contributions often are triggered by the death of a loved one. “Those tend to be sensitive situations for families,” she said. “But they are grateful to almost be able to spend time with them through the preservation of their collection and be able to go and visit whenever they want. That’s very special.”

Rosenberg keeps a “warm and fuzzy thank you file” on her computer from donors that she said keeps her motivated to encourage others to share their collections, like the message below:

Dear Liz,

You are amazing! Thank you for your kind guidance and generous ways. Seeing the dedication today has brought a difficult and costly task of storing these books over such a long period of time to this heartfelt moment and for such a worthy cause. I am in the middle of grading portfolios and preparing for a solo art exhibition so, as usual, I need to juggle the books in between. I will be in touch soon but, again, I just wanted to let you know how wonderful you and your organization are 🙂

in kindest regard, Karen

What is the most rewarding part of your job?
For me, it’s really about preserving stories. I feel such a connection to donors that I work with when I get to hear the story of how a collection was created. We want to preserve those stories alongside the media itself. And that’s really such a privilege.

What has been your greatest achievement (so far) at the Internet Archive?
Presenting on behalf of the Internet Archive at the 2022 Association for Recorded Sound Collections Conference. A recording of the presentation, as given to the Internet Archive staff shortly after the conference, can be found on the Internet Archive here.

What’s your favorite item at the Internet Archive?
This transcription recording of a child playing accordion: https://archive.org/details/78_four-leaf-clover_sonny-walikis-and-his-squeeze-box_gbia0001730a. We transferred this record without knowing who the performer was or anything about their history. The family of Sonny Walikis actually found the recording in our collection shortly after their family member had passed away and reached out to tell us the history of the recordings. I always think of this record as the best example of why we preserve media – to connect people to lost stories and help memories live on.

What’s your favorite collection at the Internet Archive?
The 78rpm record collection! archive.org/details/georgeblood

What are you reading?
The Tower of Swallows by Andrzej Sapkowski

What is your secret talent?
Morphing into a children’s choir! I was a recording studio intern and we had children booked to sing the part but they got too distracted in the booth. So I sang all of the parts slowed down 10% and we sped them up to make me sound “child-like”. The results are one of my only vocal credits: https://www.youtube.com/watch?v=WlKhVhuTiik.

AI Audio Challenge: Audio Restoration of 78rpm Records based on Expert Examples

http://great78.archive.org/

Hopefully we have a dataset primed for AI researchers to do something really useful, and fun: taking the noise out of digitized 78rpm records.

The Internet Archive has 1,600 examples of quality human restorations of 78rpm records where the best tools were used to ‘lightly restore’ the audio files. This takes away scratchy surface noise while trying not to impair the music or speech. Also in those items are the unrestored original files that were used.

But then the Internet Archive has over 400,000 unrestored files that are quite scratchy and difficult to listen to.

The goal, or rather the hope, is that a program can take all or many of the 400,000 unrestored records and make them much better. How hard this will be is unknown, but hopefully it is a fun project to work on.
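One way to frame the task is as supervised learning on paired examples: the 1,600 expert restorations are the targets, their unrestored originals are the inputs, and the remaining unrestored files are what a trained model would eventually be run on. The sketch below shows only the bookkeeping for that split. The `_restored` filename suffix and the `split_dataset` function are assumptions made for illustration; the actual items may be organized differently.

```python
from pathlib import PurePosixPath

# Hypothetical convention: a restored file shares its original's stem
# plus a "_restored" suffix. The real item layout may differ.
RESTORED_SUFFIX = "_restored"


def split_dataset(filenames):
    """Return (training_pairs, unpaired_originals).

    training_pairs: list of (original, restored) filename tuples.
    unpaired_originals: originals that have no expert restoration.
    """
    restored_by_stem = {}
    originals = []
    for name in filenames:
        stem = PurePosixPath(name).stem
        if stem.endswith(RESTORED_SUFFIX):
            # Index each restoration under the stem of its original.
            restored_by_stem[stem[: -len(RESTORED_SUFFIX)]] = name
        else:
            originals.append(name)

    pairs, unpaired = [], []
    for name in originals:
        stem = PurePosixPath(name).stem
        if stem in restored_by_stem:
            pairs.append((name, restored_by_stem[stem]))
        else:
            unpaired.append(name)
    return pairs, unpaired


if __name__ == "__main__":
    files = ["side_a.flac", "side_a_restored.flac", "side_b.flac"]
    pairs, unpaired = split_dataset(files)
    print(pairs)     # originals matched with their restorations
    print(unpaired)  # originals still waiting for restoration
```

Keeping the paired and unpaired sets separate from the start makes it easier to hold back some of the expert pairs for evaluation before running anything on the larger unrestored collection.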

Many of the recordings are great and worth the effort. Please comment on this post if you are interested in diving in.

Book Talk: Against Progress

Join journalist MARIA BUSTILLOS for a virtual book talk with author & professor of law JESSICA SILBEY for her latest book, AGAINST PROGRESS.

REGISTER NOW

When first written into the Constitution, intellectual property aimed to facilitate “progress of science and the useful arts” by granting rights to authors and inventors. Today, when rapid technological evolution accompanies growing wealth inequality and political and social divisiveness, the constitutional goal of “progress” may pertain to more basic, human values, redirecting IP’s emphasis to the commonweal instead of private interests.

Against Progress considers contemporary debates about intellectual property law as concerning the relationship between the constitutional mandate of progress and fundamental values, such as equality, privacy, and distributive justice, that are increasingly challenged in today’s internet age. Following a legal analysis of various intellectual property court cases, Jessica Silbey examines the experiences of everyday creators and innovators navigating ownership, sharing, and sustainability within the internet eco-system and current IP laws. Crucially, the book encourages refiguring the substance of “progress” and the function of intellectual property in terms that demonstrate the urgency of art and science to social justice today.

Purchase Against Progress from Stanford University Press.

JESSICA SILBEY is Professor of Law at the Boston University School of Law. She is the author of Against Progress: Intellectual Property and Fundamental Values in the Internet Age (Stanford, 2022), The Eureka Myth: Creators, Innovators, and Everyday Intellectual Property (Stanford, 2015), and was a Guggenheim Fellow in 2018.

BOOK TALK: AGAINST PROGRESS
May 9 @ 10am PT / 1pm ET
Register now for the free, virtual event

San Francisco Board of Supervisors Unanimously Passes Resolution in Support of Digital Rights For Libraries

San Francisco City Hall from east end of Civic Center Plaza

In a stunning show of support for libraries, late yesterday afternoon the San Francisco Board of Supervisors voted unanimously to support a resolution backing the Internet Archive and the digital rights of all libraries.

Supervisor Connie Chan, whose district includes the Internet Archive, authored the legislation and brought the resolution before the Board. “At a time when we are seeing an increase in censorship and book bans across the country, we must move to preserve free access to information,” said Supervisor Chan. “I am proud to stand with the Internet Archive, our Richmond District neighbor, and digital libraries throughout the United States.”

WATCH Supervisor Chan introduce the resolution:

What’s in the resolution?

The resolution is a powerful statement in support of libraries, beginning:

Resolution recognizing the irreplaceable public value of libraries, including online libraries like the Internet Archive, and the essential rights of all libraries to own, preserve, and lend both digital and print books to the residents of San Francisco and the wider public; supporting the Internet Archive and its public service mission; and urging the California State Legislature and the United States Congress to support digital rights for libraries, including controlled digital lending and the option for libraries to own their digital collections. 

Read the full resolution

Rally on the steps of San Francisco City Hall

Supporters surround Internet Archive founder Brewster Kahle and District 1 Supervisor Connie Chan on the steps of City Hall.

Before the vote, supporters rallied outside on the steps of City Hall. Joining Supervisor Chan on the steps were Brewster Kahle, Internet Archive; Cindy Cohn, Electronic Frontier Foundation; Chuck Roslof, Wikimedia Foundation; and author and activist Liz Henry.

“It’s a sad day that we have to be here to talk about the importance of maintaining access to information through libraries,” said Brewster Kahle, Digital Librarian of the Internet Archive. “We must stand firm in our commitment to providing Universal Access to All Knowledge.”

“The Internet Archive and its goal of universal access to all human knowledge represent the best of technology,” said Cindy Cohn, Executive Director of the Electronic Frontier Foundation. “We must stand up for the privacy of our reading. The digital lending strategies that publishers want to promote violate our privacy and our ability to investigate freely.”

“The work of the Wikimedia Foundation centers around providing access to knowledge for all people, around the world,” said Chuck Roslof, Lead Counsel at the Wikimedia Foundation. “In this mission, Wikipedia doesn’t stand alone. Libraries and archives play a critical role as part of our ecosystem of free knowledge, to ensure that all of us have access to reliable, accurate information about the world around us. The Internet Archive is the internet’s library, and it is an invaluable resource to Wikipedia editors and readers…”

Author and disability justice activist Liz Henry spoke about the importance of digital libraries from their experience as a wheelchair user. “Access to digital lending from libraries and the Internet Archive is a critical lifeline for disabled people and seniors,” said Henry, going on to explain how they used the Internet Archive to research a brick that they found under their house during construction. Using materials from the web, as well as digital books from the Internet Archive and San Francisco Public Library, Henry was able to determine that the brick, stamped C H for City Hall, was manufactured in the 1870s, and was part of the original City Hall structure, which burned down in the 1906 earthquake. Henry completed their research while they were having mobility issues and limited to the house, underscoring the importance of digital access to library materials. You can read more about this fascinating discovery on Henry’s blog.

Many thanks to Supervisor Chan for being a strong advocate for libraries, and for making San Francisco the first municipality to codify the importance of digital libraries and controlled digital lending in a resolution. Many thanks as well to all the supporters who joined us on the steps and who submitted letters in support of the resolution.