Introduction to Generative AI

This library guide is a UIUC campus resource to read and reference for instructional, professional, and personal learning. Updates will occur on a semester basis. Last Updated: November 2024

Training AI

While experts generally believe that training generative AI on varied corpus materials is a fair use, it will remain an open question until court cases address the issue head-on. That belief rests on cases such as Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014), where the Second Circuit Court of Appeals found that scanning books for text and data mining (TDM) was “quintessentially” a fair use. Note a few caveats, however:

  1. In HathiTrust, the scanned corpus consisted of physical books, so no licensing was involved.
  2. In HathiTrust, the TDM was not generative, as it is with AI models.

Note: If there are contractual restrictions involved that expressly prohibit training AI, it may not be copyright infringement to train AI using the corpus, but it could be a breach of contract (see Contractual/Licensing Issues).

Contractual/Licensing Issues

While general copyright exceptions and limitations carve out acceptable uses of otherwise copyright-protected works, contractual restrictions can curtail any or all of those exceptions and limitations, including fair use. For example, most content on the World Wide Web is automatically protected by copyright, and many websites employ terms of service (contracts) to curtail the use of their content. If the terms of service of a website state that its contents may not be used to train AI, then violating those terms may be considered a breach of contract. However, courts vary on whether they will enforce terms of service that lack a “click-through” license. Some decline to do so on the grounds that there is no legally binding contract between the person browsing the web and the website owner.

Courts are much more likely to enforce so-called “click-through” licenses, which require a person browsing the website to acknowledge the terms of service by clicking “I agree.” Another example is library-licensed databases that collect journal articles. Patrons may think that they are not bound by the licenses between the library and the database vendors, but they would be mistaken. In return for access to the database of materials, libraries pay a fee and agree to bind themselves and their patrons to the terms and conditions laid out in elaborate license agreements, many of which expressly prohibit scraping, TDM, and generative AI corpus training.

Creation and Ownership of AI

It is well known that only human authors can create and own a copyright. However, the mere fact that technology was used to create the output is not, by itself, a reason to deny a copyright to an author. For instance, in Burrow-Giles Lithographic Co. v. Sarony, 111 U.S. 53 (1884), an author used a camera to take a photograph of Oscar Wilde. Because the author made creative choices regarding the lighting, the positioning of the subject of the photo, and the like, the Supreme Court found that the photographer owned the copyright in the photo. The key is that the input by the creator must satisfy the requirement of originality, which is a touchstone of copyright law. As such, if someone merely enters a prompt into AI, such as “draw a monkey,” the US Copyright Office may deny registration due to a lack of originality.

However, if the human author “select[s] or arrange[s] AI-generated material in a sufficiently creative way [such] that ‘the resulting work as a whole constitutes an original work of authorship,’” then the creator may be able to claim copyright in the resulting AI-generated work. Similarly, if “an artist . . . modif[ies] material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection,” the author may own a copyright in the modified work. “In these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.” See the USCO Copyright Registration Guidance: Works Containing Material Generated by AI (citations omitted).

Note: If the resulting image or work is substantially similar to a work used to train the AI, there may still be a claim of copyright infringement.