Introduction to Generative AI

This library guide is a UIUC campus resource to read and reference for instructional, professional, and personal learning. It is updated each semester. Last Updated: March 2024

Generative? Artificial? Intelligence?

Artificial Intelligence (AI) is a way to train computers to carry out a range of complex tasks and to learn from these tasks. It uses a process called machine learning to identify patterns in images, texts, and other materials. This process can involve human input or be fully automated. Generative Artificial Intelligence (genAI) is a specific kind of AI that is designed to generate new text, images, video, or audio based on the content that it has studied.

In order for the computer to generate new material, it needs to: 

  1. Break large amounts of information into machine-readable chunks
  2. Create a model to represent patterns in this information
  3. Use this model to derive new works
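
To make those three steps concrete, here is a toy sketch in Python. It is not how real generative AI tools are built (they use vastly more data and far more sophisticated models), and the training text is invented, but it follows the same pattern: chunk the text, model which words follow which, then generate something new.

```python
from collections import defaultdict
import random

# A toy illustration of the three steps above (not a real AI model).
training_text = "the corn grows in the plots and the corn grows tall in the field"

# 1. Break the information into machine-readable chunks (here, single words).
tokens = training_text.split()

# 2. Create a simple model of patterns: which words tend to follow which.
model = defaultdict(list)
for current_word, next_word in zip(tokens, tokens[1:]):
    model[current_word].append(next_word)

# 3. Use the model to derive a new string of words, one at a time.
word = "the"
new_text = [word]
for _ in range(8):
    word = random.choice(model.get(word, tokens))
    new_text.append(word)

print(" ".join(new_text))
```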

Let’s take a closer look at this process for text, images, and sound.

How does text-based AI work?

Tools like ChatGPT and Microsoft Copilot are GPTs, or generative pre-trained transformers.

They’re generative in that they’re designed to generate new text, which is awesome if you’re trying to write a new country music song about burritos or a new plot hook for a novel, but not so great if you’re trying to quote and cite published scholarly articles. 

These tools are pre-trained. By the time you ask an AI tool to generate something for you, it’s already done all of its homework. It has studied large amounts of text authored by humans in order to understand the relationships between individual words. For example, it can see that the words “plots,” “corn,” and “experiment” tend to appear in paragraphs on the Morrow Plots, our experimental corn field, but these algorithms also know that “corn” can appear in a chowder recipe.
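
One very simplified way to picture this kind of training is to count which words show up together. The snippet below does exactly that with two made-up sentences; real pre-training works on billions of words and learns far richer relationships, but the idea of learning from word co-occurrence is the same.

```python
from collections import Counter
from itertools import combinations

# Two made-up snippets of training text (purely illustrative).
paragraphs = [
    "the morrow plots are an experimental corn field on campus",
    "stir the corn into the chowder and simmer gently",
]

# Count how often each pair of words appears in the same paragraph.
co_occurrences = Counter()
for paragraph in paragraphs:
    words = sorted(set(paragraph.split()))
    for pair in combinations(words, 2):
        co_occurrences[pair] += 1

print(co_occurrences[("corn", "experimental")])  # "corn" near "experimental": 1
print(co_occurrences[("chowder", "corn")])       # "corn" near "chowder": 1
```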

The computer’s understanding of words and their relationships is called a language model. AI models are trained on hundreds of thousands of texts, so we refer to these as large language models or LLMs. 

 

Finally, these tools are transformers, which means that they take what they know about combinations of words and use it to create new combinations that sound plausible. They use an architecture called a transformer to generate new text one word (or word fragment, called a token) at a time, adding an element of randomness to mimic human creativity.
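
Here is a small sketch of that last step. The candidate next words and their scores are invented for this example; the point is that the model turns scores into probabilities and then samples from them, which is where the randomness comes in.

```python
import numpy as np

rng = np.random.default_rng()

# Invented example: the model has scored four candidate next words.
next_words = ["corn", "chowder", "plots", "banjo"]
scores = np.array([2.0, 1.0, 0.5, -1.0])

# Turn the scores into probabilities (a "softmax"), then sample one word.
# Sampling, rather than always taking the top word, adds the randomness.
probabilities = np.exp(scores) / np.exp(scores).sum()
print(rng.choice(next_words, p=probabilities))
```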


How does image-based AI work?

Instead of looking for relationships between words, computers “see” an image by looking at pixels, the basic building blocks of digital images. The computer will compare one pixel to the ones that surround it, paying close attention to colors, outlines, and texture. During the training process, the computer learns how to identify parts of an image that are important (feature recognition) and then categorize them (classification). Tools like Adobe Acrobat and ABBYY FineReader use optical character recognition to identify the shapes of letters and create transcriptions of text.
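
The snippet below shows the simplest version of comparing a pixel to its neighbors: a tiny made-up grid of grey values, where big differences between neighboring pixels mark the outline of a shape. Real image models learn many layers of these comparisons, but this is the basic building block.

```python
import numpy as np

# A made-up 5x5 grayscale "image": 0 is black, 1 is white,
# with a bright square in the middle.
image = np.zeros((5, 5))
image[1:4, 1:4] = 1.0

# Compare each pixel with the one to its left; large differences mark
# vertical outlines, one of the low-level features a model can pick out.
edges = np.abs(image[:, 1:] - image[:, :-1])
print(edges)
```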

If you have ever had to prove you’re not a robot by clicking all the squares with a fire hydrant or by typing the letters that appear in a picture, you’ve helped a computer identify features and classify them. We refer to human involvement in the training process, such as supplying labels, as “supervised learning,” but computers can also engage in “unsupervised learning,” finding patterns and groupings in data without a human labeling it first.
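
In code, the difference comes down to whether the training data arrives with human-supplied answers. The sketch below uses scikit-learn (assumed to be installed) and completely made-up data: the classifier is given labels, the clustering algorithm is not.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Made-up data: each item is described by two features, say roundness and edge count.
features = [[0.9, 0], [0.8, 1], [0.1, 4], [0.2, 4]]

# Supervised learning: a human supplies the answers (like clicking the hydrants).
labels = ["fire hydrant", "fire hydrant", "street sign", "street sign"]
classifier = KNeighborsClassifier(n_neighbors=1).fit(features, labels)
print(classifier.predict([[0.85, 0]]))  # -> ['fire hydrant']

# Unsupervised learning: no answers given; the computer groups similar items itself.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print(clusters)  # e.g. [0 0 1 1]
```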

As part of the training process, the computer will add random pixels to an image, then test to see if it can still recognize the objects in the picture. This digital noise helps the computer to create a mathematical representation of what it sees as the essence of an object. For example, it might associate the phrase “tabby cat” with a formula for an object with a round shape, two pointy ears, and patterns in black, brown, and white. When generating images, the computer uses that mathematical representation, plus what it learned from adding pixels, to create an outline and then add details to the picture, spreading them throughout the image in a process known as diffusion (the technique behind tools like Stable Diffusion).
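
A rough sketch of the “add noise, then learn to remove it” idea looks like the following. The image here is just a random grid of numbers standing in for a picture; real diffusion models work with millions of pixels and train a neural network to do the denoising.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A stand-in "picture": an 8x8 grid of grey values (purely illustrative).
image = rng.random((8, 8))

# Forward process: repeatedly mix random noise into the picture.
noisy = image.copy()
for step in range(10):
    noisy = 0.9 * noisy + 0.1 * rng.normal(size=noisy.shape)

# Training teaches the model to reverse this, step by step; generating a new
# image then means starting from pure noise and denoising toward a picture
# that matches a description like "tabby cat".
print(np.abs(noisy - image).mean())  # how far the noisy version has drifted
```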

Computers use this process to generate 2D images as well as 3D objects. Tools like Polycam can take photos of one object from many different angles and stitch them together using a technique called photogrammetry. 


How does sound-based AI work?

Surprisingly, AI uses image recognition techniques to identify and recreate sounds. The two aspects of sound that are most important for generative AI are frequency (Is this a high-pitched screech or a low growl?) and amplitude (Is this a whisper or a shout?). The computer visualizes these two aspects of sound as a spectrogram: time runs along the horizontal axis, frequencies are stacked vertically along the y-axis, and a sliding scale of colors represents the intensity of the amplitude. From there, the computer uses visual pattern recognition to identify important features and then classify them. This technique allows the computer to isolate the rumble of a plane flying overhead from an umpire’s whistle because these sounds form different patterns on the spectrogram. Once it recognizes these patterns, it can then use the information about frequency and amplitude to create sounds in new combinations.
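
The snippet below builds a very small spectrogram of a single invented tone using NumPy: the sound is chopped into short slices of time, and a Fourier transform measures how much energy sits at each frequency in each slice. Those numbers are what get colored in when a spectrogram is drawn.

```python
import numpy as np

# An invented sound: one second of a 440 Hz tone sampled 8,000 times per second.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
sound = np.sin(2 * np.pi * 440 * t)

# Chop the sound into short frames (slices of time) ...
frame_size = 256
frames = sound[: len(sound) // frame_size * frame_size].reshape(-1, frame_size)

# ... and measure the energy at each frequency within each frame.
spectrogram = np.abs(np.fft.rfft(frames, axis=1))

print(spectrogram.shape)  # (time slices, frequency bins) -> (31, 129)
```

From there, generating new audio means running the process in reverse: predicting new spectrogram patterns and converting them back into sound.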
