
University Library, University of Illinois at Urbana-Champaign

Introduction to R: Getting Data from a CSV file

A brief tutorial on the R programming language.

Variables

The following section covers how to import data from a CSV file into a variable. A variable is simply a named space in the computer's memory that stores data. This can be a single number or character string or, more often, a larger group of data (such as the data structures discussed in the next section). Variables are very important because, once data is associated with a variable name, you can use that name to refer to it until the variable is deleted. The concept of variables is common to most programming languages. While languages like C++ and Java require you to create the variable ahead of time and specify the type of data that will occupy it, in R you can assign data to a variable immediately, without specifying any information about the data. The term "variable" can be confusing here, since in statistics it is common to refer to a column in a table as a variable. The two concepts are similar in the sense that both are containers that store and identify data. In R, however, a "variable" can store any data or group of data.
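As a minimal sketch of the point about types, the lines below assign values to variables without declaring anything ahead of time. The names "demo_number", "demo_string" and "demo_group" are made up for illustration and do not come from the tutorial data.

```r
# No type declaration is needed: R works out the type from the value.
demo_number <- 42                  # a single number
demo_string <- "Illinois"          # a character string
demo_group  <- c(1.5, 2.5, 3.5)    # a group of numbers (a vector)

# class() reports what kind of data a variable currently holds.
class(demo_number)
class(demo_string)
class(demo_group)

# The same name can later be given a different kind of data.
demo_number <- "now a character string"
class(demo_number)

# rm() deletes a variable; after this, the name no longer refers to anything.
rm(demo_group)
exists("demo_group")
```

Typing a variable's name on its own line prints whatever it currently holds, which is a quick way to check an assignment.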

Importing Data into R

R can handle data in a number of formats, but among the easiest to obtain and load is the CSV. A CSV (Comma Separated Values) file is just a plain text file that has columns and rows like a table or spreadsheet. In a CSV, each row is on a different line and the entries in every row are separated by commas. Most spreadsheet software, like Excel or OpenOffice Calc, can save the contents of a spreadsheet in CSV form. To follow along with this tutorial, download the data, in CSV form, from the link on the Introduction tab. Make sure to move the data to somewhere obvious, like the desktop. To load data from a CSV file into R, use the "read.csv" command as shown below:


[Screenshot: importing data into R]

When using "read.csv," follow it directly (by convention, with no space) with parentheses that contain the full name of the file being loaded. In the example above, the syntax is read.csv("C:\\Users\\nameredacted\\Desktop\\illinois_census_by_county.csv"). This presents several challenges, especially if you're not used to dealing with file paths, so let's walk through it.

First, the filename needs to be in quotation marks (either double or single will work).

Second, in addition to the filename, we'll need to tell R where the file is located on the hard drive. One way to do this, as shown above, is to include the full file path with the filename. The file path is like an address in words that tells the computer's operating system how to find the file. On a Windows machine, the path can be found by right-clicking the file icon and selecting "Properties." You'll see the file path, minus the name of the file, in the "General" tab, labeled "Location." The full file name is this path plus the name of the file (make sure to include the ".csv" at the end). On a Mac, you can copy the full file path by right-clicking the file or folder in the Finder. While the right-click menu is open, hold down the Option key. The menu will then reveal an option to Copy "filename" as Pathname.
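The idea of "path plus file name" can be sketched in R itself. This is only an illustration under stated assumptions: the folder is R's temporary directory standing in for the real location, and the file name "example_counties.csv" and its columns are hypothetical, not the tutorial's census file.

```r
# Hypothetical folder and file name, used only for illustration.
folder    <- tempdir()                 # stands in for the "Location" of the file
file_name <- "example_counties.csv"    # remember the ".csv" at the end

# file.path() joins the folder and the file name into one full path.
full_path <- file.path(folder, file_name)

# Create a tiny CSV at that path so the example can run anywhere.
write.csv(data.frame(county = c("A", "B"), residents = c(100, 200)),
          full_path, row.names = FALSE)

# file.exists() is a quick check that the path really points at a file.
file.exists(full_path)

# basename() and dirname() split the full path back into its two parts.
basename(full_path)
dirname(full_path)

# The full path, as a quoted string or a variable holding one, goes inside read.csv().
read.csv(full_path)
```

Checking a path with file.exists() before calling read.csv() can make a mistyped folder or file name easier to spot.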

Another issue for Windows users is that the names of the directories in the file path will be separated by backslash ("\") characters. Unfortunately, as in many other programming languages, the backslash has a special meaning in R: it marks the start of an escape sequence (for example, "\n" stands for a new line). When supplying R with a file path in Windows, every backslash must therefore be doubled (as shown above).
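A short sketch of how R treats the doubled backslashes. The user folder "example" is a placeholder, not a real account name. The last lines also show forward slashes, which R accepts in Windows paths as an alternative to doubling.

```r
# Each "\\" typed in the code is stored as ONE backslash in the string.
win_path <- "C:\\Users\\example\\Desktop\\illinois_census_by_county.csv"

# cat() shows the string as R actually stores it, with single backslashes.
cat(win_path, "\n")

# "C:\\Users" therefore holds 8 characters, not 9.
nchar("C:\\Users")

# Forward slashes need no doubling and also work in file paths on Windows.
alt_path <- "C:/Users/example/Desktop/illinois_census_by_county.csv"

# Converting one form to the other; fixed = TRUE treats "\\" as a plain backslash.
converted <- gsub("\\", "/", win_path, fixed = TRUE)
identical(converted, alt_path)
```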

There are times when giving the full file path might be desirable, but in most cases a shortcut can be taken. R allows us to define a working directory that will act as a default file path. To set the working directory, click "File"->"Change Dir," select the folder where the data is located, and click "OK." When the working directory is set to the folder that contains the data to be imported, the read.csv command will need only the filename between the quotation marks.
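The working directory can also be set from the command line with setwd() and checked with getwd(). The sketch below assumes a temporary folder and a made-up file "example.csv" in place of the real data folder, so that it runs on any machine.

```r
# Remember the current working directory so it can be restored afterwards.
old_dir <- getwd()

# tempdir() stands in for the folder that holds the data (an assumption for this sketch).
data_dir <- tempdir()
setwd(data_dir)

# Create a small hypothetical CSV in the working directory.
write.csv(data.frame(county = c("A", "B"), residents = c(100, 200)),
          "example.csv", row.names = FALSE)

# With the working directory set, the file name alone is enough.
example <- read.csv("example.csv")
example

# Restore the original working directory.
setwd(old_dir)
```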

After the "read.csv" command executes successfully, the data stored in the CSV file will scroll past on the screen, but to actually use the data, it will need to be stored in a variable.

In R, a variable can hold any type or size of data and can have almost any name that starts with a letter and is not one of R's reserved words, such as "if" or "function" (most nouns will work). It is also best to avoid reusing the name of an existing R command. Since this data involves educational attainment, "educ" was chosen as its name. To store the CSV's contents in "educ," use the same command as before (tip: hitting the up arrow scrolls backwards through previously executed commands) but precede it with "educ<-". The "<-" is called the assignment operator. Think of it as a directional arrow that tells R, "this data goes here." The syntax here is educ<-read.csv("fullfilepathoffile"), followed by educ on its own line to check that the data was read in correctly.

[Screenshot: That Data goes here]

Notice, in the above example, that once the CSV is stored as a variable, the variable name can be used to display the variable's contents. Just typing the variable name can be a good way to check that the "read.csv" function executed properly.
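The assignment step can be sketched as follows. The tutorial's census file is not reproduced here, so the sketch writes a tiny stand-in CSV first. Its columns "county" and "bachelors_pct" and its values are invented for illustration.

```r
# Write a small stand-in CSV to a temporary file (hypothetical columns and values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("county,bachelors_pct",
             "A,20.5",
             "B,31.0"), csv_path)

# Store the contents of the CSV in the variable "educ".
educ <- read.csv(csv_path)

# Typing the variable name prints its contents.
educ

# For larger files, head() shows only the first rows and str() summarizes the columns.
head(educ)
str(educ)
```

For a file with many rows, head() or str() may be a more convenient check than printing the whole variable.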

Other Filetypes

Of course, CSV is far from the only filetype out there for formatting data. R also has commands that automatically read data from tab separated value (TSV) files and HTML tables. R can deal with other filetypes manually using the "scan" and "readLines" functions (note the capital "L": function names in R are case-sensitive). The R Cookbook, Chapter 4, provides easy-to-follow instructions for these functions; see the Recommended Resources section of this guide. Functions for easily importing most data formats (including XML and files created with or for SPSS, SAS, and Stata) can also be downloaded through the R project's packages page.
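As a brief sketch of two of these options in base R, read.delim() reads tab separated files and readLines() returns the raw lines of any text file. The file and its contents below are made up so the example is self-contained.

```r
# Write a small hypothetical tab separated file to a temporary location.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("county\tresidents",
             "A\t100",
             "B\t200"), tsv_path)

# read.delim() works like read.csv() but expects tabs between entries.
tab_data <- read.delim(tsv_path)
tab_data

# readLines() reads the file as plain text, one character string per line,
# which is useful when a file has no regular table structure.
raw_lines <- readLines(tsv_path)
raw_lines
```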

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 8:30am-6:00pm
Phone: 217-244-1331