# University Library, University of Illinois at Urbana-Champaign

A guide to visualizing your data using four common types of charts.

## Types of data for scatter plots

Scatter plots are excellent for comparing two quantitative variables to see if they correlate.

In the scatter plot below, we can see a positive correlation between car speed and stopping distance. In other words, the faster the car was going, the longer the distance it needed to stop.
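To put a number on how strongly two variables move together, one common measure is the Pearson correlation coefficient, which ranges from -1 (perfect inverse relationship) to 1 (perfect positive relationship). The sketch below computes it with only the Python standard library. The speed and distance values are made up for illustration and are not the data behind the chart described above; the function name `pearson_r` is also just a label chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (how the variables co-vary).
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations for each variable on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical illustrative values, not real measurements.
speeds = [10, 15, 20, 25, 30]
stopping_distances = [12, 25, 40, 62, 85]

r = pearson_r(speeds, stopping_distances)
print(f"r = {r:.2f}")  # a value close to 1 indicates a strong positive correlation
```

A value near 0 would suggest little or no linear relationship, although the variables could still be related in a non-linear way.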

## Making an excellent scatter plot

Strive for clarity: Each point on a scatter plot represents one observation in your data. You'll want to make sure that these points are big enough to see, but not so big that they obscure each other. Unless you have a very good reason to add color, it is probably best to keep a scatter plot grayscale.

Consider a trend line: If the scatter plot shows a linear correlation between the two variables, it can be helpful to include the trend line that summarizes the correlation. This can be useful whether the trend is positive or negative. (Negative in this sense simply means that the correlation is inverse: when one variable increases, the other decreases.)
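Most charting tools can draw a trend line for you, but it can help to see what they are computing. A common choice (assumed here; your tool may use a different method) is the ordinary least-squares line. The sketch below fits one using only the Python standard library. The data values and the function name `fit_line` are hypothetical and chosen to show a negative trend.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is the same value.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    # How x and y vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = co_variation / spread_x
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values with an inverse relationship: y falls as x rises.
xs = [1, 2, 3, 4, 5]
ys = [9.8, 8.1, 6.2, 3.9, 2.0]

slope, intercept = fit_line(xs, ys)
print(f"trend line: y = {slope:.2f}x + {intercept:.2f}")
```

A negative slope corresponds to the inverse correlation described above: each one-unit increase in x is associated with a decrease in y.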

Consider the scale: Sometimes, your data will demonstrate exponential increases or decreases in value. In this case, it may make sense to draw the scatter plot axes on a logarithmic scale instead of a linear scale.

Below is a scatter plot that compares Gross National Income per capita to life expectancy. Notice that the x-axis, income, is on a logarithmic scale: each successive tick mark represents twice the value of the one before it.

Above chart "GNI and Life Expectancy log scale" created by Wikipedia user Ljstalpers under CC-BY-SA 4.0.
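To see why a logarithmic axis suits this kind of data, the short sketch below compares the spacing of doubling tick values on a linear axis and on a base-2 logarithmic axis. The tick values are hypothetical and are not read from the chart above.

```python
import math

# Hypothetical tick values that double at each step.
ticks = [500, 1000, 2000, 4000, 8000]

# Where each tick would sit on a base-2 logarithmic axis.
log_positions = [math.log2(t) for t in ticks]

# Distance between neighbouring ticks on each kind of axis.
linear_gaps = [b - a for a, b in zip(ticks, ticks[1:])]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print("gaps on a linear axis:", linear_gaps)
print("gaps on a log2 axis:  ", [round(g, 6) for g in log_gaps])
```

On the linear axis the gaps keep growing, so low values get squeezed together at one end. On the logarithmic axis every doubling takes up the same amount of space, which spreads the points out more evenly.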

## Things to avoid

Spurious correlations: Be careful about conflating correlation and causation. It's tempting to ascribe cause and effect when two variables correlate, but coincidental correlations happen all the time. (You can see some amusing examples at the website Spurious Correlations.) When creating and sharing a scatter plot, be clear about what the correlation (or lack thereof) can suggest, as well as what you don't have data to support.

Too little or too much: Try not to use a scatter plot when you have either very few data points or a very large number of them. If you have too few points, any trends could show up completely by chance. On the other hand, if you have too many data points, they will overlap and prevent your audience from seeing the whole picture.

## Accessibility considerations

Color: One important consideration for accessibility is the use of color. There are several different types of color-blindness, which can affect how well your audience reads your chart. The most common is red-green color-blindness, which means that red and green look very similar. As much as you may want to use green to mean something positive and red to mean something negative, you should pick a different pair of colors instead. The websites ColorBrewer, Contrast-A, and Viz Palette are great tools for identifying colors that work well together and also work for color-blind audiences. You can also check your finished product against a color blindness simulator. You may also want to test how your chart looks in black and white, either by printing it out or using your chart creation tool to transform the colors to grayscale.
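One rough way to check whether two colors will remain distinguishable in grayscale is to compare their brightness. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x guidelines, implemented with only the Python standard library. Note the assumptions: this measures brightness contrast only and does not simulate any specific type of color-blindness, and the function names and the red and green values are chosen just for this example.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Return perceived brightness of an (R, G, B) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb_a, rgb_b):
    """Return the WCAG contrast ratio between two colors, from 1 to 21."""
    lum_a, lum_b = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)

# Example colors (illustrative choices, not taken from any chart in this guide).
red = (255, 0, 0)
green = (0, 128, 0)
black = (0, 0, 0)
white = (255, 255, 255)

print(f"red vs green:   {contrast_ratio(red, green):.2f}")
print(f"black vs white: {contrast_ratio(black, white):.2f}")
```

A ratio close to 1 means the two colors have nearly the same brightness, so they are likely to blur together when printed in black and white. A color-blindness simulator is still the better check for hue confusion.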

Text: It's also important to ensure that any text on your chart is easy to read, which is affected by both the size of the text and the choice of font. Try to avoid "pretty" fonts; it's best to use a sans serif font like Arial or Calibri. When designing your chart, try to keep all text horizontal (nobody wants to have to tilt their head to read). Make sure your chart title and any labels are descriptive and clear.

Embedding an image: If you are embedding the chart as an image in a document or online, you should also include alternative ("alt") text that describes what the chart shows. If you are able to share the data behind the chart, it is also recommended to provide a descriptive link to the data near the chart image.
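If you generate web pages with a script, the description can be attached at the same time as the image. The sketch below builds an HTML `<img>` element with an `alt` attribute using Python's standard `html.escape` function. The helper name `chart_img_tag`, the file name, and the description are all hypothetical and only illustrate the idea.

```python
from html import escape

def chart_img_tag(src, alt_text):
    """Build an HTML <img> tag whose alt attribute describes the chart."""
    if not alt_text.strip():
        raise ValueError("alt text should describe what the chart shows")
    # Escape quotes and angle brackets so the text cannot break the HTML.
    safe_src = escape(src, quote=True)
    safe_alt = escape(alt_text, quote=True)
    return f'<img src="{safe_src}" alt="{safe_alt}">'

tag = chart_img_tag(
    "speed-vs-stopping-distance.png",  # hypothetical file name
    "Scatter plot showing that stopping distance rises as car speed increases.",
)
print(tag)
```

Good alt text states the takeaway of the chart (the trend or comparison) instead of just naming the chart type.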