LinkedIn Learning | Log In with UIUC
Type Python in search box for complete list.
Good for:
Why Python:
Recommended setup(s):
Best resource to learn quickly:
Python Data Analysis on LinkedIn Learning (access through SSO)
How to load/use a library in Python that isn't loaded by default
# to load libraries you've installed with pip or conda,
# you import them like so:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys
# note you can choose an alias to refer to a loaded library or module.
# This means we can refer to matplotlib.pyplot as plt when calling functions later.
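Besides aliasing a whole module, you can import a single name from it, and you can check whether a package is installed before trying to import it. The sketch below uses only standard-library modules so it runs on any Python 3 install; `is_installed` is a hypothetical helper name chosen for illustration, not part of any library.

```python
import importlib.util
import math as m             # alias: refer to the math module as m
from statistics import mean  # import a single name from a module


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


print(m.sqrt(16))            # call a function through the alias
print(mean([1, 2, 3, 4]))    # call an imported name directly
print(is_installed("math"))  # True: math ships with Python
```

If `is_installed` returns False for a package you need, install it as described in the next section.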
How to install and load new packages
# We highly recommend using Anaconda Python 3.x and a conda environment for data analysis.
# To install a new library in Anaconda, open the Anaconda Prompt and activate whichever
# environment you want to use (the default is 'base'). Then call: conda install <library>
# where <library> is the name of the library you want to install.
# From the Anaconda Prompt, you only need to type:
# conda install wxPython
# OR
# pip install wxPython
# Again, conda is preferred because it checks that the new package is compatible
# with what you already have installed before changing anything.
# from inside a jupyter notebook, it's best to use the following syntax:
!conda install --yes --prefix {sys.prefix} wxPython
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: C:\ProgramData\Anaconda3
added / updated specs:
- wxpython
The following packages will be downloaded:
package | build
---------------------------|-----------------
conda-4.8.3 | py37hc8dfbb8_1 3.1 MB conda-forge
wxpython-4.0.7.post2 | py37h5fe3f0a_3 22.0 MB conda-forge
------------------------------------------------------------
Total: 25.1 MB
The following NEW packages will be INSTALLED:
python_abi conda-forge/win-64::python_abi-3.7-1_cp37m
The following packages will be UPDATED:
ca-certificates 2019.11.28-hecc5488_0 --> 2020.6.20-hecda079_0
certifi 2019.11.28-py37_0 --> 2020.6.20-py37hc8dfbb8_0
conda 4.8.2-py37_0 --> 4.8.3-py37hc8dfbb8_1
openssl 1.1.1d-hfa6e2cd_0 --> 1.1.1g-he774522_0
wxpython 4.0.4-py37h6538335_0 --> 4.0.7.post2-py37h5fe3f0a_3
Downloading and Extracting Packages
wxpython-4.0.7.post2 | 22.0 MB | ########## | 100%
conda-4.8.3 | 3.1 MB | ########## | 100%
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
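Some packages are only available through pip. The same idea as `{sys.prefix}` applies there: call pip through the interpreter the notebook is running (`sys.executable`) so the package lands in the active environment. This is a minimal sketch; `pip_install_command` is a hypothetical helper, and the command is only built and printed here, not executed.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("wxPython")
print(" ".join(cmd))
# To actually run the install, uncomment the next line:
# subprocess.check_call(cmd)
```

In a notebook cell, the equivalent one-liner is `!{sys.executable} -m pip install wxPython`.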
How to call functions and classes from other files
# To access other python scripts, you can import them as you would a library.
# For example, if you had a script called other_stuff.py in the same folder,
# you can import it as follows:
import other_stuff
# If it contained a function called do_stuff(a, b) that returned a + (b/2),
# you could now call it like:
a = other_stuff.do_stuff(2,7)
print(a)
5.5
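The contents of other_stuff.py are not shown above. A minimal sketch consistent with the description could look like the following; the docstring and the `__main__` guard are additions for illustration, not part of the original script.

```python
# other_stuff.py (hypothetical contents matching the description above)

def do_stuff(a, b):
    """Return a plus half of b."""
    return a + (b / 2)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(do_stuff(2, 7))
```

The guard at the bottom lets the same file serve both as a script and as an importable module: `import other_stuff` defines `do_stuff` without printing anything.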
How to open and work with Excel datasets in Python.
# The pandas library is a great package for dealing with raw data.
# It can import from csv or xlsx files directly, and handles
# a lot of the importing and data types for you.
import pandas as pd
data = pd.read_excel('ExampleDataClean.xlsx')
print(data.columns, data)
Index(['Date', 'Location', 'Field1', 'Field2', 'Field3', 'Field4', 'Field5',
'Field6', 'Field7', 'Field8', 'Field9', 'Field10'],
dtype='object') Date Location Field1 Field2 Field3 Field4 Field5 \
0 2020-05-06 US 0.087072 0.466163 0.256585 0.850972 0.024691
1 2020-05-07 US 0.374148 0.596247 0.460567 0.455865 0.473421
2 2020-05-09 MX 0.950790 0.786810 0.375649 0.651548 0.224263
3 2020-05-11 CA 0.956982 0.137989 0.949380 0.251682 0.422171
4 2020-05-13 US 0.275037 0.759614 0.623671 0.096792 0.265659
5 2020-05-15 CA 0.579684 0.597635 0.354101 0.926063 0.061220
6 2020-05-17 US 0.935200 0.917529 0.661647 0.260087 0.040231
7 2020-05-19 MX 0.782576 0.316358 0.387379 0.021592 0.390715
8 2020-05-21 MX 0.283339 0.928038 0.543262 0.318045 0.896379
9 2020-05-23 US 0.357202 0.342773 0.762433 0.097341 0.628032
10 2020-05-25 CA 0.071354 0.107643 0.787911 0.413408 0.876708
11 2020-05-27 CA 0.016469 0.364118 0.303169 0.654925 0.702061
12 2020-05-29 CA 0.844085 0.347905 0.430369 0.789135 0.326151
13 2020-05-31 US 0.621470 0.226082 0.096330 0.699755 0.608162
14 2020-06-02 MX 0.075084 0.893168 0.448265 0.001585 0.092250
15 2020-06-04 US 0.479846 0.586265 0.751123 0.731068 0.320718
16 2020-06-06 US 0.227084 0.188554 0.463362 0.728477 0.220309
17 2020-06-08 US 0.401304 0.951056 0.434225 0.758877 0.387425
18 2020-06-10 US 0.446687 0.553265 0.703386 0.075320 0.572195
Field6 Field7 Field8 Field9 Field10
0 0.170840 0.306948 0.224802 0.227992 0.423540
1 0.037419 0.164160 0.317844 0.357118 0.233253
2 0.364410 0.277153 0.687860 0.586596 0.062513
3 0.782476 0.938242 0.237547 0.456039 0.619374
4 0.174680 0.850576 0.956483 0.336973 0.135804
5 0.301500 0.766449 0.856508 0.824069 0.974019
6 0.086861 0.234838 0.124674 0.453847 0.309176
7 0.835033 0.681286 0.232524 0.841582 0.754590
8 0.957806 0.934876 0.111279 0.977557 0.263052
9 0.125401 0.686620 0.188627 0.035643 0.983478
10 0.379938 0.816157 0.613449 0.978133 0.097426
11 0.658342 0.041644 0.410201 0.881190 0.438716
12 0.398118 0.793277 0.739230 0.561713 0.419796
13 0.007754 0.395009 0.142180 0.111411 0.494485
14 0.170952 0.360447 0.141528 0.524062 0.287555
15 0.210069 0.992240 0.572826 0.811075 0.336953
16 0.300855 0.083729 0.329310 0.069730 0.878799
17 0.187967 0.394746 0.848277 0.216059 0.038225
18 0.774101 0.265008 0.099812 0.121811 0.634335
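Once the spreadsheet is loaded, typical next steps are checking the column types pandas inferred, filtering rows, and summarizing by group. The sketch below rebuilds the first four rows of three columns from the output above in memory, so it runs without the .xlsx file; with the real file you would use the `data` returned by `pd.read_excel` instead.

```python
import pandas as pd

# First four rows of three columns, copied from the output above.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-06", "2020-05-07", "2020-05-09", "2020-05-11"]),
    "Location": ["US", "US", "MX", "CA"],
    "Field1": [0.087072, 0.374148, 0.950790, 0.956982],
})

# Column types pandas is using for each column.
print(data.dtypes)

# A boolean mask keeps only the rows where the condition is True.
us_rows = data[data["Location"] == "US"]
print(us_rows)

# Mean of Field1 for each distinct Location.
mean_by_location = data.groupby("Location")["Field1"].mean()
print(mean_by_location)
```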
How to read in CSV files
# In Python, there are many ways to read in a csv file.
# One of the easiest is to use pandas like above.
# you would call read_csv() like so:
import pandas as pd
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# If your data is dirtier than this and contains blank cells, note that pandas reads them as NaN by default.
# Pass keep_default_na=False to read_csv() to keep them as empty strings instead.
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object') Timestamp Val1 Val2 Val3
0 15:50:40.94 7741335 pig attached -0.656250 6.699250
1 15:50:41.08 7741335 pig attached -0.562500 6.770688
2 15:50:41.23 7741335 pig attached -1.265625 7.151688
3 15:50:41.38 7741335 pig attached -0.656250 10.191750
4 15:50:41.51 7741335 pig attached 1.062500 12.961937
.. ... ... ... ...
398 15:51:38.99 7741335 pig attached -0.437500 7.381875
399 15:51:39.14 7741335 pig attached 0.500000 6.905625
400 15:51:39.27 7741335 pig attached -1.343750 6.770688
401 15:51:39.41 7741335 pig attached 1.015625 6.667500
402 15:51:39.56 7741335 pig attached -1.000000 6.611937
[403 rows x 4 columns]
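To make the effect of keep_default_na concrete, the sketch below reads a small CSV with blank cells twice, once with the default settings and once with `keep_default_na=False`. The CSV text and its column names (`name`, `value`) are made up for illustration and are read from memory through `io.StringIO` rather than from a file.

```python
import io

import pandas as pd

# Hypothetical CSV text with one blank cell in each column.
csv_text = "name,value\na,1\n,2\nc,\n"

# Default behaviour: blank cells are read as NaN.
default = pd.read_csv(io.StringIO(csv_text))
print(default.isna().sum())

# keep_default_na=False: blank cells stay as empty strings.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw["name"].tolist())
```

Which behaviour is preferable depends on the data: NaN works well with numeric summaries, while empty strings can be safer when a blank cell is a meaningful text value.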
How to analyze parts of your data
# Use slicing to look at specific components of your data.
import pandas as pd
import numpy as np
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# Now we'll calculate the min and max of Val2 and Val3. NOTE: Unlike R and MATLAB, Python is 0-indexed.
# 0-indexing means the first value is at index 0, not 1.
minVal2, minVal3 = data[['Val2', 'Val3']].min()
maxVal2, maxVal3 = data[['Val2', 'Val3']].max()
print(f'min of Val2: {minVal2}, max of Val2: {maxVal2}\n' +
f'min of Val3: {minVal3}, max of Val3: {maxVal3}.')
# You can also apply your own function to a column of the pandas dataframe like so.
# Let's take the inverse square root of the values in the last column:
res = data['Val3'].apply(lambda x: 1/np.sqrt(x) if x !=0 else 0)
print(res)
# Val2 contains some negative values, whose square root is undefined.
# We can filter those out in our function by returning 0 for them:
res2 = data['Val2'].apply(lambda x: 1/np.sqrt(x) if x > 0 else 0)
print(res2)
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object') Timestamp Val1 Val2 Val3
0 15:50:40.94 7741335 pig attached -0.656250 6.699250
1 15:50:41.08 7741335 pig attached -0.562500 6.770688
2 15:50:41.23 7741335 pig attached -1.265625 7.151688
3 15:50:41.38 7741335 pig attached -0.656250 10.191750
4 15:50:41.51 7741335 pig attached 1.062500 12.961937
.. ... ... ... ...
398 15:51:38.99 7741335 pig attached -0.437500 7.381875
399 15:51:39.14 7741335 pig attached 0.500000 6.905625
400 15:51:39.27 7741335 pig attached -1.343750 6.770688
401 15:51:39.41 7741335 pig attached 1.015625 6.667500
402 15:51:39.56 7741335 pig attached -1.000000 6.611937
[403 rows x 4 columns]
min of Val2: -1.625, max of Val2: 1.1875
min of Val3: 6.3103125, max of Val3: 20.5105.
0 0.386355
1 0.384312
2 0.373935
3 0.313239
4 0.277757
...
398 0.368058
399 0.380538
400 0.384312
401 0.387274
402 0.388898
Name: Val3, Length: 403, dtype: float64
0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 0.970143
...
398 0.000000
399 1.414214
400 0.000000
401 0.992278
402 0.000000
Name: Val2, Length: 403, dtype: float64
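The section opens by mentioning slicing, so here is a short sketch of the most common forms: position-based slicing with `iloc` and condition-based selection with `loc`. It rebuilds the first five rows of Val2 and Val3 from the output above in memory so that it runs without the CSV file.

```python
import pandas as pd

# First five rows of Val2 and Val3, copied from the output above.
data = pd.DataFrame({
    "Val2": [-0.656250, -0.562500, -1.265625, -0.656250, 1.062500],
    "Val3": [6.699250, 6.770688, 7.151688, 10.191750, 12.961937],
})

# Rows 0, 1 and 2: the end index of a slice is excluded.
first_three = data.iloc[0:3]

# A negative position counts from the end, so -1 is the last row.
last_val3 = data["Val3"].iloc[-1]

# Val3 values from the rows where Val2 is positive.
positive = data.loc[data["Val2"] > 0, "Val3"]

print(first_three)
print(last_val3)
print(positive)
```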
How to plot data in Python
# There are a lot of great libraries for plotting (matplotlib, bokeh, seaborn, plot.ly, ...)
# Matplotlib is a staple though, and very extensible. We'll use it here.
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
import datetime
import dateutil.parser  # import the submodule explicitly so dateutil.parser.parse is available
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# we can plot directly from pandas like so:
data.plot(use_index=True, y=['Val2', 'Val3'], style='o-')
# we can also extract that data and plot it separately:
xs = data['Timestamp'].tolist()
ys = data['Val3'].tolist()
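# convert the time strings to datetimes, shifted so the first sample falls just after
# a whole minute; the seconds shown on the tick labels then read as seconds elapsed
offset = datetime.timedelta(seconds=60 - int(float(xs[0].split(':')[-1])))
xs = [dateutil.parser.parse(s) + offset for s in xs]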
# set up new plot
fig = plt.figure(2)
ax = fig.add_subplot()
# set x axis to datetime
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.SecondLocator(interval=5))
# format the date so we don't get super long strings
date_fmt = mdates.DateFormatter('%S.0')
ax.xaxis.set_major_formatter(date_fmt)
# plot the data
ax.plot(xs, ys, linestyle='-', linewidth=1, color='darkgrey')
# do some formatting to clean up the look of the plots
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.xaxis.set_tick_params(top=False, direction='out', width=1, labelsize=10)
ax.yaxis.set_tick_params(right=False, direction='out', width=1, labelsize=10)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
ax.set_xlabel(r'Time, $sec$', fontsize=20)
# display the plot
plt.show()
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object') Timestamp Val1 Val2 Val3
0 15:50:40.94 7741335 pig attached -0.656250 6.699250
1 15:50:41.08 7741335 pig attached -0.562500 6.770688
2 15:50:41.23 7741335 pig attached -1.265625 7.151688
3 15:50:41.38 7741335 pig attached -0.656250 10.191750
4 15:50:41.51 7741335 pig attached 1.062500 12.961937
.. ... ... ... ...
398 15:51:38.99 7741335 pig attached -0.437500 7.381875
399 15:51:39.14 7741335 pig attached 0.500000 6.905625
400 15:51:39.27 7741335 pig attached -1.343750 6.770688
401 15:51:39.41 7741335 pig attached 1.015625 6.667500
402 15:51:39.56 7741335 pig attached -1.000000 6.611937
[403 rows x 4 columns]
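An alternative to shifting the datetimes is to convert each timestamp to the number of seconds elapsed since the first sample and plot against that number directly. The sketch below shows only the conversion, using the standard library and the first three timestamps from the output above; it is one possible approach, not the method used in the plot above.

```python
from datetime import datetime

# First three timestamps, copied from the output above.
timestamps = ["15:50:40.94", "15:50:41.08", "15:50:41.23"]

# Parse each string; %f reads the fractional part of the seconds.
times = [datetime.strptime(s, "%H:%M:%S.%f") for s in timestamps]

# Seconds elapsed since the first sample.
elapsed = [(t - times[0]).total_seconds() for t in times]
print(elapsed)
```

Passing `elapsed` as the x values, for example `ax.plot(elapsed, ys)`, gives a plain numeric axis, so the date locator and formatter are no longer needed.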