Type "Python" in the search box for the complete list.
Good for:
Why Python:
Recommended setup(s):
Best resource to learn quickly:
Python Data Analysis on Lynda (access through SSO)
How to load/use a library in Python that isn't loaded by default
# to load libraries you've installed with pip or conda,
# you import them like so:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sys
# note you can choose an alias to refer to a loaded library or module.
# This means we can refer to matplotlib.pyplot as plt when calling functions later.
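Besides aliasing a whole module, you can also import a single name from it. The sketch below uses the standard-library math module purely for illustration, so nothing needs to be installed; the variable names are made up for this example.

```python
# Three ways to reach the same function. math ships with Python,
# so this runs without installing anything.
import math              # whole module, referred to by its full name
import math as m         # whole module, referred to by a short alias
from math import sqrt    # a single name pulled into the current namespace

root_full = math.sqrt(16)
root_alias = m.sqrt(16)
root_direct = sqrt(16)
print(root_full, root_alias, root_direct)
```

Aliases such as np, pd and plt are conventions rather than requirements, but sticking to them makes your code easier for others to read.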
How to install and load new packages
# We highly recommend using Anaconda python 3.x and the conda environment for data analysis.
# To install a new library in Anaconda, open the Anaconda prompt and activate whatever
# environment you want to use (default is 'base'). Then simply call: conda install <library>
# where <library> is the name of the library you want to install.
# From the Anaconda prompt, you only need to type:
# conda install wxPython
# OR
# pip install wxPython
# again, conda is preferred because it does some checking to make sure you won't break everything
# by installing something incompatible with whatever else you have installed.
# from inside a jupyter notebook, it's best to use the following syntax:
!conda install --yes --prefix {sys.prefix} wxPython
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\ProgramData\Anaconda3

  added / updated specs:
    - wxpython

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.8.3                |   py37hc8dfbb8_1         3.1 MB  conda-forge
    wxpython-4.0.7.post2       |   py37h5fe3f0a_3        22.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        25.1 MB

The following NEW packages will be INSTALLED:

  python_abi         conda-forge/win-64::python_abi-3.7-1_cp37m

The following packages will be UPDATED:

  ca-certificates    2019.11.28-hecc5488_0 --> 2020.6.20-hecda079_0
  certifi            2019.11.28-py37_0 --> 2020.6.20-py37hc8dfbb8_0
  conda              4.8.2-py37_0 --> 4.8.3-py37hc8dfbb8_1
  openssl            1.1.1d-hfa6e2cd_0 --> 1.1.1g-he774522_0
  wxpython           4.0.4-py37h6538335_0 --> 4.0.7.post2-py37h5fe3f0a_3

Downloading and Extracting Packages
wxpython-4.0.7.post2 | 22.0 MB | ########## | 100%
conda-4.8.3          | 3.1 MB  | ########## | 100%
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
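After installing, it can be handy to confirm from Python that the package is visible in the environment you are running. This is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper names is_installed and installed_version are hypothetical and not part of any library.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name):
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(is_installed("json"))           # standard library, always present
print(is_installed("wx"))             # wxPython is imported under the name `wx`
print(installed_version("wxPython"))  # version string, or None if not installed
```

Note that the name you install (wxPython) and the name you import (wx) can differ, which is why the two helpers take different arguments.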
How to call functions and classes from other files
# To access other python scripts, you can import them as you would a library.
# For example, if you had a script called other_stuff.py in the same folder,
# you can import it as follows:
import other_stuff
# If it contained a function called do_stuff(a, b) that returned a + (b/2),
# you could now call it like:
a = other_stuff.do_stuff(2,7)
print(a)
5.5
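For reference, here is what other_stuff.py might contain. The file contents are hypothetical, written only to match the description above (a function do_stuff(a, b) that returns a + (b/2)).

```python
# other_stuff.py (hypothetical contents, matching the description above)

def do_stuff(a, b):
    """Return a plus half of b."""
    return a + (b / 2)


if __name__ == "__main__":
    # Runs only when the file is executed directly (python other_stuff.py),
    # not when it is imported from another script or notebook.
    print(do_stuff(2, 7))
```

If you edit other_stuff.py after importing it in a notebook, the change is not picked up automatically; calling importlib.reload(other_stuff) or restarting the kernel loads the new version.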
How to open and work with Excel datasets in Python
# The pandas library is a great package for dealing with raw data.
# It can import from csv or xlsx files directly, and handles
# a lot of the importing and data types for you.
import pandas as pd
data = pd.read_excel('ExampleDataClean.xlsx')
print(data.columns, data)
Index(['Date', 'Location', 'Field1', 'Field2', 'Field3', 'Field4', 'Field5',
       'Field6', 'Field7', 'Field8', 'Field9', 'Field10'],
      dtype='object')
         Date Location    Field1    Field2    Field3    Field4    Field5  \
0  2020-05-06       US  0.087072  0.466163  0.256585  0.850972  0.024691
1  2020-05-07       US  0.374148  0.596247  0.460567  0.455865  0.473421
2  2020-05-09       MX  0.950790  0.786810  0.375649  0.651548  0.224263
3  2020-05-11       CA  0.956982  0.137989  0.949380  0.251682  0.422171
4  2020-05-13       US  0.275037  0.759614  0.623671  0.096792  0.265659
5  2020-05-15       CA  0.579684  0.597635  0.354101  0.926063  0.061220
6  2020-05-17       US  0.935200  0.917529  0.661647  0.260087  0.040231
7  2020-05-19       MX  0.782576  0.316358  0.387379  0.021592  0.390715
8  2020-05-21       MX  0.283339  0.928038  0.543262  0.318045  0.896379
9  2020-05-23       US  0.357202  0.342773  0.762433  0.097341  0.628032
10 2020-05-25       CA  0.071354  0.107643  0.787911  0.413408  0.876708
11 2020-05-27       CA  0.016469  0.364118  0.303169  0.654925  0.702061
12 2020-05-29       CA  0.844085  0.347905  0.430369  0.789135  0.326151
13 2020-05-31       US  0.621470  0.226082  0.096330  0.699755  0.608162
14 2020-06-02       MX  0.075084  0.893168  0.448265  0.001585  0.092250
15 2020-06-04       US  0.479846  0.586265  0.751123  0.731068  0.320718
16 2020-06-06       US  0.227084  0.188554  0.463362  0.728477  0.220309
17 2020-06-08       US  0.401304  0.951056  0.434225  0.758877  0.387425
18 2020-06-10       US  0.446687  0.553265  0.703386  0.075320  0.572195

      Field6    Field7    Field8    Field9   Field10
0   0.170840  0.306948  0.224802  0.227992  0.423540
1   0.037419  0.164160  0.317844  0.357118  0.233253
2   0.364410  0.277153  0.687860  0.586596  0.062513
3   0.782476  0.938242  0.237547  0.456039  0.619374
4   0.174680  0.850576  0.956483  0.336973  0.135804
5   0.301500  0.766449  0.856508  0.824069  0.974019
6   0.086861  0.234838  0.124674  0.453847  0.309176
7   0.835033  0.681286  0.232524  0.841582  0.754590
8   0.957806  0.934876  0.111279  0.977557  0.263052
9   0.125401  0.686620  0.188627  0.035643  0.983478
10  0.379938  0.816157  0.613449  0.978133  0.097426
11  0.658342  0.041644  0.410201  0.881190  0.438716
12  0.398118  0.793277  0.739230  0.561713  0.419796
13  0.007754  0.395009  0.142180  0.111411  0.494485
14  0.170952  0.360447  0.141528  0.524062  0.287555
15  0.210069  0.992240  0.572826  0.811075  0.336953
16  0.300855  0.083729  0.329310  0.069730  0.878799
17  0.187967  0.394746  0.848277  0.216059  0.038225
18  0.774101  0.265008  0.099812  0.121811  0.634335
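Once the spreadsheet is loaded, a common next step is to check how the columns were typed and to pull out a subset of rows. The sketch below builds a small stand-in DataFrame from the first four rows shown above (three columns only) so that it runs without the Excel file; with the real file you would use the data object returned by pd.read_excel() instead.

```python
import pandas as pd

# Stand-in for pd.read_excel('ExampleDataClean.xlsx'): the first four rows
# and three of the columns from the output above.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-06", "2020-05-07", "2020-05-09", "2020-05-11"]),
    "Location": ["US", "US", "MX", "CA"],
    "Field1": [0.087072, 0.374148, 0.950790, 0.956982],
})

print(data.dtypes)                         # how pandas typed each column

us_rows = data[data["Location"] == "US"]   # keep only the US rows
print(us_rows)

per_location = data.groupby("Location")["Field1"].mean()  # mean of Field1 per location
print(per_location)
```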
How to read in CSV files
# In Python, there are many ways to read in a csv file.
# One of the easiest is to use pandas like above.
# you would call read_csv() like so:
import pandas as pd
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# If your data is dirtier than this and contains blank cells, note that read_csv() reads them as NaN by default.
# Pass keep_default_na=False if you would rather keep them as empty strings.
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object')
       Timestamp                  Val1      Val2       Val3
0    15:50:40.94  7741335 pig attached -0.656250   6.699250
1    15:50:41.08  7741335 pig attached -0.562500   6.770688
2    15:50:41.23  7741335 pig attached -1.265625   7.151688
3    15:50:41.38  7741335 pig attached -0.656250  10.191750
4    15:50:41.51  7741335 pig attached  1.062500  12.961937
..           ...                   ...       ...        ...
398  15:51:38.99  7741335 pig attached -0.437500   7.381875
399  15:51:39.14  7741335 pig attached  0.500000   6.905625
400  15:51:39.27  7741335 pig attached -1.343750   6.770688
401  15:51:39.41  7741335 pig attached  1.015625   6.667500
402  15:51:39.56  7741335 pig attached -1.000000   6.611937

[403 rows x 4 columns]
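To see what keep_default_na does with blank cells, here is a small sketch that reads the same CSV text twice. The CSV text and its column names are invented for this example and are read from memory with io.StringIO, so no file is needed.

```python
import io
import pandas as pd

# Hypothetical CSV text with one blank cell in each column.
csv_text = "Label,Val2\nA,-0.65625\n,1.0625\nC,\n"

# Default behaviour: blank cells are read as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: blank cells stay as empty strings.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default)
print(raw)
```

With keep_default_na=False, a numeric column that contains blanks is kept as text. If you need numbers afterwards, pd.to_numeric(raw['Val2'], errors='coerce') converts it and turns the blanks into NaN.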
How to analyze parts of your data
# Use slicing to look at specific components of your data.
import pandas as pd
import numpy as np
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# Now we'll calculate the min and max of Val2 and Val3. NOTE: Unlike R and Matlab, python is 0 indexed.
# 0 indexing means the first value starts at index 0, not 1.
minVal2, minVal3 = data[['Val2', 'Val3']].min()
maxVal2, maxVal3 = data[['Val2', 'Val3']].max()
print(f'min of Val2: {minVal2}, max of Val2: {maxVal2}\n'
      f'min of Val3: {minVal3}, max of Val3: {maxVal3}.')
# You can also apply your own function to the pandas dataframe like so:
# Let's take the inverse square root of the values in the last column:
res = data['Val3'].apply(lambda x: 1/np.sqrt(x) if x !=0 else 0)
print(res)
# For column 3 (Val2), we'll run into some negatives. We can filter those out in our function
# by returning 0 for anything that isn't positive:
res2 = data['Val2'].apply(lambda x: 1/np.sqrt(x) if x > 0 else 0)
print(res2)
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object')
       Timestamp                  Val1      Val2       Val3
0    15:50:40.94  7741335 pig attached -0.656250   6.699250
1    15:50:41.08  7741335 pig attached -0.562500   6.770688
2    15:50:41.23  7741335 pig attached -1.265625   7.151688
3    15:50:41.38  7741335 pig attached -0.656250  10.191750
4    15:50:41.51  7741335 pig attached  1.062500  12.961937
..           ...                   ...       ...        ...
398  15:51:38.99  7741335 pig attached -0.437500   7.381875
399  15:51:39.14  7741335 pig attached  0.500000   6.905625
400  15:51:39.27  7741335 pig attached -1.343750   6.770688
401  15:51:39.41  7741335 pig attached  1.015625   6.667500
402  15:51:39.56  7741335 pig attached -1.000000   6.611937

[403 rows x 4 columns]
min of Val2: -1.625, max of Val2: 1.1875
min of Val3: 6.3103125, max of Val3: 20.5105.
0      0.386355
1      0.384312
2      0.373935
3      0.313239
4      0.277757
         ...
398    0.368058
399    0.380538
400    0.384312
401    0.387274
402    0.388898
Name: Val3, Length: 403, dtype: float64
0      0.000000
1      0.000000
2      0.000000
3      0.000000
4      0.970143
         ...
398    0.000000
399    1.414214
400    0.000000
401    0.992278
402    0.000000
Name: Val2, Length: 403, dtype: float64
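The notes above mention slicing and 0 indexing; the sketch below shows both on a small stand-in DataFrame built from the Val2 and Val3 values of the first five rows in the output, so it runs without the CSV file. The variable names are chosen for this example only.

```python
import pandas as pd

# Stand-in for the CSV: Val2 and Val3 from the first five rows shown above.
data = pd.DataFrame({
    "Val2": [-0.656250, -0.562500, -1.265625, -0.656250, 1.062500],
    "Val3": [6.699250, 6.770688, 7.151688, 10.191750, 12.961937],
})

first_row = data.iloc[0]            # position 0 is the first row
first_three = data.iloc[0:3]        # positions 0, 1 and 2 (the end is excluded)
last_val3 = data["Val3"].iloc[-1]   # negative positions count from the end
positive = data[data["Val2"] > 0]   # boolean mask: rows where Val2 is positive

print(first_row, first_three, last_val3, positive, sep="\n\n")
```

iloc selects by position, while loc selects by label. With the default integer index the two look alike, except that a loc slice includes its end label.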
How to plot data in Python
# There are a lot of great libraries for plotting (matplotlib, bokeh, seaborn, plot.ly, ...)
# Matplotlib is a staple though, and very extensible. We'll use it here.
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
import datetime
import dateutil.parser  # import the submodule explicitly; a bare "import dateutil" may not load it
data = pd.read_csv('ExampleDataClean2.csv')
print(data.columns, data)
# we can plot directly from pandas like so:
data.plot(use_index=True, y=['Val2', 'Val3'], style='o-')
# we can also extract that data and plot it separately:
xs = data['Timestamp'].tolist()
ys = data['Val3'].tolist()
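# convert the time strings to datetimes, shifting them all by the same offset so that
# the first sample lands on a whole minute and the seconds tick labels start near 0
offset = datetime.timedelta(seconds=60 - int(float(xs[0].split(':')[-1])))
xs = [dateutil.parser.parse(s) + offset for s in xs]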
# set up new plot
fig = plt.figure(2)
ax = fig.add_subplot()
# set x axis to datetime
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.SecondLocator(interval=5))
# format the date so we don't get super long strings
date_fmt = mdates.DateFormatter('%S.0')
ax.xaxis.set_major_formatter(date_fmt)
# plot the data
ax.plot(xs, ys, linestyle='-', linewidth=1, color='darkgrey')
# do some formatting to clean up the look of the plots
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# tick visibility takes booleans; the string 'off' is not accepted by current matplotlib
ax.xaxis.set_tick_params(top=False, direction='out', width=1, labelsize=10)
ax.yaxis.set_tick_params(right=False, direction='out', width=1, labelsize=10)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
ax.set_xlabel(r'Time, $sec$', fontsize=20)
# display the plot
plt.show()
Index(['Timestamp', 'Val1', 'Val2', 'Val3'], dtype='object')
       Timestamp                  Val1      Val2       Val3
0    15:50:40.94  7741335 pig attached -0.656250   6.699250
1    15:50:41.08  7741335 pig attached -0.562500   6.770688
2    15:50:41.23  7741335 pig attached -1.265625   7.151688
3    15:50:41.38  7741335 pig attached -0.656250  10.191750
4    15:50:41.51  7741335 pig attached  1.062500  12.961937
..           ...                   ...       ...        ...
398  15:51:38.99  7741335 pig attached -0.437500   7.381875
399  15:51:39.14  7741335 pig attached  0.500000   6.905625
400  15:51:39.27  7741335 pig attached -1.343750   6.770688
401  15:51:39.41  7741335 pig attached  1.015625   6.667500
402  15:51:39.56  7741335 pig attached -1.000000   6.611937

[403 rows x 4 columns]
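If all you need on the x axis is elapsed time, one alternative to shifting datetimes (a different approach from the one used above, shown only as a sketch) is to convert the timestamps to seconds since the first sample with pandas. The example uses the first three timestamps from the output; the format string assumes every value looks like HH:MM:SS.ff.

```python
import pandas as pd

# First three timestamps from the output above.
stamps = pd.Series(["15:50:40.94", "15:50:41.08", "15:50:41.23"])

# Parse the clock times; pandas fills in a placeholder date for each one.
times = pd.to_datetime(stamps, format="%H:%M:%S.%f")

# Seconds since the first sample, as plain floats.
elapsed = (times - times.iloc[0]).dt.total_seconds()
print(elapsed.tolist())
```

Passing elapsed as the x values to ax.plot() gives an ordinary numeric axis, so no date locator or formatter is needed.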