Basic Data Analysis in Python

This guide will go over how to utilize the Python programming language for basic data analysis.

Quick Note on Formatting

Python projects typically begin with a "docstring." This is a statement of purpose that appears on the very first line of the project and is about a sentence in length. It is enclosed in triple quotation marks: three quotation marks before the text and three after it. After the docstring, leave two blank lines before starting the actual code. Check out the image below for an example.
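As a text version of that layout, here is a minimal sketch. The wording of the docstring and the "greeting" variable are made up for illustration; your own docstring should describe your own project.

```python
"""Find the mode, median, and mean of a list of numbers."""


# The code itself starts here, two blank lines below the docstring.
greeting = "The docstring above describes this file."
print(greeting)
```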

Importing our Package and Creating our Function

With all of that out of the way, we can begin creating the code for the data! At the end of each section, there will be an image showing what the final product should look like; check your work against it before moving on! Keep in mind that your exact results will likely be different if you are creating your own list.

First things first, we need to import the statistics module to be able to use the calculations we are looking for. There are multiple ways to do this. You can import only the specific functions you need by naming them. The construction for this would be "from statistics import mode, median, mean". You can also import the entire module by simply typing "import statistics". Doing it this way is a little more complex, because every function then has to be called with the module name in front of it (for example, "statistics.mode"). We will be exploring both ways in this LibGuide; for now, import the specific functions using the first construction and hit enter on your keyboard three times to create two blank lines.
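To see the difference between the two import styles side by side, here is a small sketch. The list called "sample" is a made-up example, and you would normally pick only one of the two import lines rather than both.

```python
# Option 1: import only the functions you need, then call them by name.
from statistics import mode, median, mean

# Option 2: import the whole module, then put "statistics." before each call.
import statistics

sample = [1, 1, 2]  # hypothetical example list

print(mode(sample))             # call using option 1
print(statistics.mode(sample))  # the same calculation using option 2
```

Both print statements give the same answer; the only difference is how the function is named when you call it.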

From here, we need to create our main function. It is good programming practice to place all of your code into functions for readability. While you can subdivide your code into multiple functions, we will only be using one. To create a function, all you need to do is write "def" followed by a name, parentheses, and a colon. So since this is our main function, we can just write "def main():" and this will create our function.
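At this stage the function has nothing inside it yet. The sketch below uses "pass" as a temporary placeholder (an addition for illustration, not part of the final product) so that the empty function is still valid Python; it gets replaced by real code in the next section.

```python
def main():
    # Placeholder body so the function is valid on its own.
    # The real code goes here in the sections that follow.
    pass
```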

Mode

And now we can put our actual code in! For this set of code, we'll create a list of numbers. If you hit enter after the function, PyCharm will automatically indent your cursor. From here, write in a name for your list (try to avoid calling it just "list" though, since Python already uses that name for its built-in list type!). Type a space, an equals sign (=) and then another space. In Python, lists are indicated by a pair of square brackets: []. If you type just the left one on your keyboard, PyCharm should automatically add the right one for you. Inside the brackets, write in whatever set of numbers you want, separating each number with a comma and a space. We will be using the same list for all of these exercises.
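Here is roughly what the function might look like once the list is in place. The name "numbers" and the values in it are sample choices only; use whatever name and numbers you like.

```python
def main():
    # "numbers" is a hypothetical name; the values are sample data.
    numbers = [5, 2, 9, 2, 12]
```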

With all of this out of the way, the actual data analysis part is very simple. Since we imported the "mode" function, all we need to do is type in mode. As you type it, you should receive a pop-up in PyCharm showing the function; hit enter, and it will autofill it. If not, finish typing mode and place parentheses after it, since we need to tell it which list to analyze. To do this, just type whatever you named your list into the parentheses. PyCharm should try to autofill it as well! Try to run it by clicking the green play button in the upper right corner, and you get...nothing. This is because we have not told Python to show us the result, nor have we told it to run the "main" function.

This is fixed by wrapping the "mode" call in a print call: type "print(" to the left of the word "mode" and add one more closing parenthesis at the end of the line, so that the whole "mode" call sits inside the parentheses of "print". Afterwards, hit enter three times to create two blank lines, hit backspace to remove the indentation, and type in "main()". Here we are just telling Python to run the function that we have named "main," so it is important that we do not write "def" or place another colon. We are not defining it again.
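Put together, the steps so far might look like the sketch below. The docstring wording, the list name "numbers", and the sample values are all illustrative assumptions, so your own file will differ.

```python
"""Find the mode of a list of numbers."""


from statistics import mode


def main():
    # Sample data; "numbers" is a hypothetical name.
    numbers = [5, 2, 9, 2, 12]
    # print() displays the result; without it, nothing is shown.
    print(mode(numbers))


# Run the function. No "def" and no colon, because we are calling it,
# not defining it again.
main()
```

With this sample list, 2 is the value that appears most often, so that is the mode.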

Hit run again, and you should see your number pop up towards the bottom of your screen!

Median

Median follows the exact same format as the mode function. Hit enter at the end of the line that prints the mode to create a new line inside the function. From here, copy the same construction of the code, but replace "mode" with "median." Make sure the call to main still has two blank lines before it!

Now, your results should show two numbers, with your median placed right below the mode.
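As a rough sketch of this step, using the same made-up list name and sample values as an assumption:

```python
from statistics import mode, median


def main():
    # Sample data; "numbers" is a hypothetical name.
    numbers = [5, 2, 9, 2, 12]
    print(mode(numbers))
    # Same construction as the line above, with "median" in place of "mode".
    print(median(numbers))


main()
```

Sorted, the sample list is 2, 2, 5, 9, 12, so the middle value (the median) is 5 and it appears on the line below the mode.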

Mean

Just like median and mode, we follow the same construction here, just replacing the "median"/"mode" with "mean." This will give you a third number underneath the other two.
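For reference, a finished file along these lines might look like the following sketch. As before, the docstring wording, the name "numbers", and the sample values are illustrative assumptions rather than required choices.

```python
"""Find the mode, median, and mean of a list of numbers."""


from statistics import mode, median, mean


def main():
    # Sample data; "numbers" is a hypothetical name.
    numbers = [5, 2, 9, 2, 12]
    print(mode(numbers))    # most common value
    print(median(numbers))  # middle value once the list is sorted
    print(mean(numbers))    # sum of the values divided by how many there are


main()
```

For this sample list the three results are a mode of 2, a median of 5, and a mean of 6 (30 divided by 5). If you built your own list, expect different numbers.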