import pandas as pd
NBA_Games=pd.read_csv("../../Data/Week 2/NBA_Games.csv")
NBA_Games.head()
To check the type of each variable in Python, we use the “dtypes” attribute of the dataframe.
NBA_Games.dtypes
In data analysis, we often convert a categorical variable into dummy variables. If an observation belongs to the specified category, the dummy variable for that category equals 1; otherwise it equals 0.
The variable "WL" only carries two values, win or lose. We will create dummy variables to capture the categories.
We can use the “pd.get_dummies” function to convert a categorical variable into dummy variables. By default, this function does not create a dummy for missing values; an observation with a missing value gets 0 in every dummy column.
# dtype=int keeps the dummies as 0/1; pandas 2.0 and later default to True/False
dummy=pd.get_dummies(NBA_Games, columns=['WL'], dtype=int)
dummy.columns
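Notice that two variables are created, WL_L and WL_W. WL_L=1 if the team lost and WL_L=0 if the team won; WL_W is the reverse. In the “dummy” dataframe, the original variable WL is replaced by these two columns. Since one dummy fully determines the other, we only attach WL_W to the original dataframe.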
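To see how the dummy columns behave on a small scale, here is a minimal sketch using a hypothetical four-row dataframe (the values are made up and are not taken from NBA_Games.csv). It assumes `dtype=int` is passed so the dummies print as 0/1, and it includes one missing value to show what happens to that row.

```python
import pandas as pd

# Hypothetical toy data, not taken from NBA_Games.csv
toy = pd.DataFrame({"TEAM": ["A", "B", "C", "D"],
                    "WL": ["W", "L", None, "W"]})

# Replace WL with one 0/1 column per category (WL_L and WL_W)
toy_dummies = pd.get_dummies(toy, columns=["WL"], dtype=int)
print(toy_dummies)

# The row with a missing WL (index 2) has 0 in both dummy columns
print(toy_dummies.loc[2, ["WL_L", "WL_W"]].tolist())
```

If missing values should get their own indicator column, `pd.get_dummies` also accepts `dummy_na=True`.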
NBA_Games=pd.concat([NBA_Games, dummy['WL_W']], axis=1)
NBA_Games.head()
NBA_Games.rename(columns={'WL_W':'WIN'}, inplace=True)
NBA_Games.head()
In sports, we often have to work with date and time data.
NBA_Games['GAME_DATE'].dtype
The date variable is originally stored as an object, that is, as text. In this form, the dates cannot be ordered chronologically or used in date calculations, so we convert the variable to the datetime type.
import datetime
NBA_Games['GAME_DATE']=pd.to_datetime(NBA_Games['GAME_DATE'])
NBA_Games['GAME_DATE'].head()
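The difference between text dates and datetime dates can be illustrated with a short sketch. The three dates below are hypothetical and written in month/day/year form, which is an assumption; the format used in NBA_Games.csv may differ.

```python
import pandas as pd

# Hypothetical dates stored as text (month/day/year)
dates_text = pd.Series(["12/25/2017", "01/15/2018", "03/02/2018"])

# As text, sorting is alphabetical, so December 2017 ends up last
print(dates_text.sort_values().tolist())

# As datetimes, sorting is chronological
dates = pd.to_datetime(dates_text, format="%m/%d/%Y")
print(dates.sort_values().dt.strftime("%Y-%m-%d").tolist())

# Datetime values also support date arithmetic and access to components
print((dates.max() - dates.min()).days)  # days between first and last date
print(dates.dt.month.tolist())           # month of each date
```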
We can use the “describe()” method to calculate summary statistics. By default, it returns basic summary statistics for all the numerical variables: the number of non-missing observations (count), the mean, the standard deviation, the minimum and maximum, the median (50%), and the first and third quartiles (25% and 75%). Adding include='all' also summarizes the non-numerical variables, and the method can be applied to a single variable as well.
NBA_Games.describe()
NBA_Games.describe(include='all')
NBA_Games['PTS'].describe()
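As a small illustration of how the default output differs from `include='all'`, the sketch below uses a hypothetical dataframe with one text column and one numerical column (the values are made up).

```python
import pandas as pd

# Hypothetical toy data: one categorical and one numerical variable
toy = pd.DataFrame({"WL": ["W", "L", "W", "W"],
                    "PTS": [100, 90, 110, 120]})

# Default: only the numerical column PTS is summarized
numeric_summary = toy.describe()
print(numeric_summary)

# include='all': WL is summarized too, with unique, top, and freq
full_summary = toy.describe(include="all")
print(full_summary)
```

For a text column, the rows "unique", "top", and "freq" report the number of distinct values, the most common value, and how often it occurs; the numerical statistics are left as NaN for that column.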
NBA_Games['FGM'].mean()
NBA_Games['FGM'].median()
NBA_Games['FGM'].std()
Find the mean of field goals attempted;
Find the median of 3-point field goals made;
Find the standard deviation of the number of rebounds
#Your Code Here
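# numeric_only=True skips text columns, which raise an error in pandas 2.0 and later
NBA_Games.groupby(['WL']).mean(numeric_only=True)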
NBA_Games.groupby(['WL'])['PTS'].mean()
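The grouped summaries above can be tried on a small scale first. This is a minimal sketch on a hypothetical four-game dataframe (made-up values); the use of `agg` to request several statistics at once is an addition that does not appear in the original cells.

```python
import pandas as pd

# Hypothetical toy data, not taken from NBA_Games.csv
toy = pd.DataFrame({"WL": ["W", "L", "W", "L"],
                    "TEAM": ["A", "B", "C", "D"],
                    "PTS": [110, 95, 104, 99]})

# Mean of every numerical column by game result; TEAM is skipped
means_by_result = toy.groupby("WL").mean(numeric_only=True)
print(means_by_result)

# Mean of a single variable by game result
pts_by_result = toy.groupby("WL")["PTS"].mean()
print(pts_by_result)

# Several statistics for one variable in a single call
pts_stats = toy.groupby("WL")["PTS"].agg(["mean", "median", "std"])
print(pts_stats)
```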
NBA_Games['GAME_DATE'].describe()
NBA_Games.hist(column='PTS')
NBA_Games.hist(column='PTS', bins=20)
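We can also adjust how the bars are drawn. For example, rwidth=0.9 draws each bar at 90% of its bin width, which leaves a small gap between neighboring bars; the bins themselves do not change.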
NBA_Games.hist(column='PTS', bins=20, rwidth=0.9)
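To make the role of the `bins` argument concrete, the sketch below computes the bin edges and counts directly with `numpy.histogram` instead of drawing a plot. The ten point totals are hypothetical, and 4 bins are used rather than 20 only to keep the output short; an integer `bins` value splits the range from the minimum to the maximum into that many equal-width intervals.

```python
import numpy as np

# Hypothetical point totals, not taken from NBA_Games.csv
pts = np.array([88, 92, 95, 99, 101, 104, 104, 110, 115, 128])

# Split the range [88, 128] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(pts, bins=4)

print(edges)   # boundaries of the bins
print(counts)  # number of observations falling in each bin
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.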
NBA_Games.to_csv("../../Data/Week 2/NBA_Games2.csv", index=False)