import pandas as pd
NBA_Games=pd.read_csv("../../Data/Week 2/NBA_Games2.csv")
NBA_Games.head()
We will compare the success rates of two-point field goals and three-point field goals to demonstrate the difference between central tendency and variation.
NBA_Games['FG_PCT'].describe()
NBA_Games['FG3_PCT'].describe()
We can see that the average success rate of 2-point field goals is about 45.27%, while the average success rate of 3-point field goals is 35.07%. So the overall success rate of 2-point field goals is about 10 percentage points higher than that of 3-point field goals. The median 2-point field goal success rate is 45.20%, while the median 3-point field goal success rate is 35.00%. Each row is one team's game, so in half of the team-games the 2-point success rate is below about 45%, and in half of them the 3-point success rate is below 35%.
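The gap between the two means can be read two ways: as an absolute difference in percentage points or as a relative difference. A quick sketch using only the two means reported above:

```python
# Means of FG_PCT and FG3_PCT as reported by describe() above
fg_mean = 0.4527
fg3_mean = 0.3507

# Absolute gap, expressed in percentage points
pp_gap = (fg_mean - fg3_mean) * 100

# Relative gap: the difference as a share of the 3-point rate
rel_gap = (fg_mean - fg3_mean) / fg3_mean * 100

print(f"Absolute gap: {pp_gap:.1f} percentage points")
print(f"Relative gap: {rel_gap:.1f}%")
```

This prints an absolute gap of about 10.2 percentage points, which corresponds to a relative gap of roughly 29%.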
The standard deviation of the 2-point field goal success rate is about 0.056, while that of the 3-point field goal success rate is about 0.0996. This means that 3-point success rates vary considerably more across games than 2-point success rates.
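By default, pandas' `std()` returns the sample standard deviation, which divides by n - 1 (`ddof=1`). The sketch below checks this on a made-up series of shooting percentages. The values are illustrative only and are not taken from NBA_Games2.csv.

```python
import numpy as np
import pandas as pd

# Made-up shooting percentages for illustration only
pct = pd.Series([0.40, 0.45, 0.50, 0.55])

# Sample standard deviation computed by hand: squared deviations summed, divided by n - 1
manual_sd = np.sqrt(((pct - pct.mean()) ** 2).sum() / (len(pct) - 1))

# pandas std() uses ddof=1 by default, so these two agree
print(manual_sd, pct.std())

# The population version divides by n instead (ddof=0)
print(pct.std(ddof=0))
```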
The options "sharex" and "sharey" control whether the two histograms use the same x-axis range and the same y-axis range, which makes the two distributions easier to compare.
NBA_Games.hist(column=['FG_PCT','FG3_PCT'], bins=20, sharex=True, sharey=True)
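To see what "sharex" and "sharey" do, the sketch below builds a simulated DataFrame with the same two column names. The values are drawn from normal distributions whose means and spreads loosely follow the summary statistics above, so this is not the real game data. The sketch then checks that the two histogram panels have identical axis limits. It assumes matplotlib's non-interactive Agg backend so it can run outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import numpy as np
import pandas as pd

# Simulated data only: shapes loosely mimic the describe() output above
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "FG_PCT": rng.normal(0.45, 0.056, 500),
    "FG3_PCT": rng.normal(0.35, 0.0996, 500),
})

# DataFrame.hist returns a NumPy array of Axes, one panel per column
axes = toy.hist(column=["FG_PCT", "FG3_PCT"], bins=20, sharex=True, sharey=True)
ax_left, ax_right = axes.ravel()[:2]

# With shared axes, both panels report the same x and y ranges
print(ax_left.get_xlim() == ax_right.get_xlim())
print(ax_left.get_ylim() == ax_right.get_ylim())
```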
import matplotlib.pyplot as plt
NBA_Games[['FG_PCT','FG3_PCT']].plot.hist(alpha=0.3, bins=20)
plt.xlabel('Field Goal Percentage')
plt.ylabel('Frequency')
plt.title("Distributions of Field Goal Percentages", fontsize=15)
plt.savefig('FG_PCT_Distributions.png')
We can also split a histogram by a categorical variable with the "by" option (here "WL", win or loss) and change the color of the bars with the "color" option.
NBA_Games.hist(by='WL', column='FG_PCT', color='red', bins=15, sharex=True, sharey=True)
plt.savefig('FG_PCT_WL.png')
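The `by='WL'` histograms compare the FG_PCT distribution in wins and losses visually. You can check the same comparison numerically with `groupby`. The sketch below uses a tiny made-up frame with the same column names, so the numbers are illustrative only. On the real data, the equivalent call would be `NBA_Games.groupby('WL')['FG_PCT'].agg(['mean', 'median', 'std'])`.

```python
import pandas as pd

# Made-up rows: 'W' = win, 'L' = loss
toy = pd.DataFrame({
    "WL": ["W", "L", "W", "L", "W", "L"],
    "FG_PCT": [0.50, 0.42, 0.48, 0.40, 0.52, 0.44],
})

# Central tendency and spread of FG_PCT within each outcome group
summary = toy.groupby("WL")["FG_PCT"].agg(["mean", "median", "std"])
print(summary)
```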
#Your Code Here
Let's first change the data type of "GAME_DATE" from object to datetime.
# Parse the GAME_DATE strings into pandas datetime64 values
NBA_Games['GAME_DATE']=pd.to_datetime(NBA_Games['GAME_DATE'])
NBA_Games['GAME_DATE'].head()
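Here is a minimal sketch of the conversion on a few made-up date strings. It shows the dtype change and the `.dt` accessor, which becomes available once the column holds datetime values.

```python
import pandas as pd

# Made-up game dates stored as strings, as they are in the raw CSV
games = pd.DataFrame({"GAME_DATE": ["2017-10-17", "2017-12-25", "2018-04-11"]})
print(games["GAME_DATE"].dtype)  # a string-like dtype before conversion

# Parse the strings into datetime64 values
games["GAME_DATE"] = pd.to_datetime(games["GAME_DATE"])
print(games["GAME_DATE"].dtype)

# The .dt accessor exposes date parts such as year and month
print(games["GAME_DATE"].dt.year.tolist())
print(games["GAME_DATE"].dt.month.tolist())
```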
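Extract the Pistons' game data for the 2017-2018 season.
Note that comparison operators such as >, >=, == and < also work on datetime columns. When we specify a date in the condition, we write it as a string in quotes (e.g. '2017-10-17'), and pandas compares it with the datetime values.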
Pistons_Games=NBA_Games[(NBA_Games.NICKNAME == 'Pistons')&(NBA_Games.SEASON_ID==22017)& (NBA_Games.GAME_DATE>='2017-10-17')]
display(Pistons_Games)
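The filter above combines three boolean conditions with `&`. Each condition needs its own parentheses because, in Python, `&` binds more tightly than `==` and `>=`. The sketch below applies the same filter to a made-up frame with the same column names. The rows and point totals are invented for illustration.

```python
import pandas as pd

# Invented rows; only the column names mirror NBA_Games
games = pd.DataFrame({
    "NICKNAME": ["Pistons", "Pistons", "Pistons", "Celtics"],
    "SEASON_ID": [22017, 22017, 22016, 22017],
    "GAME_DATE": pd.to_datetime(["2017-10-18", "2017-10-20", "2017-04-12", "2017-10-17"]),
    "PTS": [102, 111, 90, 99],
})

# Each comparison yields a boolean Series; & keeps rows where all three are True.
# The date is written as a quoted string and compared against the datetime column.
mask = (
    (games.NICKNAME == "Pistons")
    & (games.SEASON_ID == 22017)
    & (games.GAME_DATE >= "2017-10-17")
)
pistons = games[mask]
print(pistons)
```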
Pistons_Games.plot(x='GAME_DATE', y='PTS')
plt.savefig('PISTONS_PTS_TIME.png')
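`DataFrame.plot` draws the line through the points in row order. A time-series line is therefore only meaningful if the rows are in date order. It is worth checking whether the source file is already sorted, and sorting first is a cheap safeguard if it is not. Here is a sketch with made-up rows:

```python
import pandas as pd

# Invented rows, deliberately out of date order
games = pd.DataFrame({
    "GAME_DATE": pd.to_datetime(["2017-10-20", "2017-10-18", "2017-10-22"]),
    "PTS": [111, 102, 95],
})

# Sort chronologically before drawing a line over time
games_sorted = games.sort_values("GAME_DATE")
print(games_sorted["GAME_DATE"].is_monotonic_increasing)

# games_sorted.plot(x='GAME_DATE', y='PTS') would then connect the games in time order
```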
#Your Code Here
We can create a scatter plot using the "plot.scatter" function, with the number of assists on the horizontal axis and the number of field goals made on the vertical axis.
NBA_Games.plot.scatter(x='AST', y='FGM')
import seaborn as sns
sns.regplot(x='AST', y='FGM', data=NBA_Games, marker='.')
plt.xlabel('Assists')
plt.ylabel('Field Goals Made')
plt.title("Relationship between the Numbers of Assists and Field Goals Made", fontsize=15)
As we can see from the graph, as the number of assists increases, the number of field goals made also increases. In this case, we say there is a positive relationship, or a positive correlation, between the two variables.
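`sns.regplot` overlays a least-squares regression line, with a confidence band, on the scatter plot. A positive relationship corresponds to a positive slope for that line. The sketch below fits the line with NumPy's `polyfit` on invented assist and field-goal counts, which are not from the dataset.

```python
import numpy as np

# Invented team-game values for illustration only
ast = np.array([18, 21, 24, 27, 30])
fgm = np.array([36, 38, 41, 43, 46])

# A degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(ast, fgm, 1)
print(round(slope, 3), round(intercept, 3))
```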
NBA_Games['AST'].corr(NBA_Games['FGM'])
The correlation coefficient between the number of assists and the number of field goals made is about 0.70, so there is a positive correlation between the two.
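The Pearson correlation coefficient equals the covariance of the two variables divided by the product of their standard deviations. In terms of deviations from the means, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below checks that `Series.corr` matches this formula on invented values.

```python
import numpy as np
import pandas as pd

# Invented values for illustration only
ast = pd.Series([18, 21, 24, 27, 30])
fgm = pd.Series([36, 38, 41, 43, 46])

# Deviations from each mean
dx = ast - ast.mean()
dy = fgm - fgm.mean()

# Pearson r: sum of cross-products over the square root of the product of the sums of squares
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr defaults to Pearson, so the two values agree
print(r_manual, ast.corr(fgm))
```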
sns.regplot(x='AST', y='FGA', data=NBA_Games, marker='.')
plt.xlabel('Assists')
plt.ylabel('Field Goals Attempted')
plt.title("Relationship between the Numbers of Assists and Field Goals Attempted", fontsize=15)
NBA_Games['AST'].corr(NBA_Games['FGA'])
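Both the graph and the correlation coefficient suggest only a weak positive relationship between assists and field goals attempted.
To fit separate regression lines for wins and losses, we can use lmplot() instead of regplot(), because lmplot() accepts a "hue" variable and regplot() does not.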
sns.lmplot(x='AST', y='FGA', hue='WL', data=NBA_Games)
plt.xlabel('Assists')
plt.ylabel('Field Goals Attempted')
plt.title("Relationship between the Numbers of Assists and Field Goals Attempted", fontsize=15)
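With `hue='WL'`, `lmplot` fits a separate regression line for wins and for losses. A rough numeric counterpart is to fit one least-squares line per group. The sketch below uses invented data, built so that the two groups have opposite slopes, purely to show the mechanics.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    "WL":  ["W", "W", "W", "L", "L", "L"],
    "AST": [20, 25, 30, 15, 20, 25],
    "FGA": [80, 82, 84, 90, 88, 86],
})

# Fit a least-squares line of FGA on AST separately within each outcome group
slopes = {
    wl: np.polyfit(group["AST"], group["FGA"], 1)[0]
    for wl, group in toy.groupby("WL")
}
print(slopes)
```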
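We can also compute the correlation matrix for all pairs of numeric variables at once. Here we specify the Pearson method; corr() also accepts 'kendall' and 'spearman'.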
NBA_Games.corr(method='pearson', numeric_only=True)
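`DataFrame.corr` only applies to numeric columns. In pandas 2.0 and later, non-numeric columns such as team nicknames raise an error unless `numeric_only=True` is passed. The sketch below runs the call on an invented frame that mixes text and numbers.

```python
import pandas as pd

# Invented rows mixing a text column with numeric columns
toy = pd.DataFrame({
    "NICKNAME": ["Pistons", "Celtics", "Bulls", "Heat"],
    "AST": [20, 24, 22, 27],
    "FGM": [38, 43, 40, 45],
})

# Pairwise Pearson correlations among the numeric columns only
corr_matrix = toy.corr(method="pearson", numeric_only=True)
print(corr_matrix)
```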