sasamoto.blogg.se - An introduction to statistical learning [df

#An introduction to statistical learning [df full#

# Produce a scatterplot matrix of the first ten columns or variables of the data.

# Produce side-by-side boxplots of Outstate versus Private.
sns.boxplot(x="Private", y="Outstate", data=college)
ax.set_ylabel('Outstate Tuition (in USD)')
ax.set_title('Outstate Tuition vs University Type')

# Create a new qualitative variable, called Elite, by binning the Top10perc variable.
college.loc[college['Top10perc'] > 50, 'Elite'] = 1
print("Number of elite universities: " + str(college['Elite'].sum()))
sns.boxplot(x="Elite", y="Outstate", data=college)

# Produce some histograms with differing numbers of bins for a few of the quantitative variables.
sns.distplot(college, bins=20, kde=False, color='r', hist_kws=dict(edgecolor='black', linewidth=1))
sns.distplot(college, bins=20, kde=False, color='green', hist_kws=dict(edgecolor='black', linewidth=1))
ax.set_xlabel('Percent of faculty with Ph.D.’s')
ax.set_title('Percent of faculty with Ph.D.’s')
sns.distplot(college, bins=20, kde=False, color='blue', hist_kws=dict(edgecolor='black', linewidth=1))
sns.distplot(college, bins=20, kde=False, color='yellow', hist_kws=dict(edgecolor='black', linewidth=1))
ax.set_xlabel('Percent of alumni who donate')
ax.set_title('Percent of alumni who donate')
plt.tight_layout()  # Stop subplots from overlapping

This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.

auto = pd.read_csv("data/Auto.csv")
auto = auto.astype(int)

(a) Which of the predictors are quantitative, and which are qualitative?

Sol: Quantitative: displacement, weight, horsepower, acceleration, mpg

(b) What is the range of each quantitative predictor?

print("Range of displacement: " + str(auto['displacement'].min()) + " - " + str(auto['displacement'].max()))
print("Range of weight: " + str(auto['weight'].min()) + " - " + str(auto['weight'].max()))
print("Range of horsepower: " + str(auto['horsepower'].min()) + " - " + str(auto['horsepower'].max()))
print("Range of acceleration: " + str(auto['acceleration'].min()) + " - " + str(auto['acceleration'].max()))
print("Range of mpg: " + str(auto['mpg'].min()) + " - " + str(auto['mpg'].max()))

(c) What is the mean and standard deviation of each quantitative predictor?

scribe()].loc]

(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice.

Sol: Two scatterplot matrices of all the quantitative variables, segregated by cylinders and by origin, are shown below. Vehicles with a higher number of cylinders have higher displacement, weight and horsepower, and lower acceleration and mpg. The relationship of mpg with displacement, weight and horsepower is somewhat predictable. Similarly, the relationships of horsepower, weight and displacement with all the other variables follow a trend. Vehicles are also distinguishable, to some degree, by origin.

sns.pairplot(auto, vars=['displacement', 'weight', 'horsepower', 'acceleration', 'mpg'], hue='cylinders')

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

Sol: From the plots, it is evident that displacement, weight and horsepower can play a significant role in the prediction of mpg.
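The range, mean and standard deviation asked for in parts (b) and (c) of the Auto exercise can be computed in one pass with pandas. The following is only a minimal sketch, not the post's own code: the rows are made-up values for illustration (they are not the real Auto data), and the names `quantitative`, `ranges` and `mean_std` are hypothetical. The column list follows the solution to (a).

```python
import pandas as pd

# Made-up rows for illustration only; NOT the real Auto data.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 26.0, 31.0],
    "displacement": [307.0, 350.0, 121.0, 79.0],
    "horsepower": [130, 165, 113, 67],
    "weight": [3504, 3693, 2234, 1950],
    "acceleration": [12.0, 11.5, 12.5, 19.0],
})

# Quantitative predictors, as listed in the solution to (a).
quantitative = ["displacement", "weight", "horsepower", "acceleration", "mpg"]

# (b) Range: minimum and maximum of each column, one row per predictor.
ranges = auto[quantitative].agg(["min", "max"]).T

# (c) Mean and standard deviation, picked out of the describe() summary.
mean_std = auto[quantitative].describe().loc[["mean", "std"]]

print(ranges)
print(mean_std)
```

With the real data, the same two expressions would be applied to the `auto` frame loaded from `data/Auto.csv`, once the missing values have been removed.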