where the expectation is taken w.r.t. all possible datasets, $\mathbb{E}[f(S)]=\mathbb{E}_S[f(S)]=\int f(S)\,p(S)\,dS$.
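With this notation, the quantity the experiments below estimate empirically is the standard bias-variance decomposition. For a hypothesis $g^{(S)}$ learned from dataset $S$, a (noiseless) target $f$, and the average hypothesis $\bar{g}(x)=\mathbb{E}_S[g^{(S)}(x)]$ (symbols introduced here for exposition; the notebook's own notation may differ):

$$\mathbb{E}_S\!\left[\big(g^{(S)}(x)-f(x)\big)^2\right]=\underbrace{\big(\bar{g}(x)-f(x)\big)^2}_{\text{bias}^2(x)}+\underbrace{\mathbb{E}_S\!\left[\big(g^{(S)}(x)-\bar{g}(x)\big)^2\right]}_{\text{variance}(x)}$$

The cross term vanishes because $\mathbb{E}_S\big[g^{(S)}(x)-\bar{g}(x)\big]=0$.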
(The example that follows is inspired by Yaser Abu-Mostafa's CS 156 lecture titled "Bias-Variance Tradeoff".)
# polyfit_sin() generates 5 samples of the form (x,y) where y=sin(pi*x)
# then it tries to fit a degree=0 polynomial (i.e. a constant func.) to the data
# Ignore return values for now, we will return to these later
_, _, _, _ = polyfit_sin(degree=0, iterations=1, num_points=5, show=True)
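The helper polyfit_sin is defined earlier in the full notebook and is not shown in this excerpt. A minimal sketch of what such a helper might look like is given below; the signature and the return order (MSE, errs, mean_coeffs, coeffs_list) are inferred from the calls in this section, and RANGEXS/TRUEYS are assumed to be the evaluation grid and true curve used later.

```python
import numpy as np
import matplotlib.pyplot as plt

RANGEXS = np.linspace(-1, 1, 100)      # x-grid used for evaluation/plotting (assumed)
TRUEYS = np.sin(np.pi * RANGEXS)       # true target sin(pi*x) on that grid (assumed)

def polyfit_sin(degree, iterations, num_points=2, show=False):
    """Repeatedly sample `num_points` points of sin(pi*x) on [-1, 1], fit a
    degree-`degree` polynomial to each sample, and return
    (MSE, errs, mean_coeffs, coeffs_list)."""
    errs, coeffs_list = [], []
    for _ in range(iterations):
        xs = np.random.uniform(-1, 1, num_points)
        ys = np.sin(np.pi * xs)
        coeffs = np.polyfit(xs, ys, degree)            # least-squares polynomial fit
        coeffs_list.append(coeffs)
        preds = np.polyval(coeffs, RANGEXS)
        errs.append(np.mean((preds - TRUEYS) ** 2))    # squared error against the true curve
        if show:
            plt.plot(RANGEXS, preds, 'r', alpha=0.3)   # each individual fit
    if show:
        plt.plot(RANGEXS, TRUEYS, 'g', lw=2)           # the true curve
        plt.show()
    mean_coeffs = np.mean(coeffs_list, axis=0)         # coefficients of the "average" fit
    return np.mean(errs), errs, mean_coeffs, coeffs_list
```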
# Estimate two points of sin(pi * x) with a constant 5 times
_, _, _, _ = polyfit_sin(degree=0, iterations=5)
# Estimate two points of sin(pi * x) with a constant 100 times
_, _, _, _ = polyfit_sin(degree=0, iterations=100)
MSE, errs, mean_coeffs, coeffs_list = polyfit_sin(degree=0, iterations=100, num_points=3, show=False)
biases, variances = calculate_bias_variance(coeffs_list, RANGEXS, TRUEYS)
plot_bias_and_variance(biases, variances, RANGEXS, TRUEYS, np.polyval(np.poly1d(mean_coeffs), RANGEXS))
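The helpers calculate_bias_variance and plot_bias_and_variance are likewise defined earlier in the full notebook. A rough sketch of what they might compute, with signatures inferred from the calls above (the exact implementation is an assumption), is:

```python
import numpy as np
import matplotlib.pyplot as plt

def calculate_bias_variance(coeffs_list, xs, true_ys):
    """Pointwise squared bias and variance of an ensemble of polynomial fits."""
    preds = np.array([np.polyval(c, xs) for c in coeffs_list])  # shape: (iterations, len(xs))
    mean_pred = preds.mean(axis=0)                              # the "average hypothesis" at each x
    biases = (mean_pred - true_ys) ** 2                         # squared bias at each x
    variances = preds.var(axis=0)                               # variance at each x
    return biases, variances

def plot_bias_and_variance(biases, variances, xs, true_ys, mean_pred):
    """Plot the true curve, the averaged fit, and the bias/variance profiles."""
    plt.plot(xs, true_ys, 'g', label='sin(pi*x)')
    plt.plot(xs, mean_pred, 'r', label='mean fit')
    plt.plot(xs, biases, 'b--', label='bias^2')
    plt.plot(xs, variances, 'k:', label='variance')
    plt.legend()
    plt.show()
```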
poly_degree, results_list = 0, []
MSE, errs, mean_coeffs, coeffs_list = polyfit_sin(poly_degree, 500, num_points=5, show=False)
biases, variances = calculate_bias_variance(coeffs_list, RANGEXS, TRUEYS)
# Record the degree-0 bias/variance for the comparison plot at the end
results_list += [{'error': np.mean(biases), 'type': 'bias', 'poly_degree': poly_degree},
                 {'error': np.mean(variances), 'type': 'variance', 'poly_degree': poly_degree}]
sns.barplot(x='type', y='error', hue='poly_degree', data=pd.DataFrame(results_list))
MSE, _, _, _ = polyfit_sin(degree=3, iterations=1)
_, _, _, _ = polyfit_sin(degree=3, iterations=5, num_points=5, show=True)
# Estimate two points of sin(pi * x) with a cubic (degree-3) polynomial 50 times
_, _, _, _ = polyfit_sin(degree=3, iterations=50)
poly_degree = 3
MSE, errs, mean_coeffs, coeffs_list = polyfit_sin(poly_degree, 500, show=False)
biases, variances = calculate_bias_variance(coeffs_list, RANGEXS, TRUEYS)
# Record the degree-3 bias/variance so it can be compared with degree 0 below
results_list += [{'error': np.mean(biases), 'type': 'bias', 'poly_degree': poly_degree},
                 {'error': np.mean(variances), 'type': 'variance', 'poly_degree': poly_degree}]
plot_bias_and_variance(biases, variances, RANGEXS, TRUEYS, np.polyval(np.poly1d(mean_coeffs), RANGEXS))
sns.barplot(x='type', y='error', hue='poly_degree', data=pd.DataFrame(results_list))
Choosing the model with the right complexity
For a model $M(\alpha)$ with complexity parameter $\alpha$: fit it on the training data and estimate its error on held-out (validation) data, e.g. via cross-validation.
Repeat for different $\alpha$'s and pick the $\alpha$ with the minimum validation error, as in the sketch below.
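As a concrete illustration in the polynomial setting above, here is a minimal sketch of that selection loop; the helper name select_degree, the single train/validation split, and the noise level are assumptions made for this example, not code from the notebook.

```python
import numpy as np

def select_degree(xs, ys, degrees, val_frac=0.3, seed=0):
    """Pick the polynomial degree with the lowest held-out (validation) error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(xs))
    n_val = max(1, int(val_frac * len(xs)))
    val_idx, tr_idx = idx[:n_val], idx[n_val:]
    best_deg, best_err = None, np.inf
    for d in degrees:
        coeffs = np.polyfit(xs[tr_idx], ys[tr_idx], d)                        # fit on the training split
        err = np.mean((np.polyval(coeffs, xs[val_idx]) - ys[val_idx]) ** 2)   # validation MSE
        if err < best_err:
            best_deg, best_err = d, err
    return best_deg, best_err

# Example: pick a degree for noisy samples of sin(pi*x)
xs = np.random.uniform(-1, 1, 50)
ys = np.sin(np.pi * xs) + 0.1 * np.random.randn(50)
print(select_degree(xs, ys, degrees=range(0, 10)))
```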
A closely related phenomenon in deep learning is "deep double descent": https://openai.com/blog/deep-double-descent/
A simple experiment: consider a quantity $y$ and its measurements (i.e. features) $x_1, x_2, \cdots, x_F$.
We will fit a series of models for $y$ on 100 samples, using an increasing number $M$ of these features.
Intuition: as $M$ grows, the training error should keep decreasing, while the test error should follow the classical U-shaped bias-variance curve: it first decreases, then increases once the model starts to overfit.
But is this actually the case?
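The cell that produces nins, ets, ess and nfea is not part of this excerpt. Below is a minimal sketch of one way to generate them; the data-generating process (make_data, w_true, nsam, nmax), the meaning of nfea as the number of truly informative features, and the error normalisation are all assumptions for this sketch.

```python
import numpy as np

np.random.seed(0)
nsam, nfea, nmax = 100, 20, 300      # training samples, informative features, max features tried (assumed)
w_true = np.random.randn(nfea)

def make_data(n):
    """n samples with nmax candidate features; only the first nfea influence y."""
    X = np.random.randn(n, nmax)
    y = X[:, :nfea] @ w_true + 0.5 * np.random.randn(n)
    return X, y

Xtr, ytr = make_data(nsam)
Xte, yte = make_data(2000)

nins, ets, ess = [], [], []
for m in range(1, nmax + 1):
    # Minimum-norm least-squares fit using only the first m features
    w, *_ = np.linalg.lstsq(Xtr[:, :m], ytr, rcond=None)
    ets.append(np.mean((Xtr[:, :m] @ w - ytr) ** 2) / np.var(ytr))  # normalised training error
    ess.append(np.mean((Xte[:, :m] @ w - yte) ** 2) / np.var(yte))  # normalised test error
    nins.append(m)
```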
f = plt.figure(figsize=(6, 4))
plt.plot(nins, ets, 'b-', label='Training')   # average training error vs. number of features used
plt.plot(nins, ess, 'r-', label='Test')       # average test error vs. number of features used
plt.plot([nfea, nfea], [0, 1], 'k--')         # vertical marker at nfea features
plt.legend()
plt.ylim([0, 1])
plt.xlabel('# of features')
plt.ylabel('Average error')
In fact, the test error does not keep rising: after a short increase past the classical bias-variance peak, it drops again as we keep adding features, which is exactly the "double descent" behaviour linked above.