HW5 (due Monday 10/10 noon)
Dow chemical process [100 pts]
We’re going to use the same Dow process dataset that you used for the last homework and in class. You’re going to use what we learned this week to fit neural networks to the dataset, and use dimensionality reduction to improve your previous fits. I think this homework should take less time than normal since I know you all are very busy with your projects!
We’ll use the same code to load the dataset.
import pandas as pd
import numpy as np
df = pd.read_excel('impurity_dataset-training.xlsx')
def is_real_and_finite(x):
    # A cell is kept only if it is a real number and not NaN/inf
    if not np.isreal(x):
        return False
    elif not np.isfinite(x):
        return False
    else:
        return True
all_data = df[df.columns[1:]].values #drop the first column (date)
numeric_map = df[df.columns[1:]].applymap(is_real_and_finite)
real_rows = numeric_map.all(axis=1).copy().values #True if all values in a row are real numbers
X = np.array(all_data[real_rows,:-5], dtype='float') #drop the last 5 cols that are not inputs
y = np.array(all_data[real_rows,-3], dtype='float')
Train/validation/test split (same as HW4, copy/paste from solutions if you want)
Split the dataset into an 80/10/10 train/val/test split.
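One way to get an 80/10/10 split is to call sklearn's train_test_split twice. The sketch below is only an illustration: it uses a small synthetic array in place of the Dow data (X_demo and y_demo are made-up names), and the shuffling and random_state values are assumptions, so use whatever you did in HW4 if it differs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (assumption), so the sketch runs on its own
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = rng.normal(size=200)

# Step 1: hold out 20% of the rows
X_train, X_rest, y_train, y_rest = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Step 2: split the held-out 20% in half, giving 10% validation and 10% test
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

print(X_train.shape, X_val.shape, X_test.shape)
```

With the real dataset you would pass X and y from the loading code instead of the synthetic arrays.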
Supervised regression with PCA with two components
Use PCA like we did in class to generate the first two principal components using the training dataset (X_train). Use these components as the input features for your best model from HW4. Calculate the validation error, and compare to your results from HW4.
Tip
If you’re not sure which model to use, you’re welcome to use the HW4 solutions (which will be posted on Wednesday!), or just use the random forest regressor from sklearn.
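As a rough sketch of the workflow, the pipeline below fits PCA on the training rows only and feeds the two components to a random forest. The synthetic data, the StandardScaler step, the forest settings, and the choice of MAE as the error metric are all assumptions for illustration, not necessarily what we did in class or in the HW4 solutions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Dow data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 10))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Scale, project onto 2 principal components, then regress.
# Calling fit on the pipeline fits the scaler and PCA on X_train only.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# The same fitted transforms are applied to the validation rows
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"Validation MAE with 2 PCA components: {val_mae:.3f}")
```

Keeping PCA inside the pipeline avoids accidentally fitting it on validation or test rows.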
Supervised regression with PCA with multiple components
Try varying the number of components in PCA from 1 to 10. What’s the best validation error you can achieve? Make a plot of validation error (y) vs the number of components (x).
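The scan over component counts can be written as a simple loop. This is a minimal sketch on synthetic data, with a random forest and MAE chosen as assumptions; swap in your own model, split, and error metric.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Dow data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 12))
y_demo = X_demo[:, 0] - X_demo[:, 3] + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

component_counts = list(range(1, 11))  # 1 through 10 components
val_errors = []
for n in component_counts:
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n),
        RandomForestRegressor(n_estimators=50, random_state=0),
    )
    model.fit(X_train, y_train)
    val_errors.append(mean_absolute_error(y_val, model.predict(X_val)))

# Component count with the lowest validation MAE
best_n = component_counts[int(np.argmin(val_errors))]
print(f"Best number of components: {best_n}")
```

For the requested plot, something like plt.plot(component_counts, val_errors, 'o-') with matplotlib, plus axis labels, should be enough.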
Neural network with three PCA components
In class we saw that three PCA components explained most of the variance in the data. Generate the first three PCA components, and fit a neural network using MLPRegressor with 2 hidden layers of 10 nodes each. Report your validation error.
Tip
The MLPRegressor documentation is your friend!
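In MLPRegressor, two hidden layers of 10 nodes each correspond to hidden_layer_sizes=(10, 10). The sketch below shows that setup on synthetic data; the scaling step, max_iter, random_state, and the use of MAE are assumptions picked so the example runs quickly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Dow data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 10))
y_demo = np.tanh(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Three PCA components feeding a network with two hidden layers of 10 nodes
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    MLPRegressor(hidden_layer_sizes=(10, 10), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"Validation MAE: {val_mae:.3f}")
```

Neural networks are sensitive to feature scale, which is why the sketch standardizes the inputs before PCA.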
Varying neural network choices
Try varying the number of hidden nodes, the number of layers, and the activation function. Describe the effects you see.
What’s the best validation error you can achieve?
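A small manual grid is one way to organize these experiments. The layer sizes and activations below are arbitrary examples, and the synthetic data, scaling, and MAE metric are again assumptions; the point is only the pattern of looping over settings and recording the validation error for each.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Dow data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 10))
y_demo = np.tanh(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Example settings to compare (arbitrary choices for illustration)
layer_options = [(5,), (10, 10), (20, 20, 20)]
activation_options = ["relu", "tanh"]

results = {}
for layers in layer_options:
    for activation in activation_options:
        model = make_pipeline(
            StandardScaler(),
            PCA(n_components=3),
            MLPRegressor(hidden_layer_sizes=layers, activation=activation,
                         max_iter=1000, random_state=0),
        )
        model.fit(X_train, y_train)
        results[(layers, activation)] = mean_absolute_error(
            y_val, model.predict(X_val))

# Setting with the lowest validation MAE
best_config = min(results, key=results.get)
for config, mae in results.items():
    print(config, round(mae, 3))
```

Since you pick the best setting using the validation set, keep the test set untouched for the final check.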
Polynomial features with LASSO
Using polynomials up to second order, fit a LASSO model. Print the validation MAE and make a parity plot for your model compared to the experiments!
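A possible starting point is to chain PolynomialFeatures(degree=2) with sklearn's Lasso. In this sketch the data are synthetic, and the scaling step, alpha=0.01, and max_iter are assumed values that you will probably want to tune on the real dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the Dow data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = X_demo[:, 0] * X_demo[:, 1] + X_demo[:, 2] ** 2 + 0.1 * rng.normal(size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Expand to all terms up to second order, rescale them, then fit LASSO
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.01, max_iter=10000),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_val)
val_mae = mean_absolute_error(y_val, y_pred)
n_poly = model.named_steps["polynomialfeatures"].n_output_features_
print(f"{n_poly} polynomial features, validation MAE: {val_mae:.3f}")
```

For the parity plot, scatter y_val against y_pred with matplotlib (for example plt.scatter(y_val, y_pred)) and add a diagonal y = x line as the reference for a perfect prediction.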