NOTE: This is a lab project accompanying the following book [MLF] and it should be used together with the book.
[MLF] H. Jiang, "Machine Learning Fundamentals: A Concise Introduction", Cambridge University Press, 2021.
The purpose of this lab is to explore more complex structures in neural networks beyond simple fully-connected networks. In particular, we focus on deep convolutional neural networks (CNNs) for image classification, as CNNs have become the dominant model for many computer vision tasks. Instead of implementing CNNs from scratch as in the previous Labs, we introduce some popular deep learning toolkits, such as Tensorflow and Pytorch, and use examples to show how to use these toolkits to conveniently build various CNN structures and efficiently train and evaluate them with the available training and test data.
Prerequisites: N/A
The most important feature of these popular deep learning toolkits (either Tensorflow or Pytorch) is that they provide flexible ways for us to specify various network structures. These toolkits usually come with many different syntaxes at various levels for this purpose. Some low-level syntaxes allow us to customize neural networks in any way we prefer, while other high-level syntaxes offer legible and flexible interfaces to configure popular network structures from the literature. These toolkits allow us to directly use many popular building blocks introduced in [MLF] without reinventing the wheel, such as full connection, convolution, activation, softmax, attention, feedback and normalization layers. They also provide convenient interfaces for us to implement any new modules.
Another advantage of using these toolkits is that they come with an automatic differentiation (AD) module, so we do not need to explicitly implement error back-propagation. The learning process is almost fully automatic as long as we specify some key ingredients, such as a loss function, an optimization algorithm and the relevant hyperparameters. Finally, these toolkits also allow us to flexibly switch between hardware devices (CPUs, GPUs and even TPUs) for the training and testing processes.
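To make the idea of automatic differentiation more concrete, the following is a toy sketch of reverse-mode AD on scalars in plain Python. It is only an illustration of the principle and not how Tensorflow or Pytorch implement it; the names `Var`, `tanh` and `backward` are hypothetical helpers introduced here.

```python
import math

class Var:
    """A scalar node in a computation graph (toy reverse-mode AD)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))

def backward(output):
    """Propagate d(output)/d(node) to every node, consumers before producers."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule

# a single neuron y = tanh(w*x + b)
w, x, b = Var(0.5), Var(2.0), Var(-0.3)
y = tanh(w * x + b)
backward(y)
print(w.grad, b.grad)
```

The forward pass records each operation together with its local derivative, and `backward` applies the chain rule once in reverse order, which is the same bookkeeping that error back-propagation performs by hand.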
In this Lab, we only introduce how to use the high-level Keras-style syntax to build deep convolutional neural networks for image classification tasks. Building a complex neural network with the Keras interface usually consists of the following three steps:
Define: we use some highly legible syntax to clearly define the structure of neural networks in a layer by layer manner. In this step, we need to specify all network details in a static structure.
Compile: we compile the previously defined static network by associating it with some dynamic components, such as a loss function, an optimizer along with its hyperparameters, a hardware device to be used (CPUs or GPUs), an evaluation metric, etc.
Fit: we fit the compiled model to the available training data (as well as the corresponding target labels). It will run the specified optimizer and use the automatically derived gradients from AD to learn the model on the specified hardware device.
In the following, we will use several examples to show how to do these three steps for convolutional neural networks using Tensorflow and Pytorch.
Use Tensorflow to re-implement the fully-connected neural network and compare it with the various implementations in the last Lab in terms of classification accuracy and running speed.
Here we can use an integer (between 0 and 9) as the target label for each image. For this case, we need to specify the CE loss function as "sparse_categorical_crossentropy" in Tensorflow. If we use the one-hot vector as the target label for each image, we need to specify the CE loss function as "categorical_crossentropy" in Tensorflow. Note that Tensorflow uses GPUs by default as long as GPUs are available.
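As a small sanity check of the two label formats, the NumPy sketch below computes the CE loss once from integer labels and once from one-hot vectors. The functions `sparse_ce` and `categorical_ce` are hypothetical helpers that follow the textbook definition of cross-entropy; they are not the Keras implementations, which include additional numerical safeguards.

```python
import numpy as np

def sparse_ce(probs, labels):
    """Mean CE loss when each target is an integer class index."""
    return -np.mean(np.log(probs[np.arange(labels.size), labels]))

def categorical_ce(probs, onehot):
    """Mean CE loss when each target is a one-hot vector."""
    return -np.mean(np.sum(onehot * np.log(probs), axis=1))

# made-up softmax outputs for 3 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])   # integer labels
onehot = np.eye(3)[labels]     # the same labels as one-hot vectors

loss_int = sparse_ce(probs, labels)
loss_hot = categorical_ce(probs, onehot)
print(loss_int, loss_hot)
```

Both calls give the same value, so the choice between the two loss names only depends on how the labels are stored, not on what is being optimized.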
!pip install fsspec
!pip install -U -q datasets
# load MNIST images
from datasets import load_dataset
import numpy as np
trainset = load_dataset('mnist', split='train')
train_data = trainset['image']
train_label = trainset['label']
testset = load_dataset('mnist', split='test')
test_data = testset['image']
test_label = testset['label']
train_data = np.array(train_data, dtype='float')/255 # norm to [0,1]
train_data = np.reshape(train_data,(60000,28*28))
y_train = np.array(train_label, dtype='short')
test_data = np.array(test_data, dtype='float')/255 # norm to [0,1]
test_data = np.reshape(test_data,(10000,28*28))
y_test = np.array(test_label, dtype='short')
#reshape each input vector (784) into a 28*28*1 image
X_train = np.reshape(train_data, (-1,28,28,1))
X_test = np.reshape(test_data, (-1,28,28,1))
# convert MNIST labels into 10-D one-hot vectors
Y_train = np.zeros((y_train.size, y_train.max()+1))
Y_train[np.arange(y_train.size),y_train] = 1
Y_test = np.zeros((y_test.size, y_test.max()+1))
Y_test[np.arange(y_test.size),y_test] = 1
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, Y_train.shape, Y_test.shape)
# use tensorflow to implement a fully-connected neural network (same structure as Lab5)
#
# use integers as target labels and specify CE loss as "sparse_categorical_crossentropy"
import numpy as np
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)
np.random.seed(42)
# define the model structure using Keras
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),  # X_train holds 28x28x1 images
    keras.layers.Dense(500, activation="relu"),
    keras.layers.Dense(250, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
# compile model by attaching with loss/optimizer/metric
model.compile(loss="sparse_categorical_crossentropy",  # CE loss for integer label
              optimizer=keras.optimizers.SGD(learning_rate=1e-1),
              metrics=["accuracy"])
# fit to training data to learn the model
history = model.fit(X_train, y_train, epochs=10,       # y_train: integer labels
                    validation_data=(X_test, y_test))  # y_test: integer labels
# use tensorflow to implement a fully-connected neural network
# use one-hot target labels and specify CE loss as "categorical_crossentropy"
import numpy as np
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)
np.random.seed(42)
# define the model structure using Keras (same network structure as Lab5)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),  # X_train holds 28x28x1 images
    keras.layers.Dense(500, activation="relu"),
    keras.layers.Dense(250, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
# compile model by attaching with loss/optimizer/metric
model.compile(loss="categorical_crossentropy",  # CE loss for one-hot vector label
              optimizer=keras.optimizers.SGD(learning_rate=1e-1),
              metrics=["accuracy"])
# fit to training data to learn the model
history = model.fit(X_train, Y_train, epochs=10,       # Y_train: one-hot vector labels
                    validation_data=(X_test, Y_test))  # Y_test: one-hot vector labels
# show the learning curves
import pandas as pd
import matplotlib.pyplot as plt
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()
model.summary()
# show the GPU type used in the above computation
!nvidia-smi
Use Tensorflow to implement the convolutional neural network structured as on page 200, evaluate its performance on the MNIST data set, and compare it with the fully-connected neural network in the previous example.
# use tensorflow to implement a convolutional neural network on page 200
import numpy as np
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)
np.random.seed(42)
# define the model structure using Keras
model = keras.models.Sequential([
    keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu',
                        padding='same', input_shape=[28, 28, 1]),
    keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
    keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(units=7744, activation='relu'),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax'),
])
# compile model by attaching loss/optimizer/metric components
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=3e-2),
              metrics=["accuracy"])
# learning a model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test))
# show the learning curves
import pandas as pd
import matplotlib.pyplot as plt
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1)
plt.show()
From the above results, we can see that this simple CNN yields better performance than FCNNs as its best classification accuracy on the test set is 99.13%.
In the above implementation, padding='same' indicates that proper zero-padding is added prior to each convolution so that the generated outputs have the same spatial dimensions as the inputs. This is clear from the following model summary:
model.summary()
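The effect of the padding choice on the feature-map sizes can also be worked out by hand. The sketch below does this arithmetic for three 3x3 convolutions (stride 1) followed by one 2x2 max-pooling on a 28x28 input, once with 'same' padding and once with no padding ('valid'). The helper names `conv_out`, `pool_out` and `flattened_size` are hypothetical and only serve this illustration.

```python
def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    if padding == "same":
        return size               # zero-padding keeps the size unchanged
    if padding == "valid":
        return size - kernel + 1  # no padding: the size shrinks by kernel-1
    raise ValueError("unknown padding: " + padding)

def pool_out(size, pool):
    """Spatial output size of non-overlapping max-pooling."""
    return size // pool

def flattened_size(padding, size=28, kernel=3, num_convs=3, pool=2, channels=64):
    for _ in range(num_convs):
        size = conv_out(size, kernel, padding)
    size = pool_out(size, pool)
    return size * size * channels

flat_same = flattened_size("same")    # 28 -> 28 -> 28 -> 28 -> 14
flat_valid = flattened_size("valid")  # 28 -> 26 -> 24 -> 22 -> 11
print(flat_same, flat_valid)
```

With 'same' padding the flattened vector has 14*14*64 = 12544 entries, while without padding it has 11*11*64 = 7744 entries.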
Use Tensorflow to implement a deeper convolutional neural network as in Figure 8.23 on page 169, and evaluate its performance using the MNIST data set.
# use tensorflow to implement a convolutional neural network as in Figure 8.23 on page 169
import numpy as np
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)
np.random.seed(42)
# define the model structure using Keras
model = keras.models.Sequential([
    keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu',
                        padding='same', input_shape=[28, 28, 1]),
    keras.layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    keras.layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
    keras.layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
    keras.layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(units=4096, activation='relu'),
    keras.layers.Dense(units=4096, activation='relu'),
    keras.layers.Dense(units=1000, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax'),
])
# compile model by attaching with loss/optimizer/metric
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=5e-2),
              metrics=["accuracy"])
# learning a model
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test))
model.summary()
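The parameter counts reported by the model summary can be cross-checked by hand. The sketch below walks through the layer list of the deeper CNN above and counts weights and biases, assuming 3x3 kernels with 'same' padding and 2x2 pooling that rounds the size down. The helpers `conv_params` and `dense_params` are hypothetical names introduced for this illustration.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution layer."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return (n_in + 1) * n_out

# ("conv", number of filters) or ("pool", pool size), in the order used above
layers = [("conv", 64), ("conv", 64), ("pool", 2),
          ("conv", 128), ("conv", 128), ("pool", 2),
          ("conv", 256), ("conv", 256), ("conv", 256), ("pool", 2)]

size, channels, total = 28, 1, 0
for kind, value in layers:
    if kind == "conv":
        total += conv_params(3, channels, value)
        channels = value   # 'same' padding: spatial size is unchanged
    else:
        size //= value     # pooling: 28 -> 14 -> 7 -> 3

flat = size * size * channels
features = flat
for units in (4096, 4096, 1000, 10):
    total += dense_params(features, units)
    features = units

print("flattened size:", flat)
print("total parameters:", total)
```

Most of the parameters sit in the fully-connected layers rather than in the convolution layers, which is typical for this style of network.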
In general, Pytorch follows a similar model-construction pipeline to Tensorflow. Refer to an online Tutorial for more details. In the following examples, we use a Keras-style package for Pytorch, namely torchkeras. As a result, we can similarly follow the above three steps when building CNNs using Pytorch.
Use Pytorch to implement the convolutional neural network as in Example 6.2, and evaluate its performance using the MNIST data set.
# Convert training/test data from numpy arrays to pytorch tensors/datasets
import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader
X_train_ts = torch.Tensor(X_train.reshape(-1,1,28,28))
train_dataset = torch.utils.data.TensorDataset(X_train_ts, torch.Tensor(y_train).long())
X_test_ts = torch.Tensor(X_test.reshape(-1,1,28,28))
test_dataset = torch.utils.data.TensorDataset(X_test_ts, torch.Tensor(y_test).long())
dl_train = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)
dl_valid = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=2)
print(len(dl_train))
print(len(dl_valid))
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR
# define CNN structure and its forward pass layer-by-layer
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
            nn.Linear(7744, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
            nn.Softmax(dim=1)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        #loss = F.cross_entropy(output, target)
        loss = F.nll_loss(torch.log(output), target)
        loss.backward()
        optimizer.step()
        if batch_idx % 1000 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(torch.log(output), target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

# use pytorch to implement a convolutional neural network on page 200
# use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net().to(device)
#optimizer = optim.Adadelta(model.parameters(), lr=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.9)
for epoch in range(1, 11):
    train(model, device, dl_train, optimizer, epoch)
    test(model, device, dl_valid)
    scheduler.step()
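The learning rates produced by the step schedule above can be reproduced with a few lines of plain Python. This is a sketch of the decay rule only (the initial rate multiplied by gamma once every step_size epochs), and the function name `step_lr` is a hypothetical helper, not part of Pytorch.

```python
def step_lr(initial_lr, gamma, step_size, epoch_index):
    """Learning rate used in the epoch with 0-based index `epoch_index`."""
    return initial_lr * gamma ** (epoch_index // step_size)

# same settings as in the training loop above: lr=0.1, step_size=1, gamma=0.9
lrs = [step_lr(0.1, 0.9, 1, e) for e in range(10)]
for e, lr in enumerate(lrs, start=1):
    print("epoch {:2d}: lr = {:.5f}".format(e, lr))
```

With these settings the learning rate shrinks by 10% after every epoch, so the last of the ten epochs runs at roughly 39% of the initial rate.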
Use Pytorch's Keras wrapper, torchkeras, to re-implement the convolutional neural network as in Example 6.4, and evaluate its performance using the MNIST data set.
# install keras packages for pytorch
!pip install -U torchkeras
# use pytorch to implement a convolutional neural network on page 200
import torch
from torch import nn
# define CNN structure and its forward pass layer-by-layer
class CnnModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
            nn.Linear(7744, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
            nn.LogSoftmax(dim=1)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
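The model above ends with a log-softmax layer instead of a softmax layer. The NumPy sketch below illustrates why this can matter: taking the log of a softmax output and computing log-softmax directly give the same loss for moderate logits, but only the direct form stays finite for extreme logits. The functions here are hypothetical helpers written for this illustration, not the Pytorch implementations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def log_softmax(z):
    """Log-softmax via the log-sum-exp trick, without forming probabilities."""
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll(log_probs, targets):
    """Mean negative log-likelihood for integer targets."""
    return -np.mean(log_probs[np.arange(targets.size), targets])

# moderate logits: both routes give the same loss
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])
loss_via_softmax = nll(np.log(softmax(logits)), targets)
loss_via_log_softmax = nll(log_softmax(logits), targets)

# extreme logits: small probabilities underflow to 0, so their log is -inf
extreme = np.array([[1000.0, 0.0, -1000.0]])
with np.errstate(divide="ignore"):
    naive = np.log(softmax(extreme))
stable = log_softmax(extreme)
print(loss_via_softmax, loss_via_log_softmax)
print(naive, stable)
```

In this sketch the naive route produces infinite log-probabilities for the unlikely classes, whereas the log-sum-exp form returns finite values.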
# running CNNs
import torchkeras
# define the metrics computed during the learning process
class Accuracy(nn.Module):
    def __init__(self):
        super().__init__()
        self.correct = nn.Parameter(torch.tensor(0.0), requires_grad=False)
        self.total = nn.Parameter(torch.tensor(0.0), requires_grad=False)

    def forward(self, preds: torch.Tensor, targets: torch.Tensor):
        preds = preds.argmax(dim=-1)
        m = (preds == targets).sum()
        n = targets.shape[0]
        self.correct += m
        self.total += n
        return m/n

    def compute(self):
        return self.correct.float() / self.total

    def reset(self):
        self.correct -= self.correct
        self.total -= self.total

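# compile the model by attaching various dynamic components
from torch.optim.lr_scheduler import StepLR
net = CnnModel()
# create the optimizer first so that the scheduler is bound to this model's
# optimizer rather than to the optimizer of the previous example
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
model = torchkeras.KerasModel(net,
                              loss_fn=nn.NLLLoss(),
                              optimizer=optimizer,
                              scheduler=StepLR(optimizer, step_size=1, gamma=0.9),
                              metrics_dict={"acc": Accuracy()}
                              )
# train CNNs by fitting to the training data
dfhistory = model.fit(train_data=dl_train,
                      val_data=dl_valid,
                      epochs=10
                      )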
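The Accuracy metric defined above accumulates the counts of correct and total samples across batches instead of averaging the per-batch accuracies. The NumPy sketch below, using made-up predictions, shows that the two can differ when the last batch is smaller than the others (as happens when the data set size is not a multiple of the batch size). All names in it are hypothetical.

```python
import numpy as np

# made-up outcome for 100 samples: the first 96 are classified correctly,
# the last 4 are wrong
is_correct = np.array([True] * 96 + [False] * 4)
batch_sizes = [32, 32, 32, 4]   # the last batch is smaller

correct, total, batch_accs = 0, 0, []
start = 0
for size in batch_sizes:
    batch = is_correct[start:start + size]
    start += size
    correct += int(batch.sum())      # running count, as in Accuracy.forward
    total += size
    batch_accs.append(batch.mean())  # accuracy of this batch alone

accumulated = correct / total               # what Accuracy.compute returns
mean_of_batches = float(np.mean(batch_accs))
print(accumulated, mean_of_batches)
```

Accumulating counts weights every sample equally, while the plain mean of per-batch accuracies gives the small final batch the same weight as a full one.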
# retrieve data from training history to plot learning curves
#
import matplotlib.pyplot as plt
def plot_metric(dfhistory, metric):
    train_metrics = dfhistory["train_"+metric]
    val_metrics = dfhistory['val_'+metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.plot(epochs, val_metrics, 'ro-')
    plt.title('Training and validation '+ metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_"+metric, 'val_'+metric])
    plt.show()

# retrieve and plot the accuracy curves
plot_metric(dfhistory,'acc')
print(dfhistory["train_acc"])
print(dfhistory["val_acc"])
print(dfhistory["train_acc"].max(), dfhistory['val_acc'].max())
# retrieve and plot the loss curves
plot_metric(dfhistory,'loss')
#print(dfhistory["train_loss"])
#print(dfhistory["val_loss"])
print(dfhistory["train_loss"].min(), dfhistory['val_loss'].min())
# evaluate model
model.evaluate(dl_valid,quiet=False)
Use Tensorflow or Pytorch to implement a CNN model as in Figure 8.23 on page 169 and evaluate it on the CIFAR10 data set. Vary the structures in this CNN model slightly to see whether you can further improve the performance on the CIFAR10 test set.
Use JAX and its automatic differentiation to implement CNNs from scratch. Use your implementation to build the same CNN model as in Example 6.2 and evaluate it on the MNIST data set. Compare your JAX implementation with TensorFlow or Pytorch in terms of classification accuracy and running speed.