Image Segmentation Model on AWS SageMaker using TensorFlow and UNet Architecture
Learn how to train and deploy an image segmentation model on AWS SageMaker
Overview
Amazon SageMaker is a managed machine learning service where data scientists can build, train, and deploy machine learning models. It provides an integrated Jupyter notebook environment for authoring, training, and testing models, which makes the work of both data scientists and ML engineers much easier.
SageMaker also supports processing jobs to preprocess and post-process data, perform feature engineering, and evaluate models.
In this project, I’ve built an image segmentation model in TensorFlow on SageMaker using the UNet model architecture. The trained model can also be deployed on SageMaker (a minimal deployment sketch appears at the end of this post).
Tech Stack
Language: Python
Libraries: tensorflow, pandas, numpy, scikit-learn, patchify, matplotlib, segmentation_models, boto3
About the Tech
TensorFlow
TensorFlow is an open-source software library and framework developed by Google for building machine learning and deep learning models. It expresses computations as dataflow graphs and provides automatic differentiation, which makes many mathematical expressions straightforward to evaluate and optimize.
TensorFlow is well documented and ships with a rich ecosystem of machine learning libraries and utilities.
It includes a wide variety of machine learning and deep learning algorithms, and it can train and run deep neural networks for tasks such as handwritten digit classification, image recognition, word embeddings, and various sequence models.
Image Segmentation
Image segmentation is a method in which a digital image is broken down into subgroups called image segments, which reduces the complexity of the image and makes further processing or analysis simpler. In simple terms, segmentation means assigning a label to every pixel.
All pixels belonging to the same category receive the same label. For example, consider a problem where an image has to be provided as input for object detection.
Rather than processing the whole image, the detector can be given only a region selected by a segmentation algorithm, which reduces the amount of work the detector has to do and therefore its inference time.
Semantic Image Segmentation
The goal of semantic image segmentation is to label each pixel of an image with the class it represents. Because we’re predicting a value for every pixel in the image, this task is commonly referred to as dense prediction. Note that, unlike image classification or object detection, the expected output in semantic segmentation is not just a set of labels or bounding-box parameters: the output is itself a high-resolution image (typically the same size as the input image) in which each pixel is assigned a class. It is therefore pixel-level image classification.
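To make "dense prediction" concrete, here is a minimal NumPy sketch (toy values, not from this project's dataset): the model output carries one class probability per pixel, and taking the argmax over the class axis yields a segmentation map with the same height and width as the input.
import numpy as np

# Toy model output for a 4-class problem on a 128x128 image:
# one probability per class per pixel (dense prediction).
probs = np.random.rand(128, 128, 4)
probs = probs / probs.sum(axis=-1, keepdims=True)

# The segmentation map assigns every pixel its most probable class,
# so it has the same height and width as the input image.
seg_map = np.argmax(probs, axis=-1)
print(seg_map.shape)       # (128, 128)
print(np.unique(seg_map))  # the class ids present in the prediction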
Convolutional Neural Net
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics.
The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.
Architecture
An image is nothing but a matrix of pixel values, right? So why not just flatten the image (e.g. a 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification? For extremely basic binary images the method might achieve an average score when predicting classes, but it would have little to no accuracy on complex images with pixel dependencies throughout.
A ConvNet is able to successfully capture the Spatial and Temporal dependencies in an image through the application of relevant filters. The architecture performs a better fitting to the image dataset due to the reduction in the number of parameters involved and reusability of weights. In other words, the network can be trained to understand the sophistication of the image better. The role of the ConvNet is to reduce the images into a form which is easier to process, without losing features which are critical for getting a good prediction. This is important when we are to design an architecture which is not only good at learning features but also is scalable to massive datasets.
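To illustrate the parameter reduction and weight reuse mentioned above, here is a small, hedged Keras comparison (the layer sizes are arbitrary illustrations, not part of this project's model):
import tensorflow as tf

# Flattening a 128x128x3 image into a single Dense layer of 64 units:
# 128*128*3*64 + 64 = 3,145,792 parameters.
mlp = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
    tf.keras.layers.Dense(64),
])

# A Conv2D layer with 64 filters of size 3x3 reuses the same weights at every
# spatial position: 3*3*3*64 + 64 = 1,792 parameters.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, input_shape=(128, 128, 3)),
])

print(mlp.count_params())  # 3145792
print(cnn.count_params())  # 1792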
Pooling Layer: The pooling layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, which helps the model train effectively. There are two types of pooling: max pooling and average pooling. Max pooling returns the maximum value from the portion of the image covered by the kernel, while average pooling returns the average of all the values in that portion.
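A small sketch of both pooling types on a toy 4x4 feature map (the values are illustrative, not from the project):
import tensorflow as tf

# A single 4x4 feature map with one channel, values 0..15.
x = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))

# Max pooling keeps the largest value in each 2x2 window;
# average pooling keeps the mean of each 2x2 window.
max_pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(x)

print(tf.squeeze(max_pooled).numpy())  # [[ 5.  7.] [13. 15.]]
print(tf.squeeze(avg_pooled).numpy())  # [[ 2.5  4.5] [10.5 12.5]]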
Transposed Convolutional Layer: The transposed convolutional layer is also (wrongly) known as the deconvolutional layer. A true deconvolutional layer reverses the operation of a standard convolutional layer, i.e. if the output of a standard convolution is deconvolved, you get back the original input. The transposed convolutional layer is similar to a deconvolutional layer only in the sense that the spatial dimensions both produce are the same: transposed convolution does not reverse the standard convolution by values, only by dimensions.
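A minimal Keras sketch of that dimensional effect (the filter count and input size are illustrative assumptions): a stride-2 Conv2DTranspose doubles the spatial dimensions, which is how UNet-style decoders upsample feature maps back towards the input resolution.
import tensorflow as tf

# A 64x64 single-channel feature map (random values for illustration).
x = tf.random.normal((1, 64, 64, 1))

# A stride-2 transposed convolution doubles the spatial dimensions.
upsample = tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=2,
                                           strides=2, padding='same')
print(upsample(x).shape)  # (1, 128, 128, 16)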
Code & Process
GitHub Link to AWS SageMaker Notebook
Detailed step-by-step instructions are listed in the notebook, along with the data needed to train, run, and deploy the model:
https://github.com/adduggan/SageMaker-Image-Segmentation/blob/main/Image_Segmentation.ipynb
Requirements
boto3==1.21.3
matplotlib==3.4.3
numpy==1.21.5
patchify==0.2.3
scikit_learn==1.1.1
segmentation_models==1.0.1
tensorflow==2.7.0
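If the pinned versions above are saved to a requirements.txt file (an assumption; the original project lists them inline), they can be installed in one notebook cell:
# Install the pinned dependencies from the requirements file:
!pip install -r requirements.txt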
Python Code
If you'd like to grab the raw code and reuse portions for your own project, take anything you need from the code below:
Loading Data from S3
# !pip install patchify
# !pip install segmentation-models==1.0.1
# Importing Packages :
import os
import sys
import boto3
import subprocess
import io
import matplotlib.pyplot as plt
from patchify import patchify
import numpy as np
import segmentation_models as sm
from sklearn.model_selection import train_test_split
Importing segmentation_models prints: Segmentation Models: using `keras` framework.
# Loading the data: connect to S3. Replace the placeholders with your own
# credentials, or (preferred on SageMaker) omit them and rely on the notebook's
# IAM execution role. Never hard-code real access keys in source.
s3 = boto3.resource(
    service_name='s3',
    region_name='us-east-1',
    aws_access_key_id='<YOUR_ACCESS_KEY_ID>',
    aws_secret_access_key='<YOUR_SECRET_ACCESS_KEY>')
# Defining the DATA Location in the S3 bucket :
default_location = "s3://appledatabucket/Apple/"
print(default_location)
print(os.listdir())
training_dir="Model"
Getting Data From the Bucket
# Function to load the images and masks from S3
def loading_data(default_location):
    img_data_array = []
    mask_data_stack = []
    print("Reading the images")
    s3_bucket = "appledatabucket"
    keys = []
    for obj in s3.Bucket(s3_bucket).objects.all():
        keys.append(obj.key)
    for key in keys:
        file_stream = io.BytesIO()
        s3.Bucket(s3_bucket).Object(key).download_fileobj(file_stream)
        file_stream.seek(0)  # rewind before reading, or imread sees an empty stream
        if ".jpg" in key and "Apple" in key:
            print(key)
            img = plt.imread(file_stream, format='jpg')
            print(img.shape)
            img_data_array.append(img)
        elif ".tiff" in key and "Apple" in key:
            mask = plt.imread(file_stream, format='tiff')
            print(mask.shape)
            mask_data_stack.append(mask)
    return img_data_array, mask_data_stack
# Getting Images and Mask:
img_data_array, mask_data_stack = loading_data(default_location)
Patching the Images
# Function to patch the images:
def Image_Patching(img_data_array):
    all_img_patches = []
    shapes = []
    for img in range(len(img_data_array)):
        large_image = img_data_array[img]
        shapes.append(large_image.shape)
        # Split each large image into non-overlapping 128x128x3 patches
        patches_img = patchify(large_image, (128, 128, 3), step=128)
        for i in range(patches_img.shape[0]):
            for j in range(patches_img.shape[1]):
                single_patch_img = patches_img[i, j, :, :]
                single_patch_img = single_patch_img.astype('float32') / 255.
                all_img_patches.append(single_patch_img)
    images = np.array(all_img_patches)
    # Drop the singleton dimension left by patchify (for this dataset: 730 patches)
    images = np.reshape(images, (-1, 128, 128, 3))
    return images
#Getting the Image Patches:
images = Image_Patching(img_data_array)
Patching the Masks
# Function to patch the masks:
def mask_patching(mask_data_stack):
    all_mask_patches = []
    for img in range(len(mask_data_stack)):
        large_mask = mask_data_stack[img]
        # Split each mask into non-overlapping 128x128 patches
        patches_mask = patchify(large_mask, (128, 128), step=128)
        for i in range(patches_mask.shape[0]):
            for j in range(patches_mask.shape[1]):
                single_patch_mask = patches_mask[i, j, :, :]
                single_patch_mask = single_patch_mask / 255.
                all_mask_patches.append(single_patch_mask)
    masks = np.array(all_mask_patches)
    masks = np.expand_dims(masks, -1)  # add a channel dimension
    return masks
#Getting the Mask Patches:
masks = mask_patching(mask_data_stack)
#Printing the Shapes of Images:
print("---Shape of the Images and Masks---")
print(images.shape)
print(masks.shape)
print("Pixel values in the mask are: ", np.unique(masks))
# Building The Model:
BACKBONE = 'resnet34'
preprocess_input1 = sm.get_preprocessing(BACKBONE)
images1 = preprocess_input1(images)  # Preprocess the image data according to the 'resnet34' backbone's requirements
print(images1.shape)
print(masks.shape)
Splitting the Dataset
# Splitting Data to Train and Test:
X_train, X_test, y_train, y_test = train_test_split(images1,
masks,
test_size=0.25, random_state=42)
# Sanity check: view a few images
import random
image_number = random.randint(0, len(X_train) - 1)
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(X_train[image_number, :,:,: ])
plt.subplot(122)
plt.imshow(np.reshape(y_train[image_number], (128,128)), cmap='gray')
plt.show()
Augmentation of Image and Mask Patches
seed=24
from tensorflow.keras.preprocessing.image import ImageDataGenerator
#Defining the ImageDataGenerator Parameters:
img_data_gen_args = dict(rotation_range=90,
                         width_shift_range=0.3,
                         height_shift_range=0.3,
                         shear_range=0.5,
                         zoom_range=0.3,
                         horizontal_flip=True,
                         vertical_flip=True,
                         fill_mode='reflect')
mask_data_gen_args = dict(rotation_range=90,
                          width_shift_range=0.3,
                          height_shift_range=0.3,
                          shear_range=0.5,
                          zoom_range=0.3,
                          horizontal_flip=True,
                          vertical_flip=True,
                          fill_mode='reflect',
                          # Re-binarize mask pixels after geometric interpolation
                          preprocessing_function=lambda x: np.where(x > 0, 1, 0).astype(x.dtype))
#Putting Images to the Generator for Augmentation:
image_data_generator = ImageDataGenerator(**img_data_gen_args) # Initialising the Image Generator Model
image_data_generator.fit(X_train, augment=True, seed=seed)
image_generator = image_data_generator.flow(X_train, seed=seed)
valid_img_generator = image_data_generator.flow(X_test, seed=seed)
#Putting Masks to the Generator for Augmentation:
mask_data_generator = ImageDataGenerator(**mask_data_gen_args)
mask_data_generator.fit(y_train, augment=True, seed=seed)
mask_generator = mask_data_generator.flow(y_train, seed=seed)
valid_mask_generator = mask_data_generator.flow(y_test, seed=seed)
x = image_generator.next()
y = mask_generator.next()
for i in range(0, 1):
    image = x[i]
    mask = y[i]
    plt.subplot(1, 2, 1)
    plt.imshow(image[:, :, 0])
    plt.subplot(1, 2, 2)
    plt.imshow(mask[:, :, 0])
    plt.show()
Combining the Image and Mask Generators
def image_mask_generator(image_generator, mask_generator):
    # Yield matching (image, mask) batches from the two generators
    train_generator = zip(image_generator, mask_generator)
    for (img, mask) in train_generator:
        yield (img, mask)
train_data_generator = image_mask_generator(image_generator, mask_generator)
validation_datagen = image_mask_generator(valid_img_generator, valid_mask_generator)
Compiling and Training the Model
# Defining the Model:
print("Loading the Model.....")
sm.set_framework('tf.keras')
sm.framework()
model = sm.Unet(BACKBONE, encoder_weights='imagenet')
model.compile('Adam', loss=sm.losses.bce_jaccard_loss, metrics=[sm.metrics.iou_score])
#printing the Model:
print(model.summary())
steps_epoch=50
valid_step=50
epochs=50
# Training :
history = model.fit(train_data_generator,
validation_data = validation_datagen,
steps_per_epoch = steps_epoch,
validation_steps= valid_step,
epochs= epochs)
# Compute the IoU (intersection over union) between predictions and ground truth
def iou_score(model):
    y_pred = model.predict(X_test)
    y_pred_thresholded = y_pred > 0.5
    intersection = np.logical_and(y_test, y_pred_thresholded)
    union = np.logical_or(y_test, y_pred_thresholded)
    iou = np.sum(intersection) / np.sum(union)
    return iou
score = iou_score(model)
print("IoU socre is: ", score)
# Plot the training and validation loss at each epoch
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
acc = history.history['iou_score']
#acc = history.history['accuracy']
val_acc = history.history['val_iou_score']
#val_acc = history.history['val_accuracy']
plt.plot(epochs, acc, 'y', label='Training IOU')
plt.plot(epochs, val_acc, 'r', label='Validation IOU')
plt.title('Training and validation IOU')
plt.xlabel('Epochs')
plt.ylabel('IOU')
plt.legend()
plt.show()
#IOU
y_pred=model.predict(X_test)
y_pred_thresholded = y_pred > 0.5
#Printing the IOU Score
intersection = np.logical_and(y_test, y_pred_thresholded)
union = np.logical_or(y_test, y_pred_thresholded)
iou_score = np.sum(intersection) / np.sum(union)
print("IoU socre is: ", iou_score)
test_img_number = random.randint(0, len(X_test)-1)
test_img = X_test[test_img_number]
test_img_input=np.expand_dims(test_img, 0)
ground_truth=y_test[test_img_number]
prediction = model.predict(test_img_input)
prediction = prediction[0,:,:,0]
plt.figure(figsize=(16, 8))
plt.subplot(231)
plt.title('Testing Image')
plt.imshow(test_img[:,:,0], cmap='gray')
plt.subplot(232)
plt.title('Testing Label')
plt.imshow(ground_truth[:,:,0], cmap='gray')
plt.subplot(233)
plt.title('Prediction on test image')
plt.imshow(prediction)
plt.show()
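Deploying on SageMaker
The overview mentions deploying the model on SageMaker, and the notebook above ends at evaluation. Below is a minimal, hedged sketch of one way to do it with the SageMaker Python SDK, assuming the notebook has an IAM execution role with S3 and endpoint permissions; the key prefix, instance type, and framework version here are illustrative assumptions, not values from the original project.
import tarfile
import numpy as np
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

# Save the trained Keras model in SavedModel format under a numbered
# directory, which is the layout TensorFlow Serving expects.
model.save("1")

# Package the SavedModel as model.tar.gz and upload it to S3
# (the key prefix below is a placeholder, not from the original notebook).
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("1")
session = sagemaker.Session()
model_data = session.upload_data("model.tar.gz", key_prefix="unet-segmentation")

# Create a SageMaker model object and deploy it as a real-time endpoint.
tf_model = TensorFlowModel(model_data=model_data,
                           role=sagemaker.get_execution_role(),
                           framework_version="2.7")
predictor = tf_model.deploy(initial_instance_count=1,
                            instance_type="ml.m5.large")

# Invoke the endpoint with one preprocessed test patch.
result = predictor.predict(np.expand_dims(X_test[0], 0))
print(np.array(result["predictions"]).shape)

# Clean up the endpoint when you are done to avoid ongoing charges.
predictor.delete_endpoint()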