Building the Xception Model

 

In recent years, there have been many breakthroughs in deep learning using Convolutional Neural Networks (CNNs). One of the most accurate of these models is the Xception architecture. Developed by François Chollet in 2017, it outperformed the Inception V3 architecture it was derived from while using a similar number of parameters. Here we will take a deeper dive into the Xception architecture to see exactly how it is constructed.

The architecture for the Xception model is based on depthwise separable convolution layers and consists of three major sections: Entry Flow, Middle Flow, and Exit Flow.

(Image: the Xception architecture diagram, from Xception: Deep Learning with Depthwise Separable Convolutions.)

Entry Flow

To construct the Xception model, we will use the Keras layers and models modules. These two modules provide all the tools needed to turn the diagram above into actual code.

from keras import layers
from keras import models

 

We can build the layers onto each other by creating each layer and reassigning the result to the same variable, following the pattern our_layers = new_layer(our_layers). For simplicity, we will just use the variable x to stack the layers.
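The pattern is just repeated function application: each Keras layer, once constructed, is a callable that maps a tensor to a new tensor. A plain-Python sketch of the same idea, where double and increment are hypothetical stand-ins for layers:

```python
# Each "layer" here is a callable; applying it and reassigning the result
# chains the operations, just like x = new_layer(x) in Keras.
def double(v):
    return v * 2

def increment(v):
    return v + 1

x = 3
x = double(x)     # x -> 6
x = increment(x)  # x -> 7
```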

In the Entry Flow section of the image, the first two convolution layers are shown with their parameters. The first number is the filters parameter, which defines the number of output filters in the convolution. The dimensions give the kernel_size parameter, the height and width of the convolution window. We can plug these values directly into Conv2D. The use_bias parameter defaults to True; we want to turn this off because this model does not use a bias vector. Finally, we can take advantage of the **kwargs to set the name parameter so we can keep track of where we are in the diagram.

Using what we know, each time a convolution layer is added, we want to follow it with a normalization layer. The convolution layer produces the specified number of filters, and each filter's activations can end up on a different scale. Because of this individual scaling, we normalize across all filters so the network can easily compare the values. To do this, we will use the BatchNormalization layer.

The axis parameter of BatchNormalization defaults to -1; the right value depends on a few factors, such as the backend (Theano or TensorFlow) and the resulting data layout. Theano orders the dimensions as (batch, channels, height, width), while TensorFlow uses (batch, height, width, channels). The axis argument selects the feature axis: each entry along that axis gets its own mean and standard deviation, computed over all the other axes. Conv2D outputs need to be normalized per channel, and since channels come last in TensorFlow, the default value of -1 is exactly what we want. A name can again be added to help us keep track of our progress.
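To make the axis choice concrete, here is a minimal NumPy sketch (NumPy is an assumption for illustration, and BatchNormalization's learned scale and offset parameters are omitted) of channels-last normalization:

```python
import numpy as np

np.random.seed(0)

# A toy channels-last batch: (batch, height, width, channels).
batch = np.random.randn(8, 4, 4, 3)

# With axis=-1, statistics are computed over every axis EXCEPT the last,
# so each of the 3 channels is normalized independently.
mean = batch.mean(axis=(0, 1, 2), keepdims=True)
std = batch.std(axis=(0, 1, 2), keepdims=True)
normalized = (batch - mean) / (std + 1e-3)

# Each channel now has roughly zero mean and unit variance.
per_channel_mean = normalized.mean(axis=(0, 1, 2))
per_channel_std = normalized.std(axis=(0, 1, 2))
```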

We can then add the Activation layer after the normalization. The activation layer applies a non-linear function to each node's weighted input to produce the node's final output. We will use ReLU (rectified linear unit), which passes positive values through unchanged and clamps negative values to zero; it is the standard choice in convolutional image models because it is cheap to compute and helps keep gradients from vanishing through deep stacks of layers.
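The function itself is tiny; a NumPy sketch (an assumption for illustration, not the Keras internals):

```python
import numpy as np

# ReLU zeroes out negative activations and passes positives through.
def relu(x):
    return np.maximum(0.0, x)

activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
```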

This completes the first module section. Looking at the diagram defining our flow, you can see an arrow wrapping around the next module section. This represents a linear residual connection, which wraps each module section of the model. These connections give gradients a direct path through the network without passing through non-linear activations. They affect both the gradients and the forward output values, and although researchers are still working out exactly why they help, they have repeatedly been shown to make deep networks easier to train.
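The residual idea can be sketched numerically with a toy stand-in (not the actual Keras layers):

```python
import numpy as np

# A residual block computes y = f(x) + x: the module only has to learn a
# correction to the identity, and the "+ x" shortcut gives gradients a
# direct path around the non-linear part.
def module(x, weight):
    return np.maximum(0.0, weight * x)  # toy stand-in for conv + ReLU

x = np.array([1.0, -2.0, 3.0])
y = module(x, weight=0.5) + x  # the residual (shortcut) addition
```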

    img_input = layers.Input(shape=input_shape)

    x = layers.Conv2D(32, (3, 3),
                      strides=(2, 2),
                      use_bias=False,
                      name='block1_conv1')(img_input)
    x = layers.BatchNormalization(name='block1_conv1_bn')(x)
    x = layers.Activation('relu', name='block1_conv1_act')(x)
    x = layers.Conv2D(64, (3, 3),
                      use_bias=False,
                      name='block1_conv2')(x)
    x = layers.BatchNormalization(name='block1_conv2_bn')(x)
    x = layers.Activation('relu', name='block1_conv2_act')(x)

 

When constructing our residual layers, we look ahead to the next planned module to choose some of our parameters. We will use 128 filters to match the filters of the next module's layers, but the residual branch uses a much smaller 1×1 kernel for its convolution window. We also want to define the padding for the residual layers: the default value is "valid", but for the residual layers we want "same".

    residual = layers.Conv2D(128, (1, 1),
                             strides=(2, 2),
                             padding='same',
                             use_bias=False)(x)
    residual = layers.BatchNormalization()(residual)

 

Now we can start moving into our next module block. From this point on, we will use SeparableConv2D in place of Conv2D to construct our convolution layers. SeparableConv2D first performs a depthwise spatial convolution, which acts separately on each input channel, followed by a pointwise (1×1) convolution that mixes the resulting output channels. The filters again double in number, while we keep the kernel size the same. From here on we also want to ensure that we are not dropping inputs, so we define the padding as "same". Each convolution is again followed by BatchNormalization and a ReLU activation.

Following the layout of the diagram, each module is composed of two SeparableConv2D layers with the same number of filters. After the second separable convolution, we use MaxPooling2D in place of the activation layer.
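One way to see why separable convolutions are attractive is to count weights. The arithmetic below compares a standard 3×3 convolution against its depthwise-separable equivalent at the 128-filter size used in this module (bias terms omitted to match use_bias=False, and Keras's depth_multiplier left at its default of 1):

```python
# Weight counts for one 3x3 layer with 128 input and 128 output channels
# (no bias terms, matching use_bias=False; depth_multiplier of 1):
k, c_in, c_out = 3, 128, 128

regular = k * k * c_in * c_out     # standard Conv2D: 147,456 weights
depthwise = k * k * c_in           # one kxk filter per input channel
pointwise = c_in * c_out           # 1x1 conv that mixes the channels
separable = depthwise + pointwise  # SeparableConv2D: 17,536 weights
```

The separable version uses well under an eighth of the weights here, which is the efficiency the Xception architecture is built around.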

    x = layers.SeparableConv2D(128, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block2_sepconv1')(x)
    x = layers.BatchNormalization(name='block2_sepconv1_bn')(x)
    x = layers.Activation('relu', name='block2_sepconv2_act')(x)
    x = layers.SeparableConv2D(128, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block2_sepconv2')(x)
    x = layers.BatchNormalization(name='block2_sepconv2_bn')(x)
    x = layers.MaxPooling2D((3, 3),
                            strides=(2, 2),
                            padding='same',
                            name='block2_pool')(x)
    x = layers.add([x, residual])

 

This module is then followed by another residual layer, again looking ahead to the next module for its number of filters.

    residual = layers.Conv2D(256, (1, 1),
                             strides=(2, 2),
                             padding='same',
                             use_bias=False)(x)
    residual = layers.BatchNormalization()(residual)

 

At this point, we can rinse and repeat, following the layout of the diagram. All the convolution layers use the same kernel size, and the number of filters doubles with every subsequent module until we reach the Middle Flow.

    x = layers.Activation('relu', name='block3_sepconv1_act')(x)
    x = layers.SeparableConv2D(256, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block3_sepconv1')(x)
    x = layers.BatchNormalization(name='block3_sepconv1_bn')(x)
    x = layers.Activation('relu', name='block3_sepconv2_act')(x)
    x = layers.SeparableConv2D(256, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block3_sepconv2')(x)
    x = layers.BatchNormalization(name='block3_sepconv2_bn')(x)
    x = layers.MaxPooling2D((3, 3),
                            strides=(2, 2),
                            padding='same',
                            name='block3_pool')(x)
    x = layers.add([x, residual])

 

The next residual layer is created with the number of filters being determined by the next module.

    residual = layers.Conv2D(728, (1, 1),
                             strides=(2, 2),
                             padding='same',
                             use_bias=False)(x)
    residual = layers.BatchNormalization()(residual)

 

At the end of this last module in the Entry Flow, we add the two tensors together and complete the flow. Now we can move on to construct our Middle Flow.

    x = layers.Activation('relu', name='block4_sepconv1_act')(x)
    x = layers.SeparableConv2D(728, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block4_sepconv1')(x)
    x = layers.BatchNormalization(name='block4_sepconv1_bn')(x)
    x = layers.Activation('relu', name='block4_sepconv2_act')(x)
    x = layers.SeparableConv2D(728, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block4_sepconv2')(x)
    x = layers.BatchNormalization(name='block4_sepconv2_bn')(x)
    x = layers.MaxPooling2D((3, 3),
                            strides=(2, 2),
                            padding='same',
                            name='block4_pool')(x)
    x = layers.add([x, residual])

Middle Flow

The easiest way to construct the Middle Flow is with a for loop creating the eight repeated modules with their residuals. At the start of each iteration, we save the current state of the flow into the residual variable, then go on to create the repeated module. Every module in the Middle Flow has the same number of filters, so we do not need to keep increasing them on each repeat. At the end of each iteration, we add the saved residual to the output of the newly created module.

    for i in range(8):
        residual = x
        prefix = 'block' + str(i + 5)

        x = layers.Activation('relu', name=prefix + '_sepconv1_act')(x)
        x = layers.SeparableConv2D(728, (3, 3),
                                   padding='same',
                                   use_bias=False,
                                   name=prefix + '_sepconv1')(x)
        x = layers.BatchNormalization(name=prefix + '_sepconv1_bn')(x)
        x = layers.Activation('relu', name=prefix + '_sepconv2_act')(x)
        x = layers.SeparableConv2D(728, (3, 3),
                                   padding='same',
                                   use_bias=False,
                                   name=prefix + '_sepconv2')(x)
        x = layers.BatchNormalization(name=prefix + '_sepconv2_bn')(x)
        x = layers.Activation('relu', name=prefix + '_sepconv3_act')(x)
        x = layers.SeparableConv2D(728, (3, 3),
                                   padding='same',
                                   use_bias=False,
                                   name=prefix + '_sepconv3')(x)
        x = layers.BatchNormalization(name=prefix + '_sepconv3_bn')(x)
        x = layers.add([x, residual])

 

Now that we have the middle layer constructed with the eight repetitions, we can add the final residual layer to wrap around the entire Middle Flow. This final residual layer is outside of the above for loop and will be referenced by the next residual layer created in the Exit Flow.

    residual = layers.Conv2D(1024, (1, 1),
                             strides=(2, 2),
                             padding='same',
                             use_bias=False)(x)
    residual = layers.BatchNormalization()(residual)

Exit Flow

Here in the Exit Flow, we will create the final two modules and complete the model. The first separable convolution of the Exit Flow keeps the 728 filters used in the last Entry Flow module and throughout the Middle Flow, while the second increases to 1024; all the other parameters stay the same. This module also caps off our residual connections with the final layers.add() call.

    x = layers.Activation('relu', name='block13_sepconv1_act')(x)
    x = layers.SeparableConv2D(728, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block13_sepconv1')(x)
    x = layers.BatchNormalization(name='block13_sepconv1_bn')(x)
    x = layers.Activation('relu', name='block13_sepconv2_act')(x)
    x = layers.SeparableConv2D(1024, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block13_sepconv2')(x)
    x = layers.BatchNormalization(name='block13_sepconv2_bn')(x)
    x = layers.MaxPooling2D((3, 3),
                            strides=(2, 2),
                            padding='same',
                            name='block13_pool')(x)
    x = layers.add([x, residual])

 

For the second-to-last convolution layer, we increase the number of filters to 1536 while keeping all the other parameters the same.

    x = layers.SeparableConv2D(1536, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block14_sepconv1')(x)
    x = layers.BatchNormalization(name='block14_sepconv1_bn')(x)
    x = layers.Activation('relu', name='block14_sepconv1_act')(x)

 

In the last convolution layer, we will do one final increase in the number of filters.

    x = layers.SeparableConv2D(2048, (3, 3),
                               padding='same',
                               use_bias=False,
                               name='block14_sepconv2')(x)
    x = layers.BatchNormalization(name='block14_sepconv2_bn')(x)
    x = layers.Activation('relu', name='block14_sepconv2_act')(x)

 

With GlobalAveragePooling2D, we collapse the spatial dimensions, averaging each feature map down to a single value to produce our final feature vector. At this point, there is the option of adding more fully connected layers on top.

    x = layers.GlobalAveragePooling2D()(x)
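What this pooling step does can be sketched with NumPy (an assumption for illustration, not the Keras internals):

```python
import numpy as np

# GlobalAveragePooling2D averages each feature map over its full height
# and width, turning (batch, H, W, channels) into (batch, channels).
features = np.arange(2 * 3 * 3 * 4, dtype=float).reshape(2, 3, 3, 4)
pooled = features.mean(axis=(1, 2))  # shape (2, 4)
```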

 

The Model can then be defined with img_input as its input and x, the stack of layers we have created, as its output.

    model = models.Model(img_input, x, name='xception')
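As a sanity check on the finished model, we can trace the spatial downsampling by hand. Assuming the standard 299×299 Xception input size (the code above leaves input_shape up to the caller), only block1_conv1 and the four stride-2 max pools shrink the grid, giving 10×10 feature maps going into the global pooling:

```python
import math

def out_valid(size, kernel, stride):
    # Conv2D with the default padding='valid' (block1's two convs).
    return (size - kernel) // stride + 1

def out_same(size, stride):
    # Any stride-2 layer with padding='same'.
    return math.ceil(size / stride)

size = 299                    # standard Xception input size (assumption)
size = out_valid(size, 3, 2)  # block1_conv1, stride 2 -> 149
size = out_valid(size, 3, 1)  # block1_conv2           -> 147
size = out_same(size, 2)      # block2_pool            -> 74
size = out_same(size, 2)      # block3_pool            -> 37
size = out_same(size, 2)      # block4_pool            -> 19
size = out_same(size, 2)      # block13_pool           -> 10
```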

 

Reference

Chollet, F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions.

 
