Building a general usable framework for computer vision task competitions

Winter 2020

Qishen Ha

Machine Learning Engineer

As an HP Z Data Science Global Ambassador, Qishen Ha's content is sponsored and he was provided with HP products.

Hi, my name is Qishen Ha. I currently work as a Machine Learning Engineer at LINE Corp. and am the 5th-ranked Kaggle Grandmaster in the world. I focus mainly on computer vision problems such as image classification, semantic segmentation and object detection.

I am very honored to be an HP Z Data Science Global Ambassador, and I am very grateful to HP Z for giving me this opportunity and providing me with the HP Z8 G4 workstation and ZBook Studio. This has made me more competitive in Kaggle competitions.

Today I want to talk about the code framework I use for competitions, in particular computer vision competitions.

My Public Notebooks

Over the years I've made a number of CNN training notebooks publicly available, either as a baseline model or as a minimal version of the top solution (because the full version is too large in terms of code and training).

If you have read any of these notebooks, you will have seen that although they use different data and train for different tasks, they all share the same basic framework. This is what I call a generic framework. With such a framework, when we encounter a new competition, we can train a new baseline model in the shortest possible time, and it is also very easy to improve or maintain it afterwards.

I will now summarize the framework I used in these notebooks and introduce each module. This is of course the framework I am used to, and I find it very easy to work with. If you already have a framework that you are comfortable with, there is no need to copy mine exactly; just use it as a source of ideas.

Introducing my framework

These are the basic generic modules in my framework:

Dataset
Augmentation
Model
Loss Function

I will now introduce them one by one.

Dataset

Dataset defines how we read the data, how we pre-process the data, how we read the labels and how we deal with them. This picture shows one of the most basic code structures of a dataset.

This is a simple image classification task. We use cv2 to read the image into memory, pass it through augmentation and pre-processing, and finally return the processed image and its label.

This is a very generic code style and requires only very minor modifications when we need to adapt it to the image segmentation task. The following picture shows the most important modifications.

Lastly, we adjust the data type and dimensions of the mask and return it in place of row.label.
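The mask adjustment could look roughly like the sketch below. The helper name `prepare_mask` is hypothetical, and the sketch assumes a single-channel binary mask in which any non-zero pixel is foreground; the original notebooks may encode masks differently.

```python
import numpy as np
import torch


def prepare_mask(mask):
    """Convert an HxW uint8 mask into a 1xHxW float32 tensor of 0s and 1s."""
    mask = (mask > 0).astype(np.float32)  # data type: uint8 -> float32 in {0, 1}
    mask = mask[None, :, :]               # dimension: HxW -> 1xHxW
    return torch.from_numpy(mask)
```

Inside `__getitem__`, the mask would be read next to the image (for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`), augmented together with it, and returned as `prepare_mask(mask)` where the classification version returned the label.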

This way we can easily modify the dataset, read the data we want, pre-process it as we wish and so on.

Augmentation

You may have noticed that in the dataset there is a parameter called “transforms” which contains the augmentation methods that we will use, and these methods are defined in the section on augmentation.

This picture shows a simple definition of augmentations. In training we use horizontal flip and resize, while in validation we only use resize.


If we want to add more augmentation methods to this, we can simply add to it, as shown below.


Like this, we have added random rotation and blur to the training process.

Model

In this subsection we need to define the structure of the model. Let's again take the simplest example - the model structure for the image classification task - as a reference.

Typically, in an image classification task, we create an ImageNet pre-trained model, such as EfficientNet, as a backbone, remove its original 1000-class linear layer (ImageNet is a 1000-class dataset), and add our own linear layer with n classes, as shown in this figure.

If we wanted to add dropout before the final FC layer, we could simply write it like the following.
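One way to sketch this in PyTorch is a small head module that applies dropout to the backbone features before the linear layer. The class name and the dropout probability are hypothetical.

```python
import torch
import torch.nn as nn


class DropoutHead(nn.Module):
    def __init__(self, in_features, n_classes, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)
        self.fc = nn.Linear(in_features, n_classes)

    def forward(self, features):
        # Dropout is active only in training mode; in eval mode it is a no-op.
        return self.fc(self.dropout(features))
```

In a full model, `forward` would become `self.head(self.backbone(x))`, with everything else unchanged.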

Another common scenario is that the input image may not be RGB 3 channels, but 4 channels or more. In this case we can change the input of the first convolution layer of the backbone to what we want, as in the image below.

Here n_ch is the number of input channels. By writing it like this, we not only change the number of input channels to what we want, but also keep using the ImageNet pretrained weights for the first conv layer.
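One common way to do this, sketched below under the assumption of a standard (non-grouped) PyTorch `nn.Conv2d`, is to repeat the existing kernel weights along the input-channel axis and slice them to `n_ch`. The helper name is hypothetical and this is not necessarily the exact code from the figure.

```python
import torch
import torch.nn as nn


def expand_input_channels(conv, n_ch):
    """Return a copy of `conv` that accepts n_ch input channels.

    The existing weights are tiled along the input-channel axis, so for a
    3-channel conv and n_ch=4 the new channels reuse kernels [0, 1, 2, 0].
    """
    repeats = n_ch // conv.in_channels + 1
    weight = conv.weight.detach().repeat(1, repeats, 1, 1)[:, :n_ch]

    new_conv = nn.Conv2d(
        n_ch, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.copy_(weight)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```

The returned layer then replaces the backbone's first convolution; where that layer lives (its attribute name) depends on the backbone implementation.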

Loss Function

The easiest way to define a Loss Function is as follows. This is also the most common way.
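For a classification task in PyTorch, this usually amounts to a single line. The sketch below assumes cross entropy, and the tensors are dummy values for illustration.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.zeros(2, 3)      # raw model outputs: 2 samples, 3 classes
targets = torch.tensor([0, 2])  # ground-truth class indices
loss = criterion(logits, targets)
```

The training loop only ever calls `criterion(logits, targets)`, so the loss can be swapped without changing anything else.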


However, it is also very easy to make it more complex, such as the following.


In this loss function, we use cross entropy loss for the first four outputs and BCE loss for the others, and add loss weights to balance the two losses. This makes the logic more complex but does not require much code change. We used this loss to win first place in the RANZCR competition.
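A rough sketch of a loss with this shape in PyTorch is given below. It assumes that the targets are a float tensor whose first four columns are a one-hot encoding of a single 4-way class and whose remaining columns are independent binary labels, and it uses `BCEWithLogitsLoss` so that raw logits can be passed in. The default weights are placeholders, not the values used in the competition.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
bce = nn.BCEWithLogitsLoss()


def criterion(logits, targets, loss_weights=(1.0, 1.0)):
    # First four outputs: mutually exclusive classes, one-hot in targets.
    loss_ce = ce(logits[:, :4], targets[:, :4].argmax(dim=1))
    # Remaining outputs: independent binary labels.
    loss_bce = bce(logits[:, 4:], targets[:, 4:])
    # Weighted sum balances the two terms.
    return loss_weights[0] * loss_ce + loss_weights[1] * loss_bce
```

Because the function keeps the same `criterion(logits, targets)` signature as the simple version, the training loop does not need to change.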

Conclusion

These are the four basic modules of my framework, all of which are designed to be very easy to extend. Combined they form the framework that I use. When using this framework for experiments, I keep a notebook for each experiment, which is useful for analyzing the results and reproducing them.

For more information, you can go to my Kaggle homepage and find the notebooks I've shared, and I'm sure you'll find more useful information in these notebooks.
