Mask R-CNN - Single Class Instance Segmentation Tutorial

Hey everyone. I finally have time to write a blog post on Mask R-CNN, which I'm using for a dental project. Mask R-CNN is a deep learning model for instance segmentation: it detects each object in an image and predicts a pixel-level mask for it.

As for how Mask R-CNN works, my apprentice Mansour made a very nice slide:

But if you’re reading this, you’re probably looking to get the repo running, right? Let’s start by cloning the original repo:

It has great documentation on what Mask R-CNN is and how it works, but getting it running wasn’t as straightforward. Let’s go step by step.

Preparing the Environment

Unfortunately the original repo still uses TensorFlow 1, so it took some time to get everything sorted out. The default repo had the following requirements.txt:

TensorFlow is the most troublesome to get right; once you have it installed, the rest of the libraries should play well with each other. If not, you can simply pip uninstall / install to update them. I’ve run the training on two machines.
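Before digging into version conflicts, it can help to see what is actually installed in the active environment. Here is a minimal sketch that tries to import each package and reports its version. The package names in the default list are my assumption, so swap in whatever your requirements.txt lists, and note that `module_version` and `report` are helper names made up for this post.

```python
import importlib


def module_version(name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report(packages=("tensorflow", "keras", "numpy")):
    """Build one human-readable line per package (the default names are assumptions)."""
    lines = []
    for name in packages:
        version = module_version(name)
        lines.append(f"{name}: {version if version is not None else 'not installed'}")
    return lines
```

Printing `"\n".join(report())` from the same Python environment you launch Jupyter from shows at a glance whether you are really on TensorFlow 1.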


For a GPU:

If you’re not using that GPU configuration, you can check from here:

Running an Example

The balloon example is the best one for single-class segmentation. Navigate to samples > balloon.

Read the README thoroughly. Download the .h5 and the dataset and place them in the base folder, then run through the two Jupyter notebooks (inspect_balloon_data.ipynb & inspect_balloon_model.ipynb). Make sure you've changed the paths in the notebooks to reflect where you put the .h5 and the dataset. We’re not running any training yet, but it’s good to know the basic repo code is working.
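A quick way to catch path mistakes before opening the notebooks is to check that every file and folder you are about to point at exists. This is a minimal sketch: `missing_paths` is a helper made up for this post, and the paths are placeholders for wherever you put the .h5 and the dataset.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, as strings."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Placeholder locations: replace them with your own weights file and dataset folders.
    to_check = ["path/to/weights.h5", "path/to/dataset/train", "path/to/dataset/val"]
    for path in missing_paths(to_check):
        print("Not found:", path)
```

If nothing is printed, every path resolved from the directory you ran the script in, which is also the directory the relative paths in the notebooks are resolved from.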

Once that’s done, try running the training command.

If the environment is set up correctly, it should train a new .h5 file. Running on GPU took me around 1.5 hours on the default configuration (30 epochs of 100 steps) with an RTX 2060. You can reduce this by going to the file and lowering STEPS_PER_EPOCH and VALIDATION_STEPS.
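To get a feel for how much those settings buy you, here is a rough back-of-the-envelope sketch. It assumes training time scales linearly with epochs times steps and ignores the time spent on validation, so treat the result as a ballpark figure only. Both function names are made up for illustration.

```python
def seconds_per_step(total_hours, epochs, steps_per_epoch):
    """Average time per training step, derived from one measured run."""
    return total_hours * 3600 / (epochs * steps_per_epoch)


def estimate_hours(epochs, steps_per_epoch, step_seconds):
    """Projected training time in hours for a different configuration."""
    return epochs * steps_per_epoch * step_seconds / 3600


if __name__ == "__main__":
    # Measured run from this post: 1.5 hours for 30 epochs of 100 steps.
    per_step = seconds_per_step(1.5, epochs=30, steps_per_epoch=100)
    print(f"about {per_step:.1f} s per step")
    # Halving STEPS_PER_EPOCH should roughly halve the training time.
    print(f"about {estimate_hours(30, 50, per_step):.2f} hours with 50 steps per epoch")
```

Fewer steps per epoch also means the model sees less data per epoch, so a quicker run is not automatically an equally good one.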

Ran into an error? Refer to the section at the bottom. If not, congratulations! Try rerunning the inspect_balloon_model notebook, but this time point BALLOON_WEIGHTS_PATH to the newly created .h5 file and run through the cells.

Creating Your Own (Single Class) Instance Segmentation

Honestly, once you’ve gotten the balloon training working you can relax, since that’s the most troublesome part. Modifying the code for your own purposes isn’t as difficult. I’ve modified mine for a dental project, with the first phase being to segment teeth from the background.

Start off by duplicating the balloon folder. You’ll mostly be modifying the file. You can change every “balloon” to “teeth” or whatever you’re using it for.

To train and test on your own dataset, first divide the images into train and val folders. Refer to the balloon dataset for how they're structured. Mask R-CNN doesn't mind images of different resolutions.
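If your images currently sit in one folder, the split can be scripted. This is a minimal sketch, not something from the repo: `split_dataset`, the 80/20 ratio and the list of image extensions are all my own assumptions. It copies rather than moves, so the originals stay untouched.

```python
import random
import shutil
from pathlib import Path

# Assumed image extensions; extend this set if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_dataset(source_dir, dest_dir, val_fraction=0.2, seed=0):
    """Copy images from source_dir into dest_dir/train and dest_dir/val.

    Returns the number of images copied into each split.
    """
    source, dest = Path(source_dir), Path(dest_dir)
    images = sorted(p for p in source.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_fraction)
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for name, files in splits.items():
        target = dest / name
        target.mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.copy2(image, target / image.name)
    return {name: len(files) for name, files in splits.items()}
```

For example, `split_dataset("all_images", "dataset")` on a folder of 10 images would put 8 in dataset/train and 2 in dataset/val. Do the split before annotating, since each folder gets its own annotation file.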

For the annotator, you can either modify the code to ingest the output of your own annotation tool, or use the tool they used, VIA (the VGG Image Annotator). If you’re going to use theirs, then you should use VIA < 1.6. We used this one: ~vgg/software/via/via-1.0.6.html

You’d want the annotated csv to have this output format:

Export as a JSON file and make sure the generated output follows that convention, where each instance has its own number (running the file through a JSON formatter makes it much easier to read). Note that train and val each need their own via_region_data.json file.
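To sanity-check an exported file, you can pretty-print it and count the polygons per image. The sketch below assumes the layout I believe the balloon sample reads: a top-level dict with one entry per image, each holding a `filename` and a `regions` dict keyed by "0", "1", and so on, whose `shape_attributes` carry `all_points_x` and `all_points_y`. If your export looks different, adjust the keys. The function names are made up for this post.

```python
import json


def count_instances(annotations):
    """Map each image filename to the number of polygon regions annotated on it."""
    summary = {}
    for entry in annotations.values():
        regions = entry.get("regions", {})
        # Assumed VIA 1.x layout: regions is a dict keyed by "0", "1", ...
        # Some exports store a list instead, so handle both.
        if isinstance(regions, dict):
            regions = list(regions.values())
        polygons = [r for r in regions if "all_points_x" in r.get("shape_attributes", {})]
        summary[entry["filename"]] = len(polygons)
    return summary


def pretty(annotations):
    """Return the annotations as indented, human-readable JSON text."""
    return json.dumps(annotations, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Run this from a folder that contains the exported annotation file.
    with open("via_region_data.json") as handle:
        data = json.load(handle)
    for filename, n in count_instances(data).items():
        print(f"{filename}: {n} instance(s)")
```

An image that reports 0 instances is worth a second look, since it usually means it was skipped during annotation.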

Once you have the dataset prepared, simply run the

command, and it should start training a new model for you. If successful, you'll have a new .h5 file in your logs folder! Hurrah, you’ve got Mask R-CNN working.

Some stuff you can get from the repo:

Instance Segmentation

Feature Map


Hopefully you’ll never need to consult this section, but these are the common mistakes we found when I had my apprentice run the code on his machine.

Running the Jupyter notebook / training script from a different directory than expected. Do NOT install mrcnn from pip or manually install a wheel; mrcnn is already in the base directory with its own specific set of scripts.

  • Read the Readmes thoroughly. The original repo provided almost everything you need to run their examples, from .h5 to dataset. Are you sure you have everything?
  • Double check on how you point to the dataset or weights. Make sure their directories are pointing to the right folder.
  • Try running the balloon training from the balloon folder.
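Since most of these problems come down to Python not finding the repo's own mrcnn folder, one option is to locate the base directory explicitly instead of relying on where the script was launched from. This is a minimal sketch, assuming the package folder is named `mrcnn` and sits directly in the base directory. `find_repo_root` is a hypothetical helper, not part of the repo.

```python
import sys
from pathlib import Path


def find_repo_root(start, marker="mrcnn"):
    """Walk upwards from `start` until a directory containing `marker` is found.

    Returns the matching directory, or None if no parent contains it.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    return None


if __name__ == "__main__":
    root = find_repo_root(Path.cwd())
    if root is None:
        print("Could not find an mrcnn folder above", Path.cwd())
    else:
        # Put the repo's own copy first so it wins over any installed package.
        sys.path.insert(0, str(root))
        print("Using mrcnn from", root)
```

Running this from inside the balloon folder should report the base directory of the repo. If it reports nothing, you are probably in a different checkout or folder than you think.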

Running on CPU, are you? The first step / epoch took me a long time too, around 30 minutes. A full training run (30 epochs of 100 steps) on CPU would take about 4 days non-stop. Switch to a GPU; the same run took me around 1.5 hours there.
