Aerial Image Segmentation with Deep Learning on PyTorch

Aerial Image Labeling addresses a core topic in remote sensing: the automatic pixel-wise labelling of aerial imagery. The UNet leads to more advanced design in Aerial Image Segmentation. Future updates will gradually apply those methods to this repository.

I created the Github Repo used only one sample (kitsap11.tif ) from the public dataset (Inria Aerial Image Labelling ) to demonstrate the power of deep learning.

The original sample has been preprocessed into 1000×1000 with a 1.5-meter resolution. The following image shows the models prediction on the RGB images.

Programming details are updated on Github repos.

Dataset Features

The Inria Aerial Image Labeling addresses a core topic in remote sensing: the automatic pixel-wise labelling of aerial imagery (link to paper). Coverage of 810 km² (405 km² for training and 405 km² for testing) Aerial orthorectified colour imagery with a spatial resolution of 0.3 m Ground truth data for two semantic classes: building and not building (publicly disclosed only for the training subset) The images cover dissimilar urban settlements, ranging from densely populated areas (e.g., San Francisco’s financial district) to alpine towns (e.g,. Lienz in Austrian Tyrol).

Instead of splitting adjacent portions of the same images into the training and test subsets, different cities are included in each of the subsets. For example, images over Chicago are included in the training set (and not on the test set) and images over San Francisco are included on the test set (and not on the training set). The ultimate goal of this dataset is to assess the generalization power of the techniques: while Chicago imagery may be used for training, the system should label aerial images over other regions, with varying illumination conditions, urban landscape and time of the year.

The dataset was constructed by combining public domain imagery and public domain official building footprints.

Notification

The full data set is about 21 GB. In this repo, I select the following image as examples:

  • RGB: AerialImageDataset/train/images/kitsap11.tif (75MB)
  • GT: AerialImageDataset/train/gt/kitsap11.tif (812KB)

The original *.tif (GeoTIFF) image can be converted to a png image with the following code and the gdal package.

Reference

Github: https://github.com/cuicaihao/aerial-image-segmentation

Google Publish A Survey Paper of Efficient Transformers

In this paper, the authors propose a taxonomy of efficient Transformer models, characterizing them by the technical innovation and primary use case.

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of “X-former” models have been proposed – Reformer, Linformer, Performer, Longformer, to name a few – which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.

In this paper, the authors propose a taxonomy of efficient Transformer models, characterizing them by the technical innovation and primary use case. Specifically, they review Transformer models that have applications in both language and vision domains, attempting to consolidate the literature across the spectrum. They also provide a detailed walk-through of many of these models and draw connections between them.

Paper Link: Efficient Transformers: A Survey

In the section 2, authors reviewed the background of the well-established Transformer architecture. Transformers are multi-layered architectures formed by stacking Transformer blocks on top of one another.

I really like the 2.4 section, when the authors summarised the the differences in the mode of usage of the Transformer block. Transformers can primarily be used in three ways, namely:

  1. Encoder-only (e.g., for classification)
  2. Decoder-only (e.g., for language modelling, GPT2/3)
  3. Encoder-decoder (e.g., for machine translation)

In section 3, they provide a high-level overview of efficient Transformer models and present a characterization of the different models in the taxonomy with respect to core techniques and primary use case. This is the core part of this paper covering 17 different papers’ technical details.

Summary of Efficient Transformer Models presented in chronological order of their first public disclosure.

In the last section, authors address the state of research pertaining to this class of efficient models on model evaluation, design trends, and more discussion on orthogonal efficiency effort, such as Weight Sharing, Quantization / Mixed precision, Knowledge Distillation, Neural Architecture Search (NAS) and Task Adapters.

In sum, this is a really good paper summarised all the important work around the Transformer model. It is also a good reference for researcher and engineering to be inspired and try these techniques for different models in their own projects.

FYI, here is my early post The Annotated Transformer: English-to-Chinese Translator with source code on GitHub, which is an “annotated” version of the 2017 Transformer paper in the form of a line-by-line implementation to build an English-to-Chinese translator via PyTorch ML framework. 

-END-

Reference:

Efficient Transformers: A Survey (https://arxiv.org/abs/2009.06732)

Annotated-Transformer-English-to-Chinese-Translator

The Transformer from “Attention is All You Need” has been on a lot of people’s minds since 2017.

In this repo, I present an “annotated” version of the Transformer Paper in the form of a line-by-line implementation to build an English-to-Chinese translator with PyTorth deep learning framework.

Visit my blog for details and more background: https://cuicaihao.com/the-annotated-transformer-english-to-chinese-translator/ or visit my GitHub for the Jupyter Notebook (Annotated_Transformer_English_to_Chinese_Translator)

Street View Image Segmentation with PyTorch and Facebook Detectron2 (CPU+GPU)

In this post, I would like to share my practice with Facebook’s new Detectron2 package on macOS without GPU support for street view panoptic segmentation.  If you want to create the following video by yourself, this post is all you need. This demo video clip is from my car’s dashcam footages from Preston, Melbourne. I used the PyTorch and Detectron2 to create this video with segmentation masks.

Continue reading “Street View Image Segmentation with PyTorch and Facebook Detectron2 (CPU+GPU)”

How to Build an Artificial Intelligent System (II)

This post is following upgrade with respect to the early post How to Build an Artificial Intelligent System (I) The last one is focused on introducing the six phases of the building an intelligent system, and explaining the details of the Problem Assesment phase.

In the following content, I will address the rest phases and key steps during the building process.  Readers can download the keynotes here: Building an Intelligent System with Machine Learning.

Continue reading “How to Build an Artificial Intelligent System (II)”

How to Build an Artificial Intelligent System (I)

Phase 1: Problem assessment – Determine the problem’s characteristics.

What is an intelligent system?

The process of building Intelligent knowledge-based system has been called knowledge engineering since the 80s. It usually contains six phases: 1. Problem assessment; 2. Data and knowledge acquisition; 3. Development of a prototype system; 4. Development of a complete system; 5. Evaluation and revision of the system; 6. Integration and maintenance of the system [1].

Continue reading “How to Build an Artificial Intelligent System (I)”

Roads from Above: Augmenting Civil Engineering & Geospatial Workflows with Machine Learning

Road from Above is partly based on my Australia Postgraduate Intern Projects (Computer Vision and Machine Learning for Feature Extraction) within Aureon Group in Melbourne.

Aurecon’s experts, across Cape Town, Melbourne and Auckland offices, have been teamed up to develop and test approaches that capture and validate new and existing measurements of the metropolitan road network. Due to the confidentiality, we reduced the resolutions of the aerial images and only opened limited results on the public domain at https://roadsfromabove.netlify.com/. Thanks to Greg More, the design of this website got the best feedback from the workshop (Visualization for AI Explainability) of the IEEE VIS 2018 conference in Berlin, Germany

Visualization for AI Explainability: Projections and Dimensionality Reduction. The goal of this workshop is to initiate a call for “explainable” that explain how AI techniques work using visualizations. We believe the VIS community can leverage their expertise in creating visual narratives to bring new insight into the often obfuscated complexity of AI systems.

Road from Above

Continue reading “Roads from Above: Augmenting Civil Engineering & Geospatial Workflows with Machine Learning”

MacOS X: Installing TensorFlow from Sources [TF Binary Attached]

When I am using TensorFlow on my MacBook Air, I always get annoyed by the warnings comes from nowhere, so I followed the documentation below to build TensorFlow sources into a TensorFlow binary and installed it successfully.  In theory, this will make the TF running faster on my machine.

Here is the document:

If you are a Mac user, you could download the TF binary from here:

Then, you could use conda to initialize an environment with Python=3.6 and install TF by typing:

sudo pip install tensorflow-1.8.0-py2-none-any.whl

Continue reading “MacOS X: Installing TensorFlow from Sources [TF Binary Attached]”

A Taste of TensorFlow on My Android Phone (III)

This is the 3rd post about my implementation of TensorFlow Apps on my Android Phone.

This time I fixed one small bug in the app of  “TF Detect” so the object tracking function could work. The project is compiled by cmake with NDK Archives in this version. You can download the new “apk files here:  Tensorflow_Demo_Debug.apk.

“Once the app is installed it can be started via the “TF Classify”, “TF Detect”, “TF Stylize”, and “TF Speech” icons, which have the orange TensorFlow logo as their icon. 

Continue reading “A Taste of TensorFlow on My Android Phone (III)”

%d bloggers like this: