Google Publishes a Survey Paper on Efficient Transformers

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of “X-former” models have been proposed – Reformer, Linformer, Performer, Longformer, to name a few – which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.

In this paper, the authors propose a taxonomy of efficient Transformer models, characterizing them by the technical innovation and primary use case. Specifically, they review Transformer models that have applications in both language and vision domains, attempting to consolidate the literature across the spectrum. They also provide a detailed walk-through of many of these models and draw connections between them.

Paper Link: Efficient Transformers: A Survey

In Section 2, the authors review the background of the well-established Transformer architecture. Transformers are multi-layered architectures formed by stacking Transformer blocks on top of one another.
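To make the "stacking" idea concrete, here is a minimal NumPy sketch of an encoder-style Transformer block (single-head self-attention plus a position-wise feed-forward network, each wrapped in a residual connection and layer normalization) applied several times in a row. The random weights, the omission of layer-norm gain/bias, and the helper names (`init_block`, `transformer_block`, `transformer_encoder`) are assumptions for illustration, not the survey's or any library's implementation.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalise each token vector to zero mean and unit variance
    # (learned gain and bias are omitted in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def init_block(rng, d_model, d_ff):
    # Hypothetical random weights; a trained model would learn these.
    scale = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(0, scale, (d_model, d_model)),
        "wk": rng.normal(0, scale, (d_model, d_model)),
        "wv": rng.normal(0, scale, (d_model, d_model)),
        "w1": rng.normal(0, scale, (d_model, d_ff)),
        "w2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
    }


def transformer_block(x, p):
    # Single-head scaled dot-product self-attention.
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(x.shape[-1])
    x = layer_norm(x + softmax(scores) @ v)      # residual + norm
    # Position-wise feed-forward network with ReLU.
    ff = np.maximum(0.0, x @ p["w1"]) @ p["w2"]
    return layer_norm(x + ff)                    # residual + norm


def transformer_encoder(x, blocks):
    # "Stacking" feeds each block's output into the next block.
    for p in blocks:
        x = transformer_block(x, p)
    return x


rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_layers = 6, 16, 32, 4
tokens = rng.normal(size=(seq_len, d_model))
blocks = [init_block(rng, d_model, d_ff) for _ in range(n_layers)]
out = transformer_encoder(tokens, blocks)
print(out.shape)  # (6, 16)
```

Every block maps a (sequence length, model width) array to an array of the same shape, which is what makes it possible to stack an arbitrary number of them.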

I really like Section 2.4, where the authors summarise the differences in the modes of usage of the Transformer block. Transformers can primarily be used in three ways, namely:

  1. Encoder-only (e.g., for classification)
  2. Decoder-only (e.g., for language modelling, GPT-2/GPT-3)
  3. Encoder-decoder (e.g., for machine translation)
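The difference between these modes comes down largely to how attention is masked and wired. An encoder lets every token attend to every other token; a decoder used for language modelling applies a causal mask so that position i only sees positions up to i; an encoder-decoder model adds cross-attention, where decoder queries attend over the encoder's outputs. The NumPy sketch below illustrates all three under simplifying assumptions: the function name `attention` is hypothetical, the inputs are random, and the query/key/value projection matrices are omitted.

```python
import numpy as np


def attention(q, k, v, causal=False):
    # Scaled dot-product attention; q: (n_q, d), k and v: (n_k, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Hide future positions: entries above the diagonal get -inf.
        n_q, n_k = scores.shape
        future = np.triu(np.ones((n_q, n_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))           # 5 tokens, 8-dim embeddings
x_changed = x.copy()
x_changed[-1] += 10.0                 # perturb only the last token

# Encoder-only: bidirectional self-attention, so earlier outputs also change.
enc_a = attention(x, x, x)
enc_b = attention(x_changed, x_changed, x_changed)
print(np.allclose(enc_a[0], enc_b[0]))

# Decoder-only: causal self-attention, so earlier outputs are untouched.
dec_a = attention(x, x, x, causal=True)
dec_b = attention(x_changed, x_changed, x_changed, causal=True)
print(np.allclose(dec_a[:-1], dec_b[:-1]))  # True

# Encoder-decoder: decoder queries attend over encoder outputs (cross-attention).
y = rng.normal(size=(3, 8))           # 3 target-side tokens
cross = attention(y, enc_a, enc_a)
print(cross.shape)                    # (3, 8)
```

Because masked scores become exactly zero weights after the softmax, changing the last token cannot affect any earlier decoder output, which is what allows a decoder-only model to be trained on next-token prediction.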

In Section 3, the authors provide a high-level overview of efficient Transformer models and characterize the different models in the taxonomy with respect to their core techniques and primary use case. This is the core of the paper, covering the technical details of 17 different papers.
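Much of the motivation behind these models is that standard self-attention builds an n × n score matrix, so time and memory grow quadratically with the sequence length n. As a rough illustration of one technique among those surveyed, the sketch below follows the idea behind Linformer: project keys and values along the sequence axis from n rows down to a fixed r rows, so the score matrix is only n × r. The projection matrices are random stand-ins for learned parameters, and the names (`low_rank_attention`, `proj_k`, `proj_v`) are hypothetical; treat it as a sketch of the idea, not Linformer's reference implementation.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def full_attention(q, k, v):
    # Standard attention: the score matrix is (n, n).
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def low_rank_attention(q, k, v, proj_k, proj_v):
    # Linformer-style attention: compress keys/values from n rows to r rows.
    # proj_k and proj_v have shape (r, n), so the score matrix becomes (n, r).
    k_small = proj_k @ k
    v_small = proj_v @ v
    scores = q @ k_small.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_small, scores.shape


rng = np.random.default_rng(0)
n, d, r = 1024, 64, 128          # sequence length, head size, projected length
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
# Hypothetical random projections standing in for learned parameters.
proj_k = rng.normal(0, 1 / np.sqrt(n), size=(r, n))
proj_v = rng.normal(0, 1 / np.sqrt(n), size=(r, n))

out, score_shape = low_rank_attention(q, k, v, proj_k, proj_v)
print(out.shape, score_shape)    # (1024, 64) (1024, 128)
full = full_attention(q, k, v)   # same output shape, via a (1024, 1024) score matrix
print(full.shape)                # (1024, 64)
```

The output has the same shape as full attention, but the result is an approximation: with r fixed, the cost of the score matrix grows linearly rather than quadratically in n.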

Summary of Efficient Transformer Models presented in chronological order of their first public disclosure.

In the last section, the authors discuss the state of research on this class of efficient models, covering model evaluation and design trends, and discuss orthogonal efficiency efforts such as weight sharing, quantization/mixed precision, knowledge distillation, Neural Architecture Search (NAS) and task adapters.
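Of the orthogonal efforts listed above, quantization is perhaps the easiest to show in a few lines. The sketch below applies simple symmetric, per-tensor int8 post-training quantization to a weight matrix: weights are scaled so that the largest magnitude maps to 127, rounded to 8-bit integers, and scaled back when used. The scheme and the function names `quantize_int8` and `dequantize` are assumptions chosen for illustration; production toolkits typically offer more refined options such as per-channel scales and calibration.

```python
import numpy as np


def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, q.nbytes)            # 262144 65536: 4x smaller storage
max_err = np.abs(w - w_hat).max()
print(max_err <= 0.501 * scale)      # True: rounding error is about half a step at most
```

Storage drops by 4x relative to float32, at the cost of a bounded rounding error per weight; whether that error is acceptable depends on the model and task.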

In sum, this is a really good paper that summarises the important work around efficient Transformer models. It is also a good reference for researchers and engineers looking for inspiration to try these techniques in their own projects.

FYI, here is my earlier post, The Annotated Transformer: English-to-Chinese Translator, with source code on GitHub. It is an “annotated” version of the 2017 Transformer paper in the form of a line-by-line implementation that builds an English-to-Chinese translator with the PyTorch framework.





The Transformer from “Attention is All You Need” has been on a lot of people’s minds since 2017.

In this repo, I present an “annotated” version of the Transformer paper in the form of a line-by-line implementation that builds an English-to-Chinese translator with the PyTorch deep learning framework.

Visit my blog for details and more background, or visit my GitHub for the Jupyter Notebook (Annotated_Transformer_English_to_Chinese_Translator).

Street View Image Segmentation with PyTorch and Facebook Detectron2 (CPU+GPU)

In this post, I share my experience using Facebook’s new Detectron2 package on macOS, without GPU support, for street-view panoptic segmentation. If you want to create the following video yourself, this post is all you need. The demo clip comes from my car’s dashcam footage of Preston, Melbourne. I used PyTorch and Detectron2 to create this video with segmentation masks.

Continue reading “Street View Image Segmentation with PyTorch and Facebook Detectron2 (CPU+GPU)”

How to Build an Artificial Intelligent System (II)

This post follows up on my earlier post, How to Build an Artificial Intelligent System (I). That post introduced the six phases of building an intelligent system and explained the details of the Problem Assessment phase.

Below, I address the remaining phases and the key steps of the building process. Readers can download the keynotes here: Building an Intelligent System with Machine Learning.

Continue reading “How to Build an Artificial Intelligent System (II)”

How to Build an Artificial Intelligent System (I)

Phase 1: Problem assessment – Determine the problem’s characteristics.

What is an intelligent system?

The process of building an intelligent knowledge-based system has been called knowledge engineering since the 1980s. It usually consists of six phases [1]:

  1. Problem assessment
  2. Data and knowledge acquisition
  3. Development of a prototype system
  4. Development of a complete system
  5. Evaluation and revision of the system
  6. Integration and maintenance of the system

Continue reading “How to Build an Artificial Intelligent System (I)”

Roads from Above: Augmenting Civil Engineering & Geospatial Workflows with Machine Learning

Roads from Above is partly based on my Australian Postgraduate Intern projects (Computer Vision and Machine Learning for Feature Extraction) at Aurecon Group in Melbourne.

Aurecon’s experts across the Cape Town, Melbourne and Auckland offices teamed up to develop and test approaches that capture and validate new and existing measurements of the metropolitan road network. Due to confidentiality, we reduced the resolution of the aerial images and released only limited results publicly on the project website. Thanks to Greg More, the design of this website received the best feedback at the Visualization for AI Explainability workshop of the IEEE VIS 2018 conference in Berlin, Germany.

Visualization for AI Explainability: Projections and Dimensionality Reduction. The goal of this workshop is to initiate a call for “explainables” that explain how AI techniques work using visualizations. We believe the VIS community can leverage their expertise in creating visual narratives to bring new insight into the often obfuscated complexity of AI systems.

Road from Above

Continue reading “Roads from Above: Augmenting Civil Engineering & Geospatial Workflows with Machine Learning”

MacOS X: Installing TensorFlow from Sources [TF Binary Attached]

When I use TensorFlow on my MacBook Air, I am always annoyed by warnings that seem to come from nowhere, so I followed the documentation below to build TensorFlow from source into a TensorFlow binary, and installed it successfully. In theory, this should make TF run faster on my machine.

Here is the document:

If you are a Mac user, you could download the TF binary from here:

Then you can use conda to create an environment with Python 3.6 and, with that environment activated, install TF by typing:

pip install tensorflow-1.8.0-py2-none-any.whl

Continue reading “MacOS X: Installing TensorFlow from Sources [TF Binary Attached]”

A Taste of TensorFlow on My Android Phone (III)

This is the third post about my implementation of TensorFlow apps on my Android phone.

This time I fixed a small bug in the “TF Detect” app so that the object tracking function works. In this version, the project is compiled with CMake using the NDK archives. You can download the new APK file here: Tensorflow_Demo_Debug.apk.

“Once the app is installed it can be started via the ‘TF Classify’, ‘TF Detect’, ‘TF Stylize’, and ‘TF Speech’ icons, which have the orange TensorFlow logo as their icon.”

Continue reading “A Taste of TensorFlow on My Android Phone (III)”

Sharing My Data Science Notebook (Python & TensorFlow) on GitHub

Python is a great general-purpose programming language on its own, but with the help of a few popular libraries (Numpy, SciPy, Matplotlib, TensorFlow) it becomes a powerful environment for scientific computing and data analysis.

I have started sharing my notebooks on learning Python and TensorFlow. Here is my GitHub repository: Data_Science_Python.





Continue reading “Sharing My Data Science Notebook (Python & TensorFlow) on GitHub”