
Show HN: Larq – Binarized Neural Network Inference with MLIR and TFLite


Larq Compute Engine (LCE) is a highly optimized inference engine for deploying
extremely quantized neural networks, such as
Binarized Neural Networks (BNNs). It currently supports various mobile platforms
and has been benchmarked on a Pixel 1 phone and a Raspberry Pi.
LCE provides a collection of hand-optimized TensorFlow Lite
custom operators for supported instruction sets, developed in inline assembly or in C++
using compiler intrinsics. LCE leverages optimization techniques
such as tiling to maximize the number of cache hits, vectorization to maximize
computational throughput, and multi-threading to take advantage of the
multi-core CPUs of modern desktop and mobile devices.

Key Features

  • Effortless end-to-end integration from training to deployment:

    • Tight integration of LCE with Larq and
      TensorFlow provides a smooth end-to-end training and deployment experience.

    • A collection of Larq pre-trained BNN models for common machine learning tasks
      is available in Larq Zoo
      and can be used out-of-the-box with LCE.

    • LCE provides a custom MLIR-based model converter which
      is fully compatible with TensorFlow Lite and performs additional
      network-level optimizations for Larq models.

  • Lightning-fast deployment on a variety of mobile platforms:

    • LCE enables high-performance, on-device machine learning inference by
      providing hand-optimized kernels and network-level optimizations for BNN models.

    • LCE currently supports ARM64-based mobile platforms such as Android phones
      and Raspberry Pi boards.

    • LCE supports thread parallelism, which is essential for fast inference on the
      multi-core CPUs of modern mobile devices.

Performance

The table below presents the single-threaded performance of Larq Compute Engine on
different versions of a novel BNN model called Quicknet (trained on the ImageNet
dataset, soon to be released in Larq Zoo), measured on a Pixel 1 phone (2016)
and a Raspberry Pi 4 Model B (BCM2711) board:

Model                  Top-1 Accuracy   RPi 4 B, ms (1 thread)   Pixel 1, ms (1 thread)
Quicknet (.h5)         58.3 %           60.5                     27.9
Quicknet-Large (.h5)   62.5 %           89.9                     41.8

For reference, dabnn (the other main BNN library) reports an inference time of 61.3 ms for Bi-RealNet (56.4% accuracy) on the Pixel 1 phone,
while LCE achieves an inference time of 54.0 ms for Bi-RealNet on the same device.
They furthermore present a modified version, BiRealNet-Stem, which achieves the same accuracy of 56.4% in 43.2 ms.

The following table presents the multi-threaded performance of Larq Compute Engine
on a Pixel 1 phone and a Raspberry Pi 4 Model B (BCM2711) board:

Model                  Top-1 Accuracy   RPi 4 B, ms (4 threads)   Pixel 1, ms (4 threads)
Quicknet (.h5)         58.3 %           37.9                      19.1
Quicknet-Large (.h5)   62.5 %           55.8                      28.0

Benchmarked on February 14th, 2020 with the LCE custom
TFLite Model Benchmark Tool,
using BNN models with randomized weights and inputs.

Getting started

Follow these steps to deploy a BNN with LCE:

  1. Pick a Larq model

    You can use Larq to build and train your own model or pick a pre-trained model from Larq Zoo (see the model-definition sketch after these steps).

  2. Convert the Larq model

    LCE is built on top of TensorFlow Lite and uses the TensorFlow Lite FlatBuffer format to convert and serialize Larq models for inference. We provide an LCE Converter with additional optimization passes that speed up the execution of Larq models on the supported target platforms (see the conversion sketch after these steps).

  3. Build LCE

    The LCE documentation provides the build instructions for Android and ARM64-based boards such as Raspberry Pi. Please follow the provided instructions to create a native LCE build or cross-compile for one of the supported targets.

  4. Run inference

    LCE uses the TensorFlow Lite Interpreter to perform inference. In addition to the built-in TensorFlow Lite operators, optimized LCE operators are registered with the interpreter to execute the Larq-specific subgraphs of the model. The LCE documentation provides an example of creating and building an LCE-compatible TensorFlow Lite interpreter for your own applications (see the inference sketch after these steps).
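
To make step 1 concrete, here is a minimal sketch of a small binarized model built with Larq's Keras-compatible layers. The architecture and hyperparameters are illustrative only and are not taken from this post:

```python
import tensorflow as tf
import larq as lq

# Quantization settings shared by the binarized layers: binarize both
# activations and weights with the straight-through-estimator sign
# function, and clip the latent full-precision weights to [-1, 1].
kwargs = dict(
    input_quantizer="ste_sign",
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
)

model = tf.keras.models.Sequential([
    # First layer: keep the inputs in full precision and binarize
    # only the weights, as is common practice for BNNs.
    lq.layers.QuantConv2D(
        32, (3, 3),
        kernel_quantizer="ste_sign",
        kernel_constraint="weight_clip",
        use_bias=False,
        input_shape=(28, 28, 1),
    ),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.BatchNormalization(scale=False),
    lq.layers.QuantConv2D(64, (3, 3), use_bias=False, **kwargs),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.BatchNormalization(scale=False),
    tf.keras.layers.Flatten(),
    lq.layers.QuantDense(10, use_bias=False, **kwargs),
    tf.keras.layers.Activation("softmax"),
])

# Larq's summary reports the memory footprint and the number of
# binarized vs. full-precision operations per layer.
lq.models.summary(model)
```

The model then trains like any other Keras model via model.compile(...) and model.fit(...).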
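For step 2, the conversion is a single call. A sketch, assuming the `convert_keras_model` entry point exposed by the `larq_compute_engine` Python package (check the LCE release you have installed for the exact API):

```python
from larq_compute_engine import convert_keras_model

# Convert the trained Keras/Larq model into a TensorFlow Lite
# FlatBuffer; the MLIR-based converter applies its additional
# network-level optimization passes during this step.
flatbuffer_bytes = convert_keras_model(model)

# Serialize the FlatBuffer to disk for deployment on the target device.
with open("binary_model.tflite", "wb") as f:
    f.write(flatbuffer_bytes)
```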
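Finally, for step 4, a sketch of running inference from Python, assuming an LCE interpreter wrapper with the custom operators registered is available at `larq_compute_engine.tflite.python.interpreter` in your build; on-device applications would instead use the C++ TensorFlow Lite interpreter with the LCE operators registered, as described in the documentation:

```python
import numpy as np
from larq_compute_engine.tflite.python.interpreter import Interpreter

# Load the converted FlatBuffer into an interpreter that has the
# optimized LCE custom operators registered alongside the built-in
# TensorFlow Lite operators.
with open("binary_model.tflite", "rb") as f:
    interpreter = Interpreter(f.read())

# Run inference on a random input; the shape matches the sketch
# from step 1 and is purely illustrative.
dummy_input = np.random.uniform(-1, 1, size=(1, 28, 28, 1)).astype(np.float32)
predictions = interpreter.predict(dummy_input)
print(predictions.shape)  # e.g. (1, 10)
```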
