Cloud Platform &
HW Accelerator for Edge AI

Develop on Cloud -> Deploy at the Edge

Universal Inference Runtime

Develop for the edge in minutes, not months.

DeGirum’s Universal Inference Runtime, DgRT, abstracts model inference to a function call that is agnostic to the HW, the model, the connectivity (local vs. remote), and the OS (Windows and Linux). DgRT lets edge developers make API calls similar to cloud APIs.

Install our PySDK

Runtime Features

DgRT handles the pre-processing of the model input as well as the post-processing of the output, freeing developers to focus on the application SW.

DgRT’s easy-to-use asynchronous inference call provides higher performance by pipelining the pre-processing and post-processing steps and batching the inputs as needed.

DgRT multiplexes the HW for multiple models and clients efficiently and maximizes the HW utilization and performance.
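The performance benefit of pipelining pre-processing, inference, and post-processing can be sketched in plain Python. This is an illustrative sketch, not DgRT's actual implementation or API: the three stage functions below are toy placeholders, and concurrency comes from a standard thread pool so that different inputs occupy different stages at the same time.

```python
# Illustrative pipelining sketch (not DgRT's actual API): with a worker pool,
# consecutive frames overlap across the pre-process / infer / post-process
# stages instead of running strictly one after another.
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):
    # placeholder for resize/normalize of the model input
    return frame * 2

def infer(tensor):
    # placeholder for running the model on the accelerator
    return tensor + 1

def postprocess(raw):
    # placeholder for decoding the model output
    return raw - 1

def run_pipeline(frames, workers=4):
    def run_one(frame):
        return postprocess(infer(preprocess(frame)))
    # map() keeps results in input order while frames are processed concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, frames))

print(run_pipeline([1, 2, 3]))  # [2, 4, 6]
```

In a real runtime the same idea extends to batching: while one batch occupies the accelerator, the next batch is already being pre-processed on the host CPU.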


Prerequisites: Python 3.8 must be installed on your system and configured as the default Python installation.

To install DeGirum PySDK from the DeGirum index server use the following command:

   python -m pip install degirum --extra-index-url

Quick Start

   import degirum as dg

   # For details, check the documentation
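The pattern DgRT exposes is: load a model once, then treat inference as an ordinary function call regardless of where the model runs. The sketch below illustrates that abstraction in generic Python; the class, function, and model names are hypothetical and are not the actual PySDK API.

```python
# Illustrative sketch of "inference as a function call" behind a uniform
# interface; names here are hypothetical, not the actual PySDK API.
class Model:
    def __init__(self, name, backend):
        self.name = name          # model identifier
        self.backend = backend    # local accelerator or remote server

    def __call__(self, data):
        # The caller never deals with HW, OS, or connectivity details;
        # the runtime dispatches to the backend chosen at load time.
        return self.backend(data)

def load_model(name, backend="local"):
    # Select an implementation behind the same callable interface.
    backends = {
        "local":  lambda data: f"{name} ran locally on {data}",
        "remote": lambda data: f"{name} ran remotely on {data}",
    }
    return Model(name, backends[backend])

model = load_model("mobilenet_v2", backend="local")  # hypothetical model name
result = model("frame.jpg")                          # inference is one call
print(result)
```

Switching from local to remote inference changes only the `backend` argument, not the application code that calls the model, which is the point of a connectivity-agnostic runtime.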

Model Designer

DeGirum’s Model Designer, DgGraph, is a GUI-based tool to design and visualize machine learning models. Designing, compiling, and optimizing models for different edge hardware is a difficult and time-consuming task. DgGraph solves this problem by ensuring that the model is defined in a way that is compatible with downstream tools for compilation and optimization. The model code is auto-generated and self-contained in a single file, making it easier to debug and customize.

Launch the tool!

Designer Features

DgGraph provides a drag-and-drop interface to add layers to a model, with layer parameters customizable in the GUI. It also offers pre-defined subnets (frequently used layer blocks) for compact model definition, and a library of popular models that can serve as templates for new designs. Models designed in DgGraph can be exported as Keras models and used in TensorFlow.

DeGirum Edge AI Accelerator

DeGirum’s first-generation AI accelerator, ORCA, is a flexible and powerful hardware AI accelerator with up to 4 TOPS of inference capacity. ORCA’s ability to efficiently process pruned networks, combined with dedicated DRAM, enables it to multiplex multiple models without sacrificing accuracy or performance. Early samples are now available.

Products: M.2 Boards, Dongles, AI Boxes

Accelerator Features

ORCA gives application developers the ability to create rich, sophisticated, and highly functional products at a power and price point suitable for the edge. ORCA is powered by a highly efficient compute architecture, near-lossless ML model context switching, and support for pruned models. ORCA’s support for DRAM allows multiplexing of different ML models, enabling developers to service scenarios that go beyond simple point applications. ORCA’s ability to process pruned models effectively multiplies its compute and bandwidth resources, allowing larger, more accurate models to deliver real-time, cloud-like quality applications on the edge.


Upcoming edge HW solutions optimize the hardware for models that fit on the chip; cost is driven down by removing support for expensive DRAM. As a result, these solutions perform poorly on models exceeding a certain size and when multiple models are used. Edge HW solutions that do offer the flexibility to run multiple models rely on very high-bandwidth memories that increase system cost and power. In contrast, ORCA solves the data movement problem by supporting pruned models, which also reduce compute requirements. By supporting DRAM and using pruned models, ORCA can run large models and switch between models with little impact on performance.

Model Pruning

It is well known that neural networks are heavily over-parameterized and that pruned deep networks outperform their dense, shallow counterparts. Exploiting pruned networks provides benefits along multiple dimensions: smaller model size, reduced compute load, lower memory bandwidth, and lower energy.
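The benefits listed above can be seen in a toy example of magnitude pruning: weights below a threshold are zeroed, and execution skips the zeroed multiply-accumulates. This is a minimal sketch of the general idea behind pruned-network execution, not a description of AutoSkip's actual mechanism; the functions and numbers are illustrative.

```python
# Toy magnitude pruning: zero out small weights, then skip zeros at compute
# time. Illustrative only; not DeGirum's AutoSkip implementation.
def prune(weights, threshold):
    # Keep a weight only if its magnitude meets the threshold.
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparse_dot(weights, inputs):
    # Skip multiply-accumulates for zero weights: less compute and
    # less memory bandwidth for the same layer.
    return sum(w * x for w, x in zip(weights, inputs) if w != 0.0)

weights = [0.9, 0.05, -0.7, 0.01, 0.3]
pruned = prune(weights, threshold=0.1)      # [0.9, 0.0, -0.7, 0.0, 0.3]
sparsity = pruned.count(0.0) / len(pruned)  # fraction of skipped MACs: 0.4

inputs = [1.0, 1.0, 1.0, 1.0, 1.0]
print(pruned, sparsity, sparse_dot(pruned, inputs))
```

At 40% sparsity, 40% of the multiplications and weight fetches disappear; real pruned networks commonly reach much higher sparsity, which is where the compute and bandwidth multiplication comes from.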

DeGirum’s AutoSkip compute technology provides optimal support for pruned networks. This allows us to run deeper networks which deliver higher quality and performance.

Contact Us