DeGirum Simplifies Edge AI Development for Everyone

Powerful, Flexible & Affordable Edge AI Hardware Accelerator
Lightning Fast Application Development with Intuitive Software
Instant Access to Edge AI Hardware through Cloud Platform
Request Live Demo

  • ORCA™ AI accelerator IC
      - Flexible | Powerful | Affordable
      - Efficient model multiplexing
      - Vision models | Speech models
      - Large models | Small models
      - Pruned models | Dense models
      - Floating point models | Quantized models
  • PySDK Development Kit
      - Intuitive APIs
      - Model pipelining
      - Images | Camera feeds | Audio streams | Video files
  • DeLight™ Cloud Platform
      - Access to edge AI hardware
      - Prototype applications in minutes
      - Try Now Buy Later
      - Deploy locally by changing only one line in the code
Same Code Everywhere

DeGirum's PySDK is designed to let developers create AI applications in the cloud without having to invest time and money in buying and setting up HW. Once the application is developed, it can be deployed at the edge by changing only one line of code. The APIs work with a model zoo object, which is a collection of models served by an AI server. PySDK supports connecting to AI servers in different locations through a unified API.
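The "change only one line" idea above can be sketched as follows. This is an illustrative mock of a unified client API, not actual PySDK code: the names `connect`, `ModelZoo`, and `load_model` are assumptions made for this sketch.

```python
# Illustrative mock of the "same code everywhere" pattern: the application
# talks to a model zoo object and never cares where the AI server runs.

class Model:
    """Stand-in for a model served by an AI server."""
    def __init__(self, name, server_address):
        self.name = name
        self.server_address = server_address

    def __call__(self, frame):
        # A real client would send `frame` to the server and return
        # predictions; here we just report where inference would run.
        return f"{self.name} inference on {self.server_address}"

class ModelZoo:
    """Collection of models served by one AI server."""
    def __init__(self, server_address):
        self.server_address = server_address

    def load_model(self, name):
        return Model(name, self.server_address)

def connect(server_address):
    """Unified entry point: the address is the only thing that changes
    between cloud prototyping and edge deployment."""
    return ModelZoo(server_address)

# Prototyping against cloud-hosted hardware (address is hypothetical):
zoo = connect("cloud.degirum.com")   # <-- the one line that changes
model = zoo.load_model("mobilenet_v2")
print(model("frame.jpg"))

# Deploying at the edge reuses the same application code:
# zoo = connect("localhost")
```

Because the model object behaves identically regardless of the server's location, the rest of the application is untouched when the deployment target changes.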

DeGirum® Cloud Device Farm

In this option, the AI servers are running on the edge HW hosted by DeGirum in the cloud device farm. We expect this option to be the starting point for our customers.

Peer-to-Peer VPN

The AI server is running on a machine hosted by DeGirum® but is shared with the developer using peer-to-peer VPN. We expect this option to be used by our advanced customers who are developing real-time applications and need dedicated access to AI servers, but do not want to buy the HW yet.

Local Area Network

Developers who have access to machines equipped with AI accelerators can deploy an AI server and make it available to all clients on the same Local Area Network. We expect this option to be used by customers as they get closer to application deployment. In some edge server use cases, this can be the final deployment option.


Localhost

The AI server and the client (which runs the AI application) are running on the same machine. We expect this to be the most common deployment option, in which an on-device AI server runs inference jobs for multiple clients.

Direct Hardware Access

The application SW communicates directly with the HW accelerator, without the server-client protocol. This option extracts the maximum performance from our HW.
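The five serving options above differ only in how the client addresses the AI server; a sketch of that single point of variation is below. The address strings are illustrative assumptions, not documented PySDK syntax.

```python
# Hypothetical address per serving option; every other line of the
# application stays the same across all five.
SERVER_ADDRESSES = {
    "cloud_farm": "cloud.degirum.com",  # DeGirum cloud device farm
    "p2p_vpn":    "10.8.0.2",           # peer-to-peer VPN peer (example IP)
    "lan":        "192.168.1.42",       # AI server on the local network
    "localhost":  "localhost",          # server and client on one machine
    "direct":     "@local",             # direct hardware access, no server
}

def server_address(option):
    """Pick the deployment option; the application code is unchanged."""
    return SERVER_ADDRESSES[option]

print(server_address("cloud_farm"))  # during cloud prototyping
print(server_address("localhost"))   # at edge deployment
```

Moving from cloud prototyping to on-device deployment is then a one-line change of the selected option.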

DeGirum® Edge AI Accelerator

ORCA™, DeGirum's AI accelerator IC, is flexible, efficient, and affordable edge AI hardware. ORCA™ gives application developers the ability to create rich, sophisticated, and highly functional products at the power and price points suitable for the edge. ORCA™ is powered by a highly efficient compute architecture, near-lossless ML model context switching, and support for pruned models. Early samples are now available.

Products: M.2 Boards, USB Dongles, ORCA™ ASIC

Accelerator Features

ORCA's support for DRAM allows the multiplexing of different ML models, enabling developers to service scenarios that go beyond simple single-model applications. ORCA's capability to process pruned models essentially multiplies the compute and bandwidth resources, allowing the processing of larger, more accurate models to enable real-time cloud-like quality applications on the edge.
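The model multiplexing described above can be illustrated with a toy scheduler: with models kept resident in DRAM, the accelerator can interleave inference jobs from different models without reloading weights from the host. The class and method names here are assumptions for this sketch, not the actual ORCA scheduling mechanism.

```python
# Toy illustration of DRAM-backed model multiplexing.

class Accelerator:
    def __init__(self):
        self.resident_models = {}   # models kept in on-board DRAM
        self.context_switches = 0
        self.active = None

    def load(self, name):
        # Loading places weights in DRAM once; later switches are cheap.
        self.resident_models[name] = f"weights({name})"

    def infer(self, name, frame):
        if self.active != name:
            # Switching between DRAM-resident models is the "near lossless"
            # context switch, as opposed to reloading weights from the host.
            self.context_switches += 1
            self.active = name
        return f"{name} -> result({frame})"

acc = Accelerator()
acc.load("detector")
acc.load("classifier")

# Interleave two models on one device, e.g. detect then classify each frame:
for frame in ["f0", "f1"]:
    acc.infer("detector", frame)
    acc.infer("classifier", frame)

print(acc.context_switches)  # → 4
```

A chip without DRAM would instead pay a full weight reload on each of those four switches, which is why single-model-optimized hardware struggles with multi-model applications.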


Upcoming edge HW solutions optimize the hardware for models that fit on the chip; cost is driven down by removing support for expensive DRAM. As a result, these solutions perform poorly on models exceeding a certain size and/or when multiple models are used. On the other hand, edge HW solutions that provide the flexibility to run multiple models do so by using very-high-bandwidth memories that increase system cost and power. In contrast, ORCA™ solves the data movement problem by supporting pruned models, which have the added advantage of also reducing the compute requirements. By supporting DRAM and using pruned models, ORCA™ can run large models as well as switch between models with little impact on performance.

Model Pruning

It is well known that neural networks are heavily over-parameterized and that pruned deep networks outperform their dense, shallow counterparts. Exploiting pruned networks provides benefits along multiple dimensions: smaller model size, reduced compute load, lower memory bandwidth, and lower energy.

DeGirum® AutoSkip compute technology provides optimal support for pruned networks. This allows us to run deeper networks, which deliver higher quality and performance.
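The AutoSkip idea can be illustrated with a toy dot product that skips zero weights, so a pruned layer costs compute proportional to its nonzero weights. This is a conceptual sketch of the principle, not the hardware mechanism.

```python
# Toy illustration: zero weights cost no multiply-accumulates (MACs).

def dot_with_skip(weights, activations):
    """Dot product that skips zero weights, counting the MACs performed."""
    total = 0.0
    macs = 0
    for w, a in zip(weights, activations):
        if w != 0.0:          # skip: a pruned (zero) weight costs nothing
            total += w * a
            macs += 1
    return total, macs

dense  = [0.5, -1.0, 2.0, 0.25]
pruned = [0.5,  0.0, 2.0, 0.0]    # 50% of weights pruned away
acts   = [1.0,  2.0, 3.0, 4.0]

_, dense_macs  = dot_with_skip(dense, acts)
_, pruned_macs = dot_with_skip(pruned, acts)
print(dense_macs, pruned_macs)  # → 4 2
```

The same proportionality applies to the weights fetched from memory, which is why pruning reduces compute load and memory bandwidth together.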

Contact Us