The deep learning field needs optimized inference workloads more than ever, and Hidet aims to meet that need. Hidet is an open-source deep learning compiler written in Python and developed by a dedicated team of engineers at CentML Inc. It compiles deep neural network (DNN) models from formats such as PyTorch and ONNX into efficient CUDA kernels, primarily targeting NVIDIA GPUs, which are known for their prowess in computational tasks.
Hidet grew out of the research presented in the paper “Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs”. The paper highlights Hidet’s central goal: decreasing inference latency for deep learning models. Reducing latency is crucial for serving models well across a range of platforms, from cloud-based services to resource-constrained edge devices.
Hidet’s development was driven by the realization that crafting efficient tensor programs for deep learning operations is a complex undertaking. This is primarily due to the complicated nature of contemporary accelerators such as NVIDIA GPUs and Google TPUs, and the rapidly growing set of operator types only amplifies this complexity. While existing deep learning compilers like Apache TVM rely on declarative scheduling primitives, Hidet adopts a different strategy.
Hidet embeds the scheduling process within the tensor programs themselves and introduces dedicated mappings termed ‘task mappings’. These task mappings let developers specify directly in the tensor program how computations are assigned to parallel workers and in what order they run. With this, the compiler permits fine-grained adjustments at the program-statement level, greatly expanding the space of expressible optimizations. This technique, known as the task-mapping programming paradigm, is unique to Hidet.
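The flavor of task mappings can be illustrated with a toy model in plain Python. This is a conceptual sketch only: the names `spatial` and `repeat` mirror the mappings described in the Hidet paper, but the real primitives are used inside Hidet’s compiled tensor programs, not as ordinary Python functions.

```python
# Conceptual sketch of task mappings (illustrative, not Hidet's actual API).
# A mapping is modeled as (num_workers, task_grid_shape, assign_fn), where
# assign_fn(worker_id) lists the (i, j) tasks that worker executes.

def spatial(m, n):
    """m*n tasks mapped one-to-one onto m*n parallel workers."""
    assign = lambda w: [(w // n, w % n)]
    return (m * n, (m, n), assign)

def repeat(m, n):
    """A single worker iterates over all m*n tasks itself."""
    assign = lambda w: [(i, j) for i in range(m) for j in range(n)]
    return (1, (m, n), assign)

def compose(outer, inner):
    """outer * inner: each outer task expands into a full inner task grid."""
    (w_out, (m_out, n_out), f_out) = outer
    (w_in, (m_in, n_in), f_in) = inner
    def assign(w):
        return [(io * m_in + ii, jo * n_in + ji)
                for (io, jo) in f_out(w // w_in)
                for (ii, ji) in f_in(w % w_in)]
    return (w_out * w_in, (m_out * m_in, n_out * n_in), assign)

# 4 workers, each covering its own 2x2 tile of a 4x4 task grid --
# analogous to CUDA threads each computing a small output sub-tile.
workers, shape, assign = compose(spatial(2, 2), repeat(2, 2))
```

Composing mappings this way is what lets a single tensor program express a tiling scheme statement by statement, instead of describing it through separate scheduling directives.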
Moreover, Hidet implements a post-scheduling fusion optimization, which automates operator fusion after scheduling has been done. This lets developers concentrate on scheduling individual operators while greatly reducing the engineering effort that fusion would otherwise require. It also yields an effective hardware-centric schedule space that is agnostic to input size, which drastically cuts down tuning time.
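The idea behind post-scheduling fusion can be sketched in a few lines of plain Python. This is a hedged illustration, not Hidet’s API: `scheduled_matmul`, `fuse`, and the `epilogue` hook are hypothetical names standing in for an already-scheduled kernel that later absorbs a surrounding elementwise operator at write-back.

```python
# Conceptual sketch of post-scheduling fusion (illustrative, not Hidet's API).

def scheduled_matmul(a, b, epilogue=lambda v: v):
    """An already-'scheduled' matmul whose write-back applies a fused epilogue."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = epilogue(acc)  # fused elementwise op applied here
    return c

def fuse(kernel, elementwise_op):
    """Attach an elementwise op to a kernel AFTER it has been scheduled."""
    return lambda a, b: kernel(a, b, epilogue=elementwise_op)

relu = lambda v: max(v, 0.0)
fused_matmul_relu = fuse(scheduled_matmul, relu)
```

The point is that the matmul schedule is written once, and fusing a trailing ReLU requires no rescheduling: the elementwise op simply rides along at the output write.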
Hidet has been evaluated extensively on modern convolution and transformer models. In these evaluations it outperformed several state-of-the-art DNN inference frameworks, including ONNX Runtime and the TVM compiler equipped with the AutoTVM and Ansor schedulers. On average, Hidet delivered a 1.22x speedup, with a maximum performance gain of 1.48x in certain tests.
Hidet also shines in tuning cost, reducing tuning time by 20x compared to AutoTVM and by 11x compared to Ansor. As Hidet continues to advance, it is setting new benchmarks for efficiency and performance in deep learning compilation. By combining task mappings with post-scheduling fusion, Hidet has the potential to become a critical tool for developers looking to push the limits of deep learning model serving.