TinyML¶
Rather than adding more compute power, the goal is to improve compute efficiency.
The course mainly focuses on three application areas: speech, computer vision, and NLP.
Overview¶
- Hardware
- Architecture & Dataflow
- Metrics and Analysis
- Efficiency
- Micro-architecture/Circuits
- Model Optimization
- Quantization
- Pruning
- Knowledge distillation
- AutoML
- Software
- Domain-specific compilers, e.g. TVM
- Kernel implementations
- Mapping onto hardware
- Systems
- Distributed training
- Federated learning
- Pre/Post Processing
- Environmental issues
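As a small preview of the model-optimization topics above, the sketch below shows symmetric int8 post-training quantization: every float weight is mapped to an 8-bit integer using a single scale factor, shrinking storage by roughly 4x versus float32 at the cost of some rounding error. The helper names are illustrative, not from PyTorch or any library.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]

weights = [0.42, -1.3, 0.07, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered weight lies within half a quantization step of the original.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

Pruning and knowledge distillation trade accuracy for efficiency in a similar spirit: pruning zeroes out low-magnitude weights, while distillation trains a small model to mimic a larger one.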
Pre-Requisites¶
- Computer architecture
- Machine Learning
- Python programming
- PyTorch Basics
Reading¶
References¶
- Machine Learning Hardware and Systems (Cornell Tech, Spring 2022)
- Videos
- Material
- TinyML and Efficient Deep Learning Computing | EfficientML.ai - MIT HAN Lab
- Tiny Machine Learning | UPenn
- AutoDL | Applied Deep Learning
Current Video¶
https://www.youtube.com/watch?v=5_qVob2Vwf8&list=PL0mFAhrXqy9CuopJhAB8GVu_Oy7J0ery6&index=10