
Introduction¶

Human Perception of Sound¶

Dataset¶

Building¶

Who are the users?
What do they need?
What task are they trying to solve?
How do they interact with the system?
- Distance
- Environment
  - Background Noise
  - Reverb
Quality Control
- Keep only recordings that a human can understand
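
One way to cover the background-noise condition listed above is to mix recorded noise into clean clips at a chosen signal-to-noise ratio. The following is a minimal sketch, not a method prescribed by these notes: the function name `mix_at_snr` and the synthetic signals are assumptions made for illustration. Reverb is not covered here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has roughly the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    # Repeat or truncate the noise so it matches the speech length.
    noise = np.resize(np.asarray(noise, dtype=np.float64), speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return speech.copy()  # nothing to mix in
    # Pick the scale so that speech_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

# Hypothetical stand-ins for a spoken word and a background recording.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = 0.5 * np.sin(2 * np.pi * 300 * t)
background = rng.standard_normal(4000)
noisy = mix_at_snr(clean, background, snr_db=10.0)
```

Lower `snr_db` values simulate a noisier room. Any clip that a human can no longer understand after mixing would be dropped under the quality-control rule above.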

Industry-Standard¶

Google Speech Commands dataset
- Recorded as individual words, not sentences
- 1000-4000 examples of each word

Good Characteristics of Model¶


Volume Invariance
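
A simple way to approximate volume invariance is to rescale every clip before it reaches the model. The sketch below uses peak normalization, which is only one option (RMS normalization is another). The helper name `peak_normalize` and the test tone are assumptions for illustration.

```python
import numpy as np

def peak_normalize(signal, eps=1e-9):
    """Scale a waveform so that its largest absolute sample is 1.0."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    # eps avoids dividing by zero for an all-silent clip.
    return signal / max(peak, eps)

# The same hypothetical 440 Hz tone recorded loud and quiet.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
loud = 0.9 * np.sin(2 * np.pi * 440 * t)
quiet = 0.05 * np.sin(2 * np.pi * 440 * t)

# After normalization the two versions are numerically the same input.
same_after_normalizing = np.allclose(peak_normalize(loud), peak_normalize(quiet))
```

Normalizing the input makes the network see identical features for loud and quiet versions of a word. The alternative is to train on clips at many volumes so the model learns the invariance itself.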

Pre-Processing¶

Which aspects of the signal should you send to the neural network?

Align on start point
Normalization of amplitude
Denoise
Convert to frequencies, using Fast Fourier transform
1. Extract features
2. Sliding window
Cut on end point
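
The steps above can be sketched with NumPy as follows. This is an illustrative outline under stated assumptions, not the exact pipeline these notes have in mind: the function names, the 2% relative threshold, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at 16 kHz) are all choices made for the example. Denoising is omitted, and the start and end points are trimmed in a single step for brevity.

```python
import numpy as np

def trim_silence(signal, rel_threshold=0.02):
    """Align on the start point and cut on the end point using a magnitude threshold."""
    peak = np.max(np.abs(signal)) if signal.size else 0.0
    active = np.flatnonzero(np.abs(signal) > rel_threshold * peak)
    if active.size == 0:
        return signal[:0]
    return signal[active[0]:active[-1] + 1]

def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram from a sliding Hann window and a real FFT."""
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def preprocess(signal, frame_len=400, hop=160):
    """Trim, normalize the amplitude, then convert to frequencies."""
    signal = trim_silence(np.asarray(signal, dtype=np.float64))
    peak = np.max(np.abs(signal)) if signal.size else 0.0
    if peak > 0:
        signal = signal / peak
    return spectrogram(signal, frame_len, hop)

# Hypothetical clip: silence, a 1 kHz tone, then silence again, sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate // 2) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
silence = np.zeros(sample_rate // 4)
features = preprocess(np.concatenate([silence, tone, silence]))
# features has one row per frame and frame_len // 2 + 1 frequency bins per row.
```

Each row of `features` is the spectrum of one window position, so the array as a whole is the spectrogram that a network would receive.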

Word	Volume	Waveform	Spectrogram	MFCC
Yes	Loud
Yes	Quiet
No	Loud
No	Quiet

Mel Filterbanks¶
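
A mel filterbank groups FFT bins into overlapping triangular bands that are spaced evenly on the mel scale, which is narrow at low frequencies and wide at high frequencies. The sketch below uses the common conversion formula mel = 2595 * log10(1 + f / 700). The function names, the 26 filters, and the 400-point FFT at 16 kHz are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=400, sample_rate=16000):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    # Band edges are equally spaced in mel, then mapped back to Hz.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # The triangle is the lower of the two slopes, clipped at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

# Apply the bank to the power spectrum of one hypothetical 400-sample frame.
bank = mel_filterbank()
frame = np.random.default_rng(0).standard_normal(400)
power_spectrum = np.abs(np.fft.rfft(frame)) ** 2
log_mel = np.log(bank @ power_spectrum + 1e-10)
```

MFCCs are obtained by taking a discrete cosine transform of log mel energies such as `log_mel` and keeping the first few coefficients.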

Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow
