26 Bayesian Learning
Bayes' Theorem¶
\[ \underbrace{P(\theta \vert y)}_{\text{Posterior distribution}} = \frac{ \overbrace{P(y \vert \theta)}^{\text{Likelihood function}} \times \overbrace{P(\theta)}^{\text{Prior distribution}} }{ \underbrace{P(y)}_{\text{Normalizing constant}} } \]
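As a quick numeric sketch (the numbers are made up for illustration): with prior \(P(\theta)=0.3\), likelihood \(P(y \vert \theta)=0.8\), and \(P(y \vert \neg\theta)=0.2\), the posterior follows directly from the rule.

```python
# Bayes' theorem on made-up numbers: P(theta | y) = P(y | theta) P(theta) / P(y)
prior = 0.3           # P(theta), assumed for illustration
likelihood = 0.8      # P(y | theta)
likelihood_not = 0.2  # P(y | not theta)

# P(y) by the law of total probability (the normalizing constant)
evidence = likelihood * prior + likelihood_not * (1 - prior)
posterior = likelihood * prior / evidence
print(posterior)  # 0.24 / 0.38 ≈ 0.632
```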
Hypothesis | Meaning | Formula
---|---|---
Maximum Likelihood | the hypothesis (or class) that best explains the training data | \(h_\text{ML} = \underset{h_i \in H}{\arg \max} \ P(D \vert h_i)\)
Maximum A Posteriori Probability | the hypothesis that is most probable given the training data | \(h_\text{MAP} = \underset{h_i \in H}{\arg \max} \ P(h_i \vert D) = \underset{h_i \in H}{\arg \max} \ P(D \vert h_i) \, P(h_i)\)
\(\arg \max\) returns the argument that achieves the maximum (like the index of the largest element of a list), not the maximum value itself
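A minimal sketch in Python (the hypothesis names and likelihood values are hypothetical): \(\arg \max\) picks the hypothesis, not its score.

```python
# arg max returns the argument achieving the maximum, not the maximum value
likelihoods = {"h1": 0.2, "h2": 0.7, "h3": 0.1}  # hypothetical P(D | h_i)

h_ml = max(likelihoods, key=likelihoods.get)  # arg max over hypotheses
print(h_ml)                       # 'h2' (the argument)
print(max(likelihoods.values()))  # 0.7  (the maximum itself)
```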
Disadvantage¶
We need to calculate \(P(D \vert h_i)\) for every hypothesis in \(H\), which means computing a lot of probabilities when the hypothesis space is large
Bayes Optimal Classifier¶
Given a new instance \(x\)
Consider the set of possible classifications \(V=\{v_1, v_2 \}=\{\oplus, \ominus \}\)
The optimal classifier is given by
\[ \underset{v_j \in V}{\arg \max} \sum_{h_i \in H} \textcolor{hotpink}{P(v_j | h_i)} \ P(h_i | D) \]
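A minimal sketch with made-up posteriors: even though the single most probable hypothesis votes \(\oplus\), the posterior-weighted vote classifies \(x\) as \(\ominus\).

```python
# Bayes optimal classification: arg max_v sum_h P(v | h) P(h | D)
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}  # hypothetical P(h_i | D)
votes = {"h1": "+", "h2": "-", "h3": "-"}       # each h_i's prediction for x

# P(v | h) is 1 when h predicts v and 0 otherwise, so the sum
# simply pools posterior mass behind each class
class_scores = {"+": 0.0, "-": 0.0}
for h, p in posteriors.items():
    class_scores[votes[h]] += p

print(class_scores)                             # {'+': 0.4, '-': 0.6}
print(max(class_scores, key=class_scores.get))  # '-'
```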
Disadvantage¶
Very costly to implement: for every new instance we must compute and sum over the posterior of every hypothesis in \(H\), which means calculating a lot of probabilities
Gibbs Algorithm¶
Suppose we have multiple competing hypotheses
- Choose one hypothesis at random, according to \(P(h|D)\)
- Use it to classify the new instance (sketched below)
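A minimal sketch (same hypothetical posteriors as above): one hypothesis is drawn in proportion to its posterior and used alone to classify.

```python
import random

# Gibbs algorithm: sample ONE hypothesis according to P(h | D), classify with it
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}  # hypothetical P(h_i | D)
votes = {"h1": "+", "h2": "-", "h3": "-"}       # each h_i's prediction for x

h = random.choices(list(posteriors), weights=list(posteriors.values()), k=1)[0]
print(h, votes[h])  # only one hypothesis is evaluated per prediction
```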
Disadvantage¶
Lower accuracy than the Bayes optimal classifier: under certain conditions, the expected misclassification error of the Gibbs algorithm is at most twice that of the Bayes optimal classifier
Naive Bayes¶
Covered in an earlier lecture; the classifier is summarized under Naive Bayes Classification below
Bayesian Belief Network¶
A Bayesian belief network is a directed acyclic graph whose nodes are random variables and whose edges encode direct conditional dependencies; each node stores a conditional probability table \(P(\text{node} \vert \text{parents})\), and the joint distribution factorizes as the product of these tables.
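A minimal sketch of the idea with a made-up two-node network (\(\text{Rain} \to \text{WetGrass}\); the structure and all numbers are hypothetical): the joint factorizes along the graph, and inference is enumeration over that joint.

```python
# Tiny belief network: Rain -> WetGrass (structure and numbers made up)
p_rain = {True: 0.2, False: 0.8}            # P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=True | Rain)

# Chain rule along the graph: P(R, W) = P(R) * P(W | R)
def joint(rain: bool, wet: bool) -> float:
    p_wet = p_wet_given_rain[rain]
    return p_rain[rain] * (p_wet if wet else 1 - p_wet)

# P(Rain=True | WetGrass=True) by enumerating the joint
num = joint(True, True)
den = joint(True, True) + joint(False, True)
print(num / den)  # 0.18 / 0.26 ≈ 0.692
```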
Bayesian Classifier¶
Called a 'Naive' classifier because it assumes all input attributes are conditionally independent given the class. Despite this simplifying assumption, it

- Is empirically proven to work well
- Scales very well
Bayes' Rule¶
\[ P(C | X) = \frac{ P(X|C) \times P(C) }{ P(X) } \]
Posterior depends on
- Likelihood
- Prior
\[ \text{Posterior} = \frac{ \text{Likelihood} \times \text{Prior} }{ \text{Evidence} } \]
MAP Rule¶
**M**aximum **A** **P**osteriori
Helps us decide the class during the test phase
Assign \(x\) to \(c^*\) if \(P(C=c^* \vert X=x) > P(C=c \vert X=x)\) for all \(c \ne c^*\)
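A minimal sketch with hypothetical posteriors for a single test point \(x\):

```python
# MAP rule: assign x to the class with the highest posterior P(C=c | X=x)
posteriors = {"spam": 0.75, "ham": 0.25}  # hypothetical posteriors for one x

c_star = max(posteriors, key=posteriors.get)
print(c_star)  # 'spam', since its posterior beats every other class's
```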
Naive Bayes Classification¶
Calculate the posterior probability under the assumption that all input attributes are conditionally independent given the class, so the likelihood factorizes (sketched below):

\[ P(X \vert C) = \prod_{i=1}^{n} P(x_i \vert C) \]
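A minimal sketch on a made-up play-tennis-style dataset (attribute names, values, and counts are all hypothetical): score each class by its prior times the product of per-attribute conditional probabilities, then take the arg max.

```python
from collections import Counter, defaultdict

# Tiny made-up categorical dataset: (outlook, windy) -> play
data = [
    (("sunny", "false"), "yes"), (("sunny", "true"), "no"),
    (("rain",  "false"), "yes"), (("rain",  "true"), "no"),
    (("sunny", "false"), "yes"),
]

class_counts = Counter(label for _, label in data)
attr_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
for x, label in data:
    for i, v in enumerate(x):
        attr_counts[(i, label)][v] += 1

def score(x, c):
    # P(c) * prod_i P(x_i | c): the conditional-independence assumption
    p = class_counts[c] / len(data)
    for i, v in enumerate(x):
        p *= attr_counts[(i, c)][v] / class_counts[c]
    return p

x = ("sunny", "true")
print({c: score(x, c) for c in class_counts})        # {'yes': 0.0, 'no': 0.2}
print(max(class_counts, key=lambda c: score(x, c)))  # 'no'
```

Note that the count for \(P(\text{true} \vert \text{yes})\) is 0 here, which zeroes out the whole product for that class; this is exactly the zero-frequency drawback listed below.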
Drawbacks¶
- Doesn't work directly for continuous input variables
    - We need to use a Gaussian classifier for such attributes (see the sketch after this list)
- Violation of the independence assumption: attributes are often correlated in practice, which distorts the estimated posteriors
- Zero-frequency problem: if an attribute value never occurs with a class in the training data (e.g., an unseen Outlook value), its estimated conditional probability is 0, zeroing out the whole product; Laplace smoothing fixes this (see the sketch after this list)
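Hedged sketches of the two standard fixes (all numbers hypothetical): a Gaussian likelihood for a continuous attribute, and Laplace (add-one) smoothing for the zero-frequency problem.

```python
import math

# Gaussian Naive Bayes idea: model a continuous attribute per class as N(mu, sigma^2)
def gaussian_likelihood(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian_likelihood(70.0, mu=73.0, sigma=6.2))  # density of x=70 under class stats

# Laplace smoothing: add 1 to every count so no conditional probability is exactly 0
def smoothed_prob(count, class_total, n_values):
    return (count + 1) / (class_total + n_values)

print(smoothed_prob(0, class_total=3, n_values=2))  # 1/5 = 0.2 instead of 0
```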