Generative Classification
- Lecture: S3-GenerativeBayesClassify
- Version: next
- Please read: the NBC text notebook
- Recorded Videos: M1 + M2 + (Extra M3)
Attention: the following markdown text was generated automatically from the corresponding PowerPoint lecture file, so errors and misformatting do occur.
- Notebook to run:
Study Guide for Generative Bayes Classifiers
This study guide provides a review of the core concepts related to Generative Bayes Classifiers, as detailed in the source material. It includes a short-answer quiz to test comprehension, a set of essay questions for deeper analysis, and a comprehensive glossary of key terms.
Quiz: Short-Answer Questions
Instructions: Answer the following ten questions based on the provided source material. Each answer should be approximately two to three sentences.
- What are the three major types of classification approaches described in the material?
- What is the fundamental goal of a Bayes Classifier when presented with a new data sample for testing?
- Explain the primary difference between a generative and a discriminative probabilistic model for classification.
- What is the core “naïve” assumption of the Naïve Bayes Classifier, and why is it crucial for the model’s functionality?
- How does the Naïve Bayes conditional independence assumption affect the number of parameters the model needs to learn?
- Describe the learning or training phase for a Naïve Bayes Classifier that handles discrete input attributes.
- What is the Maximum A Posteriori (MAP) rule, and what role does it play in Generative Bayes Classifiers?
- When using maximum likelihood estimates for training, what significant problem can arise if the training data is not perfectly representative?
- What is the purpose of “smoothing” in the context of training a Naïve Bayes model?
- According to the “Play Tennis” example, how many parameters were required for a general Generative Bayes Classifier compared to a Naïve Bayes Classifier?
Answer Key
- The three major types of classification approaches are Discriminative classifiers, which directly estimate a decision boundary (e.g., SVM); Generative classifiers, which build a statistical model for the data (e.g., Naïve Bayes); and Instance-based classifiers, which use observations directly without building a model (e.g., K-nearest neighbors).
- The goal of a Bayes Classifier during testing is to predict the class for a given sample x. Specifically, it seeks the class c that maximizes the posterior probability p(c | x1, x2, …, xp).
- A generative model builds a statistical model for each class, modeling the class-conditional probability P(X | C) and the prior probability P(C). A discriminative model, in contrast, directly estimates the posterior probability P(C | X) or learns a direct mapping from inputs to class labels without modeling the underlying data distribution.
- The core assumption is that all input attributes (features) are conditionally independent of one another, given the class. This assumption is crucial because it simplifies the problem of estimating the joint probability P(X1, …, Xp | C) by allowing it to be calculated as the product of individual conditional probabilities P(Xi | C).
- The assumption dramatically reduces the number of parameters. Instead of estimating a parameter for every possible combination of attribute values (multiplicative complexity), the model only needs to estimate parameters for each attribute conditioned on the class (additive complexity), making it feasible to train with much less data.
- The learning phase for a discrete Naïve Bayes Classifier involves estimating probabilities from the training data using frequencies (maximum likelihood estimates). For each class ci, the model estimates the prior probability P(C=ci), and for every attribute value xjk, it estimates the conditional probability P(Xj=xjk | C=ci). (A runnable sketch of this procedure, including smoothing and the MAP prediction step, follows this answer key.)
- The MAP rule is a decision principle used to select the most probable class for a new instance. In Generative Bayes Classifiers, it means choosing the class cj that maximizes the product of the class-conditional probability P(x1, …, xp | cj) and the prior probability P(cj).
- If a specific combination of an attribute value and a class never appears in the training data, its maximum likelihood probability estimate will be zero. This “zero probability” can then nullify the entire posterior probability calculation for that class, regardless of the evidence from other attributes.
- Smoothing is a technique used to avoid the problem of zero probabilities. It adjusts the counts used for probability estimation, for example, by adding a small value (like 1) to the numerator, with a corresponding adjustment to the denominator, which ensures that no estimated conditional probability is ever exactly zero.
- For the “Play Tennis” example, the general Generative Bayes Classifier required estimating 72 parameters. In contrast, the Naïve Bayes Classifier, with its conditional independence assumption, only required the estimation of 20 parameters.
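The last few answers describe Naïve Bayes training, smoothing, and MAP prediction in words. As a concrete illustration, here is a minimal Python sketch of those steps on a tiny, hand-made Play-Tennis-style table; the specific rows and the add-one (alpha = 1) smoothing value are illustrative assumptions, not the lecture's exact dataset. The parameter-count comment at the top is one plausible accounting that reproduces the 72 vs. 20 figures quoted above, assuming attribute cardinalities Outlook = 3, Temperature = 3, Humidity = 2, Wind = 2 and two classes.

```python
import math
from collections import Counter, defaultdict

# One plausible parameter accounting (assumed cardinalities: Outlook=3, Temperature=3,
# Humidity=2, Wind=2; two classes Yes/No):
#   full joint model : 2 * (3*3*2*2) = 72 entries of P(x1,...,x4 | c)
#   Naive Bayes      : 2 * (3+3+2+2) = 20 entries of P(xj | c)

# Tiny, hand-made Play-Tennis-style rows (illustrative only, not the lecture's full table):
# each row is ((Outlook, Temperature, Humidity, Wind), PlayTennis)
data = [
    (("Sunny",    "Hot",  "High",   "Weak"),   "No"),
    (("Sunny",    "Hot",  "High",   "Strong"), "No"),
    (("Overcast", "Hot",  "High",   "Weak"),   "Yes"),
    (("Rain",     "Mild", "High",   "Weak"),   "Yes"),
    (("Rain",     "Cool", "Normal", "Weak"),   "Yes"),
    (("Rain",     "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny",    "Mild", "High",   "Weak"),   "No"),
]

def train_naive_bayes(rows, alpha=1.0):
    """Estimate P(C) and P(Xj = v | C) by frequency counts (maximum likelihood),
    with add-alpha (Laplace) smoothing so no conditional estimate is exactly zero."""
    class_counts = Counter(c for _, c in rows)
    n_attrs = len(rows[0][0])
    # counts[j][c][v] = number of training rows with class c whose attribute j equals v
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    values = [set() for _ in range(n_attrs)]          # observed values per attribute
    for x, c in rows:
        for j, v in enumerate(x):
            counts[j][c][v] += 1
            values[j].add(v)
    priors = {c: n / len(rows) for c, n in class_counts.items()}

    def cond_prob(j, v, c):
        # smoothed P(Xj = v | C = c)
        return (counts[j][c][v] + alpha) / (class_counts[c] + alpha * len(values[j]))

    return priors, cond_prob

def predict(x, priors, cond_prob):
    """MAP rule: argmax_c  log P(c) + sum_j log P(xj | c); logs avoid numeric underflow."""
    scores = {
        c: math.log(p) + sum(math.log(cond_prob(j, v, c)) for j, v in enumerate(x))
        for c, p in priors.items()
    }
    return max(scores, key=scores.get)

priors, cond_prob = train_naive_bayes(data)
print(predict(("Sunny", "Cool", "High", "Strong"), priors, cond_prob))
```

Working in log space is a standard implementation choice; it does not change which class attains the maximum, since the logarithm is monotonic.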
Essay Questions
Instructions: The following questions are designed to encourage a deeper synthesis of the source material. Answers are not provided.
- Compare and contrast the standard Generative Bayes Classifier and the Naïve Bayes Classifier. Discuss the trade-offs between model complexity, data requirements, and the validity of underlying assumptions, using the “Play Tennis” dataset as a specific case study.
- The source material frames classification as a process involving a Task, Representation, Score Function, and Search/Optimization. Elaborate on how each of these components is realized within the Naïve Bayes Classifier framework for discrete attributes.
- Explain the mathematical justification for the Naïve Bayes classification rule. Begin with the primary goal of any Bayes Classifier, apply Bayes’ Rule, and then explicitly incorporate the conditional independence assumption. Conclude by explaining why the P(X) term from Bayes’ Rule can be disregarded during the final MAP classification step.
- Discuss the “zero probability” problem in detail. Explain why this is a critical issue for models trained with maximum likelihood estimation and how the smoothing techniques presented in the source material provide a robust solution to avoid overfitting and improve model generalization.
- While the document primarily focuses on discrete attributes, it briefly mentions Gaussian Naïve Bayes. Based on the fundamental structure of the Naïve Bayes model, hypothesize how a Gaussian Naïve Bayes classifier would differ from the discrete model in its learning (parameter estimation) and testing phases. What kind of feature data would this variant be suited for?
Glossary of Key Terms
| Term | Definition |
|---|---|
| Bayes’ Rule | A fundamental theorem of probability used to find a conditional probability. It states: P(C \| X) = P(X \| C) * P(C) / P(X). |
| Bayes Classifier (BC) | A probabilistic classification approach that treats feature attributes and class labels as random variables, aiming to predict the class c that maximizes the posterior probability p(c \| x1, x2, ..., xp). |
| Conditional Independence Assumption | The core simplifying assumption of the Naïve Bayes Classifier, which posits that all feature attributes are independent of each other given the class label. |
| Discriminative Classifier | A type of classifier that directly estimates a decision boundary or models the posterior probability P(C \| X) without explicitly modeling the data distribution. |
| Generative Bayes Classifier (GBC) | A classifier that works by building a generative statistical model. It models the prior probability of each class P(C) and the class-conditional probability of the data P(X \| C). |
| Instance-based Classifier | A classification approach that uses training observations directly to make predictions without building an explicit model. An example is the K-nearest neighbors (KNN) algorithm. |
| Learning Phase (Training) | The process of estimating the parameters of a model from a training dataset. For Naïve Bayes, this involves calculating the prior and conditional probabilities from data frequencies. |
| Maximum A Posteriori (MAP) Rule | A decision rule for classification that selects the class with the highest posterior probability given the observed data. It is equivalent to finding the class c that maximizes the product P(X \| c) * P(c). |
| Maximum Likelihood Estimates (MLE) | A method for estimating model parameters by finding the parameter values that maximize the likelihood of making the observations given the parameters. In this context, it involves using the frequencies in the data to estimate probabilities. |
| Naïve Bayes Classifier (NBC) | A type of Generative Bayes Classifier that simplifies learning by making the strong (naïve) assumption that all features are conditionally independent given the class. |
| Posterior Probability | The probability of a class C after observing the data X, denoted P(C \| X). |
| Prior Probability | The initial probability of a class C before any data has been observed, denoted P(C). |
| Smoothing | A technique used to prevent zero probabilities during model training, typically when a specific feature-value and class combination is absent from the training data. It adjusts probability estimates to ensure no probability is exactly zero. |
| Testing Phase | The process of using a trained model to assign a class label to new, previously unseen data instances. |
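The glossary’s Smoothing, MAP Rule, and Posterior Probability entries also map onto off-the-shelf tooling. The sketch below is a minimal illustration assuming scikit-learn is installed: CategoricalNB handles discrete, integer-encoded attributes, its alpha parameter is the add-one smoothing defined above, predict applies the MAP rule, and predict_proba returns the normalized posteriors. The toy rows and the OrdinalEncoder step are illustrative choices, not part of the lecture.

```python
# A minimal sketch assuming scikit-learn is installed; the rows are illustrative toy data.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

X_raw = np.array([
    ["Sunny",    "Hot",  "High",   "Weak"],
    ["Overcast", "Hot",  "High",   "Weak"],
    ["Rain",     "Mild", "High",   "Weak"],
    ["Rain",     "Cool", "Normal", "Strong"],
    ["Sunny",    "Mild", "Normal", "Strong"],
])
y = np.array(["No", "Yes", "Yes", "No", "Yes"])

encoder = OrdinalEncoder()                       # map category strings to integer codes
X = encoder.fit_transform(X_raw).astype(int)

clf = CategoricalNB(alpha=1.0)                   # alpha=1.0 is add-one (Laplace) smoothing
clf.fit(X, y)

x_new = encoder.transform([["Sunny", "Cool", "High", "Strong"]]).astype(int)
print(clf.predict(x_new))                        # MAP class label
print(clf.predict_proba(x_new))                  # posterior P(C | x), normalized over classes
```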