Problem 1
Problem 1.1 (Gaussian Density Operations). Let
be the density of for (with ), defined on . Answer the following:
- Let
Is
a valid density? If so, identify the distribution explicitly, including its parameters.
- Let
Is
a valid density? If so, describe its distributional form. Is it generally a single Gaussian?
- Let
, where and are independent. What is the distribution of ? Give its mean and variance.
- Let
where . Derive the density of and state its support.
Problem 2
Problem 1.2 (KL Divergence).
Consider two discrete distributions over
: Hand-calculate
and . Now let the sample space be
and consider Compute
and . Use the convention that , because , and that for . For a discrete distribution
on , define , which is the set of all elements with positive probability.
- Assume
, must it be true that ? Briefly justify.
- Assume
, must it be true that ? If yes, explain; if not, give a counterexample and explain why.
Consider the following divergence:
Here
, and and are the densities of and respectively. Answer the following questions:
- Is this divergence a valid notion of discrepancy? Explain your reasoning.
- Under what conditions does this divergence reduce to the KL divergence (either
or )?
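For checking the hand calculations in this problem, the following is a minimal sketch of a discrete KL computation in nats. It assumes the conventions referred to above are the standard ones (a term with zero probability under the first distribution contributes 0, and a term with positive probability under the first distribution but zero under the second makes the divergence infinite). The function name `kl_divergence` and the example distributions `p` and `q` are hypothetical and are not the ones from the problem statement.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Assumed convention: 0 * log(0 / q_i) = 0.
            continue
        if q_i == 0.0:
            # Assumed convention: p_i * log(p_i / 0) = +inf when p_i > 0.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical example distributions on a three-point sample space.
p = [0.5, 0.5, 0.0]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl_divergence(p, q))  # finite, since supp(p) is contained in supp(q)
print(kl_divergence(q, p))  # infinite, since q puts mass where p has none
```

The asymmetry between the two printed values is the same asymmetry that the support questions above ask about.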
Problem 3
Problem 1.3 (Categorical MLE with Softmax). Let
be i.i.d. observations taking values in . We parameterize the (unconditional) categorical probabilities via a softmax: Here
are unconstrained parameters. Note: the softmax is invariant to adding a constant to all coordinates, i.e. for any . Exercise:
- Write down the log-likelihood
for this model (you may express it using the empirical counts ).
- Compute the gradient
and set it to zero to derive the maximum likelihood estimator . Discuss: Is the parameter unique? Is the induced distribution unique?
- Directly evaluating exponentials can overflow/underflow. For each expression below, state whether it is numerically stable (in standard 64-bit floating point) and explain briefly:
, , . (Clarify whether it may produce Inf/NaN or loss of significance.)
- Describe a numerically stable way to compute the softmax for a general vector
, and give a stable formula for the log-likelihood.
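One common way to approach the last item is the max-shift (log-sum-exp) trick, which relies on the shift invariance noted above. The sketch below is an illustration under assumptions, not a reference solution: the names `log_sum_exp`, `softmax`, and `log_likelihood` are hypothetical, parameters and counts are plain Python lists, and the log-likelihood is taken to be the sum over observations (not the average).

```python
import math

def log_sum_exp(z):
    """log(sum_j exp(z_j)), computed after subtracting max(z) to avoid overflow."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def softmax(z):
    """Softmax of z; every exponent is <= 0 after the shift, so exp cannot overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)  # s >= 1 because the largest entry contributes exp(0) = 1
    return [e / s for e in exps]

def log_likelihood(theta, counts):
    """Summed categorical log-likelihood: sum_k n_k * (theta_k - logsumexp(theta))."""
    lse = log_sum_exp(theta)
    return sum(n * (t - lse) for n, t in zip(counts, theta))

# Large inputs that would overflow with a naive exp(z_k) / sum_j exp(z_j).
print(softmax([1000.0, 1001.0, 1002.0]))
print(log_likelihood([0.0, 0.0], [3, 1]))
```

In Python, `math.exp(1000.0)` raises `OverflowError`, so the shifted form is what makes the first call succeed; array libraries typically return `inf` instead and then produce `nan` from `inf / inf`.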
Problem 4
Problem 1.4 (Energy-Based Models with Langevin Sampling). We want to fit a dataset
with an energy-based model on of the form Here
is the (unnormalized) log-density (i.e., negative energy), and is the partition function. To ensure integrability, we use with
a neural network (e.g. MLP) parameterized by , , and a scalar so the Gaussian term is isotropic ( ). We write ; for simplicity, treat as fixed hyperparameters (e.g. , ) unless you wish to tune them manually. You will implement a toy MLE pipeline in
and test it on the provided dataset in the starter Colab: https://colab.research.google.com/drive/1aNetPvIM2LH2PinAKQxVs_Utwpyy4uYn?usp=sharing
Sampling (Langevin vs. grid). Exact sampling from
is not available. Besides a provided brute-force grid sampler (on a bounded D window with discretization), implement the Langevin algorithm: with step size
. It is expected that approximately follows when the number of steps is very large and the step size is very small. Note: since does not depend on , . Initialize, e.g., ; run multiple times. Task: Implement Langevin dynamics and qualitatively compare to the grid sampler.
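As a rough illustration of the sampler described above, here is a minimal sketch of the standard unadjusted Langevin update (a gradient step on the unnormalized log-density plus Gaussian noise scaled by the square root of twice the step size). It is deliberately one-dimensional and uses only the standard library; the target, the function names `langevin_sample` and `grad_f_standard_normal`, and all numeric settings are assumptions for illustration, not the assignment's model or the Colab's interface.

```python
import math
import random

def langevin_sample(grad_f, x0, step_size, n_steps, rng):
    """Run one unadjusted Langevin chain and return its final state.

    Update: x <- x + step_size * grad_f(x) + sqrt(2 * step_size) * standard normal noise,
    where grad_f is the gradient in x of the unnormalized log-density.
    """
    x = x0
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_steps):
        x = x + step_size * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x

def grad_f_standard_normal(x):
    # Hypothetical target: f(x) = -x^2 / 2 (a standard normal), so grad f(x) = -x.
    return -x

rng = random.Random(0)
samples = [
    langevin_sample(grad_f_standard_normal, x0=3.0 * rng.gauss(0.0, 1.0),
                    step_size=0.05, n_steps=500, rng=rng)
    for _ in range(2000)  # independent chains, one sample kept per chain
]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean ~ {mean:.3f}, sample variance ~ {var:.3f}")
```

With a finite step size the chain's stationary distribution is slightly biased (here the variance is a little above 1), which is why the text asks for a small step size and many steps. For the actual model, the gradient in x would come from automatic differentiation of the unnormalized log-density.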
MLE training. Define the average log-likelihood
where
is the empirical distribution of the dataset. The negative log-likelihood is therefore In the lecture we have shown that
Task: Implement gradient descent on the negative log-likelihood
, equivalently gradient ascent on : where the model expectation is approximated with samples from your Langevin sampler at current
. Train the model until it fits the toy data well (e.g., samples visually match data).
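The training loop can be sketched on a toy model where everything is checkable by hand. The example below assumes the standard gradient identity for energy-based models (data expectation of the parameter gradient of the unnormalized log-density, minus the model expectation of the same quantity) and uses a hypothetical one-parameter model, f(x) = theta * x - x^2 / 2, whose exact MLE is the data mean. The names `langevin_step` and `fit_theta`, the use of persistent chains across iterations, and all hyperparameters are illustrative assumptions, not the assignment's required design.

```python
import math
import random

def langevin_step(x, theta, step_size, rng):
    # Toy model: f_theta(x) = theta * x - x^2 / 2, so grad_x f_theta(x) = theta - x.
    noise = math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
    return x + step_size * (theta - x) + noise

def fit_theta(data, n_iters=100, lr=0.1, n_chains=300,
              n_langevin=20, step_size=0.1, seed=0):
    """Gradient ascent on the average log-likelihood with Langevin model samples."""
    rng = random.Random(seed)
    theta = 0.0
    # Persistent chains (an assumed design choice): reuse chain states across iterations.
    chains = [rng.gauss(0.0, 1.0) for _ in range(n_chains)]
    # For this toy model grad_theta f_theta(x) = x, so the data term is the data mean.
    data_term = sum(data) / len(data)
    for _ in range(n_iters):
        for _ in range(n_langevin):
            chains = [langevin_step(x, theta, step_size, rng) for x in chains]
        # Monte Carlo estimate of the model expectation of grad_theta f_theta.
        model_term = sum(chains) / len(chains)
        theta += lr * (data_term - model_term)
    return theta

data = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical dataset with mean 2.0
theta_hat = fit_theta(data)
print(f"theta_hat ~ {theta_hat:.3f} (exact MLE for this toy model: 2.0)")
```

In the neural-network setting the same structure applies, with the two averages taken over the parameter gradient of the network term evaluated at data points and at Langevin samples respectively.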
Optional Problems
Optional Problem 1
Problem 1.5 (Laplace MLE). Consider a dataset
generated from a distribution with density function: This is known as the Laplace (or double exponential) distribution.
Exercise:
- Write down the log-likelihood function
for this distribution.
- Find the maximum likelihood estimator
by maximizing .
- Show that
can be written as a simple function of .
Optional Problem 2
Problem 1.6 (Exponential MLE). Consider a dataset
of nonnegative numbers generated from an exponential distribution with density:
Exercise:
- Write down the log-likelihood function
.
- Find the maximum likelihood estimator
by maximizing .
- Show that
, where is the sample mean.
Optional Problem 3
Problem 1.7 (Poisson MLE). Consider a dataset
of nonnegative integers following a Poisson distribution with probability mass function:
Exercise:
- Write down the log-likelihood function
.
- Find the maximum likelihood estimator
.
- Prove that
equals the sample mean of the observations.