Homework 1: Probability and MLE

Problem 1

Problem 1.1 (Gaussian Density Operations).

Let $p_i$ be the density of $\mathcal{N}(\mu_i, \sigma_i^2)$ for $i = 1, 2$ (with $\sigma_i > 0$), defined on $\mathbb{R}$. Answer the following:

Let

Is a valid density? If so, identify the distribution explicitly, including its parameters.

Let

Is a valid density? If so, describe its distributional form. Is it generally a single Gaussian?

Let , where and are independent. What is the distribution of ? Give its mean and variance.

Let where . Derive the density of and state its support.

Problem 2

Problem 1.2 (KL Divergence).

Consider two discrete distributions over :

Hand-calculate $\mathrm{KL}(p \,\|\, q)$ and $\mathrm{KL}(q \,\|\, p)$.

Now let the sample space be and consider

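Compute $\mathrm{KL}(p \,\|\, q)$ and $\mathrm{KL}(q \,\|\, p)$. Use the convention that $0 \log \frac{0}{q} = 0$, because $\lim_{t \to 0^+} t \log t = 0$, and that $p \log \frac{p}{0} = +\infty$ for $p > 0$.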
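The zero-probability conventions above can be made concrete in a few lines. This is a minimal sketch, assuming natural logarithms (nats). The two distributions are made-up examples on a three-element sample space, not the ones from this problem, and `kl_divergence` and `support` are illustrative names.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two probability vectors of equal length, in nats."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue          # convention: 0 * log(0 / q) = 0
        if q_i == 0.0:
            return math.inf   # convention: p * log(p / 0) = +inf for p > 0
        total += p_i * math.log(p_i / q_i)
    return total

def support(p):
    """Indices of the outcomes that have positive probability."""
    return {i for i, p_i in enumerate(p) if p_i > 0}

# Hypothetical example distributions (not the ones in the problem).
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

print(kl_divergence(p, q), kl_divergence(q, p))
print(support(p), support(q))
```

In this example the two directions behave very differently, which is worth keeping in mind for the support questions that follow.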

For a discrete distribution $p$ on $\mathcal{X}$, define $\mathrm{supp}(p) = \{x \in \mathcal{X} : p(x) > 0\}$, which is the set of all elements with positive probability.

Assume , must it be true that ? Briefly justify.

Assume , must it be true that ? If yes, explain; if not, give a counterexample and explain why.

Consider the following divergence:

Here , and and are the densities of and respectively. Answer the following questions:

Is this divergence a valid notion of discrepancy? Explain your reasoning.

Under what conditions does this divergence reduce to the KL divergence (either or )?

Problem 3

Problem 1.3 (Categorical MLE with Softmax).

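Let $x_1, \dots, x_n$ be i.i.d. observations taking values in $\{1, \dots, K\}$. We parameterize the (unconditional) categorical probabilities via a softmax:

$$p_\theta(x = k) = \frac{\exp(\theta_k)}{\sum_{j=1}^{K} \exp(\theta_j)}, \qquad k = 1, \dots, K.$$

Here $\theta = (\theta_1, \dots, \theta_K) \in \mathbb{R}^K$ are unconstrained parameters. Note: the softmax is invariant to adding a constant to all coordinates, i.e. $\mathrm{softmax}(\theta + c\mathbf{1}) = \mathrm{softmax}(\theta)$ for any $c \in \mathbb{R}$.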
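The shift invariance noted above can be checked numerically. This is a minimal sketch, assuming the standard softmax (exponentiate each coordinate, then normalize). The name `softmax_naive` and the example values are made up, and the direct exponentiation is only safe here because the inputs are small.

```python
import numpy as np

def softmax_naive(theta):
    """Direct softmax: exponentiate and normalize (fine for small inputs only)."""
    e = np.exp(theta)
    return e / e.sum()

theta = np.array([0.5, -1.0, 2.0])    # hypothetical parameter vector
p = softmax_naive(theta)
p_shifted = softmax_naive(theta + 3.0)  # add the same constant to every coordinate

print(p)
print(p_shifted)
print(np.allclose(p, p_shifted))
```

Because a whole line of parameter vectors maps to the same probabilities, the parameters cannot be pinned down from the distribution alone.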

Exercise:

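Write down the log-likelihood $\ell(\theta)$ for this model (you may express it using the empirical counts $n_k = \sum_{i=1}^{n} \mathbf{1}[x_i = k]$).

Compute the gradient $\nabla_\theta \ell(\theta)$ and set it to zero to derive the maximum likelihood estimator $\hat{\theta}$. Discuss: Is the parameter $\hat{\theta}$ unique? Is the induced distribution $p_{\hat{\theta}}$ unique?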

Directly evaluating exponentials can overflow/underflow. For each expression below, state whether it is numerically stable (in standard 64-bit floating point) and explain briefly:

,

,

.

(Clarify whether it may produce Inf/NaN or loss of significance.)
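For orientation, the failure modes mentioned above (Inf, NaN, underflow to zero, loss of significance) can each be reproduced in 64-bit floating point. The sketch below uses generic expressions chosen for illustration, not the three expressions listed in this problem, and the variable names are made up.

```python
import numpy as np

# Silence the floating-point warnings so the values can be inspected directly.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.exp(np.float64(1000.0))   # exceeds the float64 range: overflows to inf
    ratio = big / big                  # inf / inf is undefined: nan

small = np.exp(np.float64(-1000.0))    # below the float64 range: underflows to 0.0

# exp(-40) is far smaller than machine epsilon, so adding it to 1 changes nothing.
tiny_sum = np.float64(1.0) + np.exp(np.float64(-40.0))

print(big, ratio, small, tiny_sum)
```

Overflow and NaN are loud failures, while underflow and loss of significance return a finite but possibly inaccurate number, so the two kinds of instability are worth distinguishing in the answers.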

Describe a numerically stable way to compute the softmax for a general vector , and give a stable formula for the log-likelihood.

Problem 4

Problem 1.4 (Energy-Based Models with Langevin Sampling).

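We want to fit a dataset with an energy-based model on $\mathbb{R}^2$ of the form

$$p_\theta(x) = \frac{\exp(f_\theta(x))}{Z_\theta}, \qquad Z_\theta = \int_{\mathbb{R}^2} \exp(f_\theta(x)) \, dx.$$

Here $f_\theta$ is the (unnormalized) log-density (i.e., negative energy), and $Z_\theta$ is the partition function. To ensure integrability, we use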

with a neural network (e.g. MLP) parameterized by , , and a scalar so the Gaussian term is isotropic (). We write ; for simplicity, treat as fixed hyperparameters (e.g. , ) unless you wish to tune them manually.

You will implement a toy MLE pipeline in 2D and test it on the provided dataset in the starter Colab.

https://colab.research.google.com/drive/1aNetPvIM2LH2PinAKQxVs_Utwpyy4uYn?usp=sharing

Sampling (Langevin vs. grid). Exact sampling from $p_\theta$ is not available. Besides a provided brute-force grid sampler (on a bounded 2D window with discretization), implement the Langevin algorithm:

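$$x_{k+1} = x_k + \epsilon \, \nabla_x \log p_\theta(x_k) + \sqrt{2\epsilon} \, \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I),$$

with stepsize $\epsilon > 0$. It is expected that $x_k$ approximately follows $p_\theta$ when the number of steps is very large and the step size is very small. Note: since $Z_\theta$ does not depend on $x$, $\nabla_x \log p_\theta(x) = \nabla_x f_\theta(x)$. Initialize, e.g., ; run multiple times.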

Task: Implement Langevin dynamics and qualitatively compare to the grid sampler.
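One possible shape for the sampler is sketched below on a target whose score is known in closed form. Everything in it is an assumption for illustration. The update uses the common convention x <- x + eps * score(x) + sqrt(2 * eps) * noise with standard Gaussian noise. The target is a standard 2D Gaussian (score equal to -x) rather than the neural-network model. The names `langevin_sample` and `gaussian_score`, the step size, and the initialization are all hypothetical choices.

```python
import numpy as np

def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Run unadjusted Langevin dynamics on a batch of independent chains.

    score_fn maps an array of shape (n_chains, dim) to the gradient of the
    log-density with respect to x, evaluated row by row.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x

def gaussian_score(x):
    """Score of a standard Gaussian: grad_x log N(x; 0, I) = -x."""
    return -x

rng = np.random.default_rng(0)
x0 = 3.0 * rng.standard_normal((2000, 2))   # deliberately too spread out
samples = langevin_sample(gaussian_score, x0, step_size=0.01, n_steps=1000, rng=rng)

# For this target the chains should end up with mean near 0 and variance near 1.
print(samples.mean(axis=0), samples.var(axis=0))
```

For the actual model, the score would typically come from automatic differentiation of the unnormalized log-density with respect to x, and the same mean, variance, or scatter-plot checks can be run against the grid sampler.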

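MLE training. Define the average log-likelihood

$$L(\theta) = \mathbb{E}_{x \sim \hat{p}_{\mathrm{data}}}[\log p_\theta(x)] = \mathbb{E}_{x \sim \hat{p}_{\mathrm{data}}}[f_\theta(x)] - \log Z_\theta,$$

where $\hat{p}_{\mathrm{data}}$ is the empirical distribution of the dataset. The negative log-likelihood is therefore

$$-L(\theta) = \log Z_\theta - \mathbb{E}_{x \sim \hat{p}_{\mathrm{data}}}[f_\theta(x)].$$

In the lecture we have shown that

$$\nabla_\theta L(\theta) = \mathbb{E}_{x \sim \hat{p}_{\mathrm{data}}}[\nabla_\theta f_\theta(x)] - \mathbb{E}_{x \sim p_\theta}[\nabla_\theta f_\theta(x)].$$

Task: Implement gradient descent on the negative log-likelihood $-L(\theta)$, equivalently gradient ascent on $L(\theta)$:

$$\theta \leftarrow \theta + \eta \, \nabla_\theta L(\theta),$$

where $\eta > 0$ is the learning rate and the model expectation is approximated with samples from your Langevin sampler at the current $\theta$. Train the model until it fits the toy data well (e.g., samples visually match data).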

Optional Problems

Optional Problem 1

Problem 1.5 (Laplace MLE).

Consider a dataset generated from a distribution with density function:

This is known as the Laplace (or double exponential) distribution.

Exercise:

Write down the log-likelihood function for this distribution.

Find the maximum likelihood estimator by maximizing .

Show that can be written as a simple function of .

Optional Problem 2

Problem 1.6 (Exponential MLE).

Consider a dataset of nonnegative numbers generated from an exponential distribution with density:

Exercise:

Write down the log-likelihood function .

Find the maximum likelihood estimator by maximizing .

Show that , where is the sample mean.

Optional Problem 3

Problem 1.7 (Poisson MLE).

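Consider a dataset $x_1, \dots, x_n$ of non-negative integers following a Poisson distribution with probability mass function:

$$P(X = x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots$$

Exercise:

Write down the log-likelihood function $\ell(\lambda)$.

Find the maximum likelihood estimator $\hat{\lambda}$.

Prove that $\hat{\lambda}$ equals the sample mean of the observations.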
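A numerical sanity check can accompany the proof, though it does not replace it. The sketch below assumes the standard Poisson mass function with rate `lam`, evaluates the log-likelihood on a grid, and compares the maximizer with the sample mean. The dataset, the grid, and the function name `poisson_log_likelihood` are made up for illustration.

```python
import math

def poisson_log_likelihood(lam, data):
    """Sum over observations of log(lam**x * exp(-lam) / x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

data = [2, 0, 3, 1, 4, 2, 2]            # hypothetical observations
sample_mean = sum(data) / len(data)

# Coarse grid search over candidate rates in (0, 10].
grid = [0.01 * k for k in range(1, 1001)]
best = max(grid, key=lambda lam: poisson_log_likelihood(lam, data))

print(sample_mean, best)
```

The grid maximizer can only agree with the sample mean up to the grid spacing, which is why the closed-form derivation is still needed.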