.. _ch3_distributions:

====================================
Chapter 3: Common Distributions
====================================

Imagine you are the operations analyst for a busy hospital emergency department.
Every shift, you face questions that are fundamentally statistical: How many
patients will arrive in the next hour?  How long until the next trauma case?
What fraction of arrivals need intensive care?  Are the vital signs of this
patient abnormally high?

Each of these questions points to a different probability distribution.  This
chapter introduces the distributions you will encounter most often in
likelihood-based inference, and we will use this **hospital ED scenario** as a
running thread that connects them all.  By the end, you will see that the
Poisson, Exponential, Binomial, Normal, and Gamma are not isolated formulas ---
they are different views of the same underlying reality.

For every distribution we provide the probability mass/density function, the
parameters and their meaning, the support, the mean (derived), the variance
(derived), the moment generating function, and code that computes and verifies
each quantity.

When you encounter a new dataset and need to choose a model, you will come back
here.  When you need to derive a likelihood function in :ref:`ch4_likelihood`,
the formulas in this chapter are the starting point.

.. contents:: Chapter Contents
   :local:
   :depth: 2
   :class: this-will-duplicate-information-and-it-is-still-useful-here


.. _sec_discrete_distributions:

3.1 Discrete Distributions
============================


3.1.1 Bernoulli Distribution
------------------------------

A single trial with two outcomes: success (1) or failure (0).  This is the
simplest possible random variable --- a single yes/no question --- and it is
the building block for many more complex distributions.

Back at the hospital, each arriving patient either is or is not a trauma case.
If we label trauma as "success" (:math:`X=1`) and non-trauma as "failure"
(:math:`X=0`), a single patient's classification is a Bernoulli trial.

**PMF.**

.. math::

   p(x) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\}.

**Parameters.** :math:`p \in [0,1]` (probability of success).

**Support.** :math:`\{0, 1\}`.

Let us compute this PMF directly and verify it against SciPy.

.. code-block:: python

   # Bernoulli PMF: compute by hand and verify with SciPy
   import numpy as np
   from scipy import stats

   p = 0.15  # probability a patient is a trauma case

   # PMF from the formula: p(x) = p^x * (1-p)^(1-x)
   for x in [0, 1]:
       pmf_manual = p**x * (1 - p)**(1 - x)
       pmf_scipy  = stats.bernoulli.pmf(x, p)
       print(f"  P(X={x}) = {pmf_manual:.4f}  (scipy: {pmf_scipy:.4f})")

**Mean.**

.. math::

   E[X] = 0 \cdot (1-p) + 1 \cdot p = p.

**Variance.**

.. math::

   E[X^2] = 0^2(1-p) + 1^2 p = p, \qquad
   \operatorname{Var}(X) = p - p^2 = p(1-p).

**MGF.**

.. math::

   M_X(t) = (1-p) + p\,e^t.

.. code-block:: python

   # Bernoulli: verify E[X] and Var(X) by simulation
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   p = 0.15  # trauma probability
   samples = stats.bernoulli.rvs(p, size=100_000)

   E_theory   = p
   Var_theory = p * (1 - p)
   print(f"Bernoulli(p={p}) -- 'Is this patient a trauma case?'")
   print(f"  Theory:     E[X] = {E_theory:.4f},  Var(X) = {Var_theory:.4f}")
   print(f"  Simulation: E[X] = {samples.mean():.4f},  Var(X) = {samples.var():.4f}")

So the Bernoulli answers the simplest hospital question: for one patient, is it
a trauma case or not?  When we observe many patients and count the total number
of trauma cases, we arrive at the Binomial.


3.1.2 Binomial Distribution
------------------------------

Now suppose :math:`n` patients arrive during one shift, and each independently
has probability :math:`p = 0.15` of being a trauma case.  The total number of
trauma cases :math:`X` follows a Binomial distribution:
:math:`X \sim \text{Bin}(n, p)`.

The binomial coefficient :math:`\binom{n}{x}` accounts for all the different
orderings in which :math:`x` successes can occur within :math:`n` trials.

**PMF.**

.. math::

   p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x \in \{0, 1, \dots, n\}.

**Parameters.** :math:`n \in \{1,2,\dots\}` (number of trials),
:math:`p \in [0,1]` (success probability).

.. code-block:: python

   # Binomial PMF: how many trauma cases in n=30 arrivals?
   import numpy as np
   from scipy import stats
   from scipy.special import comb

   n = 30    # patients arriving during one shift
   p = 0.15  # trauma probability

   print(f"Bin(n={n}, p={p}): PMF for selected x values")
   print(f"{'x':>4s}  {'formula':>10s}  {'scipy':>10s}")
   for x in range(0, 11):
       pmf_manual = comb(n, x, exact=True) * p**x * (1 - p)**(n - x)
       pmf_scipy  = stats.binom.pmf(x, n, p)
       print(f"{x:4d}  {pmf_manual:10.6f}  {pmf_scipy:10.6f}")

**Mean derivation.**  Since :math:`X = \sum_{i=1}^n X_i` where each
:math:`X_i \sim \text{Bernoulli}(p)`, by linearity of expectation:

.. math::

   E[X] = \sum_{i=1}^n E[X_i] = np.

**Variance derivation.**  The :math:`X_i` are independent, so

.. math::

   \operatorname{Var}(X) = \sum_{i=1}^n \operatorname{Var}(X_i) = np(1-p).

**MGF.**  Using independence:

.. math::

   M_X(t) = \prod_{i=1}^n M_{X_i}(t) = \left[(1-p) + pe^t\right]^n.

.. code-block:: python

   # Binomial: verify mean and variance by simulation
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   n, p = 30, 0.15
   samples = stats.binom.rvs(n, p, size=100_000)

   E_theory   = n * p
   Var_theory = n * p * (1 - p)
   print(f"Bin(n={n}, p={p}) -- 'How many trauma cases this shift?'")
   print(f"  Theory:     E[X] = {E_theory:.2f},  Var(X) = {Var_theory:.4f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.4f}")

   # Text histogram: frequency of each count
   print(f"\n  Distribution of trauma counts (out of {n} patients):")
   for x in range(0, 12):
       bar = '#' * int(stats.binom.pmf(x, n, p) * 200)
       print(f"    x={x:2d}: {bar} {stats.binom.pmf(x, n, p):.4f}")

**Relationship to Bernoulli.** The Binomial with :math:`n=1` is the Bernoulli.
We will see in Section 3.3 that when :math:`n` is large and :math:`p` is small,
the Binomial is well-approximated by the Poisson; and when :math:`n` is large
with moderate :math:`p`, by the Normal (CLT).


3.1.3 Poisson Distribution
----------------------------

Here is perhaps the most important discrete distribution for our hospital
scenario.  Patient arrivals to the ED happen at random but at a roughly
constant average rate --- say :math:`\lambda = 4.5` patients per hour.  The
count of arrivals in any given hour follows a Poisson distribution:
:math:`X \sim \text{Pois}(\lambda)`.

What's the intuition?  If events are independent and the rate does not change,
the count in a fixed time interval is Poisson.  This applies to patient
arrivals, phone calls, mutations on a chromosome, and many other settings.

**PMF.**

.. math::

   p(x) = \frac{e^{-\lambda}\,\lambda^x}{x!}, \qquad x \in \{0, 1, 2, \dots\}.

**Parameter.** :math:`\lambda > 0` (rate, average number of events).

.. code-block:: python

   # Poisson PMF: patient arrivals per hour (lambda = 4.5)
   import numpy as np
   from scipy import stats
   from math import factorial, exp

   lam = 4.5  # average patients per hour

   print(f"Poisson(lambda={lam}): PMF for x = 0..12")
   print(f"{'x':>4s}  {'formula':>10s}  {'scipy':>10s}")
   for x in range(13):
       pmf_manual = exp(-lam) * lam**x / factorial(x)
       pmf_scipy  = stats.poisson.pmf(x, lam)
       print(f"{x:4d}  {pmf_manual:10.6f}  {pmf_scipy:10.6f}")

**Mean derivation.**

.. math::

   E[X] = \sum_{x=0}^{\infty} x \frac{e^{-\lambda}\lambda^x}{x!}
        = e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!}
        = e^{-\lambda} \lambda \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}
        = e^{-\lambda} \lambda\, e^{\lambda} = \lambda.

**Variance derivation.**  First compute :math:`E[X(X-1)]`:

.. math::

   E[X(X-1)] = \sum_{x=2}^{\infty} x(x-1)\frac{e^{-\lambda}\lambda^x}{x!}
   = e^{-\lambda}\lambda^2 \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \lambda^2.

Then :math:`E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda`, and

.. math::

   \operatorname{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.

So the Poisson has the notable property that its **mean equals its variance**.
This is a quick diagnostic: if your count data have a sample mean close to the
sample variance, the Poisson may be a good fit.

.. code-block:: python

   # Poisson: verify mean = variance = lambda by simulation
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5  # patients per hour
   samples = stats.poisson.rvs(lam, size=100_000)

   print(f"Poisson(lambda={lam}) -- 'Patient arrivals per hour'")
   print(f"  Theory:     E[X] = {lam:.2f},  Var(X) = {lam:.2f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.2f}")
   print(f"  Note: mean ~ variance is the hallmark of the Poisson")

   # Practical question: P(more than 8 patients in an hour)?
   P_gt_8 = 1 - stats.poisson.cdf(8, lam)
   sim_gt_8 = np.mean(samples > 8)
   print(f"\n  P(X > 8) = {P_gt_8:.4f}  (simulated: {sim_gt_8:.4f})")
   print(f"  This is the probability of a surge hour for staffing decisions.")

**MGF.**

.. math::

   M_X(t) = \sum_{x=0}^{\infty} e^{tx}\frac{e^{-\lambda}\lambda^x}{x!}
   = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!}
   = e^{-\lambda}\,e^{\lambda e^t}
   = \exp\!\left(\lambda(e^t - 1)\right).

**Connection to Exponential.** If patient *counts* per hour are Poisson with
rate :math:`\lambda`, then the *time between* consecutive arrivals is
Exponential with the same rate :math:`\lambda`.  We will verify this connection
in code after introducing the Exponential distribution in Section 3.2.3.


3.1.4 Geometric Distribution
-------------------------------

Back in the ED, suppose each arriving patient independently has probability
:math:`p = 0.15` of being a trauma case.  You want to know: how many patients
do we see before the first trauma case?  That count follows a Geometric
distribution: :math:`X \sim \text{Geom}(p)`.

**PMF.**

.. math::

   p(x) = (1-p)^{x-1}\,p, \qquad x \in \{1, 2, 3, \dots\}.

**Parameter.** :math:`p \in (0,1]`.

.. code-block:: python

   # Geometric PMF: patients until first trauma case
   import numpy as np
   from scipy import stats

   p = 0.15  # trauma probability

   print(f"Geom(p={p}): P(first trauma on patient x)")
   print(f"{'x':>4s}  {'formula':>10s}  {'scipy':>10s}")
   for x in range(1, 16):
       pmf_manual = (1 - p)**(x - 1) * p
       pmf_scipy  = stats.geom.pmf(x, p)
       print(f"{x:4d}  {pmf_manual:10.6f}  {pmf_scipy:10.6f}")

**Mean derivation.**  Let :math:`q = 1-p`.

.. math::

   E[X] = \sum_{x=1}^{\infty} x\,q^{x-1}\,p = p \sum_{x=1}^{\infty} x\,q^{x-1}.

Using the identity :math:`\sum_{x=1}^{\infty} x\,r^{x-1} = \frac{1}{(1-r)^2}`
for :math:`|r| < 1`:

.. math::

   E[X] = p \cdot \frac{1}{(1-q)^2} = p \cdot \frac{1}{p^2} = \frac{1}{p}.

**Variance derivation.**  Similarly,
:math:`E[X(X-1)] = \frac{2q}{p^2}`, so
:math:`E[X^2] = \frac{2q}{p^2} + \frac{1}{p}`, and

.. math::

   \operatorname{Var}(X) = E[X^2] - (E[X])^2
   = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2}
   = \frac{q}{p^2}
   = \frac{1-p}{p^2}.

**MGF.**

.. math::

   M_X(t) = \frac{pe^t}{1 - (1-p)e^t}, \qquad t < -\ln(1-p).

.. code-block:: python

   # Geometric: verify E[X] = 1/p and Var(X) = (1-p)/p^2
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   p = 0.15
   samples = stats.geom.rvs(p, size=100_000)

   E_theory   = 1 / p
   Var_theory = (1 - p) / p**2
   print(f"Geom(p={p}) -- 'How many patients until first trauma?'")
   print(f"  Theory:     E[X] = {E_theory:.2f},  Var(X) = {Var_theory:.2f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.2f}")
   print(f"  On average, you see {E_theory:.1f} patients before the first trauma case.")


3.1.5 Negative Binomial Distribution
---------------------------------------

The Geometric counts patients until the *first* trauma case.  What if you want
to know how many patients arrive until the *r-th* trauma case?  That is the
Negative Binomial: :math:`X \sim \text{NegBin}(r, p)`.

**PMF.**

.. math::

   p(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x \in \{r, r+1, r+2, \dots\}.

**Parameters.** :math:`r \in \{1,2,\dots\}` (number of successes needed),
:math:`p \in (0,1]`.

Note: When :math:`r = 1` the Negative Binomial reduces to the Geometric.

**Mean.**  Since :math:`X` is the sum of :math:`r` independent
:math:`\text{Geom}(p)` random variables,

.. math::

   E[X] = \frac{r}{p}.

**Variance.**

.. math::

   \operatorname{Var}(X) = \frac{r(1-p)}{p^2}.

**MGF.**

.. math::

   M_X(t) = \left(\frac{pe^t}{1 - (1-p)e^t}\right)^r, \qquad t < -\ln(1-p).

.. code-block:: python

   # Negative Binomial: patients until r=3 trauma cases
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   r, p = 3, 0.15
   # scipy NegBin counts failures before r-th success; shift for total trials
   rv = stats.nbinom(r, p)
   samples = rv.rvs(size=100_000) + r  # shift to total trials

   E_theory   = r / p
   Var_theory = r * (1 - p) / p**2
   print(f"NegBin(r={r}, p={p}) -- 'Patients until {r}rd trauma case'")
   print(f"  Theory:     E[X] = {E_theory:.2f},  Var(X) = {Var_theory:.2f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.2f}")

   # Verify NegBin(r=1,p) = Geom(p)
   negbin1 = stats.nbinom.rvs(1, p, size=100_000) + 1
   geom    = stats.geom.rvs(p, size=100_000)
   print(f"\n  NegBin(1, {p}) mean = {negbin1.mean():.2f}")
   print(f"  Geom({p})      mean = {geom.mean():.2f}")
   print(f"  These match: NegBin(r=1) = Geometric")


3.1.6 Hypergeometric Distribution
------------------------------------

Now a different sampling question.  The hospital pharmacy has :math:`N = 200`
vials of a certain drug, of which :math:`K = 50` are from a recalled batch.
A pharmacist randomly selects :math:`n = 30` vials for inspection (without
replacement).  How many recalled vials :math:`X` are in the sample?

This is sampling *without replacement*, so the draws are not independent ---
and that is exactly what the Hypergeometric models.

**PMF.**

.. math::

   p(x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}},
   \qquad x \in \{\max(0, n+K-N), \dots, \min(n, K)\}.

**Parameters.** :math:`N` (population size), :math:`K` (number of successes in
population), :math:`n` (draw size).

**Mean.**

.. math::

   E[X] = n\,\frac{K}{N}.

*Derivation.*  Write :math:`X = \sum_{i=1}^n X_i` where :math:`X_i = 1` if
the :math:`i`-th drawn item is a success.  By symmetry,
:math:`E[X_i] = K/N` for every :math:`i`, so :math:`E[X] = nK/N`.

**Variance.**

.. math::

   \operatorname{Var}(X) = n\,\frac{K}{N}\,\frac{N-K}{N}\,\frac{N-n}{N-1}.

The factor :math:`(N-n)/(N-1)` is the **finite-population correction**;
it is close to 1 when :math:`n \ll N`, recovering the Binomial variance.

.. code-block:: python

   # Hypergeometric: recalled vials in a pharmacy sample
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   N, K, n = 200, 50, 30  # population, recalled, sample size
   rv = stats.hypergeom(N, K, n)
   samples = rv.rvs(size=100_000)

   E_theory   = n * K / N
   Var_theory = n * (K/N) * ((N-K)/N) * ((N-n)/(N-1))
   print(f"Hypergeometric(N={N}, K={K}, n={n})")
   print(f"  Theory:     E[X] = {E_theory:.2f},  Var(X) = {Var_theory:.4f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.4f}")

   # Compare to Binomial (sampling WITH replacement)
   p_binom = K / N
   Var_binom = n * p_binom * (1 - p_binom)
   print(f"\n  Binomial Var (with replacement):       {Var_binom:.4f}")
   print(f"  Hypergeometric Var (without):           {Var_theory:.4f}")
   print(f"  Finite-population correction factor:    {(N-n)/(N-1):.4f}")
   print(f"  The correction reduces variance because drawing without")
   print(f"  replacement gives less variable results.")

**MGF.**  A closed-form MGF exists but is rarely used; the Hypergeometric is
typically handled via its PMF directly.


.. _sec_continuous_distributions:

3.2 Continuous Distributions
==============================


3.2.1 Uniform Distribution
----------------------------

Before a patient arrives, you might have no information about when they will
show up within the next hour.  If every moment is equally likely, the arrival
time (measured in minutes within the hour) follows a Uniform distribution.
Write :math:`X \sim \text{Uniform}(a, b)`.

The uniform distribution is the simplest continuous distribution --- it
represents "maximum ignorance" over an interval.

**PDF.**

.. math::

   f(x) = \frac{1}{b - a}, \qquad a \leq x \leq b.

**Parameters.** :math:`a < b` (endpoints).

**Mean derivation.**

.. math::

   E[X] = \int_a^b x \cdot \frac{1}{b-a}\,dx
        = \frac{1}{b-a}\,\frac{x^2}{2}\bigg|_a^b
        = \frac{b^2 - a^2}{2(b-a)}
        = \frac{a+b}{2}.

**Variance derivation.**

.. math::

   E[X^2] = \int_a^b x^2 \cdot \frac{1}{b-a}\,dx
           = \frac{b^3 - a^3}{3(b-a)}
           = \frac{a^2 + ab + b^2}{3}.

.. math::

   \operatorname{Var}(X) = \frac{a^2+ab+b^2}{3} - \left(\frac{a+b}{2}\right)^2
   = \frac{(b-a)^2}{12}.

**MGF.**

.. math::

   M_X(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}, \qquad t \neq 0.

.. code-block:: python

   # Uniform distribution: arrival time within the hour
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   a, b = 0, 60  # minutes within the hour
   rv = stats.uniform(loc=a, scale=b - a)
   samples = rv.rvs(size=100_000)

   E_theory   = (a + b) / 2
   Var_theory = (b - a)**2 / 12
   print(f"Uniform({a}, {b}) -- 'Arrival minute within the hour'")
   print(f"  Theory:     E[X] = {E_theory:.2f},  Var(X) = {Var_theory:.2f}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.2f}")

   # PDF is constant: verify
   x_test = [10, 30, 50]
   for x in x_test:
       print(f"  f({x}) = {rv.pdf(x):.6f}  (should be {1/(b-a):.6f})")


3.2.2 Normal (Gaussian) Distribution
---------------------------------------

A patient's heart rate is a continuous measurement influenced by many small,
roughly independent factors: fitness level, anxiety, caffeine intake, pain.
The Central Limit Theorem (:ref:`ch2_random_variables`) tells us that such
sums of many small effects tend toward the Normal distribution.  A healthy
resting heart rate might be modeled as :math:`X \sim N(\mu, \sigma^2)` with
:math:`\mu = 80` bpm and :math:`\sigma = 12` bpm.

**PDF.**

.. math::

   f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
          \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),
          \qquad x \in \mathbb{R}.

**Parameters.** :math:`\mu \in \mathbb{R}` (mean), :math:`\sigma^2 > 0`
(variance).

**Support.** :math:`(-\infty, \infty)`.

.. code-block:: python

   # Normal PDF: heart rate distribution
   import numpy as np
   from scipy import stats

   mu    = 80   # mean heart rate (bpm)
   sigma = 12   # standard deviation

   # Compute PDF at several heart rate values
   print(f"Normal(mu={mu}, sigma={sigma}): heart rate PDF")
   print(f"{'HR (bpm)':>10s}  {'f(x)':>10s}")
   for hr in [50, 60, 68, 80, 92, 100, 110]:
       # PDF from the formula
       fx = (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-(hr - mu)**2 / (2 * sigma**2))
       print(f"{hr:10d}  {fx:10.6f}")

**Mean.**  By symmetry of the bell curve around :math:`\mu`, :math:`E[X] = \mu`.

*Formal derivation.*  Substitute :math:`z = (x-\mu)/\sigma`:

.. math::

   E[X] = \int_{-\infty}^{\infty} x \, f(x)\,dx
   = \int_{-\infty}^{\infty}(\sigma z + \mu)\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz
   = \sigma\underbrace{\int_{-\infty}^{\infty}z\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz}_{= 0 \text{ (odd function)}}
     + \mu\underbrace{\int_{-\infty}^{\infty}\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz}_{= 1}
   = \mu.

**Variance derivation.**

.. math::

   \operatorname{Var}(X) = E[(X-\mu)^2]
   = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx
   = \sigma^2 \int_{-\infty}^{\infty} z^2 \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz.

The last integral evaluates to 1 (by integration by parts or by recognizing
:math:`E[Z^2] = \operatorname{Var}(Z) + (E[Z])^2 = 1 + 0` for
:math:`Z \sim N(0,1)`).  Hence :math:`\operatorname{Var}(X) = \sigma^2`.

.. code-block:: python

   # Normal: verify moments and compute clinically useful probabilities
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   mu, sigma = 80, 12
   rv = stats.norm(loc=mu, scale=sigma)
   samples = rv.rvs(size=100_000)

   print(f"Normal(mu={mu}, sigma^2={sigma**2}) -- 'Resting heart rate'")
   print(f"  Theory:     E[X] = {mu},  Var(X) = {sigma**2}")
   print(f"  Simulation: E[X] = {samples.mean():.2f},  Var(X) = {samples.var():.2f}")
   print(f"\n  68-95-99.7 rule:")
   print(f"  P({mu-sigma} < X < {mu+sigma}) = "
         f"{rv.cdf(mu+sigma) - rv.cdf(mu-sigma):.4f}  (~0.6827)")
   print(f"  P({mu-2*sigma} < X < {mu+2*sigma}) = "
         f"{rv.cdf(mu+2*sigma) - rv.cdf(mu-2*sigma):.4f}  (~0.9545)")

   # Clinical question: what fraction of patients have HR > 100?
   P_high_hr = 1 - rv.cdf(100)
   print(f"\n  P(HR > 100 bpm) = {P_high_hr:.4f}")
   print(f"  About {P_high_hr:.1%} of healthy patients have resting HR above 100.")

**MGF.**

.. math::

   M_X(t) = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right).

*Derivation.*  Complete the square in the exponent of the integral
:math:`E[e^{tX}]`:

.. math::

   M_X(t) &= \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}\sigma}
   e^{-(x-\mu)^2/(2\sigma^2)}\,dx \\
   &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma}
   \exp\!\left(-\frac{(x-\mu)^2 - 2\sigma^2 tx}{2\sigma^2}\right)dx.

Completing the square in the numerator:

.. math::

   (x-\mu)^2 - 2\sigma^2 tx = (x - (\mu + \sigma^2 t))^2 - 2\mu\sigma^2 t - \sigma^4 t^2.

The integral over the completed-square term equals :math:`\sqrt{2\pi}\sigma`,
leaving

.. math::

   M_X(t) = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right). \quad \square

The **standard normal** :math:`Z \sim N(0,1)` has :math:`\mu = 0`,
:math:`\sigma^2 = 1`.


3.2.3 Exponential Distribution
---------------------------------

If patient arrivals follow a Poisson process with rate :math:`\lambda = 4.5`
per hour, then the *time between consecutive arrivals* follows an Exponential
distribution: :math:`T \sim \text{Exp}(\lambda)`.  This is the continuous
counterpart of the Geometric: both model "waiting times," but the Exponential
works in continuous time.

**PDF.**

.. math::

   f(x) = \lambda\,e^{-\lambda x}, \qquad x \geq 0.

**Parameter.** :math:`\lambda > 0` (rate).

.. code-block:: python

   # Exponential PDF: time between patient arrivals
   import numpy as np
   from scipy import stats

   lam = 4.5  # arrivals per hour (same lambda as our Poisson)

   print(f"Exp(lambda={lam}): PDF at selected times (hours)")
   print(f"{'t (hours)':>10s}  {'t (min)':>8s}  {'f(t)':>10s}")
   for t in [0.0, 0.05, 0.10, 0.20, 0.33, 0.50, 1.0]:
       ft = lam * np.exp(-lam * t)
       print(f"{t:10.2f}  {t*60:8.1f}  {ft:10.4f}")

**Mean derivation.**

.. math::

   E[X] = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx.

Integration by parts with :math:`u = x`, :math:`dv = \lambda e^{-\lambda x}dx`:

.. math::

   E[X] = \left[-x\,e^{-\lambda x}\right]_0^{\infty}
   + \int_0^{\infty} e^{-\lambda x}\,dx
   = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}.

**Variance derivation.**

.. math::

   E[X^2] = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2}

(by two applications of integration by parts). Therefore

.. math::

   \operatorname{Var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.

**MGF.**

.. math::

   M_X(t) = \frac{\lambda}{\lambda - t}, \qquad t < \lambda.

.. code-block:: python

   # Exponential: verify moments and the Poisson-Exponential connection
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5  # patients per hour
   rv = stats.expon(scale=1/lam)  # scipy uses scale = 1/lambda
   samples = rv.rvs(size=100_000)

   E_theory   = 1 / lam
   Var_theory = 1 / lam**2
   print(f"Exp(lambda={lam}) -- 'Time between patient arrivals'")
   print(f"  Theory:     E[X] = {E_theory:.4f} hrs ({E_theory*60:.1f} min)")
   print(f"              Var(X) = {Var_theory:.4f}")
   print(f"  Simulation: E[X] = {samples.mean():.4f} hrs ({samples.mean()*60:.1f} min)")
   print(f"              Var(X) = {samples.var():.4f}")

   # KEY CONNECTION: Poisson count <-> Exponential inter-arrival time
   # Simulate Poisson arrivals by generating exponential gaps
   n_hours = 50_000
   gaps = stats.expon.rvs(scale=1/lam, size=n_hours * 20)  # generous
   arrival_times = np.cumsum(gaps)

   # Count arrivals in each 1-hour window
   counts = np.searchsorted(arrival_times, np.arange(1, n_hours + 1))
   counts = np.diff(np.concatenate([[0], counts]))

   print(f"\n  Poisson-Exponential connection:")
   print(f"  Generated Exp({lam}) inter-arrival times, counted per hour:")
   print(f"  Mean count per hour:     {counts.mean():.2f}  (should be ~{lam})")
   print(f"  Variance of counts:      {counts.var():.2f}  (should be ~{lam})")
   print(f"  Exponential gaps -> Poisson counts. Same process, two views.")

**Memoryless property.**  The Exponential is the only continuous distribution
with the memoryless property:

.. math::

   P(X > s + t \mid X > s) = P(X > t).

This says that given you have already waited :math:`s` units of time, the
distribution of *additional* waiting time is the same as if you had just
started waiting.

.. code-block:: python

   # Memoryless property: verify by simulation
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5
   samples = stats.expon.rvs(scale=1/lam, size=500_000)

   s = 0.1  # already waited 0.1 hours (6 minutes)

   # P(X > s + t | X > s) should equal P(X > t)
   survived = samples[samples > s]  # condition on X > s
   additional_wait = survived - s   # remaining time

   t_test = 0.2  # test at t = 0.2 hours
   P_conditional = np.mean(additional_wait > t_test)
   P_fresh       = np.mean(samples > t_test)
   P_theory      = np.exp(-lam * t_test)

   print(f"Memoryless property: P(X > s+t | X > s) = P(X > t)")
   print(f"  s = {s}, t = {t_test}")
   print(f"  P(additional > {t_test} | survived {s}) = {P_conditional:.4f}")
   print(f"  P(X > {t_test}) from scratch             = {P_fresh:.4f}")
   print(f"  Theory: exp(-{lam}*{t_test})              = {P_theory:.4f}")


3.2.4 Gamma Distribution
--------------------------

The Exponential models the wait for *one* patient.  What about the total time
to see the next :math:`\alpha = 3` patients?  If each inter-arrival time is
independent :math:`\text{Exp}(\lambda)`, the sum of :math:`\alpha` such times
follows a Gamma distribution: :math:`X \sim \text{Gamma}(\alpha, \beta)` with
:math:`\beta = \lambda`.

**PDF.**

.. math::

   f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha - 1}\,e^{-\beta x},
   \qquad x > 0,

where :math:`\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1}e^{-t}\,dt` is the
gamma function.

**Parameters.** :math:`\alpha > 0` (shape), :math:`\beta > 0` (rate).

(Some textbooks use a scale parameterization :math:`\theta = 1/\beta`; we use
the rate form here.)

.. code-block:: python

   # Gamma PDF: total waiting time for alpha=3 patients
   import numpy as np
   from scipy import stats
   from scipy.special import gamma as gamma_func

   alpha = 3    # number of patients to wait for
   beta  = 4.5  # arrival rate (same as our Poisson lambda)

   print(f"Gamma(alpha={alpha}, beta={beta}): 'Wait for {alpha} patients'")
   print(f"{'t (hours)':>10s}  {'formula':>10s}  {'scipy':>10s}")
   for t in [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0]:
       pdf_manual = (beta**alpha / gamma_func(alpha)) * t**(alpha-1) * np.exp(-beta * t)
       pdf_scipy  = stats.gamma.pdf(t, a=alpha, scale=1/beta)
       print(f"{t:10.2f}  {pdf_manual:10.4f}  {pdf_scipy:10.4f}")

**Mean derivation.**

.. math::

   E[X] = \int_0^{\infty} x \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1}
   e^{-\beta x}\,dx
   = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha} e^{-\beta x}\,dx.

Substituting :math:`u = \beta x`:

.. math::

   = \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}
   = \frac{\Gamma(\alpha+1)}{\beta\,\Gamma(\alpha)}
   = \frac{\alpha}{\beta},

using :math:`\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)`.

**Variance derivation.**  By the same method,
:math:`E[X^2] = \alpha(\alpha+1)/\beta^2`, so

.. math::

   \operatorname{Var}(X) = \frac{\alpha(\alpha+1)}{\beta^2} - \frac{\alpha^2}{\beta^2}
   = \frac{\alpha}{\beta^2}.

**MGF.**

.. math::

   M_X(t) = \left(\frac{\beta}{\beta - t}\right)^\alpha, \qquad t < \beta.

.. code-block:: python

   # Gamma: verify moments and connection to sum of Exponentials
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   alpha, beta = 3, 4.5
   rv = stats.gamma(a=alpha, scale=1/beta)
   gamma_samples = rv.rvs(size=100_000)

   # Also build Gamma samples as sum of alpha Exponentials
   exp_sums = np.sum(
       stats.expon.rvs(scale=1/beta, size=(100_000, alpha)),
       axis=1
   )

   E_theory   = alpha / beta
   Var_theory = alpha / beta**2
   print(f"Gamma(alpha={alpha}, beta={beta}) -- 'Wait for {alpha} patients'")
   print(f"  Theory:         E[X] = {E_theory:.4f} hrs ({E_theory*60:.1f} min)")
   print(f"                  Var(X) = {Var_theory:.4f}")
   print(f"  Gamma samples:  E[X] = {gamma_samples.mean():.4f},  Var = {gamma_samples.var():.4f}")
   print(f"  Sum of 3 Exp:   E[X] = {exp_sums.mean():.4f},  Var = {exp_sums.var():.4f}")
   print(f"  The sum of {alpha} Exp({beta}) variables IS Gamma({alpha}, {beta}).")

When :math:`\alpha = 1`, the Gamma reduces to the Exponential.  This makes
the Gamma a natural generalization: one event gives Exponential, multiple events
give Gamma.


3.2.5 Beta Distribution
--------------------------

Now suppose you want to estimate the trauma probability :math:`p` itself.
Before collecting data, you might express prior uncertainty about :math:`p`
using a distribution on :math:`[0,1]`.  The Beta distribution is the natural
choice.  Write :math:`X \sim \text{Beta}(\alpha, \beta)`.

In Bayesian statistics, the Beta serves as the conjugate prior for the Bernoulli
and Binomial likelihoods: if your prior on :math:`p` is Beta and you observe
Bernoulli data, the posterior on :math:`p` is also Beta.

**PDF.**

.. math::

   f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)},
   \qquad 0 < x < 1,

where :math:`B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}`
is the beta function.

**Parameters.** :math:`\alpha > 0`, :math:`\beta > 0` (shape parameters).

**Mean derivation.**

.. math::

   E[X] = \int_0^1 x \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,dx
        = \frac{1}{B(\alpha,\beta)}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx
        = \frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}.

Using the relationship :math:`B(\alpha+1,\beta) = \frac{\alpha}{\alpha+\beta}\,B(\alpha,\beta)`:

.. math::

   E[X] = \frac{\alpha}{\alpha + \beta}.

**Variance derivation.**  Similarly,
:math:`E[X^2] = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}`, so

.. math::

   \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.

**MGF.**  The Beta MGF does not have a simple closed form; it is given by a
confluent hypergeometric function.  In practice, one works directly with the
moments.

.. code-block:: python

   # Beta distribution: prior uncertainty about trauma rate p
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   alpha_param, beta_param = 3, 17  # prior: "3 traumas out of 20 patients seen"
   rv = stats.beta(alpha_param, beta_param)
   samples = rv.rvs(size=100_000)

   E_theory   = alpha_param / (alpha_param + beta_param)
   Var_theory = (alpha_param * beta_param) / (
       (alpha_param + beta_param)**2 * (alpha_param + beta_param + 1)
   )
   print(f"Beta(alpha={alpha_param}, beta={beta_param})")
   print(f"  Interpretation: prior belief about trauma rate p")
   print(f"  Theory:     E[X] = {E_theory:.4f},  Var(X) = {Var_theory:.6f}")
   print(f"  Simulation: E[X] = {samples.mean():.4f},  Var(X) = {samples.var():.6f}")

   # Show the shape: PDF at selected p values
   print(f"\n  PDF (shape of our belief about p):")
   for p in [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40]:
       bar = '#' * int(rv.pdf(p) * 5)
       print(f"    f({p:.2f}) = {rv.pdf(p):.4f}  {bar}")

   # 90% credible interval
   lo, hi = rv.ppf(0.05), rv.ppf(0.95)
   print(f"\n  90% prior interval for p: ({lo:.3f}, {hi:.3f})")


3.2.6 Chi-Squared Distribution
---------------------------------

If :math:`Z_1, \dots, Z_k` are independent standard normal random variables,
then

.. math::

   X = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k).

The chi-squared distribution with :math:`k` **degrees of freedom** is a
special case of the Gamma: :math:`\chi^2(k) = \text{Gamma}(k/2,\; 1/2)`.

In the hospital context, the chi-squared arises when testing whether observed
patient counts across categories (e.g., triage levels) match expected
proportions --- the classic goodness-of-fit test.

**PDF.**

.. math::

   f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}\,e^{-x/2},
   \qquad x > 0.

**Mean** (from the Gamma):

.. math::

   E[X] = \frac{k/2}{1/2} = k.

**Variance** (from the Gamma):

.. math::

   \operatorname{Var}(X) = \frac{k/2}{(1/2)^2} = 2k.

**MGF.**

.. math::

   M_X(t) = (1 - 2t)^{-k/2}, \qquad t < \tfrac{1}{2}.

.. code-block:: python

   # Chi-squared: verify definition by summing squared normals
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   k = 5  # degrees of freedom

   # Build chi-squared from its definition
   Z = np.random.standard_normal((100_000, k))
   chi2_from_def = np.sum(Z**2, axis=1)

   # Compare with scipy
   chi2_scipy = stats.chi2.rvs(k, size=100_000)

   print(f"Chi-squared(k={k})")
   print(f"  Theory:          E[X] = {k},  Var(X) = {2*k}")
   print(f"  Sum of Z^2:      E[X] = {chi2_from_def.mean():.2f},  "
         f"Var(X) = {chi2_from_def.var():.2f}")
   print(f"  Scipy samples:   E[X] = {chi2_scipy.mean():.2f},  "
         f"Var(X) = {chi2_scipy.var():.2f}")

   # Verify it is Gamma(k/2, 1/2)
   gamma_samples = stats.gamma.rvs(a=k/2, scale=2, size=100_000)  # scale=1/beta=1/(1/2)=2
   print(f"\n  Gamma({k/2}, 1/2):   E[X] = {gamma_samples.mean():.2f},  "
         f"Var(X) = {gamma_samples.var():.2f}")
   print(f"  chi^2({k}) and Gamma({k/2}, 1/2) are the same distribution.")


3.2.7 Student-t Distribution
-------------------------------

When you estimate a patient's mean heart rate from a small sample, you do not
know the true variance either.  Replacing the true :math:`\sigma` with the
sample standard deviation introduces extra uncertainty, and the resulting
statistic follows a Student-t rather than a Normal.

If :math:`Z \sim N(0,1)` and :math:`V \sim \chi^2(k)` are
independent, then

.. math::

   T = \frac{Z}{\sqrt{V/k}} \sim t(k).

The t-distribution looks like a normal but with heavier tails.  The fewer
degrees of freedom you have, the heavier the tails.  As :math:`k \to \infty`,
the t-distribution converges to the standard normal.

**PDF.**

.. math::

   f(x) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}
   {\sqrt{k\pi}\;\Gamma\!\left(\frac{k}{2}\right)}
   \left(1 + \frac{x^2}{k}\right)^{-(k+1)/2},
   \qquad x \in \mathbb{R}.

**Parameter.** :math:`k \in \{1,2,\dots\}` (degrees of freedom).

**Mean.**

.. math::

   E[T] = 0 \qquad \text{for } k > 1.

(The density is symmetric around zero.)

**Variance.**

.. math::

   \operatorname{Var}(T) = \frac{k}{k-2} \qquad \text{for } k > 2.

Note the variance exceeds 1 (heavier tails than the normal) and approaches 1
as :math:`k \to \infty`.

**MGF.**  The Student-t does not have an MGF (its moments of sufficiently high
order do not exist for small :math:`k`).  Its characteristic function is used
instead.

.. code-block:: python

   # Student-t: heavier tails shrink toward Normal as k grows
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   print("Student-t: variance and tail behavior vs degrees of freedom")
   print(f"{'k':>6s}  {'Var (theory)':>12s}  {'Var (simul)':>12s}  {'P(|T|>3)':>10s}  {'Normal P':>10s}")
   P_normal = 2 * (1 - stats.norm.cdf(3))
   for k in [3, 5, 10, 30, 100]:
       rv = stats.t(k)
       samp = rv.rvs(size=200_000)
       var_theory = k / (k - 2)
       P_tail = np.mean(np.abs(samp) > 3)
       print(f"{k:6d}  {var_theory:12.4f}  {samp.var():12.4f}  {P_tail:10.4f}  {P_normal:10.4f}")
   print(f"\nAs k increases, t(k) -> N(0,1): variance -> 1 and tails shrink.")


3.2.8 F-Distribution
-----------------------

A ratio of two independent chi-squared variables (each divided by their
degrees of freedom).  If :math:`U \sim \chi^2(d_1)` and :math:`V \sim \chi^2(d_2)`
are independent, then

.. math::

   F = \frac{U / d_1}{V / d_2} \sim F(d_1, d_2).

**PDF.**

.. math::

   f(x) = \frac{1}{B(d_1/2,\,d_2/2)}
   \left(\frac{d_1}{d_2}\right)^{d_1/2}
   \frac{x^{d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2}x\right)^{(d_1+d_2)/2}},
   \qquad x > 0.

**Parameters.** :math:`d_1, d_2 > 0` (numerator and denominator degrees of
freedom).

**Mean.**

.. math::

   E[F] = \frac{d_2}{d_2 - 2} \qquad \text{for } d_2 > 2.

**Variance.**

.. math::

   \operatorname{Var}(F) = \frac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1\,(d_2-2)^2\,(d_2-4)}
   \qquad \text{for } d_2 > 4.

**MGF.**  Like the Student-t, the F-distribution does not have a standard MGF.

**Usage.**  ANOVA, comparing variances, overall significance in linear
regression (F-test).  Note that :math:`T^2 \sim F(1, k)` if :math:`T \sim t(k)`.


.. _sec_distribution_relationships:

3.3 Relationships Between Distributions
==========================================

The distributions above do not exist in isolation.  They form a rich family
tree, and in our hospital scenario every connection has a concrete
interpretation.  Understanding these connections deepens intuition and
simplifies many derivations.


3.3.1 Poisson--Exponential Duality
-------------------------------------

This is the connection at the heart of our hospital scenario.  Patient arrivals
per hour follow :math:`\text{Pois}(\lambda)`, and the time between arrivals
follows :math:`\text{Exp}(\lambda)`.  These are two views of the same Poisson
process.

.. code-block:: python

   # Poisson-Exponential duality: two views of hospital arrivals
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5  # patients per hour

   # View 1: generate Poisson counts directly
   n_hours = 100_000
   poisson_counts = stats.poisson.rvs(lam, size=n_hours)

   # View 2: generate Exp inter-arrival times, count per hour
   # Simulate a long stream of arrivals
   total_events = int(lam * n_hours * 1.1)  # slightly more than expected
   inter_arrivals = stats.expon.rvs(scale=1/lam, size=total_events)
   arrival_times = np.cumsum(inter_arrivals)
   arrival_times = arrival_times[arrival_times < n_hours]

   # Count events in each hour
   hour_edges = np.arange(n_hours + 1)
   exp_counts, _ = np.histogram(arrival_times, bins=hour_edges)

   print(f"Poisson-Exponential duality (lambda={lam})")
   print(f"  Direct Poisson:      mean = {poisson_counts.mean():.3f}, "
         f"var = {poisson_counts.var():.3f}")
   print(f"  Exp inter-arrivals:  mean = {exp_counts.mean():.3f}, "
         f"var = {exp_counts.var():.3f}")
   print(f"  Theory:              mean = {lam:.3f}, var = {lam:.3f}")
   print(f"\n  Mean inter-arrival time: {inter_arrivals.mean():.4f} hrs "
         f"({inter_arrivals.mean()*60:.1f} min)")
   print(f"  Theory: 1/lambda = {1/lam:.4f} hrs ({60/lam:.1f} min)")


3.3.2 Binomial--Poisson Limit
-------------------------------

If :math:`X \sim \text{Bin}(n, p)` with :math:`n \to \infty` and
:math:`p \to 0` such that :math:`\lambda = np` remains constant, then

.. math::

   \binom{n}{x} p^x (1-p)^{n-x} \to \frac{e^{-\lambda}\lambda^x}{x!}.

Here is the key idea: when you have very many trials each with a very small
success probability, you are essentially counting rare events --- and rare
event counts follow the Poisson.

In the hospital, think of it this way: if the city has :math:`n = 100{,}000`
residents and each independently has a :math:`p = 0.000045` probability of
visiting the ED in a given hour, then the count of arrivals is
:math:`\text{Bin}(100000, 0.000045) \approx \text{Pois}(4.5)`.

*Derivation sketch.*  Write :math:`p = \lambda/n` and take the limit:

.. math::

   \binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}
   &= \frac{n!}{x!(n-x)!}\,\frac{\lambda^x}{n^x}\,
      \left(1-\frac{\lambda}{n}\right)^n \left(1-\frac{\lambda}{n}\right)^{-x}.

As :math:`n \to \infty`:

- :math:`n! / [(n-x)!\,n^x] \to 1`,
- :math:`(1 - \lambda/n)^n \to e^{-\lambda}`,
- :math:`(1 - \lambda/n)^{-x} \to 1`.

So the limit is :math:`\frac{\lambda^x}{x!}e^{-\lambda}`, which is the Poisson
PMF.

.. code-block:: python

   # Binomial-to-Poisson limit: watch convergence as n grows
   import numpy as np
   from scipy import stats

   lam = 4.5   # fixed lambda = n*p
   x_eval = 4  # evaluate at x=4

   print(f"P(X={x_eval}): Bin(n, {lam}/n) vs Poisson({lam})")
   print(f"  Poisson:         {stats.poisson.pmf(x_eval, lam):.8f}")
   for n in [20, 50, 100, 1_000, 10_000, 100_000]:
       p = lam / n
       binom_pmf = stats.binom.pmf(x_eval, n, p)
       error = abs(binom_pmf - stats.poisson.pmf(x_eval, lam))
       print(f"  Bin({n:>7d}, {p:.6f}): {binom_pmf:.8f}  (error: {error:.2e})")
   print(f"\n  As n -> inf with np = {lam}, Binomial -> Poisson.")


3.3.3 Gamma--Exponential Relationship
----------------------------------------

- :math:`\text{Exp}(\lambda) = \text{Gamma}(1, \lambda)`.
- If :math:`X_1, \dots, X_n` are independent :math:`\text{Exp}(\lambda)`, then
  :math:`\sum X_i \sim \text{Gamma}(n, \lambda)`.

In the hospital: the wait for one patient is Exponential; the total wait for
:math:`n` patients is Gamma.

*Proof via MGFs.*

.. math::

   M_{\sum X_i}(t) = \prod_{i=1}^n \frac{\lambda}{\lambda - t}
   = \left(\frac{\lambda}{\lambda - t}\right)^n,

which is the MGF of :math:`\text{Gamma}(n, \lambda)`. :math:`\square`

.. code-block:: python

   # Gamma = sum of Exponentials: verify for several values of n
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5  # patient arrival rate

   print(f"Sum of n Exp({lam}) vs Gamma(n, {lam}):")
   print(f"{'n':>4s}  {'Sum E[X]':>10s}  {'Gamma E[X]':>12s}  {'Theory':>10s}")
   for n_wait in [1, 3, 5, 10]:
       exp_sums = np.sum(
           stats.expon.rvs(scale=1/lam, size=(100_000, n_wait)), axis=1
       )
       gamma_samp = stats.gamma.rvs(a=n_wait, scale=1/lam, size=100_000)
       theory = n_wait / lam
       print(f"{n_wait:4d}  {exp_sums.mean():10.4f}  {gamma_samp.mean():12.4f}  {theory:10.4f}")


3.3.4 Chi-Squared as a Gamma Special Case
-------------------------------------------

.. math::

   \chi^2(k) = \text{Gamma}\!\left(\frac{k}{2},\;\frac{1}{2}\right).

This follows directly from comparing PDFs.


3.3.5 Normal--Chi-Squared Relationship
-----------------------------------------

If :math:`Z \sim N(0,1)`, then :math:`Z^2 \sim \chi^2(1)`.

*Proof.*  For :math:`x > 0`,

.. math::

   P(Z^2 \leq x) = P(-\sqrt{x} \leq Z \leq \sqrt{x})
   = 2\Phi(\sqrt{x}) - 1,

where :math:`\Phi` is the standard normal CDF.  Differentiating gives the
:math:`\chi^2(1)` density.


3.3.6 Beta--Gamma Relationship
---------------------------------

If :math:`X \sim \text{Gamma}(\alpha, 1)` and :math:`Y \sim \text{Gamma}(\beta, 1)`
are independent, then

.. math::

   \frac{X}{X + Y} \sim \text{Beta}(\alpha, \beta).


3.3.7 Student-t and F-Distribution Connection
------------------------------------------------

If :math:`T \sim t(k)`, then :math:`T^2 \sim F(1, k)`.

If :math:`F \sim F(d_1, d_2)`, then :math:`1/F \sim F(d_2, d_1)`.


3.3.8 Binomial--Normal Approximation (CLT)
--------------------------------------------

By the CLT, when :math:`n` is large,

.. math::

   \frac{X - np}{\sqrt{np(1-p)}} \;\dot\sim\; N(0, 1)
   \qquad \text{for } X \sim \text{Bin}(n, p).

In the hospital, the number of trauma cases in :math:`n = 200` patients
(:math:`p = 0.15`) is :math:`\text{Bin}(200, 0.15)`, but we can approximate
it with :math:`N(30, 25.5)`.

.. code-block:: python

   # Binomial-Normal approximation: trauma counts for large n
   import numpy as np
   from scipy import stats

   n, p = 200, 0.15
   mu_approx    = n * p
   sigma2_approx = n * p * (1 - p)

   print(f"Bin({n}, {p}) vs N({mu_approx:.0f}, {sigma2_approx:.1f})")
   print(f"{'x':>4s}  {'Bin PMF':>10s}  {'Normal':>10s}  {'Error':>10s}")
   for x in range(15, 46, 3):
       bp = stats.binom.pmf(x, n, p)
       na = stats.norm.pdf(x, loc=mu_approx, scale=np.sqrt(sigma2_approx))
       print(f"{x:4d}  {bp:10.6f}  {na:10.6f}  {abs(bp-na):10.6f}")

   # Practical check: P(X > 40)
   P_binom  = 1 - stats.binom.cdf(40, n, p)
   P_normal = 1 - stats.norm.cdf(40, loc=mu_approx, scale=np.sqrt(sigma2_approx))
   print(f"\n  P(X > 40): Binomial = {P_binom:.6f},  Normal approx = {P_normal:.6f}")


3.3.9 Summary of Relationships
---------------------------------

.. list-table::
   :header-rows: 1

   * - From
     - To
     - Mechanism
   * - Bernoulli
     - Binomial
     - Sum of :math:`n` i.i.d. copies
   * - Binomial
     - Poisson
     - :math:`n \to \infty`, :math:`p \to 0`, :math:`np = \lambda`
   * - Binomial
     - Normal
     - CLT (large :math:`n`)
   * - Poisson
     - Exponential
     - Counts vs. inter-arrival times (same process)
   * - Exponential
     - Gamma
     - Sum of i.i.d. copies / shape = 1 special case
   * - Gamma
     - Chi-squared
     - :math:`\alpha=k/2`, :math:`\beta=1/2`
   * - Normal
     - Chi-squared
     - Square of standard normal
   * - Chi-squared
     - Student-t
     - :math:`Z / \sqrt{V/k}`
   * - Chi-squared
     - F
     - Ratio of independent chi-squareds
   * - Student-t
     - F
     - :math:`T^2 \sim F(1,k)`
   * - Gamma
     - Beta
     - :math:`X/(X+Y)` for independent Gammas


3.4 Connecting the Hospital Story
====================================

Let us step back and see how all these distributions fit together in our
emergency department scenario.

.. code-block:: python

   # The hospital ED: one scenario, five distributions
   import numpy as np
   from scipy import stats

   np.random.seed(42)
   lam = 4.5   # patient arrival rate (per hour)
   p   = 0.15  # probability each patient is a trauma case
   mu_hr, sigma_hr = 80, 12  # heart rate parameters

   # Simulate one 8-hour shift
   n_hours = 8

   # 1. Poisson: patient counts per hour
   hourly_counts = stats.poisson.rvs(lam, size=n_hours)
   total_patients = hourly_counts.sum()

   # 2. Exponential: inter-arrival times (for the total_patients)
   inter_arrivals = stats.expon.rvs(scale=1/lam, size=total_patients)

   # 3. Binomial: how many are trauma cases?
   n_trauma = stats.binom.rvs(total_patients, p)

   # 4. Normal: heart rates for each patient
   heart_rates = stats.norm.rvs(loc=mu_hr, scale=sigma_hr, size=total_patients)

   # 5. Gamma: total wait time for next 3 patients
   wait_for_3 = stats.gamma.rvs(a=3, scale=1/lam)

   print("=" * 60)
   print("  HOSPITAL ED: ONE SHIFT SIMULATION")
   print("=" * 60)
   print(f"\n  Arrival rate:    lambda = {lam} patients/hour")
   print(f"  Trauma prob:     p = {p}")
   print(f"  Heart rate:      N({mu_hr}, {sigma_hr}^2)")
   print(f"\n  Hourly counts (Poisson):  {hourly_counts}")
   print(f"  Total patients (8 hrs):    {total_patients}")
   print(f"  Trauma cases (Binomial):   {n_trauma} out of {total_patients}")
   print(f"  Mean inter-arrival (Exp):  {inter_arrivals.mean()*60:.1f} min "
         f"(theory: {60/lam:.1f} min)")
   print(f"  Mean heart rate (Normal):  {heart_rates.mean():.1f} bpm")
   print(f"  High HR (>100 bpm):        {np.sum(heart_rates > 100)} patients")
   print(f"  Wait for next 3 (Gamma):   {wait_for_3*60:.1f} min")
   print(f"\n  Every number above came from a distribution in this chapter.")


3.5 Summary
=============

This chapter catalogued the probability distributions that form the
workhorses of likelihood-based inference, all tied together through a single
hospital emergency department scenario:

- **Discrete distributions** (Bernoulli, Binomial, Poisson, Geometric,
  Negative Binomial, Hypergeometric) model count data --- from individual
  patient classifications to arrival counts to pharmacy sampling.
- **Continuous distributions** (Uniform, Normal, Exponential, Gamma, Beta,
  Chi-squared, Student-t, F) model measurements on the real line or subsets
  thereof --- from waiting times to vital signs to prior beliefs.
- These distributions are connected through limiting arguments (Binomial to
  Poisson, CLT), algebraic relationships (sums of Exponentials give Gammas,
  Poisson counts correspond to Exponential gaps), and transformations
  (squaring a Normal gives a Chi-squared).

In :ref:`ch4_likelihood`, we will see how these distributions give rise to
likelihood functions, transforming them from descriptions of data into tools
for inference about unknown parameters.