.. _appendix_notation: ==================================== Appendix C: Notation and Glossary ==================================== This appendix provides a centralized reference for all mathematical notation and terminology used throughout this guide. When a symbol has multiple common meanings, the one adopted in this text is indicated. Mathematical Notation ====================== Sets and Spaces ---------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`\mathbb{R}` - The set of real numbers * - :math:`\mathbb{R}^n` - The set of real :math:`n`-dimensional column vectors * - :math:`\mathbb{R}^{n \times m}` - The set of real :math:`n \times m` matrices * - :math:`\mathbb{R}^+` - The set of strictly positive real numbers :math:`(0, \infty)` * - :math:`\mathbb{Z}` - The set of integers * - :math:`\mathbb{Z}^+` - The set of positive integers :math:`\{1, 2, 3, \ldots\}` * - :math:`\mathbb{N}_0` - The set of non-negative integers :math:`\{0, 1, 2, \ldots\}` * - :math:`\emptyset` - The empty set * - :math:`\mathcal{X}` - The sample space (set of all possible outcomes) * - :math:`\Theta` - The parameter space * - :math:`\mathcal{S}_n^{++}` - The cone of :math:`n \times n` symmetric positive definite matrices * - :math:`\in` - Element of * - :math:`\subset, \subseteq` - Proper subset, subset (or equal) * - :math:`\cup, \cap` - Union, intersection * - :math:`A^c` - Complement of set :math:`A` Probability Notation --------------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`P(A)` - Probability of event :math:`A` * - :math:`P(A \mid B)` - Conditional probability of :math:`A` given :math:`B` * - :math:`P(A \cap B)` - Joint probability of :math:`A` and :math:`B` * - :math:`f(x)` or :math:`f_X(x)` - Probability density function (PDF) of a continuous r.v. * - :math:`p(x)` or :math:`p_X(x)` - Probability mass function (PMF) of a discrete r.v. * - :math:`F(x)` or :math:`F_X(x)` - Cumulative distribution function (CDF) * - :math:`f(x \mid \theta)` - Density of :math:`X` given parameter :math:`\theta` * - :math:`f(\mathbf{x} \mid \boldsymbol{\theta})` - Joint density of data vector :math:`\mathbf{x}` given parameters * - :math:`\sim` - "is distributed as" (e.g., :math:`X \sim \mathcal{N}(\mu, \sigma^2)`) * - :math:`\stackrel{d}{\to}` - Convergence in distribution * - :math:`\stackrel{p}{\to}` - Convergence in probability * - :math:`\stackrel{a.s.}{\to}` - Almost sure convergence * - :math:`\perp\!\!\!\perp` - Statistical independence * - :math:`\text{i.i.d.}` - Independent and identically distributed Random Variables and Expectations ---------------------------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`X, Y, Z` - Random variables (uppercase) * - :math:`x, y, z` - Observed values / realizations (lowercase) * - :math:`\mathbf{X}` - Random vector or random matrix * - :math:`E[X]` or :math:`\mu` - Expected value (mean) of :math:`X` * - :math:`E[X \mid Y]` - Conditional expectation of :math:`X` given :math:`Y` * - :math:`\operatorname{Var}(X)` or :math:`\sigma^2` - Variance of :math:`X` * - :math:`\operatorname{Cov}(X, Y)` - Covariance of :math:`X` and :math:`Y` * - :math:`\operatorname{Corr}(X, Y)` or :math:`\rho` - Correlation of :math:`X` and :math:`Y` * - :math:`\boldsymbol{\Sigma}` - Covariance matrix * - :math:`M_X(t)` - Moment generating function of :math:`X` * - :math:`\phi_X(t)` - Characteristic function of :math:`X` * - :math:`E_\theta[\cdot]` - Expectation taken under the distribution indexed by :math:`\theta` Likelihood and Inference ------------------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`L(\theta)` or :math:`L(\theta ; \mathbf{x})` - Likelihood function * - :math:`\ell(\theta)` or :math:`\ell(\theta ; \mathbf{x})` - Log-likelihood function, :math:`\ell = \log L` * - :math:`\hat{\theta}` or :math:`\hat{\theta}_{\text{MLE}}` - Maximum likelihood estimator / estimate * - :math:`U(\theta)` or :math:`S(\theta)` - Score function, :math:`U(\theta) = \partial \ell / \partial \theta` * - :math:`\mathcal{I}(\theta)` - Fisher information (expected information) * - :math:`\mathcal{J}(\theta)` or :math:`J(\hat{\theta})` - Observed information, :math:`-\partial^2 \ell / \partial \theta^2` * - :math:`\Lambda` - Likelihood ratio statistic * - :math:`R(\theta)` - Profile likelihood or relative likelihood * - :math:`\ell_p(\psi)` - Profile log-likelihood for parameter of interest :math:`\psi` * - :math:`\text{se}(\hat{\theta})` - Standard error of estimator :math:`\hat{\theta}` * - :math:`\text{AIC}` - Akaike Information Criterion, :math:`-2\ell(\hat\theta) + 2p` * - :math:`\text{BIC}` - Bayesian Information Criterion, :math:`-2\ell(\hat\theta) + p\log n` Optimization Notation ---------------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`\nabla f` or :math:`\nabla_{\mathbf{x}} f` - Gradient of :math:`f` with respect to :math:`\mathbf{x}` * - :math:`\mathbf{H}` or :math:`\nabla^2 f` - Hessian matrix (matrix of second partial derivatives) * - :math:`\mathbf{J}` - Jacobian matrix * - :math:`\arg\max_\theta f(\theta)` - Value of :math:`\theta` that maximizes :math:`f` * - :math:`\arg\min_\theta f(\theta)` - Value of :math:`\theta` that minimizes :math:`f` * - :math:`\eta` - Learning rate / step size * - :math:`\theta^{(k)}` - Parameter value at iteration :math:`k` of an iterative algorithm * - :math:`\epsilon` - Convergence tolerance * - :math:`O(\cdot)` - Big-O notation (asymptotic upper bound) * - :math:`o(\cdot)` - Little-o notation (asymptotically negligible) * - :math:`O_p(\cdot)` - Stochastic big-O (bounded in probability) * - :math:`o_p(\cdot)` - Stochastic little-o (converges to zero in probability) Matrix Notation ---------------- .. list-table:: :widths: 25 75 :header-rows: 1 * - Symbol - Meaning * - :math:`\mathbf{A}, \mathbf{B}, \mathbf{C}` - Matrices (bold uppercase) * - :math:`\mathbf{x}, \mathbf{y}, \mathbf{z}` - Vectors (bold lowercase) * - :math:`\mathbf{I}` or :math:`\mathbf{I}_n` - Identity matrix (:math:`n \times n`) * - :math:`\mathbf{0}` - Zero vector or zero matrix * - :math:`\mathbf{1}` or :math:`\mathbf{1}_n` - Vector of ones * - :math:`\mathbf{A}^\top` - Transpose of :math:`\mathbf{A}` * - :math:`\mathbf{A}^{-1}` - Inverse of :math:`\mathbf{A}` * - :math:`\mathbf{A}^{-\top}` - :math:`(\mathbf{A}^{-1})^\top = (\mathbf{A}^\top)^{-1}` * - :math:`\det(\mathbf{A})` or :math:`|\mathbf{A}|` - Determinant of :math:`\mathbf{A}` * - :math:`\operatorname{tr}(\mathbf{A})` - Trace of :math:`\mathbf{A}` * - :math:`\operatorname{rank}(\mathbf{A})` - Rank of :math:`\mathbf{A}` * - :math:`\operatorname{diag}(\mathbf{A})` - Vector of diagonal entries of :math:`\mathbf{A}` * - :math:`\operatorname{diag}(\mathbf{x})` - Diagonal matrix with entries of :math:`\mathbf{x}` on the diagonal * - :math:`\lambda_i(\mathbf{A})` - :math:`i`-th eigenvalue of :math:`\mathbf{A}` * - :math:`\sigma_i(\mathbf{A})` - :math:`i`-th singular value of :math:`\mathbf{A}` * - :math:`\|\mathbf{x}\|` or :math:`\|\mathbf{x}\|_2` - Euclidean (L2) norm, :math:`\sqrt{\mathbf{x}^\top\mathbf{x}}` * - :math:`\|\mathbf{A}\|_F` - Frobenius norm, :math:`\sqrt{\operatorname{tr}(\mathbf{A}^\top\mathbf{A})}` * - :math:`\mathbf{A} \succ 0` - :math:`\mathbf{A}` is positive definite * - :math:`\mathbf{A} \succeq 0` - :math:`\mathbf{A}` is positive semi-definite * - :math:`\mathbf{A} \otimes \mathbf{B}` - Kronecker product of :math:`\mathbf{A}` and :math:`\mathbf{B}` Named Distributions -------------------- The following shorthand is used for standard distributions: .. list-table:: :widths: 30 70 :header-rows: 1 * - Notation - Distribution * - :math:`\text{Bernoulli}(p)` - Bernoulli with success probability :math:`p` * - :math:`\text{Bin}(n, p)` - Binomial with :math:`n` trials and success probability :math:`p` * - :math:`\text{Poisson}(\lambda)` - Poisson with rate :math:`\lambda` * - :math:`\text{Geom}(p)` - Geometric with success probability :math:`p` * - :math:`\text{NegBin}(r, p)` - Negative binomial with :math:`r` successes, probability :math:`p` * - :math:`\mathcal{U}(a, b)` - Uniform on :math:`[a, b]` * - :math:`\text{Exp}(\lambda)` - Exponential with rate :math:`\lambda` * - :math:`\mathcal{N}(\mu, \sigma^2)` - Normal with mean :math:`\mu` and variance :math:`\sigma^2` * - :math:`\mathcal{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})` - :math:`p`-variate normal with mean :math:`\boldsymbol{\mu}` and covariance :math:`\boldsymbol{\Sigma}` * - :math:`\text{Gamma}(\alpha, \beta)` - Gamma with shape :math:`\alpha` and rate :math:`\beta` * - :math:`\text{Beta}(\alpha, \beta)` - Beta with shape parameters :math:`\alpha` and :math:`\beta` * - :math:`\chi^2_n` - Chi-squared with :math:`n` degrees of freedom * - :math:`t_n` - Student's :math:`t` with :math:`n` degrees of freedom * - :math:`F_{m,n}` - :math:`F`-distribution with :math:`m` and :math:`n` degrees of freedom * - :math:`\text{Mult}(n, \mathbf{p})` - Multinomial with :math:`n` trials and probability vector :math:`\mathbf{p}` * - :math:`\text{Dir}(\boldsymbol{\alpha})` - Dirichlet with concentration parameter :math:`\boldsymbol{\alpha}` * - :math:`\text{Wishart}_p(n, \mathbf{V})` - Wishart with :math:`n` degrees of freedom and scale matrix :math:`\mathbf{V}` Greek Letters and Their Typical Uses ===================================== .. list-table:: :widths: 15 20 65 :header-rows: 1 * - Letter - Name - Typical Use in Statistics * - :math:`\alpha` - alpha - Significance level; shape parameter; Type I error rate * - :math:`\beta` - beta - Regression coefficient; rate parameter; Type II error rate * - :math:`\gamma` - gamma - Skewness; Euler--Mascheroni constant; threshold parameter * - :math:`\delta` - delta - Effect size; small perturbation; Kronecker delta * - :math:`\epsilon, \varepsilon` - epsilon - Error term; small positive quantity; convergence tolerance * - :math:`\zeta` - zeta - Latent variable; link function parameter * - :math:`\eta` - eta - Natural (canonical) parameter; learning rate * - :math:`\theta` - theta - Generic parameter (the most common choice) * - :math:`\iota` - iota - (Rarely used in statistics) * - :math:`\kappa` - kappa - Cumulant; condition number; concentration parameter * - :math:`\lambda` - lambda - Rate parameter; eigenvalue; Lagrange multiplier; penalty * - :math:`\mu` - mu - Mean; location parameter * - :math:`\nu` - nu - Degrees of freedom * - :math:`\xi` - xi - Latent variable; auxiliary parameter * - :math:`\pi` - pi - Prior probability; the constant 3.14159... * - :math:`\rho` - rho - Correlation coefficient; spectral radius * - :math:`\sigma` - sigma - Standard deviation (:math:`\sigma^2` = variance) * - :math:`\tau` - tau - Precision (:math:`1/\sigma^2`); Kendall's rank correlation * - :math:`\upsilon` - upsilon - (Rarely used in statistics) * - :math:`\phi, \varphi` - phi - Standard normal density; dispersion parameter; basis function * - :math:`\chi` - chi - Chi-squared distribution * - :math:`\psi` - psi - Digamma function; parameter of interest * - :math:`\omega` - omega - Weight; angular frequency **Uppercase Greek letters** with common statistical uses: .. list-table:: :widths: 15 20 65 :header-rows: 1 * - Letter - Name - Typical Use * - :math:`\Gamma` - Gamma - Gamma function; Gamma distribution * - :math:`\Delta` - Delta - Change or difference * - :math:`\Theta` - Theta - Parameter space * - :math:`\Lambda` - Lambda - Likelihood ratio; diagonal matrix of eigenvalues * - :math:`\Sigma` - Sigma - Covariance matrix; summation (:math:`\sum`) * - :math:`\Phi` - Phi - Standard normal CDF * - :math:`\Psi` - Psi - Polygamma function * - :math:`\Omega` - Omega - Sample space; precision matrix (:math:`\Sigma^{-1}`) Glossary of Key Terms ====================== .. glossary:: :sorted: Asymptotic normality The property that the distribution of an estimator approaches a normal distribution as the sample size grows. Under regularity conditions, :math:`\sqrt{n}(\hat\theta - \theta_0) \stackrel{d}{\to} \mathcal{N}(0, \mathcal{I}(\theta_0)^{-1})`. Bias The difference :math:`E[\hat\theta] - \theta` between the expected value of an estimator and the true parameter value. Completeness A statistic :math:`T` is complete if the only function :math:`g` with :math:`E_\theta[g(T)] = 0` for all :math:`\theta` is :math:`g \equiv 0`. Confidence interval An interval :math:`[L(\mathbf{X}), U(\mathbf{X})]` that contains the true parameter with a specified probability (the confidence level). Conjugate prior A prior distribution that, when combined with the likelihood via Bayes' theorem, yields a posterior of the same parametric family. Consistency An estimator :math:`\hat\theta_n` is consistent if :math:`\hat\theta_n \stackrel{p}{\to} \theta_0` as :math:`n \to \infty`. Cramér--Rao lower bound The minimum variance achievable by any unbiased estimator: :math:`\operatorname{Var}(\hat\theta) \geq \mathcal{I}(\theta)^{-1}`. Deviance Twice the difference between the log-likelihood of the saturated model and the fitted model: :math:`D = 2[\ell_{\text{sat}} - \ell(\hat\theta)]`. Efficiency The ratio of the Cramér--Rao lower bound to the actual variance of an estimator. An efficient estimator achieves the bound. EM algorithm Expectation--Maximization algorithm, an iterative method for finding MLEs when the model involves latent variables or missing data. Estimator A function of the data used to estimate an unknown parameter. The distinction between "estimator" (the rule) and "estimate" (the numerical value) is maintained in this text. Exponential family A parametric family whose density can be written as :math:`f(x|\theta) = h(x)\exp[\eta(\theta)^\top T(x) - A(\theta)]`. Fisher information The variance of the score function, or equivalently the negative expected Hessian of the log-likelihood: :math:`\mathcal{I}(\theta) = E[U(\theta)^2] = -E[\ell''(\theta)]`. Gradient descent An iterative optimization algorithm: :math:`\theta^{(k+1)} = \theta^{(k)} + \eta\,\nabla\ell(\theta^{(k)})`. Hessian matrix The matrix of second partial derivatives of a function: :math:`H_{ij} = \partial^2 f / \partial \theta_i \partial \theta_j`. Kullback--Leibler divergence A measure of the difference between two distributions: :math:`\text{KL}(p \| q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx`. Likelihood function The joint density or mass function of the data, viewed as a function of the parameters: :math:`L(\theta) = f(\mathbf{x} \mid \theta)`. Likelihood ratio test A hypothesis test based on the statistic :math:`\Lambda = 2[\ell(\hat\theta) - \ell(\theta_0)]`, which is asymptotically :math:`\chi^2`. Log-likelihood The natural logarithm of the likelihood function: :math:`\ell(\theta) = \log L(\theta)`. Maximum likelihood estimator (MLE) The parameter value that maximizes the likelihood (or equivalently the log-likelihood): :math:`\hat\theta = \arg\max_\theta \ell(\theta)`. Method of moments An estimation approach that equates sample moments to population moments and solves for the parameters. Newton--Raphson method An iterative root-finding algorithm applied to the score equation: :math:`\theta^{(k+1)} = \theta^{(k)} - [\ell''(\theta^{(k)})]^{-1}\,\ell'(\theta^{(k)})`. Nuisance parameter A parameter that is not of direct interest but must be accounted for in the inference procedure. Observed information The negative Hessian of the log-likelihood evaluated at the MLE: :math:`\mathcal{J}(\hat\theta) = -\ell''(\hat\theta)`. Power The probability of correctly rejecting a false null hypothesis: :math:`1 - \beta`, where :math:`\beta` is the Type II error rate. Profile likelihood The likelihood maximized over nuisance parameters: :math:`L_p(\psi) = \max_\lambda L(\psi, \lambda)`. p-value The probability, under the null hypothesis, of observing a test statistic as extreme as or more extreme than the observed value. Regularity conditions Technical conditions (differentiability, integrability, parameter not on boundary) that ensure standard asymptotic results hold for MLEs. Score function The derivative of the log-likelihood with respect to the parameter: :math:`U(\theta) = \partial\ell / \partial\theta`. Under regularity conditions, :math:`E[U(\theta_0)] = 0`. Sufficient statistic A statistic :math:`T(\mathbf{X})` that captures all the information in the data about the parameter. By the Fisher--Neyman factorization theorem, :math:`T` is sufficient if :math:`f(\mathbf{x}|\theta) = g(T(\mathbf{x}), \theta)\,h(\mathbf{x})`. Wald test A hypothesis test based on the statistic :math:`W = (\hat\theta - \theta_0)^2 / \widehat{\operatorname{Var}}(\hat\theta)`, which is asymptotically :math:`\chi^2_1`. Common Abbreviations ===================== .. list-table:: :widths: 20 80 :header-rows: 1 * - Abbreviation - Full Name * - AIC - Akaike Information Criterion * - BIC - Bayesian Information Criterion * - CDF - Cumulative distribution function * - CLT - Central Limit Theorem * - CRLB - Cramér--Rao lower bound * - EM - Expectation--Maximization * - GLM - Generalized linear model * - i.i.d. - Independent and identically distributed * - IRLS - Iteratively reweighted least squares * - KL - Kullback--Leibler * - LLN - Law of large numbers * - LRT - Likelihood ratio test * - MGF - Moment generating function * - MLE - Maximum likelihood estimator / estimate * - MSE - Mean squared error * - MVN - Multivariate normal * - NR - Newton--Raphson * - PDF - Probability density function * - PMF - Probability mass function * - r.v. - Random variable * - SVD - Singular value decomposition * - UMVUE - Uniformly minimum variance unbiased estimator * - w.r.t. - With respect to