{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$\\qquad$ $\\qquad$$\\qquad$  **TDA 231 Machine Learning: Homework 1** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$ **Goal: Maximum likelihood estimation (MLE), Maximum a posteriori (MAP)**<br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                   **Grader: Vasileios** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                     **Due Date: 16/4** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                   **Submitted by: Name, Personal no., email** <br />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "General guidelines:\n",
    "* All solutions to theoretical problems, can be submitted as a single file named *report.pdf*. They can also be submitted in this ipynb notebook, but equations wherever required, should be formatted using LaTeX math-mode.\n",
    "* All discussion regarding practical problems, along with solutions and plots should be specified here itself. We will not generate the solutions/plots again by running your code.\n",
    "* Your name, personal number and email address should be specified above and also in your file *report.pdf*.\n",
    "* All datasets can be downloaded from the course website.\n",
    "* All tables and other additional information should be included."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Theoretical problems\n",
    "\n",
    "## [Maximum likelihood estimator (MLE), 4 points]\n",
    "\n",
    "Consider a dataset $x_1, \\ldots, x_n$ consisting of i.i.d. observations \n",
    "generated from a **spherical** multivariate Gaussian distribution $N(\\mu, \\sigma^2I)$, where $\\mu \\in \\mathbb{R}^p$, $I$ \n",
    "is the $p \\times p$ identity matrix, and $\\sigma^2$ is a\n",
    "scalar. Derive the maximum likelihood estimator for $\\sigma$.\n",
    "\n",
    "## [Posterior distributions, 6 points]\n",
    "\n",
    "Consider dataset $x_1, \\ldots, x_n $ consisting of i.i.d. observations \n",
    "generated from a **spherical** multivariate Gaussian distribution $N(\\mu, \\sigma^2I)$, where $\\mu =\n",
    "[\\mu_{1},\\, \\mu_{2}]^{\\top} \\in \\mathbb{R}^2$, $I$ \n",
    "is the $2 \\times 2$ identity matrix, and $\\sigma^2$ is a scalar. \n",
    "The probability distribution of a point $\\mathbf{x}=[x_{1},\\, x_{2}]^{\\top}$ is given by\n",
    "\n",
    "$$ P(X = x \\,|\\, \\sigma^{2}) =  \\frac{1}{ 2\\pi \\sigma^2}   \\exp\n",
    "\\left( -\\frac{ (x - \\mu)^{\\top}(x - \\mu) }{2\\sigma^{2}} \\right)\n",
    "~.$$\n",
    "\n",
    "We assume that $\\sigma^{2}$ has an **inverse-gamma** prior distribution\n",
    "given by\n",
    "$$ P(\\sigma^{2} = s | \\alpha, \\beta) =\n",
    "\\frac{\\beta^{\\alpha}}{\\Gamma(\\alpha)} s^{-\\alpha-1} \\exp\\left( -\n",
    "  \\frac{\\beta}{s}\\right)~. \\tag{1} $$\n",
    "  \n",
    "where $\\alpha$ and $\\beta$ are parameters and $\\Gamma(\\cdot)$ is the\n",
    "gamma function given by $\\Gamma(x) = \\int_{0}^{\\infty} t^{x-1} e^{-t}\n",
    "dt $.\n",
    "\n",
    "1. Derive the posterior distribution $p(\\sigma^{2} = s | x_{1} , \\ldots, x_{n}; \\alpha, \\beta)$. (HINT: inverse-gamma distribution is conjugate prior to sphericalGaussian distribution when mean is known).\n",
    "\n",
    "2. Assume $\\mu$ is known and consider two separate models (having different parameters)\n",
    "\n",
    "    * $\\alpha =1$ and $\\beta=1$ (Model $M_{A}$)\n",
    "    * $\\alpha = 10$ and $\\beta= 1$ (Model $M_{B}$) \n",
    "\n",
    "Compute analytically the expression for the MAP estimate for both models in terms of posterior parameters referred to as $\\alpha_{1}, \\beta_{1}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practical problems\n",
    "\n",
    "**Useful python libraries/functions:**\n",
    "* **General:**  shape, reshape, np.mean etc.\n",
    "* **Plotting:** plot, scatter, legend, hold, imshow, subplot,\n",
    "  grid, title etc.\n",
    "\n",
    "## [Spherical Gaussian estimation, 5 points]\n",
    "\n",
    "Consider a dataset consisting of i.i.d. observations\n",
    "generated from a spherical Gaussian distribution $N(\\mu, \\sigma^2I)$, where $\\mu \\in \\mathbb{R}^p$, $I$ \n",
    "is the $p \\times p $ identity matrix, and $\\sigma^2$ is a scalar.\n",
    "\n",
    "(a) Write the mathematical expression for the MLE estimators for $\\mu$ and $\\sigma$ in above setup. (HINT: Use latex equations here, or mention in pdf. This [link](http://data-blog.udacity.com/posts/2016/10/latex-primer/) might be useful if you choose to write here).\n",
    "\n",
    "(b) Implement a function **sge()** that estimates the mean $\\mu$ and variance $\\sigma^{2}$ from the given data, using the skeleton code provided below. Note: You cannot use **numpy.cov** and **numpy.mean** or any other functions for calculating the mean and variance.\n",
    "\n",
    "(c) Implement a function **myplot1()** which takes as input a two-dimensional dataset $x$ (as described above); and draws, on the same plot, the following:\n",
    "1. A scatter plot of the original data $x$, \n",
    "2. Circles with center $\\mu$ and radius $r=k \\sigma$ for $k=1, 2, 3$ where $\\mathbf{\\mu}$ and $\\sigma^{2}$ denote the mean and variance estimated using **sge()**. \n",
    "3. Legend for each circle indicating the fraction of points (in the original dataset) that lie outside the circle boundary.\n",
    "\n",
    "(d) Load 'dataset0.txt' and run your code using only the first two features of the dataset. Submit the resulting plot as well as your implementation here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sge(X):\n",
    "    \"\"\"\n",
    "    SGE Mean and variance estimator for spherical Gaussian distribution\n",
    "\n",
    "    X : Data matrix of size n x p where each row represents a p-dimensional data point\n",
    "    e.g. X = [ 2 1 ; 3 7 ; 4 5 ] is a dataset having 3 samples having two co−ordinates each.\n",
    "    mu : Estimated mean o f the dataset [mu_1 mu_2 . . . mu_p]\n",
    "    sigma : Estimated standard deviation of the dataset ( number )\n",
    "    \"\"\"\n",
    "    return mu,sigma"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [MAP estimation, 5 points]\n",
    "\n",
    "Consider dataset $x_1, \\ldots, x_n $ consisting of i.i.d. observations \n",
    "generated from a multivariate normal distribution $N(\\mu, \\sigma^2I)$, where $\\mu =\n",
    "[\\mu_{1},\\, \\mu_{2}]^{\\top} \\in \\mathbb{R}^2$, $I$ \n",
    "is the $2 \\times 2$ identity matrix, and $\\sigma^2$ is a scalar. We will now explore the Bayesian approach to estimation of $\\sigma^{2}$ *under the assumption that the mean $\\mu$ is known.*\n",
    "The probability distribution of a point $\\mathbf{x}=[x_{1},\\, x_{2}]^{\\top}$ is given by\n",
    "\n",
    "$$ P(X = x \\,|\\, \\sigma^{2}) =  \\frac{1}{ 2\\pi \\sigma^2}   \\exp\n",
    "\\left( -\\frac{ (x - \\mu)^{\\top}(x - \\mu) }{2\\sigma^{2}} \\right)\n",
    "~.$$\n",
    "\n",
    "We assume that $\\sigma^{2}$ has an **inverse-gamma** prior distribution\n",
    "given by\n",
    "$$ P(\\sigma^{2} = s | \\alpha, \\beta) =\n",
    "\\frac{\\beta^{\\alpha}}{\\Gamma(\\alpha)} s^{-\\alpha-1} \\exp\\left( -\n",
    "  \\frac{\\beta}{s}\\right)~. \\tag{1} $$\n",
    "  \n",
    "where $\\alpha$ and $\\beta$ are parameters and $\\Gamma(\\cdot)$ is the\n",
    "gamma function given by $\\Gamma(x) = \\int_{0}^{\\infty} t^{x-1} e^{-t}\n",
    "dt $.\n",
    "\n",
    "Assume that your dataset now consists of just the first two features of 'dataset0.txt'.\n",
    "\n",
    "(a) Choose $\\mu$ to be the empirical mean. Implement a function **myplot2()**, that on the same plot, shows the prior and posterior distributions for $\\sigma$ with parameters $\\alpha = 1 $ and $\\beta = 1$.  Generate a second plot with $\\alpha=10$ and $\\beta=1$. What do you observe?\n",
    "\n",
    "HINT:\n",
    "   * Calculate the posterior distribution using the data and the formula that you derived in the theoretical question **\"Posterior distributions\"**.\n",
    "   * You might want to check out the \"log-sum-exp trick\"."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}