{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$\\qquad$ $\\qquad$$\\qquad$  **TDA 231 Machine Learning: Homework 5** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$ **Goal: Clustering**<br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                   **Grader: Aristide** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                     **Due Date: 21/5** <br />\n",
    "$\\qquad$ $\\qquad$$\\qquad$                   **Submitted by: Name, Personal no., email** <br />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "General guidelines:\n",
    "* Since there is no theoretical part for this assigment, submit this ipynb only (with completed code/results).\n",
    "* All discussion regarding practical problems, along with solutions and plots should be specified here itself. We will not generate the solutions/plots again by running your code.\n",
    "* Your name, personal number and email address should be specified above.\n",
    "* All datasets can be downloaded from the course website.\n",
    "* All plots/tables and other relevant information should be included."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practical problems\n",
    "\n",
    "The follwing code might be useful for this excercise.\n",
    "\n",
    "```python\n",
    "import scipy.io\n",
    "mat = scipy.io.loadmat('hw5_p1a.mat')\n",
    "print (mat.keys())\n",
    "X = mat['X']\n",
    "```\n",
    "\n",
    "## [K-Means Implementation, 20 points]\n",
    "\n",
    "a. Implement the basic (linear) $k$-means algorithm as described in the lecture, using the euclidean distance. Use (uniformly) random points from the data as initialization for the centroids. Terminate the iterative procedure when the the cluster assignments do not change.\n",
    "\n",
    "b. Run your implementation on the matrix $X$ in **hw5_p1a.mat** with $k=2$. Each row of the matrix is an observation, and each column is a feature. Store the cluster assignment both after 2 iterations, and at convergence. Produce a scatter plot of the data with colors indicating the cluster assignments at convergence and highlight points that have changed assignment after the second iteration.\n",
    "\n",
    "c. Implement the kernel $k$-means algorithm as described in the lecture, using the Gaussian RBF-kernel.\n",
    "\n",
    "d. Run the linear $k$-means **and** your kernel $k$-means on **hw5_p1b.mat** with $k=2$. For the Gaussian RBF-kernel, use $\\sigma=0.2$. Produce scatter plots of the data, with color indicating the cluster assignment at convergence, one plot for each of the algorithms."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}