{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "1UKOQcXNtJ8e" }, "source": [ "$\\qquad$ $\\qquad$$\\qquad **TDA 231 Machine Learning: Homework 0** \n", "\\qquad \\qquad$$\\qquad$ **Goal: Introduction to Probability, Ipython Primer**
\n", "$\\qquad$ $\\qquad$$\\qquad **Grader: Aristide, Mikael** \n", "\\qquad \\qquad$$\\qquad$ **Due Date: 26/3**
\n", "$\\qquad$ $\\qquad$$\\qquad$ **Submitted by: Name, Personal no., email**
" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PQ8gZxqWtJ8h" }, "source": [ "General guidelines:\n", "* All solutions to theoretical problems, can be submitted as a single file named *report.pdf*. They can also be submitted in this ipynb notebook, but equations wherever required, should be formatted using LaTeX math-mode.\n", "* All discussion regarding practical problems, along with solutions and plots should be specified here itself. We will not generate the solutions/plots again by running your code.\n", "* Your name, personal number and email address should be specified above and also in your file *report.pdf*.\n", "* All datasets can be downloaded from the course website.\n", "* All tables and other additional information should be included." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vrnQ98KgtJ8i" }, "source": [ "**Jupyter/IPython Notebook** is a collaborative Python web-based environment. This will be used in all our Homework Assignments except for Neural Network assignment which is be based on matlab. It is installed in the halls ES61-ES62, E-studio and MT9. You can also use google-colab: https://research.google.com/colaboratory/faq.html \n", "to run these notebooks without having to download, install, or do anything on your own computer other than a browser.\n", "Some useful resources:\n", "1. https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/ (Quick-start guide)\n", "2. https://www.kdnuggets.com/2016/04/top-10-ipython-nb-tutorials.html\n", "3. http://data-blog.udacity.com/posts/2016/10/latex-primer/ (latex-primer)\n", "4. http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html (markdown)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "fl1Lu21rtJ8k" }, "source": [ "# Theoretical problems\n", "\n", "## [Bayes Rule, 5 points]\n", "\n", "After your yearly checkup, the doctor has bad news and good news. The\n", "bad news is that you tested positive for a very serious cancer and\n", "that the test is 99% accurate i.e. the probability of testing\n", "positive given you have the disease is 0.99. The probability of\n", "testing negative if you don’t have the disease is the same. The good news is that it is a very rare condition affecting only 1 in 10,000 people. What is the probability you actually have the disease? (Show all calculations and the final result.)\n", "\n", "## [Correlation and Independence, 5 points]\n", "\n", "Let $X$ be a continuous variable, uniformly distributed in $[-1, +1]$ and let $Y := X^2$. Clearly $Y$ is not independent of $X$ -- in fact it is uniquely determined by $X$. However, show that $\\mbox{cov}(X, Y ) = 0$.\n", "\n", "## [Setting hyperparameters, 3 points]\n", "\n", "Suppose $\\theta \\sim \\mbox{Beta}(a,b)$ and we believe $E[\\theta] = m$\n", "and $\\mbox{var}(\\theta) = v$. How should the parameters $a$ and $b$ be\n", "set to be consistent with this? Confirm that this gives the same values claimed in the lecture." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34, "output_extras": [ { "item_id": 1 } ] }, "colab_type": "code", "executionInfo": { "elapsed": 618, "status": "ok", "timestamp": 1521034294955, "user": { "displayName": "Divy Grover", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "104758399712966395862" }, "user_tz": -60 }, "id": "I-mNIdpGtJ8m", "outputId": "cdf28907-8222-4f7d-8301-207efd2bcaef" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "a = 5\n", "print (a)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "498qdw9utJ8t" }, "source": [ "# Practical problems\n", "\n", "## [Plotting normal distributed points, 5 points]\n", "\n", "Generate $1000$ points from 2D multivariate normal\n", "distribution having mean $\\mu = \\left[\n", "\\begin{array}{c}\n", " 1 \\\\\n", " 1\n", "\\end{array}\n", "\\right]$ and covariance $\\Sigma =\n", "\\left[\n", " \\begin{array}{rr}\n", " 0.1 & -0.05 \\\\\n", " -0.05& 0.2\n", " \\end{array}\n", "\\right]\n", "$. Define the function $f({\\bf x}, r) := \\frac{({\\bf x} - \\mu)^{ \\top } * \\Sigma^{-1} *\n", " ({\\bf x} - \\mu) }{ 2} - r$. On a single plot, show the following:\n", "* The level sets $f({\\bf x}, r) = 0$ for $r=1, 2, 3$.\n", "* Scatter plot of randomly generated points with points lying\n", "outside $f({\\bf x} , 3) = 0$ showing in black while points inside shown in\n", "blue.\n", "* Title of the plot showing how many points lie outside $f({\\bf\n", " x}, 3) = 0$.\n", "Submit your final plot as well as your implementation." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "collapsed": true, "id": "bXTnJdWTtJ8v" }, "outputs": [], "source": [ "import numpy as np\n", "import import matplotlib.pyplot as plt\n", "\n", "# You can use, np.meshgrid() and np.contour to make your life easier\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hz3UCqmptJ81" }, "source": [ "## [Covariance and correlation, 5 points]\n", "Load dataset0.txt ($X$) containing 1074 data points\n", "each having 12 features related to US schools. Compute the covariance\n", "and correlation matrix for $X$. Scale each feature\n", "in $X$ between $[0, 1]$ to obtain a new dataset $Y$. Compute the\n", "covariance and correlation matrices for $X$ and $Y$, and plot them (e.g. as colormaps).\n", "What do you observe? Show a scatter plot of the pair of features in $Y$ having minimum\n", "correlation, indicating in the title the feature indices and the\n", "correlation value. Submit the plots, comments and your implementation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 197, "output_extras": [ { "item_id": 1 } ] }, "colab_type": "code", "executionInfo": { "elapsed": 855, "status": "error", "timestamp": 1521466765511, "user": { "displayName": "Divy Grover", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "104758399712966395862" }, "user_tz": -60 }, "id": "nIccfvNGtJ82", "outputId": "e53f435e-0ac3-4504-c1d3-83e46c508369" }, "outputs": [], "source": [ "# You might want to load the data and analyze it first\n", "data = loadtxt(\"path-to-do/dataset0.txt\")\n", "print (data.shape)\n", "print (data)" ] } ], "metadata": { "colab": { "default_view": {}, "name": "HW0.ipynb", "provenance": [], "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }