{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "1UKOQcXNtJ8e" }, "source": [ "$\\qquad$ $\\qquad$$\\qquad$ **TDA 231 Machine Learning: Homework 0** \n", "$\\qquad$ $\\qquad$$\\qquad$ **Goal: Introduction to Probability, IPython Primer**
\n", "$\\qquad$ $\\qquad$$\\qquad$ **Grader: Aristide, Mikael** \n", "$\\qquad$ $\\qquad$$\\qquad$ **Due Date: 26/3**
\n", "$\\qquad$ $\\qquad$$\\qquad$ **Submitted by: Name, Personal no., email**
" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PQ8gZxqWtJ8h" }, "source": [ "General guidelines:\n", "* All solutions to theoretical problems can be submitted as a single file named *report.pdf*. They can also be submitted in this ipynb notebook, but equations, wherever required, should be formatted using LaTeX math mode.\n", "* All discussion regarding practical problems, along with solutions and plots, should be included in this notebook. We will not regenerate the solutions or plots by running your code.\n", "* Your name, personal number and email address should be specified above and also in your file *report.pdf*.\n", "* All datasets can be downloaded from the course website.\n", "* All tables and other additional information should be included." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vrnQ98KgtJ8i" }, "source": [ "**Jupyter/IPython Notebook** is a collaborative, web-based Python environment. It will be used in all our homework assignments except for the Neural Network assignment, which is based on MATLAB. It is installed in the halls ES61-ES62, E-studio and MT9. You can also use Google Colab (https://research.google.com/colaboratory/faq.html) \n", "to run these notebooks without downloading or installing anything on your own computer; only a browser is needed.\n", "\n", "Some useful resources:\n", "1. https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/ (quick-start guide)\n", "2. https://www.kdnuggets.com/2016/04/top-10-ipython-nb-tutorials.html\n", "3. http://data-blog.udacity.com/posts/2016/10/latex-primer/ (LaTeX primer)\n", "4. http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html (Markdown)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "fl1Lu21rtJ8k" }, "source": [ "# Theoretical problems\n", "\n", "## [Bayes Rule, 5 points]\n", "\n", "After your yearly checkup, the doctor has bad news and good news. 
The\n", "bad news is that you tested positive for a very serious cancer and\n", "that the test is 99% accurate, i.e., the probability of testing\n", "positive given that you have the disease is 0.99. The probability of\n", "testing negative if you don't have the disease is the same. The good news is that it is a very rare condition, affecting only 1 in 10,000 people. What is the probability that you actually have the disease? (Show all calculations and the final result.)\n", "\n", "## [Correlation and Independence, 5 points]\n", "\n", "Let $X$ be a continuous random variable, uniformly distributed in $[-1, +1]$, and let $Y := X^2$. Clearly $Y$ is not independent of $X$ -- in fact it is uniquely determined by $X$. However, show that $\\mbox{cov}(X, Y) = 0$.\n", "\n", "## [Setting hyperparameters, 3 points]\n", "\n", "Suppose $\\theta \\sim \\mbox{Beta}(a,b)$ and we believe $E[\\theta] = m$\n", "and $\\mbox{var}(\\theta) = v$. How should the parameters $a$ and $b$ be\n", "set to be consistent with this? Confirm that this gives the same values claimed in the lecture." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34, "output_extras": [ { "item_id": 1 } ] }, "colab_type": "code", "executionInfo": { "elapsed": 618, "status": "ok", "timestamp": 1521034294955, "user": { "displayName": "Divy Grover", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "104758399712966395862" }, "user_tz": -60 }, "id": "I-mNIdpGtJ8m", "outputId": "cdf28907-8222-4f7d-8301-207efd2bcaef" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "a = 5\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "498qdw9utJ8t" }, "source": [ "# Practical problems\n", "\n", "## [Plotting normally distributed points, 5 points]\n", "\n", "Generate $1000$ points from a 2D multivariate normal\n", "distribution with mean $\\mu = \\left[\n", "\\begin{array}{c}\n", " 1 \\\\\n", " 1\n", "\\end{array}\n", "\\right]$ and covariance $\\Sigma =\n", "\\left[\n", " \\begin{array}{rr}\n", " 0.1 & -0.05 \\\\\n", " -0.05 & 0.2\n", " \\end{array}\n", "\\right]\n", "$. Define the function $f({\\bf x}, r) := \\frac{({\\bf x} - \\mu)^{\\top} \\Sigma^{-1}\n", " ({\\bf x} - \\mu)}{2} - r$. On a single plot, show the following:\n", "* The level sets $f({\\bf x}, r) = 0$ for $r=1, 2, 3$.\n", "* A scatter plot of the randomly generated points, with points lying\n", "outside $f({\\bf x}, 3) = 0$ shown in black and points inside shown in\n", "blue.\n", "* A plot title stating how many points lie outside $f({\\bf\n", " x}, 3) = 0$.\n", "\n", "Submit your final plot as well as your implementation." 
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "collapsed": true, "id": "bXTnJdWTtJ8v" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Hint: np.meshgrid() and plt.contour() can make your life easier\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hz3UCqmptJ81" }, "source": [ "## [Covariance and correlation, 5 points]\n", "Load dataset0.txt ($X$), which contains 1074 data points,\n", "each having 12 features related to US schools. Compute the covariance\n", "and correlation matrices for $X$. Scale each feature\n", "in $X$ to the range $[0, 1]$ to obtain a new dataset $Y$. Compute the\n", "covariance and correlation matrices for $X$ and $Y$, and plot them (e.g. as colormaps).\n", "What do you observe? Show a scatter plot of the pair of features in $Y$ having minimum\n", "correlation, indicating in the title the feature indices and the\n", "correlation value. Submit the plots, comments and your implementation." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 197, "output_extras": [ { "item_id": 1 } ] }, "colab_type": "code", "executionInfo": { "elapsed": 855, "status": "error", "timestamp": 1521466765511, "user": { "displayName": "Divy Grover", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "104758399712966395862" }, "user_tz": -60 }, "id": "nIccfvNGtJ82", "outputId": "e53f435e-0ac3-4504-c1d3-83e46c508369" }, "outputs": [], "source": [ "import numpy as np\n", "\n", "# You might want to load the data and analyze it first\n", "data = np.loadtxt(\"path-to-do/dataset0.txt\")\n", "print(data.shape)  # (number of data points, number of features)\n", "print(data[0])  # first data point" ] } ], "metadata": { "colab": { "default_view": {}, "name": "HW0.ipynb", "provenance": [], "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }