Lecture 1: examples of random number generation using NumPy

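This notebook walks through basic examples of random number generation with NumPy: uniform floats and integers, shuffling, sampling from a list, and plotting histograms of the results.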

In [1]:
import numpy as np
import matplotlib.pyplot as plt
# Matplotlib 3.6 renamed this style; on older versions use 'seaborn-darkgrid'
plt.style.use('seaborn-v0_8-darkgrid')

Setting the seed makes our results reproducible: we will get the same result every time we run the program.

In [2]:
np.random.seed(0)
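
As a quick check of what reproducibility means here, the following sketch seeds the generator, draws three numbers, then reseeds with the same value and draws again. The names first and second are only illustrative.

```python
import numpy as np

# Seed the global generator and draw three numbers.
np.random.seed(0)
first = np.random.random(size=3)

# Reset the generator to the same state and draw again.
np.random.seed(0)
second = np.random.random(size=3)

# Both draws contain exactly the same values.
print(np.array_equal(first, second))
```

Recent NumPy versions also offer np.random.default_rng(seed), which returns a separate generator object; this lecture uses the older global functions, so the sketch does too.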

The function np.random.random generates uniformly distributed random floating-point numbers in the half-open interval [0, 1): the value 0 can occur, but 1 cannot.

In [3]:
np.random.random()
Out[3]:
0.5488135039273248

NumPy's functions for random number generation can optionally take a parameter size: if this parameter is given, multiple random numbers are returned as a NumPy array.

In [4]:
np.random.random(size=5)
Out[4]:
array([0.71518937, 0.60276338, 0.54488318, 0.4236548 , 0.64589411])

The function np.random.randint generates uniformly distributed integer values. Here, we simulate a roll of 5 dice. Note that the upper bound is exclusive, so the second parameter is set to one plus the highest permitted value.

In [5]:
np.random.randint(1, 7, size=5)
Out[5]:
array([5, 1, 1, 5, 3])
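
The size parameter can also be a tuple, which gives a multi-dimensional array. As a small sketch (the names two_dice and totals are made up for this example), we can roll two dice 1000 times and add up each pair:

```python
import numpy as np

np.random.seed(0)

# Each row holds one throw of two dice; the upper bound 7 is exclusive.
two_dice = np.random.randint(1, 7, size=(1000, 2))

# Sum along each row to get the total of each throw.
totals = two_dice.sum(axis=1)

print(two_dice[:3])
print(totals[:3])
print(totals.min(), totals.max())
```

Every total lies between 2 and 12, because each individual die shows a value from 1 to 6.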

The function np.random.shuffle takes a list or array and shuffles the items randomly. The shuffling happens in place: the function modifies x itself and returns None.

In [6]:
x = ['here', 'is', 'a', 'list', 'containing', 'some', 'words']
np.random.shuffle(x)
x
Out[6]:
['some', 'a', 'list', 'containing', 'words', 'here', 'is']
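
If the original order should be kept, np.random.permutation is an alternative: it returns a shuffled copy and leaves its input untouched. A minimal sketch, where the names words and shuffled are chosen just for this example:

```python
import numpy as np

np.random.seed(0)
words = ['here', 'is', 'a', 'list', 'containing', 'some', 'words']

# permutation does not modify its argument; it returns a new NumPy array.
shuffled = np.random.permutation(words)

print(words)     # still in the original order
print(shuffled)  # the same words in a random order
```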

The function np.random.choice randomly picks an item from a given list.

In [7]:
bloodgroups = [ 'A', 'O', 'B', 'AB' ]
np.random.choice(bloodgroups)
Out[7]:
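'A'

As with the other functions, the size parameter makes np.random.choice return several picks at once.

In [8]: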
np.random.choice(bloodgroups, size=100)
Out[8]:
array(['O', 'A', 'AB', 'A', 'AB', 'O', 'B', 'AB', 'AB', 'A', 'B', 'AB',
       'A', 'O', 'AB', 'O', 'AB', 'AB', 'B', 'AB', 'A', 'O', 'O', 'O',
       'AB', 'A', 'AB', 'B', 'A', 'AB', 'AB', 'B', 'AB', 'B', 'AB', 'A',
       'B', 'A', 'A', 'A', 'O', 'O', 'B', 'A', 'A', 'O', 'AB', 'A', 'O',
       'B', 'B', 'AB', 'A', 'O', 'O', 'AB', 'O', 'O', 'AB', 'B', 'AB',
       'AB', 'B', 'B', 'AB', 'A', 'B', 'AB', 'O', 'A', 'O', 'B', 'A',
       'AB', 'A', 'B', 'A', 'AB', 'AB', 'A', 'AB', 'A', 'A', 'A', 'A',
       'B', 'AB', 'A', 'AB', 'B', 'AB', 'AB', 'O', 'O', 'O', 'A', 'O',
       'O', 'O', 'AB'], dtype='<U2')

We can optionally provide a list of probabilities for np.random.choice; if the probabilities are not specified, they are assumed to be uniform.

In [9]:
# these probabilities hold for the population of Sweden, not generally
bloodgroup_probs = [0.44, 0.38, 0.12, 0.06] 
np.random.choice(bloodgroups, size=100, p=bloodgroup_probs)
Out[9]:
array(['A', 'B', 'A', 'B', 'A', 'AB', 'O', 'AB', 'O', 'O', 'A', 'A', 'A',
       'A', 'A', 'A', 'A', 'A', 'O', 'O', 'A', 'O', 'A', 'O', 'B', 'A',
       'O', 'A', 'O', 'A', 'A', 'O', 'A', 'B', 'A', 'O', 'A', 'O', 'AB',
       'A', 'O', 'O', 'O', 'A', 'AB', 'O', 'B', 'O', 'A', 'O', 'A', 'B',
       'O', 'B', 'O', 'O', 'O', 'AB', 'O', 'A', 'O', 'A', 'A', 'O', 'A',
       'O', 'A', 'A', 'A', 'O', 'O', 'O', 'O', 'O', 'A', 'B', 'A', 'A',
       'B', 'O', 'O', 'A', 'B', 'O', 'AB', 'A', 'B', 'A', 'O', 'A', 'B',
       'O', 'O', 'A', 'A', 'O', 'O', 'O', 'B', 'AB'], dtype='<U2')
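
To see that the probabilities are respected, we can draw a much larger sample and compare the relative frequencies with the values we passed in. This is only a sanity-check sketch; the sample size of 100000 and the name frequencies are arbitrary choices.

```python
import numpy as np

np.random.seed(0)
bloodgroups = ['A', 'O', 'B', 'AB']
bloodgroup_probs = [0.44, 0.38, 0.12, 0.06]

sample = np.random.choice(bloodgroups, size=100_000, p=bloodgroup_probs)

# Count each group and convert the counts to relative frequencies.
groups, counts = np.unique(sample, return_counts=True)
frequencies = dict(zip(groups.tolist(), (counts / counts.sum()).tolist()))

# Print the requested probability next to the observed frequency.
for group, prob in zip(bloodgroups, bloodgroup_probs):
    print(group, prob, round(frequencies[group], 3))
```

The observed frequencies should be close to the requested probabilities, but not exactly equal, since the sample is random.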

By default, np.random.choice samples with replacement; that is, when we draw several items, the same item may be picked more than once. To sample without replacement, so that each item is picked at most once, we set the parameter replace to False. Here is an example where we draw three cards randomly without replacement from a deck of four cards.

In [10]:
deck = ['ace of spades', 'king of hearts', 'six of clubs', 'ten of diamonds']
np.random.choice(deck, replace=False, size=3)
Out[10]:
array(['ace of spades', 'ten of diamonds', 'king of hearts'], dtype='<U15')
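
Without replacement, we cannot draw more items than the list contains. The sketch below first draws all four cards, which simply gives the deck in random order, and then asks for five, which makes NumPy raise a ValueError. The flag oversized_draw_failed is a made-up name for this example.

```python
import numpy as np

np.random.seed(0)
deck = ['ace of spades', 'king of hearts', 'six of clubs', 'ten of diamonds']

# Drawing every card without replacement returns each card exactly once.
full_hand = np.random.choice(deck, replace=False, size=4)
print(full_hand)

# Asking for more cards than the deck holds is an error.
try:
    np.random.choice(deck, replace=False, size=5)
    oversized_draw_failed = False
except ValueError as err:
    oversized_draw_failed = True
    print('ValueError:', err)
```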

We generate 100 rolls of a 6-sided die and plot a histogram of the results. A fair die corresponds to a discrete uniform distribution, so the histogram consists of six distinct bars of roughly equal height. If you increase the number of rolls, the bars become more even: the histogram approaches the actual probability distribution.

In [11]:
die_rolls = np.random.randint(1, 7, size=100)
plt.hist(die_rolls, bins=50);
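
The flattening can also be checked numerically, without a plot. The following sketch counts how often each face occurs for a small and a large number of rolls; the helper face_frequencies is hypothetical and written only for this illustration.

```python
import numpy as np

np.random.seed(0)

def face_frequencies(n_rolls):
    """Return the relative frequency of each face 1..6 in n_rolls simulated rolls."""
    rolls = np.random.randint(1, 7, size=n_rolls)
    # bincount counts the values 0..6; drop index 0, which is never rolled.
    counts = np.bincount(rolls, minlength=7)[1:]
    return counts / n_rolls

small = face_frequencies(100)
large = face_frequencies(100_000)

print(small)  # noticeably uneven
print(large)  # each entry close to 1/6
```

With more rolls, every frequency moves closer to 1/6, which is the probability of each face of a fair die.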

The function np.random.random generates floating-point numbers: it models a continuous uniform distribution. Again we get a fairly flat histogram, but unlike the die rolls, which follow a discrete distribution, the values fill the whole range, so there are no gaps between the bars. As before, the histogram becomes flatter if you increase the size of the sample.

In [12]:
uniform_numbers = np.random.random(size=1000)
plt.hist(uniform_numbers, bins=25);
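
If we need uniform numbers on a range other than 0 to 1, we can rescale the output of np.random.random, or call np.random.uniform with the bounds directly. A minimal sketch, where the bounds 2.0 and 5.0 are arbitrary values picked for illustration:

```python
import numpy as np

np.random.seed(0)
low, high = 2.0, 5.0  # illustrative bounds, not taken from the lecture

# Stretch numbers from [0, 1) to the width of the range, then shift them.
scaled = low + (high - low) * np.random.random(size=1000)

# np.random.uniform does the same thing in a single call.
direct = np.random.uniform(low, high, size=1000)

print(scaled.min(), scaled.max())
print(direct.min(), direct.max())
```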

The Normal distribution, also called the Gaussian distribution, is the famous distribution that has a bell-shaped curve. Again, this is a continuous distribution, which generates floating-point numbers. In np.random.normal, the parameter loc sets the mean and scale sets the standard deviation.

In [13]:
gauss_numbers = np.random.normal(loc=10, scale=5, size=10000)
plt.hist(gauss_numbers, bins=50);
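
As a rough check that the sample matches the parameters, we can compute its mean and standard deviation, and the fraction of values within one standard deviation of the mean (about 68% for a Normal distribution). The name within_one_sd is made up for this sketch.

```python
import numpy as np

np.random.seed(0)
gauss_numbers = np.random.normal(loc=10, scale=5, size=10000)

# The sample mean and standard deviation should be close to loc and scale.
print(gauss_numbers.mean())
print(gauss_numbers.std())

# Fraction of the sample that lies between 5 and 15.
within_one_sd = np.mean(np.abs(gauss_numbers - 10) < 5)
print(within_one_sd)
```

The values will not match 10, 5 and 0.68 exactly, since they are estimates from a random sample.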

If you're interested in exploring other distributions, take a look at the official documentation of NumPy.