Newsletter #2: Certain Randomness
How randomness is achieved through algorithms and why it is important in the world of Computer Science
Random Numbers are at the crux of many use cases in Computer Science. In cryptography, they are used to create encryption keys. In simulations, they are used to seed or initiate a program to better represent the real world. They provide a way to encapsulate the vast possibilities in the real world when we try to model them on our computers.
This is also why they are used in games to determine things like when a bot plays very well, makes a mistake, how many enemies to spawn, where and when to spawn them, etc.
In machine learning, they are used to split the data into training sets and test sets, in algorithms such as random forests, initializing the weights of a neural net, etc.
The algorithm to generate random numbers should have the following property:
The next number in the sequence should not be predictable no matter how much past data you can see.
This means even if you have a neural net with access to a large amount of data generated by the algorithm, it should not be able to predict the next number with any meaningful accuracy.
Intuition
Intuitively, a simple way to generate a random number is to flip a coin. The probability of heads and tails is 0.5 and since each flip is an independent event, the historical data is useless to predict the next toss.
However, this method can be flawed because of the design of the coin itself. If one side is heavier than the other side, it might look like the toss is random for small data, but when many tosses are analyzed, the bias will be clear. The Gif below shows the same.
When the security of essentially all the data on the internet is dependent on a good RNG, such methods with the potential for unintentional bias, cannot be used.
In practice, there are two methods to achieve this, a True Random Generator and a Pseudo Random Generator
TRNG
A TRNG will take an entropy source that already exists in the system. Entropy is the randomness in the system. For example, the electrical activity in the SSD, clock cycles, movement of the mouse, etc. One, or a combination of these sources can be used to produce a sequence of bits that is truly random. One of the interesting ways is to use Lasers and the interaction between photons as a source of entropy.
Why do we need such a complicated procedure to produce a random number?
Well, as this article says, ‘any finite algorithm will eventually loop back and repeat itself, yielding a specific pattern of outputs.’ That is to say, a computer cannot generate true randomness without the help of some input from the physical world. But in the practical world, we have to make do with imperfections all the time and this is where Pseudo Random Generators of PRNGs come in.
PRNG
A pseudo-random number generator (PRNG) is an algorithm that generates a sequence of numbers that appear random, but are actually deterministic. This means that the sequence of numbers can be predicted if the starting point (also known as the seed value) is known.
This is useful when we want to regenerate a particular value or want a method that is computationally inexpensive.
This can be demonstrated using the random module in python as shown below:
Let’s look at some algorithms that are used for PRN generation
Linear Congruential Generator (LCG)
The Linear Congruential Generator (LCG) is a simple algorithm used to generate pseudo-random numbers. It uses a linear equation to calculate the next number in a sequence based on the current number. The equation used in an LCG is:
X[n+1] = (a * X[n] + c) % m
where:
X[n] is the current number in the sequence
a, c, and m are parameters that control the properties of the sequence
% is the modulo operator, which calculates the remainder when X[n+1] is divided by m
The parameters a, c, and m are carefully chosen to ensure that the sequence has a long period (the number of numbers in the sequence before it starts repeating) and a good distribution of values.
To generate the first number in the sequence, a seed value is used. This value is then used to calculate the second number, which is then used to calculate the third number, and so on. The period can be at max, m, i.e. after m values, the values returned by the algorithm will be duplicated.
The LCG is simple and efficient and is used in many applications where random numbers are needed. However, it has some limitations. For example, if the parameters are not chosen carefully, the sequence can have a short period or a poor distribution of values.
Mersenne Twister
The Mersenne Twister is a widely-used algorithm for generating pseudo-random numbers. It is known for its fast generation of high-quality random numbers, making it suitable for a wide range of applications.
In the original paper, it mentions ‘An implemented C-code, MT19937, has the period 2^219937 - 1 and a 623-dimensional equidistribution property’.
x(n-1) is generated by using a seed value and xn is generated by keeping k = 0. If we keep K = 1, 2, etc. we can get x(n+1), x(n+2), etc.
m is a constant such that 1 ≤ m ≤ n
x(uk) are the upper w - r bits of xk
x(lk+1) are the lower w-r bits of xk
The equation x(k+m) (XOR) x(uk) | x(lk+1) means concatenation in that order
A is a matrix used to get the final result.
There are many more details I have skipped here, which you can check out in the paper above.
My curiosity regarding how random.randint works in python led me to dig deeper into this topic. Turns out that generating something random is not trivial at all! If you know of any good resources to learn more about these algorithms, do let me know. If you
❎Poll
📖Resources
Generating Randomness: A Laser-Based Scramble for Random Numbers
Introduction to Random Number Generators for Machine Learning in Python
That’s it for this issue. I hope you found this article interesting. Until next time!
Connect Here: