by Tom Rogers, Twitter Link Local hex time: Local standard time:
 Perfecting Simulations – The Quest for a Perfect Random Number Generator
 PRNG Evaluator Applet The attached applet generates a set of  10,000 floating point pseudo random numbers (PRNs) between 0 and 10. Each time the "Run Trial" button is pressed it creates a new trial by generating a new set. The "Reset" button resets the number of trials. These are used in calculating the % rejects (see below). The top plot  shows a histogram of the 10,000 data points. The applet provides five different pseudo random number generators (PRNGs). The "Uniform Distr" option uses a standard pseudo random number generator from the Java language. The "Normal Distr" option is also from Java but generates pseudo random numbers with a Normal Distribution. "LFSR" stands for linear feedback shift register. It’s a pseudo random number generator which has been used in cell phone communications. The applet has three different configurations for this PRNG. All three systems are referred to as pseudo random number generators since they generate numbers from equations which repeat over a long period of time. Three different hypothesis tests are set up so that rejecting Ho implies that the data is not random. Since alpha is 5%, Ho should be rejected about 5% of the time. (See the write up at left for more information.) Fig.2A shows the distribution of consecutive first digit runs in the 10,000 PRNs. Fig. 2B is a time plot showing the order in which the PRNs were generated.  Patterns in these plots indicate that the PRNs are not truly random.

In probability and statistics, simulations are a big deal. They’re used for things like confirming theories and studying all kinds of scientific phenomenon from social sciences to physics. These simulations are often, somewhat romantically, referred to as Monte Carlo simulations because, figuratively speaking, they  depend on a roll of the dice. More correctly, they depend on multiple rolls of the dice or the generation of multiple random numbers.

In the casino, consistently getting desired results depends largely on whether one is playing for the casino or playing for one’s self. While the games may appear random they're actually biased in favor of the casino. In Monte Carlo simulation bias is deadly. Even a subtle pattern in the random number generator (RNG) can cause significant errors. Ironically, simulations with excellent randomness in their RNGs have the advantage.

Large scale Monte Carlo simulations have huge appetites for random numbers. For example, Monte Carlo techniques can simulate the seemingly random motion of oxygen molecules diffusing through plastic food packaging. Tracking the behavior of particles on a molecular level can require supercomputers to generate billions of random numbers.

Obviously, rolling dice billions of times isn't an option and so computers rely on pseudo random number generators (PRNGs). These use equations programmed into software that can generate huge quantities of numbers in short periods of time. While the numbers they generate appear random, in reality they aren't.

PRNGs start with a seed value and will eventually repeat all the numbers they generate in exactly the same order. Putting in the same seed value will give precisely the same set of random numbers. On large scale Monte Carlo simulations, care has to be taken to make sure that the PRNG cycle is significantly longer than the quantity of random numbers needed or the pattern in the PRNG cycle can show up as an error producing pattern in the simulation results.

Seed values for PRNGs are generally chosen based on some type of random process. This can be as simple as using the time of day in seconds when the simulation was started, or may involve using a device which responds to random events in the environment, such as cosmic rays.

One might ask why these other systems of generating random numbers aren’t used instead of PRNGs. Time in seconds obviously has to be used sparingly since it tends to constantly increase not to mention follows a cycle. A background radiation detector is a better choice but requires lots of care to set up and maintain. Its calibration can drift or it can end up responding to non-random events like radon levels. These can be altered by simply opening or closing a window.

PRNGs are generally better from a setup and maintenance standpoint than other systems. They are also superior for debugging or verifying software. By keeping track of the seed value, it’s possible to repeat a simulation result and evaluate if the result was due to a software problem or the values of the pseudo random numbers used.

The attached applet contains five different types of random number generators along with several different ways to evaluate them. The Uniform Distr option uses a standard pseudo random number generator from the Java language. The Normal Distr option is also from Java but generates pseudo random numbers with a Normal Distribution. LFSR stands for linear feedback shift register. It’s a PRNG which has been used in cell phone communications.

LFSRs are made up primarily of digital devices called shift registers. Each shift register can hold a value of either zero or one. The reading from each of the registers can be combined into a binary number and converted into base 10 PNRs. The Basic  LFSR uses the bits from the SRs in the order they’re connected. The LFSR Modified uses the bits from the SRs in random order and the LFSR Dual 64s generates PRNs alternately from two different LFSRs with 64 SRs in each. These two LFSRs are set up with different initial values.

Note that the number of SRs can be altered with the “Adjust LSFS” button. This was done to illustrate the effects of having a short repeat rate. The repeat cycles of the various options are as follows:

SRs                  Repeat Cycle

4                      14

8                      255

10                    1023

16                    65536

32                    Approx  4.29 x 10^9

64                    Approx  1.84 x 10^19

The low number SRs have a loss of resolution since very few bits are used when deriving the PRNs from them.

Three of the ways to evaluate PRNG use hypothesis tests to evaluate the shape, spread, and central tendency of the distribution generated by the PRNGs. These respectively use chi squared, f, and z statistics.

In each case the null hypothesis is that the numbers are truly random. Rejecting the null is evidence that the numbers are not generated by a random source.

Alpha is 5 % for all the hypothesis tests. This implies that the null hypothesis would be rejected 5 % of the time even if the numbers were generated by a true random number generator. A consistently lower or higher reject rate is evidence that the PRNG does not truly generate random numbers. In this respect the chi and f-test reject rates are too low. However, this is probably only a reflection of the fact that the range of numbers is constrained.

The applet generates sets of 10,000 consecutive pseudo random numbers and displays them in a histogram and time plot. These are useful for spotting patterns. Bars in a histogram of random numbers should show a variation in height but have no unusually high or low bars. The time plot should be jagged and random looking.

Anyone who has studied coin flipping knows that the way to detect fake data is to look at runs of heads. Typically, fakers know that long runs are unlikely and incorrectly eliminate them. In “Benford’s Law Part 1 – How to Spot Fraud” we found out that the first digit of real data is not randomly distributed but the first digit of random numbers is randomly distributed. Hence having a uniform distribution of first digit runs is a powerful sign that numbers are indeed random.

 Figure 1) Applet Output for Standard Java PRNG : Z-test failure rate = 8.7 %, There is no apparent pattern in either the first digit repeats or time plot.
 Figure 1) Applet Output for 64 SR LFSR PRNG : Z-test failure rate = 36 %, There are patterns in both the first digit repeats and time plot.
 Figure 3) Applet Output for Improved 64 SR LFSR PRNG : Z-test failure rate = 33.6 %, There is no apparent pattern in either the first digit repeats or time plot.

The applet has a plot showing different length runs of first digits. In this case the first digit is defined as the digit in the ones place. For example, the first digit of 0.954 would be a zero while the first digit of 9.540 would be a 9. The first digit runs for random numbers would be evenly distributed among the digits. Run lengths of three (blue bars) should be more frequent than run lengths of four (purple bars) and they should be more frequent than run lengths of five (red bars).

Play with the applet and it quickly becomes apparent that there’s no all-inclusive test for screening random numbers. A PRNG can look great in one test and terrible in another. Figure 1 shows that while the standard Java PRNG shows no apparent pattern in the first digit runs or time plot it fails the z-test 8.7% of the time instead of the expected 5% rate.

Figure 2 shows that the basic 64 SR LFSRs first digit repeat plots have excessive repeats for 0 and 9 but no repeats for any other digits. It also show a pattern in the time plot as well as failing the z-test an excessive 36% of the time. In the basic Figure 3 shows that the LFSR system can be greatly improved by combining the binary values of the SRs in random order. While this improves performance, the improved LFSR system still fails the z-test an excessive 33.6% of the time. The applet clearly illustrates that true randomness is hard to find.

Unfortunately, the problem of finding useful PRNGs and confirming their randomness is even more complicated than our applet indicates. Remember, the applet only looks at 10,000 consecutive numbers at a time. Compare this to a cycle of over 1.8 x 10^19 for a 64 SR LFSR and it’s clear that the applet’s evaluation has only scratched the surface. If we could generate a billion random numbers a second, which far exceeds the rate of even high end desktop computers, it would take roughly 585 years to generate the entire cycle! Even in University and government research centers, finding and evaluating PRNG for Monte Carlo Simulations is a challenge.

For Further Information:
LFSR - New Wave Instruments