Random and Pseudorandom Numbers

Randomness is a familiar concept and has even entered the popular lexicon ("that was random..."), but it is a far more subtle idea than may be immediately apparent.

Random - A sequence (in time or in space) with no discernible pattern. A random process generates an outcome that cannot be predicted before the outcome is observed.

Random values are the outcome of a random process, and have properties, including:

Allowed values - the numbers, characters, nucleotides (i.e., A, C, G, T, representing the four DNA bases), amino acids, etc. that represent the possible outcomes of the process.

Because we routinely use a decimal numeral system, it may seem intuitive to use a random set of digits (0-9). This is not, however, appropriate for modeling many natural processes. The decimal numeral system is an arbitrary choice: one could equally well use a binary numeral system (as computers do) or an octal or hexadecimal system (both also used in computing). We measure time in base 60 (sexagesimal; 60 minutes in an hour and 60 seconds in a minute) because the ancient Sumerians used a base 60 numeral system. They developed the time system that is in use to this day, so our 60-second minute is "legacy code" from 5000 years ago. Consequently, the allowed outcomes of a random process must be considered an intrinsic part of the system being modeled.
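As a rough illustration of how the set of allowed values shapes the output, the sketch below draws sequences from three different alphabets. The names ALPHABETS and random_sequence are made up for this example, and each symbol is assumed to be equally likely.

```python
import random

# Hypothetical alphabets; each one defines the allowed outcomes of a process.
ALPHABETS = {
    "decimal digits": "0123456789",
    "binary digits": "01",
    "DNA bases": "ACGT",
}

def random_sequence(alphabet, length, rng):
    """Return `length` symbols drawn independently and uniformly from `alphabet`."""
    return "".join(rng.choice(alphabet) for _ in range(length))

rng = random.Random(1)  # fixed seed so the run is repeatable
for name, alphabet in ALPHABETS.items():
    print(name, random_sequence(alphabet, 20, rng))
```

The same generator produces very different-looking sequences depending only on which outcomes are allowed.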

Distribution - this represents the likelihoods of different allowed values.

Models of random processes often assume a "flat" distribution, i.e., one in which all outcomes are equally likely, but this is neither necessary nor, in many cases, desirable. Just as a decimal set of allowed values is arbitrary, so is the assumption that each of these digits should appear with equal frequency. Consider, for example, the fact that the change from 1 to 2 represents a doubling (a 100% increase), while the change from 8 to 9 represents only a 1/8, or 12.5%, increase. A quantity that grows by a fixed percentage therefore spends longer with a leading digit of 1 than with a leading digit of 8, so its leading digits are not equally frequent.
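To make the idea of a distribution concrete, here is a minimal sketch comparing a flat distribution over the four DNA bases with a skewed one. The weights in AT_RICH_WEIGHTS are invented for illustration and are not taken from real sequence data; base_frequencies is likewise a hypothetical helper.

```python
import random
from collections import Counter

BASES = "ACGT"
# Hypothetical weights for an AT-rich sequence; not taken from real data.
AT_RICH_WEIGHTS = [0.3, 0.2, 0.2, 0.3]  # order matches BASES: A, C, G, T

def base_frequencies(weights, n, seed):
    """Draw n bases with the given weights and return observed frequencies."""
    rng = random.Random(seed)
    draws = rng.choices(BASES, weights=weights, k=n)
    counts = Counter(draws)
    return {base: counts[base] / n for base in BASES}

flat = base_frequencies([1, 1, 1, 1], 10_000, seed=42)
skewed = base_frequencies(AT_RICH_WEIGHTS, 10_000, seed=42)
print("flat:  ", flat)
print("skewed:", skewed)
```

The allowed values are identical in both cases; only the likelihood attached to each value changes.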

Random processes are often an important part of models.

Pseudorandom number generator - an algorithm that generates a series of numbers that has no apparent pattern.

However, the series requires a starting seed, and if the algorithm is started repeatedly with the same seed, it will go through precisely the same sequence of numbers.
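The effect of the seed can be seen with Python's standard random module. This is only a sketch: first_values is a hypothetical helper and the seed values are arbitrary.

```python
import random

def first_values(seed, n=5):
    """Return the first n digits (0-9) produced by a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(n)]

run_a = first_values(seed=2024)
run_b = first_values(seed=2024)  # same seed as run_a
run_c = first_values(seed=2025)  # different seed

print(run_a)
print(run_b)
print(run_c)
print("same seed gives same series:", run_a == run_b)
```

Fixing the seed is useful when a simulation needs to be repeated exactly, for example while debugging a model.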

The most common pseudorandom number generators yield a flat distribution: every value in the output range, such as each of the digits 0-9, is equally likely.
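One way to check flatness, sketched here under the assumption that Python's random.random() (which returns floats in the range 0 up to, but not including, 1) stands in for a typical generator, is to scale its output to digits and tally them. The helper digit_frequencies is hypothetical.

```python
import random
from collections import Counter

def digit_frequencies(n, seed):
    """Tally digits 0-9 obtained by scaling uniform floats from [0, 1)."""
    rng = random.Random(seed)
    counts = Counter(int(rng.random() * 10) for _ in range(n))
    return {digit: counts[digit] / n for digit in range(10)}

freqs = digit_frequencies(100_000, seed=7)
for digit, freq in freqs.items():
    print(digit, f"{freq:.3f}")
```

With a large number of draws, each frequency should sit close to 0.1, though never exactly on it.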

Genuinely random number series are very difficult to generate algorithmically, and there are a variety of tricks that can be used to approach true randomness.
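One such trick, offered as an illustration rather than as the specific approach these notes have in mind, is to draw on unpredictable data collected by the operating system instead of a seeded algorithm. Python's standard secrets module does this; os_random_digits is a hypothetical helper name.

```python
import secrets

def os_random_digits(n):
    """Return n digits (0-9) drawn from the operating system's randomness source."""
    return [secrets.randbelow(10) for _ in range(n)]

# No seed is involved, so two runs are not expected to give the same digits.
print(os_random_digits(10))
```

Because there is no seed to reuse, this kind of source is unsuitable when a simulation must be repeated exactly.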

Arbitrary vs. random - we can also distinguish arbitrary values, which are chosen for no particular reason, but are not genuinely random, from true random values.

A random sample, which is a common concept in statistics, is a sample drawn from a population such that there is no bias in the process that selects the sample, and all members of the population have an equal chance of being selected.

Random sampling with replacement means that after a member of the population is sampled, it is returned to the population and may be sampled again.

Random sampling without replacement means that after a member of the population has been sampled, it is removed from the population and will not be sampled again.
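The two kinds of sampling can be sketched with Python's standard library: random.choices samples with replacement and random.sample samples without. The population of ten labeled members below is invented for the example.

```python
import random

# Hypothetical population of ten labeled members.
population = [f"s{i}" for i in range(1, 11)]

rng = random.Random(3)  # fixed seed so the run is repeatable

# With replacement: a member may be drawn more than once.
with_replacement = rng.choices(population, k=6)

# Without replacement: each member can appear at most once.
without_replacement = rng.sample(population, k=6)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Sampling without replacement cannot draw more members than the population holds, whereas sampling with replacement has no such limit.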

 
