## When is Independence not Independence?

This brief article uses statistical independence as an example of when a mathematical definition is intentionally chosen to be different from the original motivating definition. (Another example comes from topology; the motivating/naive definition of a topological space would involve limits but instead open sets are used to define a topology.) This exemplifies the following messages:

- There is a difference between mathematical manipulations and intuition (and both must be learnt side-by-side); see also the earlier article on The Importance of Intuition.
- Understanding a definition mainly means understanding the *usefulness* of the definition and how it can be *applied* in practice.
- This has implications for how to teach and how to learn mathematical concepts.

Two random events, $A$ and $B$, are statistically independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$. Here is the (small) conundrum. If one were to stare at this definition, it may not make much sense. What is it really telling us about the two events? On the other hand, if one were to learn that if $A$ and $B$ are “unrelated” events that have “nothing to do with each other” then $\Pr(A \cap B) = \Pr(A)\Pr(B)$ must hold, then one might falsely believe to have understood the definition. Indeed, if $A$ and $B$ are related to each other, and $B$ and $C$ are related to each other, then surely $A$ and $C$ are related to each other? Conversely, if event $B$ is defined in terms of event $A$ then surely $B$ is related to $A$? Both these statements are false if ‘related’ is replaced by ‘statistically dependent’.
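As a concrete sanity check of the defining identity (my own illustration, with events chosen arbitrarily on two fair dice), one can enumerate the sample space exactly:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), range(1, 7)))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(sum(1 for w in omega if w in event), len(omega))

# A: the first die is even; B: the sum of the two dice is 7.
A = {w for w in omega if w[0] % 2 == 0}
B = {w for w in omega if w[0] + w[1] == 7}

# The defining identity of statistical independence: P(A ∩ B) = P(A) P(B).
print(prob(A & B) == prob(A) * prob(B))  # True: these events are independent
```

Note that $B$ is certainly “related” to $A$ in the everyday sense (both involve the first die), yet the identity holds exactly: $\Pr(A \cap B) = 3/36 = (1/2)(1/6)$.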

The true way of understanding statistical independence is to i) acknowledge that while it is motivated from real life by the intuitive notion of unrelated events, it is a different concept that has nevertheless proved to be very useful; and ii) be able to list a number of useful applications. Therefore, upon reading a definition that does not immediately feel comfortable, it may be better to flick through the remainder of the book to see the various uses of the definition than to stare blankly at the definition hoping for divine intuition.

For completeness, two naturally occurring examples of how statistical independence differs from “functional independence” are given. The first comes from the theory of continuous-time Markov chains but can be stated simply. Let $\lambda_1$ and $\lambda_2$ be two positive real numbers representing departure rates. Let $T_1$ and $T_2$ be independent and exponentially distributed random variables with parameters $\lambda_1$ and $\lambda_2$ respectively. (That is, $\Pr(T_1 > t) = e^{-\lambda_1 t}$ for $t \geq 0$ and $\Pr(T_2 > t) = e^{-\lambda_2 t}$ for $t \geq 0$.) The rule for deciding where to move to next (in the context of Markov chains) is to see which departure time, $T_1$ or $T_2$, is smaller. (If $T_1$ is smaller than $T_2$ we move to destination 1, otherwise we move to destination 2.) Let $p$ be the probability that $T_1$ is smaller: $p = \Pr(T_1 < T_2)$. It can be shown that $p = \lambda_1 / (\lambda_1 + \lambda_2)$, and moreover, that the event $\{T_1 < T_2\}$ is statistically independent of the departure time $\min\{T_1, T_2\}$. This may seem strange if one thinks in terms of related events, so it is important to treat statistical independence as a mathematical concept that merely means $\Pr(A \cap B) = \Pr(A)\Pr(B)$, regardless of whether or not $A$ and $B$ are, in any sense of the word, “related” to each other.
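This can be checked empirically with a short Monte Carlo simulation (my own sketch; the rates 2 and 3 and the threshold 0.1 are arbitrary choices for illustration):

```python
import random

random.seed(0)
lam1, lam2 = 2.0, 3.0   # arbitrary departure rates
n = 200_000
t = 0.1                 # arbitrary threshold for the departure time

wins = 0            # trials with T1 < T2
late = 0            # trials with min(T1, T2) > t
wins_and_late = 0   # trials with both

for _ in range(n):
    T1 = random.expovariate(lam1)
    T2 = random.expovariate(lam2)
    win = T1 < T2
    dep = min(T1, T2)
    wins += win
    late += dep > t
    wins_and_late += win and dep > t

p_hat = wins / n   # should be close to lam1 / (lam1 + lam2) = 0.4
# Independence of {T1 < T2} and the departure time min(T1, T2):
# the joint frequency should factor as the product of the marginals.
print(p_hat, wins_and_late / n, p_hat * (late / n))
```

The two printed products agree to within sampling error, even though the event $\{T_1 < T_2\}$ is a function of the very same pair of random variables as $\min\{T_1, T_2\}$.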

The second example is that an event can be statistically independent of itself! In fact, this turns out to be useful: to prove that $A$ is an “extreme” event, by which I merely mean that either $\Pr(A) = 0$ or $\Pr(A) = 1$, it suffices to prove that $A$ is independent of itself, and sometimes the latter is easier to prove than the former. (Furthermore, having first proved that $\Pr(A)$ can only be zero or one can then make it easier to prove that it equals one, for instance.)
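The reason is a one-line computation, spelled out here for completeness. Applying the defining identity with both events equal to $A$ gives

```latex
\Pr(A) \;=\; \Pr(A \cap A) \;=\; \Pr(A)\,\Pr(A) \;=\; \Pr(A)^2 ,
```

so $x = \Pr(A)$ satisfies $x = x^2$, that is, $x(1 - x) = 0$, forcing $\Pr(A) \in \{0, 1\}$.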

In closing, it is remarked that one can always challenge a definition by asking why this particular definition was chosen. Perhaps a different definition of statistical independence might be better? The response will always be: try to find a better definition! Sometimes you might be successful; this is how definitions are refined and generalised over time. Just keep in mind that a “good” definition is one that is useful and not necessarily one that mimics perfectly our intuition from the real world.

It appears that when the definition of independence was first published, many people had trouble accepting or understanding it. The reason for this was confusion between independence and uncorrelatedness, which was the more current notion at the time.

The general definition of independence may seem rather abstract. On the other hand, probability has an experimental aspect. Today we experiment on a computer, but in the 17th century, at the time of the Bernoullis, people spent long days carrying out dice or urn experiments.

Perhaps the notion of independence grew from urn experiments: drawing with replacement produces independent events, whereas drawing without replacement leads to dependent events.
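The urn contrast can be verified by exact enumeration (a minimal sketch of my own; the urn of 3 red and 2 blue balls is an arbitrary choice):

```python
from fractions import Fraction
from itertools import product, permutations

# Urn with 3 red (R) and 2 blue (B) balls -- sizes chosen arbitrarily.
urn = ['R', 'R', 'R', 'B', 'B']

def check(outcomes):
    """For A = 'first draw red' and B = 'second draw red', return
    (P(A and B), P(A) * P(B)) given a list of equally likely (first, second) draws."""
    n = len(outcomes)
    pA = Fraction(sum(1 for o in outcomes if o[0] == 'R'), n)
    pB = Fraction(sum(1 for o in outcomes if o[1] == 'R'), n)
    pAB = Fraction(sum(1 for o in outcomes if o[0] == 'R' and o[1] == 'R'), n)
    return pAB, pA * pB

# With replacement: every ordered pair of balls (repeats allowed) is equally likely.
with_repl = list(product(urn, repeat=2))
# Without replacement: every ordered pair of distinct balls is equally likely.
without_repl = list(permutations(urn, 2))

print(check(with_repl))     # the two probabilities are equal -> independent
print(check(without_repl))  # the two probabilities differ -> dependent
```

With replacement the identity holds exactly ($9/25 = (3/5)(3/5)$); without replacement the joint probability is $3/10$ while the product of marginals is $9/25$.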

From a computing point of view, we can think of randomisation. Given any two sequences of numbers A and B, we can randomly generate on our computer a third sequence of numbers C such that A = f(B,C) for some function f. Most importantly, C is generated without using any of the values of A or B. Then A and B are independent if A can be generated without knowing B, that is, A = f(C). The formula A = f(B,C) means that A is obtained by randomising B, using the computer-generated sequence C.
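A toy instance of this randomisation picture (my own construction, not the author's): take single bits and let f(b, c) = b XOR c. If C is a fresh fair coin generated without looking at B, then A = f(B, C) is statistically independent of B even though it is literally computed from B:

```python
from fractions import Fraction
from itertools import product

# B is an arbitrarily biased bit; C is a fresh fair coin generated independently.
pB = {0: Fraction(3, 4), 1: Fraction(1, 4)}   # arbitrary bias for B
pC = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # C must be fair and fresh

# A = f(B, C) = B XOR C: A is obtained by randomising B with the fresh coin C.
joint = {}   # joint distribution of (A, B)
for b, c in product([0, 1], repeat=2):
    a = b ^ c
    joint[(a, b)] = joint.get((a, b), 0) + pB[b] * pC[c]

pA = {a: joint.get((a, 0), 0) + joint.get((a, 1), 0) for a in (0, 1)}
# Independence: P(A=a, B=b) = P(A=a) P(B=b) for every pair (a, b).
print(all(joint[(a, b)] == pA[a] * pB[b] for a, b in joint))  # True
```

This also revisits the earlier point: A is defined in terms of B, yet A and B are statistically independent, because the fresh coin C washes out all information about B.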

Finally, in repeated experiments, think of A and B as states of an automaton and of C as the input. According to the input, which the user generates without knowing the state, the new state A is determined from the history, or just from the old state, B.