We will break down the Mutual Information formula into the following components:
The x, X and y, Y
x and y are the individual observations/values that we see in our data. X and Y are simply the sets of those individual values. A good example would be as follows:
And assuming we have 5 days of observations of Bob in this exact sequence:
Individual/Marginal Probability
This is simply the probability of observing a particular x or y in its respective set of possible X or Y values.
Take x = 1 as an example: the probability is simply 0.4 (Bob carried an umbrella on 2 out of 5 days of his trip).
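As a minimal sketch, assuming a hypothetical 5-day record that matches the counts quoted in this article (Bob carried an umbrella on 2 days, and it rained on 1 day), the marginal probabilities can be computed as:

```python
from collections import Counter

# Hypothetical 5-day record consistent with the article's counts:
# x = 1 if Bob carried an umbrella that day, y = 1 if it rained.
X = [1, 1, 0, 0, 0]  # umbrella on 2 of 5 days
Y = [1, 0, 0, 0, 0]  # rain on 1 of 5 days

def marginal(values):
    """Simple probability of each value: count / total observations."""
    return {v: c / len(values) for v, c in Counter(values).items()}

p_x = marginal(X)
p_y = marginal(Y)
print(p_x[1])  # 0.4 -- umbrella on 2 out of 5 days
```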
Joint Probability
This is the probability of observing a particular x and y together in the joint set (X, Y). The joint set (X, Y) is simply the set of paired observations, paired up according to their index.
In our case with Bob, we pair the observations up based on which day they occurred.
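Continuing with the same hypothetical observations, pairing by day index and counting each combination gives the joint probabilities:

```python
from collections import Counter

X = [1, 1, 0, 0, 0]  # umbrella per day (hypothetical sequence)
Y = [1, 0, 0, 0, 0]  # rain per day (hypothetical sequence)

# Pair the observations by day (index), then count each (x, y) combination.
pairs = list(zip(X, Y))
p_xy = {pair: count / len(pairs) for pair, count in Counter(pairs).items()}
print(p_xy)  # {(1, 1): 0.2, (1, 0): 0.2, (0, 0): 0.6}
```

Note that the equal-value pairs, (1, 1) and (0, 0), together account for 0.8 of the observations, which is the 80% figure discussed next.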
You may be tempted to jump to a conclusion after looking at the pairs:
Since equal-value pairs occur 80% of the time, it clearly means that people carry umbrellas BECAUSE it is raining!
Well, I am here to play devil's advocate and say that this may just be a freakish coincidence:
If the chance of rain is very low in Singapore, and, independently, the chance of Bob carrying an umbrella is equally low (because he hates holding extra stuff), can you see that the odds of getting (0, 0) paired observations will naturally be very high?
So what can we do to prove that these paired observations are not a coincidence?
Joint Versus Individual Probabilities
We can take the ratio of the two probabilities to give us a clue about the "extent of coincidence".
In the denominator, we take the product of the individual probabilities of a particular x and a particular y occurring. Why do we do so?
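As a sketch with hypothetical numbers (umbrella on 40% of days, rain on 20% of days, and both observed together on 20% of days), the ratio looks like this:

```python
# Ratio of the joint probability to the product of the marginals
# for the pair (umbrella = 1, rain = 1), using hypothetical values.
p_x = {1: 0.4, 0: 0.6}                          # umbrella marginals
p_y = {1: 0.2, 0: 0.8}                          # rain marginals
p_xy = {(1, 1): 0.2, (1, 0): 0.2, (0, 0): 0.6}  # joint probabilities

ratio = p_xy[(1, 1)] / (p_x[1] * p_y[1])
print(round(ratio, 2))  # 2.5 -- (1, 1) occurs 2.5x more often than independence predicts
```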
Peering into the humble coin toss
Recall the first lesson you took in statistics class: calculating the probability of getting 2 heads in 2 tosses of a fair coin.
- 1st Toss [ p(x) ]: There is a 50% chance of getting heads
- 2nd Toss [ p(y) ]: There is still a 50% chance of getting heads, since the outcome is independent of what happened in the 1st toss
- The above 2 tosses make up your individual probabilities
- Therefore, the theoretical probability of getting heads on both of the 2 independent tosses is 0.5 * 0.5 = 0.25 ( p(x).p(y) )
And if you actually run, say, 100 sets of that double-coin-toss experiment, you will likely see that you get the (heads, heads) outcome about 25% of the time. Those 100 sets of experiments are actually your joint probability set (X, Y)!
Hence, when you take the ratio of the joint versus the combined individual probabilities, you get a value of 1.
This is actually the true expectation for independent events: the joint probability of a particular pair of values occurring is exactly equal to the product of their individual probabilities! Just like what you were taught in elementary statistics.
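A quick simulation (a constructed illustration, not from the article) of many double-toss experiments shows this ratio landing at 1 for independent events:

```python
import random

random.seed(0)
trials = 100_000

# Count how often two independent fair tosses both come up heads.
both_heads = sum(
    (random.random() < 0.5) and (random.random() < 0.5)
    for _ in range(trials)
)
observed = both_heads / trials   # empirical joint probability p(heads, heads)
expected = 0.5 * 0.5             # product of the individual probabilities
print(round(observed / expected, 2))  # close to 1.0 for independent tosses
```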
Now imagine that your 100-set experiment yielded (heads, heads) 90% of the time. Surely that cannot be a coincidence…
You expected 25% because you know they are independent events, yet what was observed is an extreme skew of this expectation.
To put this qualitative feeling into numbers, the ratio of probabilities is now a whopping 3.6 (0.9 / 0.25), essentially 3.6x more frequent than we expected.
As such, we start to suspect that maybe the coin tosses were not independent. Maybe the result of the 1st toss somehow has some unexplained effect on the 2nd toss. Maybe there is some degree of association/dependence between the 1st and 2nd tosses.
That is what Mutual Information tries to tell us!
Expected Value of Observations
To be fair to Bob, we should not just look at the cases where his claim is wrong, i.e. calculate the ratio of probabilities for (0, 0) and (1, 1).
We should also calculate the ratio of probabilities for when his claim is correct, i.e. (0, 1) and (1, 0).
Thereafter, we can aggregate all 4 scenarios using an expected value, which here just means "taking the average": add up the ratios of probabilities for every observed pair in (X, Y), then divide by the number of observations.
That is the purpose of those two summation terms. For continuous variables, like my stock market example, we would use integrals instead.
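Written out (this is the standard formulation; the original figure showing the two summation terms is not reproduced here), the discrete and continuous forms are:

```latex
% Discrete form: two summations, over the possible x and y values
I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\frac{p(x,y)}{p(x)\,p(y)}

% Continuous form: integrals replace the summations
I(X;Y) = \int_Y \int_X p(x,y) \log\frac{p(x,y)}{p(x)\,p(y)} \,dx\,dy
```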
Logarithm of Ratios
Similar to how we calculated the probability of getting 2 consecutive heads for the coin toss, we are now calculating the probability of seeing the exact 5 pairs that we observed.
For the coin toss, we calculated it by multiplying the probabilities of each toss. For Bob, it is the same: the probabilities have a multiplicative effect on one another to give us the sequence that we observed in the joint set.
With logarithms, we turn multiplicative effects into additive ones:
Converting the ratios of probabilities to their logarithmic variants, we can now simply calculate the expected value as described above using a summation of their logarithms.
Feel free to use log base 2, e, or 10; it does not matter for the purposes of this article.
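A minimal illustration of why the logarithm helps:

```python
import math

# log(a * b) == log(a) + log(b): products of ratios become sums of logs.
a, b = 2.5, 0.625
product_form = math.log(a * b)
sum_form = math.log(a) + math.log(b)
print(math.isclose(product_form, sum_form))  # True
```

This identity holds in any base, which is why base 2, e, or 10 all work equally well here; only the scale of the final score changes.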
Putting It All Together
Let us now prove Bob wrong by calculating the Mutual Information. I will use log base e (the natural logarithm) for my calculations:
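As a sketch, here is the whole calculation in code. The day-by-day sequence is a hypothetical assumption, chosen to match the counts quoted in this article (2 umbrella days, 80% equal-value pairs):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over observed pairs of p(x,y) * ln(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    p_x = Counter(xs)               # counts of each x value
    p_y = Counter(ys)               # counts of each y value
    p_xy = Counter(zip(xs, ys))     # counts of each (x, y) pair
    return sum(
        (c / n) * math.log((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in p_xy.items()
    )

X = [1, 1, 0, 0, 0]  # umbrella (hypothetical day-by-day sequence)
Y = [1, 0, 0, 0, 0]  # rain (hypothetical day-by-day sequence)
print(round(mutual_information(X, Y), 3))  # 0.223
```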
So what does the value of 0.223 tell us?
Let us first assume Bob is right, and that the use of umbrellas is independent of the presence of rain:
- We know that the joint probability will exactly equal the product of the individual probabilities.
- Therefore, for every (x, y) permutation, the ratio of probabilities = 1.
- Taking the logarithm, that equates to 0.
- Thus, the expected value across all permutations (i.e. the Mutual Information) is 0.
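The steps above can be sanity-checked in code: with a perfectly independent pairing (a constructed example, not from the article), every ratio is 1 and the score collapses to 0.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    p_x = Counter(xs)
    p_y = Counter(ys)
    p_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in p_xy.items()
    )

# Every (x, y) combination appears exactly as often as the product of the
# marginals predicts, so each log-ratio term is log(1) = 0.
X = [0, 0, 1, 1]
Y = [0, 1, 0, 1]
print(mutual_information(X, Y))  # 0.0
```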
But since the Mutual Information score that we calculated is non-zero, we can therefore prove to Bob that he is wrong!