Probability for Data Science

Oct 19, 2020

Probability and statistics form the basis for Data Science. Probability lets us quantify how likely events are, based on patterns in collected data, so that we can make informed decisions. It is therefore essential for tackling Data Science problems, and this post gives an overview of probability for Data Science.

There are several ways to estimate probabilities. This post considers two of them: empirical probability and theoretical probability.

Empirical Probability

A random experiment is one whose outcome cannot be predicted with certainty. For instance, tossing a coin is a random experiment, because the outcome of the toss (head or tail) depends on many factors, such as the strength and angle of the toss and the surface the coin lands on.

Although we can't predict the outcome of a random experiment with certainty, we can estimate how likely each outcome is. For a coin toss, we can estimate the probability of the coin landing on heads or on tails.

To estimate the probability of a coin landing on heads:

Toss the coin many times, repeating the random experiment.
Count the number of times it lands on heads.
Divide the number of heads by the total number of tosses.
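The three steps above can be sketched in Python. This is a minimal illustration, assuming a simulated fair coin (Python's `random` module) in place of a physical one; the function name `estimate_p_heads` is hypothetical and not from the original post.

```python
import random

def estimate_p_heads(num_tosses, seed=None):
    """Estimate P(H) empirically by simulating tosses of a fair coin (an assumption)."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    # Step 1: toss the coin num_tosses times; each toss is "H" or "T"
    tosses = [rng.choice("HT") for _ in range(num_tosses)]
    # Step 2: count the tosses that landed on heads
    heads = tosses.count("H")
    # Step 3: divide the number of heads by the total number of tosses
    return heads / num_tosses

print(estimate_p_heads(1000, seed=42))
```

Each run gives a slightly different estimate unless the seed is fixed, which is exactly the variation the question later in this post is about.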

The procedure described above is how empirical probability is computed.

Empirical probability is the ratio of the number of trials in which a specified event occurs to the total number of trials in a random experiment.

For an event E, the empirical probability is:

P(E) = Number of times event E occurred / Total number of trials.

NB: Since probabilities are proportions, we can also express them as percentages (for example, 0.6 as 60%), which often gives a more intuitive sense of what a probability value means.
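Applied to a list of recorded outcomes, the empirical probability formula and the percentage conversion from the NB can be sketched as follows. The helper `empirical_probability` and the ten recorded tosses are hypothetical, made up purely for illustration.

```python
def empirical_probability(outcomes, event):
    """P(E) = number of times event E occurred / total number of trials."""
    occurrences = sum(1 for outcome in outcomes if event(outcome))
    return occurrences / len(outcomes)

# Hypothetical record of 10 coin tosses ("H" = head, "T" = tail)
tosses = ["H", "T", "H", "H", "T", "H", "T", "H", "H", "T"]

p_heads = empirical_probability(tosses, lambda outcome: outcome == "H")
print(p_heads)           # 0.6 (6 heads out of 10 tosses)
print(f"{p_heads:.0%}")  # 60% (the same value as a percentage)
```

Passing the event as a function keeps the helper general: the same code works for die rolls or any other recorded outcomes.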

A common question about probability is: “If different numbers of tosses give different estimates of the probability of heads, what then is the true value of P(H)?”

The true value of P(H) is the value that the relative frequency of heads approaches as the number of coin tosses grows. That is, the greater the number of tosses, the closer the estimate tends to be to the true probability (a result known as the law of large numbers).
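The effect of increasing the number of tosses can be illustrated with a short simulation. This sketch assumes a simulated fair coin, so the true value of P(H) is 0.5 by construction; with a real coin the true value is unknown. The helper name `relative_frequency_of_heads` is hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

def relative_frequency_of_heads(num_tosses):
    """Relative frequency of heads in num_tosses simulated fair-coin tosses."""
    # rng.random() < 0.5 models a head with probability 0.5 (assumed fair coin)
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The estimate tends to settle near 0.5 as the number of tosses grows
for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = relative_frequency_of_heads(n)
    print(f"{n:>7} tosses: P(H) ≈ {estimate:.4f}, error = {abs(estimate - 0.5):.4f}")
```

The error does not have to shrink at every step, since each run is random, but over large numbers of tosses it becomes small.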

Theoretical Probability

An easier way to find probabilities is to assume that all the outcomes of a random experiment have equal chances of occurring. Under this assumption, the probability of a single outcome E is:

P(E) = 1 / Total number of possible outcomes
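For equally likely outcomes, the single-outcome formula only needs the size of the sample space. A minimal sketch, using Python's `fractions.Fraction` to keep exact values; the helper `probability_of_single_outcome` is hypothetical.

```python
from fractions import Fraction

def probability_of_single_outcome(sample_space):
    """P(E) = 1 / total number of possible outcomes (assumes equally likely outcomes)."""
    return Fraction(1, len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

print(probability_of_single_outcome(coin))  # 1/2
print(probability_of_single_outcome(die))   # 1/6
```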

However, we cannot use the single-outcome formula to compute theoretical probabilities for events that include more than one outcome, for example, the event of getting any number from 1 to 6 when rolling a fair die, which includes all six outcomes.

For such events, the formula becomes:

P(E) = Number of successful outcomes / Total number of possible outcomes

For example, the event of getting an odd number when you roll a six-sided die includes three outcomes (1, 3 and 5).

Therefore,

P(odd_number) = Number of successful outcomes / Total number of possible outcomes = 3 / 6 = 1 / 2
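The counting formula for events with several outcomes can be sketched by listing the successful outcomes and dividing by the size of the sample space. This assumes a fair die (all six faces equally likely); the helper `theoretical_probability` is hypothetical, and `Fraction` is used so that 3/6 is shown in its reduced form.

```python
from fractions import Fraction

def theoretical_probability(sample_space, event):
    """P(E) = number of successful outcomes / total number of possible outcomes.

    Valid only when every outcome in sample_space is equally likely.
    """
    successful = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(successful), len(sample_space))

die = range(1, 7)  # faces of a fair six-sided die

p_odd = theoretical_probability(die, lambda face: face % 2 == 1)
print(p_odd)         # 1/2 (that is, 3/6)
print(float(p_odd))  # 0.5
```

The event "any number from 1 to 6" counts all six faces, giving 6/6 = 1, as expected for an event that is certain to happen.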

NB: The formula for theoretical probability holds only under the assumption that the outcomes have equal chances of occurring.

Conclusion

In this post, we covered the basic concepts of probability and the difference between empirical probability (estimated from repeated trials) and theoretical probability (computed by counting equally likely outcomes). To summarize, probability is ubiquitous, and many mathematical and everyday problems are solved using it.

I hope you enjoyed reading the article.

She Code Africa - Admin

Written by Osasona Ifeoluwa