Probability: The Language of Uncertainty · ML Engineer

Why should a backend engineer care about probability?

You deploy a fraud model. A transaction comes in, and the model outputs 0.87.

Not "fraud." Not "legit." Just... 0.87.

Your first instinct as a backend engineer is probably to wrap that in an if statement and move on. But that number is the whole point. Every ML model you'll ever work with, from credit scoring to spam filtering to ChatGPT picking its next word, speaks in probabilities. If you don't speak the language, you're just guessing at thresholds.

The good news: the core ideas are simple. You've been using them your whole life without naming them.

So what is probability, really?

Probability is a number between 0 and 1 that measures how confident you should be that something will happen.

That's it. 0 means "definitely not." 1 means "definitely yes." Everything interesting lives in between.

But here's the part people skip. Where does that number come from? There are two honest answers:

The frequency view. Run the experiment many times and count. Flip a coin 10,000 times, heads comes up about 5,000 times, so the probability of heads is 0.5. This is probability as long-run behavior.

The belief view. Sometimes you can't repeat the experiment. "Will this specific customer default on this specific loan?" happens exactly once. Here, probability measures your degree of belief given what you know. A 0.12 default probability means: among all customers who look like this one, about 12 in 100 defaulted.

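ML models mostly live in the second world. When your model says 0.87 fraud probability, it's claiming that transactions that look like this one turned out to be fraud about 87% of the time in its training data. Whether that claim holds up on live traffic is a question of calibration, which comes up again further down.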
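The frequency view is easy to check for yourself. Here is a minimal sketch that simulates a fair coin and counts heads; the function name and the fixed seed are illustrative choices, not part of any library.

```python
import random


def estimate_heads_probability(num_flips: int, seed: int = 0) -> float:
    """Estimate P(heads) for a fair coin by simulating flips and counting."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


# The estimate wobbles for small runs and settles near 0.5 as the run grows.
for n in (10, 1_000, 100_000):
    print(n, estimate_heads_probability(n))
```

The short runs can land noticeably away from 0.5. That is the "long-run" part of long-run behavior.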

Sample spaces: listing what can happen

Before you can put a number on anything, you need to know what the options are. The set of all possible outcomes is called the sample space.

Roll a fair die: the sample space is {1, 2, 3, 4, 5, 6}. Six outcomes, each equally likely, so each gets probability 1/6.

A card payment? The sample space might be {approved, declined, timeout}. Notice these are not equally likely. Maybe 92% approve, 7% decline, 1% timeout. That's fine. The only rule is that the probabilities across the whole sample space must add up to exactly 1.

Something has to happen. That's the rule.

An event is just a subset of the sample space you care about. "The payment failed" is an event covering both declined and timeout, so its probability is 0.07 + 0.01 = 0.08.
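In code, a sample space can be as simple as a dictionary from outcome to probability. The sketch below uses the payment figures from above; the helper names `is_valid_distribution` and `event_probability` are made up for this example.

```python
import math

# Sample space for a card payment, using the figures from the text.
payment_outcomes = {"approved": 0.92, "declined": 0.07, "timeout": 0.01}


def is_valid_distribution(dist: dict[str, float]) -> bool:
    """Every probability is in [0, 1] and together they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    # isclose, not ==, because floating-point sums are rarely exact
    return in_range and math.isclose(sum(dist.values()), 1.0)


def event_probability(dist: dict[str, float], event: set[str]) -> float:
    """An event is a subset of outcomes; its probability is the sum of theirs."""
    return sum(p for outcome, p in dist.items() if outcome in event)


payment_failed = {"declined", "timeout"}
print(is_valid_distribution(payment_outcomes))
print(round(event_probability(payment_outcomes, payment_failed), 2))
```

A check like `is_valid_distribution` is cheap to run wherever probabilities are configured by hand, and it catches the fuzzy-sample-space bug early.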

This sounds almost too basic to mention. But half the probability bugs I've seen come from a fuzzy sample space. If you can't list what can happen, you can't reason about how likely it is.

Independent or dependent? The question that changes everything

Two events are independent if knowing one happened tells you nothing about the other.

Flip a coin twice. The first flip landing heads doesn't change the second flip at all. Independent. And when events are independent, you multiply: the probability of two heads in a row is 0.5 × 0.5 = 0.25.

Now the dependent case. Say 2% of all transactions on your platform are fraudulent. You spot a transaction at 3 AM, from a brand new device, in a country the customer has never visited. Is the fraud probability still 2%?

Of course not. Those signals changed your belief. The events "transaction is fraud" and "transaction comes from a new device" are dependent. Knowing one shifts the probability of the other.

This is the entire foundation of ML in one sentence.

A fraud model is a machine that takes a pile of dependent signals and works out how far they should move the probability away from the base rate of 2%. If every feature were independent of fraud, no model on earth could do better than quoting that same 2% for every transaction.
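To see what "moving away from the base rate" looks like with numbers, here is a small sketch built on invented counts. Only the 2% base rate comes from the text; the new-device counts are assumptions chosen for illustration.

```python
# Hypothetical counts, made up for illustration (not real data).
total_transactions = 100_000
fraud_transactions = 2_000        # 2% base rate, as in the text
new_device_transactions = 5_000   # assumed
fraud_and_new_device = 900        # assumed


def rate_within_group(matching_count: int, group_count: int) -> float:
    """Share of a group that matches, e.g. fraud among new-device transactions."""
    return matching_count / group_count


base_rate = rate_within_group(fraud_transactions, total_transactions)
fraud_given_new_device = rate_within_group(fraud_and_new_device, new_device_transactions)

# If the two events were independent, these two rates would be (about) equal.
print(f"fraud rate overall:         {base_rate:.1%}")
print(f"fraud rate on new devices:  {fraud_given_new_device:.1%}")
print(f"lift over base rate:        {fraud_given_new_device / base_rate:.1f}x")
```

With these made-up counts the new-device signal alone moves the estimate a long way from 2%. A real model combines many such signals at once.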

The multiplication trap

Multiplying probabilities is only valid for independent events. Fraudsters often fire many transactions in a burst, so "transaction 1 is fraud" and "transaction 2 from the same card is fraud" are strongly dependent. Multiply those as if independent and you'll badly underestimate the risk of a fraud run.
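A quick sketch of how far off the naive multiplication can be. The 2% base rate is from the text; the 0.60 chance that a second transaction is fraud once the first one was is an assumed figure, picked only to make the gap visible.

```python
p_first_is_fraud = 0.02            # base rate from the text
p_second_is_fraud = 0.02           # same base rate for the second transaction
p_second_given_first_fraud = 0.60  # assumed for illustration, not a measured value

# Valid only if the two transactions are independent.
naive_joint = p_first_is_fraud * p_second_is_fraud

# Takes the dependence into account: P(A and B) = P(A) * P(B given A).
dependent_joint = p_first_is_fraud * p_second_given_first_fraud

print(f"naive (assumes independence): {naive_joint:.4f}")
print(f"with dependence:              {dependent_joint:.4f}")
print(f"underestimate factor:         {dependent_joint / naive_joint:.0f}x")
```

Under these assumptions the independent calculation understates the chance of two fraudulent transactions in a row by more than an order of magnitude.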

Expected value: what probability is worth in money

Here's where probability stops being abstract and starts paying your salary.

Expected value is the long-run average outcome, weighted by probability. Multiply each outcome by its probability, add them up.

Say your BNPL company approves a 1000 SAR purchase. From historical data, customers with this risk profile default 5% of the time, and when they default you recover almost nothing. Your margin on a good loan is 40 SAR.

Two outcomes:

Outcome	Probability	Result
Customer repays	0.95	+40 SAR profit
Customer defaults	0.05	-1000 SAR loss

Expected value = (0.95 × 40) + (0.05 × -1000) = 38 - 50 = -12 SAR.

On average, this loan loses you 12 SAR. Approve a million loans like this and you don't "maybe" lose money. You lose about 12 million SAR, almost guaranteed, as long as the defaults aren't all driven by one shared shock. The randomness of individual loans washes out at scale, and the expected value is what remains.
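Here is the same calculation as a small sketch, followed by a simulation of the "washes out at scale" claim. The function names are illustrative, and the simulation assumes each loan defaults independently of the others.

```python
import random


def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(probability * payoff for probability, payoff in outcomes)


# (probability, payoff in SAR), using the figures from the table above.
loan = [(0.95, 40.0), (0.05, -1000.0)]
print("expected value per loan:", expected_value(loan))


def simulate_average_profit(num_loans: int, seed: int = 0) -> float:
    """Average profit per loan when each loan independently defaults 5% of the time."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(num_loans):
        total += -1000.0 if rng.random() < 0.05 else 40.0
    return total / num_loans


# Small portfolios swing widely; large ones settle near the expected value.
for n in (100, 10_000, 1_000_000):
    print(n, round(simulate_average_profit(n), 2))
```

A portfolio of 100 loans can easily look profitable or disastrous by luck. At a million loans the average barely moves from the expected value.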

This is exactly how credit teams think. The number they obsess over is expected loss, usually written as:

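# The formula every credit risk team lives by:
# expected_loss = probability_of_default * exposure * loss_given_default

# 5% default rate, 1000 SAR exposure, we lose 100% on default
probability_of_default = 0.05
exposure = 1000             # SAR
loss_given_default = 1.0    # fraction of the exposure lost on default

expected_loss = probability_of_default * exposure * loss_given_default  # = 50 SAR per loan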

Notice something important. The model's job is only the first term: estimating the probability of default. The business wraps that probability in money. This is why a model that outputs well-calibrated probabilities is worth far more than one that just says yes or no.

Why don't models just give an answer?

So back to that 0.87. Why doesn't the model just say "fraud"?

Because the world is genuinely uncertain, and pretending otherwise throws away information.

Two transactions, one scored 0.51 and one scored 0.99, are wildly different situations. A hard yes/no answer would treat them identically. The probability lets you decide what to do based on the cost of being wrong:

Blocking a legit 50 SAR grocery purchase annoys a customer. Cheap mistake.
Approving a fraudulent 20k SAR transfer costs you 20k. Expensive mistake.

So maybe you auto-block above 0.95, send anything between 0.60 and 0.95 to a human reviewer, and let the rest through. The threshold is a business decision built on expected value, not a modeling decision.
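That three-way policy is only a few lines of code. The sketch below uses the example thresholds from the paragraph above; the names are hypothetical, and treating exactly 0.60 and exactly 0.95 as "review" is an assumption, since the text doesn't say which side the boundaries fall on.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REVIEW = "review"
    ALLOW = "allow"


# Example thresholds from the text. In practice these are business settings.
BLOCK_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.60


def route(fraud_score: float) -> Action:
    """Map a model score to an action using the two thresholds."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError(f"fraud_score must be in [0, 1], got {fraud_score}")
    if fraud_score > BLOCK_THRESHOLD:
        return Action.BLOCK
    if fraud_score >= REVIEW_THRESHOLD:  # boundary handling is an assumption
        return Action.REVIEW
    return Action.ALLOW


for score in (0.51, 0.87, 0.99):
    print(score, route(score).value)
```

Keeping the thresholds as named constants, separate from the model, means the business can retune them without anyone retraining anything.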

The model quantifies uncertainty. The business converts uncertainty into action. Keep those two jobs separate and a lot of ML system design suddenly makes sense.

Calibration matters more than accuracy

A model is well-calibrated when its probabilities mean what they say: among all transactions scored 0.80, about 80% really are fraud. In fintech, calibration is often more valuable than raw accuracy, because expected loss calculations are only as good as the probabilities you feed them.
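One rough way to check calibration is to bucket the scores and compare each bucket's average score with the fraud rate actually observed in it. The sketch below does that on synthetic data that is calibrated by construction; `calibration_table` is a made-up helper, and five equal-width bins is an arbitrary choice.

```python
import random
from collections import defaultdict


def calibration_table(scores, labels, num_bins=5):
    """Return (bin_start, mean_score, observed_rate, count) for each non-empty bin."""
    bins = defaultdict(list)
    for score, label in zip(scores, labels):
        index = min(int(score * num_bins), num_bins - 1)  # a score of 1.0 goes in the last bin
        bins[index].append((score, label))
    rows = []
    for index in sorted(bins):
        items = bins[index]
        mean_score = sum(score for score, _ in items) / len(items)
        observed_rate = sum(label for _, label in items) / len(items)
        rows.append((index / num_bins, mean_score, observed_rate, len(items)))
    return rows


# Synthetic data: each label is 1 with probability equal to its score,
# so this pretend model is well-calibrated by construction.
rng = random.Random(0)
scores = [rng.random() for _ in range(50_000)]
labels = [1 if rng.random() < score else 0 for score in scores]

for bin_start, mean_score, observed_rate, count in calibration_table(scores, labels):
    print(f"from {bin_start:.1f}: mean score {mean_score:.3f}, observed rate {observed_rate:.3f}, n={count}")
```

For a well-calibrated model the two middle columns track each other closely. A model whose 0.80 bucket is only 50% fraud will quietly poison every expected loss number built on top of it.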

The rules, compressed

Everything above boils down to a few rules you can hold in your head:

Rule	In plain words
P(A) is between 0 and 1	No negative probabilities, nothing above certain
Probabilities of all outcomes sum to 1	Something in the sample space must happen
P(not A) = 1 - P(A)	If default is 0.05, repayment is 0.95
P(A and B) = P(A) × P(B), if independent	Multiply only when events don't influence each other
E[X] = sum of outcome × probability	The long-run average, and the basis of expected loss

Five rows. That's most of the probability you need to read ML papers, sit in credit risk meetings, and set sensible thresholds.

What's next?

You now know what a probability is, when you're allowed to multiply them, and how expected value turns model scores into money decisions.

But real data isn't a single coin flip. Transaction amounts, fraud counts per day, response times: each of these is a quantity that varies randomly, and each has a characteristic shape. Next up is Random Variables and Distributions, where you'll learn to recognize those shapes and find out why one bell-shaped curve keeps showing up absolutely everywhere.