Have you ever tried learning about floating point numbers, and given up after coming across terms like “mantissa”, “exponent”, “FPU”, “IEEE 754” — or worse, seen the dreaded “floating point number equation”?
Let’s throw all these terms away. Let’s talk about floating point numbers in a simple way that doesn’t assume knowledge of binary, bytes, or bases.
In this article we’ll first cover a related concept, “fixed point numbers”, and then use that as a jump-off point for floating point numbers. In both cases, we’ll use the analogy of a metre-stick, and how the markings on it let us view centimetres, millimetres and more.
In this article, I’m going to deliberately ignore a lot of the mathematics and simplify certain concepts, so please don’t be upset if you read this and it’s not exactly technically accurate. The idea is to get across the core concept of floating point numbers.
- Fixed point numbers give us fixed, predictable precision
- Fixed point numbers are like metre sticks!
- Floating point numbers give us dynamic precision
- Floating point numbers can store either very large numbers imprecisely, or very small numbers precisely
A metre stick (or meterstick) is a measuring stick that is exactly 1 metre long. It will typically have thick marks on it every 10cm, slightly thinner marks every 1cm, and even thinner marks every 1mm.
- 1m is 10 times bigger than 10cm
- 10cm is 10 times bigger than 1cm
- 1cm is 10 times bigger than 1mm
This is worth calling out. This metre stick has a precision of 1mm, and the markings are set in base 10.
Let’s define two terms here. Accuracy is how close a result is to the real value. Precision is how specific your result is. As a silly example, say we measure the length of a car that is actually 254 centimetres long, but give the answer in kilometres.
Saying it’s 132.327521 kilometres would be very precise, but also very inaccurate. Saying it’s 0 kilometres would be very imprecise, but actually far more accurate than the first value.
A precise AND accurate answer would be 0.00254 kilometres.
We will also use the term smallest representable value to refer to precision.
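Accuracy can be put as a number: the size of the error. Here’s a tiny sketch for the car example, using Python’s `Fraction` so the subtraction is exact; `error_km` is just an illustrative helper. Precision, by contrast, is about how specific the answer is, which this code doesn’t try to score.

```python
from fractions import Fraction

# The car is really 254cm long, which is 0.00254km.
TRUE_LENGTH_KM = Fraction("0.00254")

def error_km(guess_km):
    """How far a guess is from the real length: smaller means more accurate."""
    return abs(Fraction(guess_km) - TRUE_LENGTH_KM)

# Guesses are strings so Fraction reads them as exact decimals.
for guess in ("132.327521", "0", "0.00254"):
    print(f"{guess}km is off by {float(error_km(guess))}km")
```

The very precise 132.327521km guess is off by over 132km, while the vague 0km guess is off by only 0.00254km.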
Let’s imagine a company called MeSticks that has decided to make its fortune by manufacturing and selling measuring sticks. It has created two major products, the Fixed Stick and the Floating Stick.
When developing these sticks, MeSticks has enough wood to make them as long as they want! However, with printer ink as expensive as it is these days, they’re limited in how many markings they can print onto each stick. And they’re not willing to compromise on ink quality: only the darkest, smoothest black ink will go on their quality rulers.
Let’s imagine they want to build a 1m-long stick. Due to ink limitations, they can put exactly 1000 marks on the ruler (not counting the numbers written on it). So, if we divide 1m by 1000, we get 1mm: they can place a mark every 1mm.
We’ll say that the largest representable value of the stick is 1m. The smallest representable value is 1mm.
If we want to measure out 36.7cm on this stick, that’s not a problem at all for us. We can see it is 367mm.
But what if we want to measure something larger, say 242cm? Our stick is far too small for this. So let’s throw it out and get one that’s 10x bigger!
This new stick is 10m long. The number of marks we can make on it stays the same as before (1000), so we’ll now see a mark every 1m, every 10cm, and every 1cm. But we can no longer measure out 1mm properly: the stick now has a precision of 1cm. Its largest representable value is 10m, and its smallest representable value is 1cm. Because we’ve made it bigger, and 242cm lands exactly on one of its 1cm marks, we can happily measure it out with no problem!
But we can no longer measure 36.7cm like we could before. We’ve lost precision, and can only measure either 36 or 37cm — in this case we’d probably round up to 37. We must make a decision though, and this is key.
If we wanted to, we could ask MeSticks to make another stick that is 10x bigger still, to measure out an even larger value like 770cm (but not 767cm, since its marks are now 10cm apart).
Or, we could get one that is 10x smaller than our first one, and measure out something much smaller like 9.4mm. But if we use a smaller one, we’re limiting the maximum size we can measure. No matter the size of stick though, MeSticks is always allowed exactly 1000 markings on it — no more, no less.
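The Fixed Stick’s trade-off can be sketched in a few lines of Python. This is a toy model, assuming every stick has exactly 1000 evenly spaced marks; `fixed_stick_reading` is a made-up name, and `Fraction` is used so that the only rounding we see is the stick’s, not the computer’s.

```python
from fractions import Fraction

MARKS = 1000  # MeSticks may print exactly 1000 marks on every stick

def fixed_stick_reading(value_cm, stick_length_cm):
    """Round value_cm to the nearest mark on an evenly marked stick (all in cm)."""
    value = Fraction(value_cm)
    length = Fraction(stick_length_cm)
    if value < 0 or value > length:
        raise ValueError(f"{value_cm}cm does not fit on a {stick_length_cm}cm stick")
    spacing = length / MARKS  # the smallest representable value
    # round() on a Fraction gives a whole number of marks (ties go to the even mark)
    return round(value / spacing) * spacing

# The same 36.7cm measured on a 1m, a 10m and a 100m stick
for stick_cm in (100, 1000, 10000):
    reading = fixed_stick_reading("36.7", stick_cm)
    print(f"{stick_cm}cm stick: 36.7cm reads as {float(reading)}cm")
```

With these numbers, 36.7cm reads exactly on the 1m stick, comes out as 37cm on the 10m stick and as 40cm on a 100m stick, while asking the 1m stick for 242cm raises an error.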
If you’ve followed this section, you should now understand the core concept behind fixed point numbers in computer science! Regardless of how they’re stored under the hood, the key point is this:
We can choose what range of values we want to cover by changing the largest representable value to one that covers our needs (without being so large that we don’t have a useful precision.)
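In software, one common way to build a fixed point number (a sketch, not the only approach) is to store a whole number that counts marks, and agree separately on how much each mark is worth. The `SCALE_CM`, `to_fixed` and `from_fixed` names below are made up for illustration; the scale matches our original 1m stick, where each mark is 1mm.

```python
from fractions import Fraction

# Hypothetical fixed point format: each stored integer counts 1mm marks,
# so one unit is worth SCALE_CM centimetres.
SCALE_CM = Fraction(1, 10)

def to_fixed(value_cm):
    """Convert a length in centimetres to a whole number of marks (rounding)."""
    return round(Fraction(value_cm) / SCALE_CM)

def from_fixed(marks):
    """Convert a whole number of marks back to centimetres."""
    return marks * SCALE_CM

a = to_fixed("36.7")   # stored as the integer 367
b = to_fixed("0.94")   # 9.4mm is stored as 9: the precision is only 1mm
total = a + b          # adding two fixed point numbers is plain integer addition
print(a, b, total, float(from_fixed(total)))  # prints: 367 9 376 37.6
```

Choosing `SCALE_CM` is the code equivalent of choosing which stick to buy: a smaller scale gives finer precision, but the same number of marks then covers a shorter range.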
Floating point numbers are the fun ones! Think of our original measuring stick again, in all its glorious 1m length. But this time, let’s change up how we spend our 1000 marks. Rather than place a mark every 1/1000th of the way along the ruler, what if MeSticks placed the marks increasingly far apart?
On the new ruler, you can see two clear markings on either end, that say “0m” and “1m”. And, as before, you can see marks every 10cm, saying 10cm, 20cm, 30cm etc. But as you look for 36.7cm, it’s simply not there. You either have to choose 30cm or 40cm. There are no markings at all between the two of them.
Looking over to the left though, you notice that there are still markings between 0cm and 10cm, one every centimetre. And, even further left, there are millimetre markings between 0cm and 1cm.
But there’s something more. Picking up a magnifying glass, you can see 10 clear marks between 0mm and 1mm, evenly spaced. There’s no everyday name for this width (besides simply 0.1mm!), so we’ll call it a hair width, as that’s about how fine it is.
This pattern keeps repeating further to the left, with marks finer than you could see even with a microscope. There are also two extra, but incredibly important, details I left out about this metre stick.
The first is that it’s not actually 1m long. It’s 100m long, and there is a marking for every 1m up to 10m, and then every 10m up to 100m. The second is that it’s the only type of Floating Stick MeSticks has decided they’re going to make. We cannot throw it out and buy a bigger or smaller one — it’s 100m, take it or leave it!
As before though, there are still 1000 markings on the stick, even if a lot of them are too small for us to see!
Let’s look at the values we talked about before. Can we measure 36.7cm? No, we can’t, but we can round it to the nearest 10cm to get 40cm.
Can we measure 242cm? Again, not exactly — we can measure 2m, but we didn’t need to expand the stick at all to do so.
770cm? That’ll come out as 8m — again, no expanding needed. Accurate, but not precise.
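We also looked at 9.4mm, which required us to get a smaller ruler before. On the Floating Stick there’s no need to swap: since 9.4mm is smaller than 10mm, it falls in the region with a mark every millimetre, so we can measure it as 9mm, less than half a millimetre off. Go below 1mm and we get 1/10th mm (or “hair width”) precision, so a value like 0.4mm can be pointed at exactly on our stick.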
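The Floating Stick readings can be reproduced with a toy model too. As a simplifying assumption, the gap between marks is the largest power of ten not bigger than the value (capped at 10m), and the sketch ignores the fact that the 1000 marks eventually run out at the tiny end. `floating_stick_reading` and `mark_spacing_m` are illustrative names, not anything standard.

```python
from fractions import Fraction

LARGEST_M = Fraction(100)  # the one and only Floating Stick is 100m long

def mark_spacing_m(value):
    """Gap between marks around a positive value (metres): 10m above 10m,
    1m from 1m to 10m, and ten times smaller for each decade further left."""
    spacing = Fraction(10)
    while value < spacing:
        spacing /= 10
    return spacing

def floating_stick_reading(value_m):
    """Round value_m (in metres) to the nearest mark on the Floating Stick."""
    value = Fraction(value_m)
    if value < 0 or value > LARGEST_M:
        raise ValueError(f"{value_m}m does not fit on the Floating Stick")
    if value == 0:
        return value  # 0m is the mark at the very end of the stick
    spacing = mark_spacing_m(value)
    return round(value / spacing) * spacing

for length_m in ("0.367", "2.42", "7.7", "0.0004"):
    print(f"{length_m}m reads as {float(floating_stick_reading(length_m))}m")
```

Run as-is, this reads 36.7cm as 0.4m (40cm), 242cm as 2m and 770cm as 8m, matching the readings above, while a hair-width value such as 0.4mm (0.0004m) is read exactly.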
Can you see the pattern here?
This is the key idea behind floating point numbers: without needing to expand or shrink the stick, we can represent either very large numbers imprecisely, or very small numbers with higher precision.
At this point, you may be feeling like fixed point numbers are superior to floating point. After all, look at all those useful numbers we weren’t able to show properly!
Yet in reality, most modern software uses floating point numbers rather than fixed point. Why?
The first reason is ergonomics. It’s far easier for a programmer to use a single data type, the “float”, to store pretty much any number they want than it is for them to figure out what range they care about when using a fixed point number.
The second key reason, which the metre-stick analogy has trouble showing, is that the range of values is far larger than you’d expect. The commonly used “double precision” floating point number can represent values, at full precision, as small as about 2.2×10⁻³⁰⁸. Written out in full, that’s a decimal point followed by 307 zeros before you reach the first non-zero digit!
And the largest value is around 1.8×10³⁰⁸: written out, a whole number 309 digits long.
These numbers are so mind-bogglingly small and large that the human brain cannot comprehend them properly!
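If you have Python handy, you can ask it for the limits of its own floats, which are double precision on essentially every platform. This is a quick check, not a full tour of the format.

```python
import sys

smallest = sys.float_info.min  # smallest positive double at full precision
largest = sys.float_info.max   # largest finite double

print(smallest)  # 2.2250738585072014e-308
print(largest)   # 1.7976931348623157e+308

# Writing the largest value out as a whole number takes 309 digits.
print(len(str(int(largest))))  # 309
```

`sys.float_info.min` is the smallest *normal* double. Even smaller “subnormal” values, down to about 5×10⁻³²⁴, also exist, but they keep fewer digits of precision, which is the stick’s trade-off all over again.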
This is the magic of floating point numbers. They will let you store unimaginably large numbers, at which point you probably don’t actually care about the same level of precision as we were talking about earlier.
For “human sized” numbers, say 100cm, a double can tell apart values as close together as 100cm and 100.00000000000001cm, which is much finer than a hair width. This is why floating point numbers are generally preferred: there are very few cases where the precision they provide isn’t good enough for the values you’re trying to store. (That’s not at all to say that no software has a legitimate use for fixed point numbers, though!)
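Python 3.9 and later expose the gap between neighbouring doubles directly through `math.ulp` and `math.nextafter`, which makes the 100cm claim easy to check. The `1e300` comparison is just an illustrative large value.

```python
import math

human_sized = 100.0  # 100cm, stored as a double

# The very next value a double can hold above 100.0
print(math.nextafter(human_sized, math.inf))  # 100.00000000000001
# The gap between neighbouring doubles near 100
print(math.ulp(human_sized))  # 1.4210854715202004e-14

# Near the top of the range, neighbouring doubles are astronomically far apart
print(math.ulp(1e300))
```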
You might recall that earlier I pointed out that our rulers were set in base 10. I mention this because computers store numbers in binary, which is base 2. As such, instead of changing precision at 1, 10, 100, 1000, floating point numbers change precision at 1, 2, 4, 8, 16.
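The binary version of the metre stick shows up if we compare the gaps between neighbouring doubles across a few values. The ratios below are relative to the gap near 1.0, and rely on `math.ulp` (Python 3.9+).

```python
import math

values = (1.0, 1.5, 2.0, 3.0, 4.0, 8.0, 16.0)
# How many times wider the gap between neighbouring doubles is, compared with 1.0
ratios = [math.ulp(x) / math.ulp(1.0) for x in values]
for x, ratio in zip(values, ratios):
    print(f"near {x}: gap is {ratio:g}x the gap near 1.0")
```

The ratios come out as 1, 1, 2, 2, 4, 8, 16: the gap holds steady from 1 up to 2, doubles at 2, and doubles again at 4, 8 and 16.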
Many video games use floating point numbers to store player and object positions. This works well for most cases, but as those distances get larger, strange effects can happen!
You can see in this video how the original version of Minecraft had weird bugs when the player travelled too far from the centre of the world:
This is directly related to the fact we can’t store large numbers precisely with floating point numbers!
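We can mimic the effect with a toy position update. The article doesn’t say exactly how Minecraft stores positions, so this sketch simply assumes a 32-bit (“single precision”) float, which keeps far fewer digits than Python’s doubles; `to_float32` and `move` are hypothetical helpers built on the standard `struct` module.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def move(position, step):
    """Hypothetical game update: add step to position, storing the result
    as a 32-bit float the way our assumed game would."""
    return to_float32(to_float32(position) + to_float32(step))

for start in (1_000.0, 16_777_216.0):
    print(start, "->", move(start, 1.0), "and", move(start, 0.5))
```

Near 1,000 blocks from the centre, steps of 1 and 0.5 both work. At 16,777,216 (2²⁴) the gap between neighbouring 32-bit floats is already 2, so both steps round away to nothing and the player doesn’t move at all.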
There’s so much more to how fixed and floating point numbers work under the hood, and I encourage you to keep learning about them! They’re fundamental to how computers and essentially all modern software work.
A great follow-on guide to look at is here: