Floating Point Representation

Arun prashanth
May 6, 2021


Image Courtesy: Nick Hillier on Unsplash

Computers use binary digits to represent numbers. Decimal (base 10) numbers are converted to their binary (base 2) equivalent and then stored in either integer or floating point form.

In this blog, I will give a clear view of floating point representation.

Floating point form is how computers represent numbers that have a fractional part.

These numbers are called floating point because the binary point is not fixed: its position is set by an exponent.

Nowadays, computers follow the IEEE 754 standard for binary floating-point representation.

IEEE floating-point representation comes in three commonly used formats:

1. Single Precision - 32 bits
2. Double Precision - 64 bits
3. Long Double (Extended) Precision - 80 bits

All three formats represent a number as:

value = (-1)^s × m × b^e

s is the sign bit: if s is 1 the floating-point number is negative, and if s is 0 it is positive.

m is the mantissa, b is the base (2), and e is the exponent.
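To make the formula concrete, here is a minimal Python sketch that evaluates (-1)^s × m × b^e directly. It assumes a normal single-precision number, where the mantissa is m = 1 + f / 2^23 with f the 23 stored fraction bits. The function name `evaluate` is hypothetical, and the fraction bits used are those of 20.1 from the example later in this post.

```python
def evaluate(s: int, m: float, e: int, b: int = 2) -> float:
    """Value of a number from its sign s, mantissa m, base b and exponent e."""
    return (-1) ** s * m * b ** e


# 20.1 in single precision: sign 0, exponent 4 and these 23 stored fraction bits.
fraction_bits = 0b01000001100110011001101
m = 1 + fraction_bits / 2**23   # implicit leading 1 plus the stored fraction
print(evaluate(0, m, 4))        # very close to, but not exactly, 20.1
```

The printed value differs slightly from 20.1 because only 23 fraction bits are kept.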

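The bit layout of each of the most commonly used IEEE floating-point formats is:

Single Precision: 1 sign bit, 8 exponent bits, 23 mantissa bits
Double Precision: 1 sign bit, 11 exponent bits, 52 mantissa bits
Long Double Precision: 1 sign bit, 15 exponent bits, 64 mantissa bits (including an explicit leading integer bit)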
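One way to see these layouts on a real machine is to reinterpret a float's bytes as an integer and slice out the fields. This sketch uses Python's standard `struct` module, which only packs single and double precision (not the 80-bit format). `split_fields`, `LAYOUTS` and `PACK_CODES` are hypothetical names chosen for illustration.

```python
import struct

# (exponent bits, mantissa bits); struct can only pack single and double precision.
LAYOUTS = {"single": (8, 23), "double": (11, 52)}
PACK_CODES = {"single": ("f", "I"), "double": ("d", "Q")}


def split_fields(x: float, fmt: str = "single") -> tuple[int, int, int]:
    """Return the raw (sign, exponent, mantissa) bit fields of x in the given format."""
    exp_bits, man_bits = LAYOUTS[fmt]
    float_code, int_code = PACK_CODES[fmt]
    # Reinterpret the packed float's bytes as an unsigned integer of the same width.
    (raw,) = struct.unpack(">" + int_code, struct.pack(">" + float_code, x))
    mantissa = raw & ((1 << man_bits) - 1)
    exponent = (raw >> man_bits) & ((1 << exp_bits) - 1)
    sign = raw >> (exp_bits + man_bits)
    return sign, exponent, mantissa


for fmt, (exp_bits, man_bits) in LAYOUTS.items():
    s, e, m = split_fields(20.1, fmt)
    print(fmt, s, format(e, f"0{exp_bits}b"), format(m, f"0{man_bits}b"))
```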

Let’s take an example to understand how a floating-point number is converted into its binary representation.

Let’s take the floating point number 20.1. Here 20 is the integer part and .1 is the fractional part.

Converting it to the binary format takes several steps:

1. Convert the integer part to binary by repeatedly dividing by 2 and reading the remainders from last to first.
The binary form of 20 is 10100.
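Step 1 can be sketched in Python as repeated division by 2, collecting the remainders and reversing them at the end. `int_to_binary` is a hypothetical helper; Python's built-in `bin(20)` gives the same digits with a `0b` prefix.

```python
def int_to_binary(n: int) -> str:
    """Binary digits of a non-negative integer by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))


print(int_to_binary(20))  # 10100
```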

2. Convert the fractional part. This is a little tricky, but not hard: repeatedly multiply the fraction by 2 and take the integer part (0 or 1) of each result as the next bit.

So, 20.1 = 10100.000110011001100110011…, where the pattern 0011 repeats forever, so .1 has no exact binary representation.
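The repeated-multiplication method for the fractional part can be sketched like this. It uses `fractions.Fraction` from the standard library so that .1 is exactly one tenth (a Python float 0.1 is already a binary approximation), and it stops after `max_bits` bits because the expansion of .1 never ends. `fraction_to_binary` is a hypothetical name.

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, max_bits: int = 24) -> str:
    """Binary digits of 0 <= frac < 1 by repeated multiplication by 2."""
    bits = []
    while frac > 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)       # the integer part (0 or 1) is the next bit
        bits.append(str(bit))
        frac -= bit           # keep only the fractional part for the next round
    return "".join(bits)


print("10100." + fraction_to_binary(Fraction(1, 10)))
```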

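3. Rewrite the binary number in scientific (normalized) form by moving the binary point 4 places to the left, so that only a single 1 remains before it: 10100.000110011… = 1.0100000110011… × 2^4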
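Normalizing amounts to moving the point to just after the leading 1 and counting how far it moved. A minimal sketch, assuming the input is the binary string of a number that is at least 1 and has no leading zeros (as 20.1 is); `normalize` is a hypothetical helper.

```python
def normalize(binary: str) -> tuple[str, int]:
    """Rewrite e.g. '10100.0001' as ('1.01000001', 4), meaning 1.01000001 × 2^4."""
    int_part, _, frac_part = binary.partition(".")
    if not int_part.startswith("1"):
        raise ValueError("expected a number >= 1 with no leading zeros")
    shift = len(int_part) - 1          # places the binary point moves to the left
    digits = int_part[1:] + frac_part  # everything after the leading 1
    return "1." + digits, shift


print(normalize("10100.000110011"))
```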

4. Convert to the IEEE 754 standard

i) Sign -> 0 (because the number is positive)
ii) Find the exponent:

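For this example, we use single precision, so the exponent field has 8 bits. The stored field can hold 0 to 255, and normal numbers use actual exponents from -126 to 127 (the field values 0 and 255 are reserved for zero, subnormal numbers, infinity and NaN).

Here 127 is the exponent bias.

To find the stored exponent, add the bias to the actual exponent, which is the number of places the binary point moved (positive when it moved to the left), then convert the sum to binary. For 20.1 the point moved 4 places, so the stored exponent is 4 + 127 = 131.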

131 in decimal is 10000011 in binary, so the exponent is 10000011.
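The exponent step reduces to one addition and a conversion to 8 binary digits. This sketch assumes a normal single-precision number (stored field 1 to 254) and uses the hypothetical name `biased_exponent`.

```python
BIAS = 127          # single-precision bias: 2**(8 - 1) - 1
EXPONENT_BITS = 8


def biased_exponent(shift: int) -> str:
    """Stored exponent field: actual exponent plus the bias, as 8 binary digits."""
    stored = shift + BIAS
    if not 1 <= stored <= 254:
        raise ValueError("exponent out of range for a normal single-precision number")
    return format(stored, f"0{EXPONENT_BITS}b")


print(biased_exponent(4))  # 10000011
```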

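iii) Find the mantissa: drop the leading 1 of the normalized form (it is implicit and not stored) and keep the next 23 bits: 01000001100110011001100

This is how 20.1 is represented in the IEEE standard, before rounding: 0 10000011 01000001100110011001100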
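Putting the three fields together can be sketched as below: drop the implicit leading 1, keep the next 23 bits, and join them with the sign and exponent. This assumes the normalized string form used in the steps above and simply truncates the extra bits; `assemble` is a hypothetical helper.

```python
def assemble(sign: int, exponent_bits: str, normalized: str, man_bits: int = 23) -> str:
    """Join the sign, exponent and first mantissa bits into one bit pattern.

    `normalized` looks like '1.0100...'; the leading '1.' is implicit and not stored.
    Bits past the 23rd are simply cut off here (truncation, not yet rounding).
    """
    if not normalized.startswith("1."):
        raise ValueError("expected a normalized number of the form 1.xxx")
    mantissa = normalized[2:].ljust(man_bits, "0")[:man_bits]
    return f"{sign} {exponent_bits} {mantissa}"


# Normalized 20.1: 1.0100 followed by the bits of .1 (a 0, then 0011 repeating).
normalized = "1.0100" + "0" + "0011" * 8
print(assemble(0, "10000011", normalized))
```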

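When it comes to floating point rounding in the IEEE standard: in single precision the mantissa holds only 23 bits, so everything from the 24th bit onward must be rounded away. The default IEEE 754 mode is round to nearest, ties to even: if the discarded bits are worth more than half of the 23rd bit, 1 is added to the 23rd bit, and an exact half rounds to whichever result has a 0 in the 23rd bit. For 20.1 the discarded bits are 110011…, which is more than half, so the mantissa is rounded up.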

Before Rounding: 0 10000011 01000001100110011001100
After Rounding: 0 10000011 01000001100110011001101
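The rounding step can be sketched as round to nearest, ties to even, which is the IEEE 754 default mode. The sketch works on a string of mantissa bits and assumes those bits are exact; `round_mantissa` is a hypothetical name. As a sanity check, it compares the result with the bits Python's `struct` module produces when it packs 20.1 as a 32-bit float.

```python
import struct


def round_mantissa(bits: str, keep: int = 23) -> tuple[str, int]:
    """Round mantissa bits to `keep` bits using round to nearest, ties to even.

    Assumes `bits` is exact (a tie is only a tie if every discarded bit after the
    first is 0). Returns (rounded bits, carry); carry is 1 when rounding overflows
    all kept bits, in which case the exponent would need to go up by one.
    """
    bits = bits.ljust(keep, "0")               # pad short inputs on the right
    kept, rest = bits[:keep], bits[keep:]
    value = int(kept, 2)
    if rest.startswith("1"):                   # discarded part is at least one half
        exact_half = set(rest[1:]) <= {"0"}    # nothing but zeros after that 1
        if not exact_half or value % 2 == 1:   # above half, or a tie with an odd last bit
            value += 1
    carry = value >> keep
    value &= (1 << keep) - 1
    return format(value, f"0{keep}b"), carry


# Mantissa bits of 20.1 (after the hidden 1), long enough to look past the 23rd bit.
bits_20_1 = "0100" + "0" + "0011" * 10
rounded, carry = round_mantissa(bits_20_1)
print(rounded, carry)

# Cross-check against the bits Python stores when packing 20.1 as a 32-bit float.
(stored,) = struct.unpack(">I", struct.pack(">f", 20.1))
print(format(stored & 0x7FFFFF, "023b"))
```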

This is how floating-point numbers are converted to binary using the IEEE 754 standard.
