Representation of Real Numbers in Binary

So, we've dabbled in whole numbers in binary, and even those slightly awkward fractions with fixed point. But what about those really fiddly numbers – the ones with lots of decimal places, or the ridiculously huge or tiny ones? Well, that's where things get a bit more involved, and we venture into the realm of representing real numbers in binary.

As we know, in binary, each column to the left of the (implied or explicit) binary point represents increasing powers of 2 (1, 2, 4, 8, and so on). But if we want to represent fractions, we extend this to the right of the binary point. The first column after the point is 2^-1 (or 0.5), the next is 2^-2 (or 0.25), then 2^-3 (or 0.125), and so on. It's like the powers of 2 going on a digital diet, getting smaller and smaller.

Floating Point Binary: For the Really Big (and Really Small) Fish!

However, fixed point binary can run into trouble when you're dealing with numbers that are either incredibly large or incredibly tiny. Imagine trying to write the distance to the Sun or the size of an atom using a fixed number of bits with the point in the same place – you'd run out of room pretty quickly!

That's where floating point binary comes to the rescue. It's a much more flexible system, a bit like the scientific notation you might have used in science lessons (e.g., 1.23 × 10⁵). In floating point binary, a number is typically represented in the form a.aa × 2^b, where 'a.aa' is the mantissa (the significant digits) and 'b' is the exponent (telling you how many places to shift the binary point).

To work out the decimal value, you essentially take the binary point in the mantissa and move it to the right (if the exponent is positive) or to the left (if the exponent is negative) by the number of places given in the exponent. Then you just convert the resulting binary number to decimal.

Worked examples

a) Calculate the denary value of the floating point binary number 0010101010110, where 9 bits are used for the mantissa followed by 4 bits for the exponent, both in two's complement.

Step 1: split the binary number into the mantissa (0.01010101) and the exponent (0110).

Step 2: calculate the denary value of the exponent. 0110 = 6

Step 3: move the binary point in the mantissa six places to the right. 0010101.01

Step 4: the mantissa is now 0010101.01 - converting to denary gives (16 + 4 + 1 + 0.25) = 21.25

b) Calculate the denary value of the floating point binary number 011110, where 3 bits are used for the mantissa followed by 3 bits for the exponent, both in two's complement.

Step 1: split the binary number into the mantissa (0.11) and the exponent (110).

Step 2: calculate the denary value of the exponent. 110 = -4 + 2 = -2

Step 3: because the exponent is negative, the binary point in the mantissa is moved left two places instead of right.

Step 4: the mantissa is now 0.0011 (added zeros are needed when moving the point). Converting to denary gives: 0.125 + 0.0625 = 0.1875

Guided Practice

1. Calculate the denary value of the floating point binary number 1011001, where 4 bits are used for the mantissa followed by 3 bits for the exponent, both in two's complement.

Step 1: Split the binary number into the mantissa (4 bits) and the exponent (3 bits).

Mantissa: 1.011

Exponent: 001

Step 2: Calculate the denary value of the exponent. Remember that two's complement is being used.

Exponent in denary: 3

Step 3: In this case, if the answer from Step 2 is positive, move the binary point in the mantissa this many places right. If negative, move it left (you'll need to imagine a binary point after the first bit of the mantissa). Remember the mantissa also uses two's complement, so the leftmost bit is the sign.

Adjusted mantissa (with moved binary point): 1011.001

Step 4: Fill in your answer from Step 3 in a table like the one below. Use this to calculate the denary value. Don't forget that two's complement is being used, so the leftmost bit is negative.

-2³	2²	2¹	2⁰	.	2^-1	2^-2	2^-3
				.

Denary value: –8 + 2 + 1 + 0.125 = –5.125

2. Calculate the denary value of the floating point binary number 010110, where 4 bits are used for the mantissa followed by 2 bits for the exponent, both in two's complement.

Step 1: Split the binary number into the mantissa (4 bits) and the exponent (2 bits).

Mantissa: 0.101

Exponent: 10

Step 2: Calculate the denary value of the exponent. Remember, two's complement is being used.

Exponent in denary: -2

Adjusted mantissa (with moved binary point): 0.00101

Step 4: Fill in your answer from Step 3 in a table like the one below. Use this to calculate the denary value. Don't forget that two's complement is being used, so the leftmost bit is negative.

2⁰	.	2^-1	2^-2	2^-3	2^-4	2^-5
	.

Denary value: 0.125 + 0.03125 = 0.15625

Independent Practice

Calculate the denary values of the following floating-point binary numbers. All values are given in two's complement.

010110010101 (8 bits mantissa, 4 bits exponent)
100111110110 (8 bits mantissa, 4 bits exponent)
0111001010 (7 bits mantissa, 3 bits exponent)
0101101110101 (9 bits mantissa, 4 bits exponent)
1011111010 (4 bits mantissa, 6 bits exponent)
0111100101010 (8 bits mantissa, 5 bits exponent)
010111010000 (7 bits mantissa, 5 bits exponent)
10100011001100 (9 bits mantissa, 5 bits exponent)

Answers

Step 1: mantissa is 0.1011001, exponent is 0101
Step 2: denary value of exponent is 5
Step 3: move binary point in mantissa five places to give 010110.01
Step 4: convert to denary = 22.25
Step 1: mantissa is 1.0011111, exponent is 0110
Step 2: denary value of exponent is 6
Step 3: move binary point in mantissa six places to give 1001111.1
Step 4: convert to denary = -48.5
Step 1: mantissa is 0.11100, exponent is 010
Step 2: denary value of exponent is 2
Step 3: move binary point in mantissa two places to give 011.100
Step 4: convert to denary = 3.5
Step 1: mantissa is 0.10110111, exponent is 0101
Step 2: denary value of exponent is 5
Step 3: move binary point in mantissa five places right to give 010110.111
Step 4: convert to denary = 22.875
Step 1: mantissa is 1.011, exponent is 111010
Step 2: denary value of exponent is -6
Step 3: move binary point in mantissa six places left (fill with 1s for negative)
Step 4: convert to denary ≈ -0.0078125
Step 1: mantissa is 0.1111001, exponent is 01010
Step 2: denary value of exponent is 10
Step 3: move binary point in mantissa ten places right to give 0111100100.0
Step 4: convert to denary = 484
Step 1: mantissa is 0.10111, exponent is 010000
Step 2: denary value of exponent is 16
Step 3: move binary point in mantissa 16 places right
Step 4: convert to denary = 47104
Step 1: mantissa is 1.0100011, exponent is 001100
Step 2: denary value of exponent is 12
Step 3: move binary point in mantissa 12 places right
Step 4: convert to denary = -2976

Normalisation: Making Binary Numbers Look Their Best (and Most Precise)!

Ah, normalisation. In the slightly peculiar world of storing binary numbers, especially those floating about with their decimal points, there's often a need to make sure they're presented in a standard, most precise way. Think of it as giving your binary numbers a bit of a digital makeover.

For a positive binary number, normalisation basically means chopping off any unnecessary leading zeros. You know, those zeros at the very beginning that don't actually change the value of the number (like writing 007 instead of 7 – looks a bit silly). By getting rid of them, you make the most of the available bits to store the actual significant digits.

Now, for a negative binary number (often in two's complement form), the rule is similar, but you chop off any leading ones. Again, these leading ones don't add to the precision and can be discarded to make the representation more efficient.

This tidying-up process is particularly important for floating point numbers. A normalised floating point number has a very specific look: it must always start with either 0.1 or 1.0. This ensures that the most significant bit is right after the binary point (for positive numbers) or the first two bits are '10' after normalisation of a negative mantissa. It's all about having that first significant digit right next to the point to maximise the precision you can store.

Normalising a Floating Point Number: The Digital Point Shift!

So, how do you actually normalise a floating point binary number? Well, you simply move the binary point within the mantissa (that's the main part of the number) either to the left or to the right until it satisfies that golden rule of starting with 0.1 or 1.0.

But, and this is a crucial but, you can't just go moving the point willy-nilly without keeping track of things! Every time you move the binary point, you have to adjust the exponent (that's the power of 2 that tells you where the point really should be).

If you move the binary point to the right by a certain number of places to get it into the normalised form, you have to decrease the exponent by the same number. It's like saying, "I've made the mantissa smaller, so I need to make the power of 2 bigger to compensate."

Conversely, if you move the binary point to the left, you have to increase the exponent by the same amount. "I've made the mantissa bigger, so I need a smaller power of 2."

It's a bit of a digital dance, moving the point and adjusting the exponent to keep the overall value of the number the same while ensuring it's in that nice, neat normalised form. All for the sake of squeezing the most accuracy out of those precious few bits!

Worked examples

a) The floating point binary number 000101010101 has an 8-bit mantissa followed by a 4-bit exponent, both in two's complement. Find the normalised version of this number.

Step 1: split the binary number into the mantissa (0.0010101) and the exponent (0101).

Step 2: move the binary point in the mantissa right until it starts as 0.1 or 1.0. Moving it two places ensures that the mantissa is now 0.10101.

Step 3: however, the mantissa is now only 6 digits in length, so add two zeros to the right-hand side. This keeps it at 8 digits (as required by the question) but does not change the value. The mantissa is now 0.1010100.

Step 4: as the binary point has been moved two places right, the exponent must be decreased by 2. The exponent is currently 0101 (5 in denary), so subtracting 2 from this gives 0011 (3).

Step 5: join together the new mantissa and exponent in the format given in the question. In this case, the normalised value is 010101000011.

b) Give the denary value 27.75 as a normalised floating point binary number, using 8 bits for the mantissa and 4 bits for the exponent, both in two's complement.

Step 1: write 27.75 as a fixed point binary number using two's complement = 011011.11.

Step 2: move the binary point five places to the left so that the mantissa is normalised (giving 0.1101111).

Step 3: the binary point was moved five places left, so increase the exponent by 5. (The exponent is currently 0, so 0 + 5 = 5.) This is 0101 in two's complement binary.

Step 4: join together the new mantissa and exponent in the format given in the question. In this case, the normalised value is 011011110101.

Guided Practice

1. State whether each of these floating point binary numbers is normalised or not:

a) 1011011 - Answer: Yes, as this begins with 10 or 01.

b) 00110010 - Answer: No, as this does not begin with 10 or 01.

c) 10100100 - Answer: Yes, as this begins with 10 or 01.

2. The floating point binary number 0011110011 has a 7-bit mantissa followed by a 3-bit exponent, both in two's complement. Find the normalised version of this number.

Step 1: split the binary number into the mantissa (0.011110) and the exponent (011). Put the binary point between the first two digits of the mantissa to give 0.011110.

Step 2: move the binary point in the mantissa right until it starts as 0.1, trimming off any leading zeros. This will give 0.11110 having moved the binary point one place.

Step 3: add additional zeros to the right-hand side to keep the mantissa as 7 bits in length without affecting the value. This gives 0.1111000.

Step 4: decrease the exponent by the number of places moved right in Step 2. Subtracting 1 from 011 gives 010.

Step 5: join together the new mantissa and exponent in the format given in the question. The answer is therefore 0111100010.

3. The floating point binary number 111010010001 has an 8-bit mantissa followed by a 4-bit exponent, both in two's complement. Find the normalised version of this number.

Step 1: split the binary number into the mantissa (1.1101001) and the exponent (0001). Put the binary point between the first two digits of the mantissa to give 1.1101001.

Step 2: as the mantissa starts with a 1, it is a negative number. Move the binary point in the mantissa right until it starts as 1.0, trimming off any leading 1s. This gives 1.01001, having moved the binary point two places.

Step 3: add additional zeros to the right-hand side to keep the mantissa as 8 bits in length without affecting the value. This gives 1.0100100.

Step 4: decrease the exponent by the number of places moved right in Step 2. Subtracting 2 from 1 gives –1, which is 1111 in 4-bit two's complement binary.

Step 5: join together the new mantissa and exponent in the format given in the question. The answer is therefore 101001001111.

Independent Practice

1. State whether each of these floating point binary numbers is normalised or not:

001101101
1101011010
0110110101

2. Give the normalised versions of each of the following floating point binary numbers. Both the mantissa and exponent are in two's complement.

00101000100 (8 bits mantissa, 3 bits exponent)
000010110111 (8 bits mantissa, 4 bits exponent)
1101000011 (6 bits mantissa, 4 bits exponent)
1010010 (4 bits mantissa, 3 bits exponent)
001101011001 (8 bits mantissa, 4 bits exponent)

3. Give the normalised floating point binary version of the following denary numbers, using two's complement, 8 bits for the mantissa and 4 bits for the exponent.

11.375
-6.875

Answers

1

a 0010110101 – No, as this does not begin with 10 or 01.
b 1101011010 – No, as this does not begin with 10 or 01.
c 0110110101 – Yes, as this begins with 10 or 01.

2

Step 1: mantissa is 0.0101000, exponent is 100 (4 in 3-bit two's complement = -4).

Step 2: move binary point to give mantissa of 0.101000 (moved one place right).

Step 3: add additional zeros to keep the mantissa as the right length. 01010000

Step 4: decrease the exponent by 1. 011

Step 5: normalised answer = 01010000011

Step 1: mantissa is 0.0001011, exponent is 0111 (7).

Step 2: move binary point to give mantissa of 0.1011 (moved three places right).

Step 3: add additional zeros to keep the mantissa as the right length. 01011000.

Step 4: decrease the exponent by 3. 0100

Step 5: normalised answer = 010110000100.

Step 1: mantissa is 1.10100, exponent is 0011 (3).

Step 2: move binary point to give mantissa of 1.0100 (moved one place right).

Step 3: add additional zeros to keep the mantissa as the right length. 101000

Step 4: decrease the exponent by 1. 0010

Step 5: normalised answer = 1010000010.

This is already normalised, so no need to do any more calculations. Normalised answer = 1010010

Step 1: mantissa is 0.0110101, exponent is 1001 (–7).

Step 2: move binary point to give mantissa of 0.110101 (moved one place right).

Step 3: add additional zeros to keep the mantissa as the right length. 01101010

Step 4: decrease the exponent by 1. 1000 (note the exponent is negative here)

Step 5: normalised answer = 011010101000.

3

Step 1: write down the fixed point version of 11.375. 01011.011

Step 2: normalise this by moving the binary point four places left. This gives 0.1011011.

Step 3: as the binary point has been moved four places left, the exponent is therefore 4 (0100).

Step 4: join the mantissa and exponent together. The full answer is therefore 010110110100.

Step 1: write down the fixed point version of –6.875. 1001.001

Step 2: normalise this by moving the binary point three places left. This gives 1.001001.

Step 3: as the binary point has been moved three places left, the exponent is therefore 3 (0011).

Step 4: the mantissa is only 7 bits at the moment. Add a 0 to the right-hand side to make it 8 bits without changing the value.

Step 5: join the mantissa and exponent together. The full answer is therefore 100100100011.

Range and Precision: The Digital Give and Take!

So, we've seen that floating point binary is a clever way to represent a vast range of numbers. But because of that constant juggling act between range and precision, the digital version of a real number isn't always a perfect copy of the real thing. Sometimes, we have to make do with a close approximation, and that's where errors sneak in like digital gremlins.

In the world of floating point binary, if you decide to use more bits for the exponent, you can represent a much wider range of numbers – both incredibly large and incredibly small. The binary point can effectively 'float' over a much greater distance. However, using more bits for the exponent leaves fewer bits available for the mantissa, which means you have fewer significant digits to represent the actual value of the number, thus reducing the precision. You might be able to store a number as big as the national debt, but not necessarily down to the last digital penny.

On the flip side, if you allocate more bits to the mantissa, you can represent numbers with much greater precision – you can store more significant figures and have a more accurate representation. However, this comes at the cost of having fewer bits for the exponent, which limits the overall range of numbers you can represent. You might be able to store the width of a human hair with incredible accuracy, but you might struggle to represent the distance to the nearest star.

So, it's a constant balancing act, this range versus precision malarkey. The designers of computer systems have to decide how to allocate those precious bits based on the types of calculations the computer will be doing. It's a bit like deciding whether you need a really long measuring tape that isn't very finely marked, or a shorter one with very precise markings. You can't usually have both to the maximum!

Worked examples

a) Calculate the largest number that can be stored using an 8-bit binary number if the number uses 3 bits for the mantissa and 5 bits for the exponent.

Step 1: with 3 bits for the mantissa and 5 bits for the exponent, then the largest number that can be represented is 01101111.

Step 2: the mantissa is therefore 0.11, with the binary point needing to be moved 15 places to the right (the exponent 01111 = 15).

Step 3: convert to denary. 011 followed by 15 zeros is 11000000000000000. 16384 + 8192 = 24576. This means that the largest denary number that can be stored in this format is 24576.

b) Calculate the largest number that can be stored using an 8-bit binary number if the number uses 5 bits for the mantissa and 3 bits for the exponent.

Step 1: with 5 bits for the mantissa and 3 bits for the exponent, the largest number that can be represented is 01111011.

Step 2: the mantissa is therefore 0.1111, with the binary point needing to be moved three places to the right (011 = 3). This means that the largest denary number that can be stored in this format is 7.5.

Step 3: convert to denary. 0111.1 = 4 + 2 + 1 + 0.5 = 7.5.

Guided Practice

1. Calculate the largest number that can be stored using a 12-bit binary number if the number uses 8 bits for the mantissa and 4 bits for the exponent.

Step 1: write down the maximum value that can be stored in the mantissa and the exponent. Because of two's complement, this will be entirely 1s, except for the first digit of each, which will be 0. Therefore the maximum value will be 011111110111.

Step 2: insert the binary point into the mantissa, making it 0.1111111. This is then moved right 7 places, making the value 01111111.

Step 3: in denary, this is 127.

Answer: 127

2. Calculate the largest number that can be stored using a 12-bit binary number if the number uses 7 bits for the mantissa and 5 bits for the exponent.

Step 1: the maximum that can be stored in this case is 011111101111.

Step 2: insert the binary point into the mantissa, making it 0.111111. This is then moved right 15 places, making the value 0111111000000000.

Step 3: in denary, this is 32256.

Answer: 32256

Independent Practice

Calculate the largest number that can be stored if the number uses:

9 bits, 6 bits for the mantissa and 3 bits for the exponent.
9 bits, 5 bits for the mantissa and 4 bits for the exponent.
6 bits, 3 bits for the mantissa and 3 bits for the exponent.
6 bits, 4 bits for the mantissa and 2 bits for the exponent.
6 bits, 2 bits for the mantissa and 4 bits for the exponent.

Answers

a

Step 1: maximum values 0.11111 (mantissa) and 011 (exponent).

Step 2: move the binary point right three places, giving 0111.11.

Step 3: this is 7.75 in denary.

b

Step 1: maximum values 0.1111 (mantissa) and 0111 (exponent).

Step 2: move the binary point right seven places, giving 01111000.

Step 3: this is 120 in denary.

c

Step 1: maximum values 0.11 (mantissa) and 011 (exponent).

Step 2: move the binary point right three places, giving 0110.

Step 3: this is 6 in denary.

d

Step 1: maximum values 0.111 (mantissa) and 01 (exponent).

Step 2: move the binary point right one place, giving 01.11.

Step 3: this is 1.75 in denary.

e

Step 1: maximum values = 0.1 (mantissa) and 0111 (exponent).

Step 2: move the binary point right seven places, giving 01000000.

Step 3: this is 64 in denary.

Errors: When Digital Reality Doesn't Quite Match Up!

Take their example: the number 76.65625. To store this with a decent amount of accuracy, you might need a certain number of bits for the mantissa (they suggest 13 bits) and some for the exponent (4 bits in their example). But what if the computer only gives you a smaller number of bits for the mantissa, say only 10? Well, you'd have to chop off some of the less significant bits, and suddenly, your stored value might be 76.625. It's close, but it's not quite the same as the original. That little difference? That's an error.

There are a couple of ways we can describe these digital discrepancies. One is the absolute error. This is simply the difference between the original, true value and the value that's actually stored in the computer. You just subtract one from the other to find out how far off the digital representation is.

The other way to think about it is the relative error. This gives you a sense of how significant the error is compared to the size of the original number. To calculate this, you take the absolute error and express it as a percentage of the original, true value. So, a small absolute error might be a big relative error if the original number was very small, and vice versa. It's all about perspective, really.

These errors are an inherent part of using floating point binary to represent real numbers. Because we're using a finite number of bits to try and capture the infinite possibilities of real numbers (including those with never-ending decimal expansions!), we often have to make approximations. It's a bit like trying to measure a wiggly coastline with a straight ruler – you're bound to have some inaccuracies. Understanding these potential errors is crucial for anyone doing serious number-crunching with computers, especially in scientific and engineering fields. You need to know how much you can trust your digital results!