
COA Unit - II Notes

computer organization and architecture unit 2

Uploaded by eruvaram12
Copyright © All Rights Reserved

COMPUTER ORGANIZATION AND ARCHITECTURE

Unit-II
Data Representation: Signed number representation, fixed and floating point representations, Character
representation.
Computer Arithmetic: Integer addition and subtraction, Multiplication – shift and add, Booth multiplication,
Signed operand multiplication, Division, Floating point arithmetic.
-----------------------------------------------------------------------------------------------------------------------------------

Data Representation:
Number Systems
Human beings use the decimal (base 10) number system, with the ten digits 0 to 9, probably because we have 10 fingers.
Computers use the binary (base 2) number system, as they are made from binary digital components (known as
transistors) operating in two states, on and off.
In computing, we also use hexadecimal (base 16) or octal (base 8) number systems, as a compact form for
representing binary numbers.
Decimal (Base 10) Number System
Decimal number system has ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, called digits.
It uses positional notation. That is, the least-significant digit (right-most digit) is of the order of 10^0 (units or
ones), the second right-most digit is of the order of 10^1 (tens), the third right-most digit is of the order
of 10^2 (hundreds), and so on, where ^ denotes exponent. For example,
735 = 700 + 30 + 5 = 7×10^2 + 3×10^1 + 5×10^0
Binary (Base 2) Number System
Binary number system has two symbols: 0 and 1, called bits. It is also a positional notation, for example,
10110B = 10000 + 0000 + 100 + 10 + 0 = 1×2^4 + 0×2^3 + 1×2^2 + 1×2^1 + 0×2^0
Eight bits make a byte (why an 8-bit unit? Probably because 8 = 2^3).
Hexadecimal (Base 16) Number System
Hexadecimal number system uses 16 symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F, called hex digits.
It is a positional notation, for example,
A3EH = A00H + 30H + EH = 10×16^2 + 3×16^1 + 14×16^0
We shall denote a hexadecimal number (in short, hex) with a suffix H. Some programming languages denote
hex numbers with prefix 0x or 0X (e.g., 0x1A3C5F), or prefix x with hex digits quoted (e.g., x'C3A4D98B').
Most programming languages accept lowercase 'a' to 'f' as well as uppercase 'A' to 'F'.

Hexadecimal   Binary   Decimal
0             0000     0
1             0001     1
2             0010     2
3             0011     3
4             0100     4
5             0101     5
6             0110     6
7             0111     7
8             1000     8
9             1001     9
A             1010     10
B             1011     11
C             1100     12
D             1101     13
E             1110     14
F             1111     15
Conversion from Hexadecimal to Binary
Replace each hex digit by its 4 equivalent bits (as listed in the above table), for example,
A3C5H = 1010 0011 1100 0101B
102AH = 0001 0000 0010 1010B
Conversion from Binary to Hexadecimal
Starting from the right-most bit (least-significant bit), replace each group of 4 bits by the equivalent hex digit
(pad the left-most group with zeros if necessary), for example,
1001001010B = 0010 0100 1010B = 24AH
10001011001011B = 0010 0010 1100 1011B = 22CBH
It is important to note that hexadecimal numbers provide a compact form, or shorthand, for representing binary
bits.
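Both conversions above are purely mechanical substitutions of 4-bit groups. A minimal Python sketch, assuming the inputs contain only valid hex or binary digits (the names `hex_to_bin` and `bin_to_hex` are illustrative, not from these notes):

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_bin(hex_str: str) -> str:
    """Replace each hex digit by its 4-bit binary equivalent (as in the table)."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_str.upper())

def bin_to_hex(bin_str: str) -> str:
    """Group bits in fours from the right, padding the left-most group with 0s,
    and replace each group by its hex digit."""
    bits = bin_str.replace(" ", "")
    bits = "0" * (-len(bits) % 4) + bits          # pad to a multiple of 4 bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(g, 2)] for g in groups)

print(hex_to_bin("A3C5"))
print(bin_to_hex("10001011001011"))
```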
Conversion from Base r to Decimal (Base 10)
Given an n-digit base r number d_{n-1} d_{n-2} d_{n-3} ... d_2 d_1 d_0 (base r), the decimal equivalent is given by:
d_{n-1}×r^{n-1} + d_{n-2}×r^{n-2} + ... + d_1×r^1 + d_0×r^0
For example,
A1C2H = 10×16^3 + 1×16^2 + 12×16^1 + 2 = 41410 (base 10)
10110B = 1×2^4 + 1×2^2 + 1×2^1 = 22 (base 10)
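The positional formula above can be evaluated digit by digit from the left using Horner's rule, which is algebraically the same sum written as ((d_{n-1}×r + d_{n-2})×r + ...)×r + d_0. A minimal sketch, assuming bases 2 to 36 and the digit alphabet 0-9, A-Z (`to_decimal` is an illustrative name):

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_decimal(number: str, r: int) -> int:
    """Value of the base-r digit string d_{n-1} ... d_1 d_0 as a Python int."""
    value = 0
    for ch in number.upper():
        d = DIGITS.index(ch)
        if d >= r:
            raise ValueError(f"digit {ch!r} is not valid in base {r}")
        # Horner's rule: multiply what we have so far by r, then add the next digit
        value = value * r + d
    return value

print(to_decimal("A1C2", 16))
print(to_decimal("10110", 2))
```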
Conversion from Decimal (Base 10) to Base r
Use repeated division/remainder.
For example,
To convert 261(base 10) to hexadecimal:
261/16 => quotient=16 remainder=5
16/16 => quotient=1 remainder=0
1/16 => quotient=0 remainder=1 (quotient=0 stop)
Hence, 261D = 105H (collect the hex digits from the remainders in reverse order).
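The repeated division/remainder procedure can be sketched as follows, assuming a non-negative integer input and bases 2 to 36 (`from_decimal` is an illustrative name):

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def from_decimal(value: int, r: int) -> str:
    """Convert a non-negative integer to base r by repeated division."""
    if value == 0:
        return "0"
    remainders = []
    while value > 0:                       # stop when the quotient becomes 0
        value, rem = divmod(value, r)      # quotient, remainder
        remainders.append(DIGITS[rem])
    return "".join(reversed(remainders))   # remainders are read in reverse order

print(from_decimal(261, 16))
```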
Conversion between Two Number Systems with Fractional Part
Separate the integer and the fractional parts.
For the integer part, divide by the target radix repeatedly, and collect the remainders in reverse order.
For the fractional part, multiply the fractional part by the target radix repeatedly, and collect the integral parts in
the order they are produced.

Example 1: Decimal to Binary


Convert 18.6875D to binary
Integer Part = 18D
18/2 => quotient=9 remainder=0
9/2 => quotient=4 remainder=1
4/2 => quotient=2 remainder=0
2/2 => quotient=1 remainder=0
1/2 => quotient=0 remainder=1 (quotient=0 stop)
Hence, 18D = 10010B
Fractional Part = .6875D
.6875*2=1.375 => whole number is 1
.375*2=0.75 => whole number is 0
.75*2=1.5 => whole number is 1
.5*2=1.0 => whole number is 1
Hence .6875D = .1011B
Combine, 18.6875D = 10010.1011B
Example 2: Decimal to Hexadecimal
Convert 18.6875D to hexadecimal
Integer Part = 18D
18/16 => quotient=1 remainder=2
1/16 => quotient=0 remainder=1 (quotient=0 stop)
Hence, 18D = 12H
Fractional Part = .6875D
.6875*16=11.0 => whole number is 11D (BH)
Hence .6875D = .BH
Combine, 18.6875D = 12.BH
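The two examples above combine repeated division (integer part) with repeated multiplication (fractional part). A minimal sketch, assuming a non-negative input and a radix of at most 16; it uses Python's exact `fractions.Fraction` so that no floating-point rounding creeps in, and it stops after `max_frac_digits` digits because fractions such as 0.1D never terminate in binary (`convert_real` and `max_frac_digits` are illustrative names):

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def convert_real(value, r: int, max_frac_digits: int = 12) -> str:
    """Convert a non-negative rational number to base r (r <= 16)."""
    value = Fraction(value)
    integer = value.numerator // value.denominator
    frac = value - integer
    # Integer part: repeated division, remainders collected in reverse order.
    int_digits = []
    while True:
        integer, rem = divmod(integer, r)
        int_digits.append(DIGITS[rem])
        if integer == 0:
            break
    result = "".join(reversed(int_digits))
    # Fractional part: repeated multiplication, integral parts collected in order.
    frac_digits = []
    while frac != 0 and len(frac_digits) < max_frac_digits:
        frac *= r
        digit = frac.numerator // frac.denominator
        frac_digits.append(DIGITS[digit])
        frac -= digit
    if frac_digits:
        result += "." + "".join(frac_digits)
    return result

print(convert_real(Fraction("18.6875"), 2))
print(convert_real(Fraction("18.6875"), 16))
```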
Integer Representation (Fixed-Point Representation)
Integers are whole numbers, or fixed-point numbers with the radix point fixed after the least-significant bit. They
contrast with real numbers, or floating-point numbers, where the position of the radix point varies.
Integers and floating-point numbers have different representations and are processed differently (e.g., floating-point
numbers are processed in a so-called floating-point processor).
Computers use a fixed number of bits to represent an integer. The commonly-used bit-lengths for integers are 8-bit,
16-bit, 32-bit or 64-bit. There are two types of integer representations:
Unsigned Integers: can represent zero and positive integers.
Signed Integers: can represent zero, positive and negative integers. Three representation schemes have been
proposed for signed integers:
Sign-Magnitude representation
1's Complement representation
2's Complement representation
n-bit Unsigned Integers
Unsigned integers can represent zero and positive integers, but not negative integers. The value of an unsigned
integer is interpreted as "the magnitude of its underlying binary pattern".
Example 1: Suppose that n=8 and the binary pattern is 0100 0001B, the value of this unsigned integer is 1×2^0
+ 1×2^6 = 65D.
Example 2: Suppose that n=16 and the binary pattern is 0001 0000 0000 1000B, the value of this unsigned
integer is 1×2^3 + 1×2^12 = 4104D.
Example 3: Suppose that n=16 and the binary pattern is 0000 0000 0000 0000B, the value of this unsigned
integer is 0.
Signed Integers
Signed integers can represent zero, positive integers, as well as negative integers. Three representation schemes
are available for signed integers:
Sign-Magnitude representation
1's Complement representation
2's Complement representation
In all three schemes, the most-significant bit (msb) is called the sign bit. The sign bit represents the sign of the
integer: 0 for positive integers and 1 for negative integers. The magnitude of the integer, however, is interpreted
differently in each scheme.
In sign-magnitude representation:
The most-significant bit (msb) is the sign bit, with value of 0 representing positive integer and 1 representing
negative integer.
The remaining n-1 bits represent the magnitude (absolute value) of the integer. The absolute value of the
integer is interpreted as "the magnitude of the (n-1)-bit binary pattern".

Example 1: Suppose that n=8 and the binary representation is 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D

Example 2: Suppose that n=8 and the binary representation is 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is 000 0001B = 1D
Hence, the integer is -1D

Example 3: Suppose that n=8 and the binary representation is 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D

Example 4: Suppose that n=8 and the binary representation is 1 000 0000B.
Sign bit is 1 ⇒ negative
Absolute value is 000 0000B = 0D
Hence, the integer is -0D
The drawbacks of sign-magnitude representation are:

There are two representations (0000 0000B and 1000 0000B) for the number zero, which could lead to
inefficiency and confusion.
Positive and negative integers need to be processed separately.
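The sign-magnitude rule can be sketched in a few lines, assuming the pattern is given as a string of 0s and 1s (spaces allowed; `sign_magnitude_value` is an illustrative name). Python integers have no separate -0, so both zero patterns come out as 0, which is exactly the redundancy listed among the drawbacks:

```python
def sign_magnitude_value(bits: str) -> int:
    """Interpret an n-bit sign-magnitude pattern such as '1 000 0001'."""
    bits = bits.replace(" ", "")
    sign, magnitude = bits[0], int(bits[1:], 2)   # msb = sign, n-1 bits = magnitude
    return -magnitude if sign == "1" else magnitude

for pattern in ["0 100 0001", "1 000 0001", "0 000 0000", "1 000 0000"]:
    print(pattern, "->", sign_magnitude_value(pattern))
```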
n-bit Signed Integers in 1's Complement Representation
In 1's complement representation:
Again, the most significant bit (msb) is the sign bit, with value of 0 representing positive integers and 1
representing negative integers.
The remaining n-1 bits represent the magnitude of the integer, as follows:
for positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern".
for negative integers, the absolute value of the integer is equal to "the magnitude of the complement (inverse) of
the (n-1)-bit binary pattern" (hence called 1's complement).

• Example 1: Suppose that n=8 and the binary representation is 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D
• Example 2: Suppose that n=8 and the binary representation is 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B, i.e., 111 1110B = 126D
Hence, the integer is -126D
• Example 3: Suppose that n=8 and the binary representation is 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D
• Example 4: Suppose that n=8 and the binary representation is 1 111 1111B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 111 1111B, i.e., 000 0000B = 0D
Hence, the integer is -0D
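The 1's complement rule differs from sign-magnitude only in the negative case, where the n-1 bits are inverted first. A minimal sketch under the same string-input assumption (`ones_complement_value` is an illustrative name); again both zero patterns map to 0:

```python
def ones_complement_value(bits: str) -> int:
    """Interpret an n-bit 1's complement pattern such as '1 000 0001'."""
    bits = bits.replace(" ", "")
    sign, rest = bits[0], bits[1:]
    if sign == "0":
        return int(rest, 2)                       # positive: magnitude as is
    inverted = "".join("1" if b == "0" else "0" for b in rest)
    return -int(inverted, 2)                      # negative: complement of the n-1 bits

print(ones_complement_value("1 000 0001"))
```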

n-bit Signed Integers in 2's Complement Representation

• In 2's complement representation:
• Again, the most significant bit (msb) is the sign bit, with value of 0 representing positive integers and 1
representing negative integers.
• The remaining n-1 bits represent the magnitude of the integer, as follows:
• for positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit
binary pattern".
• for negative integers, the absolute value of the integer is equal to "the magnitude of the
complement of the (n-1)-bit binary pattern plus one" (hence called 2's complement).

• Example 1: Suppose that n=8 and the binary representation is 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D
• Example 2: Suppose that n=8 and the binary representation is 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B plus 1, i.e., 111 1110B + 1B = 111 1111B = 127D
Hence, the integer is -127D
• Example 3: Suppose that n=8 and the binary representation is 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D
• Example 4: Suppose that n=8 and the binary representation is 1 111 1111B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 111 1111B plus 1, i.e., 000 0000B + 1B = 000 0001B = 1D
Hence, the integer is -1D
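The 2's complement rule can be sketched the same way. A second function shows an equivalent view that is often easier to compute: the msb carries weight -2^(n-1) and the other bits their usual positive weights, so the value is the unsigned value minus 2^n when the sign bit is 1. Both names are illustrative. Note that 1 000 0000B (not among the examples above) decodes to -128, and there is only one pattern for zero:

```python
def twos_complement_value(bits: str) -> int:
    """Interpret an n-bit 2's complement pattern such as '1 000 0001'."""
    bits = bits.replace(" ", "")
    sign, rest = bits[0], bits[1:]
    if sign == "0":
        return int(rest, 2)
    inverted = "".join("1" if b == "0" else "0" for b in rest)
    return -(int(inverted, 2) + 1)               # complement plus one, then negate

def twos_complement_by_weight(bits: str) -> int:
    """Equivalent view: msb has weight -2^(n-1), remaining bits their usual weights."""
    bits = bits.replace(" ", "")
    n = len(bits)
    return int(bits, 2) - (1 << n) * int(bits[0])

print(twos_complement_value("1 000 0001"), twos_complement_by_weight("1 000 0001"))
```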

Floating-Point Number Representation

• A floating-point number (or real number) can represent a very large (1.23×10^88) or a very small
(1.23×10^-88) value.
• It can also represent a very large negative number (-1.23×10^88) and a very small negative number
(-1.23×10^-88), as well as zero.
A floating-point number is typically expressed in the scientific notation, with a fraction (F), and an exponent (E) of a
certain radix (r), in the form of F×r^E.

Decimal numbers use radix of 10 (F×10^E); while binary numbers use radix of 2 (F×2^E).
For example, the number 55.66 can be represented as 5.566×10^1, 0.5566×10^2, 0.05566×10^3, and so on.
The fractional part can be normalized. In the normalized form, there is only a single non-zero digit before the
radix point.
For example, decimal number 123.4567 can be normalized as 1.234567×10^2;
binary number 1010.1011B can be normalized as 1.0101011B×2^3.
• IEEE-754 32-bit Single-Precision Floating-Point Numbers
• In 32-bit single-precision floating-point representation:
• The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
• The following 8 bits represent exponent (E).
• The remaining 23 bits represent the fraction (F).

Normalized Form

• Let's illustrate with an example, suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000
0000 0000, with:
• S=1
• E = 1000 0001
• F = 011 0000 0000 0000 0000 0000
• In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form of 1.F. In
this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.
• The sign bit represents the sign of the number, with S=0 for positive and S=1 for negative number. In
this example with S=1, this is a negative number, i.e., -1.375D.
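• In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-127). This is because we
need to represent both positive and negative exponents.
• With an 8-bit E, ranging from 0 to 255, the excess-127 scheme could in principle provide actual exponents of -127 to
128. In IEEE-754, however, E = 0 and E = 255 are reserved for special values (zero and denormalized numbers, and
infinity and NaN), so normalized numbers use actual exponents -126 to 127. In this example, E-127=129-127=2D.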
• Hence, the number represented is -1.375×2^2=-5.5D.
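The decoding steps of the worked example (sign, implicit leading 1, excess-127 exponent) can be sketched directly. The sketch assumes a normalized pattern (1 <= E <= 254) and does not handle zero, denormals, infinity or NaN; `decode_single` and `decode_with_struct` are illustrative names. The second function cross-checks the result by letting Python's standard `struct` module reinterpret the same 32 bits as a float:

```python
import struct

def decode_single(bits: str) -> float:
    """Decode a normalized IEEE-754 single-precision pattern 'S EEEEEEEE F...F'."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    s = int(bits[0], 2)                # sign bit
    e = int(bits[1:9], 2)              # 8-bit biased exponent
    f = int(bits[9:], 2)               # 23-bit fraction
    significand = 1 + f / 2**23        # implicit leading 1: 1.F
    return (-1) ** s * significand * 2.0 ** (e - 127)   # excess-127 exponent

def decode_with_struct(bits: str) -> float:
    """Cross-check: reinterpret the same 32 bits as a C float."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]

pattern = "1 1000 0001 011 0000 0000 0000 0000 0000"
print(decode_single(pattern), decode_with_struct(pattern))
```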
• IEEE-754 64-bit Double-Precision Floating-Point Numbers

• The representation scheme for 64-bit double-precision is similar to the 32-bit single-precision:
• The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
• The following 11 bits represent the exponent (E).
• The remaining 52 bits represent the fraction (F).
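The 1 + 11 + 52 bit layout can be inspected with Python's standard `struct` module. A minimal sketch (`double_fields` is an illustrative name); it additionally relies on the IEEE-754 fact, not stated in these notes, that the double-precision exponent bias is 1023 rather than 127:

```python
import struct

def double_fields(x: float):
    """Return (S, E, F) of a 64-bit double: 1 sign bit, 11 exponent bits, 52 fraction bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    return s, e, f

s, e, f = double_fields(-5.5)
print(s, format(e, "011b"), e - 1023)   # sign, exponent bits, actual exponent (bias 1023)
```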

Character Representation:

In computer memory, characters are "encoded" (or "represented") using a chosen "character encoding scheme"
(aka "character set", "charset", "character map", or "code page").

For example, in ASCII (as well as Latin1, Unicode, and many other character sets):

• code numbers 65D (41H) to 90D (5AH) represent 'A' to 'Z', respectively.
• code numbers 97D (61H) to 122D (7AH) represent 'a' to 'z', respectively.
• code numbers 48D (30H) to 57D (39H) represent '0' to '9', respectively.

It is important to note that the representation scheme must be known before a binary pattern can be interpreted.
E.g., the 8-bit pattern "0100 0010B" could represent anything under the sun, known only to the person who encoded it.

The most commonly-used character encoding schemes are: 7-bit ASCII (ISO/IEC 646) and 8-bit Latin-x
(ISO/IEC 8859-x) for Western European characters, and Unicode (ISO/IEC 10646) for internationalization
(i18n).

A 7-bit encoding scheme (such as ASCII) can represent 128 characters and symbols. An 8-bit character
encoding scheme (such as Latin-x) can represent 256 characters and symbols, whereas a 16-bit encoding
scheme (such as Unicode UCS-2) can represent 65,536 characters and symbols.

5.1 7-bit ASCII Code (aka US-ASCII, ISO/IEC 646, ITU-T T.50)

• ASCII (American Standard Code for Information Interchange) is one of the earlier character coding
schemes.
• ASCII was originally a 7-bit code. It has been extended to 8 bits to better utilize the 8-bit computer
memory organization. (The 8th bit was originally used for parity checking in early computers.)
• Code numbers 32D (20H) to 126D (7EH) are printable (displayable) characters, tabulated below by hexadecimal
code (row = high hex digit, column = low hex digit):
Hex 0 1 2 3 4 5 6 7 8 9 A B C D E F
2 SP ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ \ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ~
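The table above, and the point that a bit pattern means nothing until the encoding scheme is known, can both be reproduced with Python built-ins. A minimal sketch (`ascii_row` is an illustrative name); it reads the pattern 0100 0010B once as an unsigned integer and once as an ASCII character:

```python
def ascii_row(high: int) -> str:
    """Printable ASCII characters whose code is high*16 + low, for low = 0..F."""
    chars = []
    for low in range(16):
        code = high * 16 + low
        if 32 <= code <= 126:                 # printable range 20H..7EH
            chars.append("SP" if code == 32 else chr(code))
    return " ".join(chars)

for high in range(2, 8):
    print(high, ascii_row(high))

# The same 8-bit pattern 0100 0010B under two different interpretations:
pattern = 0b0100_0010
print(pattern)                           # as an unsigned integer
print(bytes([pattern]).decode("ascii"))  # as an ASCII character
```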

Integer Addition and subtraction:

Addition:

The binary number system uses only two digits, 0 and 1, which makes binary addition simple. There are four
basic rules for binary addition, as listed below.

0+0=0
0+1=1
1+0=1
1+1=10
The first three rules are the same as in decimal addition; the fourth, 1+1 = 10 (binary 2), gives a sum bit of 0
with a carry of 1 into the next column. Binary numbers are added column by column, from right to left. Let us
consider the addition of 11101 and 11011.
The sum is carried out by the following steps, starting from the right-most column:
1 + 1 = 10 = 0 with a carry of 1
1 (carry) + 0 + 1 = 10 = 0 with a carry of 1
1 (carry) + 1 + 0 = 10 = 0 with a carry of 1
1 (carry) + 1 + 1 = 11 = 1 with a carry of 1
1 (carry) + 1 + 1 = 11
Hence, 11101 + 11011 = 111000 (29 + 27 = 56 in decimal).
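The column-by-column procedure can be sketched as a loop over bit pairs from the right, carrying into the next column. A minimal sketch on bit strings (`add_binary` is an illustrative name):

```python
def add_binary(a: str, b: str) -> str:
    """Column-by-column binary addition, right to left, propagating the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry   # 0, 1, 2 or 3
        result.append(str(total % 2))             # sum bit for this column
        carry = total // 2                        # carry into the next column
    if carry:
        result.append("1")                        # final carry becomes the new msb
    return "".join(reversed(result))

print(add_binary("11101", "11011"))
```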

[Figures: rules for binary addition; example of binary addition; binary subtraction rules]
Binary Subtraction Using 1’s Complement
A sign bit of 0 represents a positive number.
A sign bit of 1 represents a negative number.
Procedures for Binary Subtraction by 1’s Complement
Write the 1’s complement of the subtrahend.
Then add the 1’s complement of the subtrahend to the minuend.
If the result has a carry out of the most significant bit, add that carry to the least significant bit (end-around carry); the result is positive.
If there is no carry out, take the 1’s complement of the result; the answer is negative, with that magnitude.
Binary Subtraction Questions Using 1’s Complement
Question 1:
110101B – 100101B
Solution:
110101B = 53D (minuend)
100101B = 37D (subtrahend)
Now take the 1’s complement of the subtrahend (011010) and add it to the minuend.
      110101
(+)   011010
------------
    1 001111    (carry out of the MSB = 1)
(+)        1    (end-around carry)
------------
      010000
Therefore, the solution is 010000B = 16D (53 – 37 = 16).
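The four-step procedure above can be sketched on equal-width bit strings. This is a minimal sketch, assuming both operands have the same width n (`subtract_ones_complement` is an illustrative name). The carry out of the msb shows up as a sum of at least 2^n; subtracting 2^n and adding 1 is the end-around carry. When the operands are equal the procedure yields the 1's complement negative zero, reported here as "-000000":

```python
def subtract_ones_complement(minuend: str, subtrahend: str) -> str:
    """minuend - subtrahend using 1's complement; a leading '-' marks a negative result."""
    n = len(minuend)
    complement = "".join("1" if b == "0" else "0" for b in subtrahend)
    total = int(minuend, 2) + int(complement, 2)
    if total >= 2 ** n:                  # carry out of the msb
        total = total - 2 ** n + 1       # end-around carry into the lsb
        return format(total, f"0{n}b")
    # No carry: the result is negative; its magnitude is the 1's complement of the sum.
    magnitude = (2 ** n - 1) - total
    return "-" + format(magnitude, f"0{n}b")

print(subtract_ones_complement("110101", "100101"))
print(subtract_ones_complement("100101", "110101"))
```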

Unsigned multiplication:

Hardware Implementation :
Following components are required for the Hardware Implementation of multiplication algorithm :
1. Registers:
Two Registers B and Q are used to store multiplicand and multiplier respectively.
Register A is used to store partial product during multiplication.
Sequence Counter register (SC) is used to store number of bits in the multiplier.
2. Flip Flop:
To store sign bit of registers we require three flip flops (A sign, B sign and Q sign).
Flip flop E is used to store carry bit generated during partial product addition.
3. Complement and Parallel adder:
This hardware unit is used in calculating the partial product, i.e., it performs the required addition.

Flowchart of Multiplication:
1. Initially multiplicand is stored in B register and multiplier is stored in Q register.
2. The signs of registers B (Bs) and Q (Qs) are compared using XOR (if both signs are alike, the output of the XOR
operation is 0, otherwise 1) and the output is stored in As (sign of register A).

Note: Initially 0 is assigned to register A and E flip flop. Sequence counter is initialized with value n, n
is the number of bits in the Multiplier.

3. Now least significant bit of multiplier is checked. If it is 1 add the content of register A with
Multiplicand (register B) and result is assigned in A register with carry bit in flip flop E. Content of E A
Q is shifted to right by one position, i.e., content of E is shifted to most significant bit (MSB) of A and
least significant bit of A is shifted to most significant bit of Q.
4. If Qn = 0, only shift right operation on content of E A Q is performed in a similar fashion.
5. Content of Sequence counter is decremented by 1.
6. Check the content of Sequence counter (SC), if it is 0, end the process and the final product is present in
register A and Q, else repeat the process.

Example:
Multiplicand = 10111
Multiplier = 10011
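The flowchart can be simulated with integers standing in for registers B, Q, A, flip-flop E and counter SC. A minimal sketch, assuming E is cleared after it has been shifted into A (the notes do not say this explicitly); the function name and argument order are illustrative. For the example above, the product 23 × 19 = 437 ends up as A = 01101 and Q = 10101:

```python
def shift_add_multiply(bs: int, b: int, qs: int, q: int, n: int):
    """Sign-magnitude multiplication of n-bit magnitudes b (multiplicand) and q (multiplier).

    Returns (As, A, Q): the product sign and the two n-bit halves of the magnitude."""
    mask = (1 << n) - 1
    B, Q = b & mask, q & mask
    As = bs ^ qs                 # product sign: XOR of the operand signs
    A, E, SC = 0, 0, n           # partial product, carry flip-flop, sequence counter
    while SC > 0:
        if Q & 1:                # Qn = 1: A <- A + B, carry into E
            total = A + B
            E, A = total >> n, total & mask
        # Shift E A Q right by one position.
        Q = (Q >> 1) | ((A & 1) << (n - 1))   # lsb of A enters msb of Q
        A = (A >> 1) | (E << (n - 1))         # E enters msb of A
        E = 0
        SC -= 1
    return As, A, Q

As, A, Q = shift_add_multiply(0, 0b10111, 0, 0b10011, 5)
print(As, format(A, "05b"), format(Q, "05b"))
```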

Booth Multiplication algorithm for signed binary number:

Booth algorithm gives a procedure for multiplying binary integers in signed 2’s complement representation in
an efficient way, i.e., with fewer additions/subtractions. It relies on the fact that strings of 0’s in the
multiplier require no addition but just shifting, and a string of 1’s in the multiplier from bit weight 2^k down to weight
2^m can be treated as 2^(k+1) - 2^m.

Hardware Implementation of Booth's Algorithm: The hardware implementation of the Booth algorithm requires
the register configuration shown in the figure below.
Booth’s Hardware implementation:

Registers A, B and Q are renamed AC, BR and QR respectively. Qn designates the least significant bit of the
multiplier in register QR. An extra flip-flop Qn+1 is appended to QR to facilitate a double inspection of the
multiplier. The flowchart for the Booth algorithm is shown below.
AC and the appended bit Qn+1 are initially cleared to 0 and the sequence counter SC is set to a number n equal to the
number of bits in the multiplier. The two bits of the multiplier in Qn and Qn+1 are inspected. If the two bits are
equal to 10, it means that the first 1 in a string of 1's has been encountered. This requires subtraction of the
multiplicand from the partial product in AC. If the 2 bits are equal to 01, it means that the first 0 in a string of
0’s has been encountered. This requires the addition of the multiplicand to the partial product in AC.

When the two bits are equal, the partial product does not change. An overflow cannot occur because the
additions and subtractions of the multiplicand alternate. As a consequence, the 2 numbers that are added
always have opposite signs, a condition that excludes an overflow. The next step is to shift right the partial
product and the multiplier (including Qn+1). This is an arithmetic shift right (ashr) operation, which shifts AC and QR
to the right and leaves the sign bit in AC unchanged. The sequence counter is decremented and the
computational loop is repeated n times.

Example – A numerical example of booth’s algorithm is shown below for n = 4. It shows the step by step
multiplication of -5 and -7.

MD = -5 = 1011, MD'+1 = 0101
MR = -7 = 1001
The explanation of the first step is as follows:
AC = 0000, MR = 1001, Qn+1 = 0, SC = 4
Qn Qn+1 = 10
So, we do AC + (MD)'+1, which gives AC = 0101
On right shifting AC and MR, we get
AC = 0010, MR = 1100 and Qn+1 = 1
OPERATION     AC     MR     Qn+1   SC
Initial       0000   1001   0      4
AC + MD'+1    0101   1001   0
ASHR          0010   1100   1      3
AC + MD       1101   1100   1
ASHR          1110   1110   0      2
ASHR          1111   0111   0      1
AC + MD'+1    0100   0111   0
ASHR          0010   0011   1      0

Product is calculated as follows:

Product = AC MR
Product = 0010 0011 = 35
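The register-level steps above can be simulated with n-bit integers for AC and QR plus one extra bit for Qn+1. A minimal sketch; `booth_multiply` is an illustrative name, and the code assumes the multiplicand is not the most negative n-bit value -2^(n-1) (for n = 4, not -8), because MD'+1 of that value does not fit in n bits and the n-bit AC then overflows. Widening AC by one bit is the usual remedy, not shown here:

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth multiplication of two n-bit 2's complement integers (AC, QR, Qn+1, SC)."""
    mask = (1 << n) - 1
    BR = multiplicand & mask                 # MD as an n-bit pattern
    neg_BR = (-multiplicand) & mask          # MD' + 1
    AC, QR, Qn1, SC = 0, multiplier & mask, 0, n
    while SC > 0:
        pair = (QR & 1, Qn1)                 # inspect Qn and Qn+1
        if pair == (1, 0):                   # first 1 of a string: subtract MD
            AC = (AC + neg_BR) & mask
        elif pair == (0, 1):                 # first 0 after a string of 1s: add MD
            AC = (AC + BR) & mask
        # Arithmetic shift right of AC, QR, Qn+1 (sign bit of AC is kept).
        Qn1 = QR & 1
        QR = (QR >> 1) | ((AC & 1) << (n - 1))
        AC = (AC >> 1) | (AC & (1 << (n - 1)))
        SC -= 1
    product = (AC << n) | QR                 # 2n-bit result AC:QR
    if product & (1 << (2 * n - 1)):         # read it as 2's complement
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-5, -7, 4))
```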

Division algorithms:

Restoring division algorithms:


• A division algorithm produces a quotient and a remainder when we divide two numbers.
• There are generally two types of division algorithm: restoring and non-restoring.
Hardware implementation:

Flow chart and example:


Step-1: First the registers are initialized with corresponding values (Q = Dividend, M = Divisor, A = 0, n
= number of bits in dividend)
Step-2: Then the content of registers A and Q is shifted left as if they were a single unit
Step-3: Then the content of register M is subtracted from A and the result is stored in A
Step-4: Then the most significant bit of A is checked: if it is 0, the least significant bit of Q is set to 1;
otherwise, if it is 1, the least significant bit of Q is set to 0 and the value of register A is restored, i.e., set back
to the value of A before the subtraction of M
Step-5: The value of counter n is decremented
Step-6: If the value of n becomes zero we exit the loop, otherwise we repeat from step 2
Step-7: Finally, register Q contains the quotient and A contains the remainder

Examples:

Perform division using the restoring algorithm

Dividend = 11
Divisor = 3
n   M       A       Q      Operation
4   00011   00000   1011   initialize
    00011   00001   011_   shift left AQ
    00011   11110   011_   A=A-M
    00011   00001   0110   Q[0]=0 and restore A
3   00011   00010   110_   shift left AQ
    00011   11111   110_   A=A-M
    00011   00010   1100   Q[0]=0 and restore A
2   00011   00101   100_   shift left AQ
    00011   00010   100_   A=A-M
    00011   00010   1001   Q[0]=1
1   00011   00101   001_   shift left AQ
    00011   00010   001_   A=A-M
    00011   00010   0011   Q[0]=1

Remember to restore the value of A whenever the most significant bit of A is 1 after the subtraction. At the end,
register Q contains the quotient, 0011 = 3, and register A contains the remainder, 00010 = 2.
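The restoring steps can be simulated with integers for A, Q and M. A minimal sketch, assuming an unsigned n-bit dividend and a divisor with 0 < M < 2^n; A is kept n+1 bits wide so that its msb acts as the sign after A = A - M, matching the 5-bit A column in the table (`restoring_divide` is an illustrative name):

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Restoring division; returns (quotient, remainder)."""
    if not 0 < divisor < (1 << n):
        raise ValueError("divisor must satisfy 0 < M < 2^n")
    mask_a = (1 << (n + 1)) - 1              # A is n+1 bits wide
    mask_q = (1 << n) - 1
    A, Q, M = 0, dividend & mask_q, divisor
    for _ in range(n):                       # counter n, decremented each pass
        # Step 2: shift A and Q left as a single unit.
        A = ((A << 1) | (Q >> (n - 1))) & mask_a
        Q = (Q << 1) & mask_q
        # Step 3: A = A - M.
        A = (A - M) & mask_a
        # Step 4: msb of A is 1 -> Q[0] = 0 and restore A; otherwise Q[0] = 1.
        if A >> n:
            A = (A + M) & mask_a
        else:
            Q |= 1
    return Q, A

print(restoring_divide(11, 3, 4))
```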

Non Restoring division algorithms:


Hardware Implementation :

Flow chart:
Example:

Explanation with example:


Step-1: First the registers are initialized with corresponding values (Q = Dividend, M = Divisor, A = 0, n =
number of bits in dividend)
Step-2: Check the sign bit of register A
Step-3: If it is 1 shift left content of AQ and perform A = A+M, otherwise shift left AQ and perform A = A-M
(means add 2’s complement of M to A and store it to A)
Step-4: Check the sign bit of register A again
Step-5: If the sign bit is 1, Q[0] becomes 0, otherwise Q[0] becomes 1 (Q[0] means the least significant bit of register Q)
Step-6: Decrement the value of N by 1
Step-7: If N is not equal to zero go to Step 2, otherwise go to the next step
Step-8: If the sign bit of A is 1, then perform A = A+M
Step-9: Register Q contains the quotient and A contains the remainder

Examples: Perform Non-Restoring Division for Unsigned Integers

Dividend = 11
Divisor = 3
-M = 11101
N   M       A       Q      Action
4   00011   00000   1011   Start
            00001   011_   Left shift AQ
            11110   011_   A=A-M
3           11110   0110   Q[0]=0
            11100   110_   Left shift AQ
            11111   110_   A=A+M
2           11111   1100   Q[0]=0
            11111   100_   Left shift AQ
            00010   100_   A=A+M
1           00010   1001   Q[0]=1
            00101   001_   Left shift AQ
            00010   001_   A=A-M
0           00010   0011   Q[0]=1
Quotient = 3 (Q)
Remainder = 2 (A)
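The non-restoring steps can be sketched under the same assumptions as the restoring version (unsigned n-bit dividend, 0 < M < 2^n, an (n+1)-bit A whose msb is the sign; `non_restoring_divide` is an illustrative name). Instead of restoring A after a negative result, the next pass adds M; a single correction at the end fixes a negative remainder:

```python
def non_restoring_divide(dividend: int, divisor: int, n: int):
    """Non-restoring division; returns (quotient, remainder)."""
    if not 0 < divisor < (1 << n):
        raise ValueError("divisor must satisfy 0 < M < 2^n")
    mask_a = (1 << (n + 1)) - 1
    mask_q = (1 << n) - 1
    A, Q, M = 0, dividend & mask_q, divisor
    for _ in range(n):
        negative = A >> n                        # Step 2: sign bit of A
        A = ((A << 1) | (Q >> (n - 1))) & mask_a # Step 3: shift AQ left ...
        Q = (Q << 1) & mask_q
        if negative:
            A = (A + M) & mask_a                 # ... then A = A + M
        else:
            A = (A - M) & mask_a                 # ... or A = A - M
        if not (A >> n):                         # Steps 4-5: sign 0 -> Q[0] = 1
            Q |= 1
    if A >> n:                                   # Step 8: final correction
        A = (A + M) & mask_a
    return Q, A

print(non_restoring_divide(11, 3, 4))
```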

FLOATING POINT ADDITION AND SUBTRACTION


FLOATING POINT ADDITION
To understand floating point addition, let us first look at the addition of real numbers in decimal, since the same
logic applies in both cases.
For example, suppose we have to add 1.1 × 10^3 and 50.
We cannot add these numbers directly. First, we need to align the exponents, and then we can add the significands.
After aligning the exponents, we get 50 = 0.05 × 10^3
Now adding the significands, 0.05 + 1.1 = 1.15
So, finally we get (1.1 × 10^3 + 50) = 1.15 × 10^3
Here, notice that we shifted 50 and made it 0.05 to add these numbers.
Now let us take an example of floating point number addition.
We follow these steps to add two numbers:
1. Align the significands (shift the significand of the number with the smaller exponent to the right)
2. Add the significands
3. Normalize the result

Let the two numbers be


x = 9.75
y = 0.5625

Converting them into 32-bit floating point representation,


9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
0.5625’s representation in 32-bit format = 0 01111110 00100000000000000000000

Now we take the difference of the exponents to know how much shifting is required.
10000010B – 01111110B = 130 – 126 = 4D
Now, we shift the mantissa of the smaller number right by 4 places.
Mantissa of 0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implicit in the 32-bit representation)
Shifting right by 4 places, we get 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000

Adding the mantissas of both

  0.00010010000000000000000
+ 1.00111000000000000000000
---------------------------
  1.01001010000000000000000
The sum is already normalized (a single 1 before the binary point), so in the final answer we keep the exponent
of the bigger number.
So, the final answer consists of:
Sign bit = 0
Exponent of bigger number = 10000010
Mantissa = 01001010000000000000000

32 bit representation of answer = x + y = 0 10000010 01001010000000000000000
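The align / add / normalize steps can be reproduced at the bit level using Python's standard `struct` module to obtain the single-precision fields. A minimal sketch, assuming two positive normalized operands; bits shifted out during alignment are simply truncated (real hardware rounds), and all function names are illustrative:

```python
import struct

def float_to_fields(x: float):
    """Split a value into IEEE-754 single-precision fields (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fields_to_float(s: int, e: int, f: int) -> float:
    bits = (s << 31) | (e << 23) | f
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def add_positive_singles(x: float, y: float) -> float:
    """Align, add significands, normalize (truncating instead of rounding)."""
    _, ex, fx = float_to_fields(x)
    _, ey, fy = float_to_fields(y)
    mx, my = (1 << 23) | fx, (1 << 23) | fy     # restore the implicit leading 1
    if ex < ey:                                 # make x the one with the larger exponent
        ex, ey, mx, my = ey, ex, my, mx
    my >>= ex - ey                              # 1. align the smaller significand
    m, e = mx + my, ex                          # 2. add the significands
    if m >> 24:                                 # 3. normalize 1x.xxx -> 1.xxx, exponent + 1
        m >>= 1
        e += 1
    return fields_to_float(0, e, m & 0x7FFFFF)  # drop the implicit 1 again

print(add_positive_singles(9.75, 0.5625))
```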

FLOATING POINT SUBTRACTION

Subtraction (or addition of numbers with opposite signs) is similar to addition, except that we subtract the smaller
mantissa from the larger one, and the sign bit of the result is the sign of the number with the larger magnitude.

Let the two numbers be


x = 9.75
y = – 0.5625

Converting them into 32-bit floating point representation


9.75’s representation in 32-bit format = 0 10000010 00111000000000000000000
– 0.5625’s representation in 32-bit format = 1 01111110 00100000000000000000000

Now, we find the difference of the exponents to know how much shifting is required.
10000010B – 01111110B = 130 – 126 = 4D
Now, we shift the mantissa of the smaller number right by 4 places.
Mantissa of – 0.5625 = 1.00100000000000000000000
(note that the 1 before the binary point is implicit in the 32-bit representation)
Shifting right by 4 places, we get 0.00010010000000000000000
Mantissa of 9.75 = 1.00111000000000000000000

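Subtracting the smaller mantissa from the larger one

  1.00111000000000000000000
– 0.00010010000000000000000
---------------------------
  1.00100110000000000000000

The result is already normalized. Sign bit of the bigger number = 0
Exponent of the bigger number = 10000010

So, finally the answer = x + y = 9.75 – 0.5625 = 9.1875 = 0 10000010 00100110000000000000000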
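The same procedure, extended to operands of opposite sign, can be sketched directly on the 32-bit patterns written in these notes. A minimal sketch, assuming normalized operands given as 'S EEEEEEEE F...F' strings; shifted-out bits are truncated rather than rounded, an exactly cancelling result is returned as +0, and `fp32_add_signed` is an illustrative name. When a subtraction loses the leading 1 (for example 1.0 - 0.75), the result is normalized by shifting left and decreasing the exponent:

```python
def fp32_add_signed(a_bits: str, b_bits: str) -> str:
    """Add two normalized single-precision patterns, handling opposite signs."""
    def fields(p: str):
        p = p.replace(" ", "")
        return int(p[0]), int(p[1:9], 2), (1 << 23) | int(p[9:], 2)  # S, E, 1.F
    sa, ea, ma = fields(a_bits)
    sb, eb, mb = fields(b_bits)
    if (ea, ma) < (eb, mb):                  # make 'a' the larger magnitude
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    mb >>= ea - eb                           # align the smaller significand
    m = ma + mb if sa == sb else ma - mb     # add, or subtract smaller from larger
    if m == 0:
        return "0 00000000 " + "0" * 23      # exact cancellation gives +0
    e = ea
    while m >> 24:                           # carry past the hidden bit: shift right
        m >>= 1
        e += 1
    while not (m >> 23):                     # leading 1 lost: shift left
        m <<= 1
        e -= 1
    return f"{sa} {e:08b} {m & 0x7FFFFF:023b}"   # result has the sign of the larger

x = "0 10000010 00111000000000000000000"      # 9.75
y = "1 01111110 00100000000000000000000"      # -0.5625
print(fp32_add_signed(x, y))
```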
