
Olabisi Onabanjo University, Ago Iwoye.

Ogun State. Nigeria.


Faculty of Science
Department of Mathematical Sciences

BSc Lecture Note: COMPUTER ARCHITECTURE I

Objectives: This course introduces students to the fundamental concepts underlying modern
computer organization and architecture. The emphasis is on studying and analysing fundamental
issues in architecture design and their impact on performance. It also familiarizes students with
hardware design, including the basic structure and behaviour of the various functional modules of
the computer and how they interact to meet the processing needs of the user.

Course Outlines:
 Fundamentals of Computer Architecture: Basic computer architecture, types, organization
of the von Neumann machine, instruction formats, fetch/execute cycle, instruction
decoding and execution, registers and register files, instruction types and
addressing modes.
 Processor System Design: The CPU interface: clock, control, data and address buses;
address decoding and memory interfacing; basic parallel and serial interfaces; pipelining;
CISC and RISC.
 Memory System Organization and Architecture: Memory systems hierarchy; main
memory organization, its characteristics and performance; latency, cycle time,
bandwidth, and interleaving; cache memories.

FUNDAMENTALS OF COMPUTER ARCHITECTURE

The Basic Computer Architectures


A computer is an electronic machine that performs tasks by executing instructions. In a computer, the Central
Processing Unit (CPU) executes each instruction provided to it in a series of steps; this series of
steps is called the Machine Cycle, and it is repeated for each instruction. One machine cycle
involves fetching the instruction, decoding the instruction, transferring the data, and executing the
instruction.
The main components in a typical computer system are the Processor, Memory, Input/Output
Devices and the communication channels that connect them.

 The Processor: It is the workhorse of the system; it is the component that executes a
program by performing arithmetic and logical operations on data. It is the only
component that creates new information by combining or modifying current information.
In a typical system there will be only one processor, known as the Central Processing
Unit (CPU). Modern high-performance systems, such as vector processors and parallel
processors, often have more than one processor. Systems with only one processor are
called serial processors or, especially among computational scientists, scalar processors.

 Memory: It is a passive component that simply stores information until it is requested by


another part of the system. During normal operation it feeds instructions and data to the
processor, and at other times it is the source or destination of data transferred by I/O
devices. Information in a memory is accessed by its address. In programming-language
terms, one can view memory as a one-dimensional array M. A processor's request to the
memory might be "send the instruction at location M[1000]", and a disk controller's
request might be "store the following block of data in locations M[0] through M[255]".

 Input/Output (I/O) Devices: These transfer information, without altering it, between the
external world and one or more internal components. I/O devices can be secondary
memories, for example disks and tapes, or devices used to communicate directly with
users, such as video displays, keyboards, and mice.

 The Communication Channels: These tie the system together. They can be either simple
links that connect two devices or more complex switches that interconnect several
components and allow any two of them to communicate at a given point in time. When a switch is
configured to allow two devices to exchange information, all other devices that rely on
the switch are blocked, i.e. they must wait until the switch can be reconfigured.

Figure 1: Block Diagram of a Computer System

What is Computer Architecture?

Computer Architecture is a set of rules and methods that describe the functionality, organisation
and implementation of computer systems. Some definitions of architecture define it as describing
the capabilities and programming model of a computer but not a particular implementation.

Computer Architecture can also be defined as the design methodology governing how computer hardware
components interact, shaped by the constraints imposed by real-world components and technology
as well as by current market demands.

Building an architecture deals with the materials (components, subsystems) at hand, and many
levels of detail are required to completely specify a given implementation. A very good
example is the von Neumann architecture, which is still used by most types of computers today.
It was proposed by the mathematician John von Neumann in 1945. It describes the design of an
electronic computer with its CPU, which includes the arithmetic logic unit, control unit, registers,
memory for data and instructions, an input/output interface and external storage functions.

Von Neumann architecture is based on the stored-program computer concept, where instruction
data and program data are stored in the same memory. This design is still used in most computers
produced today and is shown in Figure 2.

Fig. 2: Von Neumann Architecture

 Central Processing Unit (CPU)


At the heart of any computer, modern or early, is a circuit called the Arithmetic Logic Unit
(ALU). It performs a few simple operations very quickly. The ALU, together with a small amount
of memory running at processor speed called registers, makes up what is known as the Central
Processing Unit (CPU). The CPU executes the instructions of a computer program. It is sometimes
referred to as the microprocessor or processor.

A CPU is not very useful unless there is some way to communicate with it, and receive information
back from it. This is usually known as a Bus. The Bus is the input/output (I/O) gateway for the
CPU. The primary store with which the CPU communicates is its system memory, commonly
known as Random Access Memory (RAM). Depending on the platform, the CPU may
communicate with other parts of the system directly or it may communicate just through memory. The
CPU contains the ALU, the Control Unit and a variety of registers.

 Memory Unit
The memory unit consists of RAM, sometimes referred to as primary or main memory. Unlike a
hard drive (secondary memory), this memory is fast and directly accessible by the CPU.
RAM is split into partitions; each partition consists of an address and its contents (both in binary
form). The address uniquely identifies every location in the memory. Loading data from
permanent memory (the hard drive) into the faster and directly accessible temporary memory (RAM)
allows the CPU to operate much more quickly.

 Registers
Registers are high-speed storage areas in the CPU. All data and instructions must be stored in a
register before they can be processed. A register is a group of flip-flops, with each flip-flop capable
of storing one bit of information. An n-bit register has a group of n flip-flops and is capable of
storing binary information of n bits.
A register consists of a group of flip-flops and gates. The flip-flops hold the binary information
and the gates control when and how new information is transferred into the register. Various types of
registers are available commercially. The simplest register is one that consists of only flip-flops
with no external gates. These days registers are often implemented as a register file. Two related concepts are:

 Register Load: The transfer of new information into a register is referred to as loading the
register. If all the bits of the register are loaded simultaneously with a common clock pulse, then
the loading is said to be done in parallel.

 Register Transfer Language: The symbolic notation used to describe the micro-operation
transfers amongst registers is called register transfer language. The term "register transfer"
implies the availability of hardware logic circuits that can perform a stated micro-operation
and transfer the result of the operation to the same or another register. The word "language"
is borrowed from programmers, who apply this term to programming languages; a register
transfer language is likewise a system of symbols for specifying a given computational
process.

Following are some commonly used registers:

 Accumulator: This is the most common register, used to store data taken out of
memory.
 General Purpose Registers: These are used to store data and intermediate results during program
execution. They can be accessed via assembly programming.
 Special Purpose Registers: Users do not access these registers; they are reserved for the
computer system, for example:
 MAR: The Memory Address Register holds the address for the
memory unit.
 MBR: The Memory Buffer Register stores instructions and data received from
and sent to the memory.
 PC: The Program Counter points to the next instruction to be executed.
 IR: The Instruction Register holds the instruction being executed.
Table 1: Types of Register
Memory Address Register (MAR)       Holds the memory location of data that needs to be accessed
Memory Data Register (MDR)          Holds data that is being transferred to or from memory
Accumulator (ACC)                   Where intermediate arithmetic and logic results are stored
Program Counter (PC)                Contains the address of the next instruction to be executed
Current Instruction Register (CIR)  Contains the current instruction during processing
Micro-Operations
The operations executed on data stored in registers are called micro-operations. A micro-
operation is an elementary operation performed on the information stored in one or more
registers.
Example: Shift, count, clear and load.

Types of Micro-Operations
The micro-operations in digital computers are of 4 types:

1. Register Transfer Micro-Operations: Transfer binary information from one register to
another.
2. Arithmetic Micro-Operations: Perform arithmetic operations on numeric data stored in
registers.
3. Logic Micro-Operations: Perform bit-manipulation operations on non-numeric data stored
in registers.
4. Shift Micro-Operations: Perform shift operations on data stored in registers.

 Register Transfer Micro-Operations


Information transfer from one register to another is designated in symbolic form by means
of the replacement operator.
R2 ← R1 denotes the transfer of the data from register R1 into R2.
Normally we want the transfer to occur only under a predetermined control condition. This can be
shown by the following if-then statement: if (P = 1) then (R2 ← R1),
where P is a control signal generated in the control section.
Control Function
A control function is a Boolean variable that is equal to 1 or 0. With a control function the transfer is shown
as:
P: R2 ← R1
The control condition is terminated with a colon. It shows that the transfer operation can be
executed only if P = 1.
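
The control-function notation maps naturally onto software. Below is a minimal Python sketch (the register names and 4-bit values are invented for illustration) of the transfer P: R2 ← R1, which copies R1 into R2 only while the control signal P is 1:

    def register_transfer(regs, dest, src, p):
        """Perform dest <- src only when the control signal p is 1."""
        if p == 1:
            regs[dest] = regs[src]

    # P: R2 <- R1 with P = 1
    regs = {"R1": 0b1010, "R2": 0b0000}
    register_transfer(regs, "R2", "R1", p=1)
    print(bin(regs["R2"]))  # 0b1010: the contents of R1 were copied into R2

With p = 0 the call leaves R2 unchanged, mirroring the hardware behaviour where the clock pulse loads the register only when the control condition holds.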

 Arithmetic Micro-Operations:
These perform arithmetic operations on numeric data stored in registers. Some of the basic
arithmetic micro-operations are addition, subtraction, increment and
decrement.
a) Add Micro-Operation
It is defined by the following statement:
R3 ← R1 + R2
The above statement instructs that the data or contents of register R1 be added to the data or contents
of register R2, and the sum transferred to register R3.

b) Subtract Micro-Operation
Let us again take an example:
R3 ← R1 + R2' + 1
In the subtract micro-operation, instead of using a minus operator we take the 1's
complement of the subtrahend and add 1 to it, i.e. R3 ← R1 - R2 is equivalent
to R3 ← R1 + R2' + 1.

c) Increment/Decrement Micro-Operation
Increment and decrement micro-operations are generally performed by adding and subtracting
1 to and from the register respectively.
R1 ← R1 + 1
R1 ← R1 - 1

Table 2: The Symbolic Description

Symbolic Designation Description


R3 ← R1 + R2 Contents of R1 + R2 transferred to R3.
R3 ← R1 - R2 Contents of R1 - R2 transferred to R3.
R2 ← (R2)' Complement the contents of R2.
R2 ← (R2)' + 1 2's complement the contents of R2.
R3 ← R1 + (R2)' + 1 R1 + the 2's complement of R2 (subtraction).
R1 ← R1 + 1 Increment the contents of R1 by 1.
R1 ← R1 - 1 Decrement the contents of R1 by 1.
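
To see why R3 ← R1 + R2' + 1 implements subtraction, here is a minimal Python sketch; the 8-bit register width is an assumption chosen only for illustration:

    WIDTH = 8
    MASK = (1 << WIDTH) - 1  # 0xFF: keeps results within an 8-bit register

    def subtract(r1, r2):
        """R3 <- R1 + R2' + 1: add the 1's complement of R2, then add 1."""
        r2_complement = ~r2 & MASK       # 1's complement of R2
        return (r1 + r2_complement + 1) & MASK

    print(subtract(9, 4))   # 5
    print(subtract(4, 9))   # 251, which is -5 in 8-bit 2's complement

The second call shows how a negative difference appears in its 2's complement representation, exactly as it would in a hardware register.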

 Logic Micro-Operations
These are binary micro-operations performed on the bits stored in the registers. These
operations consider each bit separately and treat them as binary variables.
Let us consider the X-OR micro-operation on the contents of two registers R1 and R2:
P: R1 ← R1 ⊕ R2
In the above statement we have also included a control function.

Assume that each register has 3 bits. Let the content of R1 be 010 and that of R2 be 100. The X-OR
micro-operation gives:
R1 = 010 (before)
R2 = 100
R1 = 110 (after P = 1)

 Shift Micro-Operations
These are used for serial transfer of data. That means we can shift the contents of the register to
the left or right. In the shift left operation the serial input transfers a bit into the rightmost position,
and in the shift right operation the serial input transfers a bit into the leftmost position.
There are three types of shifts, as follows:

a) Logical Shift
It transfers 0 through the serial input. The symbol "shl" is used for logical shift left
and "shr" is used for logical shift right.
The register symbol must be the same on both sides of the arrow.

b) Circular Shift
This circulates or rotates the bits of the register around the two ends without any loss of data or
contents. In this, the serial output of the shift register is connected to its serial
input. "cil" and "cir" are used for circular shift left and right respectively.

c) Arithmetic Shift
This shifts a signed binary number to the left or right. An arithmetic shift left multiplies a signed
binary number by 2 and an arithmetic shift right divides the number by 2. The arithmetic shift micro-operation
leaves the sign bit unchanged because the sign of the number remains the same when it is multiplied
or divided by 2.
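
The shift micro-operations can be sketched as follows in Python; the 8-bit register width is again an illustrative assumption:

    WIDTH = 8
    MASK = (1 << WIDTH) - 1

    def shl(r):   # logical shift left: 0 enters at the rightmost position
        return (r << 1) & MASK

    def shr(r):   # logical shift right: 0 enters at the leftmost position
        return r >> 1

    def cil(r):   # circular shift left: the leftmost bit re-enters on the right
        return ((r << 1) | (r >> (WIDTH - 1))) & MASK

    def cir(r):   # circular shift right: the rightmost bit re-enters on the left
        return (r >> 1) | ((r & 1) << (WIDTH - 1))

    def ashr(r):  # arithmetic shift right: divide by 2, replicating the sign bit
        return (r >> 1) | (r & (1 << (WIDTH - 1)))

    r = 0b10110011
    print(f"{shl(r):08b} {shr(r):08b} {cil(r):08b} {cir(r):08b} {ashr(r):08b}")
    # 01100110 01011001 01100111 11011001 11011001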

 Arithmetic and Logic Unit (ALU)


The ALU allows arithmetic (add, subtract etc.) and logic (AND, OR, NOT etc.) operations to be
carried out.

 Control Unit (CU)


The control unit controls the operation of the computer's ALU, memory and input/output
devices, informing them how to respond to the program instructions that have been read and
interpreted from the memory unit. The control unit also provides the timing and control
signals required by other computer components.

 Buses
Buses are the means by which data is transmitted from one part of a computer to another,
connecting all major internal components to the CPU and memory. A standard CPU system
bus comprises a control bus, a data bus and an address bus.

Table 3: The Buses

Address Bus   Carries the addresses of data (but not the data)
              between the processor and memory
Data Bus      Carries data between the processor, the memory unit
              and the input/output devices
Control Bus   Carries control signals/commands from the CPU
              (and status signals from other devices) in
              order to control and coordinate all the activities
              within the computer
Addressing Modes and Instruction Set

Addressing Mode
Addressing modes are the ways in which architectures specify the address of an object they want to
access. In machines, an addressing mode can specify a constant, a register or a location in
memory.

The operation field of an instruction specifies the operation to be performed. This operation will
be executed on some data stored in computer registers or in main memory. The way
an operand is selected during program execution depends on the addressing mode of the
instruction. The purposes of using addressing modes are as follows:
 To give programming versatility to the user.
 To reduce the number of bits in the addressing field of the instruction.

Types of Addressing Modes


 Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate-mode instruction has
an operand field rather than an address field. For example: ADD 7, which says add 7 to the contents
of the accumulator; 7 is the operand here.

 Register Mode
In this mode the operand is stored in a register, and this register is in the CPU. The
instruction contains the address of the register where the operand is stored.

Advantages of this mode:


• Shorter instructions and faster instruction fetch.
• Faster access to the operand(s) than a memory access.

Disadvantages of this mode:


• Very limited address space
• Using multiple registers helps performance but it complicates the instructions.

 Register Indirect Mode


In this mode, the instruction specifies the register whose contents give the address of the operand
in memory. Thus, the register contains the address of the operand rather than the operand
itself.
 Direct Addressing Mode
In this mode, the effective address of the operand is given in the instruction itself.
For example: ADD R1, 4000 - here 4000 is the effective address of the operand.
NOTE: The effective address is the memory location where the operand is present.
• Single memory reference to access the data.
• No additional calculations are needed to find the effective address of the operand.

 Indirect Addressing Mode


In this mode, the address field of the instruction gives the address at which the effective address is stored in
memory. This slows down execution, as it requires multiple memory lookups to find the
operand.

 Displacement Addressing Mode


In this mode the contents of an index register are added to the address part of the instruction to
obtain the effective address of the operand:
EA = A + (R), where the address field holds two values, A (the base value) and R (the register that
holds the displacement), or vice versa.
 Relative Addressing Mode
It is a version of displacement addressing mode, in which the contents of the PC (Program Counter) are
added to the address part of the instruction to obtain the effective address:
EA = A + (PC), where EA is the effective address and PC is the program counter.
The operand is A cells away from the current cell (the one pointed to by the PC).

 Base Register Addressing Mode


It is again a version of displacement addressing mode, defined as EA = A + (R),
where A is the displacement and R holds a pointer to the base address.

 Stack Addressing Mode


In this mode, the operand is at the top of the stack. For example: ADD; this instruction will POP the top
two items from the stack, add them, and then PUSH the result onto the top of the stack.

Auto Increment/Decrement Mode


In this mode the register is incremented or decremented after or before its value is used.

Table 4: Summary of the Addressing Mode


The most common names for addressing modes (names may differ among architectures):

Addressing mode  Example instruction  Meaning                        When used
Register         Add R4, R3           R4 <- R4 + R3                  When a value is in a register
Immediate        Add R4, #3           R4 <- R4 + 3                   For constants
Displacement     Add R4, 100(R1)      R4 <- R4 + M[100 + R1]         Accessing local variables
Register         Add R4, (R1)         R4 <- R4 + M[R1]               Accessing using a pointer or a
deferred                                                             computed address
Indexed          Add R3, (R1 + R2)    R3 <- R3 + M[R1 + R2]          Useful in array addressing:
                                                                     R1 = base of array,
                                                                     R2 = index amount
Direct           Add R1, (1001)       R1 <- R1 + M[1001]             Useful in accessing static data
Memory           Add R1, @(R3)        R1 <- R1 + M[M[R3]]            If R3 is the address of a pointer p,
deferred                                                             then this mode yields *p
Auto-            Add R1, (R2)+        R1 <- R1 + M[R2]               Useful for stepping through arrays
increment                             R2 <- R2 + d                   in a loop; R2 = start of array,
                                                                     d = size of an element
Auto-            Add R1, -(R2)        R2 <- R2 - d                   Same as autoincrement; both can
decrement                             R1 <- R1 + M[R2]               also be used to implement a stack
                                                                     as push and pop
Scaled           Add R1, 100(R2)[R3]  R1 <- R1 + M[100 + R2 + R3*d]  Used to index arrays; may be
                                                                     applied to any base addressing
                                                                     mode in some machines
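
The table can be made concrete with a small Python sketch that computes the operand selected by several of these modes; the register contents, memory values and element size d below are all invented for the example:

    regs = {"R1": 100, "R2": 3, "R3": 106}
    mem = {100: 7, 103: 9, 106: 200, 200: 42, 1001: 11}
    d = 2  # element size, used by autoincrement

    def operand(mode, A=None, Rn=None, Ri=None):
        """Return the operand selected by the given addressing mode."""
        if mode == "register":          return regs[Rn]
        if mode == "immediate":         return A
        if mode == "displacement":      return mem[A + regs[Rn]]
        if mode == "register_deferred": return mem[regs[Rn]]
        if mode == "indexed":           return mem[regs[Rn] + regs[Ri]]
        if mode == "direct":            return mem[A]
        if mode == "memory_deferred":   return mem[mem[regs[Rn]]]
        if mode == "autoincrement":     # use M[Rn], then advance Rn by d
            value = mem[regs[Rn]]
            regs[Rn] += d
            return value
        raise ValueError(mode)

    print(operand("displacement", A=0, Rn="R1"))   # M[0 + 100] = 7
    print(operand("indexed", Rn="R1", Ri="R2"))    # M[100 + 3] = 9
    print(operand("memory_deferred", Rn="R3"))     # M[M[106]] = M[200] = 42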

Instruction Codes
While a program is a set of instructions that specify the operations, operands, and the sequence
by which processing has to occur, an instruction code is a group of bits that tells the computer to
perform a specific operation.

 Operation Code
The operation code of an instruction is a group of bits that define operations such as add, subtract,
multiply, shift and complement. The number of bits required for the operation code depends upon
the total number of operations available on the computer: the operation code must consist of at
least n bits for 2^n distinct operations. The operation part of an instruction code specifies the
operation to be performed.

 Register Part
The operation must be performed on data stored in registers. An instruction code therefore
specifies not only the operation to be performed but also the registers where the operands (data) will
be found, as well as the register where the result has to be stored.

 Stored Program Organisation


The simplest way to organize a computer is to have one processor register and an instruction code with
two parts. The first part specifies the operation to be performed and the second specifies an address.
The memory address tells where the operand will be found in memory. Instructions are stored in
one section of memory and data in another.

In computers with a single processor register, that register is known as the Accumulator (AC). The operation is
performed with the memory operand and the contents of the AC.

 Common Bus System


The basic computer has 8 registers, a memory unit and a control unit. Paths must be provided to
transfer data from one register to another. An efficient method for transferring data in a system is
to use a Common Bus System. The outputs of the registers and memory are connected to the common
bus.
 Load(LD)
The lines from the common bus are connected to the inputs of each register and the data inputs of
memory. The particular register whose LD input is enabled receives the data from the bus during
the next clock pulse transition.
Before studying instruction formats, let us first consider the operand/address part of an instruction.
When the second part of an instruction code specifies the operand itself, the instruction is said to
have an immediate operand. When the second part specifies the address of
an operand, the instruction is said to have a direct address. And in indirect addressing, the second part
of the instruction code specifies the address of a memory word in which the address of the operand
is found.

COMPUTER INSTRUCTIONS
The basic computer has three instruction code formats. The operation code (opcode) part of the
instruction contains 3 bits, and the meaning of the remaining 13 bits depends upon the operation code
encountered.

Instruction formats are more generally classified by the number of addresses they use:

Three-Address Instructions
Computers with three-address instruction formats can use each address field to specify either a
processor register or a memory operand. The program in assembly language that evaluates X =
(A + B) ∗ (C + D) is shown below, together with comments that explain the register transfer
operation of each instruction.

ADD R1, A, B R1 ← M [A] + M [B]


ADD R2, C, D R2 ← M [C] + M [D]
MUL X, R1, R2 M [X] ← R1 ∗ R2

It is assumed that the computer has two processor registers, R1 and R2. The symbol M [A]
denotes the operand at memory address symbolized by A.
The advantage of the three-address format is that it results in short programs when evaluating
arithmetic expressions. The disadvantage is that the binary-coded instructions require too many
bits to specify three addresses. An example of a commercial computer that uses three-address
instructions is the Cyber 170. The instruction formats in the Cyber computer are restricted to
either three register address fields or two register address fields and one memory address field.

Two-Address Instructions
Two address instructions are the most common in commercial computers. Here again each
address field can specify either a processor register or a memory word. The program to evaluate
X = (A + B) ∗ (C + D) is as follows:
MOV R1, A R1 ← M [A]
ADD R1, B R1 ← R1 + M [B]
MOV R2, C R2 ← M [C]
ADD R2, D R2 ← R2 + M [D]
MUL R1, R2 R1 ← R1∗R2
MOV X, R1 M [X] ← R1
The MOV instruction moves or transfers the operands to and from memory and processor
registers. The first symbol listed in an instruction is assumed to be both a source and the
destination where the result of the operation is transferred.
One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data manipulation. For
multiplication and division there is a need for a second register. However, here we will neglect
the second register and assume that the AC contains the result of all operations. The program to evaluate
X = (A + B) ∗ (C + D) is
LOAD A AC ← M [A]
ADD B AC ← AC + M [B]
STORE T M [T] ← AC
LOAD C AC ← M [C]
ADD D AC ← AC + M [D]
MUL T AC ← AC ∗ M [T]
STORE X M [X] ← AC
All operations are done between the AC register and a memory operand. T is the address of a
temporary memory location required for storing the intermediate result.

Zero-Address Instructions
A stack-organized computer does not use an address field for the instructions ADD and MUL.
The PUSH and POP instructions, however, need an address field to specify the operand that
communicates with the stack. The following program shows how X = (A + B) ∗ (C + D) will be
written for a stack-organized computer. (TOS stands for top of stack.)
PUSH A TOS ← A
PUSH B TOS ← B
ADD TOS ← (A + B)
PUSH C TOS ← C
PUSH D TOS ← D
ADD TOS ← (C + D)
MUL TOS ← (C + D) ∗ (A + B)
POP X M [X] ← TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the expression
into reverse Polish notation. The name “zero-address” is given to this type of computer because
of the absence of an address field in the computational instructions.
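
The zero-address program above can be traced with a tiny stack-machine interpreter. This is only a sketch: the memory values for A, B, C and D are invented, and the instruction names follow the example:

    mem = {"A": 2, "B": 3, "C": 4, "D": 5}   # invented operand values
    stack = []
    program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None),
               ("PUSH", "C"), ("PUSH", "D"), ("ADD", None),
               ("MUL", None), ("POP", "X")]

    for op, addr in program:
        if op == "PUSH":                 # TOS <- M[addr]
            stack.append(mem[addr])
        elif op == "POP":                # M[addr] <- TOS
            mem[addr] = stack.pop()
        else:                            # ADD/MUL pop two items, push the result
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)

    print(mem["X"])  # (2 + 3) * (4 + 5) = 45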

The basic computer's three instruction code formats, mentioned earlier, are:

 Memory Reference Instruction


It uses 12 bits to specify the address and 1 bit to specify the addressing mode (I). I is equal
to 0 for direct address and 1 for indirect address.

 Register Reference Instruction


These instructions are recognized by the opcode 111 with a 0 in the leftmost bit of the instruction.
The other 12 bits specify the operation to be executed.

 Input-Output Instruction
These instructions are recognised by the operation code 111 with a 1 in the leftmost bit of the
instruction. The remaining 12 bits specify the input-output operation.
Format of Instruction
Basic fields of an instruction format are given below:

 An operation code field that specifies the operation to be performed.


 An address field that designates the memory address or register.
 A mode field that specifies the way the operand of effective address is determined.
Computers may have instructions of different lengths containing varying numbers of addresses.
The number of address fields in the instruction format depends upon the internal organization of
the computer's registers.

Instruction Cycle
An instruction cycle, also known as the fetch-decode-execute cycle, is the basic operational process of
a computer. This process is repeated continuously by the CPU from boot-up to shutdown of the
computer.

Following are the steps that occur during an instruction cycle:


 Fetch the Instruction
The instruction is fetched from the memory address stored in the Program Counter (PC) and
placed in the Instruction Register (IR). At the end of the fetch operation, the PC is incremented by 1,
so that it points to the next instruction to be executed.
 Decode the Instruction
The instruction in the IR is interpreted by the decoder.
 Read the Effective Address
If the instruction has an indirect address, the effective address is read from memory.
Operands are read directly in the case of an immediate-operand instruction.
 Execute the Instruction
The Control Unit passes the information in the form of control signals to the functional units of the
CPU. The result generated is stored in main memory or sent to an output device.

The cycle is then repeated by fetching the next instruction. Thus in this way the instruction cycle
is repeated continuously.
Figure 3: Instruction Cycle
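
The fetch-decode-execute loop can be sketched in a few lines of Python. The three opcodes and the toy instruction memory are invented purely to show how the PC and IR drive the cycle:

    memory = [("LOAD", 7), ("ADD", 5), ("HALT", None)]  # toy program
    pc, acc, running = 0, 0, True

    while running:
        ir = memory[pc]          # fetch: the instruction addressed by PC goes into IR
        pc += 1                  # PC now points to the next instruction
        opcode, operand = ir     # decode
        if opcode == "LOAD":     # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            running = False

    print(acc)  # 12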

CISC and RISC ARCHITECTURES

What is CISC and RISC?

A CPU operates according to the Instruction Set Architecture it was designed
to implement. There are two concepts used to implement the processor
hardware architecture. The architectural designs of CPUs are:

 RISC (Reduced Instruction Set Computing)


 CISC (Complex Instruction Set Computing)
Complex Instruction Set Computing (CISC) has the ability to execute multiple addressing modes or multi-
step operations within a single instruction. It is a CPU design in which one instruction
performs many low-level operations, for example a memory load, an arithmetic operation and
a memory store. RISC is a CPU design strategy based on the insight that a simplified
instruction set gives higher performance when combined with a microprocessor architecture
able to execute each instruction in a few processor cycles per
instruction. Intel hardware is the classic example of a Complex Instruction Set Computer (CISC), while
Apple hardware has historically used Reduced Instruction Set Computer (RISC) designs.

Hardware designers invent numerous technologies and tools to implement the desired architecture
and fulfil these needs. A hardware architecture may be implemented to be either hardware-
specific or software-specific, and depending on the application both approaches are used in the
required measure.
Figure 4: RISC and CISC Architecture

 CISC Architecture

The CISC approach attempts to minimize the number of instructions per program, sacrificing the
number of cycles per instruction. Computers based on the CISC architecture are designed to
decrease the memory cost: large programs need more storage, and large memories are
expensive. To address this, the number of instructions per program is reduced by embedding
several operations in a single instruction, thereby making the instructions more complex.

Figure 5: CISC Architecture

In CISC, a single MUL instruction loads two values from memory into separate registers, multiplies
them and stores the result; the hardware does the work so that the program uses the minimum
possible number of instructions.

Instruction Set Architecture is a medium that permits communication between the programmer and
the hardware. User commands such as executing, copying, deleting or editing data are carried out by
the microprocessor through its instruction set architecture.
The main elements of an Instruction Set Architecture are as follows:

 Instruction Set:

A group of instructions given to execute the program; they direct the computer by manipulating
data. Instructions are of the form opcode (operational code) and operand, where the opcode is
the instruction applied (load, store, etc.) and the operand is the memory location or register to which the
instruction is applied.

 Addressing Modes:

Addressing modes are the manner in which data is accessed. Depending upon the type of instruction,
addressing modes are of various types, such as direct mode, where the data itself is
accessed, or indirect mode, where the location of the data is accessed. Processors having an identical
ISA may be very different in organization, and processors with an identical ISA and nearly identical
organization can still differ in hardware implementation.
CPU performance is given by the fundamental law:

CPU time = Instruction count × CPI × Clock cycle time

Thus, CPU performance is dependent upon the instruction count, the CPI (cycles per instruction) and the
clock cycle time, and all three are affected by the instruction set architecture.
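
As a quick sanity check of the law, the following sketch evaluates it for an invented workload:

    def cpu_time(instruction_count, cpi, clock_cycle_time):
        """CPU time = Instruction count x CPI x Clock cycle time."""
        return instruction_count * cpi * clock_cycle_time

    # Invented example: 1 million instructions, CPI of 2, 1 ns clock cycle
    print(cpu_time(1_000_000, 2, 1e-9))  # 0.002 seconds

Halving any one of the three factors halves the CPU time, which is why ISA decisions that trade instruction count against CPI matter.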

 Instruction Count of the CPU

This underlines the importance of the instruction set architecture. There are two prevalent
instruction set architectures

Examples of CISC PROCESSORS


 IBM 370/168 - Introduced in 1970, this CISC design is a 32-bit processor with
four 64-bit floating-point registers.
 VAX 11/780 - A 32-bit CISC processor from Digital Equipment Corporation, supporting a
large number of addressing modes and machine instructions.
 Intel 80486 - Launched in 1989, a CISC processor with instructions varying in
length from 1 to 11 and about 235 instructions.

Characteristics of CISC Architecture

i. Instruction-decoding logic is complex.
ii. One instruction may support multiple addressing modes.
iii. Less chip space is needed for general-purpose registers because instructions can
operate directly on memory.
iv. Various CISC designs provide special registers for the stack pointer, interrupt
handling, etc.
v. MUL is referred to as a "complex instruction"; it relieves the programmer of writing
separate load and store steps.
 RISC Architecture

RISC (Reduced Instruction Set Computer) is used in portable devices due to its power efficiency.
RISC is a type of microprocessor architecture that uses a highly optimized set of instructions. RISC
does the opposite of CISC: it reduces the cycles per instruction at the cost of the number of instructions
per program. Pipelining is one of the unique features of RISC; it is performed by overlapping the
execution of several instructions in a pipeline fashion.

RISC has a performance advantage over CISC: RISC processors use simple instructions that
execute within one clock cycle.
Example: Apple iPod and Nintendo DS.

Figure 6: RISC Architecture


RISC Architecture Characteristics

i. Simple instructions are used in RISC architecture.

ii. RISC supports a few simple data types and synthesizes complex data types from them.
iii. RISC utilizes simple addressing modes and fixed-length instructions for pipelining.
iv. RISC permits any register to be used in any context.
v. One-cycle execution time.
vi. The amount of work per instruction is reduced by separating "LOAD" and
"STORE" into their own instructions.
vii. RISC contains a large number of registers in order to reduce the number of
interactions with memory.
viii. In RISC, pipelining is easy because all instructions execute in a uniform
interval of time, i.e. one clock.
ix. In RISC, more RAM is required to store assembly-level instructions.
x. Reduced instructions need fewer transistors in RISC.
xi. RISC uses the Harvard memory model, i.e. it is a Harvard architecture.
xii. A compiler is used to perform the conversion operation, i.e. to convert high-level
language statements into machine code.
 RISC & CISC Comparison

In RISC, the single MUL instruction is divided into three instructions:

"LOAD" - moves data from the memory bank to a register
"PROD" - finds the product of two operands located within the registers
"STORE" - moves data from a register to the memory bank
The main difference between RISC and CISC is the number of instructions and their complexity.

Figure 7: Architecture of RISC Vs CISC

 Semantic Gap
Both RISC and CISC architectures have been developed as attempts to cover the semantic gap.
With the objective of improving the efficiency of software development, several
powerful programming languages have come up, viz. Ada, C, C++, Java, etc. They provide a
high level of abstraction, conciseness and power. With this evolution the semantic gap grows. To
enable efficient compilation of high-level language programs, CISC and RISC designs are the two
options.

CISC designs involve very complex architectures, including a large number of instructions and
addressing modes, whereas RISC designs involve a simplified instruction set adapted to the real
requirements of user programs.

Figure 8: CISC and RISC Design

The Advantages and Disadvantages of RISC and CISC

 The Advantages of RISC architecture


i. RISC (Reduced Instruction Set Computing) architecture has a small set of instructions, so high-
level language compilers can produce more efficient code.
ii. It allows freedom in using the space on microprocessors because of its simplicity.
iii. Many RISC processors use the registers for passing arguments and holding local
variables.
iv. RISC functions use only a few parameters, and RISC processors use fixed-length
instructions, which are easy to pipeline.
v. The speed of operation can be maximized and the execution time minimized, since
very few instruction formats, a small number of instructions and few
addressing modes are needed.

 The Disadvantages of RISC architecture


i. Mostly, the performance of RISC processors depends on the programmer or compiler,
as the knowledge of the compiler plays a vital role when converting CISC code to
RISC code.
ii. Rearranging CISC code into RISC code, termed code expansion, increases the
size of the program, and the quality of this code expansion again depends on the
compiler and on the machine's instruction set.
iii. The first-level cache of RISC processors is also a disadvantage: these
processors require large memory caches on the chip itself and very fast memory
systems to feed the instructions.

 Advantages of CISC architecture


i. Microprogramming is as easy as assembly language to implement, and less expensive than
hardwiring a control unit.
ii. The ease of microcoding new instructions allowed designers to make CISC machines
upwardly compatible.
iii. As each instruction became more capable, fewer instructions could be used to
implement a given task.

 Disadvantages of CISC architecture


i. The performance of the machine slows down because the amount of clock time taken by
different instructions is dissimilar.
ii. Only about 20% of the existing instructions are used in a typical program, even
though various specialized instructions exist that are not even used
frequently.
iii. CISC instructions set the condition codes as a side effect of each instruction,
which takes time, and because a subsequent instruction changes the condition code
bits, the compiler has to examine them before that happens.

MEMORY ORGANIZATION
A memory unit is a collection of storage units or devices. The memory unit stores
binary information in the form of bits. Generally, memory/storage is classified into 2 categories:

 Volatile Memory: This loses its data when power is switched off.
 Non-Volatile Memory: This is permanent storage and does not lose any data when power
is switched off.

Memory Hierarchy

Figure 9: Memory Hierarchy


The total memory capacity of a computer can be visualized as a hierarchy of components. The
memory hierarchy system consists of all storage devices contained in a computer system, from
slow auxiliary memory to faster main memory and to still smaller, faster cache memory.

 Auxiliary memory access time is generally about 1000 times that of main memory, hence it
is at the bottom of the hierarchy.
 The main memory occupies the central position because it is equipped to communicate
directly with the CPU and with auxiliary memory devices through the Input/Output processor
(I/O).
When programs not residing in main memory are needed by the CPU, they are brought in from
auxiliary memory. Programs not currently needed in main memory are transferred into auxiliary
memory to provide space for programs that are currently in use.
 The cache memory is used to store program data which is currently being executed in the
CPU. The approximate access time ratio between cache memory and main memory is about 1
to 7-10.

Memory Access Methods


Each memory type is a collection of numerous memory locations. To access data from any
memory, it must first be located, and then the data is read from the memory location. The following
are the methods of accessing information from memory locations:

1. Random Access: Main memories are random access memories, in which each memory
location has a unique address. Using this unique address any memory location can be reached
in the same amount of time in any order.
2. Sequential Access: This method allows memory access in a sequence or in order.
3. Direct Access: In this mode, information is stored in tracks, with each track having a separate
read/write head.

1. Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and cache
memory is called main memory. It is the central storage unit of the computer system: a large
and fast memory used to store data during computer operations. Main memory is made up
of RAM and ROM, with RAM integrated circuit chips holding the major share.
i. Random Access Memory (RAM):
 DRAM: Dynamic RAM is made of capacitors and transistors, and must be refreshed
every 10~100 ms. It is slower and cheaper than SRAM.
 SRAM: Static RAM has a six-transistor circuit in each cell and retains data until
powered off.
 NVRAM: Non-Volatile RAM retains its data even when turned off. Example: Flash
memory.

ii. Read Only Memory (ROM): is non-volatile and is more like a permanent storage for
information. It also stores the bootstrap loader program, to load and start the operating
system when computer is turned on. PROM (Programmable ROM), EPROM (Erasable
PROM) and EEPROM (Electrically Erasable PROM) are some commonly used ROMs.
2. Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For example: Magnetic disks
and tapes are commonly used auxiliary devices. Other devices used as auxiliary memory are
magnetic drums, magnetic bubble memory and optical disks. It is not directly accessible to the
CPU, and is accessed using the Input/Output channels.

3. Cache Memory
The data or contents of main memory that are used again and again by the CPU are stored in
cache memory so that they can be accessed in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache. If the data is not
found in cache memory, the CPU moves on to main memory. It also transfers a block of
recent data into the cache, deleting old data in the cache to accommodate the new.

 Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit ratio. When the
CPU refers to memory and finds the word in the cache, it is said to produce a hit. If the word is not
found in the cache and must be fetched from main memory, it counts as a miss.
The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio:
Hit Ratio = Hits / (Hits + Misses)
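
For example, the following sketch counts hits over an invented reference string:

    def hit_ratio(references, cached_words):
        """Hit ratio = hits / (hits + misses) over a sequence of memory references."""
        hits = sum(1 for word in references if word in cached_words)
        return hits / len(references)

    # Invented reference string: 8 of the 10 references are found in the cache
    refs = [100, 104, 100, 108, 104, 100, 200, 104, 100, 300]
    cache = {100, 104, 108}
    print(hit_ratio(refs, cache))  # 0.8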

4. Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which each bit
position can be compared. The content is compared in each bit cell, which allows very fast
table lookup. Since the entire chip can be compared, contents can be stored without regard to an
addressing scheme. These chips have less storage capacity than regular memory
chips.

5. Mapping and Concept of Virtual Memory


The transformation of data from main memory to cache memory is called mapping. There are 3
main types of mapping:

 Associative Mapping
 Direct Mapping
 Set Associative Mapping

6. Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides large virtual memory for programmers when only small physical memory is available.
Virtual memory is used to give programmers the illusion that they have a very large memory
even though the computer has a small main memory. It makes the task of programming easier
because the programmer no longer needs to worry about the amount of physical memory
available.
Parallel Processing and Data Transfer Modes in a Computer System
Instead of processing each instruction sequentially, a parallel processing system provides
concurrent data processing to decrease execution time. Such a system may have two or
more ALUs and should be able to execute two or more instructions at the same time. The
purpose of parallel processing is to speed up the computer's processing capability and increase its
throughput.

NOTE: Throughput is the number of instructions that can be executed in a unit of time.
Parallel processing can be viewed from various levels of complexity. At the lowest level, we
distinguish between parallel and serial operations by the type of registers used. At the higher level
of complexity, parallel processing can be achieved by using multiple functional units that perform
many operations simultaneously.

 Pipelining

Pipelining is the process of feeding instructions to the processor through a pipeline. It
allows storing and executing instructions in an orderly process. It is also known as pipeline
processing.
Pipelining is a technique in which multiple instructions are overlapped during execution. The pipeline is
divided into stages, and these stages are connected to one another to form a pipe-like structure.
Instructions enter at one end and exit at the other.
Note: Pipelining increases the overall instruction throughput.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system.
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency increases.
In a pipeline system, each segment consists of an input register followed by a combinational
circuit. The register is used to hold data and the combinational circuit performs operations on it. The
output of the combinational circuit is applied to the input register of the next segment.

Figure 10: Pipelining

A pipeline system is like a modern-day assembly line in a factory. For example, in a car
manufacturing plant, huge assembly lines are set up with robotic arms at each station to
perform a certain task, after which the car moves on to the next arm.
Types of Pipeline
It is divided into 2 categories:
i. Arithmetic Pipeline
Arithmetic pipelines are usually found in most computers. They are used for floating-point
operations, multiplication of fixed-point numbers, etc. For example, the inputs to a floating-
point adder pipeline are:
X = A × 2^a
Y = B × 2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while a and b are
the exponents.
The floating-point addition or subtraction is done in 4 parts:

 Compare the exponents.


 Align the mantissas.
 Add or subtract mantissas
 Produce the result.
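
These four stages can be sketched functionally in Python. This is a simplified mantissa/exponent model (values of the form mantissa × 2^exponent, with normalisation reduced to a single loop), not IEEE 754 hardware:

    def fp_add(a_mant, a_exp, b_mant, b_exp):
        """Compute A*2^a + B*2^b, one pipeline stage per step."""
        diff = a_exp - b_exp                      # stage 1: compare the exponents
        if diff >= 0:                             # stage 2: align the mantissas
            b_mant, exp = b_mant / (2 ** diff), a_exp
        else:
            a_mant, exp = a_mant / (2 ** -diff), b_exp
        mant = a_mant + b_mant                    # stage 3: add the mantissas
        while abs(mant) >= 1.0:                   # stage 4: normalise the result
            mant, exp = mant / 2, exp + 1
        return mant, exp

    print(fp_add(0.5, 3, 0.5, 2))  # 0.5*2^3 + 0.5*2^2 = 6 -> (0.75, 3)

In a real arithmetic pipeline each of the four steps is a separate hardware segment, so four different additions can be in flight at once, one per stage.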
ii. Instruction Pipeline
In this, a stream of instructions is executed by overlapping the fetch, decode and execute phases
of the instruction cycle. This technique is used to increase the throughput of the computer
system.
An instruction pipeline reads instructions from memory while previous instructions are being
executed in other segments of the pipeline. Thus we can execute multiple instructions
simultaneously. The pipeline will be more efficient if the instruction cycle is divided into
segments of equal duration.

 Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these
factors are given below:
i. Timing Variations
Not all stages take the same amount of time. This problem generally occurs in instruction
processing, where different instructions have different operand requirements and thus different
processing times.

ii. Data Hazards


When several instructions are in partial execution, a problem arises if they reference the same data.
We must ensure that the next instruction does not attempt to access the data before the
current instruction has finished with it, as this would lead to incorrect results.

 Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the instruction stream from
executing during its designated clock cycle. Hazards reduce the performance below the ideal
speedup gained by pipelining.
A hazard is created whenever there is a dependence between instructions and they are close
enough that the overlap caused by pipelining would change the order of access to an operand.

There are three classes of hazards:

1. Structural Hazards. They arise from resource conflicts when the hardware cannot support all
possible combinations of instructions in simultaneous overlapped execution.
2. Data Hazards. They arise when an instruction depends on the result of a previous
instruction in a way that is exposed by the overlapping of instructions in the pipeline.
3. Control Hazards. They arise from the pipelining of branches and other instructions that
change the PC.

i. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the
present instruction is a conditional branch whose result determines the next instruction, then
the next instruction may not be known until the current one is processed.
ii. Interrupts
Interrupts insert unwanted instructions into the instruction stream and so disturb the execution of
instructions.

iii. Data Dependency


It arises when an instruction depends upon the result of a previous instruction but this result is not
yet available.

Principles of Pipelining Using DLX Architecture

The principles of pipelining will be described using DLX ("Deluxe") and a simple version of its
pipeline. These principles can be applied to more complex instruction sets than DLX, although
the resulting pipelines are more complex. DLX has a simple pipeline architecture for its CPU and
provides a good architectural model for study.

The architecture of DLX was chosen based on observations about the most frequently used primitives
in programs. DLX provides a good architectural model for study, not only because of the recent
popularity of this type of machine, but also because it is easy to understand.
Like most recent load/store machines, DLX emphasizes

 A simple load/store instruction set


 Design for pipelining efficiency
 An easily decoded instruction set
 Efficiency as a compiler target

Registers for DLX

 32 32-bit general-purpose registers (GPRs)

 32 floating-point registers (FPRs), usable as
32 single-precision (32-bit) registers or as even-odd pairs holding double-precision (64-bit) values
 A few special registers, which can be transferred to and from the integer registers

Data types for DLX

 For integer data: 8-bit bytes, 16-bit half words, 32-bit words
 For floating point: 32-bit single precision, 64-bit double precision
 The DLX operations work on 32-bit integers and 32- or 64-bit floating point. Bytes and
half words are loaded into registers with either zeros or the sign bit replicated to fill the
32 bits of the registers.
Memory

 Byte addressable
 32-bit address
 Two addressing modes (immediate and displacement); register deferred and absolute
addressing with a 16-bit field are accomplished as special cases of displacement
 Memory references are loads/stores between memory and the GPRs or FPRs, and all memory accesses must
be aligned
 There are instructions for moving between an FPR and a GPR
Instructions

 Instruction layout for DLX: all instructions are 32 bits (fixed length) and must be aligned
 The complete DLX instruction list falls into the four classes of operations described below
Operations

There are four classes of instructions:


 Load/Store: Any of the GPRs or FPRs may be loaded and stored, except that loading R0
has no effect.
 ALU Operations: All ALU instructions are register-register instructions.
The operations are: add, subtract, AND, OR, XOR and shifts.
Compare instructions compare two registers (=, !=, <, >, <=, >=);
if the condition is true, these instructions place a 1 in the destination register,
otherwise they place a 0.
 Branches/Jumps: All branches are conditional. The branch condition is specified
by the instruction, which may test the register source for zero or nonzero.
 Floating-Point Operations: add, subtract, multiply, divide.

An Implementation of DLX

This un-pipelined implementation is not the most economical or the highest-performance
implementation without pipelining. Instead, it is designed to lead naturally to a pipelined
implementation. Implementing the instruction set requires the introduction of several temporary
registers that are not part of the architecture. Every DLX instruction can be implemented in at
most five clock cycles. The five clock cycles are:

i. Instruction fetch cycle (IF)

ii. Instruction decode/register fetch (ID)
iii. Execution/effective address cycle (EX)
iv. Memory access/branch completion cycle (MEM)
v. Write-back cycle (WB)

On each cycle an instruction progresses from the IF to the WB stage. If a cycle appears to change
nothing, that cycle is not active for that instruction type. A detailed description of each cycle is as
follows:
Instruction fetch cycle (IF):
IR ← Mem[PC]
NPC ← PC + 4

Operation:

• Send out the PC and fetch the instruction from memory into the instruction register (IR)
• Increment the PC by 4 to address the next sequential instruction
• The IR is used to hold the instruction that will be needed on subsequent clock cycles
• The NPC is used to hold the next sequential PC (program counter)

 Instruction decode/register fetch (ID):

A ← Regs[IR6..10]
B ← Regs[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
Operation:

i. Decode the instruction and access the register file to read the registers.
ii. The outputs of the general-purpose registers are read into two temporary registers (A
and B) for use in later clock cycles.
iii. The lower 16 bits of the IR are also sign-extended and stored into the temporary
register IMM, for use in the next cycle.
iv. Decoding is done in parallel with reading registers, which is possible because these
fields are at a fixed location in the DLX instruction format. This technique is known
as fixed-field decoding.
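
The ID-stage register transfers can be sketched in Python. The bit numbering follows the DLX convention used above, where IR0 is the most significant bit of the 32-bit instruction word; the sample instruction word is invented:

    def decode_id_stage(ir):
        """Fixed-field decoding of a 32-bit DLX instruction word."""
        a = (ir >> 21) & 0x1F        # IR6..10  -> register number for A
        b = (ir >> 16) & 0x1F        # IR11..15 -> register number for B
        imm16 = ir & 0xFFFF          # IR16..31 -> 16-bit immediate field
        # Sign-extend: replicate bit IR16 (the immediate's sign bit) 16 times
        imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
        return a, b, imm

    # Invented word with register fields 2 and 3 and immediate -4 (0xFFFC)
    ir = (2 << 21) | (3 << 16) | 0xFFFC
    print(decode_id_stage(ir))  # (2, 3, -4)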

 Execution/Effective address cycle (EX):


The ALU operates on the operands prepared in the prior cycle, performing one of four functions
depending on the DLX instruction type.

 Memory reference:
ALUOutput  A +Imm

Operation: The ALU adds the operands to form the effective address and places the result into
the register ALUOutput

Register-Register ALU instruction:

ALUOutput  A op B

Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register B. The result is placed in the register ALUOutput.

Register- Immediate ALU instruction:

ALUOutput  A op Imm
Operation: The ALU performs the operation specified by the opcode on the value in register A
and on the value in register Imm. The result is placed in the register ALUOutput.
Branch:

ALUOutput  NPC + Imm


Cond  ( A op 0 )

Operation:

• The ALU adds the NPC to the sign-extended immediate value in Imm to compute
the address of the branch target.
• Register A, which has been read in the prior cycle, is checked to determine
whether the branch is taken.
• The comparison operation op is the relational operator determined by the branch
opcode (e.g. op is "==" for the instruction BEQZ)

 Memory access/branch completion cycle (MEM):


The only DLX instructions active in this cycle are loads, stores, and branches.

Operation:
 Access memory if needed
 If the instruction is a load, data returns from memory and is placed in the LMD (load
memory data) register
 If the instruction is a store, data from the B register is written into memory
 In either case the address used is the one computed during the prior cycle
and stored in the register ALUOutput

Branch:

 if (Cond) PC ← ALUOutput else PC ← NPC

Operation:
- If the instruction branches, the PC is replaced with the branch destination address in the register
ALUOutput
- Otherwise, the PC is replaced with the incremented PC in the register NPC

Memory reference:

LMD  Mem[ALUOutput] or Mem[ALUOutput]  B

 Write-back cycle (WB):


• Register-Register ALU instruction: Regs[IR16..20] ← ALUOutput
• Register-Immediate ALU instruction: Regs[IR11..15] ← ALUOutput
Load instruction:

Regs[IR11..15] ← LMD
Operation:
 Write the result into the register file, whether it comes from the memory system (LMD) or from the ALU
(ALUOutput)
 The register destination field is in one of two positions, depending on the opcode
Limitations on practical depth of a pipeline arise from:

 Pipeline latency. The fact that the execution time of each instruction does not decrease
puts limitations on pipeline depth;
 Imbalance among pipeline stages. Imbalance among the pipe stages reduces
performance since the clock can run no faster than the time needed for the slowest
pipeline stage;
 Pipeline overhead. Pipeline overhead arises from the combination of pipeline register
delay (setup time plus propagation delay) and clock skew.

Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further
pipelining is useful, since there is no time left in the cycle for useful work.

Example
1. Consider a non-pipelined machine with 6 execution stages of lengths 50ns, 50ns, 60ns,
60ns, 50ns, and 50 ns.
i. Find the instruction latency on this machine.
ii. How much time does it take to execute 100 instructions?

Solution:

Instruction latency = 50+50+60+60+50+50= 320 ns


Time to execute 100 instructions = 100*320 = 32000 ns
2. Suppose a pipelining is introduced on this machine. Assume that when introducing
pipelining, the clock skew adds 5ns of overhead to each execution stage.
i. What is the instruction latency on the pipelined machine?
ii. How much time does it take to execute 100 instructions?

Solution:
Remember that in the pipelined implementation, the length of the pipe stages must all be the
same, i.e., the speed of the slowest stage plus overhead. With 5ns overhead it comes to:

The length of pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns


Instruction latency = 65 ns
Time to execute 100 instructions = 65*6*1 + 65*1*99 = 390 + 6435 = 6825 ns

3. What is the speedup obtained from pipelining?

Solution:
Speedup is the ratio of the average instruction time without pipelining to the average instruction
time with pipelining.
Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns
Speedup for 100 instructions = 32000 / 6825 = 4.69
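
The arithmetic in examples 1-3 generalises into a short helper; the stage lengths and overhead are parameters taken from the examples above:

    def unpipelined_time(stages, n):
        """n instructions, each passing through every stage in sequence."""
        return sum(stages) * n

    def pipelined_time(stages, n, overhead):
        """Clock = slowest stage + overhead; the first instruction takes one
        cycle per stage, then one instruction completes per cycle."""
        clock = max(stages) + overhead
        return clock * (len(stages) + (n - 1))

    stages = [50, 50, 60, 60, 50, 50]     # ns, from example 1
    print(unpipelined_time(stages, 100))  # 32000 ns
    print(pipelined_time(stages, 100, 5)) # 65 * (6 + 99) = 6825 ns
    print(unpipelined_time(stages, 100) / pipelined_time(stages, 100, 5))  # ~4.69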
