CS385 Computer Architecture
Spring-2012
Classes: MW 4:30 pm - 5:45 pm, Nicolaus Copernicus Hall 22414
Instructor: Dr. Zdravko Markov, MS 307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/,
e-mail: markovz at ccsu dot edu
Office hours: MW 9:30 am - 10:45 am, T 9:30 am - 12:00 pm, or
by appointment
Catalog description: The architecture of the computer is explored
by studying its various levels: physical level, operating-system level,
conventional machine level and higher levels. An introduction to microprogramming
and computer networking is provided.
Course Prerequisites: CS 354
Prerequisites by topic
-
Basic skills in software design and programming
-
Assembly language programming and basics of computer organization
-
Digital systems design
-
Boolean algebra and discrete mathematics
Course description
The course provides comprehensive coverage of computer architecture.
It discusses the main components of the computer and the basic principles
of its operation. It demonstrates the relationship between the software
and the hardware and focuses on the foundational concepts that are the
basis for current computer design. The course is based on the MIPS processor,
a simple, clean RISC processor whose architecture is easy to learn and understand.
The major topics covered in the course are the following:
-
MIPS instruction set
-
Computer arithmetic and ALU design
-
Datapath and control
-
Using Hardware Description Language to design and simulate the CPU
-
Pipelining
-
Memory hierarchy, caches and virtual memory
-
Interfacing CPU and peripherals, buses
-
Multiprocessors, networks of multiprocessors, parallel programming
-
Performance issues
Course goals and CS Department's objectives and learning outcomes
Upon successful completion of the course the student will be able to
-
Understand the fundamentals of different instruction set architectures
and their relationship to the CPU design.
-
Understand the principles and the implementation of computer arithmetic.
-
Understand the operation of modern CPUs including pipelining, memory systems
and buses.
-
Understand the principles of operation of multiprocessor systems.
-
Design and emulate a single-cycle or pipelined CPU from given specifications
using HDL.
-
Write simple parallel programs.
-
Work in teams to design and implement CPUs.
-
Write reports and make presentations about their computer architecture
projects.
CS 385 is part of the core CS program and is designed in accordance with
the Department's general objectives and the program's educational outcomes
as specified in the Department
Mission Statement. It supports three of the general objectives with
the following outcomes:
-
Objective 1. Graduates will have a broad understanding of the fundamental
theories, concepts, and applications of computer science.
-
Outcome a: An ability to apply knowledge of computing and mathematics appropriate
to the discipline
-
Outcome b: An ability to analyze a problem, and identify and define the
computing requirements appropriate to its solution
-
Outcome c: An ability to design, implement, and evaluate a computer-based
system, process, component, or program to meet desired needs.
-
Objective 2. Graduates will be prepared for careers in computer science
and information technology.
-
Outcome i: An ability to use current techniques, skills, and tools necessary
for computing practice.
-
Outcome j: An ability to apply mathematical foundations, algorithmic principles,
and computer science theory in the modeling and design of computer-based
systems in a way that demonstrates comprehension of the tradeoffs involved
in design choices.
-
Objective 3. Graduates will communicate effectively, both orally and in
writing.
-
Outcome (d): An ability to function effectively on teams to accomplish
a common goal.
-
Outcome (f): An ability to communicate effectively.
Required textbook: David
A. Patterson and John
L. Hennessy, Computer Organization and Design: The Hardware/Software
Interface, Fourth Edition, Elsevier, 2008, ISBN: 978-0-12-374493-7.
Required software:
-
Icarus
Verilog: HDL compiler and simulator, available on the CD accompanying Patterson and
Hennessy's book or at http://bleyer.org/icarus/.
Note about installation: don't use folder names that include spaces (like
Program Files). Read book sections B.4 and 5.8 for using HDL.
-
Other simulators that may be used for drawing logic diagrams and experimenting
with small circuits (note that the semester project should be done with
Verilog):
-
SPIM simulator:
A free software simulator for running MIPS R2000 assembly language programs
available for Unix, DOS, and Windows.
Semester project: There will be a semester project to build a simplified
MIPS machine. The projects will be done in teams of 2-3 people each and
will require three progress reports, a final report and a presentation.
The machine must be implemented in HDL Verilog, tested with a sample MIPS
program and properly documented. The progress and the final reports must
be submitted through the Blackboard Vista course management system available
through CentralPipeline
(Student > Blackboard Vista Courses > CS-385) or directly at https://vista.csus.ct.edu/webct/logon/29935382119061.
Class Participation: Active participation in class is expected
of all students. Regular attendance is also expected. If you must miss
a class, try to inform the instructor in advance. In case of missed
classes and work due to plausible reasons (such as illness or accidents)
limited assistance will be offered. Unexcused absences will result in
the student being totally responsible for the make-up process.
Honesty policy: The CCSU honor code for Academic Integrity is
in effect in this class. It is expected that all students will conduct
themselves in an honest manner and NEVER claim work which is not their
own. Violating this policy will result in a substantial grade penalty,
and may lead to expulsion from the University. The policy is available online at
http://web.ccsu.edu/academicintegrity/UndergradAcadMisconductPolicy.htm.
Please read it carefully.
Grading: Grading will be based on one programming assignment
(10%), a midterm test (20%, taken online), a final exam (25%, taken online)
and a semester project (45%, including progress reports, the final documentation,
and the presentation). The letter grades will be calculated according to
the following table:
| Grade | A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F |
| Score | 95-100 | 90-94 | 87-89 | 84-86 | 80-83 | 77-79 | 74-76 | 70-73 | 67-69 | 64-66 | 60-63 | 0-59 |
Unexcused late submission policy: Submissions made more than
two days after the due date will be graded one letter grade down.
Submissions made more than a week late will receive two letter
grades down. No submissions will be accepted more than two weeks
after the due date.
Tentative schedule of classes and assignments
Note: Dates for classes, assignments and tests may change. The
lecture notes may also be updated. Check the schedule and the class pages
regularly for updates!
-
January 18: Introduction:
Computer Architecture = Instruction Set Architecture + Machine Organization
-
January 23: MIPS
Instructions: arithmetic, registers, memory, fetch-execute cycle
-
January 25: MIPS
Instructions: control and addressing modes
-
January 30: Computer
arithmetic and ALU design: representing numbers, arithmetic and logic operations
-
February 1: ALU
design: full adder, slt operation, HDL design, carry lookahead
-
February 6: Assignment 1 due (10 pts.)
-
February 6: ALU
design: multiplication, representing floating point numbers
-
February 8: The
Processor: Building a datapath
-
February 13: The
Processor: Control (single cycle approach)
-
February 15: Using
a Hardware Description Language to Design and Simulate the MIPS processor
-
February 22: Turing
machines
-
February 27: Introduction
to pipelining
-
February 29: Progress Report #1 due (10 pts.): A simplified single-cycle
datapath capable of executing the addi instruction and all R-type instructions.
See Semester
Project for details.
-
March 5: Solving
pipeline hazards
-
March 7: Implementing
pipeline datapath and control
-
March 12: Implementing
data and branch hazards control
-
March 14: Review
of Datapath, Control and Pipelining, HDL implementation
-
March 14-18: Midterm
Test (20 pts.) to be taken online in BB Vista
-
March 28: Progress Report #2 due (10 pts.): Complete single-cycle datapath.
See Semester
Project for details.
-
March 26: Memory
hierarchy
-
April 2: The
Basics of caches
-
April 4: Improving
cache performance
-
April 9: Virtual
Memory basics
-
April 11: Virtual
Memory optimization
-
April 16: Progress Report #3 due (10 pts.): 3-stage pipelined datapath
for addi and R-type instructions. See Semester
Project for details.
-
April 16: A
general framework of memory hierarchies
-
April 16: Interfacing
Processors and Peripherals - Buses
-
April 23: Interfacing
I/O devices to Memory, CPU and OS
-
April 25: Multiprocessors
-
April 30, May 2: Networks
of multiprocessors
-
May 12: Semester
Project Final Report due (10 pts.)
-
May 7-12: Final
Exam (25 pts.) to be taken online
-
May 7, 4:30 - 6:30 pm: Semester project presentation (5 pts.)
CS385 Computer Architecture, Lecture 1
Reading: Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set
Architecture + Machine Organization.
Lecture
slides (PDF)
Lecture Notes
-
Levels of Abstraction
-
Computer Architecture = Instruction Set Architecture + Machine Organization
-
Instruction Set: The Software/Hardware Interface
-
Levels of Computer Architecture in More Depth
-
Software:
-
Application
-
Operating System
-
Firmware
-
Instruction Set Architecture:
-
Organization of Programmable Storage
-
Data type and Structures: encodings and machine representation
-
Instruction set
-
Instruction Formats
-
Addressing Modes and Accessing Data and Instructions
-
Exception Handling
-
Hardware:
-
Instruction Set Processing
-
I/O System
-
Digital Design
-
Circuit Design
-
Layout
-
Basic Components of a Computer
-
Processor: Datapath and Control
-
Memory
-
I/O
-
Computer Organization
-
Capabilities and Performance of the Basic Functional Units
-
The Way These Units are Interconnected
-
Information Flow between components
-
Information Flow Control
-
Performance (Lecture
slides (PDF))
-
Measuring and improving computer performance:
-
Program execution time
-
CPU time: user CPU time and system CPU time
-
Power
-
Evaluating computer systems:
-
Relative performance
-
Workload
-
Benchmarks
-
Multiprocessors and Parallelism
-
Single instruction stream, single data stream (SISD)
-
Single instruction stream, multiple data streams (SIMD)
-
Multiple instruction streams, single data stream (MISD)
-
Multiple instruction streams, multiple data streams (MIMD)
CS385 Computer Architecture, Lecture 2
Reading: Patterson & Hennessy - Sections 2.1 - 2.3, 2.5 - 2.7,
2.10, 2.13, 2.16 - 2.20, B.9, "Spim, pcspim, and xspim" in Section "Software"
on the CD.
Topics: MIPS instructions, arithmetic, registers, memory, fetch-execute
cycle
Lecture
slides (PDF)
Lecture Notes
-
Design goal: maximize performance and minimize cost. Primitive (low level)
and very restrictive instructions (fixed number and type of operands).
-
Design principles:
-
Simplicity favors regularity (uniform instruction format)
-
Smaller is faster (only 32 registers)
-
Good design demands a compromise (I-type instructions)
-
MIPS arithmetic: 3 operands, fixed order, registers only.
-
Using only registers: R-type instructions.
-
Registers: 32-bits long, conventions.
-
Memory organization: words and byte addressing.
-
Data transfer (load and store) instructions. Example: accessing array elements.
-
Translating C code into MIPS instructions: the swap example.
-
Machine Language: instruction format, I-type (Immediate) format for data
transfer
-
Stored program concept: programs in memory, fetch-execute cycle
-
Von Neumann Architecture
-
CPU, Memory System, I/O system
-
Stored program concept: programs in memory, fetch-execute cycle
-
Instructions are executed sequentially
-
Turing machines
-
Non-Von Neumann Architecture:
-
Various parallel and multiprocessor architectures
-
In a broader sense: neural networks (NN), genetic algorithms (GA), etc.
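The R-type and I-type formats listed above can be illustrated with a small encoder. This is a minimal Python sketch, not part of the course materials: the function names `encode_r` and `encode_i` are hypothetical, while the field widths and register numbers are those of standard 32-bit MIPS.

```python
# Illustrative sketch: packing MIPS32 fields into a 32-bit instruction word.
# encode_r and encode_i are hypothetical helper names.

def encode_r(rs, rt, rd, shamt, funct, op=0):
    """R-type layout: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-type layout: op(6) rs(5) rt(5) immediate(16); the immediate is cut to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# Register numbers by convention: $t0 = 8, $s1 = 17, $s2 = 18, $s3 = 19
add_word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=0x20)  # add $t0, $s1, $s2
lw_word = encode_i(op=0x23, rs=19, rt=8, imm=32)              # lw  $t0, 32($s3)

print(f"{add_word:08x}")  # 02324020
print(f"{lw_word:08x}")   # 8e680020
```

The printed words can be compared with what the SPIM simulator shows for the same instructions.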
Exercises: Load this
program in the SPIM simulator and analyze the format of the instructions.
Run the program with different values of X and Y and trace the execution
in step mode.
CS385 Computer Architecture, Lecture 3
Reading: Patterson & Hennessy - Chapter 2
Topics: MIPS Instructions: control and addressing modes
Lecture
slides (PDF)
Lecture Notes
-
Implementing the C code for if in MIPS: conditional branch.
-
Implementing the C code for if-else in MIPS: unconditional branch
-
Simple for loop
-
Check for less-than: building a pseudoinstruction for branch if less-than.
-
Addressing in branch instructions: PC-relative and pseudodirect.
-
Constants: use of immediate addressing (constants as operands: addi, slti,
andi, ori).
-
32-bit constants: manipulate the upper 2 bytes separately (load upper immediate)
-
Summary of MIPS addressing: register (add), immediate (addi), base or displacement
(lw), PC-relative (bne), pseudodirect (j).
-
Alternative approaches: IA-32
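The PC-relative, pseudodirect, and load-upper-immediate calculations summarized above are simple bit manipulations. The sketch below models them in Python under the standard MIPS32 rules; the function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of MIPS address formation; helper names are hypothetical.

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, imm16):
    """PC-relative: target = (PC + 4) + sign-extended word offset * 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """Pseudodirect: upper 4 bits of PC + 4 joined with the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

def split_constant(value):
    """Upper and lower halves of a 32-bit constant, as loaded by lui followed by ori."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

print(hex(branch_target(0x00400010, 3)))    # 0x400020
print(hex(jump_target(0x00400010, 0x100000)))
print(split_constant(4000000))              # (61, 2304)
```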
CS385 Computer Architecture, Lecture 4
Reading: Patterson & Hennessy - Sections 3.1, 3.2, C.5 (CD).
Topics: Computer arithmetic and ALU design: representing numbers,
arithmetic and logic operations
Lecture
slides (PDF)
Lecture Notes
-
Representing numbers: sign bit, one's complement, two's complement.
-
Arithmetic: addition, subtraction, detecting overflow.
-
Logical operations: shift, and, or.
-
Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
-
ALU for logical operations.
-
ALU for add, and, or.
-
Supporting subtraction
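As a rough illustration of the number representation and arithmetic topics above, the following Python sketch converts between signed integers and 32-bit two's complement patterns, detects overflow by comparing signs, and performs subtraction as addition of the inverted operand plus one. All function names are hypothetical; the overflow check uses the sign rule rather than carry bits.

```python
# Illustrative sketch of 32-bit two's complement arithmetic; names are hypothetical.
BITS = 32

def to_twos(value, bits=BITS):
    """Bit pattern (as a non-negative int) of a signed value."""
    return value & ((1 << bits) - 1)

def from_twos(pattern, bits=BITS):
    """Signed value of a bit pattern."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern

def add_with_overflow(a, b, bits=BITS):
    """Overflow occurs when both operands share a sign that the result does not have."""
    result = from_twos(to_twos(a + b, bits), bits)
    overflow = (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)
    return result, overflow

def subtract(a, b, bits=BITS):
    """a - b computed as a + (bitwise inverse of b) + 1, as an ALU does."""
    mask = (1 << bits) - 1
    inverted_b = ~to_twos(b, bits) & mask
    return from_twos((to_twos(a, bits) + inverted_b + 1) & mask, bits)

print(hex(to_twos(-1)))                    # 0xffffffff
print(add_with_overflow(2**31 - 1, 1))     # (-2147483648, True)
print(subtract(7, 10))                     # -3
```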
Exercises: Implement an overflow detection unit using only the CarryIn
and CarryOut bits of ALU-31
Tutorials and practice quizzes on two's complement numbers:
CS385 Computer Architecture, Lecture 5
Reading: Patterson & Hennessy - C.5 (CD).
Topics: ALU design: full adder, slt operation, HDL design, carry
lookahead
Lecture
slides (PDF)
Programs: 2-1-mux.vl,
4-bit-adder.vl,
more
examples of Verilog programs, ALU4.vl,
mips-alu.vl
Lecture Notes
-
Implementation of a full adder:
-
Carry out logic
-
Result logic: using 'and', 'or' and inverter and using xor-gate.
-
Supporting set on less-than (slt).
-
Test for equality (needed for branching)
-
Designing the ALU in Verilog
-
Carry Lookahead
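The full adder logic and the carry equations can be checked in software before writing them in Verilog. The sketch below is a Python model, not a Verilog solution: `full_adder`, `ripple_add`, and `lookahead_carries` are hypothetical names, and the last function evaluates the recurrence c(i+1) = g(i) OR (p(i) AND c(i)) one bit at a time, which is the relation that carry lookahead hardware expands into two-level logic.

```python
# Illustrative gate-level model of a 4-bit adder; names are hypothetical.

def full_adder(a, b, carry_in):
    """One-bit full adder on 0/1 values: xor for the sum, majority for the carry."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return total, carry_out

def ripple_add(x, y, bits=4, carry_in=0):
    """Chain full adders from bit 0 upward; returns (sum, final carry out)."""
    carry = carry_in
    result = 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

def lookahead_carries(x, y, bits=4, carry_in=0):
    """Carries c0..c_bits from generate (a AND b) and propagate (a OR b) signals."""
    carries = [carry_in]
    for i in range(bits):
        g = (x >> i) & (y >> i) & 1
        p = ((x >> i) | (y >> i)) & 1
        carries.append(g | (p & carries[i]))
    return carries

print(ripple_add(5, 6))          # (11, 0)
print(ripple_add(9, 8))          # (1, 1)
print(lookahead_carries(9, 8))   # [0, 0, 0, 0, 1]
```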
Exercises: Implement the 4-bit adder with carry lookahead logic
in Verilog using the structural specification approach (gate-level modeling).
CS385 Computer Architecture, Lecture 6
Reading: Patterson & Hennessy - Sections 3.3, 3.5.
Topics: ALU design: multiplication, representing floating point
numbers
Lecture
slides (PDF)
Lecture Notes
-
Implementing multiplication:
-
Using 64-bit adder;
-
Using 32-bit adder for the upper 32-bit of the product;
-
Avoiding the use of the multiplier register.
-
Floating point numbers
-
Scientific notation: (-1)^sign * significand * 2^exponent
-
Range and precision (overflow and underflow).
-
IEEE 754 floating point standard - allows integer comparison:
-
normalized representation
-
implicit leading 1
-
exponent is biased: exponent in [0..0 (most negative), 1..1 (most positive)]
-
bits of the significand represent the fraction between 0 and 1.
-
(-1)^S * (1 + s1*2^-1 + s2*2^-2 + ...) * 2^(exponent-bias)
-
Problems with floating point arithmetic
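The single-precision formula above can be tried out on concrete bit patterns. The following Python sketch splits a 32-bit word into sign, exponent, and fraction fields and evaluates the formula with a bias of 127; it handles normalized numbers only, and the function names are hypothetical.

```python
# Illustrative decoder for IEEE 754 single precision (normalized numbers only).
import struct

def decode_float32(bits):
    """Return (sign, exponent field, fraction field, value) for a 32-bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # (-1)^S * (1 + fraction) * 2^(exponent - bias), with the implicit leading 1
    value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value

def float_to_bits(x):
    """Bit pattern of x when stored as a single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(hex(float_to_bits(-0.75)))      # 0xbf400000
print(decode_float32(0xBF400000))     # (1, 126, 4194304, -0.75)
# For positive numbers the bit patterns compare like integers:
print(float_to_bits(1.5) < float_to_bits(2.0))  # True
```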
Tutorials and practice quizzes on floating point numbers:
CS385 Computer Architecture, Lecture 7
Reading: Sections 4.1 - 4.3
Topics: The Processor, Building a Datapath
Lecture
slides (PDF)
Programs: mips-regfile.vl,
mips-r-type.vl,
mips-r-type+addi.vl
Lecture Notes
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D-latches and D flip-flops
-
Clocking methodology: edge triggered
-
Fetching instructions and incrementing the program counter
-
Register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Demo:
CS385 Computer Architecture, Lecture 8
Reading: Patterson & Hennessy - Section 4.4
Topics: Single-cycle control
Lecture
slides (PDF)
Programs: mips-r-type.vl,
mips-r-type+addi.vl,
mips-simple.vl
Lecture Notes
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
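The two-level control scheme above (main control unit plus ALU control) can be written down as lookup tables. The Python sketch below follows the signal names and values of the single-cycle design in the textbook as far as they are recalled here; treat it as an illustration rather than the course's reference implementation. `None` marks a don't-care, and the helper names are hypothetical.

```python
# Illustrative lookup-table model of single-cycle MIPS control; None = don't care.

def signals(reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write, branch, alu_op):
    return {"RegDst": reg_dst, "ALUSrc": alu_src, "MemtoReg": mem_to_reg,
            "RegWrite": reg_write, "MemRead": mem_read, "MemWrite": mem_write,
            "Branch": branch, "ALUOp": alu_op}

# Main control unit: opcode -> control signals
MAIN_CONTROL = {
    0b000000: signals(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-type
    0b100011: signals(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0b101011: signals(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0b000100: signals(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}

# ALU control for R-type instructions: funct field -> 4-bit ALU control input
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}

def alu_control(alu_op, funct):
    """Map ALUOp and the function bits to the ALU control inputs."""
    if alu_op == 0b00:      # lw/sw: add to form the address
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b0110
    return FUNCT_TO_ALU[funct]

lw = MAIN_CONTROL[0b100011]
print(lw["MemRead"], bin(alu_control(lw["ALUOp"], 0)))  # 1 0b10
```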
CS385 Computer Architecture, Lecture 11
Reading: Patterson & Hennessy - C.4 (CD).
Topic: Using Hardware Description Language to Design and Simulate
the MIPS processor
-
Behavior model of MIPS - single cycle implementation: mips-simple.vl
-
Project version (progress report #2). Changes needed:
-
Adjust the word size (ALU, registers, memory etc.) to 16 bit
-
Modify datapath control to reflect the instruction set architecture
-
Add addi and bne instructions
-
Write a complete program for testing and test the CPU
Exercises
-
Implement the addi instruction by:
CS385 Computer Architecture,
Lecture 12
Turing machines
-
Turing
machines
-
Alan
Turing web page
-
A
Turing Machine Applet
-
A real, physical Turing
machine, with video
CS385 Computer Architecture, Lecture 13
Reading: Patterson & Hennessy - Section 4.5
Topic: Introduction to Pipelining
Lecture
slides (PDF)
Lecture Notes
-
Pipelining by analogy (laundry example):
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
Five stages of the load MIPS instruction
-
The pipelined datapath
-
Single cycle, multiple cycle vs. pipeline
-
Advantages of pipelined execution
-
Problems with pipelining (pipeline hazards)
-
Structural hazards
-
Data hazards
-
Control hazards
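The speedup claims above can be made concrete with a little arithmetic. The Python sketch below assumes stage latencies of 200, 100, 200, 200, and 100 ps for the five stages (illustrative values, not taken from these notes), a single-cycle clock period equal to the sum of all stages, and a pipelined clock period equal to the slowest stage. Function names are hypothetical.

```python
# Illustrative timing model for a 5-stage pipeline; latencies are assumed values.

def single_cycle_time(stage_times, n_instructions):
    """Every instruction takes one long cycle covering all stages."""
    return n_instructions * sum(stage_times)

def pipelined_time(stage_times, n_instructions):
    """The clock is set by the slowest stage; filling takes (stages - 1) extra cycles."""
    cycle = max(stage_times)
    return (len(stage_times) + n_instructions - 1) * cycle

stages_ps = [200, 100, 200, 200, 100]  # IF, ID, EX, MEM, WB (assumed)

for n in (3, 1_000_000):
    t_single = single_cycle_time(stages_ps, n)
    t_pipe = pipelined_time(stages_ps, n)
    print(n, t_single, t_pipe, round(t_single / t_pipe, 3))
```

With unbalanced stages the speedup approaches 800/200 = 4 rather than the number of stages (5), and for short programs the fill and drain time lowers it further.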
CS385 Computer Architecture, Lecture 14
Reading: Patterson & Hennessy - Section 4.5, 4.6
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture
slides I (PDF)
Lecture
slides II (PDF)
Lecture Notes
-
Structural hazards: single memory
-
Control hazards:
-
Stall: wait until decision is clear
-
Predict: fixed prediction (e.g. fail), dynamic prediction (based on history)
-
Delayed branch (software solution): the instruction before the branch is moved into the branch delay slot.
Before:
add $4, $5, $6
beq $1, $2, 40
lw $3, 300($0)
After:
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)
-
Data hazards (dependencies backwards in time):
-
Forwarding (bypassing)
-
Reordering code
Before:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
After (the two stores are swapped):
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
-
Designing a pipelined processor
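The effect of reordering code can be estimated with a very small hazard counter. The Python sketch below is a simplified model, assuming a five-stage pipeline with forwarding in which an instruction that reads a register loaded by the immediately preceding lw costs one stall cycle. The parser handles only the lw, sw, and three-register forms used here, and all names are hypothetical.

```python
# Illustrative load-use stall counter under a simplified pipeline model.
import re

def parse(line):
    """Return (opcode, destination register or None, set of source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "lw":
        return op, regs[0], {regs[1]}
    if op == "sw":
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])  # three-register form: rd, rs, rt

def load_use_stalls(program):
    """Count adjacent pairs where a lw result is needed by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, cur_sources = parse(cur)
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]

print(load_use_stalls(original))   # 1
print(load_use_stalls(reordered))  # 0
```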
CS385 Computer Architecture, Lecture 15
Reading: Patterson & Hennessy - Section 4.6, 4.12 (CD)
Topic: Implementing pipeline datapath and control
Lecture
slides (PDF)
Lecture Notes
-
Splitting datapath into stages: using registers to store parts of the instruction
-
Transferring data forward and backward between the stages: lw example
-
Corrected datapath: storing rd for the write back stage.
-
Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle
diagram
-
Pipeline control:
-
IF: no control signals to store for later stages (they are always asserted).
-
ID: no control signals to store for later stages (they are always asserted).
-
EX: set RegDst, ALUOp, ALUSrc
-
MEM: set Branch, MemRead, MemWrite
-
WB: set MemtoReg, RegWrite
-
Datapath with control
-
Example: running this code through the pipeline in 9 cycles.
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
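The 9-cycle count for this example follows from 5 stages plus 4 additional instructions. The Python sketch below builds a multiple-clock-cycle diagram for the five instructions, assuming no stalls (the instructions are independent of each other); the function name is hypothetical.

```python
# Illustrative multiple-clock-cycle pipeline diagram, assuming no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(program):
    """Row i lists the stage that instruction i occupies in each clock cycle."""
    total_cycles = len(STAGES) + len(program) - 1
    rows = []
    for i in range(len(program)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows

program = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

diagram = pipeline_diagram(program)
for instruction, row in zip(program, diagram):
    print(f"{instruction:<16}" + " ".join(f"{cell:<3}" for cell in row))
```

Reading one column of the output top to bottom gives the single-clock-cycle view: in cycle 5 all five stages are busy.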
Exercises: Section 4.12 (CD), pages 16-30.
Demo: https://www.cs.tcd.ie/Jeremy.Jones/vivio/dlx/showanim.php?name=Tutorial09
CS385 Computer Architecture, Lecture 16
Reading: Patterson & Hennessy - Section 4.7, 4.8
Topic: Implementing data and branch hazard control
Lecture
slides (PDF)
Lecture Notes
-
Detecting data dependencies
-
EX/MEM.Rd = ID/EX.Rs
EX/MEM.Rd = ID/EX.Rt
-
MEM/WB.Rd = ID/EX.Rs
MEM/WB.Rd = ID/EX.Rt
-
Forwarding
-
if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rs) ForwardA = 10
if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rt) ForwardB = 10
-
if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rs) ForwardA = 01
if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rt) ForwardB = 01
-
Data hazards and stalls
If (ID/EX.MemRead and
(ID/EX.Rt = IF/ID.Rs or
ID/EX.Rt = IF/ID.Rt))
stall the pipeline
-
Branch hazards
-
Reducing the delay of branches - move up the address calculation (move
the adder) and the branch decision (add XOR and AND gates)
-
Assuming the branch will not be taken
-
Flushing instructions in IF, ID, and EX stages, if the branch is taken
-
Advanced pipelining
-
Superpipelining
-
Superscalar
-
Dynamic scheduling
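The forwarding and stall conditions above translate almost directly into code. The Python sketch below models pipeline registers as dictionaries (a hypothetical representation). Beyond the conditions listed in the notes, it adds two refinements from the textbook: a result destined for register 0 is never forwarded, and the EX/MEM result takes priority over the MEM/WB result.

```python
# Illustrative forwarding unit and hazard detection; pipeline registers are dicts.

def forwarding_unit(id_ex, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) as 2-bit multiplexer selectors."""
    def select(source_reg):
        if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source_reg:
            return 0b10  # take the ALU result from the previous instruction
        if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == source_reg:
            return 0b01  # take the value about to be written back
        return 0b00      # no hazard: use the register file value
    return select(id_ex["Rs"]), select(id_ex["Rt"])

def must_stall(id_ex, if_id):
    """Load-use hazard: the instruction in ID needs the register a lw in EX will load."""
    return bool(id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"]))

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX
print(forwarding_unit({"Rs": 2, "Rt": 5},
                      {"RegWrite": 1, "Rd": 2},
                      {"RegWrite": 0, "Rd": 0}))          # (2, 0)
# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID
print(must_stall({"MemRead": 1, "Rt": 2}, {"Rs": 2, "Rt": 5}))  # True
```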
CS385 Computer Architecture, Lecture 17
Reading: Patterson & Hennessy - Chapter 4, Section 4.12 (CD)
Topic: Review of Datapath, Control and Pipelining, HDL implementation
Programs: mips-pipe.vl
Lecture
slides (PDF)
Lecture Notes
Datapath
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D-latches and D flip-flops
-
Clocking methodology: edge triggered
-
Basic operations
-
Instructions fetch
-
Accessing register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Control
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
-
Multicycle Approach to Processor Control
-
Basic principles of the Multicycle Approach to Processor Control
-
Breaking up instruction into steps
-
Reusing functional units in different steps
-
Storing intermediate results
-
Need for controlling the sequence of steps
-
Execution steps:
-
Instruction fetch
-
Instruction decoding and register fetch
-
Instruction dependent (memory reference, R-type execution or branch)
-
R-type completion or memory access
-
Memory read completion
-
Finite state machine control
-
Microprogramming
Pipelining
-
Basic principles of pipelining
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
MIPS pipelining: the five stages of the lw instruction
-
Problems with pipelining:
-
Structural hazards
-
Data hazards
-
Control hazards
-
Designing a pipelined processor
-
Transferring data forward and backward between the stages
-
Pipeline control
-
Implementing data and branch hazard control
-
Detecting data dependencies
-
Forwarding
-
Data hazards and stalls
-
Branch hazards
-
Advanced pipelining
HDL implementation: mips-pipe.vl
Demo:
CS385 Computer Architecture, Lecture 18
Reading: Patterson & Hennessy - Section 5.1
Topic: Memory Hierarchy
Lecture
slides (PDF)
Lecture Notes
-
Memory technologies and trends
-
Impact on performance
-
The need for hierarchical memory organization
-
The principle of locality
-
Memory hierarchy terminology
-
Basics of RAM implementation
-
SRAM: D-latches, three-state buffers, address decoders, two level
addressing
-
DRAM: DRAM cell, refreshing
-
Error detection and correction
CS385 Computer Architecture, Lecture 19
Reading: Patterson & Hennessy - Section 5.2
Topic: The Basics of caches
Lecture
slides (PDF)
Lecture Notes
-
Direct-mapped cache
-
Accessing a cache
-
Writing to the cache (write-through and write-back schemes)
-
Handling cache misses
-
Read miss: load the word from memory
-
Write miss: write both to the cache and to the memory (using write buffer)
-
Example: DECStation 3100 cache
-
Spatial locality caches: keeping consistency on write
-
Main memory organization
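Accessing a direct-mapped cache starts with splitting the address into tag, index, and byte offset. The Python sketch below shows that split for power-of-two sizes; the function name is hypothetical, and the example configuration (16K one-word blocks, i.e. 64 KB) is the DECStation 3100 cache as described in the textbook.

```python
# Illustrative address split for a direct-mapped cache (sizes must be powers of two).

def split_address(address, num_blocks, block_bytes=4):
    """Return (tag, index, byte offset) for a byte address."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    offset = address & (block_bytes - 1)
    block_address = address >> offset_bits
    index = block_address % num_blocks      # cache index = block address mod number of blocks
    tag = block_address >> index_bits       # remaining upper bits identify the block
    return tag, index, offset

NUM_BLOCKS = 2**14  # 16K one-word blocks: 2 offset bits, 14 index bits, 16 tag bits

print(split_address(0x00010004, NUM_BLOCKS))  # (1, 1, 0)
print(split_address(0xFFFFFFFF, NUM_BLOCKS))  # (65535, 16383, 3)
```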
CS385 Computer Architecture, Lecture 20
Reading: Patterson & Hennessy - Section 5.3
Topic: Improving cache performance
Lecture
slides (PDF)
Lecture Notes
-
Measuring cache performance
-
Stall clock cycles = Instructions * Miss rate * Miss penalty
-
Example 1 (reducing CPI):
-
2% - instruction miss; 4% - data miss; 36% - lw/sw; 40 cyc - miss penalty.
-
2 CPI => CPI_stall = 3.36, i.e. perfect cache is 1.68 times faster.
-
1 CPI => CPI_stall = 2.36, i.e. perfect cache is 2.36 times faster.
-
Example 2: doubling clock rate => 80 cyc - miss penalty, CPI_stall = 4.75,
Performance: 3.36/(4.75/2) = 1.41 times faster with stalls, 2 times faster without
stalls.
-
Conclusion: cache penalties increase as the machine becomes faster
-
Flexible placement of blocks in the cache
-
Direct mapped: Cache index = (Block address) modulo (Number of blocks in
cache); no search; small tag.
-
Set associative: Cache index = (Block address) modulo (Number of sets in
cache); search the set; larger tag.
-
Fully associative: Cache index is not determined; search the whole cache;
tag = address.
-
Locating a block in the cache: N-way cache requires N comparators and N-way
multiplexor
-
Choosing which block to replace: least recently used
-
Multilevel caches
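The CPI figures in Examples 1 and 2 can be reproduced with a few lines of arithmetic. The Python sketch below uses a hypothetical function name and computes everything without intermediate rounding, so the results come out slightly above the values quoted in the notes (3.376 rather than 3.36, for instance); the difference appears to come from rounding the data-miss term.

```python
# Illustrative CPI calculation with memory stalls; the function name is hypothetical.

def cpi_with_stalls(base_cpi, instr_miss_rate, data_miss_rate, mem_access_fraction, miss_penalty):
    """CPI including stall cycles per instruction from instruction and data misses."""
    instr_stalls = instr_miss_rate * miss_penalty
    data_stalls = mem_access_fraction * data_miss_rate * miss_penalty
    return base_cpi + instr_stalls + data_stalls

# Example 1: 2% instruction misses, 4% data misses, 36% lw/sw, 40-cycle penalty
cpi_slow = cpi_with_stalls(2, 0.02, 0.04, 0.36, 40)
print(round(cpi_slow, 3), round(cpi_slow / 2, 2))   # 3.376 1.69

# Example 2: clock rate doubled, so the same miss costs 80 of the shorter cycles
cpi_fast = cpi_with_stalls(2, 0.02, 0.04, 0.36, 80)
speedup = cpi_slow / (cpi_fast / 2)                 # each fast cycle takes half the time
print(round(cpi_fast, 3), round(speedup, 2))        # 4.752 1.42
```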
Exercises
-
Write a sequence of memory references for which:
-
the direct mapped cache performs better than the 2-way associative cache;
-
the 2-way associative cache performs better than the fully associative
cache.
-
Exercises 5.3, 5.4 (pages 550, 551).
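A small simulator can help when experimenting with reference sequences like those asked for in the exercises. The Python sketch below counts misses for a cache with a given number of one-word blocks and a given associativity, using least-recently-used replacement; `count_misses` is a hypothetical name, and the sample sequence is only a starting point, not an answer to the exercise.

```python
# Illustrative set-associative cache model with LRU replacement.

def count_misses(block_addresses, num_blocks, ways):
    """ways = 1 is direct mapped; ways = num_blocks is fully associative."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # each list is ordered from LRU to MRU
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]    # set index = block address mod number of sets
        if block in lines:
            lines.remove(block)           # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)              # evict the least recently used block
        lines.append(block)
    return misses

references = [0, 8, 0, 6, 8]
for ways in (1, 2, 4):
    print(ways, count_misses(references, num_blocks=4, ways=ways))
```

For this sequence and a four-block cache the miss counts are 5, 4, and 3 for direct-mapped, 2-way, and fully associative placement respectively.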
CS385 Computer Architecture, Lecture 21
Reading: Patterson & Hennessy - Section 5.4
Topic: Virtual Memory
Lecture
slides (PDF)
Lecture Notes
-
The need for VM
-
Many programs (processes) can use a single memory
-
Use a memory exceeding the size of the main memory
-
VM organization and terminology: virtual address, physical address, page,
page offset, page fault, memory mapping (translation).
-
Design decisions motivated by the very high cost of page faults:
-
Large pages (4K-64K)
-
Reducing page fault penalties: fully associative VM
-
Software management of page faults
-
Write-back instead of write-through
-
Addressing pages:
-
Page table, page table register
-
Processes (active, inactive) and page tables
-
Page faults
-
Replacing pages: LRU, reference (use) bit
-
Write-back scheme (dirty bit)
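Address translation through a page table can be sketched in a few lines. The Python model below assumes 4 KB pages and represents the page table as a dictionary from virtual page number to an entry with `valid`, `frame`, and `referenced` fields; this representation and all names are hypothetical and stand in for the hardware page table and reference bit.

```python
# Illustrative virtual-to-physical address translation; data layout is hypothetical.
PAGE_SIZE = 4096  # 4 KB pages, the low end of the range mentioned above

class PageFault(Exception):
    """Raised when the page is not in main memory; the OS would handle it in software."""

def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Split the address into page number and offset, then look up the frame."""
    page_number, offset = divmod(virtual_address, page_size)
    entry = page_table.get(page_number)
    if entry is None or not entry["valid"]:
        raise PageFault(page_number)
    entry["referenced"] = True  # reference (use) bit, consulted when approximating LRU
    return entry["frame"] * page_size + offset

page_table = {
    1: {"valid": True, "frame": 7, "referenced": False},
    2: {"valid": False, "frame": None, "referenced": False},
}

print(hex(translate(0x1234, page_table)))  # 0x7234
try:
    translate(0x2000, page_table)
except PageFault as fault:
    print("page fault on virtual page", fault.args[0])
```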
CS385 Computer Architecture, Lecture 22
Reading: Patterson & Hennessy - Section 5.4
Topic: Virtual Memory optimization
Lecture
slides (PDF)
Lecture Notes
-
Optimizing address translation - Translation Lookaside Buffer (TLB):
-
TLB miss
-
Page fault
-
TLB associativity
-
MIPS R2000 (DECStation 3100) TLB
-
Overall operation of a memory hierarchy
-
Memory protection with VM
-
Using exceptions for handling TLB misses and page faults: using EPC and
Cause registers
-
Summary of VM
CS385 Computer Architecture, Lecture 23
Reading: Patterson & Hennessy - Section 5.5, 5.6.
Topic: A general framework of memory hierarchies
Lecture
slides (PDF)
Lecture Notes
-
Associativity schemes
-
Placing blocks
-
Miss rates and cache sizes
-
Finding blocks
-
Why do we use full associativity and a separate lookup table (page table)
in VM
-
Choosing a block to replace
-
Writing blocks
-
The sources of misses
-
The challenge: design changes that reduce the miss rate can have a negative
effect on the overall performance (e.g. by increasing hit time or miss penalty)
-
Pentium Pro and PowerPC 604
Exercises
CS385 Computer Architecture, Lecture 24
Reading: Sections 6.1 - 6.5
Topic: Interfacing Processors and Peripherals - Buses
Lecture
slides (PDF)
Lecture Notes
-
Buses: lines, transactions, types
-
Synchronous and asynchronous buses
-
Handshaking protocol
-
Bus access: master and slave
-
Bus arbitration schemes
-
Bus standards
CS385 Computer Architecture, Lecture 25
Reading: Section 6.6 - 6.8
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture
slides (PDF)
Lecture Notes
-
The role of the operating system in interfacing I/O devices to Memory
-
Controlling the I/O devices
-
Memory mapped I/O
-
Special I/O instructions
-
Communicating with the processor
-
Polling
-
Interrupt-driven I/O
-
Direct memory access (DMA)
-
DMA and the memory system
-
Designing an I/O system: latency and bandwidth constraints.
CS385 Computer Architecture, Lecture 26
Reading: Section 7.1 - 7.3
Topic: Multiprocessors
Lecture
slides (PDF)
COD-Chapter7.pdf
Lecture Notes
-
Basic approaches to sharing data and types of connectivity
-
Programming multiprocessors
-
Multiprocessors connected by a single bus
-
A parallel program
-
Multiprocessor cache coherency
-
Implementing a multiprocessor cache coherency protocol
-
Synchronization using coherency
CS385 Computer Architecture, Lecture 27
Reading: Section 7.4 - 7.10
Topic: Networks of multiprocessors and clusters
Lecture
slides (PDF)
COD-Chapter7.pdf
Lecture Notes
-
Shared memory vs. multiple private memories
-
Centralized memory vs. distributed memory
-
Parallel programming by message passing
-
Distributed memory communication
-
Memory allocation
-
Clusters and network topology
-
Modern clusters:
CS385 Assignment 1: Assembly Programming in
MIPS (maximum grade 10 points)
Write a program in MIPS assembler to perform some useful computation (e.g.
calculate sales tax, convert temperature from Celsius into Fahrenheit).
Use integer arithmetic and round results.
The program must include:
-
At least one instruction from each instruction type: R-type arithmetic,
I-type arithmetic, Memory transfer and Branch.
-
Input and Output through system calls.
-
Detailed comments explaining the type, format and the meaning
of each instruction including the compiler directives. If you use
a pseudo instruction, there must be an explanation how it translates into
real MIPS instructions and the latter must be commented too.
Use the SPIM simulator to debug and run the program. Appendix A of the
text (on CD) provides reference information about MIPS assembly programming.
You may find additional information about MIPS programming using SPIM at
CS
254 - Computer organization and assembly language programming and Introduction
to RISC Assembly Language Programming.
Documentation and submission: Submit the source of the program
(plain text file) as an attachment through Blackboard Vista Courses > CS-385
> Assignment 1.
CS385 Semester Project: Building a mini
MIPS machine (maximum grade 45 points including the presentation)
Note that this is a team project. So, you need to form teams of 2-3
members to accomplish it.
Description: Design a simplified version of a MIPS machine
and write Verilog programs that describe its structure and simulate
its functioning. Use structural (gate level) modeling for all components
unless otherwise specified. The machine should include the following components:
-
General purpose registers (register file): 4 registers, 16 bits long,
numbered 0 - 3. Register $0 must contain 0 (read-only). Implemented by
D flip-flops with gate-level modeling.
-
Other registers: 16-bit program counter, pipeline registers. Implemented
by reg data type in Verilog.
-
Instruction Memory. Word size: 16 bits, word addressed, size: 1024
bytes. Implemented by reg data type in Verilog.
-
Data Memory. Word size: 16 bits, byte addressed, size: 1024 bytes.
Implemented by reg data type in Verilog.
-
Data Cache (optional): direct mapped, write-through, 16-bit block
size, size: 8 blocks. Any kind of Verilog model accepted.
-
ALU: 16-bit data, 3-bit control (and, or, add, sub, slt).
-
Control unit: may be implemented by behavioral modeling.
-
Other components necessary to connect the main components: multiplexers
and decoders implemented by gate-level modeling.
Instruction set
| Instruction | Opcode |
| add | 0000 |
| sub | 0001 |
| and | 0010 |
| or | 0011 |
| addi | 0100 |
| lw | 0101 |
| sw | 0110 |
| slt | 0111 |
| beq | 1000 |
| bne | 1001 |
Instruction formats:
R-format (add, sub, and, or, slt), field widths in bits:
| op | rs | rt | rd | unused |
| 4 | 2 | 2 | 2 | 6 |
I-format (addi, lw, sw, beq, bne), field widths in bits:
| op | rs | rt | address / value |
| 4 | 2 | 2 | 8 |
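To produce the machine code for the test program, the two formats above can be packed by hand or with a small helper. The sketch below is one possible encoder in Python. The function names encode_r and encode_i are hypothetical, and the operand order (rd, rs, rt for R-format) follows standard MIPS assembly, which is an assumption here. The unused R-format bits are set to 0, and the 8-bit address / value field is stored in two's complement.

```python
OPCODES = {
    "add": 0b0000, "sub": 0b0001, "and": 0b0010, "or": 0b0011,
    "addi": 0b0100, "lw": 0b0101, "sw": 0b0110, "slt": 0b0111,
    "beq": 0b1000, "bne": 0b1001,
}
R_TYPE = {"add", "sub", "and", "or", "slt"}
I_TYPE = {"addi", "lw", "sw", "beq", "bne"}


def check_reg(r):
    """Only registers 0 - 3 exist, so a register number needs 2 bits."""
    if not 0 <= r <= 3:
        raise ValueError(f"register number out of range: {r}")
    return r


def encode_r(name, rd, rs, rt):
    """R-format: op(4) rs(2) rt(2) rd(2) unused(6)."""
    if name not in R_TYPE:
        raise ValueError(f"{name} is not an R-format instruction")
    return ((OPCODES[name] << 12) | (check_reg(rs) << 10)
            | (check_reg(rt) << 8) | (check_reg(rd) << 6))


def encode_i(name, rs, rt, imm):
    """I-format: op(4) rs(2) rt(2) address/value(8)."""
    if name not in I_TYPE:
        raise ValueError(f"{name} is not an I-format instruction")
    if not -128 <= imm <= 255:
        raise ValueError(f"immediate does not fit in 8 bits: {imm}")
    return ((OPCODES[name] << 12) | (check_reg(rs) << 10)
            | (check_reg(rt) << 8) | (imm & 0xFF))
```

For example, format(encode_i("addi", 0, 1, 5), "016b") gives the bit string 0100000100000101 for addi $1, $0, 5, which can be pasted into the instruction memory initialization.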
Restrictions:
-
Use structural (gate level) modeling for all components except for
the program counter, memories, and pipeline registers.
-
Implement a pipelined datapath and control.
Extra credit (maximum 5 points): Implementing a carry lookahead
logic for the ALU, a data cache, additional MIPS instructions, or improvements
of the pipeline (forwarding or stalling).
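For teams considering the carry lookahead option, the following Python sketch shows the logic that the gates would have to implement: a 4-bit block whose carries are computed directly from generate and propagate signals, and a 16-bit adder built from four such blocks. It is only a software model under the assumption of 4-bit groups with carries rippling between groups; the names cla4 and add16 are made up for this illustration.

```python
def cla4(a, b, c0):
    """4-bit carry lookahead adder; returns (sum, carry_out). c0 must be 0 or 1."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate: a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # propagate: a_i OR b_i
    # Every carry is a two-level function of g, p and c0, so nothing ripples
    # inside the block.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1     # sum bit i
        total |= bit << i
    return total, c4


def add16(a, b, c0=0):
    """16-bit adder from four 4-bit lookahead blocks (carry ripples between blocks)."""
    result = 0
    carry = c0
    for k in range(4):
        s, carry = cla4((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF, carry)
        result |= s << (4 * k)
    return result, carry
```

Each of the expressions for c1 to c4 maps to one OR gate fed by AND gates, which is what makes a structural Verilog version straightforward to write from this model.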
Testing: To test the MIPS machine write a simple program
that includes arithmetic (add, sub), data transfer (lw, sw) and branch
(beq or bne) instructions. Use the addi instruction to introduce
numeric constants in your program. For example, this may be a program that
sums up 5 consecutive memory words by using a loop and stores the result
in a register. Include in your report:
-
The assembly source of the test program with comments explaining
the algorithm
-
The machine code (the contents of the memory)
-
Simulation results obtained by running the Verilog program. To monitor
the execution of the test program for each instruction display the value
at the write data input of the register file.
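To know what the Verilog simulation should display, it may help to run the test program through a software reference first. The Python sketch below interprets a possible version of the "sum 5 memory words" program and logs the write data for every instruction that writes a register. Everything specific in it is an assumption made for illustration: the program itself, the data values at byte addresses 2 to 10, the tuple notation for instructions, and the branch target being computed as PC + 1 + offset in words.

```python
MASK16 = 0xFFFF


def signed16(x):
    """Interpret a 16-bit pattern as a two's complement integer."""
    return x - 0x10000 if x & 0x8000 else x


def run(program, data, max_steps=1000):
    """Run the program; return (registers, memory, log of register writes)."""
    regs = [0, 0, 0, 0]
    mem = dict(data)                       # byte address -> 16-bit word
    log = []                               # (pc, mnemonic, write data)
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, x, y, z = program[pc]
        next_pc = pc + 1
        dest, value = None, None
        if op in ("add", "sub", "and", "or", "slt"):     # (op, rd, rs, rt)
            a, b = regs[y], regs[z]
            value = {
                "add": (a + b) & MASK16,
                "sub": (a - b) & MASK16,
                "and": a & b,
                "or": a | b,
                "slt": int(signed16(a) < signed16(b)),
            }[op]
            dest = x
        elif op == "addi":                               # (op, rt, rs, imm)
            dest, value = x, (regs[y] + z) & MASK16
        elif op == "lw":                                 # (op, rt, rs, offset)
            dest, value = x, mem.get((regs[y] + z) & MASK16, 0)
        elif op == "sw":                                 # (op, rt, rs, offset)
            mem[(regs[y] + z) & MASK16] = regs[x]
        elif op in ("beq", "bne"):                       # (op, rs, rt, offset)
            equal = regs[x] == regs[y]
            if equal == (op == "beq"):
                next_pc = pc + 1 + z                     # assumed PC+1-relative
        else:
            raise ValueError(f"unknown instruction {op!r}")
        if dest is not None:
            log.append((pc, op, value))
            if dest != 0:                                # $0 always reads as 0
                regs[dest] = value
        pc = next_pc
    return regs, mem, log


# Hypothetical test program: sum the words at byte addresses 10, 8, 6, 4, 2.
PROGRAM = [
    ("sub", 1, 1, 1),      # $1 = 0 (sum)
    ("addi", 2, 0, 10),    # $2 = address of the last word
    ("lw", 3, 2, 0),       # loop: $3 = mem[$2]
    ("add", 1, 1, 3),      # sum += $3
    ("addi", 2, 2, -2),    # step back one word
    ("bne", 2, 0, -4),     # repeat until the pointer reaches 0
    ("sw", 1, 0, 0),       # mem[0] = sum
]
DATA = {2: 1, 4: 2, 6: 3, 8: 4, 10: 5}

regs, mem, log = run(PROGRAM, DATA)
for pc, op, value in log:
    print(f"pc={pc} {op:<4} write_data={value}")
```

With the sample data above, register $1 ends at 15 and the same value is stored at address 0. The pointer counts down so that the loop needs only three registers besides $0, which is all this machine has.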
Progress reports: Three progress reports describing the current
status of the project and including the design of major components of the
MIPS machine should be submitted through Blackboard Vista. Each report
must include:
-
The names of the team members and the tasks that each one
accomplished.
-
A diagram showing the design of the processor with labels for all
components and signals corresponding exactly to the modules, inputs/outputs
and wire names used in the Verilog code.
-
The Verilog source code including detailed comments for each module
defined or used.
-
Test results showing correct functioning of the components, obtained
by running a test program. The source and the machine language
translation of the program must be included too.
The schedule for the progress reports and the presentation is the following:
-
Due on February 29: A simplified single-cycle datapath
capable of executing the addi instruction and all R-type
instructions. Major components: Instruction memory, ALU, and Register File.
Use template files: ALU4.vl (extend it to 16 bits) and regfile.vl
(replace the D flip-flops with 16-bit registers, redesign mux4x1 using
gate-level modeling, and extend it to 16-bit data). See the behavioral
implementation mips-r-type+addi.vl.
Include in the report: Verilog sources and test results. For testing use
the test program test-r-addi.asm
and adjust it to the 16-bit version of MIPS following the requirements
given in section "Testing".
-
Due on March 28: Complete single-cycle datapath. Implement
all instructions and run a complete test program as explained in section
"Testing". See the behavioral model of MIPS mips-simple.vl
for the implementation of the data memory and branching logic. Use the
test program from mips-simple.vl
extended with a bne instruction. Include in the report: A top-level
diagram of the datapath, Verilog source files and test results.
-
Due on April 16: 3-stage pipelined datapath for addi
and R-type instructions. Write a test program that includes all R-type instructions
and addi, and run it through the pipeline. Use the 3-stage behavioral
model mips-pipe3.vl and make the necessary changes. Include in the report:
A top-level diagram of the datapath, Verilog source files and test results.
-
May 7, 4:30 - 6:30 pm: Semester project presentation (5 pts.). Prepare
presentation slides (PPT or PDF) and make a 10-15 min presentation of your
project. Every team member should present his/her work as a part
of the project.
Final Project (due on May 12): Implement the final version
of the complete pipelined datapath. Use the behavioral model mips-pipe.vl
and make the necessary changes. Write a general description of the machine
and a short description of each major component. Include the Verilog code
with comments and the results from running the Verilog program simulating the
MIPS test program (see the "Testing" section above).
Submission of the final project: Submit the final project
report as a Word, HTML or PDF document through CCSU pipeline > Vista Courses
> CS-385. The submission must include the following items:
-
The report file with a general description of the machine architecture,
diagrams for the major components, instruction set and format, and description
of each major component. The report should also include:
-
The test program with comments showing the result that each instruction
produces and Verilog simulation output that matches these results. Follow
the directions in section "Testing" above.
-
The Verilog source code with comments.
-
The presentation slides (PPT or PDF)
CS385 Midterm Test (maximum grade 20 points)
The midterm test will be available in Blackboard Vista. There will be 20
multiple choice and short answer questions from the following topics: number
systems (binary, two's complement, floating point), MIPS instruction set
architecture and assembly programming, single-cycle datapath and control,
Verilog HDL, basics of pipelining, and solving pipeline hazards. Log on
to Blackboard Vista at https://vista.csus.ct.edu/webct/logon/29935382119061
for instructions.
CS385 Final Exam (maximum grade 25 points)
The Final Exam will be available in Blackboard Vista. There will be 20
multiple choice and short answer questions from the following topics: Pipelining,
Memory Hierarchy, Caches, Interfacing Peripherals, and Multiprocessors.
Log on to Blackboard Vista at https://vista.csus.ct.edu/webct/logon/29935382119061
for instructions.