CS385 Computer Architecture
Spring-2009
Classes: TR 6:45 pm - 8:00 pm, Maria Sanford Hall 203
Instructor: Dr. Zdravko Markov, MS 307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/,
e-mail: markovz at ccsu dot edu
Office hours: TR 10:00am - 12:30pm, or by appointment
Catalog description: The architecture of the computer is explored
by studying its various levels: physical level, operating-system level,
conventional machine level and higher levels. An introduction to microprogramming
and computer networking is provided.
Course Prerequisites: CS 354
Prerequisites by topic:
-
Basic skills in software design and programming
-
Assembly language programming and basics of computer organization
-
Digital systems design
-
Boolean algebra and discrete mathematics
Course description: The course provides a comprehensive coverage
of computer architecture. It discusses the main components of the computer
and the basic principles of its operation. It demonstrates the relationship
between the software and the hardware and focuses on the foundational concepts
that are the basis for current computer design. The course is based on
the MIPS processor, a simple clean RISC processor whose architecture is
easy to learn and understand. The major topics covered in the course are
the following:
-
MIPS instruction set
-
Computer arithmetic and ALU design
-
Datapath and control
-
Pipelining
-
Memory hierarchy, caches and virtual memory
-
Interfacing CPU and peripherals, buses
-
Multiprocessors, networks of multiprocessors, parallel programming
-
Performance issues
Course Goals: Upon successful completion of the course the student
will be able to
-
Understand the fundamentals of different instruction set architectures
and their relationship to the CPU design.
-
Understand the principles and the implementation of computer arithmetic.
-
Understand the operation of modern CPUs, including pipelining, memory systems
and buses.
-
Understand the principles of operation of multiprocessor systems.
-
Design a CPU (multicycle implementation) to a given specification using
HDL.
-
Write simple parallel programs.
Required textbook: David A. Patterson and John L. Hennessy, Computer
Organization and Design: The Hardware/Software Interface, Third Edition,
Morgan Kaufmann Publishers, 2004, ISBN: 1-55860-604-1.
Required software:
-
Icarus Verilog:
HDL compiler and simulator, available on the Patterson and Hennessy book
CD or at http://www.icarus.com/eda/verilog/.
Note about installation: don't use folder names that include spaces (like
Program Files). Read book sections B.4 and 5.8 for using HDL.
-
Other simulators that may be used for experiments (note that the project
should be done with a Verilog simulator):
-
Digital Works 2.0 (freeware)
-
Verilog HDL (Book CD of Morris Mano, Digital Design, Third Edition, Prentice
Hall, 2002, ISBN 0-13-062121-8)
-
SPIM simulator:
A free software simulator for running MIPS R2000 assembly language programs
available for Unix, DOS, and Windows.
Semester project: There will be a semester project to build a simplified
MIPS machine. The project requires three progress reports, which will also
be graded. The machine must be implemented in Verilog HDL, tested with
a sample MIPS program and properly documented. The project and progress
reports must be submitted through the Vista course management system available
through CentralPipeline
(Student > Blackboard Vista Courses > CS-385) or directly at https://vista.csus.ct.edu/webct/logon/306873577011.
Class Participation: Active participation in class is expected
of all students. Regular attendance is also expected. If you must miss
a class, try to inform the instructor of this in advance.
Honesty policy: It is expected that all students will conduct
themselves in an honest manner (see the CCSU Student handbook), and NEVER
claim work which is not their own. Violating this policy will result in
a substantial grade penalty, and may lead to expulsion from the University.
Grading: Grading will be based on one programming assignment
(5%), a midterm test (25%), a final exam (25%) and a semester project (45%,
including progress reports and the final documentation). The letter grades
will be calculated according to the following table:
| Letter grade | Score |
|---|---|
| A | 95-100 |
| A- | 90-94 |
| B+ | 87-89 |
| B | 84-86 |
| B- | 80-83 |
| C+ | 77-79 |
| C | 74-76 |
| C- | 70-73 |
| D+ | 67-69 |
| D | 64-66 |
| D- | 60-63 |
| F | 0-59 |
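Because the graded components are worth 5 + 25 + 25 + 45 = 100 points, the points earned equal the final percentage. The Python sketch below maps points to the table above; the component names are hypothetical, and it assumes no rounding between ranges (so 94.5 counts as A-), which the syllabus does not specify.

```python
# Component maxima from the grading policy (names are hypothetical).
MAX_POINTS = {
    "assignment1": 5,
    "midterm": 25,
    "final_exam": 25,
    "progress_reports": 30,  # three reports, 10 points each
    "project_final": 15,     # final project submission and documentation
}

# Lower bound of each letter-grade range, highest first.
CUTOFFS = [(95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
           (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"),
           (60, "D-"), (0, "F")]

def total_points(earned):
    """Sum the earned points after checking each against its maximum."""
    for name, points in earned.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {points} is outside 0..{MAX_POINTS[name]}")
    return sum(earned.values())

def letter_grade(score):
    """Map a 0-100 score to a letter; no rounding is applied (assumption)."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    raise ValueError("score must be non-negative")

score = total_points({"assignment1": 5, "midterm": 21, "final_exam": 22,
                      "progress_reports": 27, "project_final": 14})
print(score, letter_grade(score))
```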
Unexcused late submission policy: Submissions made more than
two days after the due date will be graded one letter grade down.
Submissions made more than a week late will receive two letter
grades down. No submissions will be accepted more than two weeks
after the due date.
Tentative schedule of classes and assignments
Note: Due dates are listed for all assignments. Check the schedule regularly
for updates!
-
January 27: Introduction:
Computer Architecture = Instruction Set Architecture + Machine Organization
-
February 3: MIPS
Instructions: arithmetic, registers, memory, fetch&execute cycle
-
February 5: MIPS
Instructions: control and addressing modes
-
February 10: Computer
arithmetic and ALU design: representing numbers, arithmetic and logic operations
-
February 12: Assignment 1 due (5 pts.).
-
February 17: ALU
design: full adder, slt operation, HDL design, carry lookahead
-
February 19: ALU
design: multiplication, representing floating point numbers
-
February 24: The
Processor: Building a datapath
-
February 26: The
Processor: Control (single cycle approach)
-
March 3: Progress Report #1 due (10 pts.): ALU, Register File. Use the
template files ALU4.vl (extend it to 16 bits) and regfile.vl (replace the
D flip-flops with 16-bit registers, redesign mux4x1 using gate-level
modeling and extend it to 16-bit data).
-
March 5: Multicycle
approach to processor control
-
March 10: Implementing
finite state machine control, Microprogramming
-
March 12: Using
a Hardware Description Language to Design and Simulate the MIPS processor
-
March 17: Turing
machines
-
March 31: Introduction
to pipelining
-
March 31: Progress Report #2 due (10 pts.) - see Semester
Project for details.
-
April 2: Solving
pipeline hazards
-
April 7: Implementing
pipeline datapath and control
-
April 9: Midterm
Test (25 pts.) due
-
April 9: Implementing
data and branch hazards control
-
April 14: Review
of Datapath, Control and Pipelining
-
April 16: Memory
hierarchy, The
Basics of caches
-
April 21: Improving
cache performance
-
April 28: Progress Report #3 due (10 pts.) - Branching logic, Execution
control. See the project
description for more details.
-
May 5: Virtual
Memory basics, Virtual
Memory optimization, A
general framework of memory hierarchies
-
May 7: Interfacing
Processors and Peripherals - Buses, Interfacing
I/O devices to Memory, CPU and OS
-
May 12: Multiprocessors, Networks of multiprocessors
-
The role of performance
-
May 21: Final
Exam due (25 pts.)
-
May 21: Semester
Project due (15 pts.)
CS385 Computer Architecture, Lecture 1
Reading: Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set
Architecture + Machine Organization.
Lecture
slides (PDF)
Lecture Notes
-
Levels of Abstraction
-
Computer Architecture = Instruction Set Architecture + Machine Organization
-
Instruction Set: The Software/Hardware Interface
-
Levels of Computer Architecture in More Depth
-
Software:
-
Application
-
Operating System
-
Firmware
-
Instruction Set Architecture:
-
Organization of Programmable Storage
-
Data type and Structures: encodings and machine representation
-
Instruction set
-
Instruction Formats
-
Addressing Modes and Accessing Data and Instructions
-
Exception Handling
-
Hardware:
-
Instruction Set Processing
-
I/O System
-
Digital Design
-
Circuit Design
-
Layout
-
Basic Components of a Computer
-
Processor: Datapath and Control
-
Memory
-
I/O
-
Computer Organization
-
Capabilities and Performance of the Basic Functional Units
-
The Way These Units are Interconnected
-
Information Flow between components
-
Information Flow Control
CS385 Computer Architecture, Lecture 2
Reading: Sections 2.1 - 2.5
Topics: MIPS instructions, arithmetic, registers, memory, fetch&execute
cycle
Lecture
slides (PDF)
Lecture Notes
-
Design goal: maximize performance and minimize cost. Primitive (low level)
and very restrictive instructions (fixed number and type of operands).
-
Design principles:
-
Simplicity favors regularity (uniform instruction format)
-
Smaller is faster (only 32 registers)
-
Good design demands a compromise (I-type instructions)
-
MIPS arithmetic: 3 operands, fixed order, registers only.
-
Using only registers: R-type instructions.
-
Registers: 32 bits wide; register naming and usage conventions.
-
Memory organization: words and byte addressing.
-
Data transfer (load and store) instructions. Example: accessing array elements.
-
Translating C code into MIPS instructions: the swap example.
-
Machine Language: instruction format, I-type (Immediate) format for data
transfer
-
Stored program concept: programs in memory, fetch&execute cycle
-
Von Neumann Architecture
-
CPU, Memory System, I/O system
-
Stored program concept: programs in memory, fetch&execute cycle
-
Instructions are executed sequentially
-
Turing machines
-
Non-Von Neumann Architecture:
-
Various parallel and multiprocessor architectures (see book chapter 9)
-
In a broader sense: neural networks (NN), genetic algorithms (GA), etc.
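The R-type and I-type instruction formats above can be illustrated by packing the fields into a 32-bit word. This Python sketch is not course material: the helper names (`encode_r`, `encode_i`, `word_offset`, `REG`) are hypothetical, only $zero, $t0-$t7 and $s0-$s7 are mapped, and the field widths are the standard MIPS ones (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits; or op 6, rs 5, rt 5, immediate 16 bits).

```python
# Register numbers (MIPS convention): $zero = 0, $t0-$t7 = 8-15, $s0-$s7 = 16-23.
REG = {"$zero": 0}
REG.update({f"$t{i}": 8 + i for i in range(8)})
REG.update({f"$s{i}": 16 + i for i in range(8)})

def encode_r(op, rs, rt, rd, shamt, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16); negative values use two's complement."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def word_offset(index):
    """Byte offset of element `index` in an array of 32-bit words (byte addressing)."""
    return 4 * index

# add $t0, $s1, $s2  (op = 0, funct = 0x20)
add_word = encode_r(0, REG["$s1"], REG["$s2"], REG["$t0"], 0, 0x20)
# lw $t0, 1200($t1)  (op = 0x23 = 35; rt is the destination of a load)
lw_word = encode_i(0x23, REG["$t1"], REG["$t0"], 1200)
print(hex(add_word), hex(lw_word))
```

Because memory is byte addressed, element A[8] of a word array sits 32 bytes past the base address, which is why the offset in a lw is four times the array index.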
CS385 Computer Architecture, Lecture 3
Reading: Sections 2.6 - 2.9, 2.16
Topics: MIPS Instructions: control and addressing modes
Lecture
slides (PDF)
Lecture Notes
-
Implementing the C code for if in MIPS: conditional branch.
-
Implementing the C code for if-else in MIPS: unconditional branch
-
Simple for loop
-
Check for less-than: building a pseudoinstruction for branch if less-than.
-
Addressing in branch instructions: PC-relative and pseudodirect.
-
Constants: use of immediate addressing (constants as operands: addi, slti,
andi, ori).
-
32-bit constants: load the upper 16 bits with lui (load upper immediate), then set the lower 16 bits with ori
-
Summary of MIPS addressing: register (add), immediate (addi), base or displacement
(lw), PC-relative (bne), pseudodirect (j).
-
Alternative approaches: IA-32
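The PC-relative, pseudodirect and lui/ori cases above come down to simple bit arithmetic. The Python sketch below is an illustration only; the function names are hypothetical, and it follows the MIPS convention that a branch offset counts words relative to PC + 4.

```python
def sign_extend16(value):
    """Interpret the low 16 bits as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset16):
    """beq/bne: PC-relative, offset counted in words from PC + 4."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """j: pseudodirect, upper 4 bits of PC + 4 joined with the 26-bit field shifted left 2."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)

def split_constant(value):
    """Halves loaded by lui (upper 16 bits) and ori (lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def lui_ori(upper, lower):
    """Register contents after lui $t0, upper followed by ori $t0, $t0, lower.
    ori zero-extends its immediate, so the upper half is not disturbed."""
    return ((upper << 16) | lower) & 0xFFFFFFFF

upper, lower = split_constant(4_000_000)
print(hex(upper), hex(lower), lui_ori(upper, lower))
```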
CS385 Computer Architecture, Lecture 4
Reading: Sections 3.1 - 3.3, B.5 (CD)
Topics: Computer arithmetic and ALU design: representing numbers,
arithmetic and logic operations
Lecture
slides (PDF)
Lecture Notes
-
Representing numbers: sign and magnitude, one's complement, two's complement.
-
Arithmetic: addition, subtraction, detecting overflow.
-
Logical operations: shift, and, or.
-
Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
-
ALU for logical operations.
-
ALU for add, and, or.
-
Supporting subtraction
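Overflow detection for two's complement addition can be checked with a bit-level ripple-carry model. This Python sketch is an illustration under stated assumptions (4-bit words by default, hypothetical function names); it uses the standard rule that overflow occurs exactly when the carry into the most significant bit differs from the carry out of it.

```python
def add_with_overflow(a, b, n=4):
    """Ripple-carry add of two n-bit two's complement numbers.
    Returns (n-bit result, overflow) where overflow = CarryIn XOR CarryOut of the MSB."""
    mask = (1 << n) - 1
    a, b = a & mask, b & mask          # two's complement bit patterns
    carry, result, carry_into_msb = 0, 0, 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i                  # full-adder sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # full-adder carry out
    return result, carry_into_msb ^ carry

def to_signed(x, n=4):
    """Interpret an n-bit pattern as a two's complement value."""
    return x - (1 << n) if x & (1 << (n - 1)) else x

print(add_with_overflow(7, 1), add_with_overflow(3, -2))
```

7 + 1 does not fit in 4-bit two's complement (range -8..7), so the first call reports overflow; 3 + (-2) produces a carry out of the MSB but no overflow.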
Exercises: Implement an overflow detection unit using only the CarryIn
and CarryOut bits of ALU31 (the 1-bit ALU for the most significant bit).
Tutorials and practice quizzes on two's complement numbers:
CS385 Computer Architecture, Lecture 5
Reading: Sections B.4, B.5, B.6 (CD)
Topics: ALU design: full adder, slt operation, HDL design, carry
lookahead
Lecture
slides (PDF)
Programs: 2-1-mux.vl,
4-bit-adder.vl,
more
examples of Verilog programs, ALU4.vl
Lecture Notes
-
Implementation of a full adder:
-
Carry out logic
-
Sum logic: using AND gates, OR gates and inverters, or using an XOR gate.
-
Supporting set on less-than (slt).
-
Test for equality (needed for branching)
-
Designing the ALU in Verilog
-
Carry Lookahead
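Carry lookahead computes every carry directly from generate (gi = ai AND bi) and propagate (pi = ai OR bi) signals instead of waiting for the ripple. The Python sketch below models a 4-bit adder this way; it is a behavioral check of the equations, not the gate-level Verilog the exercise asks for, and `cla_4bit` is a hypothetical name.

```python
def cla_4bit(a, b, c0):
    """4-bit carry-lookahead adder: returns (4-bit sum, carry out c4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]  # propagate
    # Each carry is a two-level sum of products of g, p and c0.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[i]) << i  # sum bit
    return s, c4

print(cla_4bit(0b1011, 0b0110, 0))
```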
Exercises: Implement the 4-bit adder with carry lookahead logic
in Verilog using the structural specification approach (gate-level modeling).
CS385 Computer Architecture, Lecture 6
Reading: Sections 3.4, 3.6 - 3.10
Topics: ALU design: multiplication, representing floating point
numbers
Lecture
slides (PDF)
Lecture Notes
-
Implementing multiplication:
-
Using 64-bit adder;
-
Using a 32-bit adder for the upper 32 bits of the product;
-
Avoiding a separate multiplier register (the multiplier is kept in the right half of the product register).
-
Floating point numbers
-
Scientific notation: (-1)^sign * significand * 2^exponent
-
Range and precision (overflow and underflow).
-
IEEE 754 floating point standard - designed so that floating point numbers can be compared and sorted using integer comparison:
-
normalized representation
-
implicit leading 1
-
exponent is biased (bias 127 in single precision, 1023 in double): an exponent field of 0..0 is the most negative, 1..1 the most positive
-
bits of the significand represent the fraction between 0 and 1.
-
(-1)^S * (1 + s1*2^-1 + s2*2^-2 + ...) * 2^(exponent-bias)
-
Problems with floating point arithmetic
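The single-precision layout and the formula (-1)^S * (1 + fraction) * 2^(exponent-bias) can be checked with Python's standard `struct` module. This is an illustrative sketch with hypothetical function names; `value_from_fields` handles normalized numbers only (exponent field not all 0s or all 1s).

```python
import struct

def decode_single(x):
    """Return (bit pattern, sign, biased exponent, fraction) of x as IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23 fraction bits; the leading 1 is implicit
    return bits, sign, exponent, fraction

def value_from_fields(sign, exponent, fraction, bias=127):
    """(-1)^S * (1 + fraction) * 2^(exponent - bias) for normalized numbers."""
    significand = 1 + fraction / 2 ** 23
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

bits, s, e, f = decode_single(-0.75)
print(hex(bits), s, e, hex(f), value_from_fields(s, e, f))
```

For positive numbers, a larger value also has a larger bit pattern when read as an unsigned integer, which is the integer-comparison property mentioned above.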
Tutorials and practice quizzes on floating point numbers:
CS385 Computer Architecture, Lecture 7
Reading: Sections 5.1 - 5.3
Topics: The Processor, Building a Datapath
Lecture
slides (PDF)
Lecture Notes
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D latches and D flip-flops
-
Clocking methodology: edge triggered
-
Fetching instructions and incrementing the program counter
-
Register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Demo: http://www.it.jcu.edu.au/Subjects/cp2005/resources/animation/wk6animations.ppt
CS385 Computer Architecture, Lecture 8
Reading: Section 5.4
Topics: Single-cycle control
Lecture
slides (PDF)
Lecture Notes
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
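The two-level decoding described above (main control produces ALUOp, ALU control combines it with the funct field) can be written as lookup tables. The Python sketch below assumes the standard single-cycle design for R-format, lw, sw and beq; the table and function names are hypothetical, and None marks a don't-care (X) entry.

```python
# Opcodes of the four instruction classes handled by the single-cycle design.
R_FORMAT, LW, SW, BEQ = 0, 35, 43, 4

# Main control outputs per opcode; None marks a don't-care (X) entry.
MAIN_CONTROL = {
    R_FORMAT: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    LW:       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    SW:       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    BEQ:      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# ALU control lines selected by the funct field when ALUOp = 10.
FUNCT_TO_ALU = {
    0x20: 0b0010,  # add
    0x22: 0b0110,  # subtract
    0x24: 0b0000,  # and
    0x25: 0b0001,  # or
    0x2A: 0b0111,  # set on less than
}

def main_control(opcode):
    return MAIN_CONTROL[opcode]

def alu_control(alu_op, funct):
    """ALUOp comes from the main control unit, funct from the instruction."""
    if alu_op == 0b00:   # lw/sw: add to compute the memory address
        return 0b0010
    if alu_op == 0b01:   # beq: subtract to compare the registers
        return 0b0110
    return FUNCT_TO_ALU[funct]  # R-type: decode the funct field

print(bin(alu_control(main_control(R_FORMAT)["ALUOp"], 0x2A)))
```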
CS385 Computer Architecture, Lecture 9
Reading: Sections 5.5, 5.6
Topics: Multicycle Approach to Processor Control, Exceptions
Lecture
slides (PDF)
Lecture Notes
-
Basic principles:
-
Breaking up instruction into steps
-
Reusing functional units in different steps
-
Storing intermediate results
-
Need for controlling the sequence of steps
-
Execution steps:
-
Instruction fetch
-
Instruction decoding and register fetch
-
Instruction dependent (memory reference, R-type execution or branch)
-
R-type completion or memory access
-
Memory read completion
-
Implementing the control unit - finite state machine
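The execution steps above give each instruction class a fixed cycle count in the multicycle design: lw takes 5 cycles, sw and R-type 4, beq and j 3. The Python sketch below turns that into cycle totals and an average CPI; the function names and the instruction mix are hypothetical, illustrative values.

```python
# Clock cycles per instruction class in the multicycle design
# (fetch, decode, then 1-3 more steps depending on the class).
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "j": 3}

def total_cycles(instructions):
    """Cycles for a straight-line sequence of instruction classes."""
    return sum(CYCLES[i] for i in instructions)

def average_cpi(mix):
    """Weighted CPI for an instruction mix given as {class: fraction}."""
    return sum(CYCLES[i] * frac for i, frac in mix.items())

# Illustrative mix (assumed values; the fractions add up to 1).
mix = {"lw": 0.22, "sw": 0.11, "r-type": 0.49, "beq": 0.16, "j": 0.02}
print(round(average_cpi(mix), 2))
```

The same `total_cycles` helper can be used to check an answer to the first exercise below.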
Exercises
-
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Lbl # assume not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Lbl:
...
-
What is going on during the 8th cycle of execution?
-
In what cycle does the actual addition of $t2 and $t3 take place?
CS385 Computer Architecture, Lecture 10
Reading: Section 5.7 (CD)
Topics: Implementing Finite State Machine Control, Microprogramming
Lecture
slides (PDF)
Lecture Notes
-
ROM implementation
-
Single ROM: 10-bit address, 20-bit word, 2^10 * 20 = 20K bits
-
Two ROMs: 2^10 * 4 (Op + State -> Next state) + 2^4 * 16 (State -> Control) = 4.25K bits
-
Programmable Logic Array (PLA)
-
Representing truth table as a sum of products
-
AND-gates and OR-gates arrays
-
Implementing the Next-step function with a Sequencer
-
Implementing datapath control by microprogramming
-
Microinstruction specification at symbolic level
-
Microinstruction format
-
Executing microinstructions
-
Microprogramming vs. ROM and PLA implementation
-
Handling exceptions and interrupts
-
Types of exceptions and interrupts
-
Handling exceptions
-
Extending the datapath control to detect exceptions
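The ROM sizes above follow from the inputs (6 opcode bits + 4 state bits = 10 address bits) and outputs (16 control bits + 4 next-state bits = 20 bits). A short Python check of that arithmetic, with a hypothetical helper name:

```python
def rom_bits(address_bits, word_bits):
    """Size of a ROM with 2**address_bits words of word_bits bits each."""
    return (2 ** address_bits) * word_bits

# One ROM: every combination of opcode and state selects all 20 outputs.
single_rom = rom_bits(10, 20)

# Two ROMs: the next state depends on opcode and state, but the
# 16 control outputs depend only on the 4-bit current state.
next_state_rom = rom_bits(10, 4)
control_rom = rom_bits(4, 16)
split = next_state_rom + control_rom

print(single_rom, split, split / 1024)
```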
CS385 Computer Architecture, Lecture 11
Reading: Section 5.8 (CD). Note the numerous errors in the Verilog
code.
Topic: Using Hardware Description Language to Design and Simulate
the MIPS processor
-
Behavioral model of MIPS (mips.vl)
-
Project version (two stage control) of MIPS (mips2.vl).
Changes needed to complete the project:
-
Adjust the word size (ALU, registers, memory etc.) to 16 bit
-
Replace the behavioral model of fetching with a structural design
-
Implement the execute phase using structural design
-
Write a complete program for testing and test the CPU
Exercises
-
Implement the addi instruction by:
-
adding a new row to the single cycle main control table
-
adding a new column to the multicycle control table
-
modifying the microprogram
-
modifying the HDL code (mips.vl)
CS385 Computer Architecture, Lecture 12
Turing machines
-
Turing machines
-
Alan Turing web page
-
A Turing Machine Applet
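A Turing machine is small enough to simulate in a few lines. The Python sketch below is a generic one-tape simulator plus a hypothetical rule table that increments a binary number; the rule format (state, symbol) -> (symbol to write, move, next state) is an assumption, not the notation of the linked applet.

```python
def run_tm(tape, rules, state="start", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine. rules maps (state, symbol) to
    (symbol_to_write, 'L' or 'R', next_state); the machine stops in 'halt'."""
    cells = dict(enumerate(tape))   # sparse, unbounded tape
    head = 0
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("no halt within max_steps")
        write, move, state = rules[(state, cells.get(head, blank))]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Binary increment: scan right to the end, then add 1 with carries moving left.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, carry continues
    ("carry", "0"): ("1", "L", "halt"),    # absorb the carry
    ("carry", "_"): ("1", "L", "halt"),    # new leading 1
}

print(run_tm("1011", INCREMENT))
```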
CS385 Computer Architecture, Lecture 13
Reading: Section: 6.1 - 6.2.
Topic: Introduction to Pipelining
Lecture
slides (PDF)
Lecture Notes
-
Pipelining by analogy (laundry example):
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
Five stages of the load MIPS instruction
-
The pipelined datapath
-
Single cycle, multiple cycle vs. pipeline
-
Advantages of pipelined execution
-
Problems with pipelining (pipeline hazards)
-
Structural hazards
-
Data hazards
-
Control hazards
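The effects of the slowest stage and of pipeline fill time can be seen with a little arithmetic. The Python sketch below assumes hypothetical stage latencies of 200, 100, 200, 200 and 100 ps (IF, ID, EX, MEM, WB), no hazards, and a single-cycle clock long enough for all five stages.

```python
# Assumed stage latencies in picoseconds: IF, ID (register read), EX, MEM, WB.
STAGE_PS = [200, 100, 200, 200, 100]

def single_cycle_time(n, stage_ps=STAGE_PS):
    """Every instruction gets a clock long enough for the slowest one (all stages)."""
    return n * sum(stage_ps)

def pipelined_time(n, stage_ps=STAGE_PS):
    """Clock = slowest stage; k cycles to fill the pipe, then one result per cycle."""
    k = len(stage_ps)
    return (k + n - 1) * max(stage_ps)

for n in (3, 1_000_000):
    print(n, single_cycle_time(n) / pipelined_time(n))
```

For three instructions the speedup is only about 1.7 because of the fill time; for a long run it approaches 800/200 = 4 rather than 5, because the stages are unbalanced.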
CS385 Computer Architecture, Lecture 14
Reading: Section: 6.1 - 6.2.
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture
slides I (PDF)
Lecture
slides II (PDF)
Lecture Notes
-
Structural hazards: single memory
-
Control hazards:
-
Stall: wait until decision is clear
-
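Predict: fixed prediction (e.g., predict that the branch is not taken), dynamic prediction (based on branch history)
-
Delayed branch (software solution): an instruction that does not affect the branch is moved into the delay slot after it.
Before:
add $4, $5, $6
beq $1, $2, 40
lw $3, 300($0)
After:
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)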
-
Data hazards (dependencies backward in time):
-
Forwarding (bypassing)
-
Reordering code to avoid a load-use stall:
Before:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
After:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
-
Designing a pipelined processor
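The benefit of the reordering above can be counted mechanically. This Python sketch uses a simplified model (an assumption): with full forwarding, only a lw immediately followed by an instruction that reads its destination register costs one stall cycle. Instructions are hypothetical (op, destination, sources) tuples.

```python
def count_load_use_stalls(program):
    """Count one-cycle load-use stalls, assuming full forwarding.
    Each instruction is (op, dest, sources)."""
    stalls = 0
    for (op, dest, _), (_, _, sources) in zip(program, program[1:]):
        if op == "lw" and dest in sources:
            stalls += 1
    return stalls

before = [("lw", "$t0", ["$t1"]),
          ("lw", "$t2", ["$t1"]),
          ("sw", None, ["$t2", "$t1"]),  # sw reads the stored value and the base register
          ("sw", None, ["$t0", "$t1"])]
after = [before[0], before[1], before[3], before[2]]

print(count_load_use_stalls(before), count_load_use_stalls(after))
```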
CS385 Computer Architecture, Lecture 15
Reading: Section: 6.2 - 6.3
Topic: Implementing pipeline datapath and control
Lecture
slides (PDF)
Lecture Notes
-
Splitting datapath into stages: using registers to store parts of the instruction
-
Transferring data forward and backward between the stages: lw example
-
Corrected datapath: storing rd for the write back stage.
-
Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle
diagram
-
Pipeline control:
-
IF: no control signals to store for later stages (they are always asserted).
-
ID: no control signals to store for later stages (they are always asserted).
-
EX: set RegDst, ALUOp, ALUSrc
-
MEM: set Branch, MemRead, MemWrite
-
WB: set MemtoReg, RegWrite
-
Datapath with control
-
Example: running this code through the pipeline in 9 cycles.
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
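The 9-cycle figure for the five-instruction example (5 stages + 5 instructions - 1) can be reproduced by listing, for each clock cycle, which stage each instruction occupies. This Python sketch assumes one instruction enters IF per cycle and no stalls; `pipeline_diagram` is a hypothetical name, and instruction strings must be unique because they are used as keys.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(program):
    """For each clock cycle, map each in-flight instruction to its stage."""
    cycles = len(STAGES) + len(program) - 1
    diagram = []
    for cycle in range(cycles):
        row = {}
        for i, instr in enumerate(program):
            stage = cycle - i          # instruction i enters IF in cycle i
            if 0 <= stage < len(STAGES):
                row[instr] = STAGES[stage]
        diagram.append(row)
    return diagram

program = ["lw $10, 20($1)", "sub $11, $2, $3", "and $12, $4, $5",
           "or $13, $6, $7", "add $14, $8, $9"]
diagram = pipeline_diagram(program)
for number, row in enumerate(diagram, start=1):
    print(f"cycle {number}:", row)
```

Cycle 5 is the single-clock-cycle snapshot in which all five stages are busy.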
Exercises: For More Practice, 6.9, 6.15 (on CD).
CS385 Computer Architecture, Lecture 16
Reading: Section: 6.4 - 6.9
Topic: Implementing data and branch hazard control
Lecture
slides (PDF)
Lecture Notes
-
Detecting data dependencies
-
EX/MEM.Rd = ID/EX.Rs
EX/MEM.Rd = ID/EX.Rt
-
MEM/WB.Rd = ID/EX.Rs
MEM/WB.Rd = ID/EX.Rt
-
Forwarding
-
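if (EX/MEM.RegWrite and EX/MEM.Rd ≠ 0 and EX/MEM.Rd = ID/EX.Rs) ForwardA = 10
if (EX/MEM.RegWrite and EX/MEM.Rd ≠ 0 and EX/MEM.Rd = ID/EX.Rt) ForwardB = 10
-
if (MEM/WB.RegWrite and MEM/WB.Rd ≠ 0 and MEM/WB.Rd = ID/EX.Rs) ForwardA = 01 (unless the EX/MEM condition applies)
if (MEM/WB.RegWrite and MEM/WB.Rd ≠ 0 and MEM/WB.Rd = ID/EX.Rt) ForwardB = 01 (unless the EX/MEM condition applies)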
-
Data hazards and stalls
If (ID/EX.MemRead and
(ID/EX.Rt = IF/ID.Rs or
ID/EX.Rt = IF/ID.Rt))
stall the pipeline
-
Branch hazards
-
Reducing the delay of branches - move the address calculation (move
the adder) and the branch decision (add XOR and AND gates) up into the ID stage
-
Assuming the branch will not be taken
-
Flushing instructions in the IF, ID, and EX stages if the branch is taken
-
Advanced pipelining
-
Superpipelining
-
Superscalar
-
Dynamic scheduling
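The forwarding and stall conditions above translate directly into two small functions. This Python sketch is a behavioral model with hypothetical names; it gives the EX/MEM value priority over MEM/WB (it is the more recent result) and never forwards register $0.

```python
def forwarding_unit(ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd,
                    id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB): 0b00 = register file, 0b10 = EX/MEM, 0b01 = MEM/WB."""
    def select(source):
        # EX hazard first: the EX/MEM value is the most recent one.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == source:
            return 0b10
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == source:
            return 0b01
        return 0b00
    return select(id_ex_rs), select(id_ex_rt)

def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Hazard detection unit: stall when the instruction in ID reads the
    register that a load in EX has not yet read from memory."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
print(forwarding_unit(True, 2, False, 0, 2, 5))
```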
CS385 Computer Architecture, Lecture 17
Reading: Chapters 5, 6
Topic: Review of Datapath, Control and Pipelining
Lecture
slides (PDF)
Lecture Notes
Datapath
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D latches and D flip-flops
-
Clocking methodology: edge triggered
-
Basic operations
-
Instruction fetch
-
Accessing register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Control
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
-
Multicycle Approach to Processor Control
-
Basic principles of the Multicycle Approach to Processor Control
-
Breaking up instruction into steps
-
Reusing functional units in different steps
-
Storing intermediate results
-
Need for controlling the sequence of steps
-
Execution steps:
-
Instruction fetch
-
Instruction decoding and register fetch
-
Instruction dependent (memory reference, R-type execution or branch)
-
R-type completion or memory access
-
Memory read completion
-
Finite state machine control
-
Microprogramming
Pipelining
-
Basic principles of pipelining
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
MIPS pipelining: the five stages of the lw instruction
-
Problems with pipelining:
-
Structural hazards
-
Data hazards
-
Control hazards
-
Designing a pipelined processor
-
Transferring data forward and backward between the stages
-
Pipeline control
-
Implementing data and branch hazard control
-
Detecting data dependencies
-
Forwarding
-
Data hazards and stalls
-
Branch hazards
-
Advanced pipelining
Demo: http://www.web-ee.com/primers/files/MIPS/MIPS.htm
CS385 Computer Architecture, Lecture 18
Reading: Section 7.1
Topic: Memory Hierarchy
Lecture
slides (PDF)
Lecture Notes
-
Memory technologies and trends
-
Impact on performance
-
The need for hierarchical memory organization
-
The principle of locality
-
Memory hierarchy terminology
-
Basics of RAM implementation
-
SRAM: D-latches, three-state buffers, address decoders, two level
addressing
-
DRAM: DRAM cell, refreshing
-
Error detection and correction
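As an illustration of error correction, the Python sketch below implements a Hamming(7,4) single-error-correcting code (4 data bits, parity bits at positions 1, 2 and 4). The notes do not say which code the lecture covers, so this particular code and the function names are assumptions; memory systems use wider codes built on the same idea.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # positions 1, 2 and 4 hold parity bits

def hamming74_encode(data):
    """Encode a 4-bit value into 7 bits (list for positions 1..7)."""
    code = [0] * 8                            # index 0 unused
    for bit, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> bit) & 1
    for p in (1, 2, 4):                       # parity p covers positions with bit p set
        for pos in range(1, 8):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    return code[1:]

def hamming74_decode(bits):
    """Return (corrected 4-bit data, syndrome); a nonzero syndrome is the
    position of the single flipped bit."""
    code = [0] + list(bits)
    syndrome = 0
    for p in (1, 2, 4):
        parity = 0
        for pos in range(1, 8):
            if pos & p:
                parity ^= code[pos]
        if parity:
            syndrome += p
    if syndrome:
        code[syndrome] ^= 1                   # correct the single-bit error
    data = 0
    for bit, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << bit
    return data, syndrome

word = hamming74_encode(0b1011)
word[4] ^= 1                                  # flip position 5
print(hamming74_decode(word))
```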
CS385 Computer Architecture, Lecture 19
Reading: Section 7.2
Topic: The Basics of caches
Lecture
slides (PDF)
Lecture Notes
-
Direct-mapped cache
-
Accessing a cache
-
Writing to the cache (write-through and write-back schemes)
-
Handling cache misses
-
Read miss: load the word from memory
-
Write miss: write both to the cache and to the memory (using write buffer)
-
Example: DECStation 3100 cache
-
Spatial locality caches: keeping consistency on write
-
Main memory organization
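Accessing a direct-mapped cache amounts to splitting the address into tag, index and byte offset, then comparing the stored tag. The Python sketch below assumes 32-bit byte addresses and power-of-two sizes; with 1024 one-word blocks that gives a 2-bit byte offset, a 10-bit index and a 20-bit tag. The function names are hypothetical, and a missing dictionary entry stands for an invalid block.

```python
def split_address(addr, num_blocks, block_bytes=4):
    """Split a byte address into (tag, index, byte offset) for a direct-mapped cache."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) % num_blocks  # (block address) modulo (number of blocks)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def access(cache, addr, num_blocks):
    """Return True on a hit; on a miss the block is loaded and the tag replaced."""
    tag, index, _ = split_address(addr, num_blocks)
    hit = cache.get(index) == tag
    cache[index] = tag
    return hit

# An 8-block cache and a short sequence of word addresses (byte address = 4 * word).
cache = {}
hits = [access(cache, 4 * w, 8) for w in (22, 26, 22, 26, 16, 3, 16, 18)]
print(hits)
```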
CS385 Computer Architecture, Lecture 20
Reading: Section 7.3
Topic: Improving cache performance
Lecture
slides (PDF)
Lecture Notes
-
Measuring cache performance
-
Memory-stall clock cycles = (Memory accesses/Program) * Miss rate * Miss penalty
-
Example 1 (reducing CPI):
-
Instruction miss rate 2%; data miss rate 4%; loads/stores 36% of instructions; miss penalty 40 cycles.
-
2 CPI => CPI_stall = 2 + 0.02*40 + 0.36*0.04*40 = 3.376 ≈ 3.38, i.e. a perfect cache is about 1.69 times faster.
-
1 CPI => CPI_stall = 2.376 ≈ 2.38, i.e. a perfect cache is about 2.38 times faster.
-
Example 2: doubling the clock rate => 80-cycle miss penalty, CPI_stall = 2 + 1.6 + 1.152 ≈ 4.75.
Performance: 3.38/(4.75/2) ≈ 1.42 times faster with stalls, 2 times faster without
stalls.
-
Conclusion: cache penalties increase as the machine becomes faster
-
Flexible placement of blocks in the cache
-
Direct mapped: Cache index = (Block address) modulo (Number of blocks in the cache); no search;
small tag.
-
Set associative: Cache index = (Block address) modulo (Number of sets in
cache); search the set; larger tag.
-
Fully associative: Cache index is not determined; search the whole cache;
tag = block address.
-
Locating a block in the cache: an N-way set-associative cache requires N comparators
and an N-to-1 multiplexor
-
Choosing which block to replace: least recently used
-
Multilevel caches
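The stall-cycle formula and Example 1 and 2 above can be checked in a few lines. This Python sketch (hypothetical function name) charges every instruction an instruction-fetch miss rate and only the load/store fraction a data miss rate.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, ls_fraction, miss_penalty):
    """CPI including memory stalls: every instruction is fetched (instruction
    misses) and ls_fraction of instructions also access data (data misses)."""
    instruction_stalls = i_miss_rate * miss_penalty
    data_stalls = ls_fraction * d_miss_rate * miss_penalty
    return base_cpi + instruction_stalls + data_stalls

slow = cpi_with_stalls(2, 0.02, 0.04, 0.36, 40)   # original clock
fast = cpi_with_stalls(2, 0.02, 0.04, 0.36, 80)   # doubled clock: the penalty in cycles doubles
# Execution time ratio: slow CPI * T versus fast CPI * T/2.
print(round(slow, 3), round(fast, 3), round(slow / (fast / 2), 2))
```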
Exercises
-
Write a sequence of memory references for which:
-
the direct mapped cache performs better than the 2-way associative cache;
-
the 2-way associative cache performs better than the fully associative
cache.
-
Exercises 7.9, 7.10 (page 556), 7.29 (page 558).
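Candidate answers to the first exercise can be checked with a small simulator that covers direct-mapped, set-associative and fully associative placement with LRU replacement. The Python sketch below uses a hypothetical name and an illustrative sequence of block addresses for a 4-block cache.

```python
def count_misses(block_refs, num_blocks, ways):
    """Misses for a cache of num_blocks blocks with the given associativity and
    LRU replacement. ways=1 is direct mapped; ways=num_blocks is fully associative."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each set: tags ordered LRU -> MRU
    misses = 0
    for block in block_refs:
        s = sets[block % num_sets]          # set index = block address mod number of sets
        tag = block // num_sets
        if tag in s:
            s.remove(tag)                   # hit: move to the MRU position
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                    # evict the least recently used block
        s.append(tag)
    return misses

refs = [0, 8, 0, 6, 8]  # block addresses
for ways in (1, 2, 4):
    print(ways, count_misses(refs, 4, ways))
```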
CS385 Computer Architecture, Lecture 21
Reading: Section 7.4
Topic: Virtual Memory
Lecture
slides (PDF)
Lecture Notes
-
The need for VM
-
Many programs (processes) can use a single memory
-
Use a memory exceeding the size of the main memory
-
VM organization and terminology: virtual address, physical address, page,
page offset, page fault, memory mapping (translation).
-
Design decisions motivated by the very high cost of page faults:
-
Large pages (4 KB - 64 KB)
-
Reducing the page fault rate: fully associative placement of pages
-
Software management of page faults
-
Write-back instead of write-through
-
Addressing pages:
-
Page table, page table register
-
Processes (active, inactive) and page tables
-
Page faults
-
Replacing pages: LRU, reference (use) bit
-
Write-back scheme (dirty bit)
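Address translation through a page table, with the valid, reference (use) and dirty bits listed above, can be sketched as follows. This Python model assumes 4 KB pages and a single-level page table stored as a dictionary; the names and entry layout are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

class PageFault(Exception):
    """Raised when the page is not in main memory (valid bit off)."""

def translate(vaddr, page_table, write=False, page_size=PAGE_SIZE):
    """Map a virtual address to a physical one; each entry holds a valid bit,
    a physical page number (ppn), and reference and dirty bits."""
    vpn, offset = divmod(vaddr, page_size)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)      # the OS would bring the page in from disk
    entry["ref"] = 1              # used by the OS to approximate LRU
    if write:
        entry["dirty"] = 1        # write-back: copy to disk when replaced
    return entry["ppn"] * page_size + offset

page_table = {2: {"valid": 1, "ppn": 7, "ref": 0, "dirty": 0}}
print(hex(translate(0x2ABC, page_table)))
```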
CS385 Computer Architecture, Lecture 22
Reading: Section 7.4
Topic: Virtual Memory optimization
Lecture
slides (PDF)
Lecture Notes
-
Optimizing address translation - Translation Lookaside Buffer (TLB):
-
TLB miss
-
Page fault
-
TLB associativity
-
MIPS R2000 (DECStation 3100) TLB
-
Overall operation of a memory hierarchy
-
Memory protection with VM
-
Using exceptions for handling TLB misses and page faults: using EPC and
Cause registers
-
Summary of VM
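The interplay of TLB hits, TLB misses and page faults can be illustrated with a tiny fully associative TLB in front of a page table. This Python sketch uses LRU replacement as a simplification (an assumption; real TLBs may use other policies such as random replacement), and the class and parameter names are hypothetical.

```python
from collections import OrderedDict

class PageFault(Exception):
    """Page not resident in main memory."""

class TLB:
    """Fully associative TLB with LRU replacement; page_table maps resident VPNs to PPNs."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()      # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)
            return self.map[vpn]
        self.misses += 1              # TLB miss: consult the page table
        if vpn not in page_table:
            raise PageFault(vpn)      # page fault: handled by the OS
        if len(self.map) == self.entries:
            self.map.popitem(last=False)  # evict the LRU entry
        self.map[vpn] = page_table[vpn]
        return self.map[vpn]

tlb = TLB(2)
print([tlb.lookup(v, {0: 5, 1: 6, 2: 7}) for v in (0, 1, 0, 2, 1)], tlb.hits, tlb.misses)
```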
CS385 Computer Architecture, Lecture 23
Reading: Section 7.5
Topic: A general framework of memory hierarchies
Lecture
slides (PDF)
Lecture Notes
-
Associativity schemes
-
Placing blocks
-
Miss rates and cache sizes
-
Finding blocks
-
Why do we use full associativity and a separate lookup table (page table)
in VM?
-
Choosing a block to replace
-
Writing blocks
-
The sources of misses: compulsory, capacity and conflict misses (the three Cs)
-
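The challenge: design changes that reduce the miss rate can hurt overall
performance (a larger cache or higher associativity may increase access time; larger blocks may increase the miss penalty)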
-
Pentium Pro and PowerPC 604
Exercises
CS385 Computer Architecture, Lecture 24
Reading: Section 8.4
Topic: Interfacing Processors and Peripherals - Buses
Lecture
slides (PDF)
Lecture Notes
-
Buses: lines, transactions, types
-
Synchronous and asynchronous buses
-
Handshaking protocol
-
Bus access: master and slave
-
Bus arbitration schemes
-
Bus standards
CS385 Computer Architecture, Lecture 25
Reading: Section 8.5
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture
slides (PDF)
Lecture Notes
-
The role of the operating system in interfacing I/O devices to Memory
-
Controlling the I/O devices
-
Memory mapped I/O
-
Special I/O instructions
-
Communicating with the processor
-
Polling
-
Interrupt-driven I/O
-
Direct memory access (DMA)
-
DMA and the memory system
-
Designing an I/O system: latency and bandwidth constraints.
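The cost of polling is a simple ratio: polls per second times cycles per poll, divided by the clock rate. The Python sketch below uses hypothetical numbers (a 500 MHz processor, 400 cycles per poll, a mouse polled 30 times per second, a disk delivering 4 MB/s in 16-byte units) to show why polling is acceptable for slow devices but not for fast ones.

```python
def polling_fraction(polls_per_second, cycles_per_poll, clock_hz):
    """Fraction of processor time spent polling."""
    return polls_per_second * cycles_per_poll / clock_hz

# Hypothetical numbers for illustration.
CLOCK_HZ = 500e6
CYCLES_PER_POLL = 400

mouse = polling_fraction(30, CYCLES_PER_POLL, CLOCK_HZ)       # 30 polls per second
disk = polling_fraction(4e6 / 16, CYCLES_PER_POLL, CLOCK_HZ)  # one poll per 16-byte transfer
print(f"mouse {mouse:.4%}, disk {disk:.0%}")
```

A disk that must be polled for every transfer would consume a fifth of the processor, which motivates interrupt-driven I/O and DMA.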
CS385 Computer Architecture, Lecture 26
Reading: Section 9.1 - 9.3 (on CD)
Topic: Multiprocessors
Lecture
slides (PDF)
Lecture Notes
-
Basic approaches to sharing data and types of connectivity
-
Programming multiprocessors
-
Multiprocessors connected by a single bus
-
A parallel program
-
Multiprocessor cache coherency
-
Implementing a multiprocessor cache coherency protocol
-
Synchronization using coherency
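A parallel program on a shared-memory multiprocessor often computes partial results per processor and then combines them in a reduction, synchronizing between rounds. The Python sketch below mimics that pattern with threads and a `threading.Barrier` (the barrier plays the role of a synch() call); Python threads will not run faster here, so this only illustrates the structure, and the function name is hypothetical.

```python
import threading

def parallel_sum(data, num_threads):
    """Each thread sums its slice into partial[pn], then the partial sums are
    combined in a tree reduction with a barrier between rounds."""
    partial = [0] * num_threads
    barrier = threading.Barrier(num_threads)
    chunk = (len(data) + num_threads - 1) // num_threads

    def worker(pn):
        partial[pn] = sum(data[pn * chunk:(pn + 1) * chunk])
        half = num_threads
        while half > 1:
            barrier.wait()                       # wait until the previous round is done
            if half % 2 != 0 and pn == 0:
                partial[0] += partial[half - 1]  # odd count: thread 0 takes the extra element
            half //= 2
            if pn < half:
                partial[pn] += partial[pn + half]

    threads = [threading.Thread(target=worker, args=(pn,)) for pn in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return partial[0]

print(parallel_sum(list(range(1, 100_001)), 4))
```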
CS385 Computer Architecture, Lecture 27
Reading: Section 9.4 - 9.6 (on CD)
Topic: Networks of multiprocessors and clusters
Lecture
slides (PDF)
Lecture Notes
-
Shared memory vs. multiple private memories
-
Centralized memory vs. distributed memory
-
Parallel programming by message passing
-
Distributed memory communication
-
Memory allocation
-
Clusters and network topology
-
Modern clusters
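Network topologies are commonly compared by total link count and bisection bandwidth (links cut when the network is split into two equal halves), measured here in units of one link. The Python sketch below computes both for a ring, a fully connected network and a 2D mesh without wraparound; the function names are hypothetical and the mesh assumes a perfect-square node count.

```python
import math

def ring(p):
    """Each node links to its neighbor; cutting the ring in half cuts 2 links."""
    return {"links": p, "bisection": 2}

def fully_connected(p):
    """Every pair of nodes has a link; each node in one half links to every node in the other."""
    return {"links": p * (p - 1) // 2, "bisection": (p // 2) ** 2}

def mesh_2d(p):
    """sqrt(p) x sqrt(p) grid without wraparound; a straight cut crosses sqrt(p) links."""
    n = math.isqrt(p)
    if n * n != p:
        raise ValueError("p must be a perfect square")
    return {"links": 2 * n * (n - 1), "bisection": n}

for topology in (ring, mesh_2d, fully_connected):
    print(topology.__name__, topology(64))
```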
CS385 Computer Architecture, Lecture 28
Reading: Section 4.1 - 4.4
Topic: The role of performance
Lecture
slides (PDF)
Lecture Notes
-
Measuring computer performance:
-
Program execution time
-
CPU time: user CPU time and system CPU time
-
Evaluating computer systems:
-
Relative performance
-
Workload
-
Benchmarks
-
Categories of parallelism
-
Single instruction stream, single data stream (SISD)
-
Single instruction stream, multiple data streams (SIMD)
-
Multiple instruction streams, single data stream (MISD)
-
Multiple instruction streams, multiple data streams (MIMD)
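The performance measures above combine as CPU time = instruction count * CPI / clock rate, and "A is n times faster than B" means execution time of B / execution time of A. A short Python sketch with hypothetical machine parameters:

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time (= 1 / clock rate)."""
    return instruction_count * cpi / clock_hz

def times_faster(time_a, time_b):
    """Performance_A / Performance_B = Execution time_B / Execution time_A."""
    return time_b / time_a

# Hypothetical machines running the same program (same instruction count).
t_a = cpu_time(2_000_000_000, 1.5, 3_000_000_000)  # about 1.0 s
t_b = cpu_time(2_000_000_000, 1.2, 2_000_000_000)  # about 1.2 s
print(times_faster(t_a, t_b))
```

Machine B has the lower CPI but the slower clock, so A is still about 1.2 times faster; neither factor alone determines performance.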
CS385 Assignment 1: Assembly Programming in MIPS (maximal grade 5 points)
Write a program in MIPS assembler to perform some useful computation (e.g.
calculate sales tax, convert temperature from Celsius into Fahrenheit).
The program must include:
-
At least one instruction from each instruction type: R-type arithmetic,
I-type arithmetic, Memory transfer and Branch.
-
Input and Output through system calls.
-
Comments explaining the type, format and the meaning of each instruction.
If you use a pseudo instruction, there must be an explanation how it translates
into real MIPS instructions.
Use the SPIM
simulator to debug and run the program. Appendix A of the text (on
CD) provides reference information about MIPS assembly programming. You
may find additional information about MIPS programming using SPIM at CS
254 - Computer organization and assembly language programming and Introduction
to RISC Assembly Language Programming.
Documentation and submission: Submit the source text of the program
(ASCII text) as an attachment through Blackboard Vista Courses > CS-385
> Assignment 1.
CS385 Semester Project: Building a mini MIPS machine (total grade 45 points,
including 3 progress reports worth 10 points each)
Not available at this time.
CS385 Midterm Test (maximal grade 25 points)
The midterm test topics include: number systems, MIPS assembly programming,
single-cycle datapath and control, multi-cycle datapath and control, Verilog
HDL, and solving pipeline hazards. Log on to Blackboard Vista at https://vista.csus.ct.edu/webct/logon/306873577011
for instructions.
CS385 Final Exam (maximal grade 25 points)
The Final Exam topics include: Memory Hierarchy, Caches, Interfacing Peripherals,
and Multiprocessors.