CS385 Computer Architecture
Spring-2007
Classes: MW 5:15pm - 6:30pm, Robert Vance Academic Center 108
Instructor: Dr. Zdravko Markov, MS 307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/,
e-mail: markovz at ccsu.edu
Office hours: MW 10:00am - 12:00pm, 6:30pm - 7:00pm, or by appointment
Catalog description: The architecture of the computer is explored
by studying its various levels: physical level, operating-system level,
conventional machine level and higher levels. An introduction to microprogramming
and computer networking is provided.
Course Prerequisites: CS 354
Prerequisites by topic:
-
Basic skills in software design and programming
-
Assembly language programming and basics of computer organization
-
Digital systems design
-
Boolean algebra and discrete mathematics
Course description: The course provides comprehensive coverage
of computer architecture. It discusses the main components of computers
and the basic principles of their operation. It demonstrates the relationship
between the software and hardware and focuses on the foundational concepts
that are the basis for current computer design. The course is based on
the MIPS processor, a simple clean RISC processor whose architecture is
easy to learn and understand. The major topics covered by the course are
the following:
-
MIPS instruction set
-
Computer arithmetic and ALU design
-
Datapath and control
-
Pipelining
-
Memory hierarchy, caches and virtual memory
-
Interfacing CPU and peripherals, buses
-
Multiprocessors, networks of multiprocessors, parallel programming
-
Performance issues
Course Goals: Upon successful completion of the course the student
will be able to
-
Understand the fundamentals of different instruction set architectures
and their relationship to the CPU design.
-
Understand the principles and the implementation of computer arithmetic.
-
Understand the operation of modern CPUs including pipelining, memory systems
and buses.
-
Understand the principles of operation of multiprocessor systems.
-
Design a CPU (multicycle implementation) from a given specification using
HDL.
-
Write simple parallel programs.
Required textbook: David A. Patterson and John L. Hennessy, Computer
Organization and Design: The Hardware/Software Interface, Third Edition,
Morgan Kaufmann Publishers, 2004, ISBN: 1-55860-604-1.
Required software:
-
Icarus Verilog (available on the Patterson and Hennessy book CD or at
http://armoid.com/icarus/).
Note about installation: don't use folder names that include spaces (like
Program Files). Read book sections B.4 and 5.8 for using HDL.
-
Other simulators that may be used for experiments (note that the project
should be done with a Verilog simulator):
-
Digital Works 2.0 (freeware)
-
Verilog HDL (Book CD of Morris Mano, Digital Design, Third Edition, Prentice
Hall, 2002, ISBN 0-13-062121-8 )
-
SPIM simulator: A
free software simulator for running MIPS R2000 assembly language programs
available for Unix, DOS, and Windows.
WEB resources:
Semester project: There will be a semester project to build a simplified
MIPS machine. The project will require four progress reports which will
be graded too. The machine must be implemented in HDL Verilog, tested with
a sample MIPS program and properly documented.
Class Participation: Active participation in class is expected
of all students. Regular attendance is also expected. If you must miss
a test, try to inform the instructor of this in advance.
Honesty policy: It is expected that all students will conduct
themselves in an honest manner (see the CCSU Student handbook), and NEVER
claim work which is not their own. Violating this policy will result
in a substantial grade penalty, and may lead to expulsion from the University.
Grading: Grading will be based on one programming assignment
(5%), a midterm test (25%), a final exam (25%) and a semester project (45%,
including progress reports and the final documentation). The letter grades
will be calculated according to the following table:
| A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F |
| 95-100 | 90-94 | 87-89 | 84-86 | 80-83 | 77-79 | 74-76 | 70-73 | 67-69 | 64-66 | 60-63 | 0-59 |
Tentative schedule of classes and assignments
-
Introduction:
Computer Architecture = Instruction Set Architecture + Machine Organization
-
MIPS Instructions: arithmetic, registers, memory, fetch&execute cycle
-
MIPS
Instructions: control and addressing modes
-
Computer
arithmetic and ALU design: representing numbers, arithmetic and logic operations
-
Assignment
1 due (5 pts.). ALU
design: full adder, slt operation, HDL design, carry lookahead
-
ALU
design: multiplication, representing floating point numbers
-
The
Processor: Building a datapath
-
The
Processor: Control (single cycle approach)
-
March 2: Progress Report #1 due (5 pts.): ALU, Register File. Use
template files: ALU4.vl
(extend it to 16 bit) and regfilewrite.vl
(use a behavioral 4-to-1 multiplexer for the read port).
-
Multicycle
approach to processor control
-
Implementing
finite state machine control, Microprogramming
-
Using
a Hardware Description Language to Design and Simulate the MIPS processor
-
Turing
machines
-
March 28: Progress Report #2 due (5 pts.) - Memory, Instruction
Fetch logic and Stage Control logic: use the project template (mips2.vl)
and (1) adjust the word size to 16 bit, (2) implement fetching with structural
design and (3) test it.
-
Introduction
to pipelining
-
Solving
pipeline hazards
-
April 2: Midterm
Test (25 pts.) due
-
Implementing
pipeline datapath and control
-
Implementing
data and branch hazards control
-
Review
of Datapath, Control and Pipelining
-
Memory
hierarchy
-
April 16: Progress Report #3 due (5 pts.) - Branching logic, Execution
control. See the project
description for more details.
-
The
Basics of caches
-
Improving
cache performance
-
Virtual
Memory basics
-
Virtual
Memory optimization
-
A
general framework of memory hierarchies
-
Interfacing
Processors and Peripherals - Buses
-
May 2: Progress Report #4 due (5 pts.) - add register file, ALU and
connecting MUX's. Add data cache (optional).
-
Interfacing
I/O devices to Memory, CPU and OS
-
Multiprocessors
-
Networks of multiprocessors
-
The
role of performance
-
Final
Exam due (25 pts.)
-
May 16: Semester
Project due (25 pts.)
CS385 Computer Architecture, Lecture 1
Reading: Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set
Architecture + Machine Organization.
Lecture
slides (PDF)
Lecture Notes
-
Levels of Abstraction
-
Computer Architecture = Instruction Set Architecture + Machine Organization
-
Instruction Set: The Software/Hardware Interface
-
Levels of Computer Architecture in More Depth
-
Software:
-
Application
-
Operating System
-
Firmware
-
Instruction Set Architecture:
-
Organization of Programmable Storage
-
Data type and Structures: encodings and machine representation
-
Instruction set
-
Instruction Formats
-
Addressing Modes and Accessing Data and Instructions
-
Exception Handling
-
Hardware:
-
Instruction Set Processing
-
I/O System
-
Digital Design
-
Circuit Design
-
Layout
-
Basic Components of a Computer
-
Processor: Datapath and Control
-
Memory
-
I/O
-
Computer Organization
-
Capabilities and Performance of the Basic Functional Units
-
The Way These Units are Interconnected
-
Information Flow between components
-
Information Flow Control
CS385 Computer Architecture, Lecture 2
Reading: Sections 2.1 - 2.5
Topics: MIPS instructions, arithmetic, registers, memory, fetch&execute
cycle
Lecture
slides (PDF)
Lecture Notes
-
Design goal: maximize performance and minimize cost. Primitive (low level)
and very restrictive instructions (fixed number and type of operands).
-
Design principles:
-
Simplicity favors regularity (uniform instruction format)
-
Smaller is faster (only 32 registers)
-
Good design demands a compromise (I-type instructions)
-
MIPS arithmetic: 3 operands, fixed order, registers only.
-
Using only registers: R-type instructions.
-
Registers: 32-bits long, conventions.
-
Memory organization: words and byte addressing.
-
Data transfer (load and store) instructions. Example: accessing array elements.
-
Translating C code into MIPS instructions: the swap example.
-
Machine Language: instruction format, I-type (Immediate) format for data
transfer
-
Stored program concept: programs in memory, fetch&execute cycle
-
Von Neumann Architecture
-
CPU, Memory System, I/O system
-
Stored program concept: programs in memory, fetch&execute cycle
-
Instructions are executed sequentially
-
Turing machines
-
Non-Von Neumann Architecture:
-
Various parallel and multiprocessor architectures (see book chapter 9)
-
In broader sense: NN, GA etc.
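The R-type and I-type formats listed in the notes above can be illustrated with a small encoder. This is a minimal sketch, not part of the course materials: the helper names encode_r and encode_i are hypothetical, and the register numbers and opcode/funct values follow the standard MIPS conventions ($t0 = 8, $s1 = 17, $s2 = 18, $s3 = 19).

```python
# Sketch of MIPS instruction encoding; helper names are illustrative only.

def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """Pack an R-type word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """Pack an I-type word: op(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $t0, $s1, $s2  (funct = 32)
add_word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=32)
# lw $t0, 32($s3)    (opcode = 35)
lw_word = encode_i(opcode=35, rs=19, rt=8, imm=32)

print(f"{add_word:#010x}")  # 0x02324020
print(f"{lw_word:#010x}")   # 0x8e680020
```

Both formats keep the opcode, rs and rt fields in the same bit positions, which is the regularity the first design principle refers to.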
CS385 Computer Architecture, Lecture 3
Reading: Sections 2.6 - 2.9, 2.16
Topics: MIPS Instructions: control and addressing modes
Lecture
slides (PDF)
Lecture Notes
-
Implementing the C code for if in MIPS: conditional branch.
-
Implementing the C code for if-else in MIPS: unconditional branch
-
Simple for loop
-
Check for less-than: building a pseudoinstruction for branch if less-than.
-
Addressing in branch instructions: PC-relative and pseudodirect.
-
Constants: use of immediate addressing (constants as operands: addi, slti,
andi, ori).
-
32-bit constants: manipulate the upper 2 bytes separately (load upper immediate)
-
Summary of MIPS addressing: register (add), immediate (addi), base or displacement
(lw), PC-relative (bne), pseudodirect (j).
-
Alternative approaches: IA-32
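The address calculations behind the PC-relative and pseudodirect modes, and the split of a 32-bit constant into lui/ori halves, can be sketched as follows. The function names are hypothetical and the example addresses are arbitrary; the arithmetic follows the usual MIPS definitions.

```python
def branch_target(pc, offset16):
    """PC-relative: the sign-extended word offset is added to PC + 4."""
    if offset16 & 0x8000:              # sign-extend the 16-bit field
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """Pseudodirect: upper 4 bits of PC + 4 joined with the 26-bit field << 2."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

def split_constant(value):
    """Upper and lower halves of a 32-bit constant (for lui, then ori)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

print(hex(branch_target(0x00400000, 3)))   # 0x400010
print(split_constant(0x003D0900))          # (61, 2304)
```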
CS385 Computer Architecture, Lecture 4
Reading: Sections 3.1 - 3.3, B.5 (CD)
Topics: Computer arithmetic and ALU design: representing numbers,
arithmetic and logic operations
Lecture
slides (PDF)
Lecture Notes
-
Representing numbers: sign bit, one's complement, two's complement.
-
Arithmetic: addition, subtraction, detecting overflow.
-
Logical operations: shift, and, or.
-
Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
-
ALU for logical operations.
-
ALU for add, and, or.
-
Supporting subtraction
Exercises: Implement an overflow detection unit using only the CarryIn and
CarryOut bits of ALU-31
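The two's complement representation and the add/subtract behavior covered in this lecture can be modeled in a few lines. This is a sketch under assumptions: it works on 32-bit patterns, the function names are hypothetical, and it detects overflow with the sign rule (operands agree in sign, result differs) rather than the carry-based unit that the exercise asks for.

```python
BITS = 32
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(word):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return word - (1 << BITS) if word & SIGN else word

def add_with_overflow(a, b, carry_in=0):
    """Add two 32-bit patterns; overflow when both operands share a sign
    and the sign of the result differs."""
    result = (a + b + carry_in) & MASK
    overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    return result, overflow

def subtract(a, b):
    """a - b computed as a + (NOT b) with the carry-in set to 1."""
    return add_with_overflow(a, ~b & MASK, carry_in=1)

print(add_with_overflow(0x7FFFFFFF, 1))   # (2147483648, True)
print(to_signed(subtract(5, 7)[0]))       # -2
```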
CS385 Computer Architecture, Lecture 5
Reading: Sections B.4, B.5, B.6 (CD)
Topics: ALU design: full adder, slt operation, HDL design, carry
lookahead
Lecture
slides (PDF)
Programs: 2-1-mux.vl,
4-bit-adder.vl,
more
examples of Verilog programs
Lecture Notes
-
Implementation of a full adder:
-
Carry out logic
-
Result logic: using 'and', 'or' and inverter and using xor-gate.
-
Supporting set on less-than (slt).
-
Test for equality (needed for branching)
-
Designing the ALU in Verilog
-
Carry Lookahead
Exercises: Implement and test half and full adders in Verilog
using the structural specification approach (gate-level modeling).
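As a software model of the adder logic in the notes above (not a substitute for the Verilog exercise), the following sketch expresses the full adder, a 4-bit ripple-carry adder, and the generate/propagate carry equations. The function names and the least-significant-bit-first list convention are assumptions made for this illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: sum via xor, carry-out as a majority function."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add4(a_bits, b_bits, cin=0):
    """4-bit ripple-carry adder; bit lists are least significant first."""
    out = []
    for a, b in zip(a_bits, b_bits):
        s, cin = full_adder(a, b, cin)
        out.append(s)
    return out, cin

def lookahead_carries(a_bits, b_bits, c0=0):
    """Carries c1..c4 from generate (g = a AND b) and propagate (p = a OR b).
    Hardware expands this recurrence into two-level logic; the loop here
    only evaluates the same equations."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a | b
        c = g | (p & c)                # c(i+1) = g(i) + p(i) * c(i)
        carries.append(c)
    return carries

# 0111 + 0001 = 1000
print(ripple_add4([1, 1, 1, 0], [1, 0, 0, 0]))        # ([0, 0, 0, 1], 0)
print(lookahead_carries([1, 1, 1, 0], [1, 0, 0, 0]))  # [1, 1, 1, 0]
```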
CS385 Computer Architecture, Lecture 6
Reading: Section 3.4, 3.6 - 3.10
Topics: ALU design: multiplication, representing floating point
numbers
Lecture
slides (PDF)
Lecture Notes
-
Implementing multiplication:
-
Using 64-bit adder;
-
Using 32-bit adder for the upper 32 bits of the product;
-
Avoiding the use of the multiplier register.
-
Floating point numbers
-
Scientific notation: (-1)^sign * significand * 2^exponent
-
Range and precision (overflow and underflow).
-
IEEE 754 floating point standard - allows integer comparison:
-
normalized representation
-
implicit leading 1
-
exponent is biased: exponent in [0..0 (most negative), 1..1 (most positive)]
-
bits of the significand represent the fraction between 0 and 1.
-
(-1)^S * (1 + s1*2^-1 + s2*2^-2 + ...) * 2^(exponent-bias)
-
Problems with floating point arithmetic
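The single-precision formula above can be checked by pulling the fields out of a real bit pattern. This sketch uses Python's struct module to obtain the 32-bit encoding; the function name decode_float32 is hypothetical, and only normalized numbers are handled (no zeros, denormals, infinities or NaNs).

```python
import struct

def decode_float32(x):
    """Split x into IEEE 754 single-precision fields and rebuild the value
    with (-1)^S * (1 + fraction) * 2^(exponent - bias), bias = 127."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 2**23
    value = (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value

print(decode_float32(-0.75))   # (1, 126, 0.5, -0.75)
```

Here -0.75 is -1.1 (binary) times 2^-1, so the stored exponent is -1 + 127 = 126 and the fraction field holds 0.5.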
Exercises
CS385 Computer Architecture, Lecture 7
Reading: Sections 5.1 - 5.3
Topics: The Processor, Building a Datapath
Lecture
slides (PDF)
Lecture Notes
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D-latches and D flip-flops
-
Clocking methodology: edge triggered
-
Fetching instructions and incrementing the program counter
-
Register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Demo: http://www.it.jcu.edu.au/Subjects/cp2005/resources/animation/wk6animations.ppt
CS385 Computer Architecture, Lecture 8
Reading: Section 5.4
Topics: Single-cycle
control
Lecture
slides (PDF)
Lecture Notes
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
Demo: http://www.web-ee.com/primers/files/MIPS/MIPS.htm
CS385 Computer Architecture, Lecture 9
Reading: Section 5.5, 5.6
Topics: Multicycle Approach to Processor Control, Exceptions
Lecture
slides (PDF)
Lecture Notes
-
Basic principles:
-
Breaking up instruction into steps
-
Reusing functional units in different steps
-
Storing intermediate results
-
Need for controlling the sequence of steps
-
Execution steps:
-
Instruction fetch
-
Instruction decoding and register fetch
-
Instruction dependent (memory reference, R-type execution or branch)
-
R-type completion or memory access
-
Memory read completion
-
Implementing the control unit - finite state machine
Exercises
-
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Lbl #assume not
add $t5, $t2, $t3
sw $t5, 8($t3)
Lbl:
...
-
What is going on during the 8th cycle of execution?
-
In what cycle does the actual addition of $t2 and $t3 take place?
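Cycle counting in the multicycle design follows directly from the execution steps listed above: a load uses all five steps, stores and R-type instructions use four, and branches use three. The sketch below applies that count to a different, made-up instruction mix so that the exercise itself is left open; the table and function names are hypothetical.

```python
# Steps per instruction class in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3}

def total_cycles(instruction_classes):
    """Sum the per-class step counts for a straight-line sequence."""
    return sum(CYCLES[c] for c in instruction_classes)

program = ["lw", "r-type", "sw", "beq"]   # illustrative mix, not the exercise
print(total_cycles(program))              # 16
```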
CS385 Computer Architecture, Lecture 10
Reading: Section 5.7 (CD)
Topics: Implementing Finite State Machine Control, Microprogramming
Lecture
slides (PDF)
Lecture Notes
-
ROM implementation
-
Single ROM: 10-bit address, 20-bit word, 2^10*20=20K
-
Two ROM's: 2^10*4 (Op+State-> Next state) + 2^4*16 (State -> Control)
-
Programmable Logic Array (PLA)
-
Representing truth table as a sum of products
-
AND-gates and OR-gates arrays
-
Implementing the Next-step function with a Sequencer
-
Implementing datapath control by microprogramming
-
Microinstruction specification at symbolic level
-
Microinstruction format
-
Executing microinstructions
-
Microprogramming vs. ROM and PLA implementation
-
Handling exceptions and interrupts
-
Types of exceptions and interrupts
-
Handling exceptions
-
Extending the datapath control to detect exceptions
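The ROM sizes quoted under "ROM implementation" above are simple products of the number of addresses and the word width. A short check, assuming the 10-bit address is formed from 6 opcode bits plus 4 state bits and the 20-bit word from 16 control outputs plus 4 next-state bits (the helper name rom_bits is hypothetical):

```python
def rom_bits(address_bits, word_bits):
    """Total storage of a ROM with 2^address_bits words of word_bits each."""
    return (2 ** address_bits) * word_bits

single = rom_bits(10, 20)                    # one ROM for everything
split = rom_bits(10, 4) + rom_bits(4, 16)    # next-state ROM + control ROM

print(single, single / 1024)   # 20480 20.0
print(split, split / 1024)     # 4352 4.25
```

Splitting works because the control outputs depend only on the state, so only the next-state function needs the opcode bits in its address.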
CS385 Computer Architecture, Lecture 11
Reading: Section: 5.8 (CD). Note the numerous errors in the Verilog
code.
Topic: Using Hardware Description Language to Design and Simulate
the MIPS processor
-
Behavior model of MIPS (mips.vl)
-
Project version (two stage control) of MIPS (mips2.vl).
Changes needed to complete the project:
-
Adjust the word size (ALU, registers, memory etc.) to 16 bit
-
Replace the behavior model of fetching with structural design
-
Implement the execute phase using structural design
-
Write a complete program for testing and test the CPU
CS385 Computer Architecture,
Lecture 12
Turing machines
-
Turing machines
-
A Turing
Machine Applet
-
Alan Turing web page
CS385 Computer Architecture, Lecture 13
Reading: Section: 6.1 - 6.2.
Topic: Introduction to Pipelining
Lecture
slides (PDF)
Lecture Notes
-
Pipelining by analogy (laundry example):
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
Five stages of the load MIPS instruction
-
The pipelined datapath
-
Single cycle, multiple cycle vs. pipeline
-
Advantages of pipelined execution
-
Problems with pipelining (pipeline hazards)
-
Structural hazards
-
Data hazards
-
Control hazards
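The speedup claims in the notes above (limited by the slowest stage, reduced by unbalanced stages and by fill/drain time) can be made concrete with a small timing model. The stage latencies below are assumed values chosen for illustration, and the function names are hypothetical; hazards and stalls are ignored.

```python
def single_cycle_time(stage_times, n):
    """n instructions, each taking the sum of all stage latencies."""
    return n * sum(stage_times)

def pipelined_time(stage_times, n):
    """Clock is set by the slowest stage; k + n - 1 cycles including fill."""
    clock = max(stage_times)
    return (len(stage_times) + n - 1) * clock

stages = [200, 100, 200, 200, 100]   # assumed latencies in ps: IF ID EX MEM WB

print(single_cycle_time(stages, 3))  # 2400
print(pipelined_time(stages, 3))     # 1400
n = 1_000_000
print(single_cycle_time(stages, n) / pipelined_time(stages, n))
```

With these numbers the long-run speedup approaches 4 rather than 5, because the stages are unbalanced: the clock is 200 ps while the average stage needs only 160 ps.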
CS385 Computer Architecture, Lecture 14
Reading: Section: 6.1 - 6.2.
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture
slides (PDF)
Lecture Notes
-
Structural hazards: single memory
-
Control hazards:
-
Stall: wait until decision is clear
-
Predict: fixed prediction (e.g. fail), dynamic prediction (based on history)
-
Delayed branch (software solution):
add $4, $5, $6              beq $1, $2, 40
beq $1, $2, 40      ==>     add $4, $5, $6
lw $3, 300($0)              lw $3, 300($0)
-
Data hazards (dependencies backwards in time):
-
Forwarding (bypassing)
-
Reordering code
lw $t0, 0($t1)              lw $t0, 0($t1)
lw $t2, 4($t1)      ==>     lw $t2, 4($t1)
sw $t2, 0($t1)              sw $t0, 4($t1)
sw $t0, 4($t1)              sw $t2, 0($t1)
-
Designing a pipelined processor
CS385 Computer Architecture, Lecture 15
Reading: Section: 6.2 - 6.3
Topic: Implementing pipeline datapath and control
Lecture
slides (PDF)
Lecture Notes
-
Splitting datapath into stages: using registers to store parts of the instruction
-
Transferring data forward and backward between the stages: lw example
-
Corrected datapath: storing rd for the write back stage.
-
Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle
diagram
-
Pipeline control:
-
IF: no control signals to store for later stages (they are always asserted).
-
ID: no control signals to store for later stages (they are always asserted).
-
EX: set RegDst, ALUOp, ALUSrc
-
MEM: set Branch, MemRead, MemWrite
-
WB: set MemtoReg, RegWrite
-
Datapath with control
-
Example: running this code through the pipeline in 9 cycles.
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
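The 9-cycle figure for the five-instruction example follows from the multiple-clock-cycle view: instruction i enters IF in cycle i, so the last of n instructions leaves WB in cycle n + 4. The sketch below builds that diagram, assuming no stalls; the function name pipeline_diagram is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(program):
    """Return [(instruction, {cycle: stage})], assuming no stalls."""
    rows = []
    for i, instr in enumerate(program):
        row = {i + s + 1: stage for s, stage in enumerate(STAGES)}
        rows.append((instr, row))
    return rows

program = ["lw $10, 20($1)", "sub $11, $2, $3", "and $12, $4, $5",
           "or $13, $6, $7", "add $14, $8, $9"]
diagram = pipeline_diagram(program)
last_cycle = max(max(row) for _, row in diagram)

for instr, row in diagram:
    cells = [f"{row.get(c, ''):>4}" for c in range(1, last_cycle + 1)]
    print(f"{instr:<16}" + "".join(cells))
print("cycles:", last_cycle)   # cycles: 9
```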
CS385 Computer Architecture, Lecture 16
Reading: Section: 6.4 - 6.9
Topic: Implementing data and branch hazard control
Lecture
slides (PDF)
Lecture Notes
-
Detecting data dependencies
-
EX/MEM.Rd = ID/EX.Rs
EX/MEM.Rd = ID/EX.Rt
-
MEM/WB.Rd = ID/EX.Rs
MEM/WB.Rd = ID/EX.Rt
-
Forwarding
-
if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rs) ForwardA = 10
if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rt) ForwardB = 10
-
if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rs) ForwardA = 01
if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rt) ForwardB = 01
-
Data hazards and stalls
If (ID/EX.MemRead and
(ID/EX.Rt = IF/ID.Rs or
ID/EX.Rt = IF/ID.Rt))
stall the pipeline
-
Branch hazards
-
Reducing the delay of branches - move up the address calculation (move
the adder) and the branch decision (add XOR and AND gates)
-
Assuming the branch will not be taken
-
Flushing instructions in IF, ID, and EX stages, if the branch is taken
-
Advanced pipelining
-
Superpipelining
-
Superscalar
-
Dynamic scheduling
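The forwarding and stall conditions given earlier in this lecture translate almost directly into code. The following is a behavioral sketch, not the Verilog needed for the project: pipeline registers are modeled as plain dictionaries, the function names are hypothetical, and two details go beyond the conditions as written in the notes (register 0 is never forwarded, and the EX/MEM result takes priority over MEM/WB), following the fuller conditions in the textbook.

```python
def forward_controls(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB): '10' = take the EX/MEM result,
    '01' = take the MEM/WB result, '00' = use the register file value."""
    def select(src):
        if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src:
            return "10"
        if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src:
            return "01"
        return "00"
    return select(id_ex["Rs"]), select(id_ex["Rt"])

def must_stall(id_ex, if_id):
    """Load-use hazard: the instruction in EX is a load whose destination
    (Rt) is a source register of the instruction in ID."""
    return id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX
print(forward_controls({"RegWrite": True, "Rd": 2},
                       {"RegWrite": False, "Rd": 0},
                       {"Rs": 2, "Rt": 5}))            # ('10', '00')
# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID
print(must_stall({"MemRead": True, "Rt": 2}, {"Rs": 2, "Rt": 5}))  # True
```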
CS385 Computer Architecture, Lecture 17
Reading: Chapters 5, 6
Topic: Review of Datapath, Control and Pipelining
Lecture
slides (PDF)
Lecture
Notes
Datapath
-
Abstract level implementation:
-
Instruction memory
-
Program counter
-
Register file
-
ALU
-
Data memory
-
Basic building elements
-
Combinational logic
-
State elements: D-latches and D flip-flops
-
Clocking methodology: edge triggered
-
Basic operations
-
Instructions fetch
-
Accessing register file and execution of R-type instructions
-
Datapath for lw and sw instructions (add data memory and sign extend)
-
Datapath for branch instructions
Control
-
ALU control: mapping the opcode and function bits to the ALU control inputs
-
Designing the main control unit
-
Operation of the Datapath (single-cycle implementation):
-
R-type instructions
-
Load (store) word
-
Branching instructions
-
Problems of the single-cycle implementation
-
Multicycle Approach to Processor Control
-
Basic principles of the Multicycle Approach to Processor Control
-
Breaking up instruction into steps
-
Reusing functional units in different steps
-
Storing intermediate results
-
Need for controlling the sequence of steps
-
Execution steps:
-
Instruction fetch
-
Instruction decoding and register fetch
-
Instruction dependent (memory reference, R-type execution or branch)
-
R-type completion or memory access
-
Memory read completion
-
Finite state machine control
-
Microprogramming
Pipelining
-
Basic principles of pipelining
-
Pipelining helps throughput of the entire workload
-
Multiple tasks operating simultaneously and using different resources
-
Potential speedup = number of pipe stages
-
The pipeline rate is limited by the slowest stage
-
Unbalanced lengths of pipe stages reduce the speedup
-
The time to "fill" the pipeline and the time to "drain" it reduce the speedup
-
Stall for dependencies
-
MIPS pipelining: the five stages of the lw instruction
-
Problems with pipelining:
-
Structural hazards
-
Data hazards
-
Control hazards
-
Designing a pipelined processor
-
Transferring data forward and backward between the stages
-
Pipeline control
-
Implementing data and branch hazard control
-
Detecting data dependencies
-
Forwarding
-
Data hazards and stalls
-
Branch hazards
-
Advanced pipelining
Demo: http://www.web-ee.com/primers/files/MIPS/MIPS.htm
CS385 Computer Architecture, Lecture 18
Reading: Section 7.1
Topic: Memory Hierarchy
Lecture
slides (PDF)
Lecture
Notes
-
Memory technologies and trends
-
Impact on performance
-
The need for hierarchical memory organization
-
The principle of locality
-
Memory hierarchy terminology
-
Basics of RAM implementation
-
SRAM: D-latches, three-state buffers, address decoders, two level
addressing
-
DRAM: DRAM cell, refreshing
-
Error detection and correction
CS385 Computer Architecture, Lecture 19
Reading: Section 7.2
Topic: The Basics of caches
Lecture
slides (PDF)
Lecture Notes
-
Direct-mapped cache
-
Accessing a cache
-
Writing to the cache (write-through and write-back schemes)
-
Handling cache misses
-
Read miss: load the word from memory
-
Write miss: write both to the cache and to the memory (using write buffer)
-
Example: DECStation 3100 cache
-
Spatial locality caches: keeping consistency on write
-
Main memory organization
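Accessing a direct-mapped cache, as outlined in the notes above, amounts to an index computation plus a tag comparison. A minimal simulation, assuming one-word blocks, an initially empty cache, and an illustrative trace of block addresses (the function name is hypothetical):

```python
def simulate_direct_mapped(block_addresses, num_blocks):
    """Classify each reference as a hit or miss in a direct-mapped cache."""
    cache = [None] * num_blocks          # stored tag per cache block
    outcomes = []
    for addr in block_addresses:
        index = addr % num_blocks        # which cache block to look in
        tag = addr // num_blocks         # identifies the memory block held there
        if cache[index] == tag:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            cache[index] = tag           # load the block on a miss
    return outcomes

trace = [22, 26, 22, 26, 16, 3, 16, 18]
print(simulate_direct_mapped(trace, 8))
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The final reference misses because blocks 26 and 18 map to the same index (2) and evict each other.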
CS385 Computer Architecture, Lecture 20
Reading: Section 7.3
Topic: Improving cache performance
Lecture
slides (PDF)
Lecture
Notes
-
Measuring cache performance
-
Stall clock cycles = Instructions * Miss rate * Miss penalty
-
Example 1 (reducing CPI):
-
2% - instruction miss; 4% - data miss; 36% - lw/sw; 40 cyc - miss penalty.
-
2 CPI => CPI_stall = 3.36, i.e. perfect cache is 1.68 times faster.
-
1 CPI => CPI_stall = 2.36, i.e. perfect cache is 2.36 times faster.
-
Example 2: doubling clock rate => 80 cyc - miss penalty, CPI_stall = 4.75,
Performance: 3.36/(4.75/2) = 1.41 faster with stalls, 2 times faster without
stalls.
-
Conclusion: cache penalties increase as the machine becomes faster
-
Flexible placement of blocks in the cache
-
Direct mapped: Cache index = (Block address) modulo (Number of blocks in
cache); no search; small tag.
-
Set associative: Cache index = (Block address) modulo (Number of sets in
cache); search the set; larger tag.
-
Fully associative: Cache index is not determined; search the whole cache;
tag = address.
-
Locating a block in the cache: an N-way set-associative cache requires N
comparators and an N-to-1 multiplexor
-
Choosing which block to replace: least recently used
-
Multilevel caches
Exercises
-
Write a sequence of memory references for which:
-
the direct mapped cache performs better than the 2-way associative cache;
-
the 2-way associative cache performs better than the fully associative
cache.
-
Exercises 7.9, 7.10 (page 556), 7.29 (page 558).
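The CPI figures in Examples 1 and 2 of this lecture can be reproduced with a short calculation. This is a sketch with a hypothetical function name; note that the unrounded results (3.376, 4.752 and a speedup of about 1.42) differ slightly from the 3.36, 4.75 and 1.41 quoted in the notes, presumably because an intermediate value was rounded there.

```python
def cpi_with_stalls(base_cpi, instr_miss, data_miss, mem_fraction, penalty):
    """Base CPI plus memory-stall cycles per instruction."""
    instr_stalls = instr_miss * penalty                 # every instruction is fetched
    data_stalls = mem_fraction * data_miss * penalty    # only lw/sw access data
    return base_cpi + instr_stalls + data_stalls

slow = cpi_with_stalls(2, 0.02, 0.04, 0.36, 40)
fast = cpi_with_stalls(2, 0.02, 0.04, 0.36, 80)   # doubled clock: penalty is 80 cycles

print(round(slow, 3), round(slow / 2, 2))    # CPI with stalls, slowdown vs. perfect cache
print(round(fast, 3))
print(round(slow / (fast / 2), 2))           # speedup from doubling the clock rate
```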
CS385 Computer Architecture, Lecture 21
Reading: Section 7.4
Topic: Virtual Memory
Lecture
slides (PDF)
Lecture Notes
-
The need for VM
-
Many programs (processes) can use a single memory
-
Use a memory exceeding the size of the main memory
-
VM organization and terminology: virtual address, physical address, page,
page offset, page fault, memory mapping (translation).
-
Design decisions motivated by the very high cost of page faults:
-
Large pages (4K-64K)
-
Reducing page fault penalties: fully associative VM
-
Software management of page faults
-
Write-back instead of write-through
-
Addressing pages:
-
Page table, page table register
-
Processes (active, inactive) and page tables
-
Page faults
-
Replacing pages: LRU, reference (use) bit
-
Write-back scheme (dirty bit)
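Address translation through a page table, as described in the notes above, splits the virtual address into a page number and an offset. A minimal sketch, assuming 4 KB pages and modeling the page table as a dictionary; the names translate and PageFault are hypothetical, and replacement, reference bits and dirty bits are left out.

```python
PAGE_SIZE = 4096   # 4 KB pages, the low end of the range given in the notes

class PageFault(Exception):
    """Raised when the requested virtual page is not in main memory."""

def translate(virtual_address, page_table):
    """page_table maps virtual page number -> physical page number;
    a missing entry models a page that is not resident."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn} not resident")
    return page_table[vpn] * PAGE_SIZE + offset

page_table = {1: 7}                            # virtual page 1 -> physical page 7
print(hex(translate(0x1234, page_table)))      # 0x7234
```

The offset passes through unchanged; only the page number is mapped.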
CS385 Computer Architecture, Lecture 22
Reading: Section 7.4
Topic: Virtual Memory optimization
Lecture
slides (PDF)
Lecture Notes
-
Optimizing address translation - Translation Lookaside Buffer (TLB):
-
TLB miss
-
Page fault
-
TLB associativity
-
MIPS R2000 (DECStation 3100) TLB
-
Overall operation of a memory hierarchy
-
Memory protection with VM
-
Using exceptions for handling TLB misses and page faults: using EPC and
Cause registers
-
Summary of VM
CS385 Computer Architecture, Lecture 23
Reading: Section 7.5
Topic: A general framework of memory hierarchies
Lecture
slides (PDF)
Lecture Notes
-
Associativity schemes
-
Placing blocks
-
Miss rates and cache sizes
-
Finding blocks
-
Why do we use full associativity and a separate lookup table (page table)
in VM
-
Choosing a block to replace
-
Writing blocks
-
The sources of misses
-
The challenge: a change that reduces the miss rate can have a negative effect
on the overall performance
-
Pentium Pro and PowerPC 604
Exercises
CS385 Computer Architecture, Lecture 24
Reading: Section 8.4
Topic: Interfacing Processors and Peripherals - Buses
Lecture
slides (PDF)
Lecture Notes
-
Buses: lines, transactions, types
-
Synchronous and asynchronous buses
-
Handshaking protocol
-
Bus access: master and slave
-
Bus arbitration schemes
-
Bus standards
CS385 Computer Architecture, Lecture 25
Reading: Section 8.5
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture
slides (PDF)
Lecture Notes
-
The role of the operating system in interfacing I/O devices to Memory
-
Controlling the I/O devices
-
Memory mapped I/O
-
Special I/O instructions
-
Communicating with the processor
-
Polling
-
Interrupt-driven I/O
-
Direct memory access (DMA)
-
DMA and the memory system
-
Designing an I/O system: latency and bandwidth constraints.
CS385 Computer Architecture, Lecture 26
Reading: Section 9.1 - 9.3 (on CD)
Topic: Multiprocessors
Lecture
slides (PDF)
Lecture Notes
-
Basic approaches to sharing data and types of connectivity
-
Programming multiprocessors
-
Multiprocessors connected by a single bus
-
A parallel program
-
Multiprocessor cache coherency
-
Implementing a multiprocessor cache coherency protocol
-
Synchronization using coherency
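The structure of a simple shared-memory parallel program (split the data, compute partial results independently, then combine them) can be sketched as follows. This is an illustration only: the function name and worker count are assumptions, and in standard CPython the threads will not actually run the additions in parallel, so the example shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=4):
    """Sum a list by giving each worker one contiguous chunk, then
    reducing the partial sums in the calling thread."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, parts))   # independent partial sums
    return sum(partial_sums)                        # final reduction step

print(parallel_sum(list(range(100000))))   # 4999950000
```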
CS385 Computer Architecture, Lecture 27
Reading: Section 9.4 - 9.6 (on CD)
Topic: Networks of multiprocessors and clusters
Lecture
slides (PDF)
Lecture Notes
-
Shared memory vs. multiple private memories
-
Centralized memory vs. distributed memory
-
Parallel programming by message passing
-
Distributed memory communication
-
Memory allocation
-
Clusters and network topology
-
Modern clusters:
CS385 Computer Architecture, Lecture 28
Reading: Section 4.1 - 4.4
Topic: The role of performance
Lecture
slides (PDF)
Lecture Notes
-
Measuring computer performance:
-
Program execution time
-
CPU time: user CPU time and system CPU time
-
Evaluating computer systems:
-
Relative performance
-
Workload
-
Benchmarks
-
Categories of parallelism
-
Single instruction stream, single data stream (SISD)
-
Single instruction stream, multiple data streams (SIMD)
-
Multiple instruction streams, single data stream (MISD)
-
Multiple instruction streams, multiple data streams (MIMD)
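The CPU time and relative performance measures listed under "Measuring computer performance" above combine in the usual way: CPU time = instruction count * CPI / clock rate. A short sketch with made-up machine parameters (the function names and all numbers are assumptions for illustration):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds."""
    return instruction_count * cpi / clock_rate_hz

def relative_performance(time_a, time_b):
    """How many times faster machine A is than machine B."""
    return time_b / time_a

# Hypothetical machines running the same 10^9-instruction program.
time_a = cpu_time(1e9, cpi=2.0, clock_rate_hz=4e9)   # 0.5 s
time_b = cpu_time(1e9, cpi=1.2, clock_rate_hz=2e9)   # 0.6 s
print(time_a, time_b, round(relative_performance(time_a, time_b), 2))
```

Machine A wins despite its higher CPI because its clock rate is twice that of machine B.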
CS385 Assignment 1: Assembly Programming in
MIPS
Write a program in MIPS assembler to perform some useful computation (e.g.
calculate sales tax, convert temperature from Celsius into Fahrenheit).
The program must include:
-
At least one instruction from each instruction type: R-type arithmetic,
I-type arithmetic, Memory transfer and Branch.
-
Input and Output through system calls.
-
Comments explaining the type, format and the meaning of each instruction.
If you use a pseudo instruction, there must be an explanation of how it translates
into real MIPS instructions.
Use the SPIM simulator
to debug and run the program. Appendix A of the text (on CD) provides reference
information about MIPS assembly programming. You may find additional information
about MIPS programming using SPIM at CS
254 - Computer organization and assembly language programming and Introduction
to RISC Assembly Language Programming.
Documentation and submission: Submit the source text of the program
(ASCII text) as an attachment through CCSU pipeline/Vista Courses/Computer
Architecture - CS-385-70 Spring07/Assignment 1.
CS385 Semester Project: Building a mini
MIPS machine (total grade 45 points, including 4 progress reports worth 5
points each)
Posting date: February 12
Due date: May 16
Description: Not available at this time
CS385 Midterm Test
The midterm test topics include: number systems, MIPS assembly programming,
single-cycle datapath and control, multi-cycle datapath and control, Verilog
HDL, and solving pipeline hazards. The test can be downloaded from Vista
between March 28 and April 2 and must be submitted by
April 2.
CS385 Final Exam
The Final Exam topics include: Memory Hierarchy, Caches, Interfacing Peripherals,
and Multiprocessors. The test can be downloaded from Vista between
May
14 and May 18 and must be submitted by May 18.