CS385 Computer Architecture (Section 01)
Spring-2018
Classes: MW 3:05pm - 4:20pm, Maria Sanford Hall 204
Instructor: Dr. Zdravko Markov, MS 30307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz at ccsu dot edu
Office hours: MW 4:30pm - 6:00pm, TR 10:45am - 12:00pm, or by appointment
Catalog description: The architecture of the
computer is explored by studying its various levels: physical level,
operating-system level, conventional machine level and higher levels. An
introduction to microprogramming and computer networking is provided.
Course Prerequisites: CS 354
Prerequisites by topic
- Basic skills in software design and programming
- Assembly language programming and basics of computer organization
- Digital systems design
- Boolean algebra and discrete mathematics
Course description
The course provides a comprehensive coverage
of computer architecture. It discusses the main components of the computer and
the basic principles of its operation. It demonstrates the relationship between
the software and the hardware and focuses on the foundational concepts that are
the basis for current computer design. The course is based on the MIPS
processor, a simple clean RISC processor whose architecture is easy to learn and
understand. The major topics covered in the course are the following:
- MIPS instruction set
- Computer arithmetic and ALU design
- Datapath and control
- Using Hardware Description Language to design and simulate the CPU
- Pipelining
- Memory hierarchy, caches and virtual memory
- Interfacing CPU and peripherals, buses
- Multiprocessors, networks of multiprocessors, parallel programming
- Performance issues
Course Learning Outcomes (CLO)
- Understand the fundamentals of different instruction set architectures and
their relationship to the CPU design.
- Understand the principles and the implementation of computer
arithmetic.
- Understand the operation of modern CPUs including pipelining, memory
systems and busses.
- Understand the principles of operation of multiprocessor systems and
parallel programming.
- Design and emulate a single-cycle or pipelined CPU from given specifications
using a Hardware Description Language (HDL).
- Work in teams to design and implement CPUs.
- Write reports and make presentations of computer architecture
projects.
The CS 385 Course Learning Outcomes support the following Student
Outcomes (SO):
- SO-2: Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program's discipline
(supported by CLOs 5, 6, 7).
- SO-6: Apply computer science theory and software development fundamentals to produce computing-based solutions
(supported by CLOs 1, 2, 3, 4).
Required textbook
Required software
- Icarus Verilog: HDL compiler and simulator, available from the book companion
website or at http://bleyer.org/icarus/. Note about installation: don't use
folder names that include spaces (like Program Files). Read book sections B.4
and 5.8 for using HDL. An online simulator is available at
https://www.tutorialspoint.com/compile_verilog_online.php. A quick installation
check is sketched after this list.
- SPIM
simulator: A free software simulator for running MIPS R2000 assembly
language programs available for Windows and other platforms.
- Other simulators that may be used for drawing logic diagrams and
experimenting with small circuits (note that the semester project should be
done with Verilog):
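A minimal way to check that the Icarus Verilog installation works (the file and module names below are made up for illustration; iverilog compiles the source and vvp runs the compiled simulation):

// hello.v -- a throwaway smoke-test module
module hello;
  initial begin
    $display("Icarus Verilog is installed and working");
    $finish;
  end
endmodule

// Typical command-line usage (with iverilog on the PATH):
//   iverilog -o hello.out hello.v
//   vvp hello.out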
Semester project: There will be a semester project to build
a simplified MIPS machine. The projects will be done in teams of 2-3 people each
and will require three progress reports, a final report and a presentation. The
machine must be implemented in HDL Verilog, tested with a sample MIPS program
and properly documented. The progress and the final reports must be submitted in
Blackboard Learn at https://ccsu.blackboard.com/.
Class Participation: Active participation in
class is expected of all students. Regular attendance is also expected. If you
must miss a class, try to inform the instructor in advance. In case of classes
and work missed for legitimate reasons (such as illness or accidents), limited
assistance will be offered. Unexcused absences will result in the student being
fully responsible for the make-up process.
Honesty policy: The CCSU honor code for
Academic Integrity is in effect in this class. It is expected that all students
will conduct themselves in an honest manner and NEVER claim work which is not
their own. Violating this policy will result in a substantial grade penalty, and
may lead to expulsion from the University. You may find it online at http://web.ccsu.edu/academicintegrity/. Please read it
carefully.
Grading: Grading will be based on one
programming assignment (10%), a midterm test (20%), a final exam (25%) and a
semester project (45%, including progress reports, the final documentation, and
the presentation). The letter grades will be calculated according to the
following table:
A       A-      B+      B       B-      C+      C       C-      D+      D       D-      F
95-100  90-94   87-89   84-86   80-83   77-79   74-76   70-73   67-69   64-66   60-63   0-59
Unexcused late submission policy: Submissions
made more than two days after the due date will be graded one letter
grade down. Submissions made more than a week late will receive
two letter grades down. No submissions will be accepted more than two
weeks after the due date.
Students with disabilities: Students who
believe they need course accommodations based on the impact of a disability,
medical condition, or emergency should contact me privately to discuss
their specific needs. I will need a copy of the accommodation letter from
Student Disability Services in order to arrange class accommodations. Contact
Student Disability Services, Willard Hall, 101-04 if you are not already
registered with them. Student Disability Services maintains the confidential
documentation of your disability and assists you in coordinating reasonable
accommodations with your faculty.
Tentative schedule of classes and assignments
Note:
Dates for classes, assignments and tests may change (see also University Calendar). The lecture notes may also be updated.
Check the schedule and the class pages regularly for updates!
- Jan 22: Introduction:
Computer Architecture = Instruction Set Architecture + Machine
Organization
- Jan 24: Review of HDL
- Jan 29: MIPS
Instructions: arithmetic, registers, memory, fetch & execute cycle
- Jan 31: MIPS
Instructions: control and addressing modes
- Jan 31: Submit Digital
Design Review Assignment (extra credit)
- Feb 5: Computer
arithmetic and ALU design: representing numbers, arithmetic and logic
operations
- Feb 7: Snow day
- Feb 12: ALU
design: full adder, slt operation, HDL design, carry lookahead
- Feb 12: Assignment
1 due (10 pts.)
- Feb 14: ALU
design: multiplication, representing floating point numbers
- Feb 21: The
Processor: Building a datapath
- Feb 26: The
Processor: Control (single cycle approach)
- Feb 28: Using
a Hardware Description Language to Design and Simulate the MIPS
processor. Review of Semester Project Reports #1 and #2
- March 5: Introduction
to pipelining
- March 7: Progress Report #1 due (10 pts.): A simplified
single-cycle datapath capable of executing the addi instruction and all
R-type instructions. See Semester
Project for details.
- March 7: Snow day
- March 19: Solving
pipeline hazards
- March 21: Snow day
- March 23-25: Midterm
Test (20 pts.) to be taken online in Blackboard
- March 26: Implementing
the pipeline datapath and control, implementing
data and branch hazard control
- March 28: Review
of Datapath, Control and Pipelining, HDL implementation
- April 2: Progress Report #2 due (10 pts.): Complete
single-cycle datapath. See Semester
Project for details.
- April 2: Memory
hierarchy
- April 4: The
Basics of caches
- April 9: Improving
cache performance
- April 11: Virtual
Memory basics
- April 16: Progress Report #3 due (10 pts.): 3-stage
pipelined datapath for addi and R-type instructions. See Semester
Project for details.
- April 16: Virtual
Memory optimization
- April 18: A
general framework of memory hierarchies
- April 23: Interfacing
Processors and Peripherals - Buses
- April 25: Interfacing
I/O devices to Memory, CPU and OS
- April 30: Multiprocessors
- May 2: Networks of multiprocessors,
Review of Memory System, Buses, I/O and Multiprocessors
- May 7-10: Final
Exam (25 pts.) to be taken online in Blackboard
- May 10: Semester
Project Final Report due (10 pts.)
- May 7, 1:00pm - 3:00pm: Semester project presentation (5 pts.)
CS385 Computer Architecture, Lecture
1
Reading: Chapter 1
Topics: Introduction, Computer
Architecture = Instruction Set Architecture + Machine Organization.
Lecture slides (PDF)
Lecture Notes
- Levels of Abstraction
- Computer Architecture = Instruction Set Architecture + Machine
Organization
- Instruction Set: The Software/Hardware Interface
- Levels of Computer Architecture in More Depth
- Software:
- Application
- Operating System
- Firmware
- Instruction Set Architecture:
- Organization of Programmable Storage
- Data Types and Structures: encodings and machine representation
- Instruction set
- Instruction Formats
- Addressing Modes and Accessing Data and Instructions
- Exception Handling
- Hardware:
- Instruction Set Processing
- I/O System
- Digital Design
- Circuit Design
- Layout
- Basic Components of a Computer
- Processor: Datapath and Control
- Memory
- I/O
- Computer Organization
- Capabilities and Performance of the Basic Functional Units
- The Way These Units are Interconnected
- Information Flow between components
- Information Flow Control
- Performance (Lecture slides (PDF)); see the worked example after this list
- Measuring and improving computer performance:
- Program execution time
- CPU time: user CPU time and system CPU time
- Power
- Evaluating computer systems:
- Relative performance
- Workload
- Benchmarks
- Multiprocessors and Parallelism
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
- Multiple instruction streams, single data stream (MISD)
- Multiple instruction streams, multiple data streams
(MIMD)
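As a quick reminder of how the performance metrics above combine (the numbers below are hypothetical and only for illustration):

CPU time = Instruction count * CPI * Clock cycle time = Instruction count * CPI / Clock rate
Example: 10^9 instructions at CPI = 2 on a 1 GHz clock take 10^9 * 2 / 10^9 = 2 seconds.
Relative performance: Performance_A / Performance_B = Execution time_B / Execution time_A.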
CS385 Computer Architecture, Lecture
2
Reading: Patterson & Hennessy - Sections 2.1 - 2.3, 2.5 - 2.7,
2.10, 2.13, 2.16 - 2.20, A.9, Tutorials/Getting Started with PCSpim (book
companion website).
Topics: MIPS instructions, arithmetic,
registers, memory, fetch & execute cycle, SPIM simulator
Lecture slides (PDF)
Lecture Notes
- Design goal: maximize performance and minimize cost. Primitive (low level)
and very restrictive instructions (fixed number and type of operands).
- Design principles:
- Simplicity favors regularity (uniform instruction format)
- Smaller is faster (only 32 registers)
- Good design demands a compromise (I-type instructions)
- MIPS arithmetic: 3 operands, fixed order, registers only.
- Using only registers: R-type instructions.
- Registers: 32-bits long, conventions.
- Memory organization: words and byte addressing.
- Data transfer (load and store) instructions. Example: accessing array
elements.
- Translating C code into MIPS instructions: the swap example.
- Machine Language: instruction format, I-type (Immediate) format for data
transfer
- Stored program concept: programs in memory, fetch & execute cycle
- Von Neumann Architecture
- CPU, Memory System, I/O system
- Stored program concept: programs in memory, fetch & execute
cycle
- Instructions are executed sequentially
- Turing machines
- Non-Von Neumann Architecture:
- Various parallel and multiprocessor architectures
- In a broader sense: neural networks, genetic algorithms, etc.
Exercises: Load this program in the SPIM simulator and analyze the format of
the instructions. Run the program with different values of X and Y and trace the
execution in step mode.
CS385 Computer Architecture, Lecture
3
Reading: Patterson & Hennessy - Chapter 2, Appendix A
Topics: MIPS Instructions: control and addressing modes
Lecture slides (PDF)
Lecture Notes
- Implementing the C code for if in MIPS: conditional branch.
- Implementing the C code for if-else in MIPS: unconditional branch
- Simple for loop
- Check for less-than: building a pseudoinstruction for branch if
less-than.
- Addressing in branch instructions: PC-relative and pseudodirect.
- Constants: use of immediate addressing (constants as operands: addi,
slti, andi, ori).
- 32-bit constants: manipulate the upper 2 bytes separately (load upper
immediate)
- Summary of MIPS addressing: register (add), immediate (addi), base or
displacement (lw), PC-relative (bne), pseudodirect (j)
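The PC-relative addressing used by branch instructions can be expressed directly in Verilog. The sketch below (module and signal names are made up, not taken from the course files) computes a branch target from PC+4 and the 16-bit instruction offset:

// Branch target = PC+4 + (sign-extended offset << 2) -- illustrative only
module branch_target (input [31:0] pc_plus4, input [15:0] offset,
                      output [31:0] target);
  // sign-extend the 16-bit word offset, convert to a byte offset, add to PC+4
  assign target = pc_plus4 + ({{16{offset[15]}}, offset} << 2);
endmodule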
CS385 Computer Architecture, Lecture
4
Reading: Patterson & Hennessy - Sections 3.1, 3.2, B1-6.
Topics: Computer arithmetic and ALU design: representing numbers,
arithmetic and logic operations
Lecture slides (PDF)
Lecture Notes
- Representing numbers: sign bit, one's complement, two's complement.
- Arithmetic: addition, subtraction, detecting overflow.
- Logical operations: shift, and, or.
- Basic ALU building components: and-gate, or-gate, inverter,
multiplexor.
- ALU for logical operations.
- ALU for add, and, or.
- Supporting subtraction
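Subtraction support follows directly from two's complement: a - b = a + ~b + 1. A minimal Verilog sketch (names are illustrative, not from the course files):

// Add/subtract with one adder: invert b and set carry-in when subtracting
module addsub (input [31:0] a, b, input sub, output [31:0] result);
  wire [31:0] b_in = sub ? ~b : b;   // one's complement of b when sub = 1
  assign result = a + b_in + sub;    // the carry-in of 1 completes the negation
endmodule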
Exercises: Implement an overflow
detection unit using only the CarryIn and CarryOut bits of ALU-31
Tutorials and practice quizzes on two's complement numbers:
CS385 Computer Architecture, Lecture
5
Reading: Patterson & Hennessy - B1-6.
Topics: ALU
design: full adder, slt operation, HDL design, carry lookahead
Lecture slides (PDF)
Programs:
4-bit-adder.vl, more examples of Verilog programs, mips-alu.vl, ALU4-mixed.vl
Lecture Notes
- Implementation of a full adder:
- Carry out logic
- Result logic: using 'and', 'or' and inverter and using xor-gate.
- Supporting set on less-than (slt).
- Test for equality (needed for branching)
- Designing the ALU in Verilog
- Carry Lookahead
Exercises: Implement the 4-bit adder with
carry lookahead logic in Verilog using the structural specification approach
(gate-level modeling).
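As a starting point for the exercise, a single 1-bit full adder written structurally (gate-level) might look like the sketch below; this is only an illustration, not the course's 4-bit-adder.vl:

// 1-bit full adder: sum = a XOR b XOR cin, cout = ab + cin(a XOR b)
module full_adder (input a, b, cin, output sum, cout);
  wire axb, ab, cab;
  xor g1 (axb, a, b);
  xor g2 (sum, axb, cin);
  and g3 (ab, a, b);
  and g4 (cab, axb, cin);
  or  g5 (cout, ab, cab);
endmodule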
CS385 Computer Architecture, Lecture
6
Reading: Patterson & Hennessy - Sections 3.3, 3.5.
Topics: ALU design: multiplication, representing floating point
numbers
Lecture slides (PDF)
Lecture Notes
- Implementing multiplication:
- Using 64-bit adder;
- Using 32-bit adder for the upper 32-bit of the product;
- Avoiding the use of the multiplier register.
- Floating point numbers
- Scientific notation: (-1)^sign * significand * 2^exponent
- Range and precision (overflow and underflow).
- IEEE 754 floating point standard - allows integer comparison:
- normalized representation
- implicit leading 1
- exponent is biased: exponent in [0..0 (most negative), 1..1 (most
positive)]
- bits of the significand represent the fraction between 0 and 1.
- (-1)^S * (1 + s1*2^-1 + s2*2^-2 + ...) * 2^(exponent-bias)
- Problems with floating point arithmetic
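To make the IEEE 754 layout concrete, the sketch below (illustrative names, not a course file) splits a single-precision word into its fields; the represented value is (-1)^sign * 1.fraction * 2^(exponent - 127):

module fp_fields (input [31:0] w,
                  output sign, output [7:0] exponent, output [22:0] fraction);
  assign sign     = w[31];
  assign exponent = w[30:23];   // biased by 127
  assign fraction = w[22:0];    // the implicit leading 1 is not stored
endmodule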
Tutorials and
practice quizzes on floating point numbers:
CS385 Computer Architecture, Lecture
7
Reading: Sections 4.1 - 4.3
Topics: The Processor,
Building a Datapath
Lecture slides (PDF)
Programs: mips-regfile.vl, mips-r-type.vl, mips-r-type_addi.vl
Lecture Notes
- Abstract level implementation:
- Instruction memory
- Program counter
- Register file
- ALU
- Data memory
- Basic building elements
- Combinational logic
- State elements: D-latches and D flip-flops
- Clocking methodology: edge triggered
- Fetching instructions and incrementing the program counter
- Register file and execution of R-type instructions
- Datapath for lw and sw instructions (add data memory and sign extend)
- Datapath for branch instructions
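A behavioral register file can be sketched in a few lines of Verilog; the version below only illustrates the idea (the course's mips-regfile.vl may differ in ports and details):

module regfile (input clk, input regwrite,
                input [4:0] rs, rt, rd, input [31:0] wdata,
                output [31:0] rdata1, rdata2);
  reg [31:0] regs [0:31];
  assign rdata1 = (rs == 0) ? 32'b0 : regs[rs];  // $0 is hard-wired to zero
  assign rdata2 = (rt == 0) ? 32'b0 : regs[rt];
  always @(posedge clk)
    if (regwrite && rd != 0) regs[rd] <= wdata;  // write on the clock edge
endmodule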
Demo:
CS385 Computer Architecture, Lecture
8
Reading: Patterson & Hennessy - Section 4.4
Topics:
Single-cycle control
Lecture slides (PDF)
Programs: mips-r-type.vl, mips-r-type_addi.vl, mips-simple.vl
Lecture Notes
- ALU control: mapping the opcode and function bits to the ALU control
inputs
- Designing the main control unit
- Operation of the Datapath (single-cycle implementation):
- R-type instructions
- Load (store) word
- Branching instructions
- Problems of the single-cycle implementation
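The ALU control can be written as a small combinational decoder. The sketch below follows the textbook's ALUOp/funct encoding (module and signal names are illustrative and may differ from mips-simple.vl):

module alu_control (input [1:0] aluop, input [5:0] funct,
                    output reg [3:0] alucontrol);
  always @(*)
    case (aluop)
      2'b00: alucontrol = 4'b0010;         // lw/sw: add
      2'b01: alucontrol = 4'b0110;         // beq: subtract
      default:                             // R-type: decode the funct field
        case (funct)
          6'b100000: alucontrol = 4'b0010; // add
          6'b100010: alucontrol = 4'b0110; // sub
          6'b100100: alucontrol = 4'b0000; // and
          6'b100101: alucontrol = 4'b0001; // or
          6'b101010: alucontrol = 4'b0111; // slt
          default:   alucontrol = 4'bxxxx;
        endcase
    endcase
endmodule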
CS385 Computer
Architecture, Lecture 11
Reading: Patterson & Hennessy - B.4,
Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Using Hardware Description Language to Design and Simulate the
MIPS processor
- Behavior model of MIPS - single cycle implementation: mips-simple.vl
- Project version (progress report #2). Changes needed:
- Adjust the word size (ALU, registers, memory etc.) to 16 bit
- Modify datapath control to reflect the instruction set architecture
- Add addi and bne instructions
- Write a complete program for testing and test the
CPU
Exercises
- Implement the addi instruction by:
CS385 Computer Architecture, Lecture
13
Reading: Patterson & Hennessy - Section 4.5
Topic:
Introduction to Pipelining
Lecture slides (PDF)
Lecture Notes
- Pipelining by analogy (laundry example):
- Pipelining helps throughput of the entire workload
- Multiple tasks operating simultaneously and using different
resources
- Potential speedup = number of pipe stages
- The pipeline rate is limited by the slowest stage
- Unbalanced lengths of pipe stages reduce the speedup
- The time to "fill" the pipeline and the time to "drain" it reduce the
speedup
- Stall for dependencies
- Five stages of the load MIPS instruction
- The pipelined datapath
- Single cycle, multiple cycle vs. pipeline
- Advantages of pipelined execution
- Problems with pipelining (pipeline hazards)
- Structural hazards
- Data hazards
- Control hazards
CS385 Computer Architecture, Lecture
14
Reading: Patterson & Hennessy - Section 4.5, 4.6
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture
slides I (PDF)
Lecture slides II (PDF)
Lecture Notes
- Structural hazards: single memory
- Control hazards:
- Stall: wait until decision is clear
- Predict: fixed prediction (e.g. fail), dynamic prediction (based on
history)
- Delayed branch (software solution): fill the branch delay slot with an
independent instruction
      add $4, $5, $6             beq $1, $2, 40
      beq $1, $2, 40     ==>     add $4, $5, $6
      lw  $3, 300($0)            lw  $3, 300($0)
- Data hazards (dependencies backwards in time):
- Forwarding (bypassing)
- Reordering code
      lw $t0, 0($t1)             lw $t0, 0($t1)
      lw $t2, 4($t1)     ==>     lw $t2, 4($t1)
      sw $t2, 0($t1)             sw $t0, 4($t1)
      sw $t0, 4($t1)             sw $t2, 0($t1)
- Designing a pipelined processor
CS385 Computer Architecture, Lecture
15
Reading: Patterson & Hennessy - Section 4.6, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Implementing pipeline datapath and control
Lecture slides (PDF)
Lecture Notes
- Splitting datapath into stages: using registers to store parts of the
instruction
- Transferring data forward and backward between the stages: lw example
- Corrected datapath: storing rd for the write back stage.
- Graphically representing pipelines: multiple-clock-cycle vs.
single-clock-cycle diagram
- Pipeline control:
- IF: no control signals to store for later stages (they are always
asserted).
- ID: no control signals to store for later stages (they are always
asserted).
- EX: set RegDst, ALUOp, ALUSrc
- MEM: set Branch, MemRead, MemWrite
- WB: set MemtoReg, RegWrite
- Datapath with control
- Example: running this code through the pipeline in 9
cycles.
lw  $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or  $13, $6, $7
add $14, $8, $9
Exercises: Section 4.13
(http://booksite.elsevier.com/9780124077263/appendices.php),
pages 16-30.
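Each pipeline stage boundary is a bank of registers that latches everything the next stage needs. A minimal sketch of the IF/ID register in Verilog (names are illustrative and need not match mips-pipe.vl):

module if_id (input clk, input [31:0] pc_plus4_in, instr_in,
              output reg [31:0] pc_plus4_out, instr_out);
  always @(posedge clk) begin
    pc_plus4_out <= pc_plus4_in;   // PC+4 is carried along for branches
    instr_out    <= instr_in;      // the fetched instruction enters decode next cycle
  end
endmodule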
CS385 Computer Architecture, Lecture
16
Reading: Patterson & Hennessy - Section 4.7, 4.8
Topic:
Implementing data and branch hazard control
Lecture slides (PDF)
Lecture Notes
- Detecting data dependencies
- EX/MEM.Rd = ID/EX.Rs
EX/MEM.Rd = ID/EX.Rt
- MEM/WB.Rd = ID/EX.Rs
MEM/WB.Rd = ID/EX.Rt
- Forwarding
- if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rs) ForwardA = 10
  if (EX/MEM.RegWrite and EX/MEM.Rd = ID/EX.Rt) ForwardB = 10
- if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rs) ForwardA = 01
  if (MEM/WB.RegWrite and MEM/WB.Rd = ID/EX.Rt) ForwardB = 01
- Data hazards and stalls
If (ID/EX.MemRead and
(ID/EX.Rt = IF/ID.Rs or
ID/EX.Rt = IF/ID.Rt))
stall the pipeline
- Branch hazards
- Reducing the delay of branches - move up the address calculation (move
the adder) and the branch decision (add XOR and AND gates)
- Assuming the branch will not be taken
- Flushing the instructions in the IF, ID, and EX stages if the branch is
taken
- Advanced pipelining
- Superpipelining
- Superscalar
- Dynamic scheduling
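The forwarding conditions listed above translate almost directly into a small combinational unit. The sketch below is illustrative (port names are invented; the EX/MEM hazard is given priority over MEM/WB, and writes to register $0 are ignored):

module forwarding_unit (
  input        exmem_regwrite, memwb_regwrite,
  input  [4:0] exmem_rd, memwb_rd, idex_rs, idex_rt,
  output reg [1:0] forward_a, forward_b);
  always @(*) begin
    forward_a = 2'b00;  forward_b = 2'b00;   // default: use the register file values
    if (memwb_regwrite && memwb_rd != 0 && memwb_rd == idex_rs) forward_a = 2'b01;
    if (memwb_regwrite && memwb_rd != 0 && memwb_rd == idex_rt) forward_b = 2'b01;
    if (exmem_regwrite && exmem_rd != 0 && exmem_rd == idex_rs) forward_a = 2'b10;
    if (exmem_regwrite && exmem_rd != 0 && exmem_rd == idex_rt) forward_b = 2'b10;
  end
endmodule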
CS385 Computer Architecture, Lecture
17
Reading: Patterson & Hennessy - Chapter 4, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topics: Review of Datapath, Control and Pipelining, HDL implementation (mips-pipe.vl),
3-stage pipeline (mips-pipe3.vl)
Programs: mips-pipe.vl, mips-pipe3.vl
Lecture
slides (PDF)
Lecture Notes
Datapath
- Abstract level implementation:
- Instruction memory
- Program counter
- Register file
- ALU
- Data memory
- Basic building elements
- Combinational logic
- State elements: D-latches and D flip-flops
- Clocking methodology: edge triggered
- Basic operations
- Instructions fetch
- Accessing register file and execution of R-type instructions
- Datapath for lw and sw instructions (add data memory and sign
extend)
- Datapath for branch instructions
Control
- ALU control: mapping the opcode and function bits to the ALU control
inputs
- Designing the main control unit
- Operation of the Datapath (single-cycle implementation):
- R-type instructions
- Load (store) word
- Branching instructions
- Problems of the single-cycle implementation
Pipelining
- Basic principles of pipelining
- Pipelining helps throughput of the entire workload
- Multiple tasks operating simultaneously and using different
resources
- Potential speedup = number of pipe stages
- The pipeline rate is limited by the slowest stage
- Unbalanced lengths of pipe stages reduce the speedup
- The time to "fill" the pipeline and the time to "drain" it reduce the
speedup
- Stall for dependencies
- MIPS pipelining: the five stages of the lw instruction
- Problems with pipelining:
- Structural hazards
- Data hazards
- Control hazards
- Designing a pipelined processor
- Transferring data forward and backward between the stages
- Pipeline control
- Implementing data and branch hazard control
- Detecting data dependencies
- Forwarding
- Data hazards and stalls
- Branch hazards
- Advanced pipelining
Demo:
CS385 Computer Architecture, Lecture
18
Reading: Patterson & Hennessy - Section 5.1
Topic:
Memory Hierarchy
Lecture slides (PDF)
Lecture Notes
- Memory technologies and trends
- Impact on performance
- The need for a hierarchical memory organization
- The principle of locality
- Memory hierarchy terminology
- Basics of RAM implementation
- SRAM: D-latches, three-state buffers, address decoders, two level
addressing
- DRAM: DRAM cell, refreshing
- Error detection and correction
CS385 Computer Architecture, Lecture
19
Reading: Patterson & Hennessy - Sections 5.1-5.3
Topic: The Basics of caches
Lecture slides (PDF)
Programs: cache.vl, cache2.vl
Lecture Notes
- Direct-mapped cache
- Accessing a cache
- Writing to the cache (write-through and write-back schemes)
- Handling cache misses
- Read miss: load the word from memory
- Write miss: write both to the cache and to the memory (using write
buffer)
- Example: DECStation 3100 cache
- Spatial locality caches: keeping consistency on write
- Main memory organization
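How a byte address is split for a direct-mapped cache is easy to show in Verilog. The sketch below assumes a hypothetical cache of 1024 one-word blocks (so 2 byte-offset bits, 10 index bits, 20 tag bits); the sizes are chosen only for illustration:

module cache_fields (input [31:0] addr,
                     output [19:0] tag, output [9:0] index);
  assign index = addr[11:2];   // selects one of the 1024 cache lines
  assign tag   = addr[31:12];  // compared with the stored tag on every access
endmodule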
CS385 Computer Architecture, Lecture
20
Reading: Patterson & Hennessy - Section 5.4
Topic:
Improving cache performance
Lecture slides (PDF)
Lecture Notes
- Measuring cache performance
- Stall clock cycles = Instructions * Miss rate * Miss penalty
- Example 1 (reducing CPI):
- 2% - instruction miss; 4% - data miss; 36% - lw/sw; 40 cyc - miss
penalty.
- 2 CPI => CPI_stall = 3.36, i.e. perfect cache is 1.68 times
faster.
- 1 CPI => CPI_stall = 2.36, i.e. perfect cache is 2.36 times
faster.
- Example 2: doubling clock rate => 80 cyc - miss penalty, CPI_stall =
4.75, Performance: 3.36/(4.75/2) = 1.41 faster with stalls, 2 times faster
without stalls.
- Conclusion: cache penalties increase as the machine becomes
faster
- Flexible placement of blocks in the cache
- Direct mapped: Cache index = (Block address) modulo (Cache size); no
search; small tag.
- Set associative: Cache index = (Block address) modulo (Number of sets in
cache); search the set; larger tag.
- Fully associative: Cache index is not determined; search the whole
cache; tag = address.
- Locating a block in the cache: N-way cache requires N comparators and
N-way multiplexor
- Choosing which block to replace: least recently used
- Multilevel caches
Exercises
- Write a sequence of memory references for which:
- the direct mapped cache performs better than the 2-way associative
cache;
- the 2-way associative cache performs better than the fully associative
cache.
- Exercises 5.1, 5.2, 5.3, 5.7.
CS385 Computer Architecture, Lecture
21
Reading: Patterson & Hennessy - Section 5.7
Topic:
Virtual Memory
Lecture slides (PDF)
Lecture Notes
- The need for VM
- Many programs (processes) can use a single memory
- Use a memory exceeding the size of the main memory
- VM organization and terminology: virtual address, physical address, page,
page offset, page fault, memory mapping (translation).
- Design decisions motivated by the very high cost of page faults:
- Large pages (4K-64K)
- Reducing page fault penalties: fully associative VM
- Software management of page faults
- Write-back instead of write-through
- Addressing pages:
- Page table, page table register
- Processes (active, inactive) and page tables
- Page faults
- Replacing pages: LRU, reference (use) bit
- Write-back scheme (dirty bit)
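Address translation starts by splitting the virtual address. A sketch for 4 KB pages (sizes and names are illustrative; the virtual page number is then looked up in the page table to obtain the physical page number):

module va_fields (input [31:0] vaddr,
                  output [19:0] vpn, output [11:0] offset);
  assign vpn    = vaddr[31:12];  // indexes the page table
  assign offset = vaddr[11:0];   // copied unchanged into the physical address
endmodule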
CS385 Computer Architecture, Lecture
22
Reading: Patterson & Hennessy - Section 5.8
Topic:
Virtual Memory optimization
Lecture slides (PDF)
Lecture Notes
- Optimizing address translation - Translation Lookaside Buffer (TLB):
- TLB miss
- Page fault
- TLB associativity
- MIPS R2000 (DECStation 3100) TLB
- Overall operation of a memory hierarchy
- Memory protection with VM
- Using exceptions for handling TLB misses and page faults: using the EPC and
Cause registers
- Summary of VM
CS385 Computer Architecture, Lecture
23
Reading: Patterson & Hennessy - Section 5.5, 5.6.
Topic: A general framework of memory hierarchies
Lecture slides (PDF)
Lecture Notes
- Associativity schemes
- Placing blocks
- Miss rates and cache sizes
- Finding blocks
- Why do we use full associativity and a separate lookup table (page table)
in VM
- Choosing a block to replace
- Writing blocks
- The sources of misses
- The challenge: reducing the miss rate has a negative effect on the overall
performance
- Pentium Pro and PowerPC 604
Exercises
5.7.1, 5.7.2, 5.7.3, 5.11
CS385 Computer Architecture, Lecture
24
Reading: Sections 6.1 - 6.5 (COD 4th Edition - see https://ccsu.blackboard.com/)
Topic: Interfacing
Processors and Peripherals - Buses
Lecture slides (PDF)
Lecture Notes
- Buses: lines, transactions, types
- Synchronous and asynchronous buses
- Handshaking protocol
- Bus access: master and slave
- Bus arbitration schemes
- Bus standards
CS385 Computer Architecture, Lecture 25
Reading: Section 6.6 - 6.8 (COD 4th Edition - see https://ccsu.blackboard.com/)
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture slides (PDF)
Lecture Notes
- The role of the operating system in interfacing I/O devices to Memory
- Controlling the I/O devices
- Memory mapped I/O
- Special I/O instructions
- Communicating with the processor
- Polling
- Interrupt-driven I/O
- Direct memory access (DMA)
- DMA and the memory system
- Designing an I/O system: latency and bandwidth constraints.
CS385 Computer Architecture, Lecture 26
Reading: Chapter 6, Section 2.11
Topic: Multiprocessors
Lecture
slides (PDF)
COD-Chapter7.pdf
Lecture Notes
- Amdahl's Law
- Basic approaches to sharing data and types of connectivity
- Programming multiprocessors
- Multiprocessors connected by a single bus
- A parallel program
- Multiprocessor cache coherency
- Implementing a multiprocessor cache coherency protocol
- Synchronization using coherency, locks, atomic swap
operation
again: addi $t0, $0, 1      # copy locked value
       ll   $t1, 0($s1)     # load linked
       sc   $t0, 0($s1)     # store conditional
       beq  $t0, $0, again  # branch if store fails
       add  $s4, $0, $t1    # put load value in $s4
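Amdahl's Law, listed at the top of this lecture, bounds the speedup from parallelism: Speedup = 1 / ((1 - F) + F/N), where F is the parallelizable fraction of execution time and N is the number of processors. With hypothetical numbers, F = 0.9 and N = 10 give Speedup = 1 / (0.1 + 0.09) = about 5.3, well below the ideal factor of 10.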
CS385 Computer Architecture, Lecture 27
Reading: Chapter 6
Topic: Networks of
multiprocessors and clusters
Lecture slides (PDF)
COD-Chapter7.pdf
Lecture Notes
- Shared memory vs. multiple private memories
- Centralized memory vs. distributed memory
- Parallel programming by message passing
- Distributed memory communication
- Memory allocation
- Clusters and network topology
- Modern clusters:
Digital Design Review
Assignment
Log on to Blackboard to see and submit the assignment.
CS385 Assignment 1: Assembly Programming in MIPS
(maximum grade 10 points)
Log on to Blackboard to see and submit the
assignment.
CS385 Semester Project: Building a mini MIPS
machine (maximum grade 45 points including the presentation)
Log on to Blackboard to see and submit the project.
CS385 Midterm Test (maximal grade 20 points)
The midterm test will be available in Blackboard Learn. There will be 20
multiple choice and short answer questions that have to be answered in 90
minutes. The topics include:
- Number systems: binary, two's complement, floating point, conversions between decimal and two's complement and floating point.
- MIPS instruction set architecture and assembly programming
- Instruction format and meaning
- Accessing memory. Note the byte order within the memory word (big-endian, little-endian), see book page A-43
- Single-cycle datapath and control (see http://www.cs.ccsu.edu/~markov/ccsu_courses/385SL8.pdf, slide #5)
- MIPS implementation in Verilog HDL
- Implementing basic CPU components (sample question: what does this code implement?)
- Implementing and addressing instruction and data memory
- Pipelining
- Number of pipeline stages each instruction takes. Note the particular implementation of the branch to reduce the delay on branching (moving the branch decision earlier). See book Section 4.8, Fig. 4.65 (http://www.cs.ccsu.edu/~markov/ccsu_courses/385SL16.pdf, last slide).
- Executing code on the pipelined MIPS (sample questions: How many cycles does this code take?, What is the ALU doing in cycle 9?)
- Identifying data dependencies and hazards in the code
- Resolving hazards by changing code (inserting nops or reordering instructions)
CS385 Final Exam (maximal grade 25 points)
The
Final Exam will be available in Blackboard Learn at https://ccsu.blackboard.com/. There will be 25 multiple
choice and short answer questions that have to be answered in 2 hours. The
topics include:
- Processor stages and timing
- Pipelining
- Forwarding (checking conditions on pipeline registers)
- Hazard detection and stalling
- Solving branch hazards
- Memory System
- Temporal and spatial locality in programs
- Cache hits and misses
- Cache size
- Virtual memory
- Overall operation of a memory hierarchy
- Interfacing Processors and Peripherals
- Buses
- Bus transactions
- I/O system
- Multiprocessors
- Amdahl's Law
- Multiprocessor architectures
- Processor Synchronization
- Instruction and data streams
- Parallel programming