May 3-8: Final Exam to be taken online in Blackboard

May 8: Semester Project Final Report due

May 8, 1-3pm: Semester project presentation


CS385 – Computer Architecture

Spring-2017

Classes: MW 3:05pm - 4:20pm, Maria Sanford Hall 204
Instructor: Dr. Zdravko Markov, MS 30307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz at ccsu dot edu
Office hours: MW 10:45am - 12:15pm, TR 2:00pm - 3:30pm, or by appointment

Catalog description: The architecture of the computer is explored by studying its various levels: physical level, operating-system level, conventional machine level and higher levels. An introduction to microprogramming and computer networking is provided.

Course Prerequisites: CS 354

Prerequisites by topic

Course description

The course provides comprehensive coverage of computer architecture. It discusses the main components of the computer and the basic principles of its operation. It demonstrates the relationship between the software and the hardware and focuses on the foundational concepts that are the basis for current computer design. The course is based on the MIPS processor, a simple, clean RISC processor whose architecture is easy to learn and understand. The major topics covered in the course are the following:
  1. MIPS instruction set
  2. Computer arithmetic and ALU design
  3. Datapath and control
  4. Using Hardware Description Language to design and simulate the CPU
  5. Pipelining
  6. Memory hierarchy, caches and virtual memory
  7. Interfacing CPU and peripherals, buses
  8. Multiprocessors, networks of multiprocessors, parallel programming
  9. Performance issues

Course Learning Outcomes (CLO)

  1. Understand the fundamentals of different instruction set architectures and their relationship to the CPU design.
  2. Understand the principles and the implementation of computer arithmetic.
  3. Understand the operation of modern CPUs including pipelining, memory systems and buses.
  4. Understand the principles of operation of multiprocessor systems and parallel programming.
  5. Design and emulate a single cycle or pipelined CPU by given specifications using Hardware Description Language (HDL).
  6. Work in teams to design and implement CPUs.
  7. Write reports and make presentations of computer architecture projects.
CS 385 is part of the core CS program and is designed in accordance with the Program Educational Objectives (PEO) and the Student Outcomes (SO) as specified in the Department Mission Statement. The Course Learning Outcomes are used to assess the following Student Outcomes (SO): The CS 385 Course Learning Outcomes also support the following Student Outcomes (SO) part of the corresponding Program Educational Objectives (PEO):

Required textbook

Required software

  1. Icarus Verilog: HDL compiler and simulator, available from the book companion website or at http://bleyer.org/icarus/. Note about installation: don't use folder names that include spaces (like Program Files). Read book sections B.4 and 5.8 for using HDL.
  2. SPIM simulator: A free software simulator for running MIPS R2000 assembly language programs available for Windows and other platforms.
  3. Other simulators that may be used for drawing logic diagrams and experimenting with small circuits (note that the semester project should be done with Verilog):
Semester project: There will be a semester project to build a simplified MIPS machine. The projects will be done in teams of 2-3 people each and will require three progress reports, a final report and a presentation. The machine must be implemented in HDL Verilog, tested with a sample MIPS program and properly documented. The progress and the final reports must be submitted in Blackboard Learn at https://ccsu.blackboard.com/.

Class Participation: Active participation in class is expected of all students. Regular attendance is also expected. If you must miss a class, try to inform the instructor in advance. In case of missed classes and work due to plausible reasons (such as illness or accidents), limited assistance will be offered. Unexcused absences will result in the student being totally responsible for the make-up process.

Honesty policy: The CCSU honor code for Academic Integrity is in effect in this class. It is expected that all students will conduct themselves in an honest manner and NEVER claim work which is not their own. Violating this policy will result in a substantial grade penalty, and may lead to expulsion from the University. You may find it online at http://web.ccsu.edu/academicintegrity/. Please read it carefully.

Grading: Grading will be based on one programming assignment (10%), a midterm test (20%), a final exam (25%) and a semester project (45%, including progress reports, the final documentation, and the presentation). The letter grades will be calculated according to the following table:
 
Grade:  A       A-     B+     B      B-     C+     C      C-     D+     D      D-     F
Score:  95-100  90-94  87-89  84-86  80-83  77-79  74-76  70-73  67-69  64-66  60-63  0-59

Unexcused late submission policy: Submissions made more than two days after the due date will be graded one letter grade down. Submissions made more than a week late will receive two letter grades down. No submissions will be accepted more than two weeks after the due date.

Students with disabilities: Students who believe they need course accommodations based on the impact of a disability, medical condition, or emergency should contact me privately to discuss their specific needs. I will need a copy of the accommodation letter from Student Disability Services in order to arrange class accommodations. Contact Student Disability Services, Willard Hall, 101-04 if you are not already registered with them. Student Disability Services maintains the confidential documentation of your disability and assists you in coordinating reasonable accommodations with your faculty.


Tentative schedule of classes and assignments

Note: Dates for classes, assignments and tests may change (see also University Calendar). The lecture notes may also be updated. Check the schedule and the class pages regularly for updates!
  1. Jan 18: Introduction: Computer Architecture = Instruction Set Architecture + Machine Organization
  2. Jan 23: MIPS Instructions: arithmetic, registers, memory, fetch & execute cycle
  3. Jan 25: MIPS Instructions: control and addressing modes
  4. Jan 30: Submit Digital Design Review Assignment (extra credit)
  5. Jan 30: Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
  6. Feb 1: ALU design: full adder, slt operation, HDL design, carry lookahead
  7. Feb 6: Assignment 1 due (10 pts.)
  8. Feb 6: ALU design: multiplication, representing floating point numbers
  9. Feb 8: The Processor: Building a datapath
  10. Feb 13: The Processor: Control (single cycle approach)
  11. Feb 15: Using a Hardware Description Language to Design and Simulate the MIPS processor
  12. Feb 22: Introduction to pipelining
  13. Feb 27: Solving pipeline hazards
  14. March 1: Progress Report #1 due (10 pts.): A simplified single-cycle datapath capable of executing the addi instruction and all R-type instructions. See Semester Project for details.
  15. March 1: Implementing pipeline datapath and control
  16. March 6: Implementing data and branch hazards control
  17. March 8: No class (SIGCSE-2017)
  18. March 20: Review of Datapath, Control and Pipelining, HDL implementation
  19. March 20-22: Midterm Test (20 pts.) to be taken online in Blackboard
  20. March 22: Progress Report #2 due (10 pts.): Complete single-cycle datapath. See Semester Project for details.
  21. March 22: Implementing a 3-stage pipeline in HDL (mips-pipe3.vl)
  22. March 27: Memory hierarchy
  23. March 29: The Basics of caches
  24. April 3, 5: Improving cache performance
  25. April 10: Virtual Memory basics
  26. April 10: Progress Report #3 due (10 pts.): 3-stage pipelined datapath for addi and R-type instructions. See Semester Project for details.
  27. April 12: Virtual Memory optimization
  28. April 17: A general framework of memory hierarchies
  29. April 19: Interfacing Processors and Peripherals - Buses
  30. April 24: Interfacing I/O devices to Memory, CPU and OS
  31. April 26: Multiprocessors
  32. May 1: Networks of multiprocessors
  33. May 3: Review of Memory System, Buses, I/O and Multiprocessors
  34. May 3-8: Final Exam (25 pts.) to be taken online in Blackboard
  35. May 8: Semester Project Final Report due (10 pts.)
  36. May 8, 1-3pm: Semester project presentation (5 pts.)

CS385 – Computer Architecture, Lecture 1

Reading: Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set Architecture + Machine Organization.
Lecture slides (PDF)

Lecture Notes

  1. Levels of Abstraction
  2. Computer Architecture = Instruction Set Architecture + Machine Organization
  3. Instruction Set – The Software Hardware Interface
  4. Levels of Computer Architecture in More Depth
  5. Basic Components of a Computer
  6. Computer Organization
  7. Performance (Lecture slides (PDF))

CS385 – Computer Architecture, Lecture 2

Reading: Patterson & Hennessy - Sections 2.1 - 2.3, 2.5 - 2.7, 2.10, 2.13, 2.16 - 2.20, A.9, Tutorials/Getting Started with PCSpim (book companion website).
Topics: MIPS instructions, arithmetic, registers, memory, fetch & execute cycle, SPIM simulator
Lecture slides (PDF)

Lecture Notes

  1. Design goal: maximize performance and minimize cost. Primitive (low level) and very restrictive instructions (fixed number and type of operands).
  2. Design principles:
  3. MIPS arithmetic: 3 operands, fixed order, registers only.
  4. Using only registers: R-type instructions.
  5. Registers: 32-bits long, conventions.
  6. Memory organization: words and byte addressing.
  7. Data transfer (load and store) instructions. Example: accessing array elements.
  8. Translating C code into MIPS instructions – the swap example.
  9. Machine Language: instruction format, I-type (Immediate) format for data transfer
  10. Stored program concept: programs in memory, fetch & execute cycle
Exercises: Load this program in the SPIM simulator and analyze the format of the instructions. Run the program with different values of X and Y and trace the execution in step mode.
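
The instruction formats listed in the notes above can be checked by hand with a small encoder. The following is a minimal Python sketch, not part of the course materials; the function names are hypothetical, and the field widths, opcodes and register numbers are the standard MIPS ones (add uses funct 0x20, lw uses opcode 0x23, $t0 = 8, $s1-$s3 = 17-19).

```python
# Illustrative encoders for two MIPS instruction formats.
# Function names are hypothetical; field layouts follow the standard MIPS ISA.

def encode_r_type(rs, rt, rd, shamt, funct):
    """R-type: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | 16-bit immediate (two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18)
add_word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0x20)

# lw $t0, 32($s3)    ->  rt = $t0 (8), base rs = $s3 (19), offset 32
lw_word = encode_i_type(opcode=0x23, rs=19, rt=8, imm=32)

print(f"add: 0x{add_word:08X}")
print(f"lw:  0x{lw_word:08X}")
```

The printed words can be compared with the encodings SPIM shows next to each instruction in its text segment window.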

CS385 – Computer Architecture, Lecture 3

Reading: Patterson & Hennessy - Chapter 2, Appendix A
Topics: MIPS Instructions: control and addressing modes
Lecture slides (PDF)

Lecture Notes

  1. Implementing the C code for if in MIPS: conditional branch.
  2. Implementing the C code for if–else in MIPS: unconditional branch
  3. Simple for loop
  4. Check for less-than: building a pseudoinstruction for branch if less-than.
  5. Addressing in branch instructions: PC-relative and pseudodirect.
  6. Constants: use of immediate addressing (constants as operands – addi, slti, andi, ori).
  7. 32-bit constants – manipulate upper 2 bytes separately (load upper immediate)
  8. Summary of MIPS addressing: register (add), immediate (addi), base or displacement (lw), PC-relative (bne), pseudodirect (j)
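
The address calculations behind items 5 and 7 above can be sketched in a few lines of Python. This is an illustration under the usual MIPS conventions (branch offsets counted in words relative to PC + 4, jump targets taking their upper 4 bits from PC + 4); the function names are hypothetical and not taken from the lecture slides.

```python
# Sketch of MIPS address arithmetic; all function names are hypothetical.

def split_constant(value):
    """Split a 32-bit constant into the halves used by lui and ori."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def lui_ori(upper, lower):
    """Rebuild the constant: lui fills the upper 16 bits, ori fills the lower 16."""
    reg = (upper << 16) & 0xFFFFFFFF  # lui: lower half is cleared
    return reg | (lower & 0xFFFF)     # ori: immediate is zero-extended


def branch_target(pc, offset16):
    """PC-relative addressing: (PC + 4) + sign_extend(offset) * 4."""
    if offset16 & 0x8000:             # negative 16-bit offset
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF


def jump_target(pc, addr26):
    """Pseudodirect addressing: upper 4 bits of PC + 4, then addr26 * 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)


upper, lower = split_constant(0x003D0900)
print(hex(upper), hex(lower), hex(lui_ori(upper, lower)))
print(hex(branch_target(0x00400000, 3)))
print(hex(jump_target(0x00400000, 0x0100008)))
```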

CS385 – Computer Architecture, Lecture 4

Reading: Patterson & Hennessy - Sections 3.1, 3.2, B.1 - B.6.
Topics: Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
Lecture slides (PDF)

Lecture Notes

  1. Representing numbers: sign bit, one's complement, two's complement.
  2. Arithmetic: addition, subtraction, detecting overflow.
  3. Logical operations: shift, and, or.
  4. Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
  5. ALU for logical operations.
  6. ALU for add, and, or.
  7. Supporting subtraction
Exercises: Implement an overflow detection unit using only the CarryIn and CarryOut bits of ALU-31
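
As a companion to items 1 and 2 of the notes, here is a small Python sketch of two's complement encoding and of overflow detection by comparing operand and result signs. It deliberately uses the sign rule rather than the carry bits, so it does not replace the exercise above. The function names are hypothetical, and the inputs are assumed to fit in the chosen width.

```python
# Two's complement helpers; names are hypothetical, inputs assumed in range.

def to_twos(value, bits=32):
    """Bit pattern of a signed integer in `bits`-bit two's complement."""
    return value & ((1 << bits) - 1)


def from_twos(pattern, bits=32):
    """Signed value of a `bits`-bit two's complement pattern."""
    if pattern & (1 << (bits - 1)):   # sign bit set: value is negative
        return pattern - (1 << bits)
    return pattern


def add_with_overflow(a, b, bits=32):
    """Add two signed values the way the hardware would; report overflow.

    Overflow occurs only when both operands have the same sign and the
    result has the opposite sign.
    """
    raw = (to_twos(a, bits) + to_twos(b, bits)) & ((1 << bits) - 1)
    result = from_twos(raw, bits)
    same_sign_inputs = (a >= 0) == (b >= 0)
    sign_flipped = (result >= 0) != (a >= 0)
    return result, same_sign_inputs and sign_flipped


print(add_with_overflow(100, 27, bits=8))
print(add_with_overflow(100, 28, bits=8))
```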

Tutorials and practice quizzes on two’s complement numbers:


CS385 – Computer Architecture, Lecture 5

Reading: Patterson & Hennessy - Sections B.1 - B.6.
Topics: ALU design: full adder, slt operation, HDL design, carry lookahead
Lecture slides (PDF)
Programs: 2-1-mux.vl, 4-bit-adder.vl, more examples of Verilog programs, mips-alu.vl
Lecture Notes
  1. Implementation of a full adder:
  2. Supporting set on less-than (slt).
  3. Test for equality (needed for branching)
  4. Designing the ALU in Verilog
  5. Carry Lookahead
Exercises: Implement the 4-bit adder with carry lookahead logic in Verilog using the structural specification approach (gate-level modeling).
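
Before writing the Verilog, it can help to check the full adder equations in software. The sketch below models a 1-bit full adder and a ripple-carry adder in Python. It is only a behavioral reference with hypothetical function names, not the structural (gate-level) Verilog the exercise asks for, and it does not implement carry lookahead.

```python
# Behavioral model of a full adder and a ripple-carry adder (names hypothetical).

def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_add(x, y, bits=4, carry_in=0):
    """Chain `bits` full adders; the carry ripples from bit 0 upward."""
    result = 0
    carry = carry_in
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(0b0101, 0b0011))               # 5 + 3
# Subtraction as in the ALU: invert the second operand and set carry_in = 1.
print(ripple_add(5, ~3 & 0xF, carry_in=1))      # 5 - 3
```

The same input vectors can be reused as a test bench for 4-bit-adder.vl and for the carry lookahead version.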

CS385 – Computer Architecture, Lecture 6

Reading: Patterson & Hennessy - Sections 3.3, 3.5.
Topics: ALU design: multiplication, representing floating point numbers
Lecture slides (PDF)

Lecture Notes

  1. Implementing multiplication:
  2. Floating point numbers
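
Both topics above can be illustrated with a short Python sketch: an unsigned shift-and-add multiplier in the style of the sequential hardware algorithm, and a helper that splits an IEEE 754 single-precision value into its sign, exponent and fraction fields. The function names are hypothetical; `struct` is the Python standard library module used to obtain the bit pattern.

```python
import struct

# Sketches for Lecture 6; function names are hypothetical.

def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Unsigned multiplication: one add-and-shift step per multiplier bit."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:            # test the low bit of the multiplier
            product += multiplicand   # add the (shifted) multiplicand
        multiplicand <<= 1            # shift multiplicand left
        multiplier >>= 1              # shift multiplier right
    return product


def float_fields(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF    # 8-bit exponent, bias 127
    fraction = word & 0x7FFFFF        # 23-bit fraction
    return sign, exponent, fraction


print(shift_add_multiply(2, 3, bits=4))
print(float_fields(-0.75))            # -0.75 = -1.1 (binary) x 2^-1
```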
Tutorials and practice quizzes on floating point numbers:

CS385 – Computer Architecture, Lecture 7

Reading: Sections 4.1 - 4.3
Topics: The Processor, Building a Datapath
Lecture slides (PDF)
Programs: mips-regfile.vl, mips-r-type.vl, mips-r-type_addi.vl

Lecture Notes

  1. Abstract level implementation:
  2. Basic building elements
  3. Fetching instructions and incrementing the program counter
  4. Register file and execution of R-type instructions
  5. Datapath for lw and sw instructions (add data memory and sign extend)
  6. Datapath for branch instructions
Demo:

CS385 – Computer Architecture, Lecture 8

Reading: Patterson & Hennessy - Section 4.4
Topics: Single-cycle control
Lecture slides (PDF)
Programs: mips-r-type.vl, mips-r-type_addi.vl, mips-simple.vl

Lecture Notes

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation
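
Item 1 can be made concrete with a lookup table. The Python sketch below assumes the two-level decoding used in Patterson & Hennessy: a 2-bit ALUOp from the main control unit (00 for lw/sw, 01 for beq, 10 for R-type) combined with the 6-bit funct field. The constant and function names are hypothetical; this is a reference model, not the code in mips-simple.vl.

```python
# Reference model of the ALU control unit (names are hypothetical).

ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field of R-type instructions -> 4-bit ALU control input
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct=0):
    """Map ALUOp (from main control) and funct to the ALU control lines."""
    if alu_op == 0b00:                 # lw / sw: compute the address
        return ALU_ADD
    if alu_op == 0b01:                 # beq: subtract and test for zero
        return ALU_SUB
    return FUNCT_TO_ALU[funct & 0x3F]  # R-type: decided by the funct field


print(format(alu_control(0b10, 0b101010), "04b"))
```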

CS385 – Computer Architecture, Lecture 11

Reading: Patterson & Hennessy - B.4, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Using Hardware Description Language to Design and Simulate the MIPS processor
  1. Behavior model of MIPS - single cycle implementation: mips-simple.vl
  2. Project version (progress report #2). Changes needed:
Exercises


CS385 – Computer Architecture, Lecture 13

Reading: Patterson & Hennessy - Section 4.5
Topic: Introduction to Pipelining
Lecture slides (PDF)

Lecture Notes

  1. Pipelining by analogy (laundry example):
  2. Five stages of the load MIPS instruction
  3. The pipelined datapath
  4. Single cycle, multiple cycle vs. pipeline
  5. Advantages of pipelined execution
  6. Problems with pipelining (pipeline hazards)

CS385 – Computer Architecture, Lecture 14

Reading: Patterson & Hennessy - Section 4.5, 4.6
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture slides I (PDF)
Lecture slides II (PDF)

Lecture Notes

  1. Structural hazards: single memory
  2. Control hazards:
       add $4, $5, $6            beq $1, $2, 40
       beq $1, $2, 40     ==>    add $4, $5, $6
       lw $3, 300($0)            lw $3, 300($0)
  3. Data hazards (dependencies backwards in time):
       lw $t0, 0($t1)               lw $t0, 0($t1)
       lw $t2, 4($t1)      ==>      lw $t2, 4($t1)
       sw $t2, 0($t1)               sw $t0, 4($t1)
       sw $t0, 4($t1)               sw $t2, 0($t1)
  4. Designing a pipelined processor
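
The effect of reordering the lw/sw sequence can be checked with a small counting sketch. It assumes a five-stage pipeline with forwarding in which an instruction that reads a register loaded by the immediately preceding lw costs one stall cycle. The tuple representation (operation, destination, sources) and the function name are hypothetical.

```python
# Count load-use stalls in a straight-line instruction sequence.
# Each instruction is (op, destination register or None, list of source registers).

def load_use_stalls(program):
    """One stall for every lw whose result is read by the next instruction."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ["$t1"]),
    ("lw", "$t2", ["$t1"]),
    ("sw", None, ["$t2", "$t1"]),   # reads $t2 right after it is loaded
    ("sw", None, ["$t0", "$t1"]),
]
reordered = [
    ("lw", "$t0", ["$t1"]),
    ("lw", "$t2", ["$t1"]),
    ("sw", None, ["$t0", "$t1"]),   # $t0 was loaded two instructions earlier
    ("sw", None, ["$t2", "$t1"]),
]

print(load_use_stalls(original), load_use_stalls(reordered))
```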

CS385 – Computer Architecture, Lecture 15

Reading: Patterson & Hennessy - Section 4.6, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Implementing pipeline datapath and control
Lecture slides (PDF)

Lecture Notes

  1. Splitting datapath into stages: using registers to store parts of the instruction
  2. Transferring data forward and backward between the stages: lw example
  3. Corrected datapath: storing rd for the write back stage.
  4. Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle diagram
  5. Pipeline control:
  6. Datapath with control
  7. Example: running this code through the pipeline in 9 cycles:
       lw   $10, 20($1)
       sub  $11, $2, $3
       and  $12, $4, $5
       or   $13, $6, $7
       add  $14, $8, $9
Exercises: Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php), pages 16-30.
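
The cycle count in the example follows from the pipeline depth: with k stages and n independent instructions, the last instruction finishes in cycle k + n - 1, so five instructions in a five-stage pipeline need 5 + 5 - 1 = 9 cycles. The Python sketch below builds the corresponding multiple-clock-cycle diagram, assuming no stalls; the function names are hypothetical.

```python
# Multiple-clock-cycle diagram for a stall-free five-stage pipeline.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    return n_stages + n_instructions - 1


def pipeline_diagram(instructions):
    """Return (instruction, cells) rows; cells[c] is the stage occupied in cycle c + 1."""
    total = pipeline_cycles(len(instructions))
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * total
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage      # instruction i enters IF in cycle i + 1
        rows.append((instr, cells))
    return rows


code = ["lw $10, 20($1)", "sub $11, $2, $3", "and $12, $4, $5",
        "or $13, $6, $7", "add $14, $8, $9"]
for instr, cells in pipeline_diagram(code):
    print(f"{instr:<18}" + "".join(f"{c:<4}" for c in cells))
```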


CS385 – Computer Architecture, Lecture 16

Reading: Patterson & Hennessy - Section 4.7, 4.8
Topic: Implementing data and branch hazard control
Lecture slides (PDF)

Lecture Notes

  1. Detecting data dependencies
  2. Forwarding
  3. Data hazards and stalls:
       if (ID/EX.MemRead and
           (ID/EX.Rt = IF/ID.Rs or
            ID/EX.Rt = IF/ID.Rt))
         stall the pipeline
  4. Branch hazards
  5. Advanced pipelining
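
The stall condition above, together with the forwarding conditions from Section 4.7 of the textbook, can be written as two small Python functions. This is a sketch with hypothetical function and parameter names; the parameters mirror the pipeline register fields (for example, `id_ex_rt` stands for ID/EX.Rt), and the forwarding select values 10 and 01 follow the textbook's ForwardA/ForwardB encoding.

```python
# Hazard detection and forwarding conditions (function names are hypothetical).

def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in EX is a load whose destination
    register is read by the instruction currently in ID."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Select the ALU operand source for one source register.

    0b10: forward from EX/MEM (most recent result)
    0b01: forward from MEM/WB
    0b00: use the value read from the register file
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return 0b10
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return 0b01
    return 0b00


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: the pipeline must stall.
print(must_stall(True, 2, 2, 5))
# sub $2, $1, $3 followed by and $12, $2, $5: forward $2 from EX/MEM.
print(format(forward_select(2, True, 2, False, 0), "02b"))
```

Checking EX/MEM before MEM/WB ensures the most recent value wins when both stages hold a result for the same register.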


CS385 – Computer Architecture, Lecture 17

Reading: Patterson & Hennessy - Chapter 4, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Review of Datapath, Control and Pipelining, HDL implementation
Programs: mips-pipe.vl

Lecture slides (PDF)

Lecture Notes

Datapath

  1. Abstract level implementation:
  2. Basic building elements
  3. Basic operations

Control

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation
  5. Multicycle Approach to Processor Control
  6. Basic principles of the Multicycle Approach to Processor Control
  7. Execution steps:
  8. Finite state machine control
  9. Microprogramming

Pipelining

  1. Basic principles of pipelining
  2. MIPS pipelining: the five stages of the lw instruction
  3. Problems with pipelining:
  4. Designing a pipelined processor
  5. Transferring data forward and backward between the stages
  6. Pipeline control
  7. Implementing data and branch hazard control
  8. Advanced pipelining
HDL implementation: mips-pipe.vl

Demo:


CS385 – Computer Architecture, Lecture 18

Reading: Patterson & Hennessy - Section 5.1
Topic: Memory Hierarchy
Lecture slides (PDF)

Lecture Notes

  1. Memory technologies and trends
  2. Impact on performance
  3. The need of hierarchical memory organization
  4. The principle of locality
  5. Memory hierarchy terminology
  6. Basics of RAM implementation

CS385 – Computer Architecture, Lecture 19

Reading: Patterson & Hennessy - Sections 5.1-5.3
Topic: The Basics of caches
Lecture slides (PDF)
Programs: cache.vl, cache2.vl

Lecture Notes

  1. Direct-mapped cache
  2. Accessing a cache
  3. Writing to the cache (write-through and write-back schemes)
  4. Handling cache misses
  5. Example: DECStation 3100 cache
  6. Spatial locality caches: keeping consistency on write
  7. Main memory organization
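
Items 1, 2 and 4 can be tried out with a tag-only model of a direct-mapped cache. The Python sketch below is a hypothetical illustration and is not the code in cache.vl or cache2.vl: it stores no data, only valid bits and tags, and assumes byte addresses with a power-of-two block size.

```python
# Tag-only model of a direct-mapped cache (class and method names are hypothetical).

class DirectMappedCache:
    def __init__(self, num_blocks, block_bytes):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.valid = [False] * num_blocks
        self.tags = [None] * num_blocks

    def split(self, address):
        """Return (tag, index) for a byte address."""
        block_address = address // self.block_bytes
        return block_address // self.num_blocks, block_address % self.num_blocks

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        tag, index = self.split(address)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


# Eight one-word blocks; references given as word addresses.
cache = DirectMappedCache(num_blocks=8, block_bytes=4)
trace = [22, 26, 22, 26, 16, 3, 16, 18]
results = [cache.access(word * 4) for word in trace]
print(["hit" if h else "miss" for h in results])
```

Word addresses 26 and 18 both map to index 2, so the last reference replaces the block loaded by the second one.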

CS385 – Computer Architecture, Lecture 20

Reading: Patterson & Hennessy - Section 5.4
Topic: Improving cache performance
Lecture slides (PDF)

Lecture Notes

  1. Measuring cache performance
  2. Flexible placement of blocks in the cache
  3. Locating a block in the cache: N-way cache requires N comparators and N-way multiplexor
  4. Choosing which block to replace: least recently used
  5. Multilevel caches
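
For items 1 and 5, the usual figure of merit is the average memory access time, AMAT = hit time + miss rate x miss penalty. The Python sketch below applies the formula to a single cache and to a two-level hierarchy, where the L1 miss penalty is itself the access time of L2. The function names and the numbers in the example are illustrative assumptions, not measurements.

```python
# Average memory access time (AMAT); function names and inputs are illustrative.

def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit_time, l1_miss_rate,
                   l2_hit_time, l2_local_miss_rate, memory_penalty):
    """Two-level hierarchy: an L1 miss costs the AMAT of the L2 cache."""
    l1_miss_penalty = amat(l2_hit_time, l2_local_miss_rate, memory_penalty)
    return amat(l1_hit_time, l1_miss_rate, l1_miss_penalty)


# 1-cycle hit, 5% miss rate, 20-cycle miss penalty.
print(amat(1, 0.05, 20))
# Same L1, with a 10-cycle L2 that misses 20% of the time to a 100-cycle memory.
print(amat_two_level(1, 0.05, 10, 0.2, 100))
```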
Exercises

CS385 – Computer Architecture, Lecture 21

Reading: Patterson & Hennessy - Section 5.7
Topic: Virtual Memory
Lecture slides (PDF)

Lecture Notes

  1. The need of VM
  2. VM organization and terminology: virtual address, physical address, page, page offset, page fault, memory mapping (translation).
  3. Design decisions motivated by the very high cost of page faults:
  4. Addressing pages:
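
The translation step named in item 2 can be sketched as follows. The Python example assumes 4 KiB pages and models the page table as a dictionary from virtual page number to physical frame number, with a missing entry standing for a page fault. The names `translate` and `PAGE_SIZE` and the page size itself are assumptions made for illustration.

```python
# Virtual-to-physical address translation with a dictionary page table.
# PAGE_SIZE and translate() are hypothetical names chosen for this sketch.

PAGE_SIZE = 4096  # assumed page size: 4 KiB, i.e. a 12-bit page offset


def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Split the address into (virtual page number, offset) and map the page."""
    vpn, offset = divmod(virtual_address, page_size)
    frame = page_table.get(vpn)
    if frame is None:
        # In a real system the OS would now load the page from disk.
        raise LookupError(f"page fault on virtual page {vpn}")
    return frame * page_size + offset  # the offset is copied unchanged


page_table = {0: 7, 1: 5}              # virtual page -> physical frame
print(hex(translate(0x1234, page_table)))
```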

CS385 – Computer Architecture, Lecture 22

Reading: Patterson & Hennessy - Section 5.8
Topic: Virtual Memory optimization
Lecture slides (PDF)

Lecture Notes

  1. Optimizing address translation - Translation Lookaside Buffer (TLB):
  2. MIPS R2000 (DECStation 3100) TLB
  3. Overall operation of a memory hierarchy
  4. Memory protection with VM
  5. Using exceptions for handling TLB misses and pages faults: using EPC and Cause registers
  6. Summary of VM

CS385 – Computer Architecture, Lecture 23

Reading: Patterson & Hennessy - Section 5.5, 5.6.
Topic: A general framework of memory hierarchies
Lecture slides (PDF)

Lecture Notes

  1. Associativity schemes
  2. Placing blocks
  3. Miss rates and cache sizes
  4. Finding blocks
  5. Why do we use full associativity and a separate lookup table (page table) in VM
  6. Choosing a block to replace
  7. Writing blocks
  8. The sources of misses
  9. The challenge: reducing the miss rate has a negative effect on the overall performance
  10. Pentium Pro and PowerPC 604

Exercises

5.7.1, 5.7.2, 5.7.3, 5.11

CS385 – Computer Architecture, Lecture 24

Reading: Sections 6.1 - 6.5 (COD 4th Edition - see https://ccsu.blackboard.com/)
Topic: Interfacing Processors and Peripherals - Buses
Lecture slides (PDF)

Lecture Notes

  1. Buses: lines, transactions, types
  2. Synchronous and asynchronous buses
  3. Handshaking protocol
  4. Bus access: master and slave
  5. Bus arbitration schemes
  6. Bus standards

CS385 – Computer Architecture, Lecture 25

Reading: Section 6.6 - 6.8 (COD 4th Edition - see https://ccsu.blackboard.com/)
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture slides (PDF)

Lecture Notes

  1. The role of the operating system in interfacing I/O devices to Memory
  2. Controlling the I/O devices
  3. Communicating with the processor
  4. Direct memory access (DMA)
  5. DMA and the memory system
  6. Designing an I/O system: latency and bandwidth constraints.

CS385 – Computer Architecture, Lecture 26

Reading: Chapter 6, Section 2.11
Topic: Multiprocessors
Lecture slides (PDF)
COD-Chapter7.pdf

Lecture Notes

  1. Amdahl's Law
  2. Basic approaches to sharing data and types of connectivity
  3. Programming multiprocessors
  4. Multiprocessors connected by a single bus
  5. A parallel program
  6. Multiprocessor cache coherency
  7. Implementing a multiprocessor cache coherency protocol
  8. Synchronization using coherency, locks, atomic swap operation:
       again: addi $t0, $0, 1      # copy locked value
              ll   $t1, 0($s1)     # load linked
              sc   $t0, 0($s1)     # store conditional
              beq  $t0, $0, again  # branch if store fails
              add  $s4, $0, $t1    # put load value in $s4
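
Item 1, Amdahl's Law, bounds the speedup obtainable from multiple processors: if a fraction f of the execution time can be parallelized over n processors, the speedup is 1 / ((1 - f) + f / n). The short Python sketch below evaluates the formula; the function name and the sample values are illustrative.

```python
# Amdahl's Law; the function name and sample inputs are illustrative.

def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup = 1 / ((1 - f) + f / n) for parallelizable fraction f."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


for n in (2, 10, 100):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Even with 90% of the work parallelized, the remaining serial 10% keeps the speedup below 10 no matter how many processors are added.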


CS385 – Computer Architecture, Lecture 27

Reading: Chapter 6
Topic: Networks of multiprocessors and clusters
Lecture slides (PDF)
COD-Chapter7.pdf

Lecture Notes

  1. Shared memory vs. multiple private memories
  2. Centralized memory vs. distributed memory
  3. Parallel programming by message passing
  4. Distributed memory communication
  5. Memory allocation
  6. Clusters and network topology
  7. Modern clusters:

Digital Design Review Assignment

Log on to Blackboard to see and submit the assignment.

CS385 Assignment 1: Assembly Programming in MIPS (maximum grade 10 points)

Log on to Blackboard to see and submit the assignment.

CS385 Semester Project: Building a mini MIPS machine (maximum grade 45 points including the presentation)

Log on to Blackboard to see and submit the project.

CS385 Midterm Test (maximum grade 20 points)

The midterm test will be available in Blackboard Learn. There will be 20 multiple choice and short answer questions that have to be answered in 90 minutes. The topics include: number systems (binary, two's complement, floating point), MIPS instruction set architecture and assembly programming, single-cycle datapath and control, MIPS implementation in Verilog HDL, basics of pipelining, and solving pipeline hazards. To take the test, log on to Blackboard Learn at https://ccsu.blackboard.com/.

CS385 Final Exam (maximum grade 25 points)

The Final Exam will be available in Blackboard Learn at https://ccsu.blackboard.com/. There will be 25 multiple choice and short answer questions that have to be answered in 2 hours. The topics include: