CS385 - Computer Architecture

Spring-2024

Classes: TR 9:25am - 10:40am, Maria Sanford Hall 204
Instructor: Dr. Zdravko Markov, MS 30307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz at ccsu dot edu
Office hours: MW 4:30pm-6:00pm, TR 10:45am-12:00pm, in person. Book an appointment here.

Catalog description: The architecture of the computer is explored by studying its various levels: physical level, operating-system level, conventional machine level and higher levels. An introduction to microprogramming and computer networking is provided.

Course Prerequisites: CS 354

Prerequisites by topic

Course description

The course provides comprehensive coverage of computer architecture. It discusses the main components of the computer and the basic principles of its operation. It demonstrates the relationship between the software and the hardware and focuses on the foundational concepts that are the basis for current computer design. The course is based on the MIPS processor, a simple, clean RISC processor whose architecture is easy to learn and understand. The major topics covered in the course are the following:
  1. MIPS instruction set
  2. Computer arithmetic and ALU design
  3. Datapath and control
  4. Using Hardware Description Language to design and simulate the CPU
  5. Pipelining
  6. Memory hierarchy, caches and virtual memory
  7. Interfacing CPU and peripherals, buses
  8. Multiprocessors, networks of multiprocessors, parallel programming
  9. Performance issues

Course Learning Outcomes (CLO)

  1. Understand the fundamentals of different instruction set architectures and their relationship to the CPU design.
  2. Understand the principles and the implementation of computer arithmetic.
  3. Understand the operation of modern CPUs including pipelining, memory systems and buses.
  4. Understand the principles of operation of multiprocessor systems and parallel programming.
  5. Design and emulate a single cycle or pipelined CPU from given specifications using Hardware Description Language (HDL).
  6. Work in teams to design and implement CPUs.
  7. Write reports and make presentations of computer architecture projects.
The CS 385 Course Learning Outcomes support the following Student Outcomes (SO):

Required textbook

Required software

  1. Icarus Verilog: HDL compiler and simulator available for download from http://bleyer.org/icarus/ and online at https://www.jdoodle.com/execute-verilog-online.
  2. SPIM simulator: A software simulator for running MIPS32 programs available for Windows and other platforms.
Semester project: There will be a semester project to build a simplified MIPS machine. The projects will be done in teams of 2-3 people and will require four progress reports and a presentation. The machine must be implemented in Verilog HDL, tested with a sample MIPS program and properly documented. The progress reports must be submitted via Blackboard at https://ccsu.blackboard.com/.

Class Participation: Active participation in class is expected of all students. Regular attendance is also expected. If you must miss a class, try to inform the instructor in advance. In case of missed classes and work due to plausible reasons (such as illness or accidents), limited assistance will be offered. Unexcused absences will result in the student being totally responsible for the make-up process.

Course Expectations for Out-of-Class Work: To succeed in this 3-credit class, you are expected to commit a total of 12 hours per week to mastering the course material: 2.5 hours of lecture time plus 9.5 hours of independent study and coursework. This time commitment aligns with the expectations set by the Computer Science department for major courses and adheres to university policies. Dedicating this much time outside the classroom is a significant commitment, but it is necessary for success. Please plan your course load accordingly.

Honesty policy: The CCSU honor code for Academic Integrity is in effect in this class. It is expected that all students will conduct themselves in an honest manner and NEVER claim work which is not their own. Violating this policy will result in a substantial grade penalty, and may lead to expulsion from the University. You may find it online at http://web.ccsu.edu/academicintegrity/. Please read it carefully.

Grading: Grading will be based on one programming assignment (10%), a midterm test (20%), a final exam (25%) and a semester project (45%, including progress reports, the final documentation, and the presentation). The letter grades will be calculated according to the following table:
 
Grade   A       A-      B+      B       B-      C+      C       C-      D+      D       D-      F
Score   95-100  90-94   87-89   84-86   80-83   77-79   74-76   70-73   67-69   64-66   60-63   0-59

Unexcused late submission policy: Submissions made more than two days after the due date will be graded one letter grade down. Submissions made more than a week late will receive two letter grades down. No submissions will be accepted more than two weeks after the due date.

Students with disabilities: Students who believe they need course accommodations based on the impact of a disability, medical condition, or emergency should contact me privately to discuss their specific needs. I will need a copy of the accommodation letter from Accessibility Services in order to arrange class accommodations. Contact the Office of Accessibility Services, Willard-DiLoreto Hall, Suite W201, if you are not already registered with them. The Office of Accessibility Services maintains the confidential documentation of your disability and assists you in coordinating reasonable accommodations with your faculty.


Tentative schedule of classes and assignments

Note: Dates for classes, assignments and tests may change (see also University Calendar). The lecture notes may also be updated. Check the schedule and the class pages regularly for updates!
  1. Jan 18: Introduction: Computer Architecture = Instruction Set Architecture + Machine Organization
  2. Jan 23: Review of HDL (Figure 6.5, behavioral_serial_adder.vl, Digital Design Review Assignment)
  3. Jan 25: MIPS Instructions: arithmetic, registers, memory, fetch & execute cycle
  4. Jan 30: MIPS Instructions: control and addressing modes
  5. Feb 1: Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
  6. Feb 1: Submit Digital Design Review Assignment (optional, for extra credit)
  7. Feb 6: ALU design: multiplication, representing floating point numbers
  8. Feb 8: ALU design: full adder, slt operation, HDL design
  9. Feb 13: Class canceled due to snow storm
  10. Feb 13: Assignment 1 due (10 pts.)
  11. Feb 15: The Processor: Building a datapath
  12. Feb 15: Semester Project is posted. Form teams for the Semester Project. Email me the team members.
  13. Feb 20: The Processor: Control (single cycle approach)
  14. Feb 22: Using Hardware Description Language to Design and Simulate the MIPS processor. Review of Semester Project Report #1.
  15. Feb 27: Introduction to pipelining
  16. Feb 29: Progress Report #1 due (10 pts.): A simplified single-cycle datapath capable of executing immediate and R-type instructions. See Semester Project for details.
  17. Feb 29: Solving pipeline hazards
  18. March 5: Implementing pipeline datapath and control
  19. March 7: Implementing data and branch hazards control
  20. March 19: Review of Datapath, Control and Pipelining. Review of Progress Report 2.
  21. March 21: Review for Midterm Test
  22. March 26: Midterm Test (20 pts.)
  23. March 26: Progress Report #2 due (10 pts.): Complete single-cycle datapath. See Semester Project for details.
  24. March 28: Implementing a 3-stage pipeline in HDL (mips-pipe3.png, mips-pipe3.vl). Progress Report #3 posted.
  25. April 2: Memory hierarchy, The Basics of caches
  26. April 4: Improving cache performance
  27. April 9: Virtual Memory
  28. April 11: Virtual Memory optimization
  29. April 11: Progress Report #3 due (10 pts.): 3-stage pipelined datapath for immediate and R-type instructions. See Semester Project for details.
  30. April 16: Review of Final Report (complete 5-stage pipeline, mips-pipe.vl)
  31. April 18: A general framework of memory hierarchies
  32. April 23: Multiprocessors
  33. April 25: Networks of multiprocessors
  34. April 30: Final Project Report and Presentation Slides due (10 pts.)
  35. April 30: Semester Project Presentations, Review for Final Exam
  36. May 2: Semester Project Presentations, Review for Final Exam
  37. May 7, 8:00 AM - 10:00 AM: Final Exam (25 pts.)

CS385 Computer Architecture, Lecture 1

Reading: Patterson & Hennessy - Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set Architecture + Machine Organization, Performance.
Lecture slides

Lecture Notes

  1. Levels of Abstraction
  2. Computer Architecture = Instruction Set Architecture + Machine Organization
  3. Instruction Set: The Software-Hardware Interface
  4. Levels of Computer Architecture in More Depth
  5. Basic Components of a Computer
  6. Computer Organization
  7. Performance

CS385 Computer Architecture, Lecture 2

Reading: Patterson & Hennessy - Sections 2.1 - 2.3, 2.5, 2.6, 2.10, 2.13, A.9, A.10, Introduction to MIPS Assembly Language.
Topics: MIPS instructions, arithmetic, registers, memory, fetch & execute cycle, SPIM simulator
Lecture slides

Lecture Notes

  1. Design goal: maximize performance and minimize cost. Primitive (low level) and very restrictive instructions (fixed number and type of operands).
  2. Design principles:
  3. MIPS arithmetic: 3 operands, fixed order, registers only.
  4. Using only registers: R-type instructions.
  5. Registers: 32-bits long, conventions.
  6. Memory organization: words and byte addressing.
  7. Data transfer (load and store) instructions. Example: accessing array elements.
  8. Translating C code into MIPS instructions: the swap example.
  9. Machine Language: instruction format, I-type (Immediate) format for data transfer
  10. Stored program concept (Von Neumann Architecture): programs in memory, fetch & execute cycle
Exercises: Load this program in the SPIM simulator and analyze the format of the instructions. Run the program with different values of X and Y and trace the execution in step mode.
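
To make the instruction formats from the notes above concrete, here is a minimal Python sketch that packs one R-type and one I-type instruction into 32-bit words. The field widths follow the standard MIPS32 formats; the helper names (`encode_r`, `encode_i`) and the small register table are assumptions made for this illustration and are not part of the course materials.

```python
# Register numbers for the few registers used below (standard MIPS numbering).
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18, "$s3": 19}

def encode_r(funct, rd, rs, rt, shamt=0):
    """R-type: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (REGS[rs] << 21) | (REGS[rt] << 16) | (REGS[rd] << 11) | (shamt << 6) | funct

def encode_i(op, rt, rs, imm):
    """I-type: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (REGS[rs] << 21) | (REGS[rt] << 16) | (imm & 0xFFFF)

add_word = encode_r(0x20, "$t0", "$s1", "$s2")  # add $t0, $s1, $s2
lw_word = encode_i(0x23, "$t0", "$s3", 32)      # lw  $t0, 32($s3)
print(f"{add_word:08x} {lw_word:08x}")          # prints: 02324020 8e680020
```

The resulting words can be compared with the machine code that SPIM displays next to each instruction in its text segment.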

CS385 Computer Architecture, Lecture 3

Reading: Patterson & Hennessy - Sections 2.7, 2.10, A.9, A.10
Topics: MIPS Instructions: control and addressing modes
Lecture slides
Book slides

Lecture Notes

  1. Implementing the C code for if in MIPS: conditional branch.
  2. Implementing the C code for if-else in MIPS: unconditional branch
  3. Simple for loop
  4. Check for less-than: building a pseudoinstruction for branch if less-than.
  5. Addressing in branch instructions: PC-relative and pseudodirect.
  6. Constants: use of immediate addressing (constants as operands: addi, slti, andi, ori).
  7. 32-bit constants: manipulate the upper 2 bytes separately (load upper immediate).
  8. Summary of MIPS addressing: register (add), immediate (addi), base or displacement (lw), PC-relative (bne), pseudodirect (j)
Exercises: Load this program in the SPIM simulator and run it with and without pseudoinstructions. See how the assembler translates pseudoinstructions into machine instructions.
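
The address calculations behind PC-relative and pseudodirect addressing, and the split of a 32-bit constant into its lui and ori halves, can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the code models the arithmetic, not the SPIM implementation.

```python
def branch_target(pc, offset16):
    """PC-relative: target = (PC + 4) + sign_extend(offset) * 4."""
    if offset16 & 0x8000:          # sign-extend the 16-bit field
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """Pseudodirect: upper 4 bits of PC + 4, then the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

def split_constant(value):
    """Upper and lower 16-bit halves of a 32-bit constant (for lui, then ori)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```

For example, `split_constant(0x003D0900)` yields the halves `0x003D` and `0x0900`, which would be loaded with `lui` followed by `ori`.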

CS385 Computer Architecture, Lecture 4

Reading: Patterson & Hennessy - Sections 2.4, 3.2, B.5.
Topics: Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
Lecture slides

Lecture Notes

  1. Representing numbers: sign bit, one's complement, two's complement.
  2. Arithmetic: addition, subtraction, detecting overflow.
  3. Logical operations: shift, and, or.
  4. Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
  5. ALU for logical operations.
  6. ALU for add, and, or.
  7. Supporting subtraction
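
As a companion to the notes on number representation and overflow, the following Python sketch shows two's complement conversion, negation, and the sign-based overflow test for addition. The function names and the default word width are assumptions chosen for the example.

```python
def to_twos(value, bits=32):
    """Bit pattern (as an unsigned int) of a signed value in two's complement."""
    return value & ((1 << bits) - 1)

def from_twos(pattern, bits=32):
    """Signed value of a two's complement bit pattern."""
    sign = 1 << (bits - 1)
    return (pattern & (sign - 1)) - (pattern & sign)

def negate(pattern, bits=32):
    """Two's complement negation: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask

def add_overflows(a, b, bits=32):
    """Signed addition overflows when both operands share a sign that the result lacks."""
    result = from_twos(to_twos(a + b, bits), bits)
    return (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)
```

With 8-bit words, `add_overflows(127, 1, bits=8)` is true because the wrapped result is negative, while adding operands of opposite sign can never overflow.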

Tutorials and practice quizzes on two's complement numbers


CS385 Computer Architecture, Lecture 5

Reading: Patterson & Hennessy - Section B.5.
Topics: ALU design: full adder, slt operation, HDL design
Lecture slides
Programs: 4-bit-adder.vl, mips-alu.vl, ALU4-mixed.vl

Lecture Notes

  1. Implementation of a full adder:
  2. Supporting set on less-than (slt).
  3. Test for equality (needed for branching)
  4. Designing the ALU in Verilog
  5. Carry Lookahead
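
The adder structure in these notes can be modeled at the bit level before writing it in Verilog. The sketch below is a Python model, not the course's 4-bit-adder.vl: it chains one-bit full adders into a ripple-carry adder and reuses the same adder for subtraction by inverting the second operand and setting the carry-in. All names are assumptions for illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def ripple_add(x, y, bits=4, carry_in=0):
    """Chain `bits` full adders, least significant bit first."""
    result, carry = 0, carry_in
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

def subtract(x, y, bits=4):
    """x - y computed as x + (~y) + 1 on the same adder."""
    return ripple_add(x, ~y & ((1 << bits) - 1), bits, carry_in=1)[0]

def is_equal(x, y, bits=4):
    """Equality test as used for branching: the difference is zero."""
    return subtract(x, y, bits) == 0
```

A ripple-carry design is used here because it is the simplest to follow; carry lookahead changes how the carries are produced, not the sums.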

CS385 Computer Architecture, Lecture 6

Reading: Patterson & Hennessy - Sections 3.3, 3.5.
Topics: ALU design: multiplication, representing floating point numbers
Lecture slides

Lecture Notes

  1. Implementing multiplication:
  2. Floating point numbers
Tutorials and practice quizzes on floating point numbers:
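
The two topics of this lecture can be illustrated with a short Python sketch, under the assumption that multiplication is done by the basic shift-and-add method and that floating point numbers use the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits). The function names are hypothetical.

```python
import struct

def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Unsigned multiply: add the shifted multiplicand for each 1 bit of the multiplier."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:
            product += multiplicand
        multiplicand <<= 1   # shift multiplicand left
        multiplier >>= 1     # shift multiplier right
    return product

def float_fields(x):
    """Split x, stored as IEEE 754 single precision, into (sign, exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF
```

For instance, -0.75 is -1.1 (binary) times 2^-1, so `float_fields(-0.75)` gives sign 1, biased exponent 126, and a fraction whose top bit is set.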

CS385 Computer Architecture, Lecture 7

Reading: Patterson & Hennessy - Sections 4.1 - 4.3, B.8.
Topics: The Processor, Building a Datapath
Lecture slides
Programs: mips-regfile.vl, mips-r-type_addi.vl

Lecture Notes

  1. Abstract level implementation:
  2. Basic building elements
  3. Fetching instructions and incrementing the program counter
  4. Register file and execution of R-type instructions
  5. Datapath for lw and sw instructions (add data memory and sign extend)
  6. Datapath for branch instructions
Exercises: Run MIPS single cycle animation in Blackboard.

CS385 Computer Architecture, Lecture 8

Reading: Patterson & Hennessy - Section 4.4
Topics: Single-cycle control
Lecture slides
Programs: mips-r-type_addi.vl, mips-simple.vl

Lecture Notes

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation
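
The ALU control mapping in item 1 can be written as a small lookup. The sketch below assumes the 2-bit ALUOp scheme and the 4-bit ALU control codes used in Patterson & Hennessy, Section 4.4; the course's Verilog files may encode these differently, and the Python names are chosen only for this example.

```python
# ALU control codes (assumed to match the textbook's single-cycle design).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field of R-type instructions -> ALU control code.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}

def alu_control(alu_op, funct=0):
    """Map ALUOp (from the main control unit) and funct to the ALU control inputs."""
    if alu_op == 0b00:           # lw, sw: add base and offset
        return ALU_ADD
    if alu_op == 0b01:           # beq: subtract and test for zero
        return ALU_SUB
    return FUNCT_TO_ALU[funct]   # R-type: decided by the funct field
```

Splitting the decoding into a main control unit and this smaller ALU control keeps each truth table small.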

CS385 Computer Architecture, Lecture 11

Reading: Patterson & Hennessy - B.4, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Using Hardware Description Language to Design and Simulate the MIPS processor


CS385 Computer Architecture, Lecture 13

Reading: Patterson & Hennessy - Section 4.5
Topic: Introduction to Pipelining
Lecture slides (PDF)

Lecture Notes

  1. Pipelining by analogy (laundry example):
  2. Five stages of the load MIPS instruction
  3. The pipelined datapath
  4. Single cycle, multiple cycle vs. pipeline
  5. Advantages of pipelined execution
  6. Problems with pipelining (pipeline hazards)

CS385 Computer Architecture, Lecture 14

Reading: Patterson & Hennessy - Section 4.5, 4.6
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture slides I (PDF)
Lecture slides II (PDF)

Lecture Notes

  1. Structural hazards: single memory
  2. Control hazards:
       add $4, $5, $6            beq $1, $2, 40
       beq $1, $2, 40     ==>    add $4, $5, $6
       lw $3, 300($0)            lw $3, 300($0)
  3. Data hazards (dependencies backwards in time):
       lw $t0, 0($t1)               lw $t0, 0($t1)
       lw $t2, 4($t1)      ==>      lw $t2, 4($t1)
       sw $t2, 0($t1)               sw $t0, 4($t1)
       sw $t0, 4($t1)               sw $t2, 0($t1)
  4. Designing a pipelined processor

CS385 Computer Architecture, Lecture 15

Reading: Patterson & Hennessy - Section 4.6, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topic: Implementing pipeline datapath and control
Lecture slides (PDF)

Lecture Notes

  1. Splitting datapath into stages: using registers to store parts of the instruction
  2. Transferring data forward and backward between the stages: lw example
  3. Corrected datapath: storing rd for the write back stage.
  4. Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle diagram
  5. Pipeline control:
  6. Datapath with control
  7. Example: running this code through the pipeline in 9 cycles.
       lw   $10, 20($1)
       sub  $11, $2, $3
       and  $12, $4, $5
       or   $13, $6, $7
       add  $14, $8, $9
Exercises: Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php), Figures 4.13.11 through 4.13.15.
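
The 9-cycle figure for the five-instruction example follows from the usual count for an ideal pipeline: the first instruction needs one cycle per stage and every further instruction completes one cycle later. The Python sketch below computes that count and prints a multiple-clock-cycle diagram. It assumes no stalls, and the function names are hypothetical.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def pipeline_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to run n instructions through an ideal pipeline with no stalls."""
    return n_stages + n_instructions - 1

def pipeline_diagram(instructions):
    """One row per instruction; each row starts one clock cycle after the previous one."""
    rows = []
    for i, instr in enumerate(instructions):
        rows.append(f"{instr:<20}" + "     " * i + "".join(f"{s:<5}" for s in STAGES))
    return "\n".join(rows)

program = ["lw   $10, 20($1)", "sub  $11, $2, $3", "and  $12, $4, $5",
           "or   $13, $6, $7", "add  $14, $8, $9"]
print(pipeline_diagram(program))
print("cycles:", pipeline_cycles(len(program)))
```

The same formula shows why long instruction sequences approach one instruction per cycle: the 4 fill cycles are paid only once.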


CS385 Computer Architecture, Lecture 16

Reading: Patterson & Hennessy - Section 4.7, 4.8
Topic: Implementing data and branch hazard control
Lecture slides (PDF)

Lecture Notes

  1. Detecting data dependencies
  2. Forwarding
  3. Data hazards and stalls
       If (ID/EX.MemRead and
           (ID/EX.Rt = IF/ID.Rs or
            ID/EX.Rt = IF/ID.Rt))
         stall the pipeline
  4. Branch hazards
  5. Advanced pipelining
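
The stall condition in the notes above translates directly into code. This is a minimal Python sketch of the load-use hazard check; the parameter names mirror the pipeline register fields, and passing them as plain values is a simplification made for the example.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in EX is a load whose destination (Rt)
    is a source register of the instruction currently being decoded."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

# lw $2, 20($1) followed immediately by and $4, $2, $5 must stall one cycle.
print(must_stall(True, 2, 2, 5))
```

Forwarding alone cannot remove this hazard, because the loaded value is not available until the end of the MEM stage.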


CS385 Computer Architecture, Lecture 17

Reading: Patterson & Hennessy - Chapter 4, Section 4.13 (http://booksite.elsevier.com/9780124077263/appendices.php)
Topics: Review of Datapath, Control and Pipelining, HDL implementation (mips-pipe.vl), 3-stage pipeline (mips-pipe3.vl)
Programs: mips-pipe.vl, mips-pipe3.vl

Lecture slides (PDF)

Lecture Notes

Datapath

  1. Abstract level implementation:
  2. Basic building elements
  3. Basic operations

Control

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation

Pipelining

  1. Basic principles of pipelining
  2. MIPS pipelining: the five stages of the lw instruction
  3. Problems with pipelining:
  4. Designing a pipelined processor
  5. Transferring data forward and backward between the stages
  6. Pipeline control
  7. Implementing data and branch hazard control
  8. Advanced pipelining

Demo:


CS385 Computer Architecture, Lecture 18

Reading: Patterson & Hennessy - Section 5.1
Topic: Memory Hierarchy
Lecture slides (PDF)

Lecture Notes

  1. Memory technologies and trends
  2. Impact on performance
  3. The need for hierarchical memory organization
  4. The principle of locality
  5. Memory hierarchy terminology
  6. Basics of RAM implementation

CS385 Computer Architecture, Lecture 19

Reading: Patterson & Hennessy - Sections 5.1-5.3
Topic: The Basics of caches
Lecture slides (PDF)
Programs: cache.vl, cache2.vl

Lecture Notes

  1. Direct-mapped cache
  2. Accessing a cache
  3. Writing to the cache (write-through and write-back schemes)
  4. Handling cache misses
  5. Spatial locality caches: keeping consistency on write
  6. Main memory organization

Exercises: Problems 5.1, 5.2, 5.3, 5.7 from Chapter 5 Exercises in Blackboard
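
To see how a direct-mapped cache locates a block, the following Python sketch splits each address into tag and index and tracks hits and misses. It is a behavioral model only, not the course's cache.vl; the class name, the default sizes, and the decision to store tags without data are assumptions for illustration.

```python
class DirectMappedCache:
    def __init__(self, n_blocks=8, block_bytes=4):
        self.n_blocks, self.block_bytes = n_blocks, block_bytes
        self.valid = [False] * n_blocks
        self.tags = [0] * n_blocks

    def access(self, address):
        """Return True on a hit; on a miss, install the block and return False."""
        block_address = address // self.block_bytes
        index = block_address % self.n_blocks   # low-order bits select the cache block
        tag = block_address // self.n_blocks    # remaining high-order bits
        hit = self.valid[index] and self.tags[index] == tag
        self.valid[index], self.tags[index] = True, tag
        return hit

cache = DirectMappedCache()
word_addresses = [22, 26, 22, 26, 16, 3, 16, 18]
trace = [cache.access(4 * w) for w in word_addresses]
```

In this trace the access to word 18 misses even though the cache is not full, because words 18 and 26 map to the same index.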


CS385 Computer Architecture, Lecture 20

Reading: Patterson & Hennessy - Section 5.4
Topic: Improving cache performance
Lecture slides (PDF)

Lecture Notes

  1. Measuring cache performance
  2. Flexible placement of blocks in the cache
  3. Locating a block in the cache: an N-way set-associative cache requires N comparators and an N-to-1 multiplexor
  4. Choosing which block to replace: least recently used
  5. Multilevel caches
Exercises
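
One common way to measure cache performance is the average memory access time (AMAT), defined as hit time plus miss rate times miss penalty. The Python sketch below applies that formula to one and two cache levels; the function names and the sample numbers in the usage note are assumptions, not values from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, memory_penalty):
    """With a second-level cache, the L1 miss penalty is the AMAT of L2."""
    return amat(l1_hit, l1_miss_rate, amat(l2_hit, l2_local_miss_rate, memory_penalty))
```

With a 1-cycle hit time, a 5% miss rate and a 20-cycle penalty, `amat(1, 0.05, 20)` comes to about 2 cycles, which shows how a small miss rate can double the effective access time.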

CS385 Computer Architecture, Lecture 21

Reading: Patterson & Hennessy - Section 5.7
Topic: Virtual Memory
Lecture slides (PDF)

Lecture Notes

  1. The need for VM
  2. VM organization and terminology: virtual address, physical address, page, page offset, page fault, memory mapping (translation).
  3. Design decisions motivated by the very high cost of page faults:
  4. Addressing pages:
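
The terminology above (virtual address, page, page offset, page fault, translation) fits in a short Python sketch. It assumes 4 KiB pages and models the page table as a dictionary from virtual page number to physical page number; both choices, and all names, are illustrative only.

```python
PAGE_OFFSET_BITS = 12  # assumed page size: 4 KiB

class PageFault(Exception):
    """Raised when the requested virtual page is not present in main memory."""

def translate(virtual_address, page_table):
    """Translate a virtual address using a {virtual page: physical page} table."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn:#x} is not in memory")
    # The page offset is copied unchanged; only the page number is mapped.
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset
```

In a real system the page fault would be handled by the operating system, which loads the page from disk and retries the access.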

CS385 Computer Architecture, Lecture 22

Reading: Patterson & Hennessy - Section 5.7
Topic: Virtual Memory optimization
Lecture slides (PDF)

Lecture Notes

  1. Optimizing address translation - Translation Lookaside Buffer (TLB):
  2. MIPS R2000 (DECStation 3100) TLB
  3. Overall operation of a memory hierarchy
  4. Memory protection with VM
  5. Using exceptions for handling TLB misses and page faults: using EPC and Cause registers
  6. Summary of VM
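
A TLB can be viewed as a small cache of recent page table entries. The Python sketch below models a fully associative TLB in front of a dictionary page table. The LRU replacement policy, the entry count, and the class name are assumptions made for this example and do not describe the MIPS R2000 TLB.

```python
from collections import OrderedDict

class TLB:
    """A tiny fully associative TLB with LRU replacement in front of a page table."""
    def __init__(self, page_table, entries=4):
        self.page_table, self.entries = page_table, entries
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, vpn):
        """Return the physical page number for a virtual page number."""
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)             # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = self.page_table[vpn]  # a KeyError here models a page fault
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)      # evict the least recently used entry
        return self.cache[vpn]
```

On a hit the translation needs no page table access at all, which is the point of the optimization.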

CS385 Computer Architecture, Lecture 23

Reading: Patterson & Hennessy - Section 5.8.
Topic: A common framework for memory hierarchies
Lecture slides (PDF)

Lecture Notes

  1. Associativity schemes
  2. Placing blocks
  3. Miss rates and cache sizes
  4. Finding blocks
  5. Why do we use full associativity and a separate lookup table (page table) in VM
  6. Choosing a block to replace
  7. Writing blocks
  8. The sources of misses
  9. The challenge: changes that reduce the miss rate can have a negative effect on the overall performance
  10. Pentium Pro and PowerPC 604

Exercises (from Chapter 5 Exercises in Blackboard)


CS385 Computer Architecture, Lecture 26

Reading: Chapter 6, Section 2.11
Topic: Multiprocessors
Lecture slides (PDF)
COD-Chapter7.pdf

Lecture Notes

  1. Amdahl's Law
  2. Basic approaches to sharing data and types of connectivity
  3. Programming multiprocessors
  4. Multiprocessors connected by a single bus
  5. A parallel program
  6. Multiprocessor cache coherency
  7. Implementing a multiprocessor cache coherency protocol
  8. Synchronization using coherency, locks, atomic swap operation
       again: addi $t0, $0, 1       # copy locked value
              ll   $t1, 0($s1)      # load linked
              sc   $t0, 0($s1)      # store conditional
              beq  $t0, $0, again   # branch if store fails
              add  $s4, $0, $t1     # put load value in $s4
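
Amdahl's Law, the first item in the notes above, bounds the speedup that extra processors can deliver when only part of a program runs in parallel. A minimal Python sketch, with hypothetical parameter names:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup = 1 / ((1 - f) + f / n), where f is the parallelizable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Even with many processors, a 10% serial part caps the speedup below 10.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction dominates as the processor count grows, which is why the speedup levels off instead of scaling linearly.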


CS385 Computer Architecture, Lecture 27

Reading: Chapter 6
Topic: Networks of multiprocessors and clusters
Lecture slides (PDF)
COD-Chapter7.pdf

Lecture Notes

  1. Shared memory vs. multiple private memories
  2. Centralized memory vs. distributed memory
  3. Parallel programming by message passing
  4. Distributed memory communication
  5. Memory allocation
  6. Clusters and network topology
  7. Modern clusters:

Digital Design Review Assignment

Log on to Blackboard to see and submit the assignment.

Assignment 1: Assembly Programming in MIPS (maximum grade 10 points)

Log on to Blackboard to see and submit the assignment.

Semester Project: Building a mini MIPS machine (maximum grade 45 points including the presentation)

Log on to Blackboard to see and submit the project.

Midterm Test (20 points)

There will be 20 multiple choice and short answer questions covering the following topics:

Final Exam (25 points)

There will be 25 multiple choice, multiple answer, and short answer questions from the following topics (see the Review Questions in Blackboard):