CS385 – Computer Architecture

Spring 2007

Classes: MW 5:15pm - 6:30pm, Robert Vance Academic Center 108
Instructor: Dr. Zdravko Markov, MS 307, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz at ccsu.edu
Office hours: MW 10:00am - 12:00pm, 6:30pm - 7:00pm, or by appointment

Catalog description: The architecture of the computer is explored by studying its various levels: physical level, operating-system level, conventional machine level and higher levels. An introduction to microprogramming and computer networking is provided.

Course Prerequisites: CS 354

Prerequisites by topic:

Course description: The course provides comprehensive coverage of computer architecture. It discusses the main components of computers and the basic principles of their operation. It demonstrates the relationship between software and hardware and focuses on the foundational concepts that are the basis for current computer design. The course is based on the MIPS processor, a simple, clean RISC processor whose architecture is easy to learn and understand. The major topics covered by the course are the following:
  1. MIPS instruction set
  2. Computer arithmetic and ALU design
  3. Datapath and control
  4. Pipelining
  5. Memory hierarchy, caches and virtual memory
  6. Interfacing CPU and peripherals, buses
  7. Multiprocessors, networks of multiprocessors, parallel programming
  8. Performance issues
Course Goals: Upon successful completion of the course the student will be able to

Required textbook: David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Third Edition, Morgan Kaufmann Publishers, 2004, ISBN: 1-55860-604-1.

Required software:

  1. Icarus Verilog (available on the Patterson and Hennessy book CD or at http://armoid.com/icarus/). Note about installation: do not use folder names that include spaces (like Program Files). Read book sections B.4 and 5.8 for using HDL.
  2. Other simulators that may be used for experiments (note that the project should be done with a Verilog simulator):
       1. SPIM simulator: A free software simulator for running MIPS R2000 assembly language programs, available for Unix, DOS, and Windows.
WEB resources:

Semester project: There will be a semester project to build a simplified MIPS machine. The project requires four progress reports, which will also be graded. The machine must be implemented in Verilog HDL, tested with a sample MIPS program and properly documented.

Class Participation: Active participation in class is expected of all students. Regular attendance is also expected. If you must miss a test, try to inform the instructor of this in advance.

Honesty policy: It is expected that all students will conduct themselves in an honest manner (see the CCSU Student handbook), and NEVER claim work which is not their own.  Violating this policy will result in a substantial grade penalty, and may lead to expulsion from the University.

Grading: Grading will be based on one programming assignment (5%), a midterm test (25%), a final exam (25%) and a semester project (45%, including progress reports and the final documentation). The letter grades will be calculated according to the following table:
 
Grade   A       A-     B+     B      B-     C+     C      C-     D+     D      D-     F
Score   95-100  90-94  87-89  84-86  80-83  77-79  74-76  70-73  67-69  64-66  60-63  0-59



Tentative schedule of classes and assignments
  1. Introduction: Computer Architecture = Instruction Set Architecture + Machine Organization
  2. MIPS Instructions: arithmetic, registers, memory, fetch&execute cycle
  3. MIPS Instructions: control and addressing modes
  4. Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
  5. Assignment 1 due (5 pts.). ALU design: full adder, slt operation, HDL design, carry lookahead
  6. ALU design: multiplication, representing floating point numbers
  7. The Processor: Building a datapath
  8. The Processor: Control (single cycle approach)
  9. March 2: Progress Report #1 due (5 pts.): ALU, Register File. Use template files: ALU4.vl (extend it to 16 bit) and regfilewrite.vl (use a behavioral 4-to-1 multiplexer for the read port).
  10. Multicycle approach to processor control
  11. Implementing finite state machine control, Microprogramming
  12. Using a Hardware Description Language to Design and Simulate the MIPS processor
  13. Turing machines
  14. March 28: Progress Report #2 due (5 pts.) - Memory, Instruction Fetch logic and Stage Control logic: use the project template (mips2.vl) and (1) adjust the word size to 16 bit, (2) implement fetching with structural design and (3) test it.
  15. Introduction to pipelining
  16. Solving pipeline hazards
  17. April 2:  Midterm Test (25 pts.) due
  18. Implementing pipeline datapath and control
  19. Implementing data and branch hazards control
  20. Review of Datapath, Control and Pipelining
  21. Memory hierarchy
  22. April 16: Progress Report #3 due (5 pts.) - Branching logic, Execution control. See the project description for more details.
  23. The Basics of caches
  24. Improving cache performance
  25. Virtual Memory basics
  26. Virtual Memory optimization
  27. A general framework of memory hierarchies
  28. Interfacing Processors and Peripherals - Buses
  29. May 2: Progress Report #4 due (5 pts.) - add register file, ALU and connecting MUXes. Add data cache (optional).
  30. Interfacing I/O devices to Memory, CPU and OS
  31. Multiprocessors
  32. Networks of multiprocessors
  33. The role of performance
  34. Final Exam due (25 pts.)
  35. May 16: Semester Project due (25 pts.)

CS385 – Computer Architecture, Lecture 1

Reading: Chapter 1
Topics: Introduction, Computer Architecture = Instruction Set Architecture + Machine Organization.
Lecture slides (PDF)

Lecture Notes

  1. Levels of Abstraction
  2. Computer Architecture = Instruction Set Architecture + Machine Organization
  3. Instruction Set – The Software Hardware Interface
  4. Levels of Computer Architecture in More Depth
  5. Basic Components of a Computer
  6. Computer Organization

CS385 – Computer Architecture, Lecture 2

Reading: Sections 2.1 - 2.5
Topics: MIPS instructions, arithmetic, registers, memory, fetch&execute cycle
Lecture slides (PDF)

Lecture Notes

  1. Design goal: maximize performance and minimize cost. Primitive (low level) and very restrictive instructions (fixed number and type of operands).
  2. Design principles:
  3. MIPS arithmetic: 3 operands, fixed order, registers only.
  4. Using only registers: R-type instructions.
  5. Registers: 32-bits long, conventions.
  6. Memory organization: words and byte addressing.
  7. Data transfer (load and store) instructions. Example: accessing array elements.
  8. Translating C code into MIPS instructions – the swap example.
  9. Machine Language: instruction format, I-type (Immediate) format for data transfer
  10. Stored program concept: programs in memory, fetch&execute cycle
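
To make the instruction formats in items 4 and 9 concrete, here is a small Python sketch that packs instruction fields into 32-bit machine words. The function names encode_r and encode_i are invented for this illustration; the field widths and register numbers follow the usual MIPS conventions, and the two sample instructions are only examples.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """Pack I-type fields: op(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# Register numbers by MIPS convention: $t0 = 8, $s1 = 17, $s2 = 18, $s3 = 19.
add_word = encode_r(rs=17, rt=18, rd=8, shamt=0, funct=0x20)  # add $t0, $s1, $s2
lw_word = encode_i(opcode=35, rs=19, rt=8, imm=32)            # lw  $t0, 32($s3)

print(f"add $t0, $s1, $s2 -> 0x{add_word:08x}")
print(f"lw  $t0, 32($s3)  -> 0x{lw_word:08x}")
```

Note that the destination register sits in the rd field for R-type instructions but in the rt field for lw, which is why the two formats need separate encoders.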

CS385 – Computer Architecture, Lecture 3

Reading: Sections 2.6 - 2.9, 2.16
Topics: MIPS Instructions: control and addressing modes
Lecture slides (PDF)

Lecture Notes

  1. Implementing the C code for if in MIPS: conditional branch.
  2. Implementing the C code for if–else in MIPS: unconditional branch
  3. Simple for loop
  4. Check for less-than: building a pseudoinstruction for branch if less-than.
  5. Addressing in branch instructions: PC-relative and pseudodirect.
  6. Constants: use of immediate addressing (constants as operands – addi, slti, andi, ori).
  7. 32-bit constants – manipulate upper 2 bytes separately (load upper immediate)
  8. Summary of MIPS addressing: register (add), immediate (addi), base or displacement (lw), PC-relative (bne), pseudodirect (j).
  9. Alternative approaches: IA-32
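
The address arithmetic behind items 5 and 7 can be sketched in Python as follows. This is an illustration only: the helper names split_constant, branch_target and jump_target are made up here, and the sample addresses are arbitrary.

```python
def split_constant(value):
    """Return the (upper, lower) 16-bit halves of a 32-bit constant."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def branch_target(pc, offset16):
    """PC-relative target: (PC + 4) plus the sign-extended word offset times 4."""
    if offset16 & 0x8000:          # sign-extend the 16-bit field
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF


def jump_target(pc, addr26):
    """Pseudodirect target: upper 4 bits of PC + 4, then the 26-bit field times 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)


# Loading 4,000,000 takes two instructions: lui $s0, upper ; ori $s0, $s0, lower
upper, lower = split_constant(4_000_000)
print(upper, lower)
print(hex(branch_target(0x00400000, 3)))
print(hex(jump_target(0x00400000, 0x0100000)))
```

The branch offset counts words relative to the instruction after the branch, so a 16-bit field reaches roughly 2^15 instructions in either direction.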

CS385 – Computer Architecture, Lecture 4

Reading: Sections 3.1 - 3.3, B.5 (CD)
Topics: Computer arithmetic and ALU design: representing numbers, arithmetic and logic operations
Lecture slides (PDF)

Lecture Notes

  1. Representing numbers: sign bit, one's complement, two's complement.
  2. Arithmetic: addition, subtraction, detecting overflow.
  3. Logical operations: shift, and, or.
  4. Basic ALU building components: and-gate, or-gate, inverter, multiplexor.
  5. ALU for logical operations.
  6. ALU for add, and, or.
  7. Supporting subtraction
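
As a rough illustration of items 1, 2 and 7, the Python sketch below models 32-bit two's complement words. The names to_signed, negate and add_with_overflow are assumptions made for this example. Overflow is detected here by comparing operand and result signs, which is one common formulation and not necessarily the one used in the lecture.

```python
BITS = 32
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(word):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return word - (1 << BITS) if word & SIGN else word


def negate(word):
    """Two's complement negation: invert every bit, then add 1."""
    return (~word + 1) & MASK


def add_with_overflow(a, b):
    """Add two 32-bit patterns. Overflow occurs when both operands have the
    same sign and the sign of the result differs from it."""
    result = (a + b) & MASK
    overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    return result, overflow


# Subtraction is addition of the negated operand: 5 - 3
difference, ovf = add_with_overflow(5, negate(3))
print(to_signed(difference), ovf)
```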
Exercises: Implement an overflow detection unit using only the CarryIn and CarryOut bits of ALU-31.

CS385 – Computer Architecture, Lecture 5

Reading: Sections B.4, B.5, B.6 (CD)
Topics: ALU design: full adder, slt operation, HDL design, carry lookahead
Lecture slides (PDF)
Programs: 2-1-mux.vl, 4-bit-adder.vl, more examples of Verilog programs

Lecture Notes

  1. Implementation of a full adder:
  2. Supporting set on less-than (slt).
  3. Test for equality (needed for branching)
  4. Designing the ALU in Verilog
  5. Carry Lookahead
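
The idea behind carry lookahead (item 5) can be sketched in Python for a 4-bit adder. This is a software model under assumed names (full_adder, carry_lookahead_add4), not the Verilog design from the lecture. The carries are written as a recurrence for brevity, whereas hardware expands each carry into a two-level expression of the generate and propagate signals.

```python
def full_adder(a, b, cin):
    """One-bit full adder built from XOR, AND and OR operations."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def carry_lookahead_add4(a, b, c0=0):
    """4-bit adder whose carries come from generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate:  g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: p_i = a_i OR b_i
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))                 # c_(i+1) = g_i OR (p_i AND c_i)
    total = 0
    for i in range(4):
        s, _ = full_adder((a >> i) & 1, (b >> i) & 1, c[i])
        total |= s << i
    return total, c[4]


print(carry_lookahead_add4(0b1011, 0b0110))
```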
Exercises: Implement and test half and full adders in Verilog using the structural specification approach (gate-level modeling).

CS385 – Computer Architecture, Lecture 6

Reading: Sections 3.4, 3.6 - 3.10
Topics: ALU design: multiplication, representing floating point numbers
Lecture slides (PDF)

Lecture Notes

  1. Implementing multiplication:
  2. Floating point numbers
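
For orientation, here is a Python sketch of the two topics above: a shift-and-add multiplier for unsigned operands, and a helper that splits an IEEE 754 single-precision number into its fields. The function names are invented for this example, and the multiplier is the simple sequential variant only, not any optimized version.

```python
import struct


def multiply_shift_add(multiplicand, multiplier, bits=32):
    """Unsigned multiply: at each step add the multiplicand if the low
    multiplier bit is 1, then shift both operands."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:          # test the least significant multiplier bit
            product += multiplicand
        multiplicand <<= 1          # shift the multiplicand left
        multiplier >>= 1            # shift the multiplier right
    return product


def float_fields(x):
    """Return (sign, biased exponent, fraction) of x as a single-precision float."""
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


print(multiply_shift_add(2, 3, bits=4))
print(float_fields(-0.75))
```

For -0.75 = -1.1 (binary) x 2^-1, the biased exponent is -1 + 127 = 126 and the fraction field holds the bits after the leading 1.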
Exercises

CS385 – Computer Architecture, Lecture 7

Reading: Sections 5.1 - 5.3
Topics: The Processor, Building a Datapath
Lecture slides (PDF)

Lecture Notes

  1. Abstract level implementation:
  2. Basic building elements
  3. Fetching instructions and incrementing the program counter
  4. Register file and execution of R-type instructions
  5. Datapath for lw and sw instructions (add data memory and sign extend)
  6. Datapath for branch instructions
Demo: http://www.it.jcu.edu.au/Subjects/cp2005/resources/animation/wk6animations.ppt

CS385 – Computer Architecture, Lecture 8

Reading: Section 5.4
Topics: Single-cycle control
Lecture slides (PDF)

Lecture Notes

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation
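
The mapping in item 1 can be sketched as a lookup in Python. The 2-bit ALUOp values and 4-bit ALU control codes below are the ones recalled from the textbook's single-cycle design and should be checked against Section 5.4; the function and constant names are made up for this example.

```python
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Function field (low 6 bits of an R-type instruction) to ALU operation.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct=0):
    """Combine ALUOp from the main control unit with the function field."""
    if alu_op == 0b00:          # lw / sw: add base register and offset
        return ALU_ADD
    if alu_op == 0b01:          # beq: subtract and test for zero
        return ALU_SUB
    if alu_op == 0b10:          # R-type: operation chosen by the function field
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):04b}")
```

Splitting the decoding into two levels keeps the main control unit small: it only needs to produce ALUOp from the opcode, and the function field is examined only for R-type instructions.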
Demo: http://www.web-ee.com/primers/files/MIPS/MIPS.htm

CS385 – Computer Architecture, Lecture 9

Reading: Sections 5.5, 5.6
Topics: Multicycle Approach to Processor Control, Exceptions
Lecture slides (PDF)

Lecture Notes

  1. Basic principles:
  2. Execution steps:
  3. Implementing the control unit - finite state machine
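
One consequence of the multicycle approach is that instruction classes take different numbers of clock cycles. The Python sketch below computes an average CPI under the usual step counts for the textbook's multicycle design (load 5, store 4, R-type 4, branch 3, jump 3). The instruction mix is an illustrative assumption, not data from this course.

```python
# Clock cycles per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "j": 3}


def average_cpi(mix):
    """Weighted average of cycle counts; mix maps class name to frequency."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(CYCLES[name] * freq for name, freq in mix.items())


# Assumed instruction mix, for illustration only.
mix = {"lw": 0.25, "sw": 0.10, "r-type": 0.52, "beq": 0.11, "j": 0.02}
cpi = average_cpi(mix)
print(round(cpi, 2))
```

A single-cycle design would have a CPI of 1 but a clock period long enough for the slowest instruction, so the comparison between the two designs depends on the clock period as well as on the CPI.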
Exercises

CS385 – Computer Architecture, Lecture 10

Reading: Section 5.7 (CD)
Topics: Implementing Finite State Machine Control, Microprogramming
Lecture slides (PDF)

Lecture Notes

  1. ROM implementation
  2. Programmable Logic Array (PLA)
  3. Implementing the Next-step function with a Sequencer
  4. Implementing datapath control by microprogramming
  5. Microprogramming vs. ROM and PLA implementation
  6. Handling exceptions and interrupts

CS385 – Computer Architecture, Lecture 11

Reading: Section 5.8 (CD). Note the numerous errors in the Verilog code.
Topic: Using Hardware Description Language to Design and Simulate the MIPS processor
  1. Behavior model of MIPS (mips.vl)
  2. Project version (two stage control) of MIPS (mips2.vl). Changes needed to complete the project:

CS385 – Computer Architecture, Lecture 12

Turing machines

  1. Turing machines
  2. A Turing Machine Applet
  3. Alan Turing web page


CS385 – Computer Architecture, Lecture 13

Reading: Sections 6.1 - 6.2
Topic: Introduction to Pipelining
Lecture slides (PDF)

Lecture Notes

  1. Pipelining by analogy (laundry example):
  2. Five stages of the load MIPS instruction
  3. The pipelined datapath
  4. Single cycle, multiple cycle vs. pipeline
  5. Advantages of pipelined execution
  6. Problems with pipelining (pipeline hazards)

CS385 – Computer Architecture, Lecture 14

Reading: Sections 6.1 - 6.2
Topic: Solving pipeline hazards, Designing a pipelined processor
Lecture slides (PDF)

Lecture Notes

  1. Structural hazards: single memory
  2. Control hazards:
       add $4, $5, $6            beq $1, $2, 40
       beq $1, $2, 40     ==>    add $4, $5, $6
       lw $3, 300($0)            lw $3, 300($0)
  3. Data hazards (dependencies backwards in time):
       lw $t0, 0($t1)               lw $t0, 0($t1)
       lw $t2, 4($t1)      ==>      lw $t2, 4($t1)
       sw $t2, 0($t1)               sw $t0, 4($t1)
       sw $t0, 4($t1)               sw $t2, 0($t1)
  4. Designing a pipelined processor
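
The effect of reordering the lw/sw sequence above can be checked with a small Python sketch. It uses a deliberately simplified model: an instruction stalls for one cycle if it reads a register loaded by the immediately preceding lw. The tuple representation and the function name load_use_stalls are assumptions made for this example.

```python
def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw just before them.

    Each instruction is a tuple (opcode, destination register, source registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ["$t1"]),          # lw $t0, 0($t1)
    ("lw", "$t2", ["$t1"]),          # lw $t2, 4($t1)
    ("sw", None, ["$t2", "$t1"]),    # sw $t2, 0($t1)
    ("sw", None, ["$t0", "$t1"]),    # sw $t0, 4($t1)
]
# Swap the two stores so that sw $t2 no longer follows lw $t2 directly.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

The swap is legal because the two stores write different addresses and neither changes a register the other reads.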

CS385 – Computer Architecture, Lecture 15

Reading: Sections 6.2 - 6.3
Topic: Implementing pipeline datapath and control
Lecture slides (PDF)

Lecture Notes

  1. Splitting datapath into stages: using registers to store parts of the instruction
  2. Transferring data forward and backward between the stages: lw example
  3. Corrected datapath: storing rd for the write back stage.
  4. Graphically representing pipelines: multiple-clock-cycle vs. single-clock-cycle diagram
  5. Pipeline control:
  6. Datapath with control
  7. Example: running this code through the pipeline in 9 cycles:

       lw   $10, 20($1)
       sub  $11, $2, $3
       and  $12, $4, $5
       or   $13, $6, $7
       add  $14, $8, $9
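
The 9-cycle figure follows from the pipeline depth: with 5 stages and no stalls, 5 instructions finish in 5 + 5 - 1 = 9 clock cycles. The Python sketch below prints a multiple-clock-cycle diagram for this code. It assumes an ideal pipeline without hazards, and the function name pipeline_diagram is made up for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return (rows, total_cycles) for an ideal pipeline with no stalls.

    Instruction i enters IF in cycle i and advances one stage per cycle.
    """
    total = len(instructions) + len(STAGES) - 1
    rows = []
    for i, text in enumerate(instructions):
        cells = ["."] * total
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((text, cells))
    return rows, total


program = [
    "lw   $10, 20($1)",
    "sub  $11, $2, $3",
    "and  $12, $4, $5",
    "or   $13, $6, $7",
    "add  $14, $8, $9",
]
rows, total_cycles = pipeline_diagram(program)
for text, cells in rows:
    print(f"{text:<18}" + " ".join(f"{c:<3}" for c in cells))
print("cycles:", total_cycles)
```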


CS385 – Computer Architecture, Lecture 16

Reading: Sections 6.4 - 6.9
Topic: Implementing data and branch hazard control
Lecture slides (PDF)

Lecture Notes

  1. Detecting data dependencies
  2. Forwarding
  3. Data hazards and stalls:

       if (ID/EX.MemRead and
           (ID/EX.Rt = IF/ID.Rs or
            ID/EX.Rt = IF/ID.Rt))
         stall the pipeline
  4. Branch hazards
  5. Advanced pipelining
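
The stall condition given above translates almost directly into Python. The sketch also includes a forwarding selector in the style of the textbook's forwarding unit; its 2-bit encoding (10 = forward from EX/MEM, 01 = forward from MEM/WB) is recalled from the book and should be checked against Section 6.4. All function and parameter names are assumptions made for this example.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in EX is a load whose destination (Rt)
    is a source of the instruction currently in ID."""
    return bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


def forward_select(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_reg):
    """Choose the source of one ALU operand. The EX/MEM result takes priority
    because it is the more recent value; register $0 is never forwarded."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return 0b10     # forward from the EX/MEM pipeline register
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return 0b01     # forward from the MEM/WB pipeline register
    return 0b00         # no forwarding: use the register file value


# lw $2, 20($1) followed by and $4, $2, $5: the and must wait one cycle.
print(must_stall(True, 2, 2, 5))
```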


CS385 – Computer Architecture, Lecture 17

Reading: Chapters 5, 6
Topic: Review of Datapath, Control and Pipelining
Lecture slides (PDF)

Lecture Notes

Datapath

  1. Abstract level implementation:
  2. Basic building elements
  3. Basic operations

Control

  1. ALU control: mapping the opcode and function bits to the ALU control inputs
  2. Designing the main control unit
  3. Operation of the Datapath (single-cycle implementation):
  4. Problems of the single-cycle implementation
  5. Multicycle Approach to Processor Control
  6. Basic principles of the Multicycle Approach to Processor Control
  7. Execution steps:
  8. Finite state machine control
  9. Microprogramming

Pipelining

  1. Basic principles of pipelining
  2. MIPS pipelining: the five stages of the lw instruction
  3. Problems with pipelining:
  4. Designing a pipelined processor
  5. Transferring data forward and backward between the stages
  6. Pipeline control
  7. Implementing data and branch hazard control
  8. Advanced pipelining
Demo: http://www.web-ee.com/primers/files/MIPS/MIPS.htm


CS385 – Computer Architecture, Lecture 18

Reading: Section 7.1
Topic: Memory Hierarchy
Lecture slides (PDF)

Lecture Notes

  1. Memory technologies and trends
  2. Impact on performance
  3. The need for hierarchical memory organization
  4. The principle of locality
  5. Memory hierarchy terminology
  6. Basics of RAM implementation

CS385 – Computer Architecture, Lecture 19

Reading: Section 7.2
Topic: The Basics of caches
Lecture slides (PDF)

Lecture Notes

  1. Direct-mapped cache
  2. Accessing a cache
  3. Writing to the cache (write-through and write-back schemes)
  4. Handling cache misses
  5. Example: DECStation 3100 cache
  6. Spatial locality caches: keeping consistency on write
  7. Main memory organization
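
Items 1 and 2 can be illustrated with a minimal Python model of a direct-mapped cache that tracks only tags (no data). The class name, the cache size and the reference sequence are assumptions chosen for the example; a None tag stands in for a cleared valid bit.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache."""

    def __init__(self, num_blocks, block_bytes):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.tags = [None] * num_blocks     # None marks an invalid entry

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block_address = address // self.block_bytes
        index = block_address % self.num_blocks    # low-order bits select the entry
        tag = block_address // self.num_blocks     # remaining high-order bits
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag
        return hit


cache = DirectMappedCache(num_blocks=8, block_bytes=4)
word_addresses = [22, 26, 22, 26, 16, 3, 16, 18]
results = [cache.access(4 * w) for w in word_addresses]
print(["hit" if r else "miss" for r in results])
```

The last reference misses because word 18 maps to the same entry as word 26 (both have index 2) but carries a different tag.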

CS385 – Computer Architecture, Lecture 20

Reading: Section 7.3
Topic: Improving cache performance
Lecture slides (PDF)

Lecture Notes

  1. Measuring cache performance
  2. Flexible placement of blocks in the cache
  3. Locating a block in the cache: N-way cache requires N comparators and N-way multiplexor
  4. Choosing which block to replace: least recently used
  5. Multilevel caches
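
Items 2 to 4 can be explored with a short Python model of an N-way set-associative cache with least-recently-used replacement. The class name and the reference sequence are assumptions made for this example; an OrderedDict keeps the blocks of each set in recency order.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of a set-associative cache with LRU replacement."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_address):
        """Return True on a hit, False on a miss."""
        index = block_address % self.num_sets
        tag = block_address // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)        # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.popitem(last=False)     # evict the least recently used block
        entries[tag] = True
        return False


def count_misses(num_sets, ways, references):
    cache = SetAssociativeCache(num_sets, ways)
    return sum(not cache.access(block) for block in references)


references = [0, 8, 0, 6, 8]
# Three organizations of a four-block cache.
print(count_misses(4, 1, references))   # direct mapped
print(count_misses(2, 2, references))   # two-way set associative
print(count_misses(1, 4, references))   # fully associative
```

For this sequence the miss count drops as associativity increases, since blocks 0 and 8 no longer compete for a single entry.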
Exercises

CS385 – Computer Architecture, Lecture 21

Reading: Section 7.4
Topic: Virtual Memory
Lecture slides (PDF)

Lecture Notes

  1. The need for VM
  2. VM organization and terminology: virtual address, physical address, page, page offset, page fault, memory mapping (translation).
  3. Design decisions motivated by the very high cost of page faults:
  4. Addressing pages:
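
The terminology in item 2 can be tied together with a small Python sketch of address translation. It assumes 4 KB pages and models the page table as a dictionary from virtual page number to physical page number; the names translate and PageFault, and the sample table, are invented for this example.

```python
PAGE_BITS = 12                        # assumed page size: 2^12 = 4 KB
OFFSET_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Raised when the virtual page is not present in main memory."""


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address.

    The page offset passes through unchanged; only the page number is translated.
    """
    virtual_page = virtual_address >> PAGE_BITS
    offset = virtual_address & OFFSET_MASK
    if virtual_page not in page_table:
        raise PageFault(f"virtual page {virtual_page} is not resident")
    return (page_table[virtual_page] << PAGE_BITS) | offset


page_table = {0: 5, 1: 2}             # virtual page -> physical page (sample values)
print(hex(translate(0x1ABC, page_table)))
```

In a real system a page fault transfers control to the operating system, which brings the page in from disk and updates the page table before the access is retried.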

CS385 – Computer Architecture, Lecture 22

Reading: Section 7.4
Topic: Virtual Memory optimization
Lecture slides (PDF)

Lecture Notes

  1. Optimizing address translation - Translation Lookaside Buffer (TLB):
  2. MIPS R2000 (DECStation 3100) TLB
  3. Overall operation of a memory hierarchy
  4. Memory protection with VM
  5. Using exceptions for handling TLB misses and page faults: using EPC and Cause registers
  6. Summary of VM

CS385 – Computer Architecture, Lecture 23

Reading: Section 7.5
Topic: A general framework of memory hierarchies
Lecture slides (PDF)

Lecture Notes

  1. Associativity schemes
  2. Placing blocks
  3. Miss rates and cache sizes
  4. Finding blocks
  5. Why do we use full associativity and a separate lookup table (page table) in VM
  6. Choosing a block to replace
  7. Writing blocks
  8. The sources of misses
  9. The challenge: a change that reduces the miss rate can also have a negative effect on overall performance
  10. Pentium Pro and PowerPC 604

Exercises


CS385 – Computer Architecture, Lecture 24

Reading: Section 8.4
Topic: Interfacing Processors and Peripherals - Buses
Lecture slides (PDF)

Lecture Notes

  1. Buses: lines, transactions, types
  2. Synchronous and asynchronous buses
  3. Handshaking protocol
  4. Bus access: master and slave
  5. Bus arbitration schemes
  6. Bus standards

CS385 – Computer Architecture, Lecture 25

Reading: Section 8.5
Topic: Interfacing I/O devices to Memory, CPU and OS
Lecture slides (PDF)

Lecture Notes

  1. The role of the operating system in interfacing I/O devices to Memory
  2. Controlling the I/O devices
  3. Communicating with the processor
  4. Direct memory access (DMA)
  5. DMA and the memory system
  6. Designing an I/O system: latency and bandwidth constraints.

CS385 – Computer Architecture, Lecture 26

Reading: Sections 9.1 - 9.3 (on CD)
Topic: Multiprocessors
Lecture slides (PDF)

Lecture Notes

  1. Basic approaches to sharing data and types of connectivity
  2. Programming multiprocessors
  3. Multiprocessors connected by a single bus
  4. A parallel program
  5. Multiprocessor cache coherency
  6. Implementing a multiprocessor cache coherency protocol
  7. Synchronization using coherency

CS385 – Computer Architecture, Lecture 27

Reading: Sections 9.4 - 9.6 (on CD)
Topic: Networks of multiprocessors and clusters
Lecture slides (PDF)

Lecture Notes

  1. Shared memory vs. multiple private memories
  2. Centralized memory vs. distributed memory
  3. Parallel programming by message passing
  4. Distributed memory communication
  5. Memory allocation
  6. Clusters and network topology
  7. Modern clusters:

CS385 – Computer Architecture, Lecture 28

Reading: Sections 4.1 - 4.4
Topic: The role of performance
Lecture slides (PDF)

Lecture Notes

  1. Measuring computer performance:
  2. Evaluating computer systems:
  3. Categories of parallelism
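
A standard way to measure performance (item 1) is the CPU time equation: CPU time = instruction count x CPI x clock cycle time. The Python sketch below applies it to two hypothetical machines running the same program; the numbers are illustrative assumptions and the function name cpu_time is made up for the example.

```python
def cpu_time(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Two hypothetical implementations of the same instruction set.
time_a = cpu_time(1000, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time(1000, cpi=1.2, cycle_time_ps=500)

# Performance is the inverse of execution time, so the speedup of A over B
# is the ratio time_b / time_a.
speedup = time_b / time_a
print(time_a, time_b, round(speedup, 2))
```

Machine B has the lower CPI but machine A is still faster, which shows why neither clock rate nor CPI alone is a reliable measure of performance.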

CS385 Assignment 1: Assembly Programming in MIPS

Write a program in MIPS assembler to perform some useful computation (e.g. calculate sales tax, convert temperature from Celsius into Fahrenheit). The program must include:
  1. At least one instruction from each instruction type: R-type arithmetic, I-type arithmetic, Memory transfer and Branch.
  2. Input and Output through system calls.
  3. Comments explaining the type, format and meaning of each instruction. If you use a pseudoinstruction, there must be an explanation of how it translates into real MIPS instructions.
Use the SPIM simulator to debug and run the program. Appendix A of the text (on CD) provides reference information about MIPS assembly programming. You may find additional information about MIPS programming using SPIM at CS 254 - Computer organization and assembly language programming and Introduction to RISC Assembly Language Programming.

Documentation and submission: Submit the source text of the program (ASCII text) as an attachment through CCSU pipeline/Vista Courses/Computer Architecture - CS-385-70 Spring07/Assignment 1.


CS385 Semester Project: Building a mini MIPS machine (total grade 45 points, including 4 progress reports worth 5 points each)

Posting date: February 12
Due date: May 16

Description: Not available at this time


CS385 Midterm Test

The midterm test topics include: number systems, MIPS assembly programming, single-cycle datapath and control, multi-cycle datapath and control, Verilog HDL, and solving pipeline hazards. The test can be downloaded from Vista between March 28 and April 2 and must be submitted by April 2.

CS385 Final Exam

The Final Exam topics include: Memory Hierarchy, Caches, Interfacing Peripherals, and Multiprocessors. The test can be downloaded from Vista between May 14 and May 18 and must be submitted by May 18.