Project 4: Basic Pipelined Processor

UW–Madison ECE 552: Introduction to Computer Architecture · Spring 2026

Project Introduction

In this project phase, you will create a fully functional pipelined processor without optimizations, implementing the RISC-V RV32I ISA. It is recommended to start this project early.

Implementations must be written in Verilog; SystemVerilog is not allowed, and using it will result in a grade of 0 for Problem 2. This assignment is designed to be completed in groups of 2-3. Collaboration between groups is not permitted. IMPORTANT: Make your final submission as a group on Gradescope. If you do not, the teaching staff will have to do the grouping manually.

You may not use generative artificial intelligence (e.g. ChatGPT). For this assignment, LLMs will not be helpful and overly relying on them may make this project phase and subsequent phases significantly harder to complete.

This project will be worth 5% of your final course grade. The automated tests on Gradescope account for the majority of the points (125 of 145), and the schematic is worth 20 points.

All material is available in the Project 4 and 5 repository on GitLab. You can clone the repository locally or on a CSL machine.

If there are any issues with Gradescope or the files provided, please post on Piazza rather than emailing. If you are having difficulty with this project, the TAs are holding office hours several times a week (check Canvas) where you can receive assistance.

Good luck!

Important Note

DO NOT try to skip steps and add the optimizations from Project 5 before getting your processor to work with stalls. While it is possible to implement the optimizations first and elide the stalling phase since the optimizations replace the stalling anyway, it is a nightmare to debug. Both of these projects are released at the same time, and your solution from Project 5 will be able to pass the tests for Project 4, but it is much easier to complete these in order without skipping steps.

Problem 1: Schematic

Update your schematic from Project 3 to include the elements required to pipeline your processor (e.g. pipeline registers).

Labelling the bit width of every multi-bit signal in every pipeline stage is tedious. You may instead label the width only in the first stage in which the signal appears, and mark that signal with just a slash in subsequent stages. Single-bit signals need no width label.

The textbook is a good resource for this problem.

Problem 2: Pipeline Implementation

Update your processor to be pipelined. To earn full points for this project, you only need to pass the accuracy tests; you do not need to worry about CPI. Project 5, which was released at the same time as this project, requires you to pass both accuracy and CPI tests.

For this project, therefore, your goal is a processor that executes correctly by stalling on data and control hazards.

Again, all material is available in the Project 4 and 5 repository on GitLab.

Step 1: Add Pipeline Registers

At this step, your processor does not yet include stalling or forwarding logic, so to handle RAW (read-after-write) dependencies you need to modify the program by inserting nop instructions.

A quick note on register file bypassing: recall that in Project 2 you implemented bypassing logic in the register file. Starting from this step, you can enable it by setting BYPASS_EN to 1 when instantiating the register file. If it is easier for you to reason about pipelining without it, you can include the extra stall cycle and pipeline the processor without bypassing first.
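As a rough aid for reasoning about how bypassing changes the required spacing, here is a small Python model. It is a sketch under stated assumptions, not part of the provided code: it assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) in which registers are read in ID and written in WB, and the name nops_needed and its parameters are made up for illustration.

```python
def nops_needed(distance, bypass_en=True):
    """Nops to insert between a producer and a dependent consumer.

    distance:  how many instructions after the producer the consumer
               currently sits (1 = immediately after).
    bypass_en: whether a register written in WB can be read in ID
               during the same cycle (register file bypassing).

    Assumption: five stages (IF, ID, EX, MEM, WB), registers read in ID
    and written in WB, no forwarding and no stalling.
    """
    if distance < 1:
        raise ValueError("consumer must come after the producer")
    # If the producer is in ID in cycle p, it is in WB in cycle p + 3.
    # With bypassing the consumer may be in ID in that same cycle, so it
    # must trail by at least 3 slots; without bypassing, by at least 4.
    required_distance = 3 if bypass_en else 4
    return max(0, required_distance - distance)


print(nops_needed(1))                   # back-to-back, bypassing on
print(nops_needed(1, bypass_en=False))  # back-to-back, bypassing off
print(nops_needed(3))                   # already far enough apart
```

Check the numbers against your own pipeline diagram rather than trusting the model: if your design reads or writes registers in different stages, the constants change.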

Consider the following simple program:

L0: addi x7, x0, 0xf
L1: addi x1, x0, 10
L2: addi x2, x1, 1
L3: addi x3, x2, 1 
L4: addi x4, x3, 1
L5: addi x5, x4, 1
L6: addi x6, x5, 1
L7: beq x6, x7, L10
L8: lui a0, 0xdead
L9: ebreak
L10: lui a0, 0x1
L11: ebreak

How would you modify this program so that it executes correctly on your simplistic pipelined processor (no forwarding, no hazard detection, register file bypassing enabled, branch resolved in EX)? More specifically, where do you need to add nop instructions, and how many? It's a good idea to draw pipeline diagrams when determining nop counts, and when conceptualizing pipelining in general.

Once you have inserted the nop instructions, run the modified program on your simplistic pipelined processor to verify that it executes correctly. You are also encouraged to write more programs like the one above with various dependencies. For example, consider a RAW dependency created by a load instruction and how that would be handled. These programs can also be used as simple test cases in the next step.

This version of the processor will NOT pass the tests on Gradescope. In order to pass the tests, you must implement stalls using a hazard detection unit.

Step 2: Add Hazard Detection Unit

Now that your pipelined processor works correctly with programs explicitly padded with nop instructions, the next step is to implement a hazard detection unit that identifies all dependencies and sends control signals to the pipeline registers. If an instruction depends on one or more older instructions in flight, stall it in the decode (ID) stage. Think about how you would implement a stall (pipeline bubble) in your pipeline.
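Before writing any Verilog, it can help to pin down the stall condition as a plain truth function. The Python sketch below is one possible formulation under stated assumptions: a five-stage pipeline with no forwarding, registers read in ID and written in WB. The name must_stall and the dictionary layout are hypothetical and are not part of hart.v or the provided files.

```python
def must_stall(id_sources, in_flight_dests, bypass_en=True):
    """Decide whether the instruction in ID must stall (no forwarding).

    id_sources:      set of register numbers actually read by the
                     instruction in ID (empty for e.g. lui).
    in_flight_dests: dict mapping a stage name ("EX", "MEM", "WB") to the
                     destination register of the instruction in that
                     stage, or None if it writes no register or is a bubble.
    bypass_en:       True if a register written in WB is readable in ID
                     in the same cycle.
    """
    # With register file bypassing, the instruction in WB is not a hazard
    # source, because its result is visible to ID in the same cycle.
    stages = ["EX", "MEM"] if bypass_en else ["EX", "MEM", "WB"]
    for stage in stages:
        rd = in_flight_dests.get(stage)
        # x0 is hard-wired to zero, so a write to it creates no dependency.
        if rd is not None and rd != 0 and rd in id_sources:
            return True
    return False


# Consumer of x1 in ID while the producer of x1 is still in EX.
print(must_stall({1}, {"EX": 1, "MEM": None, "WB": None}))
# Producer has reached WB and bypassing is enabled.
print(must_stall({1}, {"EX": None, "MEM": None, "WB": 1}))
```

A model like this covers only data hazards and says nothing about what the pipeline registers should do while stalled, or about control hazards; those parts still need to be worked out in your design.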

Once you have correctly implemented the hazard detection unit and integrated it into your processor, remove all the nop instructions you inserted into the test program previously and run it again on your processor. Verify that it still behaves as expected.

Note that you will need to update the retire interface (o_retire_*) provided in hart.v so that the retire signals are driven from the writeback stage.

Resources

The RISC-V RV32I specification document is the primary reference for how instructions behave.

The reference sheet is helpful when writing programs or as a quick reference.

You may find this simulator helpful for visualizing how your processor should execute, and for verifying functional correctness and performance.

We also recommend using this decoder to assemble RISC-V instructions to test and examine instructions when debugging.

Gradescope

Submit your code to Gradescope as often as you would like to verify your processor. Please note that we are enforcing Verilog coding rules for this project. You can find a centralized list of these rules here. The rule-checking script runs every time you submit your processor: it first checks for compiler errors, then for disallowed constructs. Once your code passes the rule check, we will perform a trace comparison between your processor and our reference implementation.

Submission Instructions

Submit the following files to the appropriate Gradescope assignment:

| Deliverable | Points | Notes |
| --- | --- | --- |
| schematic.pdf | 20 | See Problem 1. |
| All Verilog files required for your processor to run (no folder hierarchy) | 125 | If it is used in your design, include it. See Problem 2. |
| project4.txt | 0 | Use the submission template below. |

Submission template for project4.txt:

[Group Member 1]: Name
[Group Member 2]: Name
[Group Member 3]: Name

Filenames must match exactly. Please double check to ensure that you have submitted all of the required files.

The results of any manually graded content will be made available after the deadline.
