pa03 : Application of graphs to machine learning

num	ready?	description	assigned	due
pa03	true	Application of graphs to machine learning	Wed 02/25 09:00AM	Fri 03/13 11:59PM

Required video watching before you start the lab

Watch this video before starting for a visual introduction to neural networks and how values flow forward through them.

Video 1 (19 min): 3Blue1Brown: But what is a neural network?

Lab Goals

This lab is designed to offer a real-life application of graphs for a very relevant field of computer science: machine learning, by implementing the underlying graph algorithms of a neural network.

What this Lab IS NOT about

You will NOT be expected to understand the mathematical derivations behind a neural network.
You will NOT need to implement everything in the neural network - the heavy lifting is done for you.
You will NOT need to worry about the architecture and design of the neural network.

What this Lab IS about

Applying graph algorithms to a graph data structure.
Learning and understanding the underlying graph structure of a neural network.
Implementing the basic class methods.

Collaboration Policy

You may work on this programming assignment with a partner. Make sure you add both your names at the top of main.cpp along with your perm numbers. Partnering is highly encouraged: working in groups is a highly beneficial skill to have moving forward as well as in the industry.

Academic Integrity

All work submitted for this programming assignment must be your own, or from your partner.

Tips

Don’t be intimidated by the nature of the lab: focus on what you know about graphs and class structure and apply your skills as necessary. The neural network is meant to expose you to upper-division concepts earlier on and hopefully excite you for future classes.
Start early and ask questions! It may take some time to understand what we are requiring you to do. This project is larger than normal. There is a lot of code involved, but try to focus on the subset of code you are required to implement.
The purpose of this lab is to write algorithms to traverse graphs (Breadth-First and DepthFirst) in an applied way. Please read the specifications closely to see which functions require which algorithm. We have done the heavy lifting for you in terms of the neural network implementation and abstracted the math away behind visit functions. So, your plan should be to write a BFT and DFT algorithm and decide where to place these visit functions within them. First and foremost, this is a Graph centered lab in the context of neural networks, not a neural network-centered lab.

Project Structure

The files you will work in are NeuralNetwork.cpp and Graph.cpp. Graph.cpp/hpp contains the following classes:

NodeInfo: contains information of a node.
Connection: contains information about a connection between two nodes.

Why is Connection a class and not just a weight? In a simple weighted graph, an edge can be represented as a single number. But during training, each connection needs to store additional state: the current weight and a delta term that accumulates how much the weight should change. Since an edge needs to carry mutable state across multiple phases (forward pass, backward pass, update), it is modeled as an object rather than a plain number. This is a common pattern when edges in a graph carry rich metadata.

Graph: contains the base graph structure you learned about in class, using an adjacency list.

NeuralNetwork.cpp/hpp contains the following class:

NeuralNetwork: Inherits the graph structure and exposes other neural network methods.

Other important classes/files that are part of the project are:

DataLoader contains an implementation for an object that contains a dataset and is used to send data into the neural network.
utility contains the activation functions (identity, ReLU, sigmoid) and their derivatives, as well as other miscellaneous functions used by the neural network.

Graph Structure

This particular graph structure maintains an adjacency list. The adjacency list uses integer identifiers to refer to each NodeInfo object. So, there is a std::vector<NodeInfo*> nodes member variable which maps an identifier to its corresponding NodeInfo. Note that the vector stores pointers to NodeInfo objects — this is why updateNode is required to allocate on the heap. The adjacency list is of type: std::vector<std::unordered_map<int, Connection>>, that is, a vector of hashmaps.

NeuralNetwork Structure

This class inherits from the graph class. You have not learned what this means, but for now, it means you can access graph members within the NeuralNetwork class. This is because a NeuralNetwork is a specific type of graph, much like how a binary tree is a type of graph.

A neural network is a specific type of graph, just like a binary tree is a type of graph. Here are the specifications of the neural network structure:

The Neural Network consists of “layers” which collectively partition the graph’s nodes. Every node is part of a layer, and layers may not share the same node. Layers are sequential.
There is at least one input layer and one output layer. Every other layer is a “hidden” layer.
The layers are ordered such that the input layer is first, and the output layer is last, each hidden layer specifically has a predecessor layer and a successor layer.
Every node in one layer has an outgoing connection to every node in the successor layer. A node may never have an outgoing connection to a layer not in the successor layer.
Every node can be “activated” with an activation function, which transforms the value held within the node.
Nodes in the input layer may not have incoming connections. Nodes in the output layer may not have outgoing connections.
Every connection is weighted.
Every node, except for nodes in the input layer, contains a term called the “bias”.

Here is a diagram which outlines the structure of a neural network with three layers: NeuralNetwork Example

The purpose of a neural network is to find a pattern of some training dataset in order to make accurate predictions on new, unseen data. In the image above, each $w_i$ and $b_i$ represent something called a weight and bias, respectively.

Step by Step Instructions

Step 0: Create a Git Repo and get the starter code

Refer to lab01 for instructions on how to set up a GitHub repository and pull the starter code for this lab. Here is the link for this lab’s starter code: https://github.com/ucsb-cs24-w26/STARTER-pa03

Step 1: Run the executable

Review the Makefile provided and look on top of each file to understand how the code is organized. If you are having trouble, take a look at this Makefile tutorial guide: https://zackglazewski.github.io/UCSBCS24-MakefileIntroduction/

We provided a simple test suite in test_neuralnet.cpp. For now, running ./test_neuralnet will fail, this is because some important functions have not been initialized yet. Once you implement steps 2 and 3, try playing around with main.cpp which will load and test the accuracy of a pretrained neural network model.

Step 2: Implement Getters and Setters

Your first step is implementing the getters and setters for the Graph and NeuralNetwork classes.

The following getters and setters need to be completed:

void NeuralNetwork::eval()
void NeuralNetwork::train()
void NeuralNetwork::setLearningRate(double lr)
void NeuralNetwork::setInputNodeIds(std::vector<int> inputNodeIds)
void NeuralNetwork::setOutputNodeIds(std::vector<int> outputNodeIds)
std::vector<int> NeuralNetwork::getInputNodeIds() const
std::vector<int> NeuralNetwork::getOutputNodeIds() const

Despite the name NeuralNetwork::eval() and NeuralNetwork::train(), these functions actually work to set the evaluating member in NeuralNetwork. When NeuralNetwork::eval is called, the neural network should be put in “evaluation mode”, where evaluating is set to true. Likewise, when NeuralNetwork::train is called, the neural network should be put in “training mode”, where evaluating is set to false.

Step 3: Implement Graph methods

The next step is implementing some of the class methods for the Graph class.

Notice the function void Graph::resize(int size). Before adding or updating any part of the structure, resize must be called first.

For example, calling resize(20) will initialize nodes to be a vector of size 20, where every element is a nullptr. Likewise, adjacencyList will be a vector of 20 empty maps. When writing these functions, assume that resize has already been called: it happens in the NeuralNetwork’s constructors.

You will need to implement the following functions:

void Graph::updateNode(int id, NodeInfo n)
- Allocates a NodeInfo object on the heap and updates it in nodes.
NodeInfo* Graph::getNode(int id) const
- This method should return a pointer to the NodeInfo object at the index id.
void Graph::updateConnection(int v, int u, double w)
- This method takes in a source id: v, a destination id: u and a weight w. The input represents a weighted and directed edge in the graph (from v to u). Update the adjacency list to reflect this update. Connections do not need to be allocated on the heap. If the connection already exists, just update the weight.
void Graph::clear()
- This method should deallocate any allocated memory from the heap.

One indication that you have implemented the functions so far correctly is that the test_structure test case ./test_neuralnet 4 should pass.

Step 4: Implement Predict with BFT

The next two steps will be the most challenging part of the programming assignment. We recommend taking the self-test assignment on Gradescope before implementing these functions.

You will need to implement a BFT (Breadth First Traversal) algorithm for:

std::vector<double> predict(DataInstance instance)
- This function takes an input training example, instance, and returns the neural network’s prediction (a vector, in the case where a neural network is defined to have more than one output).

To implement a BFT, you should not use the NeuralNetwork::layers member variable. Instead, implement it using a queue as stated in the starter code of the predict function.

A BFT is required for this function because of how input flows through a neural network, as described below.

We have provided you two important functions that take care of the math for you:

NeuralNetwork::visitPredictNode(int vId):
- takes care of the neural network math for visiting a node during the prediction phase. Computation will be performed on the NodeInfo whose id is vId.
NeuralNetwork::visitPredictNeighbor(Connection c):
- takes care of the neural network math for visiting a connection during the prediction phase. Computation will be performed on the nodes that make up the connection c.

Your job is to write the BFT algorithm that visits nodes and connections in the right order.

Here is a rundown of how the prediction algorithm works.

NeuralNetwork Example

For a neural network, the collections of weights and biases serve as the model or - the mechanism in which we make a prediction. We insert the input into each node of the input layer. This diagram takes in two inputs; since there are two nodes in the input layer, each node gets a different input. The value is then transformed by each weight as it “flows” to the next layer. Once it reaches the next layer, it again gets transformed by the bias and activation function - in that order. After this, the same process happens as it flows to the next layer. In this example, it flows until it has been transformed by the output layer, which leaves the value in a state we interpret as a prediction. For any given layer, we first visit all the nodes and connections to the next layer before visiting the nodes and connections of the next layer. Here is an example of how input flows from the input layer to the hidden layer in our example:

Assume the node with id 1 contains the input value $x_1$, and the node with id 2 contains the input value $x_2$.
To compute the value for node 3: $h_1$, we first accumulate the weighted sum: $h_1 = x_1w_1 + x_2w_4$
Next, we add the bias term: $h_1 = x_1w_1 + x_2w_4 + b_1$
Finally, activate the node with the activation function for its layer, which we will call activate:

\[h_1 = activate(x_1w_1 + x_2w_4 + b_1)\]

Using these steps, we can compute the value for every other node, including the output node ($y_1$), which will be our prediction.

$h_2 = activate(x_1w_2 + x_2w_5 + b_2)$
$h_3 = activate(x_1w_3 + x_2w_6 + b_3)$
$y_1 = activate(h_1w_7 + h_2w_8 + h_3w_9 + b_4)$

Activation functions transform the value of a node after the weighted sum and bias are applied:

ReLU (Rectified Linear Unit): outputs 0 if the input is negative, otherwise passes the value through unchanged. It introduces non-linearity, which allows the network to learn complex patterns.
Sigmoid: squashes any value into the range (0, 1). Useful for output nodes that represent a probability or confidence score.

The specific activation function used at each layer is set by the model — your code doesn’t need to choose it, just call visitPredictNode.

NOTE: Keep these two facts in mind when writing your prediction algorithm.

visiting a connection accumulates a part of the weighted sum.
- for example: visiting the connection with $w_5$ is responsible for calculating the $x_2w_5$ term.
visiting a node adds the bias and activates.

The other important thing to realize is that a layer must wait for the computation of the previous layer to finish, hence the need for a BFT traversal.

What problem is training solving — and how? (Optional)

Training adjusts every weight and bias in the network to make better predictions. The math behind how those adjustments are computed is deliberately abstracted in this assignment — the visit functions handle it. But training is solving a real and interesting mathematical problem. If you’re curious about what that problem is and how backpropagation solves it, these questions are worth sitting with:

How is the base case at the output node computed? Node 5 seeds the entire backward pass — but what is the formula, and where does it come from?
Why do these deltas actually make predictions better? Subtracting a delta from a weight moves it in a direction that reduces the loss — but why? What guarantees that?
Why does the learning rate matter? What happens if the step is too large? Too small?
Why average the deltas across the batch? One training example gives a noisy estimate. Averaging stabilizes it — but why?

These videos answer those questions, visually and without heavy math:

Video 2 (20 min) — Gradient descent, how neural networks learn: 3Blue1Brown: Gradient descent, how neural networks learn

Video 3 (20 min) — Backpropagation, intuitively: Backpropagation, intuitively

If the videos feel abstract, this interactive visualizer walks through the same concepts — cost function, loss landscape, gradient descent, and backpropagation — step by step:

Training Neural Networks: Cost · Gradient Descent · Backpropagation

If you watch them, here is how the key visuals map to this assignment:

What you saw in the video	What it is in the code
Neurons “lighting up” layer by layer	`predict()` — your BFT visits nodes and connections in this order
The cost/loss landscape (hilly surface)	The error value computed from the difference between prediction and ground truth
Gradient descent — rolling downhill	`weight -= lr * delta` in `update()`
Error flowing backward through the network	`contribute()` — your DFT traversal

Step 5: Implement Contribute with DFS

Your next task is to implement a DFT (Depth First Traversal) algorithm for:

bool NeuralNetwork::contribute(double y, double p);
- This is the main, public version of this function. y is the ground truth label and p was the neural network’s prediction.
double NeuralNetwork::contribute(int nodeId, const double& y, const double& p);
- This is the recursive helper function for the main contribute function. y and p remain the same throughout each call, nodeId indicates the node that is currently being visited.

A DFT is required because of the following structural problem.

The goal of contribute() is to compute a delta (a nudge) for every weight and bias in the network, based on how wrong the prediction was. To compute the delta for any node, you need the delta estimates from the nodes it connects to in the next layer. But the edges only go forward — there is no backward edge to follow. And the error is only measurable at the output node, where you can compare prediction to truth.

The question is: how do you get that error from the output back to every node in the network, when all edges point the other way?

DFS. Recursive calls let you dive forward to the output first, compute the error there, then pass the result back up the call stack — naturally reversing the direction of traversal. visitContributeNode and visitContributeNeighbor abstract the math; your job is to write the traversal that calls them in the right order.

Step through the interactive visualizer below to see exactly how contribute(), visitContributeNeighbor(), and visitContributeNode() are called during one backprop pass — and why the contributions map prevents recomputation. Use the arrow keys or the Prev/Next buttons to advance through each step.

Open in full screen

Why do we use DFT for back-propagation? There are a couple of reasons:

The error received in one layer depends on some new error that the next layer computes and passes down. The error a node receives from its out-neighbors is an “incoming contribution” and the error that a node passes down to its in-neighbors is its “outgoing contribution”.
The connections are only forward-directed; you cannot traverse backward in the network without the help of a recursive call.

NeuralNetwork Example

Here is an example, referring back to our original neural network: suppose we are currently at node 3

Iterate through all your neighbors. For each neighbor:
- get the neighbor’s incoming contribution by recursively calling contribute on node 6.
- Using that incoming contribution, visit the connection corresponding to that neighbor.
- In this case, we visit the connection with weight $w_7$.
Visit node 3 and pass in your running outgoing contribution to be updated.

Now, an example using node 1:

Get contribution from node 3, and visit the connection with $w_1$.
Get contribution from node 4, and visit the connection with $w_2$.
Get contribution from node 5, and visit the connection with $w_3$.
Visit node 1 and pass in your running outgoing contribution to be updated.

We have provided you with two important functions that take care of the math for you:

void visitContributeNode(int vId, double& outgoingContribution):
- Updates outgoingContribution and computes the contribution of the nodes delta term.
void visitContributeNeighbor(Connection& c, double& incomingContribution, double& outgoingContribution):
- Updates outgoingContribution and computes the contribution to the connection’s delta term.

In the starter code for the recursive helper, two local variables are declared for you: incomingContribution and outgoingContribution. incomingContribution holds the value returned by a recursive call to contribute on a neighbor — it is the error signal coming back from the next layer. outgoingContribution accumulates across all neighbors during the loop and is eventually returned to the caller, where it becomes that node’s incomingContribution. Both are passed by reference into the visit functions, which update them with the appropriate math.

In place of a “visited” set, there is a map called contributions. This acts as a way to keep track of nodes that we have already visited, as well as store their previously computed contributions.

Your job is to write the DFT algorithm that visits nodes and connections in the right order.

Step 6: Implement Update

After calling contribute, the update function assumes (precondition) that the delta of every node and connection has been accumulated. Now, each delta must be applied to every node’s bias and every connection’s weight.

Your goal in the update is to traverse the graph (in any way you want) to update every weight and bias. A single call to update affects every node and connection and reset the $delta$ values to zero.

For each update, follow these steps:

To update the bias: $bias_{new} = bias_{old} - (lr * delta)$
To update the weight: $weight_{new} = weight_{old} - (lr * delta)$
Reset the $delta$ values for each node and connection to zero.

where $lr$ is the learning rate (a member variable of NeuralNetwork), and controls how impactful we consider contributions to be.

Step 7: Test and Submit

test_neuralnet.cpp has been given as a brief way to check your neural network. Please test your code against this file and make sure you pass all the tests before trying to submit to gradescope.

If you are having issues, it is helpful to run a debugger like gdb to make sure your code is following the logic described above.

Step 8: Rest

Take a break! You’ve done well.

External Resources

Graphviz

The graph class overloads the « operator and outputs the graph in dot format. Copy the portion that looks like: digraph G {...} and paste it here to visualize the graph:

BFS and DFS

GeeksForGeeks BFS: https://www.geeksforgeeks.org/breadth-first-search-or-bfs-for-a-graph/
GeeksForGeeks DFS: https://www.geeksforgeeks.org/depth-first-search-or-dfs-for-a-graph/

Neural Networks

3Blue1Brown - But what is a neural network? (required, Video 1): https://youtu.be/aircAruvnKk?si=KZt2AsbD7URc58-L
3Blue1Brown - Gradient descent, how neural networks learn (required, Video 2): https://youtu.be/IHZwWFHWa-w?si=_e4wkFW624ari842
StatQuest - Neural Network Basics (great for understanding the prediction algorithm): https://youtu.be/CqOfi41LfDw?si=8waS2U01uMWcpH2i
StatQuest - Back Propagation (great for understanding the contribute algorithm): https://youtu.be/IN2XmBhILt4?si=bnDft-3T4DQ2iO9X

TensorFlow Playground

TensorFlow Playground

An interactive neural network running live in your browser. You can add and remove neurons and layers, choose activation functions, adjust the learning rate, and watch the network train in real time on classification and regression problems. This is useful for building intuition about what the weights and biases in your PA03 network are actually doing — you can see how the decision boundary shifts as training progresses, and experiment with how architecture and hyperparameter choices affect whether the network learns at all. No installation required.

Credits

This assignment was conceived and created by UCSB CS ULA, Zackary Glazewski in consultation with Diba Mirza for CS 24. Thanks to the UCSB CS24 teaching staff for reviewing and providing feedback: Torin Schlunk, Mehak Dhaliwal, Nawel Alioua, Joseph Ng, Shinda Huang, Xinlei Feng, Yaoyi Bai, Ally Chu, and Sanjana Shankar.

CC BY-NC-SA 2.0, Zackary Glazewski and Diba Mirza, May 2025.