Graphs 101

A graph data structure is like a map of connections: a bunch of dots that represent things like people, cities, or even web pages. Then you've got lines that connect these dots, showing some kind of relationship, like who's friends with whom or which cities have direct roads between them.

Unlike linear data structures such as arrays or linked lists, graphs represent relationships between entities in a way that mirrors how those connections are actually formed. At its core, a graph is simply a collection of nodes connected by edges. It's like a network where each point represents an entity and the lines between them represent relationships or connections.

Covering the fundamentals

Graphs are simple and flexible. Every graph consists of just two fundamental components: vertices and edges. Vertices are the individual data points, or nodes, that store information, while edges are the connections that define relationships between these vertices.

Take LinkedIn as an example. Yeah, LinkedIn XD. Each user would be represented as a vertex containing information like name, age, and location. The connections between users would be represented as edges joining these vertices. This simple model can represent millions of users and their complex web of relationships.

Edges can carry additional information too. In a weighted graph, each edge has a numerical value associated with it. For instance, in a road network, vertices might represent cities, and edges might represent roads with weights indicating the distance or travel time between cities. This additional information turns a simple connection into a rich data relationship.
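To make the road-network idea concrete, here is a minimal sketch of a weighted graph stored as a plain object. The city names and distances are made up for illustration, and `roadDistance` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical road network: each key is a city, and its value maps
// every neighboring city to the edge weight (distance in km).
// Names and numbers are illustrative only.
const roads = {
    Alpha: { Beta: 150, Gamma: 210 },
    Beta:  { Alpha: 150, Gamma: 165 },
    Gamma: { Alpha: 210, Beta: 165 }
};

// Return the weight of the edge between two cities,
// or undefined if there is no direct road.
function roadDistance(graph, from, to) {
    return graph[from] ? graph[from][to] : undefined;
}

console.log(roadDistance(roads, 'Alpha', 'Beta'));  // 150
console.log(roadDistance(roads, 'Alpha', 'Delta')); // undefined
```

Because the graph is undirected, each road is stored twice, once under each endpoint.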

Now that we understand the basic building blocks, how do we actually store and organize this information in computer memory?

Graph Representation and Memory

When it comes to storing graphs in computer memory, we have two primary approaches: adjacency matrices and adjacency lists. Each method has its own strengths and is suited for different scenarios, and each handles directed and undirected graphs a little differently. In a directed graph, edges have a direction, meaning the relationship from vertex A to vertex B is not necessarily reciprocal. In an undirected graph, edges are bidirectional, so an edge between A and B implies a mutual connection.

Adjacency matrices use a 2D array where entry (i,j) indicates whether there's an edge between vertex i and vertex j. For a graph with n vertices, this creates an n×n matrix. In an undirected graph, the matrix is symmetric because an edge from i to j implies an edge from j to i, whereas in a directed graph, the matrix may be asymmetric since edges are one-way. While this approach uses more memory (O(n²) space), it provides constant-time lookup (O(1)) to check if two vertices are connected, making it ideal for dense graphs or when frequent edge queries are needed.

class AdjacencyMatrix {
    constructor(n) {
        this.numVertices = n;
        // Initialize a 2D matrix with false (no edges initially)
        this.matrix = new Array(n);
        for (let i = 0; i < n; i++) {
            this.matrix[i] = new Array(n).fill(false);
        }
    }

    // Add an undirected edge between u and v
    addEdge(u, v) {
        this.matrix[u][v] = true;
        this.matrix[v][u] = true; // For undirected graph
    }

    // Check if there is an edge between u and v
    hasEdge(u, v) {
        return this.matrix[u][v];
    }
}

Let's talk about memory here. For n vertices, an adjacency matrix requires exactly n² entries, regardless of the actual number of edges. If the booleans are packed one bit per entry, this means n²/8 bytes. For a social network with 1 million users, this translates to 125 GB of memory just for the adjacency matrix, even if most users have only a few hundred connections. The memory layout is cache-friendly for row-wise access patterns, but listing all neighbors of a vertex requires scanning an entire row, touching n entries regardless of the vertex's actual degree.

Adjacency lists, conversely, store a graph by maintaining a list for each vertex, where each list contains the vertices adjacent to it. For an undirected graph, each edge appears in the lists of both vertices it connects, while in a directed graph, an edge from i to j appears only in i's list. This method is more memory-efficient for sparse graphs (O(V + E) space, where V is vertices and E is edges) but requires O(degree(v)) time to check if an edge exists, where degree(v) is the number of neighbors of vertex v.

With adjacency lists, memory usage scales with the number of edges rather than the square of the vertex count. For our million-user social network with an average of 300 connections per user, this requires only about 2.4 GB of memory (assuming roughly 8 bytes per stored neighbor), about a 50x improvement over the adjacency matrix. The trade-off is that checking whether a specific edge exists becomes an O(degree) operation, requiring a linear search through the neighbor list. This can be optimized by using hash sets instead of arrays for neighbor storage.
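The two memory figures are easy to sanity-check with a few lines of arithmetic. This is a back-of-the-envelope sketch: `matrixBytes` and `listBytes` are hypothetical helpers, and the 1-bit-per-cell and 8-bytes-per-entry sizes are assumptions. A real JavaScript engine adds its own overhead on top.

```javascript
// Adjacency matrix: n * n cells, assumed packed at 1 bit per cell.
function matrixBytes(n) {
    return (n * n) / 8; // 8 bits per byte
}

// Adjacency list: one entry per neighbor per vertex,
// assumed to cost 8 bytes each.
function listBytes(n, avgDegree, bytesPerEntry = 8) {
    return n * avgDegree * bytesPerEntry;
}

const users = 1e6; // 1 million vertices
const GB = 1e9;    // decimal gigabytes

console.log(matrixBytes(users) / GB);    // 125
console.log(listBytes(users, 300) / GB); // 2.4
```

Dividing the two gives roughly 52, which is where the "about 50x" figure comes from.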

class AdjacencyList {
    constructor(n) {
        this.numVertices = n;
        this.adjList = new Array(n);
        for (let i = 0; i < n; i++) {
            this.adjList[i] = []; // Initialize each vertex's neighbor list
        }
    }

    // Add an undirected edge between u and v
    addEdge(u, v) {
        this.adjList[u].push(v);
        this.adjList[v].push(u); // For undirected graph
    }

    // Get neighbors of vertex u
    getNeighbors(u) {
        return this.adjList[u];
    }
}

const graph = new AdjacencyList(5);

graph.addEdge(0, 1);
graph.addEdge(1, 2);
graph.addEdge(2, 3);

console.log(graph.getNeighbors(0)); // [1]
console.log(graph.getNeighbors(1)); // [0, 2]
console.log(graph.getNeighbors(2)); // [1, 3]
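The hash-set optimization mentioned earlier could look something like the sketch below. `AdjacencySetGraph` is a hypothetical name of my own. It swaps each neighbor array for a JavaScript `Set`, so the edge check becomes an average constant-time lookup instead of a linear scan.

```javascript
// Adjacency list variant that stores each vertex's neighbors in a Set.
class AdjacencySetGraph {
    constructor(n) {
        // One empty Set per vertex
        this.adj = Array.from({ length: n }, () => new Set());
    }

    // Add an undirected edge between u and v
    addEdge(u, v) {
        this.adj[u].add(v);
        this.adj[v].add(u);
    }

    // Average O(1) membership test instead of scanning an array
    hasEdge(u, v) {
        return this.adj[u].has(v);
    }

    // Return the neighbors as a plain array (insertion order)
    getNeighbors(u) {
        return [...this.adj[u]];
    }
}

const setGraph = new AdjacencySetGraph(4);
setGraph.addEdge(0, 1);
setGraph.addEdge(1, 2);

console.log(setGraph.hasEdge(1, 0));    // true
console.log(setGraph.hasEdge(0, 2));    // false
console.log(setGraph.getNeighbors(1));  // [0, 2]
```

The cost is some extra memory per vertex, so it is mostly worth it when edge lookups are frequent.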

With our data properly organized, what can we actually do with these graph structures?

Exploring Graphs

Graph traversal algorithms are the foundation for most graph operations. They allow us to systematically visit every vertex in a graph, forming the basis for more complex algorithms. The two fundamental traversal methods are Depth-First Search (DFS) and Breadth-First Search (BFS).

For new readers (if any, yeah, I am delusional): I explained Depth-First Search and Breadth-First Search in my previous blog, Algorithms from Beginners POV, so do check it out. I will just be skimming through the topics here :))

Depth-First Search explores a graph by going as deep as possible along each branch before backtracking. It's like we are exploring a maze by always taking the first available path and only turning back when we hit a dead end. DFS uses a stack (either explicitly or through recursion) to keep track of vertices to visit. This approach is excellent for problems like detecting cycles, finding connected components, or exploring all possible paths.
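Here is a minimal recursive sketch of that idea. It assumes the graph is an array of neighbor arrays (the same shape as an adjacency list), and `dfs` is just an illustrative name. The call stack plays the role of the explicit stack.

```javascript
// Recursive depth-first search.
// Returns the vertices in the order they were first visited.
function dfs(adjList, start) {
    const visited = new Set();
    const order = [];

    function visit(u) {
        visited.add(u);
        order.push(u);
        // Go as deep as possible along each unvisited neighbor
        for (const v of adjList[u]) {
            if (!visited.has(v)) visit(v);
        }
    }

    visit(start);
    return order;
}

// Undirected graph with edges 0-1, 0-2, 1-3, 2-3
const dfsGraph = [[1, 2], [0, 3], [0, 3], [1, 2]];
console.log(dfs(dfsGraph, 0)); // [0, 1, 3, 2]
```

Notice how vertex 3 is reached through 1 before the search ever backtracks to try 2.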

Breadth-First Search, conversely, explores all vertices at the current depth before moving to vertices at the next depth level. It's like ripples spreading out from a stone dropped in water. BFS uses a queue to ensure vertices are visited in order of their distance from the starting point. This makes it perfect for finding the shortest path in unweighted graphs or for level-order traversals.
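A small sketch of BFS computing shortest distances (in number of edges) from a start vertex. Again the graph is assumed to be an array of neighbor arrays, and `bfsDistances` is a hypothetical helper. Unreachable vertices are marked with -1.

```javascript
// Breadth-first search that records the distance of every vertex from `start`.
function bfsDistances(adjList, start) {
    const dist = new Array(adjList.length).fill(-1); // -1 means not reached yet
    const queue = [start];
    dist[start] = 0;

    let head = 0; // index of the next vertex to dequeue (avoids an O(n) shift)
    while (head < queue.length) {
        const u = queue[head++];
        for (const v of adjList[u]) {
            if (dist[v] === -1) {
                dist[v] = dist[u] + 1; // one level deeper than u
                queue.push(v);
            }
        }
    }
    return dist;
}

// Undirected graph with edges 0-1, 0-2, 1-3, 3-4; vertex 5 is isolated
const bfsGraph = [[1, 2], [0, 3], [0], [1, 4], [3], []];
console.log(bfsDistances(bfsGraph, 0)); // [0, 1, 1, 2, 3, -1]
```

The distances come out in "ripples": everything at distance 1 is settled before anything at distance 2.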

These traversal methods sound useful, but how do they help us solve real-world problems like finding the shortest route between two locations?

Finding the Optimal Route

Finding the shortest path between two points is one of the most practical applications of graph algorithms. While BFS can find the shortest path in unweighted graphs, real-world scenarios often involve weighted edges where we need more sophisticated approaches.

Dijkstra's algorithm is the gold standard for finding shortest paths in weighted graphs with non-negative edge weights. It works by maintaining a set of vertices whose shortest distance from the source is known, gradually expanding this set by always choosing the vertex with the minimum tentative distance. Think of it as simultaneously exploring all possible routes from your starting point, but always prioritizing the most promising paths.

The algorithm maintains a priority queue of vertices to visit, always processing the one with the smallest known distance first. As it visits each vertex, it updates the distances to its neighbors if a shorter path is found. This greedy approach guarantees finding the optimal solution, as long as no edge weight is negative.
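A compact sketch of Dijkstra's algorithm, under a few assumptions: the graph is an object mapping each vertex to `{ neighbor: weight }`, all weights are non-negative, and `MinHeap` is a small hand-rolled helper (not a built-in or library class) standing in for the priority queue.

```javascript
// Minimal binary min-heap ordered by `priority` (hypothetical helper).
class MinHeap {
    constructor() { this.heap = []; }
    get size() { return this.heap.length; }

    push(item, priority) {
        const h = this.heap;
        h.push({ item, priority });
        let i = h.length - 1;
        while (i > 0) { // sift up
            const p = (i - 1) >> 1;
            if (h[p].priority <= h[i].priority) break;
            [h[p], h[i]] = [h[i], h[p]];
            i = p;
        }
    }

    pop() {
        const h = this.heap;
        const top = h[0];
        const last = h.pop();
        if (h.length > 0) {
            h[0] = last;
            let i = 0;
            for (;;) { // sift down
                const l = 2 * i + 1;
                const r = l + 1;
                let m = i;
                if (l < h.length && h[l].priority < h[m].priority) m = l;
                if (r < h.length && h[r].priority < h[m].priority) m = r;
                if (m === i) break;
                [h[m], h[i]] = [h[i], h[m]];
                i = m;
            }
        }
        return top;
    }
}

// Shortest distances from `source` to every vertex (non-negative weights only).
function dijkstra(graph, source) {
    const dist = {};
    for (const v of Object.keys(graph)) dist[v] = Infinity;
    dist[source] = 0;

    const pq = new MinHeap();
    pq.push(source, 0);
    while (pq.size > 0) {
        const { item: u, priority: d } = pq.pop();
        if (d > dist[u]) continue; // stale entry: a shorter path was already found
        for (const [v, w] of Object.entries(graph[u])) {
            const candidate = d + w;
            if (candidate < dist[v]) {
                dist[v] = candidate;
                pq.push(v, candidate);
            }
        }
    }
    return dist;
}

const dijkstraGraph = {
    A: { B: 2, D: 6 },
    B: { A: 2, C: 3, D: 8, E: 5 },
    C: { B: 3, E: 7 },
    D: { A: 6, B: 8, E: 9 },
    E: { B: 5, C: 7, D: 9 }
};
console.log(dijkstra(dijkstraGraph, 'A')); // { A: 0, B: 2, C: 5, D: 6, E: 7 }
```

Instead of updating a vertex's priority in place, this version pushes a fresh entry and skips outdated ones when they are popped, which keeps the heap simple.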

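For graphs with negative edge weights, the Bellman-Ford algorithm provides a solution, though it's slower than Dijkstra's algorithm. It works by relaxing every edge repeatedly, up to V - 1 times, gradually improving distance estimates until no further improvements are possible. If one extra pass still finds an improvement, the graph contains a negative cycle reachable from the source, and no shortest path exists.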
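A short sketch of Bellman-Ford over an edge list. The `[from, to, weight]` edge format and the choice to return `null` on a negative cycle are my own conventions for this example.

```javascript
// Bellman-Ford: shortest distances from `source`, allowing negative weights.
// Returns null if a negative cycle is reachable from the source.
function bellmanFord(numVertices, edges, source) {
    const dist = new Array(numVertices).fill(Infinity);
    dist[source] = 0;

    // Relax every edge up to V - 1 times; stop early if a pass changes nothing.
    for (let pass = 0; pass < numVertices - 1; pass++) {
        let changed = false;
        for (const [u, v, w] of edges) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                changed = true;
            }
        }
        if (!changed) break;
    }

    // One extra pass: any further improvement means a negative cycle.
    for (const [u, v, w] of edges) {
        if (dist[u] + w < dist[v]) return null;
    }
    return dist;
}

// Directed edges as [from, to, weight]; note the negative weight on 1 -> 2
const bfEdges = [[0, 1, 4], [0, 2, 5], [1, 2, -2], [2, 3, 3]];
console.log(bellmanFord(4, bfEdges, 0)); // [0, 4, 2, 5]
```

Here the direct edge 0 -> 2 costs 5, but going through vertex 1 costs 4 + (-2) = 2, which is exactly the kind of case Dijkstra's algorithm is not designed for.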

These algorithms power GPS navigation systems, network routing protocols, and any application where finding optimal paths is crucial.

Shortest paths are fascinating, but what about scenarios where we need to connect multiple points efficiently, like designing a network infrastructure?

Connecting Everything Efficiently

When we need to connect all vertices in a graph with the minimum total cost, we are looking for a Minimum Spanning Tree (MST). This is crucial in network design, where we want to ensure all nodes are connected while minimizing the total cost of connections.

A spanning tree of a graph is a subgraph that includes all vertices and is connected (you can reach any vertex from any other vertex) but contains no cycles. The minimum spanning tree is the spanning tree with the smallest total edge weight.

Kruskal's algorithm approaches this problem by sorting all edges by weight and adding them to the MST one by one, skipping any edge that would create a cycle. It uses a disjoint set data structure to efficiently detect cycles. This greedy approach works because the optimal solution always includes the cheapest available connection that doesn't create redundancy.
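Here is a rough sketch of Kruskal's algorithm with a disjoint set (union-find). The `DisjointSet` class, the `[u, v, weight]` edge format, and the numeric vertex labels are assumptions for this example.

```javascript
// Disjoint set (union-find) with path halving and union by size.
class DisjointSet {
    constructor(n) {
        this.parent = Array.from({ length: n }, (_, i) => i);
        this.size = new Array(n).fill(1);
    }

    find(x) {
        while (this.parent[x] !== x) {
            this.parent[x] = this.parent[this.parent[x]]; // path halving
            x = this.parent[x];
        }
        return x;
    }

    // Returns false if a and b are already connected (the edge would form a cycle).
    union(a, b) {
        let ra = this.find(a);
        let rb = this.find(b);
        if (ra === rb) return false;
        if (this.size[ra] < this.size[rb]) [ra, rb] = [rb, ra];
        this.parent[rb] = ra;
        this.size[ra] += this.size[rb];
        return true;
    }
}

// Kruskal: sort edges by weight, keep each edge that joins two separate components.
function kruskalMST(numVertices, edges) {
    const sorted = [...edges].sort((a, b) => a[2] - b[2]);
    const dsu = new DisjointSet(numVertices);
    const mst = [];
    for (const [u, v, w] of sorted) {
        if (dsu.union(u, v)) mst.push([u, v, w]);
    }
    return mst;
}

// A small 5-vertex weighted graph, edges as [u, v, weight]
const kruskalEdges = [
    [0, 1, 2], [0, 3, 6], [1, 2, 3], [1, 3, 8],
    [1, 4, 5], [2, 4, 7], [3, 4, 9]
];
const kruskalResult = kruskalMST(5, kruskalEdges);
console.log(kruskalResult); // [[0, 1, 2], [1, 2, 3], [1, 4, 5], [0, 3, 6]]
console.log(kruskalResult.reduce((sum, e) => sum + e[2], 0)); // 16
```

The `union` call doubles as the cycle check: if both endpoints already share a root, the edge is skipped.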

Prim's algorithm takes a different approach, starting with an arbitrary vertex and growing the MST by repeatedly adding the cheapest edge that connects a vertex in the MST to a vertex outside it. Both algorithms guarantee finding the optimal solution, but they approach the problem from different angles.

class PriorityQueue {
    constructor() {
        this.items = [];
    }

    enqueue(item, priority) {
        this.items.push({ item, priority });
        this.items.sort((a, b) => a.priority - b.priority);
    }

    dequeue() {
        return this.items.shift().item;
    }

    isEmpty() {
        return this.items.length === 0;
    }
}

function primMST(graph) {
    const vertices = Object.keys(graph);
    if (vertices.length === 0) return [];

    // Initialize data structures
    const parent = {};
    const key = {};
    const inMST = {};
    const pq = new PriorityQueue();

    // Initialize all keys to Infinity and parents to null
    vertices.forEach(vertex => {
        key[vertex] = Infinity;
        parent[vertex] = null;
        inMST[vertex] = false;
    });

    // Start with the first vertex
    const startVertex = vertices[0];
    key[startVertex] = 0;
    pq.enqueue(startVertex, 0);

    while (!pq.isEmpty()) {
        const currentVertex = pq.dequeue();
        // A vertex can be enqueued more than once; skip stale entries
        if (inMST[currentVertex]) continue;
        inMST[currentVertex] = true;

        // Explore all adjacent vertices
        for (const neighbor in graph[currentVertex]) {
            const weight = graph[currentVertex][neighbor];

            // If neighbor is not in MST and weight is less than current key
            if (!inMST[neighbor] && weight < key[neighbor]) {
                parent[neighbor] = currentVertex;
                key[neighbor] = weight;
                pq.enqueue(neighbor, key[neighbor]);
            }
        }
    }

    // Construct the MST edges (excluding the root)
    const mst = [];
    for (const vertex in parent) {
        if (parent[vertex] !== null) {
            mst.push({
                from: parent[vertex],
                to: vertex,
                weight: graph[parent[vertex]][vertex]
            });
        }
    }

    return mst;
}

// using Prim's algorithm
const graph = {
    'A': { 'B': 2, 'D': 6 },
    'B': { 'A': 2, 'C': 3, 'D': 8, 'E': 5 },
    'C': { 'B': 3, 'E': 7 },
    'D': { 'A': 6, 'B': 8, 'E': 9 },
    'E': { 'B': 5, 'C': 7, 'D': 9 }
};

const minimumSpanningTree = primMST(graph);
console.log("Minimum Spanning Tree Edges:");
console.log(minimumSpanningTree);

These algorithms are essential in designing telecommunication networks, electrical grids, and any infrastructure where you need universal connectivity at minimum cost.

Directed Graphs and Network Flow

Directed graphs introduce concepts like in-degree and out-degree (the number of incoming and outgoing edges for each vertex). They also enable topological sorting, which arranges vertices in a linear order such that for every directed edge from vertex A to vertex B, A appears before B in the ordering. This is crucial for scheduling tasks with dependencies.
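One common way to compute a topological order is Kahn's algorithm, which leans directly on in-degrees. The sketch below is my own illustration. It assumes vertices numbered 0 to n - 1 and directed edges given as `[from, to]` pairs, and it returns `null` when a cycle makes ordering impossible.

```javascript
// Kahn's algorithm: repeatedly output a vertex whose in-degree has dropped to 0.
function topologicalSort(numVertices, edges) {
    const adj = Array.from({ length: numVertices }, () => []);
    const inDegree = new Array(numVertices).fill(0);
    for (const [u, v] of edges) {
        adj[u].push(v);
        inDegree[v]++; // one more incoming edge for v
    }

    // Start with every vertex that has no prerequisites
    const queue = [];
    for (let v = 0; v < numVertices; v++) {
        if (inDegree[v] === 0) queue.push(v);
    }

    const order = [];
    let head = 0;
    while (head < queue.length) {
        const u = queue[head++];
        order.push(u);
        for (const v of adj[u]) {
            // "Remove" the edge u -> v; v is ready once nothing points to it
            if (--inDegree[v] === 0) queue.push(v);
        }
    }

    // If some vertices were never output, the graph has a cycle
    return order.length === numVertices ? order : null;
}

// Task dependencies: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
console.log(topologicalSort(4, [[0, 1], [0, 2], [1, 3], [2, 3]])); // [0, 1, 2, 3]
console.log(topologicalSort(2, [[0, 1], [1, 0]]));                 // null
```

Task 3 only appears after both of its prerequisites (1 and 2) have been scheduled.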

Network flow problems take directed graphs further by adding capacity constraints to edges. Picture a network of pipes where each pipe can only carry a certain amount of fluid. The maximum flow problem asks: what's the maximum amount of flow you can push from a source to a sink?

The Ford-Fulkerson algorithm solves this by repeatedly finding augmenting paths (paths from source to sink with available capacity) and pushing flow along them until no more augmenting paths exist. This approach has applications in traffic management, resource allocation, and even matching problems.

/**
* Ford-Fulkerson Algorithm for Maximum Flow in a Flow Network
*
* This implementation uses the Edmonds-Karp approach (BFS for finding augmenting paths),
* which bounds the number of augmentations by O(VE). With adjacency lists that gives
* O(VE²) overall; this adjacency-matrix version spends O(V²) per BFS, so O(V³E).
*
* The algorithm finds the maximum possible flow from a source node to a sink node
* in a directed graph where each edge has a capacity.
*/

class FlowNetwork {
    /**
    * Initialize the flow network as an adjacency matrix of residual capacities
    * @param {number} vertices - Number of vertices in the graph
    */
    constructor(vertices) {
        this.V = vertices; // Number of vertices
        this.graph = new Array(vertices); // Residual graph: graph[u][v] = remaining capacity u -> v

        // Initialize one row of zeros per vertex
        for (let i = 0; i < vertices; i++) {
            this.graph[i] = new Array(vertices).fill(0);
        }
    }

    /**
    * Add an edge to the flow network with a given capacity
    * @param {number} u - Source vertex
    * @param {number} v - Destination vertex
    * @param {number} capacity - Maximum flow capacity of the edge
    */
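    addEdge(u, v, capacity) {
        // Accumulate capacity so parallel edges add up. The reverse entry is left
        // untouched: the matrix starts at 0, and overwriting it would erase an
        // existing edge from v to u (as with 1 -> 2 and 2 -> 1 in the example below).
        this.graph[u][v] += capacity;
    }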

    /**
    * Breadth-First Search to find if there's a path from source to sink with available capacity
    * Also stores the path found in parent[] array
    * @param {number} s - Source vertex
    * @param {number} t - Sink vertex
    * @param {number[]} parent - Array to store the path
    * @returns {boolean} - True if path exists, False otherwise
    */
    bfs(s, t, parent) {
        // Create a visited array and mark all vertices as not visited
        const visited = new Array(this.V).fill(false);
        
        // Create a queue for BFS, enqueue source vertex
        const queue = [];
        queue.push(s);
        visited[s] = true;
        parent[s] = -1; // Source has no parent

        // Standard BFS loop
        while (queue.length > 0) {
            const u = queue.shift();

            // Explore all adjacent vertices
            for (let v = 0; v < this.V; v++) {
                // If vertex not visited and residual capacity > 0
                if (!visited[v] && this.graph[u][v] > 0) {
                    // If we reach the sink, we have a path
                    if (v === t) {
                        parent[v] = u;
                        return true;
                    }
                    
                    queue.push(v);
                    parent[v] = u;
                    visited[v] = true;
                }
            }
        }

        // We didn't reach the sink
        return false;
    }

    /**
    * Main function implementing Ford-Fulkerson algorithm
    * @param {number} source - Source vertex
    * @param {number} sink - Sink vertex
    * @returns {number} - Maximum flow from source to sink
    */
    fordFulkerson(source, sink) {
        // Validate input
        if (source < 0 || source >= this.V || sink < 0 || sink >= this.V) {
            throw new Error("Invalid source or sink vertex");
        }
        if (source === sink) {
            return 0; // No flow if source and sink are same
        }

        // This array is filled by BFS and stores path
        const parent = new Array(this.V).fill(-1);
        let maxFlow = 0; // Initialize max flow to 0

        // Augment the flow while there is path from source to sink
        while (this.bfs(source, sink, parent)) {
            // Find minimum residual capacity of the edges along the path
            let pathFlow = Infinity;
            
            // Traverse from sink to source using parent array
            for (let v = sink; v !== source; v = parent[v]) {
                const u = parent[v];
                pathFlow = Math.min(pathFlow, this.graph[u][v]);
            }

            // Update residual capacities of the edges and reverse edges
            for (let v = sink; v !== source; v = parent[v]) {
                const u = parent[v];
                // Subtract path flow from forward edge
                this.graph[u][v] -= pathFlow;
                // Add path flow to reverse edge
                this.graph[v][u] += pathFlow;
            }

            // Add path flow to overall flow
            maxFlow += pathFlow;
        }

        return maxFlow;
    }
}


function main() {
    // Create a flow network with 6 vertices (0 to 5)
    const g = new FlowNetwork(6);

    // Add edges with capacities
    g.addEdge(0, 1, 16); // s -> v1
    g.addEdge(0, 2, 13); // s -> v2
    g.addEdge(1, 2, 10); // v1 -> v2
    g.addEdge(1, 3, 12); // v1 -> v3
    g.addEdge(2, 1, 4);  // v2 -> v1
    g.addEdge(2, 4, 14); // v2 -> v4
    g.addEdge(3, 2, 9);  // v3 -> v2
    g.addEdge(3, 5, 20); // v3 -> t
    g.addEdge(4, 3, 7);  // v4 -> v3
    g.addEdge(4, 5, 4);  // v4 -> t

    const source = 0; // Source vertex (s)
    const sink = 5;   // Sink vertex (t)

    console.log("Running Ford-Fulkerson algorithm...");
    const maxFlow = g.fordFulkerson(source, sink);
    console.log(`The maximum possible flow from ${source} to ${sink} is: ${maxFlow}`);
}

main();

A few notes on graphs and why I am investing my time in them

Web search engines model the internet as a massive directed graph where web pages are vertices and hyperlinks are edges. Google's PageRank algorithm uses this graph structure to determine page importance based on the link structure, revolutionizing how we find information online.
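To give a feel for how link structure turns into importance scores, here is a heavily simplified PageRank sketch using power iteration. It is a teaching toy, not Google's actual implementation. The `pageRank` function, the damping factor of 0.85, and the fixed iteration count are all assumptions made for this example.

```javascript
// Simplified PageRank via power iteration.
// links[i] is the list of pages that page i links to.
function pageRank(links, damping = 0.85, iterations = 100) {
    const n = links.length;
    let ranks = new Array(n).fill(1 / n); // start with equal rank everywhere

    for (let it = 0; it < iterations; it++) {
        // Every page gets a small baseline share
        const next = new Array(n).fill((1 - damping) / n);
        for (let page = 0; page < n; page++) {
            const outLinks = links[page];
            if (outLinks.length === 0) {
                // Dangling page: spread its rank evenly over all pages
                for (let q = 0; q < n; q++) next[q] += (damping * ranks[page]) / n;
            } else {
                // Split this page's rank equally among the pages it links to
                for (const target of outLinks) {
                    next[target] += (damping * ranks[page]) / outLinks.length;
                }
            }
        }
        ranks = next;
    }
    return ranks;
}

// Three pages with links 0 -> 2, 1 -> 2, 2 -> 0
const pageRanks = pageRank([[2], [2], [0]]);
console.log(pageRanks.map(r => r.toFixed(3)));
```

Page 2 ends up with the highest score because two pages link to it, and page 1 ends up lowest because nothing links to it.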

After spending so much time with machine learning algorithms and neural networks, I think ML on graphs is an active area that everyone should explore at least once, with graph neural networks learning representations directly from graph structure. This opens new possibilities for problems like node classification, link prediction, and graph generation.

That's all from my side on graphs. Hope I was able to add some value to your learning today :)