Graph Algorithms: Part I

Graph theory is one of the most important topics of study in computer science, because a great many real-world problems can be modeled using graphs. Many natural and human-made structures are, in essence, graphs. For example, an organization's administrative hierarchy can be modeled as a graph, with certain employees working under certain project leads, who in turn report to certain department heads. Graphs can also model transportation systems, electric circuits, human interactions, and telecommunication systems. Essentially, anything that represents a relation of some sort between members of a set or group can be represented as a graph, which is why the study of graphs is fundamental to computer modeling. In this first piece in a series of posts about graphs, we are going to introduce graphs, see how they can be represented computationally, and cover some basic but very important algorithms that deal with graph traversal and exploration.

Introduction:

Formally, a graph comprises a set of vertices, represented as V, and a set of edges between members of V, represented as E. So a graph, labelled G, is written in symbols as G = (V, E). E is a binary relation on the members of V. The total number of vertices in V is written |V|, and the number of edges in E is written |E|.

A graph representing a group of people, and which members of the group are related to one another, would be represented as:

V = {Jon, Emma, Ali, Tom, Sam}

E = {(Jon, Emma), (Tom, Sam), (Ali, Emma)}

This shows that the members (vertices) of the graph are the 5 people named in V, and that the relations between pairs of members are given in E.

We usually just label the vertices using integers, for convenience.
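For instance, this same example can be written down directly in Python, along with an integer relabelling of the vertices (a minimal sketch; the variable names below are just illustrative):

    # The example graph above, written as plain Python sets.
    V = {"Jon", "Emma", "Ali", "Tom", "Sam"}
    E = {("Jon", "Emma"), ("Tom", "Sam"), ("Ali", "Emma")}

    # Relabel the vertices with the integers 0..|V|-1, for convenience.
    label = {name: i for i, name in enumerate(sorted(V))}
    E_as_ints = {(label[u], label[v]) for (u, v) in E}

    print(label)      # {'Ali': 0, 'Emma': 1, 'Jon': 2, 'Sam': 3, 'Tom': 4}
    print(E_as_ints)  # the same edges, now between integer labels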

A graph can be undirected or directed. In an undirected graph, if there is an edge (x, y), there is implicitly also an edge (y, x). In a directed graph, an edge (x, y) does not necessarily mean the edge (y, x) is present as well, unless it is given explicitly.

A graph can be weighted or unweighted. In a weighted graph, each edge has an associated weight, which can be any real number. This weight represents the cost of moving from one vertex to the other along that edge. For example, in a graph representing the road network of a city, the weight of an edge can represent the distance between two junctions. In an unweighted graph, no weight is associated with the edges, and the weight of every edge is implied to be a single unit.
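As a small, made-up illustration (the junction names and distances below are hypothetical), the weights can simply be stored alongside the edges:

    # A hypothetical road network: each edge carries a distance in kilometres.
    weighted_edges = {
        ("A", "B"): 2.5,
        ("B", "C"): 1.0,
        ("A", "C"): 4.2,
    }

    # In an unweighted graph, every edge implicitly costs one unit.
    unweighted_edges = {("A", "B"), ("B", "C"), ("A", "C")}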

Representation of Graphs:

Graphs can be represented in a couple of different ways in computational models.

Adjacency Lists:

To represent a graph as an adjacency list, we keep a list of lists, with one list for each vertex in the graph. The list for a vertex v contains all the vertices that v has an edge to, i.e. all the edges coming out of v.

[Figure: adjacency list representation of a graph]
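As a concrete illustration, here is a minimal Python sketch of an adjacency list for a small directed graph (the specific vertices and edges are arbitrary):

    # Adjacency list for a directed graph with 5 vertices, labelled 0..4.
    # adj[v] holds every vertex that v has an edge to.
    adj = [[] for _ in range(5)]

    def add_edge(u, v):
        adj[u].append(v)

    add_edge(0, 1)
    add_edge(0, 4)
    add_edge(1, 2)
    add_edge(3, 4)

    # Edge lookup means scanning a list: does vertex 0 have an edge to vertex 4?
    print(4 in adj[0])  # True, but found by walking through adj[0]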

The benefit of using an adjacency list to represent a graph is that it takes much less space when the number of edges is small compared to the number of possible edges, i.e. when the graph is sparse. The space complexity of an adjacency list is O(V+E). As we will see, the other method of graph representation has a much higher space complexity. The downside of an adjacency list is that an edge lookup takes linear time: to check whether vertex 1 has an edge to vertex 5, we have to traverse the complete adjacency list of vertex 1.

Adjacency Matrix:

In an adjacency matrix, we represent a graph as a matrix of dimensions V*V, where the cell at row r and column c represents whether or not there is an edge from vertex r to vertex c.

[Figure: adjacency matrix representation of a graph]

If an edge is present, we mark the cell as true, or 1. If there is no edge between the two vertices, we mark the cell as false, or 0. Whether or not an edge is present between two vertices, we have to have a cell representing it, which makes the space complexity of an adjacency matrix O(V*V). If the total number of edges is much less than V*V, meaning the graph is sparse, this wastes a lot of space. The plus side is that adjacency matrices are very easy to construct and traverse, and the lookup time is O(1), since we can directly access a cell to check whether an edge exists. If a graph is unweighted, we can use a boolean matrix, taking up only one bit per cell, to indicate the presence or absence of an edge, which greatly reduces the space usage. For a weighted graph, we can fill each cell with the weight of the corresponding edge, putting a nil value, or 0, to indicate the absence of an edge.
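A minimal Python sketch of the same small graph as above, this time as a boolean adjacency matrix:

    # Adjacency matrix for a directed, unweighted graph with 5 vertices.
    # matrix[r][c] is True when there is an edge from vertex r to vertex c.
    V = 5
    matrix = [[False] * V for _ in range(V)]

    def add_edge(u, v):
        matrix[u][v] = True

    add_edge(0, 1)
    add_edge(0, 4)
    add_edge(1, 2)
    add_edge(3, 4)

    # O(1) edge lookup: just index into the matrix.
    print(matrix[0][4])  # True
    print(matrix[4][0])  # False

    # For a weighted graph, each cell would hold the edge's weight instead,
    # with None (or 0) marking the absence of an edge.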

Breadth First Search:

Breadth First Search, abbreviated as BFS, is one of the fundamental graph traversal techniques. It is also one of the simplest graph traversal mechanisms, and the archetype of many more advanced graph search algorithms. In BFS, we start with a source vertex s and visit all other vertices that are reachable from it, in a breadth first fashion. That is, we first visit all the vertices directly linked to s, then, in the second phase, all the vertices that are two edges away from s, and so on. BFS finds the shortest distance from a source vertex to a target vertex, where the shortest distance means the smallest number of edges. That is why BFS works well on unweighted graphs, where the shortest distance between any pair of vertices is the smallest number of edges we need to travel to reach the target from the source. In BFS, we visit each vertex only once, and mark it as visited once we are done discovering all of its neighbors. We must not visit it again, otherwise the traversal can fall into an infinite loop and never halt. As we discover a new unvisited vertex, we can also record its parent, i.e. the vertex we discovered it from. We can use this later on to reconstruct the path taken to reach a certain vertex from the source.

BFS works by keeping a queue of discovered but not yet visited vertices. At first the queue contains only the source vertex. In each step, we pull out the vertex at the front of the queue, process it, and enqueue its adjacent unvisited vertices. Since a queue is a FIFO structure, traversing the graph this way means a vertex and all of its siblings are visited before any of its children.

Pseudocode for BFS is as follows:
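Here is a minimal Python sketch of the procedure, assuming the graph is given as an adjacency list like the adj built earlier (the function name and the optional target parameter are just illustrative):

    from collections import deque

    def bfs(adj, s, target=None):
        # dist[v] is the number of edges on the shortest path from s to v;
        # parent[v] lets us walk that path back to s.
        dist = {s: 0}
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()      # pull the vertex at the front of the queue
            if u == target:
                break                # stop early once the target is found
            for v in adj[u]:
                if v not in dist:    # v is still unmarked, so discover it
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
        return dist, parent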

We start with a source vertex s. We enqueue this vertex and mark it. In the while loop, we keep on enqueuing unmarked vertices, until we either reach our target vertex, or exhaust all the vertices without finding the target. The while loop executes at most |V| times, while the inner for loop, overall, executes |E| times. So the time complexity of BFS is O(V+E).

Depth First Search:

Depth First Search, abbreviated as DFS, differs from BFS in the order in which it visits the vertices. It starts with a source vertex s and explores as far as possible down one branch before backtracking, visiting a vertex's descendants before its siblings. We use DFS when finding the shortest distance to the target isn't a requirement. DFS is pretty similar to BFS, but instead of a queue, we use a stack. For this reason, DFS can also be implemented as a recursive method, since recursive function calls use an implicit stack.

Pseudocode for DFS is as follows:
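A minimal Python sketch, again assuming an adjacency-list graph and using an explicit stack (an equivalent recursive version works the same way):

    def dfs(adj, s):
        # Returns the set of vertices reachable from s.
        visited = set()
        stack = [s]
        while stack:
            u = stack.pop()          # take the most recently discovered vertex
            if u in visited:
                continue
            visited.add(u)           # mark u so it is never processed again
            for v in adj[u]:
                if v not in visited:
                    stack.append(v)  # u's neighbours go deep before u's siblings
        return visited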

We can use DFS to detect cycles in a graph. If we ever reach a vertex that is already marked and is still on the current DFS path, we have found an edge from a descendant back to an ancestor, which means there is a cycle. The time complexity of DFS is the same as BFS, i.e. O(V+E).
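As a rough sketch of that idea for a directed graph, keeping a separate set for the vertices on the current DFS path (the function names here are just illustrative):

    def has_cycle(adj):
        visited = set()   # vertices whose exploration has completely finished
        on_path = set()   # ancestors of the vertex currently being explored

        def visit(u):
            visited.add(u)
            on_path.add(u)
            for v in adj[u]:
                if v in on_path:                  # back edge to an ancestor: cycle
                    return True
                if v not in visited and visit(v):
                    return True
            on_path.remove(u)                     # done with u's subtree
            return False

        return any(visit(u) for u in range(len(adj)) if u not in visited)

For an undirected graph the check is slightly different: running into a marked neighbour that is not the vertex we arrived from indicates a cycle.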

Conclusion:

In this part, we discussed what graphs are, how to represent them in computational models, and two ways of traversing graphs. In the subsequent parts, we are going to see how we can use these fundamental traversal techniques to perform some interesting analyses and searches on a graph.
