A Graph is an important data structure in computer science; it is defined as a collection of nodes with “edges” between some of the nodes. When we talk about Graphs that category includes Trees, however not all Graphs are Trees.

“Graphs arise in various real-world situations as there are road networks, computer networks and, most recently, social networks! If you’re looking for the fastest time to get to work, cheapest way to connect a set of computers into a network or efficient algorithm to automatically find communities and opinion leaders hot in Facebook, you’re going to work with graphs and algorithms on graphs” (Ref#: I). Google Maps, for example, will make heavy use of graph-based algorithms, but there are more applications than it’s possible to enumerate in this article.

Graph Theory & History

There is indeed a whole theoretical area associated with Graphs which is strangely enough called Graph Theory and which is considered to have begun with the publication of the work ‘Seven Bridges of Königsberg’ by Leonhard Euler back in 1736 (Ref#: K). This work explored the problem of devising a walk through a city of seven bridges (Königsberg, Prussia now Kaliningrad, Russia) where one would cross each of the bridges once and only once, and Euler explored the problem by abstracted the problem removing all unnecessary detail – this work prefigured graph theory.

Directed or Undirected

In a graph, edges that are directed are “like a one-way street, undirected edges are like a two-way street”(Ref# A). “For an undirected graph, an unordered pair of nodes that specify a line joining these two nodes are said to form an edge. For a directed graph, the edge is an ordered pair of nodes” (Ref#: C).

Connected Vs Disconnected Graphs

It’s also possible for a Graph to consist of multiple isolated sub-graphs but if a path exists between every pair of vertices then that would be called a connected graph.

Source: Ref#:M

It’s important to note that “islands of nodes can cause unexpected behavior” with things like getting stuck inside disconnected components, or possibly failing to properly process disconnected components of the graph when using a particular graph algorithm.

Cyclic Vs Acyclic Graphs

Furthermore, graphs can have “cycles” in them, but if a Graphs does not have any cycles we call that an acyclic graph or a Directed Acyclic Graph (or DAG). The below illustration is the simplest representation of this difference:

A really simple and old example of DAGs is found in the simple representation of Family Trees, dating back at least as far as Roman times. More modern use cases are found in things like version history, data compression algorithms, and data processing networks. We see a manifestation of DAGs indeed in the rise of the blockchain phenomenon with a really cool use case coming from the merging of DAG and blockchains in allowing for a new kind of distributed ledger scalability (Ref#: K)

“In graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by at most one path, or equivalently an acyclic undirected graph, or equivalently a disjoint union of trees.

Unweighted Vs Weighted Graphs

In weighted graphs, each edge is assigned a weight. This weight can represent the strength of the connection between two nodes, or a value associated with moving between two nodes.

A weighted graph an interesting an powerful data structure and can be used, in combination with other techniques (with graph databases), for things like analyzing links between identified individuals in a potential terrorist network, as well as more mundane applications like finding the cheapest flight between two airports.

A lot of graph algorithms expect weights, and we would naturally, therefore, see significant differences in performance and results when they’re ignored.

Monopartite, Bipartite, and k-Partite Graphs

Graphs with only one node type and relationship type are called monopartite graphs, graphs whose node can be divided into two sets are called bipartite graphs, and graphs with any given number of node types are called k-partite graphs and reference the number of node types the data has as k (Ref#: M).

Representing Graphs in Code

In the context of computer programming, we use either an Adjacency List or Adjacency Matrices to represent a graph.

Adjacency List in Swift

“The basic idea of an adjacency list is you store every single vertex. Each vertex will hold an adjacency list. This describes the outgoing edges” (Ref#: B).

There are different ways of implementing an adjacency list such as an Array of Arrays, an Array of Linked-Lists, or a Dictionary of Arrays. In the array of arrays approach the outer array represents vertices, providing an index with the inner array contains the edges. An array of linked-lists where each index in the array represents a vertex, and each value in the array stores a linked-list (fast insertion or deletion times). Finally, with a dictionary of arrays, each key in the dictionary is a vertex, and each value is the corresponding array of edges (Ref#: B).

For a full rundown of implementation using an Adjacency List in Swift, I would refer you to this Ray Wenderlich article and the associated code in the Swift Algorithm Club repo: https://www.raywenderlich.com/773-swift-algorithm-club-graphs-with-adjacency-list

Adjacency Matrices in Swift

“In an adjacency matrix implementation, a matrix with rows and columns representing vertices stores a weight to indicate if vertices are connected and by what weight” (Ref#: D).

An example of code for representing a Graph using an Adjacency Matrix can be found here: http://www.letscodethemup.com/graph-representation-adjacency-matrix/

Adjacency Matrix Vs Adjacency List

It turns out that for most-used cases, an adjacency list tends to be the most optimal approach to representing and using a graph(Ref#: D).

Graph Algorithms

“Graph algorithms are a subset of tools for graph analytics. Graph analytics is something we do—it’s the use of any graph-based approach to analyze connected data. There are various methods we could use: we might query the graph data, use basic statistics, visually explore the graphs, or incorporate graphs into our machine learn‐ ing tasks. Graph pattern-based querying is often used for local data analysis, whereas graph computational algorithms usually refer to more global and iterative analysis. Although there is overlap in how these types of analysis can be employed, we use the term graph algorithms to refer to the latter, more computational analytics and data science uses”(Ref#: M).

Types of Graph Algorithms

Some types of graph algorithms include Pathfinding, Centrality, and Community Detection:

With Pathfinding, we are typically looking to establish things like the shortest path between two given nodes, and this kind of pathfinding is a precursor for several different types of analysis. With Centrality, our algorithm looks to identify the most “important” nodes in a given network. Connectivity algorithms are used to find communities and also then to quantify the quality of particular groupings (Ref#: M).

Graph Search

There are a variety of graph search algorithms, and these tend to be very in how optimal they are based on what you want to do with them.

SOURCE: https://cs.stanford.edu/people/abisee/gs.pdf

DEMO: https://cs.stanford.edu/people/abisee/tutorial/astar.html

SOURCE: Ref#:F

Breadth-First vs Depth-First Search

Breadth-First Search of a Graph typically works using a Queue, and a Depth-First Search of a Graph typically works using a Stack (see my article of Stacks & Queues).

Breadth-First Search (using a queue)

For BFS, “We start at the root (or another arbitrarily selected node) and explore each neighbor before going on to any of their children”

Here’s a simple swift example:

Source: Swift Algorithm Club

Depth-First Search (DFS)

For DFS, “We start at the root (or another arbitrarily selected node) and explore each branch completely before moving onto the next branch”, basically we go deep first (Ref#: A).

Graph Databases

“In computing, a graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data“. “Graph databases are part of the NoSQL databases created to address the limitations of the existing relational databases” (Wikipedia).

“A NoSQL (originally referring to “non SQL” or “non relational”) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but did not obtain the “NoSQL” moniker until a surge of popularity in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages, or sit alongside SQL databases in polyglot persistent architectures”(Wikipedia).

Graph Database Use Cases

Use cases for Graph Databases include Fraud Detection, Real-Time Recommendation Engines (companies like Walmart use this to recommend products to online shoppers), Master Data Management, Network & IT Operations, and Identity & Access Management (Source: Neo4j Whitepaper).

Neo4j

One way of working with Graph Databases is by using software like Neo4j.

Neo4j is a graph database management system developed by Neo4j, Inc. Described by its developers as an ACID-compliant transactional database with native graph storage and processing, Neo4j is the most popular graph database according to DB-Engines ranking, and the 22nd most popular database overall” (Wikipedia).

Neo4j is implemented in Java and accessible from software written in other languages using the Cypher Query Language through a transactional HTTP endpoint, or through the binary “bolt” protocol.

Neo4j uses its own query language called Cypher, which looks like this:

“Neo4j is available in a GPL3-licensed open-source “community edition”, with online backup and high availability extensions licensed under a closed-source commercial license. Neo also licenses Neo4j with these extensions under closed-source commercial terms.

More Random Information on Graphs?

Hypergraphs In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. (Also See: “Hyperedges”, and “Vertex Clusters”).

Logic of Graphs – “In the mathematical fields of graph theory and finite model theory, the logic of graphs deals with formal specifications of graph properties using formulas of mathematical logic”.

Graphical Models

“Graphical models bring together graph theory and probability theory in a powerful formalism for multivariate statistical modeling. In various applied fields including bioinformatics, speech processing, image processing and control theory, statistical models have long been formulated in terms of graphs, and algorithms for computing basic statistical quantities such as likelihoods and score functions have often been expressed in terms of recursions operating on these graphs; examples include phylogenies, pedigrees, hidden Markov models, Markov random fields, and Kalman filters” (Ref#: P).

Conclusion

Graphs are a super important data-structure, in the computer science world Graph are used for loads of purposes: everything from direction-finding between two addresses, to recommendation engines using graph databases powering the might of large online retailers. 

In this article, we covered the definition of what a Graph is, and we went on to look at the definition of different types of Graph Search and how these are commonly used, and which algorithms are typically the best for different purposes. Furthermore, we briefly looked at graph databases and their use cases.

References

A: Cracking the Coding Interview – 6th Edition

B: https://www.raywenderlich.com/773-swift-algorithm-club-graphs-with-adjacency-list

C: http://mathworld.wolfram.com/GraphEdge.html

D: https://github.com/raywenderlich/swift-algorithm-club/tree/master/Graph

E: https://cs.stanford.edu/people/abisee/gs.pdf

F: https://www.redblobgames.com/pathfinding/a-star/introduction.html

G: https://neo4j.com/blog/graph-search-algorithm-basics/

H: https://www.coursera.org/lecture/algorithms-graphs-data-structures/graph-search-overview-NX0BI

I: https://www.coursera.org/learn/algorithms-on-graphs?specialization=data-structures-algorithms

J: https://www.ebi.ac.uk/training/online/course/network-analysis-protein-interaction-data-introduction/introduction-graph-theory/graph-0

K: https://medium.com/fantomfoundation/history-and-use-cases-of-dags-25f05663d40d

L: http://www.letscodethemup.com/graph-representation-adjacency-matrix/

M: Needham, M & Hodler, A. E. (2019). Graph Algorithms – Practical Examples in Apache Spark & Neo4j. O’Reilly.

N: https://en.wikipedia.org/wiki/Graph_theory

O: https://en.wikipedia.org/wiki/Logic_of_graphs

P:  Wainwright, M. J.  &  Jordan, M. I. (2008). Graphical Models, Exponential Families, and Variational Inference. Foundations and TrendsR in Machine Learning Vol. 1, Nos. 1–2 (2008) 1–305.

Q: https://www.youtube.com/watch?v=Oy2nNPJ0oEI

Last modified: August 30, 2019

Author

Comments

Comments