Visualizing Diffs

The Myers difference algorithm

In 1986, Eugene Myers published An O(ND) Difference Algorithm and Its Variations, which unified the problems of finding the longest common subsequence of two sequences (the LCS of "driftwood" and "artwork" is "two") and finding the shortest edit script for transforming one sequence into another. Myers showed that these problems were equivalent to finding the shortest path over an "edit graph."

His algorithm improved the popular diff utility, a data comparison tool that displays the smallest set of line-by-line deletions and insertions to transform one file into another. Figure 1 contains an example of diff output, itself called a "diff", when given two similar poems: Colin Morton's 1981 "Empty Bottles" and David Morgan's 2011 plagiarism "Monkey Stops Whistling."

In the output, lines prefixed by a "+" are insertions from the destination file, lines prefixed by a "-" are deletions from the source file, and lines lacking a "+" or "-" prefix are lines shared by both sequences.

Figure 1 - Excerpted "diff" of Colin Morton's 1981 poem and David Morgan's plagiarism

To form an edit graph, Myers arranged the source sequence left-to-right along an x-axis and the destination sequence top-to-bottom along a y-axis with an orthogonal grid of horizontal and vertical edges projected from them. Starting in the upper-left and seeking the bottom-right, each move along the graph's edges toward a vertex would correspond to an edit instruction on the source sequence. Horizontal moves would be deletions from the source element; vertical moves would be insertions of the destination element. Where source and destination elements of the sequence are equivalent, diagonal movement is permitted.

Figure 2 - Edit graph with source sequence on x-axis and destination sequence on y-axis

Every trace across this directed, acyclic edit graph is a valid solution. Myers proposed a type of breadth-first search to discover the shortest traversal in either O(ND) space and time or, using a proposed refinement, in O(N) space and O(NlgN+D2) time where D is the size of the minimum edit script.

Figure 3 - Slowed, looped animation of Myers's search fanning out across the edit graph