
How to Compare Two Text Files in the Linux Terminal

Illustration of a terminal window on Linux. Image: Fatmawati Achmad Zaenuri/Shutterstock.com

Need to see the differences between two revisions of a text file? Then diff is the command you need. This tutorial shows you how to use diff on Linux and macOS, the easy way.

Diving into diff

The diff command compares two files and produces a list of the differences between the two. To be more accurate, it produces a list of the changes that would need to be made to the first file to make it match the second file. If you keep that in mind, you’ll find it easier to understand the output from diff. The diff command was designed to find differences between source code files and to produce output that could be read and acted upon by other programs, such as the patch command. In this tutorial, we’re going to look at the most useful human-friendly ways to use diff.

Let’s dive right in and analyze two files. The order of the files on the command line determines which file diff considers to be the “first file” and which it considers to be the “second file.” In the example below, alpha1 is the first file, and alpha2 is the second file. Both files contain the phonetic alphabet, but the second file, alpha2, has had some further editing so that the two files are not identical.

We can compare the files with this command. Type diff, a space, the name of the first file, a space, the name of the second file, and then press Enter.

diff alpha1 alpha2

Output from diff command with no options

How do we dissect that output? Once you know what to look for, it’s not that bad. Each difference is listed in turn in a single column, and each difference is labeled. The label contains numbers either side of a letter, like 4c4. The first number is the line number in alpha1, and the second number is the line number in alpha2. The letter in the middle can be:

  • c: The line in the first file needs to be changed to match the line in the second file.
  • d: The line in the first file must be deleted to match the second file.
  • a: Extra content must be added to the first file to make it match the second file.

The 4c4 in our example tells us that line 4 of alpha1 must be changed to match line 4 of alpha2. This is the first difference between the two files that diff found.

Lines that start with < refer to the first file, alpha1, and lines that start with > refer to the second file, alpha2. The line < Delta tells us that the word Delta is the content of line 4 in alpha1. The line > Dave tells us that the word Dave is the content of line 4 in alpha2. To summarise then, we need to replace Delta with Dave on line 4 in alpha1, to make that line match in both files.

The next change is indicated by the 12c12. Applying the same logic, this tells us that line 12 in alpha1 contains the word Lima, but line 12 of alpha2 contains the word Linux.

The third change refers to a line that has been deleted from alpha2. The label 21d20 is deciphered as “line 21 needs to be deleted from the first file to make both files synchronize from line 20 onwards.” The < Uniform line shows us the content of the line which needs to be deleted from alpha1.

The fourth difference is labeled 26a26,28. This change refers to three extra lines that have been added to alpha2. Note the 26,28 in the label. Two line numbers separated by a comma represent a range of line numbers. In this example, the range is from line 26 to line 28. The label is interpreted as “at line 26 in the first file, add lines 26 to 28 from the second file.” We’re shown the three lines in alpha2 that need to be added to alpha1. These contain the words Quirk, Strange, and Charm.
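If you’d like to see all three label types without hunting for suitable files, a small experiment along these lines may help. It is only a sketch: the two six-line files are hypothetical stand-ins for alpha1 and alpha2, created in a throwaway directory, and GNU diff is assumed.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for alpha1 and alpha2: one changed line,
# one deleted line, and one added line.
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot > alpha1
printf '%s\n' Alpha Beta Charlie Echo Foxtrot Golf > alpha2

# diff exits with status 1 when the files differ. That is not an
# error, so "|| true" stops a strict shell from aborting here.
normal_output=$(diff alpha1 alpha2) || true
printf '%s\n' "$normal_output"
```

With these two files, the labels should be 2c2 (Bravo changed to Beta), 4d3 (Delta deleted), and 6a6 (Golf added).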

Snappy One-Liners

If all you want to know is whether two files are the same, use the -s (report identical files) option.

diff -s alpha1 alpha3

Output of the diff command with -s option

You can use the -q (brief) option to get an equally terse statement about two files being different.

diff -q alpha1 alpha2

Output of the diff command with -q option

One thing to watch out for is that with two identical files the -q (brief) option completely clams up and doesn’t report anything at all.
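Because -q says nothing for identical files, a script is usually better off reading diff’s exit status than its text. The sketch below assumes the conventional exit codes (0 for identical, 1 for different, 2 for trouble) and uses small hypothetical files in place of alpha1, alpha2, and alpha3.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files: alpha3 is an identical copy of alpha1,
# alpha2 has one word changed.
printf '%s\n' Alpha Bravo Charlie > alpha1
printf '%s\n' Alpha Bravo Charlie > alpha3
printf '%s\n' Alpha Beta Charlie > alpha2

# Capture the exit status of each comparison; any message is discarded.
status_same=0
diff -q alpha1 alpha3 > /dev/null || status_same=$?

status_diff=0
diff -q alpha1 alpha2 > /dev/null || status_diff=$?

echo "alpha1 vs alpha3: exit status $status_same"
echo "alpha1 vs alpha2: exit status $status_diff"
```

Testing the exit status this way works whether or not diff prints a message.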

An Alternative View

The -y (side by side) option uses a different layout to describe the file differences. It’s often convenient to use the -W (width) option with the side by side view, to limit the number of columns that are displayed. This avoids ugly wrap-around lines that make the output difficult to read. Here we have told diff to produce a side by side display and to limit the output to 70 columns.

diff -y -W 70 alpha1 alpha2

Output of the diff command with side by side display

The first file on the command line, alpha1, is shown on the left and the second file on the command line, alpha2, is shown on the right. The lines from each file are displayed side by side. There are indicator characters alongside those lines in alpha2 that have been changed, deleted, or added.

  • |: A line that has been changed in the second file.
  • <: A line that has been deleted from the second file.
  • >: A line that has been added to the second file that isn’t in the first file.

If you’d prefer a more compact side by side summary of the file differences, use the --suppress-common-lines option. This forces diff to list the changed, added, or deleted lines only.

diff -y -W 70 --suppress-common-lines alpha1 alpha2

Output of the diff command with --suppress-common-lines option
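To get a feel for how much --suppress-common-lines trims, you could compare the two side by side views on a tiny pair of files. This is a sketch only: the file contents are hypothetical stand-ins, the 40-column width is an arbitrary choice for short words, and GNU diff is assumed.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: one changed, one deleted, one added line.
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot > alpha1
printf '%s\n' Alpha Beta Charlie Echo Foxtrot Golf > alpha2

# Full side by side view: every line pair is listed.
side_by_side=$(diff -y -W 40 alpha1 alpha2) || true

# Compact view: only lines marked with |, < or > are listed.
compact=$(diff -y -W 40 --suppress-common-lines alpha1 alpha2) || true

printf '%s\n\n%s\n' "$side_by_side" "$compact"
```

The full view should run to seven rows and the compact view to three, one for each of the |, < and > indicators.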

Add a Splash of Colour

Another utility called colordiff adds colour highlighting to the diff output. This makes it much easier to see which lines have differences.

Use apt-get to install this package onto your system if you’re using Ubuntu or another Debian-based distribution. On other Linux distributions, use your distribution’s package management tool instead.

sudo apt-get install colordiff

Use colordiff just as you would use diff.

Output of the colordiff command with no options

In fact, colordiff is a wrapper for diff, and diff does all the work behind the scenes. Because of that, all of the diff options will work with colordiff.

Output of the colordiff command with --suppress-common-lines option
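Since colordiff accepts the same options as diff, a script can pick whichever one is available. The following is a minimal sketch of that idea; the variable name differ and the sample files are our own inventions, not part of either tool.

```shell
# Prefer colordiff when it is installed; otherwise fall back to diff.
if command -v colordiff > /dev/null 2>&1; then
    differ=colordiff
else
    differ=diff
fi

# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for alpha1 and alpha2.
printf '%s\n' Alpha Bravo Charlie Delta > alpha1
printf '%s\n' Alpha Bravo Charlie Dave > alpha2

# The same options are passed whichever command was chosen.
# "|| true" is there because exit status 1 only means "files differ".
"$differ" -y -W 40 --suppress-common-lines alpha1 alpha2 || true
```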

Providing Some Context

To find some middle ground between having all of the lines in the files displayed on the screen and having only the changed lines listed, we can ask diff to provide some context. There are two ways to do this. Both ways achieve the same purpose, which is to show some lines before and after each changed line. You’ll be able to see what’s going on in the file at the place where the difference was detected.

The first method uses the -c (copied context) option.

colordiff -c alpha1 alpha2

Output of colordiff with -c option

The diff output has a header. The header lists the two file names and their modification times. There are asterisks (*) before the name of the first file and dashes (-) before the name of the second file. Asterisks and dashes will be used to indicate which file the lines in the output belong to.

A line of asterisks with 1,7 in the middle indicates we’re looking at lines from alpha1. To be precise, we’re looking at lines one to seven. The word Delta is flagged as changed. It has an exclamation point ( ! ) alongside it, and it’s red. There are three lines of unchanged text displayed before and after that line so we can see the context of that line in the file.

The line of dashes with 1,7 in the middle tells us we’re now looking at lines from alpha2. Again, we’re looking at lines one to seven, with the word Dave on line four flagged as being different.

Three lines of context above and below each change is the default value. You can specify how many lines of context you want diff to provide. To do this, use the -C (copied context) option with a capital “C” and provide the number of lines you’d like:

colordiff -C 2 alpha1 alpha2

Output of colordiff with -C 2 option
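The copied context layout is easier to read on a file short enough to take in at a glance. As a sketch, using plain diff (so it runs even where colordiff isn’t installed) and hypothetical seven-line stand-ins for alpha1 and alpha2, a single line of context looks like this:

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: only line 4 differs.
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf > alpha1
printf '%s\n' Alpha Bravo Charlie Dave Echo Foxtrot Golf > alpha2

# -C 1 asks for one line of context above and below each change.
# "|| true" is there because exit status 1 only means "files differ".
context_output=$(diff -C 1 alpha1 alpha2) || true
printf '%s\n' "$context_output"
```

After the two header lines, the section marked 3,5 between asterisks shows Charlie, Delta, and Echo from the first file, and the section marked 3,5 between dashes shows Charlie, Dave, and Echo from the second. The changed line in each section starts with an exclamation point.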

The second diff option that offers context is the -u (unified context) option.

colordiff -u alpha1 alpha2

Output of colordiff with -u option

As before, we have a header on the output. The two files are named, and their modification times are shown. There are dashes (-) before the name of alpha1 and plus signs (+) before the name of alpha2. This tells us that dashes will be used to refer to alpha1 and plus signs will be used to refer to alpha2. Scattered throughout the listing are lines that start with at signs (@). These lines mark the start of each difference. They also tell us which lines are being shown from each file.

We’re shown the three lines before and after the line flagged as being different so that we can see the context of the changed line. In the unified view, the lines with the difference are shown one above the other. The line from alpha1 is preceded by a dash and the line from alpha2 is preceded by a plus sign. This display achieves in eight lines what the copied context display above took fifteen to do.

As you’d expect, we can ask diff to provide exactly the number of lines of unified context we’d like to see. To do this, use the -U (unified context) option with a capital “U” and provide the number of lines you want:

colordiff -U 2 alpha1 alpha2

Output of colordiff with -U 2 option
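The same small experiment works for the unified layout. Again this is only a sketch with plain diff and hypothetical seven-line stand-ins for alpha1 and alpha2:

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: only line 4 differs.
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf > alpha1
printf '%s\n' Alpha Bravo Charlie Dave Echo Foxtrot Golf > alpha2

# -U 1 asks for one line of unified context around each change.
# "|| true" is there because exit status 1 only means "files differ".
unified_output=$(diff -U 1 alpha1 alpha2) || true
printf '%s\n' "$unified_output"
```

Below the two header lines, the at-sign line reads @@ -3,3 +3,3 @@, meaning three lines starting at line 3 are shown from each file. It is followed by Charlie, then -Delta and +Dave one above the other, then Echo.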

Ignoring White Space and Case

Let’s analyze another two files, test4 and test5. These have the names of six superheroes in them.

colordiff -y -W 70 test4 test5

Output of colordiff on test4 and test5 files

The results show that diff finds nothing different with the Black Widow, Spider-Man, and Thor lines. It does flag up changes with the Captain America, Ironman, and The Hulk lines.

So what’s different? Well, in test5 Hulk is spelled with a lowercase “h,” and Captain America has an extra space between “Captain” and “America.” OK, that’s plain to see, but what’s wrong with the Ironman line? There are no visible differences. Here’s a good rule of thumb. If you can’t see it, the answer is white space. There’s almost certainly a stray space or two, or a tab character, at the end of that line.

If they don’t matter to you, you can instruct diff to ignore specific types of line difference, including:

  • -i: Ignore differences in case.
  • -Z: Ignore trailing white space.
  • -b: Ignore changes in the amount of white space.
  • -w: Ignore all white space changes.

Let’s ask diff to check those two files again, but this time to ignore any differences in case.

colordiff -i -y -W 70 test4 test5

output from colordiff ignore case

The lines with “The Hulk” and “The hulk” are now considered a match, and no difference is flagged for lowercase “h.” Let’s ask diff to also ignore trailing white space.

colordiff -i -Z -y -W 70 test4 test5

Output from colordiff ignore trailing white space

As suspected, trailing white space must have been the difference on the Ironman line because diff no longer flags a difference for that line. That leaves Captain America. Let’s ask diff to ignore case and to ignore all white space issues.

colordiff -i -w -y -W 70 test4 test5

Output from colordiff ignore all white space

By telling diff to ignore the differences that we’re not concerned about, diff tells us that, for our purposes, the files match.
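The same progression can be reproduced in a script. The sketch below uses three-line hypothetical stand-ins for test4 and test5 that contain only the problem lines, and a helper function, compare_ignoring, that is our own invention. The -Z option is assumed to be available, as it is in recent versions of GNU diff.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins: an extra space in "Captain  America",
# trailing spaces after "Ironman", and a lowercase "h" in "hulk".
printf '%s\n' 'Captain America' 'Ironman' 'The Hulk' > test4
printf '%s\n' 'Captain  America' 'Ironman  ' 'The hulk' > test5

# Report whether the files match under the given ignore options.
# Any non-zero exit status (including an error) is reported as "differ".
compare_ignoring() {
    if diff -q "$@" test4 test5 > /dev/null; then
        echo "match"
    else
        echo "differ"
    fi
}

result_plain=$(compare_ignoring)
result_case=$(compare_ignoring -i)
result_trailing=$(compare_ignoring -i -Z)
result_all=$(compare_ignoring -i -w)

echo "no options: $result_plain"
echo "-i:         $result_case"
echo "-i -Z:      $result_trailing"
echo "-i -w:      $result_all"
```

Only the last combination should report a match, because the doubled space inside “Captain  America” is neither a case difference nor trailing white space.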

The diff command has many more options, but the majority of them relate to producing machine-readable output. These can be reviewed on the Linux man page. The options we’ve used in the examples above will enable you to track down all of the differences between versions of your text files, using the command line and human eyeballs.
