Text Manipulation with sort and uniq Commands

2023-03-24
By: O. Wolfson

In this tutorial, we will go through two important text manipulation commands in Unix-based systems, sort and uniq. The sort command sorts the lines of a file, while the uniq command removes adjacent duplicate lines, which is why the two are so often used together.

Prerequisites

Basic knowledge of Unix-based command-line interfaces
Access to a Unix-based terminal (Linux, macOS, or WSL on Windows)

1. Sort Command

1.1 Basic Usage

To sort the lines in a file alphabetically, use the following command:

bash
sort input.txt > output.txt

This command will read the lines from input.txt, sort them alphabetically, and write the sorted lines to output.txt. If you want to sort the lines in reverse order, add the -r option:

bash
sort -r input.txt > output.txt
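
To try both orderings without touching real files, here is a minimal sketch. The fruit names are made-up sample data, the temporary directory comes from mktemp -d (assumed to be available, as it is on Linux and macOS), and LC_ALL=C is set only so that the ordering is the same on every system.

```bash
# Hypothetical sample data written to a throwaway directory.
workdir=$(mktemp -d)
printf 'pear\napple\ncherry\nbanana\n' > "$workdir/input.txt"

# LC_ALL=C forces plain byte-order comparison, independent of the user's locale.
ascending=$(LC_ALL=C sort "$workdir/input.txt")
descending=$(LC_ALL=C sort -r "$workdir/input.txt")

echo "$ascending"    # apple, banana, cherry, pear (one per line)
echo "$descending"   # pear, cherry, banana, apple (one per line)

rm -r "$workdir"
```

Without LC_ALL=C, sort follows the collation rules of your current locale, so mixed-case or accented lines may come out in a different order.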

1.2 Sorting Numerically

To sort a file with numbers, use the -n option:

bash
sort -n numbers.txt > sorted_numbers.txt

This command will sort the lines in numbers.txt numerically and save the result in sorted_numbers.txt.
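
The difference from the default ordering is easiest to see side by side. The sketch below uses made-up numbers in a temporary file (again assuming mktemp -d is available) and compares a plain sort with sort -n.

```bash
# Hypothetical sample numbers in a throwaway directory.
workdir=$(mktemp -d)
printf '10\n9\n100\n2\n' > "$workdir/numbers.txt"

# Default sort compares character by character, so "10" comes before "2".
lexical=$(LC_ALL=C sort "$workdir/numbers.txt")

# -n compares the lines by their numeric value instead.
numeric=$(LC_ALL=C sort -n "$workdir/numbers.txt")

echo "$lexical"   # 10, 100, 2, 9 (one per line)
echo "$numeric"   # 2, 9, 10, 100 (one per line)

rm -r "$workdir"
```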

1.3 Removing Duplicate Lines

The sort command can also remove duplicate lines when combined with the -u option:

bash
sort -u input.txt > output.txt

This command will sort the lines in input.txt, remove any duplicates, and write the unique lines to output.txt.
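
As a quick illustration, the following sketch runs sort -u on a small made-up file in which two of the lines are repeated. The file contents and the temporary directory are assumptions for the example only.

```bash
# Hypothetical sample data with repeated lines.
workdir=$(mktemp -d)
printf 'banana\napple\nbanana\ncherry\napple\n' > "$workdir/input.txt"

# -u sorts and keeps only one copy of each distinct line.
unique=$(LC_ALL=C sort -u "$workdir/input.txt")

echo "$unique"   # apple, banana, cherry (one per line)

rm -r "$workdir"
```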

2. Uniq Command

2.1 Basic Usage

The uniq command only removes consecutive duplicate lines, so to remove every duplicate from a file, the file must first be sorted. Here's an example:

bash
sort input.txt | uniq > output.txt

This command sorts the lines in input.txt and then removes duplicate lines using the uniq command. The result is saved to output.txt.
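
To see why the sorting step matters, this sketch runs uniq on the same made-up file twice, once directly and once after sorting. The sample lines are hypothetical and chosen so that some duplicates are not next to each other.

```bash
# Hypothetical sample data: "apple" and "banana" repeat, but not always adjacently.
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\napple\nbanana\n' > "$workdir/input.txt"

# uniq alone only collapses the two neighbouring "apple" lines.
unsorted_result=$(uniq "$workdir/input.txt")

# Sorting first brings all equal lines together, so uniq removes every repeat.
sorted_result=$(LC_ALL=C sort "$workdir/input.txt" | uniq)

echo "$unsorted_result"   # apple, banana, apple, banana (one per line)
echo "$sorted_result"     # apple, banana (one per line)

rm -r "$workdir"
```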

Notes on the Pipe (|)

In Unix-based systems, the pipe (|) is a shell operator that connects the output of one command to the input of another. This lets you chain several commands together into a single pipeline.

When you use a pipe, the output of the command on its left side is sent directly to the command on its right side as input. This eliminates the need for temporary files to store intermediate results.
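
The sketch below contrasts the two approaches on a small made-up file: a two-step version that stores the sorted lines in an intermediate file, and a single pipeline. The file names sorted.tmp, two_step.txt, and piped.txt are hypothetical and exist only for this comparison.

```bash
# Hypothetical sample data in a throwaway directory.
workdir=$(mktemp -d)
printf 'banana\napple\nbanana\n' > "$workdir/input.txt"

# Without a pipe: write the sorted lines to an intermediate file, then read it back.
LC_ALL=C sort "$workdir/input.txt" > "$workdir/sorted.tmp"
uniq "$workdir/sorted.tmp" > "$workdir/two_step.txt"

# With a pipe: sort's output flows straight into uniq.
LC_ALL=C sort "$workdir/input.txt" | uniq > "$workdir/piped.txt"

# cmp -s exits successfully only if both files are byte-for-byte identical.
if cmp -s "$workdir/two_step.txt" "$workdir/piped.txt"; then
  same=yes
else
  same=no
fi
echo "identical results: $same"

rm -r "$workdir"
```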

2.2 Counting Duplicate Lines

You can count the occurrences of each line using the -c option:

bash
sort input.txt | uniq -c > output.txt

This command writes each distinct line to output.txt, prefixed by the number of times it occurs in input.txt.
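
Here is a minimal sketch of the counting step on made-up data. It also shows one common follow-up, offered as a suggestion rather than as part of the command above: piping the counts into sort -rn to list the most frequent lines first. The exact amount of padding in front of each count differs between uniq implementations, so the comments show only the count and the line.

```bash
# Hypothetical sample data: cherry appears 3 times, banana 2, apple 1.
workdir=$(mktemp -d)
printf 'apple\ncherry\nbanana\ncherry\ncherry\nbanana\n' > "$workdir/input.txt"

# Count occurrences; the output stays in alphabetical order of the lines.
counts=$(LC_ALL=C sort "$workdir/input.txt" | uniq -c)

# Re-sort numerically in reverse so the highest count comes first.
ranked=$(LC_ALL=C sort "$workdir/input.txt" | uniq -c | LC_ALL=C sort -rn)

echo "$counts"   # 1 apple, 2 banana, 3 cherry (one per line, counts padded)
echo "$ranked"   # 3 cherry, 2 banana, 1 apple (one per line, counts padded)

rm -r "$workdir"
```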

2.3 Ignoring Case

To ignore the case when comparing lines, use the -i option:

bash
sort -f input.txt | uniq -i > output.txt

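The -f option makes sort ignore case, so lines that differ only in case end up next to each other, and uniq -i then treats them as duplicates and keeps one line from each group.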
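
The effect can be checked by counting the lines that survive with and without the case-insensitive options. The sample data below is made up, and the counts are taken with wc -l, with tr removing the padding some systems add. Which spelling is kept from each group (for example APPLE or apple) may vary between implementations, so the sketch compares only the number of lines.

```bash
# Hypothetical sample data: three spellings of "apple" and two of "banana".
workdir=$(mktemp -d)
printf 'Apple\nbanana\napple\nAPPLE\nBanana\n' > "$workdir/input.txt"

# Case-sensitive: every spelling counts as a different line.
sensitive_count=$(LC_ALL=C sort "$workdir/input.txt" | uniq | wc -l | tr -d ' ')

# Case-insensitive: sort -f groups the spellings, uniq -i collapses each group.
insensitive_count=$(LC_ALL=C sort -f "$workdir/input.txt" | uniq -i | wc -l | tr -d ' ')

echo "case-sensitive:   $sensitive_count lines"     # 5
echo "case-insensitive: $insensitive_count lines"   # 2

rm -r "$workdir"
```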

Conclusion

In this tutorial, we have covered the basics of the sort and uniq commands for text manipulation in Unix-based systems. These commands are powerful tools for working with text files, and by combining them with other commands and options, you can handle a wide range of text processing tasks.

Download the input.txt file for testing.
Download the numbers.txt file for testing.