103.2 Process Text Streams Using Filters

LPIC-1 Exam Objective 103.2: Process Text Streams Using Filters #


In this tutorial, we’ll cover the essential UNIX commands for processing text streams using filters. These commands are foundational for working with text files and output streams in a UNIX/Linux environment, which is a crucial skill for the LPIC-1 certification exam.

Key Commands and Their Usage #

1. cat #

The cat command is used to concatenate and display the content of files.

Example:

# Display the content of a single file
cat file.txt

# Concatenate multiple files and display their combined content
cat file1.txt file2.txt

2. head #

The head command displays the first few lines of a file.

Example:

# Display the first 10 lines of a file
head file.txt

# Display the first 5 lines of a file
head -n 5 file.txt

3. tail #

The tail command displays the last few lines of a file.

Example:

# Display the last 10 lines of a file
tail file.txt

# Display the last 5 lines of a file
tail -n 5 file.txt

4. cut #

The cut command is used to extract specific sections from each line of a file.

Example:

# Extract the first field from each line, using a comma as the delimiter
cut -d ',' -f 1 file.csv

# Extract characters from positions 1 to 5
cut -c 1-5 file.txt

5. sort #

The sort command sorts lines of text files.

Example:

# Sort a file alphabetically
sort file.txt

# Sort a file numerically
sort -n numbers.txt

6. uniq #

The uniq command filters out repeated lines in a file.

Example:

# Remove duplicate lines from a sorted file
sort file.txt | uniq

# Count occurrences of unique lines
sort file.txt | uniq -c

7. tr #

The tr command translates or deletes characters.

Example:

# Translate lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'

# Delete specific characters
echo "hello 123" | tr -d '0-9'

8. sed #

The sed command is a stream editor used to perform basic text transformations.

Example:

# Replace 'foo' with 'bar' in a file
sed 's/foo/bar/g' file.txt

# Delete lines containing 'pattern'
sed '/pattern/d' file.txt

9. awk #

The awk command is a powerful programming language for pattern scanning and processing.

Example:

# Print the second column of a file
awk '{print $2}' file.txt

# Print lines where the third column is greater than 50
awk '$3 > 50' file.txt

10. grep #

The grep command searches for patterns in files.

Example:

# Search for 'pattern' in a file
grep 'pattern' file.txt

# Search recursively in all files under the current directory
grep -r 'pattern' .

Compression and Checksum Commands #

11. bzcat, xzcat, zcat #

These commands are used to view the contents of compressed files without decompressing them.

Example:

# View the content of a bzipped file
bzcat file.bz2

# View the content of an xzipped file
xzcat file.xz

# View the content of a gzipped file
zcat file.gz

12. md5sum, sha256sum, sha512sum #

These commands generate checksums for verifying file integrity.

Example:

# Generate an MD5 checksum
md5sum file.txt

# Generate a SHA-256 checksum
sha256sum file.txt

# Generate a SHA-512 checksum
sha512sum file.txt

Other Useful Commands #

13. nl #

The nl command numbers lines of files.

Example:

# Number the lines of a file
nl file.txt

14. od #

The od command displays files in various formats (octal, hexadecimal, etc.).

Example:

# Display the file in octal format
od -b file.txt

# Display the file in hexadecimal format
od -x file.txt

15. paste #

The paste command merges lines of files.

Example:

# Merge lines of two files
paste file1.txt file2.txt

16. split #

The split command splits files into pieces.

Example:

# Split a file into pieces of 1000 lines each
split -l 1000 file.txt part

17. wc #

The wc (word count) command displays the number of lines, words, and characters in a file.

Example:

# Display the number of lines, words, and characters
wc file.txt

# Display only the number of lines
wc -l file.txt

# Display only the number of words
wc -w file.txt

Practical Examples #

Example 1: Extracting and Sorting Data #

Let’s say you have a CSV file data.csv with the following content:

name,age,department
Alice,30,HR
Bob,25,Engineering
Charlie,35,Marketing
Dave,28,HR
Eve,40,Engineering

Task: Extract the names and sort them alphabetically.

cut -d ',' -f 1 data.csv | sort

Example 2: Removing Duplicate Lines and Counting Occurrences #

Suppose you have a file access.log with repeated IP addresses.

Task: Remove duplicates and count occurrences.

sort access.log | uniq -c

Example 3: Translating and Deleting Characters #

Assume you have a text file message.txt with the content:

hello world 123

Task: Translate lowercase to uppercase and delete numbers.

cat message.txt | tr 'a-z' 'A-Z' | tr -d '0-9'

Conclusion #

Mastering these commands will not only help you pass the LPIC-1 exam but also empower you to efficiently manage and process text files in a UNIX/Linux environment. Practice these commands with real-world examples to solidify your understanding and enhance your skills.