Disclaimer
Please be aware that the information and procedures described herein are provided "as is" and without any warranty, express or implied. I assume no liability for any potential damages or issues that may arise from applying these contents. Any action you take upon the information is strictly at your own risk.
All actions and outputs documented were performed within a virtual machine running a Linux Debian server as the host system. The output and results you experience may differ depending on the specific Linux distribution and version you are using.
It is strongly recommended that you test all procedures and commands in a virtual machine or an isolated test environment before applying them to any production or critical systems.
- No warranty for damages.
- Application of content at own risk.
- Author used a virtual machine with a Linux Debian server as host.
- Output may vary for the reader based on their Linux version.
- Strong recommendation to test in a virtual machine.
| Author | Nejat Hakan |
| License | CC BY-SA 4.0 |
| Contact | nejat.hakan@outlook.de |
| PayPal Me | https://paypal.me/nejathakan |
Pipes and Redirection
Introduction: Understanding the Flow
Welcome to the world of pipes and redirection in Linux! These concepts are absolutely fundamental to using the Linux command line effectively and are cornerstones of the "Unix philosophy." This philosophy encourages building systems from small, specialized tools that do one thing well, and then combining these tools in powerful ways to accomplish complex tasks. Pipes and redirection are the primary mechanisms for achieving this combination.
Before we dive into the specifics, let's understand the context. When you run a command in the Linux shell (like Bash), it typically interacts with three standard data streams:
- Standard Input (stdin): This is where a command receives its input data. By default, stdin is connected to your keyboard. When a command waits for you to type something (like `cat` without any arguments), it is reading from stdin. Its associated file descriptor number is 0.
- Standard Output (stdout): This is where a command sends its normal results or output. By default, stdout is connected to your terminal screen (the console). When `ls` lists files, it is sending that list to stdout, which you then see on your screen. Its associated file descriptor number is 1.
- Standard Error (stderr): This is where a command sends its error messages or diagnostic information. By default, stderr is also connected to your terminal screen. When you try to `ls` a non-existent directory, the "No such file or directory" message is sent to stderr. Its associated file descriptor number is 2.
Think of these streams as default communication channels. Redirection allows you to change where these streams point, connecting them to files instead of the keyboard or screen. Pipes allow you to connect the stdout of one command directly to the stdin of another command, creating a processing pipeline.
Mastering these concepts will dramatically increase your command-line efficiency, enabling you to manipulate data, automate tasks, and understand Linux systems at a much deeper level. We will explore how to control these streams precisely, first by understanding them individually, then learning how to redirect them, and finally how to connect commands using pipes.
1. Understanding Standard Streams
As briefly mentioned in the introduction, the standard streams (stdin, stdout, stderr) are the default communication channels for command-line programs in Linux and other Unix-like systems. Let's delve deeper into their nature and significance.
File Descriptors
At the operating system level, these streams are managed using file descriptors. A file descriptor is simply a non-negative integer that the kernel uses to represent an open file or I/O channel. When a process (like a running command) starts, the shell typically ensures that three file descriptors are already open and associated with the standard streams:
- File Descriptor 0: Standard Input (stdin)
- File Descriptor 1: Standard Output (stdout)
- File Descriptor 2: Standard Error (stderr)
Programs written in C, Python, or other languages often use libraries that provide convenient ways to read from file descriptor 0 (for input), write to file descriptor 1 (for normal output), and write to file descriptor 2 (for errors), without needing to know exactly where these descriptors are connected (keyboard, screen, file, another program). This abstraction is incredibly powerful because it allows the user or the shell to decide where input comes from and where output goes, without modifying the program itself.
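As a tiny illustration of that abstraction in shell terms, here is a minimal sketch (the script name is hypothetical); it writes to file descriptors 1 and 2 without knowing or caring where they point:

```bash
#!/usr/bin/env bash
# streams_demo.sh (hypothetical name): write one line to each standard stream.
# The script itself does not decide where fd 1 and fd 2 go; the caller does.
echo "a normal result line"              # goes to stdout (fd 1)
echo "a diagnostic/error line" >&2       # goes to stderr (fd 2)
```

Running `./streams_demo.sh > out.txt` would capture only the first line in `out.txt`, while the second line would still appear on the terminal.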
Default Connections
By default, in an interactive shell session:
- stdin (0) is connected to the terminal's input device (your keyboard).
- stdout (1) is connected to the terminal's output device (your display/screen).
- stderr (2) is also connected to the terminal's output device (your display/screen).
This is why you normally type commands on the keyboard (stdin), see the results on the screen (stdout), and also see error messages on the same screen (stderr). The fact that stdout and stderr often go to the same place (the terminal) by default can sometimes be confusing, but it's crucial to remember they are distinct streams. This distinction allows us to redirect them independently, which is a common and powerful technique. For example, you might want to save the normal output of a command to a file but still see error messages immediately on your screen.
Why Separate Standard Error?
Separating normal output (stdout) from error messages (stderr) is a critical design choice. Imagine you run a command that processes thousands of files, generating useful output for most but encountering errors for a few.
- If errors were mixed with normal output on stdout, it would be very difficult to programmatically process the successful results later, as you'd constantly have to filter out error messages.
- It would also be hard for a user to quickly identify if any errors occurred amidst potentially voluminous successful output.
By sending errors to a separate stream (stderr), we gain flexibility:
- We can redirect stdout to a file for later processing, while letting stderr print to the screen so we see errors immediately.
- We can redirect stderr to a separate error log file.
- We can redirect both streams to different places or the same place as needed.
- We can simply discard one or both streams if we don't care about certain output.
Understanding these three streams and their associated file descriptors (0, 1, 2) is the absolute foundation for understanding redirection and pipes.
Workshop: Observing Standard Streams
In this workshop, we'll use simple commands to observe the behavior of stdin, stdout, and stderr in their default configuration.
Objective: Visually distinguish between standard input, standard output, and standard error using common commands.
Steps:
- Observe Standard Output (stdout):
  - Open your terminal.
  - Run the `ls` command to list files in your home directory (see the example below this step).
  - Explanation: The list of files and directories you see is the normal output of the `ls` command. The shell directed `ls`'s stdout (file descriptor 1) to your terminal screen.
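One possible command for this step; listing the home directory is just the suggestion from the text:

```bash
ls ~        # the listing is written to stdout, which is your terminal by default
```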
- Observe Standard Input (stdin):
  - Run the `cat` command without any arguments (see the sample session below this step).
  - Your cursor will move to the next line, and the terminal will appear to wait.
  - Type some text, for example a couple of short lines.
  - Press Enter after each line. Notice that `cat` immediately prints the line back to you.
  - To signal the end of input, press Ctrl+D on a new, empty line. The `cat` command will exit.
  - Explanation: When run without file arguments, `cat` reads from its stdin (file descriptor 0). By default, stdin is your keyboard. The text you typed was sent to `cat` via stdin. `cat`'s job is to read its input and print it to its stdout (file descriptor 1), which is, by default, your screen. Pressing Ctrl+D sends an End-of-File (EOF) signal, telling `cat` there's no more input.
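A sample interactive session for this step; the typed lines are arbitrary:

```
$ cat
hello from stdin
hello from stdin
a second line
a second line
$
```

Each line you type is echoed back by `cat`; pressing Ctrl+D on the empty line ends the input and returns you to the prompt.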
- Observe Standard Error (stderr):
  - Try to list the contents of a directory that does not exist (see the example below this step).
  - You will likely see an error message similar to the one shown below.
  - Explanation: This message is not normal output; it's an error report. The `ls` command sent this message to its stderr stream (file descriptor 2). By default, stderr is also directed to your terminal screen, so you see it mixed with potential stdout (though in this specific case, there was no stdout).
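For example (the directory name is arbitrary):

```bash
ls /nonexistent_directory
# Typical stderr output on Debian:
# ls: cannot access '/nonexistent_directory': No such file or directory
```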
- Distinguishing stdout and stderr:
  - Let's run a command that is likely to produce both normal output and error messages. The `find` command is good for this, especially when searching system directories where you might lack permissions for some subdirectories (see the example below this step).
  - You will likely see a mix of lines:
    - Lines starting with `/etc/...` ending in `.conf` (these are successful finds, sent to stdout).
    - Lines like `find: '/etc/some/subdir': Permission denied` (these are errors, sent to stderr).
  - Explanation: Both stdout and stderr are going to your terminal by default, so they appear interleaved. This clearly shows that while they often land in the same place, they originate from distinct streams (fd 1 vs. fd 2). In the next section on redirection, we'll learn how to separate them.
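A command that typically produces both kinds of output when run as a regular (non-root) user, matching the description above:

```bash
find /etc -name "*.conf"
```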
This workshop demonstrated the default behavior of the three standard streams, setting the stage for learning how to manipulate them using redirection.
2. Output Redirection: Controlling Where Output Goes
Now that we understand standard output (stdout, fd 1) and standard error (stderr, fd 2), let's explore how to redirect them away from the terminal screen and into files. This is incredibly useful for saving command results, creating log files, or suppressing unwanted output.
The shell provides special operators for redirection. It's important to realize that the shell handles the redirection before it even executes the command. The command itself usually doesn't know or care that its output is going to a file instead of the screen; it just writes to its standard output or standard error file descriptors as usual.
Redirecting Standard Output (stdout)
- `>` (Overwrite): The `>` operator redirects stdout to a specified file. If the file exists, it will be overwritten without warning. If the file does not exist, it will be created.
  - Syntax: `command > filename`
  - Example: Save the list of files in `/etc` to a file named `etc_contents.txt` (see the sketch after this list). After running this, `etc_contents.txt` will contain the output of `ls /etc`. Any previous content of `etc_contents.txt` is lost. You won't see the `ls` output on your terminal because it was redirected.
- `>>` (Append): The `>>` operator also redirects stdout to a specified file. However, if the file exists, the new output is added (appended) to the end of the file instead of overwriting it. If the file does not exist, it will be created.
  - Syntax: `command >> filename`
  - Example: Add a timestamp and a separator line to a log file (see the sketch after this list). Each time you run these commands, the current date/time and the separator line will be added to the end of `system_log.txt`.
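Possible commands for the two examples above; the exact separator text is an assumption:

```bash
# Overwrite: capture the listing of /etc in a file
ls /etc > etc_contents.txt

# Append: add a timestamp and a separator line to a running log
date >> system_log.txt
echo "----------------------------------------" >> system_log.txt
```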
Redirecting Standard Error (stderr)
Redirecting stderr works similarly, but you need to explicitly specify the file descriptor number (2) before the redirection operator.
- `2>` (Overwrite): Redirects stderr (file descriptor 2) to a file, overwriting it if it exists.
  - Syntax: `command 2> error_log_filename`
  - Example: Run the `find` command from the previous workshop, saving only the error messages (like "Permission denied") to `find_errors.log`. The normal output (found files) will still appear on the terminal.
- `2>>` (Append): Redirects stderr (file descriptor 2) to a file, appending to it if it exists.
  - Syntax: `command 2>> error_log_filename`
  - Example: Run a script repeatedly and append any errors to a persistent log file.
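Matching commands, reusing the `find` search from the earlier workshop and a placeholder script for the append case:

```bash
# Overwrite: found files stay on the terminal, errors go to the file
find /etc -name "*.conf" 2> find_errors.log

# Append: collect errors across repeated runs of some script (./backup.sh is a placeholder)
./backup.sh 2>> script_errors.log
```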
Redirecting Both stdout and stderr
There are several ways to redirect both standard output and standard error.
- To Separate Files: Simply use both stdout and stderr redirection on the same command line.
  - Syntax: `command > output_file 2> error_file`
  - Example: Save the found `.conf` files to `found_files.log` and errors to `find_errors.log` (see the sketch after this list). Nothing appears on the terminal.
- To the Same File (Method 1: Bash `&>` or `&>>`):
  - Bash (and some other modern shells) provides a shorthand `&>` to redirect both stdout and stderr to the same file, overwriting it.
  - Syntax: `command &> combined_output_file`
  - The `&>>` operator appends both stdout and stderr to the same file.
  - Syntax: `command &>> combined_output_file`
  - Note: `&>` and `&>>` are convenient but less portable than the next method, as they are not defined in the POSIX standard for shells.
- To the Same File (Method 2: POSIX `> file 2>&1`):
  - This is the traditional, portable way to redirect both stdout and stderr to the same file. It looks a bit cryptic at first: `command > file 2>&1`
  - Let's break down `> file 2>&1`:
    - `> file`: This part redirects stdout (fd 1) to `file`. This happens first.
    - `2>&1`: This part redirects stderr (fd 2) to the current location of stdout (fd 1). Since fd 1 was just redirected to `file`, this effectively sends stderr to the same file. The `&` before the `1` is crucial; it tells the shell that `1` refers to a file descriptor, not a file named "1".
  - Syntax (Overwrite): `command > combined_output_file 2>&1`
  - Syntax (Append): `command >> combined_output_file 2>&1` (Note: use `>>` for appending stdout, then `2>&1` sends stderr to the same appended stream.)
  - Common Mistake: Writing `command 2>&1 > file` usually does not work as intended. This redirects stderr (2) to where stdout (1) currently points (the terminal), then redirects stdout (1) to the file. The result is that errors still go to the terminal. The order matters!
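Concrete versions of the patterns above, reusing the `find` search from the workshop (the combined-output file name is an assumption):

```bash
# Separate files: stdout to one file, stderr to another
find /etc -name "*.conf" > found_files.log 2> find_errors.log

# Same file, Bash shorthand (overwrite, then append)
find /etc -name "*.conf" &> combined_output.log
find /etc -name "*.conf" &>> combined_output.log

# Same file, POSIX-portable form (overwrite, then append)
find /etc -name "*.conf" > combined_output.log 2>&1
find /etc -name "*.conf" >> combined_output.log 2>&1
```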
Discarding Output
Sometimes you want to run a command but are not interested in its output (either stdout or stderr, or both). Linux provides a special "null device" /dev/null that accepts any input written to it and simply discards it.
- Discard stdout: `command > /dev/null`
- Discard stderr: `command 2> /dev/null`
- Discard both stdout and stderr: `command > /dev/null 2>&1` or `command &> /dev/null`

Example: Run a background process but ignore all its output (see the sketch below).
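One possible shape for that, with a placeholder script name:

```bash
# Placeholder long-running command; both streams are discarded and it runs in the background
./long_running_job.sh > /dev/null 2>&1 &
```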
Understanding output redirection gives you precise control over where the results and errors of your commands are stored or whether they are displayed at all.
Workshop: Processing Log File Data
In this workshop, we'll simulate processing server logs, separating successful entries from error entries using output redirection.
Objective: Use output redirection operators (>, >>, 2>, 2>&1) to manage command output effectively.
Scenario: Imagine you have a command or script that processes web server log entries. It prints successfully processed lines to stdout and error/warning messages about malformed lines or access issues to stderr. We want to save the good data and the error information into separate files, and also demonstrate saving combined output.
Steps:
- Simulate Mixed Output: We'll use `echo` commands strategically to simulate a program producing both stdout and stderr.
  - Run the simulation sequence (Step 1 in the collected commands at the end of this workshop).
  - Explanation:
    - `echo "INFO: Processing user data..."`: Prints normal info to stdout.
    - `echo "ERROR: User database connection failed!" >&2`: Prints an error message explicitly to stderr (file descriptor 2). This is a shell trick to send `echo`'s output to stderr instead of stdout.
    - `echo "INFO: User data processed."`: Prints more normal info to stdout.
  - You should see all three lines printed to your terminal because both stdout and stderr go there by default.
- Redirect stdout (Overwrite): Let's capture only the informational messages.
  - Run the same sequence but redirect stdout: `(echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") > info_log.txt`
  - (Note: We put the commands in parentheses `(...)` to create a subshell, allowing us to redirect the collective stdout of all commands within it easily.)
  - Observe your terminal: You should only see the "ERROR: User database connection failed!" message.
  - Check the contents of the new file `info_log.txt` (for example with `cat info_log.txt`).
  - You should see the two INFO lines.
  - Explanation: We redirected stdout (`>`) to `info_log.txt`. The stderr message (`>&2`) was not redirected, so it still went to the terminal.
- Redirect stderr (Overwrite): Now, let's capture only the error messages.
  - Run the sequence, redirecting stderr this time (Step 3 in the collected commands).
  - Observe your terminal: You should only see the "INFO" messages.
  - Check the contents of the new file `error_log.txt`.
  - You should see the ERROR line.
  - Explanation: We redirected stderr (`2>`) to `error_log.txt`. The stdout messages were not redirected, so they went to the terminal.
- Redirect Both to Separate Files: Capture info and errors in their respective files simultaneously.
  - Run the sequence with both redirections (Step 4 in the collected commands).
  - Observe your terminal: You should see no output.
  - Check the contents of `info_log_separate.txt` (should contain the INFO messages).
  - Check the contents of `error_log_separate.txt` (should contain the ERROR message).
  - Explanation: Stdout was sent to one file, stderr to another. Nothing was left to go to the terminal.
- Redirect Both to the Same File (Append): Let's create a combined log, appending each time we run the "process".
  - First run (creates the file), then a second run (appends to the file); see Step 5 in the collected commands.
  - Observe your terminal: No output from the main commands.
  - Check the contents of `combined_log.txt`.
  - You should see the output from both runs, including the separators, info messages, and error/warning messages, all interleaved in the order they were generated.
  - Explanation: We used `>> combined_log.txt` to append stdout to the file. Then, `2>&1` redirected stderr (2) to the same place stdout (1) was pointing (the appended `combined_log.txt`). This ensures all output goes into the same file in the correct order, and subsequent runs add to the end.
- Discarding Errors: Run the process but ignore any errors.
  - Run the sequence, redirecting stderr to `/dev/null` (Step 6 in the collected commands).
  - Observe your terminal: You should only see the "INFO" messages. The error message was discarded.
  - Explanation: Sending output to `/dev/null` effectively throws it away. This is useful when you only care about successful output or want to suppress known, non-critical errors.
This workshop provided hands-on practice with redirecting stdout and stderr to files, both separately and combined, using overwrite and append modes, and demonstrated how to discard unwanted output.
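For reference, here are the collected command lines for the steps above. The file names follow the text; the separator `echo` in the combined-log step is an assumption:

```bash
# Step 1: simulate mixed stdout/stderr output
echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed."

# Step 2: capture stdout only
(echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") > info_log.txt

# Step 3: capture stderr only
(echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") 2> error_log.txt

# Step 4: separate files for stdout and stderr
(echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") > info_log_separate.txt 2> error_log_separate.txt

# Step 5: combined log, appending on every run (run it twice; the separator line is an assumption)
(echo "--- run at $(date) ---"; echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") >> combined_log.txt 2>&1

# Step 6: discard errors
(echo "INFO: Processing user data..."; echo "ERROR: User database connection failed!" >&2; echo "INFO: User data processed.") 2> /dev/null
```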
3. Input Redirection: Supplying Input from Files
Just as we can control where a command's output goes, we can also control where it gets its input from. By default, commands read from standard input (stdin, fd 0), which is usually connected to the keyboard. Input redirection allows us to tell a command to read its input from a file instead.
Redirecting Standard Input (<)
The `<` operator redirects stdin for a command. The command will read from the specified file as if its contents were being typed on the keyboard.

- Syntax: `command < filename`
- Example: Count the number of lines in the `etc_contents.txt` file we created earlier using the `wc` (word count) command with the `-l` (lines) option, by running `wc -l < etc_contents.txt`. Instead of waiting for keyboard input, `wc -l` reads directly from `etc_contents.txt`. The output will be a number followed by no filename (e.g., `45`). Compare this to `wc -l etc_contents.txt`, which produces output like `45 etc_contents.txt`. When using input redirection (`<`), the command often doesn't know the name of the file it's reading from, only that it's receiving data on its stdin.
- Example: Sort the contents of a file.

  ```bash
  # First, create a simple unsorted list
  echo "Charlie" > names.txt
  echo "Alice" >> names.txt
  echo "Bob" >> names.txt
  # Now sort it using input redirection
  sort < names.txt
  ```

  The `sort` command reads the lines from `names.txt` via its stdin and prints the sorted result ("Alice", "Bob", "Charlie") to its stdout (the terminal).
Here Documents (<<)
A "Here Document" is a special type of redirection that allows you to embed multi-line input for a command directly within your script or command line. It's incredibly useful for feeding predefined text blocks to commands like cat, mail, ftp, or configuration tools without needing a separate temporary file.
- Syntax:
DELIMITERis a user-chosen keyword (conventionallyEOFfor "End Of File", but it can be almost anything).- The shell reads the lines following the command up to a line containing only the
DELIMITER. This block of text is then fed as stdin to thecommand. -
The
DELIMITERitself is not part of the input. The closingDELIMITERmust be on a line by itself, with no leading or trailing whitespace. -
Example: Create a file named
message.txtwith multiple lines usingcatand a here document.After running this, check the contents ofcat > message.txt << MSG_END This is the first line of the message. This is the second line. Indentation is preserved. Special characters like $HOME or `date` are usually expanded. MSG_ENDmessage.txt: You'll see the multi-line text was written to the file. Notice that$HOMEanddatemight have been replaced by their values (e.g., your home directory path and the current date/time). -
Suppressing Expansion: If you want to prevent the shell from expanding variables (
$VAR), command substitutions (`command`or$(command)), etc., within the here document, you need to quote the delimiter in the initial<<line:Now,cat > literal_message.txt << "EOF" This line contains $USER and `pwd`. These will be treated literally. EOF cat literal_message.txtliteral_message.txtwill contain the literal strings$USERand`pwd`.
Here Strings (<<<)
Bash (and Zsh, ksh) also provide "Here Strings", a simpler construct for feeding a single string (which can contain spaces, or newlines written as `\n`, though the shell might process those) as standard input.

- Syntax: `command <<< "Some string data"`
- Example: Pass a single line of text to `wc -w` (word count). Output: `5` (see the sketch below).
- Example: Use `grep` to search within a string (also sketched below).

Here Strings are often more convenient than `echo "string" | command` for simple cases, as they avoid creating an extra process for `echo`.
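Possible here-string commands for the two examples; the particular strings are assumptions chosen to match the described output:

```bash
# A five-word string fed to wc -w; prints 5
wc -w <<< "one two three four five"

# Searching inside a string with grep
grep "error" <<< "this line mentions an error somewhere"
```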
Input redirection is essential for automating tasks where commands need to process data stored in files or defined directly within scripts.
Workshop: Batch Processing with Input Redirection
In this workshop, we'll use input redirection (<) and here documents (<<) to perform simple batch processing tasks.
Objective: Practice using < to feed file contents and << to feed multi-line script input to commands.
Scenario: We have a list of tasks in a file that we want to process, and we also want to generate a configuration file using a here document.
Steps:
- Prepare Input File: Create a file named `tasks.txt` containing a list of items, one per line (one possible version is sketched at the end of this workshop). Verify the file contents with `cat tasks.txt`.
- Process with Input Redirection (`<`): Use the `grep` command to find specific tasks in the file, feeding the file via stdin.
  - Find tasks related to "system" or "security": `grep -E 'system|security' < tasks.txt`
  - Explanation: `grep` normally takes input from stdin if no file is given. The `< tasks.txt` redirects the contents of `tasks.txt` to become `grep`'s standard input. `grep` filters this input based on the pattern (`-E` enables extended regular expressions, `|` means OR) and prints matching lines to its standard output (the terminal).
- Process with Input Redirection (`<`): Sort the tasks alphabetically.
  - Sort the `tasks.txt` file: `sort < tasks.txt`
  - Explanation: Similar to the previous step, `sort` reads the lines from `tasks.txt` via stdin and outputs the sorted list to stdout.
- Create a Configuration Snippet with a Here Document (`<<`): Imagine we need to create a small configuration file or script snippet. We can use `cat` combined with a here document.
  - Create a file `config_snippet.cfg` with some settings (see the sketch at the end of this workshop).
  - Verify the contents of the newly created file with `cat config_snippet.cfg`.
  - Explanation: `cat > config_snippet.cfg` tells `cat` to write its output to the file `config_snippet.cfg`. The `<< CONFIG_EOF` tells the shell to read the following lines until it encounters `CONFIG_EOF` on a line by itself, and feed those lines as stdin to `cat`. `cat` then reads this stdin and writes it to its stdout, which is redirected to the file.
- Use a Here Document for Interactive Command Input (Demonstration): Some commands expect interactive input. While better automation often uses command-line arguments or files, here documents can sometimes script simple interactions. Let's simulate feeding input to `bc` (an arbitrary precision calculator).
  - Perform a simple calculation using `bc` (see the sketch at the end of this workshop).
  - You should see the results printed to the terminal.
  - Explanation: The `bc` command reads calculation instructions from stdin. The here document provides these instructions (`scale=2` sets decimal places, then the calculations, and `quit` exits `bc`). `bc` executes them and prints results to stdout.
- Use a Here String (`<<<`): Perform a quick word count on a specific phrase.
  - Run: `wc -w <<< "This is a quick test phrase."`
  - Output: `6`
  - Explanation: The shell takes the string "This is a quick test phrase." and provides it as stdin to the `wc -w` command, which counts the words and prints the result to stdout.
This workshop illustrated how input redirection (<), here documents (<<), and here strings (<<<) can be used to supply input to commands from files or directly within the shell, enabling batch processing and scripting.
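For reference, here is one possible set of commands for the file-creation and `bc` steps above; the file contents and the calculations are assumptions, since the original listings are not preserved:

```bash
# Step 1: a sample tasks.txt (the individual task names are made up)
cat > tasks.txt << TASKS_EOF
update system packages
review security policy
backup home directory
clean temporary files
check security updates
TASKS_EOF

# Step 4: a small configuration snippet written via a here document
cat > config_snippet.cfg << CONFIG_EOF
# Example settings (placeholder values)
hostname=server01
max_connections=100
log_level=info
CONFIG_EOF

# Step 5: feeding calculations to bc via a here document
bc << BC_EOF
scale=2
10 / 3
5 * 7
quit
BC_EOF
```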
4. Pipes: Connecting Commands
Pipes are perhaps the most iconic feature embodying the Unix philosophy of combining small, specialized tools. A pipe, represented by the vertical bar character |, allows you to take the standard output (stdout) of one command and connect it directly to the standard input (stdin) of another command, without using an intermediate temporary file.
How Pipes Work
When you type `command1 | command2` in the shell:

- The shell creates a pipe, which is an in-memory buffer managed by the kernel.
- It starts both `command1` and `command2` processes more or less simultaneously.
- Crucially, it redirects the stdout (fd 1) of `command1` so that instead of going to the terminal, it writes into the pipe.
- It redirects the stdin (fd 0) of `command2` so that instead of reading from the keyboard, it reads from the pipe.

`command1` starts running and producing output. As it writes to its stdout, the data flows into the pipe. `command2` starts running and tries to read from its stdin. As data becomes available in the pipe (written by `command1`), `command2` reads it and processes it.

- If `command1` produces output faster than `command2` can consume it, the pipe buffer might fill up. The kernel will then temporarily pause `command1` (block its `write` operation) until `command2` reads some data and makes space in the buffer.
- If `command2` tries to read from the pipe but it's empty (and `command1` hasn't finished and closed its end of the pipe yet), the kernel will temporarily pause `command2` (block its `read` operation) until `command1` writes more data.
- When `command1` finishes and closes its stdout (its connection to the pipe), `command2` will eventually read an End-of-File (EOF) signal from the pipe, indicating no more data is coming.
This creates a producer-consumer relationship, allowing data to flow between commands efficiently.
Simple Pipe Examples
- Count files: List files in `/etc` and count how many there are: `ls /etc | wc -l`
  - `ls /etc` produces a list of filenames on its stdout.
  - The pipe `|` connects this stdout to the stdin of `wc -l`.
  - `wc -l` reads the list of filenames from its stdin and counts the number of lines, printing the result to its stdout (the terminal).
- Find specific processes: List all running processes and filter for lines containing "bash": `ps aux | grep bash`
  - `ps aux` lists processes on its stdout.
  - The pipe sends this list to `grep bash`.
  - `grep bash` reads the process list from its stdin, keeps only the lines containing "bash", and prints them to its stdout (the terminal).
Chaining Multiple Pipes
The real power comes from chaining multiple commands together. The output of one becomes the input for the next, allowing for sophisticated multi-stage data processing directly on the command line.
- Syntax: `command1 | command2 | command3 | ...`
- Example: Find the top 5 largest files in the current directory: `ls -lS | head -n 6 | awk '{print $9 " (" $5 " bytes)"}'`

  Let's break this down:
  - `ls -lS`: Lists files in long format (`-l`) and sorts them by size (`-S`), largest first. Output goes to stdout.
  - `| head -n 6`: The pipe sends the sorted list to `head`. `head -n 6` takes only the first 6 lines from its stdin (the header line from `ls -l` plus the top 5 files) and prints them to its stdout.
  - `| awk '{print $9 " (" $5 " bytes)"}'`: The pipe sends the top 6 lines to `awk`. `awk` is a powerful text processing tool. This command tells `awk` to process each line it receives on stdin: print the 9th field (`$9`, the filename), a space, an opening parenthesis, the 5th field (`$5`, the size in bytes), the text " bytes", and a closing parenthesis. The result is printed to `awk`'s stdout (the terminal).
  - Note: Field numbers might vary slightly depending on your `ls` version/options; adjust if needed. You might also need `tail -n +2` after `head` if you want to skip the "total" line `ls -l` outputs. A potentially more robust command uses `find`: `find . -maxdepth 1 -type f -printf '%s %p\n' | sort -nr | head -n 5`.
Pipes are fundamental for combining command-line utilities effectively. They allow you to build complex data processing workflows succinctly and efficiently.
Workshop: Analyzing Process Information
In this workshop, we'll build a pipeline of commands to extract specific information about running processes.
Objective: Practice connecting commands using pipes (|) to filter and transform data sequentially.
Scenario: We want to find the processes currently running by our user, sort them by memory usage, and display the top 3 memory-consuming processes along with their Process ID (PID) and command name.
Steps:
- List All Processes: Start by listing all processes in a detailed format. The `ps aux` command is common for this.
  - Run: `ps aux`
  - Observe the output. It contains many lines and columns, including USER, PID, %CPU, %MEM, COMMAND, etc. This entire output is sent to stdout.
- Filter by User: We only want processes owned by the current user. We can pipe the output of `ps aux` to `grep` to filter the lines. The `$USER` environment variable holds your username.
  - Run: `ps aux | grep "^$USER"`
  - Explanation:
    - `ps aux`: Generates the process list (stdout).
    - `|`: Pipes the stdout of `ps aux` to the stdin of `grep`.
    - `grep "^$USER"`: Reads the process list from stdin. It filters for lines that start (`^`) with the current username (`$USER`). Matching lines are printed to stdout.
  - Observe the output: It should now only contain processes belonging to you (plus possibly the `grep` command itself).
- Sort by Memory Usage: The `ps aux` output typically has the memory percentage (%MEM) in the 4th column. We want to sort the filtered lines numerically based on this column in reverse order (highest memory usage first). We pipe the output of `grep` to the `sort` command.
  - Run: `ps aux | grep "^$USER" | sort -k 4 -nr`
  - Explanation:
    - `ps aux | grep "^$USER"`: Produces the user-filtered process list (stdout).
    - `|`: Pipes the stdout of `grep` to the stdin of `sort`.
    - `sort -k 4 -nr`: Reads the filtered list from stdin. `-k 4` specifies sorting based on the 4th key (field); fields are whitespace-separated by default. `-n` specifies a numeric sort (otherwise '10' might come before '2'). `-r` specifies a reverse sort (descending order). The sorted list is printed to stdout.
  - Observe the output: Your processes should now be listed with the highest memory consumers at the top.
- Select the Top 3: We only want the top 3 most memory-intensive processes. We can pipe the sorted list to the `head` command.
  - Run: `ps aux | grep "^$USER" | sort -k 4 -nr | head -n 3`
  - Explanation:
    - `ps aux | grep "^$USER" | sort -k 4 -nr`: Produces the user's processes sorted by memory usage (stdout).
    - `|`: Pipes the stdout of `sort` to the stdin of `head`.
    - `head -n 3`: Reads the sorted list from stdin and prints only the first 3 lines to its stdout (the terminal).
  - Observe the output: You should now see your top 3 memory-using processes.
- (Optional) Refine Output Format: The output still contains all columns from `ps aux`. Let's use `awk` to print only the PID (column 2), the memory percentage (column 4), and the command (column 11).

  ```bash
  ps aux | grep "^$USER" | sort -k 4 -nr | head -n 3 | awk '{print "PID:", $2, " Memory:", $4 "%", " Command:", $11}'
  ```

  - Explanation:
    - `... | head -n 3`: Produces the top 3 lines (stdout).
    - `|`: Pipes the stdout of `head` to the stdin of `awk`.
    - `awk '{print "PID:", $2, " Memory:", $4 "%", " Command:", $11}'`: Reads the 3 lines from stdin. For each line, it prints the literal string "PID:", then the 2nd field (`$2`), then " Memory:", the 4th field (`$4`), "%", " Command:", and the 11th field (`$11`). (Note: `$11` might only be the start of the command if it contains spaces; for the full command, more complex `awk` or different `ps` options like `ps -u $USER -o pid,pmem,comm --sort=-pmem | head -n 4` might be better, but this illustrates the pipe concept.)
  - Observe the output: A cleaner display showing PID, Memory %, and Command for your top 3 memory users.
This workshop demonstrated how pipes (|) allow you to chain commands together, progressively filtering and transforming data to achieve a specific result without needing intermediate files. Each command in the pipeline performs a specialized task on the data it receives from the previous one.
5. Combining Pipes and Redirection
The true power of the shell often lies in using pipes and redirection together within the same command line. This allows you to create sophisticated workflows where data flows between commands via pipes, while also reading initial input from files or sending final (or intermediate) output and errors to files.
Order of Operations
Understanding how the shell processes a command line with both pipes and redirection is important:
- Parsing: The shell first parses the entire command line, identifying commands, pipes (`|`), and redirection operators (`<`, `>`, `>>`, `2>`, `2>>`, `&>`, etc.).
- Redirection Setup: Before executing any commands, the shell sets up all the specified redirections. This means it opens or creates the necessary files and connects the appropriate file descriptors (0, 1, 2) of the future commands to these files or away from the terminal.
- Pipe Setup: The shell sets up the pipes between commands specified by the `|` operator, connecting the stdout of one command to the stdin of the next in the chain.
- Command Execution: Finally, the shell executes the commands. The commands themselves run, reading from whatever their stdin is now connected to and writing to whatever their stdout and stderr are now connected to.
Because redirection is set up before commands run, a command generally doesn't know whether its input/output is connected to the terminal, a file, or a pipe – it just uses file descriptors 0, 1, and 2.
Common Combinations
- Pipe output to a file: Process data through a pipeline and save the final result.
  - Syntax: `command1 | command2 > output_file`
  - Example: Find all `.log` files in your home directory, sort them, and save the list to `log_files.txt`: `find ~ -name "*.log" | sort > log_files.txt`. `find`'s stdout goes into the pipe. `sort` reads from the pipe (stdin), sorts, and its stdout is redirected by `>` to `log_files.txt`.
- Read input from a file into a pipeline: Feed data from a file into the start of a pipeline.
  - Syntax: `command1 < input_file | command2`
  - Example: Count the unique lines in a file `data.txt`: `sort < data.txt | uniq | wc -l`. `sort`'s stdin is redirected by `<` to read from `data.txt`. Its sorted output (stdout) goes into the first pipe. `uniq` reads from the pipe (stdin), removes adjacent duplicates, and its output (stdout) goes into the second pipe. `wc -l` reads the unique lines from the pipe (stdin) and counts them, printing the result to its stdout (the terminal).
- Redirecting Errors Within a Pipeline: You can redirect stderr for any command within the pipeline.
  - Example: Redirect `command1`'s errors, pipe its stdout: `command1 2> errors.log | command2`. `command1`'s stderr (fd 2) is redirected to `errors.log`. `command1`'s stdout (fd 1) goes into the pipe to `command2`. `command2`'s stdin (fd 0) reads from the pipe. Its stdout and stderr go to their default locations (usually the terminal).
  - Example: Redirect `command2`'s errors: `command1 | command2 2> errors.log`. `command1`'s stdout goes into the pipe. Its stderr goes to the terminal. `command2` reads from the pipe. Its stdout goes to the terminal. Its stderr is redirected to `errors.log`.
- Complex Example: Read from a file, process through several commands, save the final output, and log all errors (from all commands in the pipeline) to a separate file. This is tricky because stderr redirection typically applies to a single command. To capture stderr from the whole pipeline, you often group the commands.
  - Using a subshell `(...)`: `(command1 < input.txt | command2 | command3) > output.log 2> pipeline_errors.log`
    - The parentheses create a subshell.
    - `command1`'s stdin comes from `input.txt`.
    - Pipes connect `command1` -> `command2` -> `command3` within the subshell.
    - The entire subshell's stdout (which is the stdout of the final command, `command3`) is redirected by `>` to `output.log`.
    - The entire subshell's stderr (which includes stderr from `command1`, `command2`, and `command3`, unless redirected individually inside) is redirected by `2>` to `pipeline_errors.log`.
Combining pipes and redirection allows for extremely flexible command construction, forming the backbone of many shell scripts and data processing tasks.
Workshop: Filtering and Logging Web Server Data
In this workshop, we'll combine pipes and redirection to process a sample web server log file, extracting specific information into one file and potential errors into another.
Objective: Practice using pipes (|) and redirection (<, >, 2>) together in a single command line for data processing and logging.
Scenario: We have a simplified web server access log file. We want to extract all lines corresponding to successful image file requests (.jpg, .png, .gif) and save them to image_access.log. We also want to capture any potential errors during processing (e.g., if our filtering command had an issue, although we'll simulate this simply) into processing_errors.log.
Steps:
- Create Sample Log File: Create a file named `weblog_sample.txt` with the following content (using a text editor or a here document):

  ```
  192.168.1.10 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1500
  192.168.1.12 - - [10/Oct/2023:10:00:05 +0000] "GET /images/logo.png HTTP/1.1" 200 5670
  192.168.1.10 - - [10/Oct/2023:10:00:10 +0000] "GET /styles.css HTTP/1.1" 200 800
  192.168.1.15 - - [10/Oct/2023:10:01:15 +0000] "GET /background.jpg HTTP/1.1" 200 102400
  192.168.1.12 - - [10/Oct/2023:10:01:20 +0000] "GET /favicon.ico HTTP/1.1" 404 500
  192.168.1.10 - - [10/Oct/2023:10:02:00 +0000] "POST /submit.php HTTP/1.1" 200 300
  192.168.1.15 - - [10/Oct/2023:10:02:05 +0000] "GET /images/icon.gif HTTP/1.1" 200 1234
  MALFORMED LINE - This should cause an error maybe?
  ```

  - Explanation: This file contains typical log entries (IP, date, request, status code, size) and one intentionally malformed line.
- Filter for Successful Image Requests and Save Output: We'll use `grep` to find lines containing `GET` requests for `.png`, `.jpg`, or `.gif` files that were successful (status code 200). We read input from the file using `<` and save the filtered output using `>`: `grep 'GET .*\.\(jpg\|png\|gif\) .* 200' < weblog_sample.txt > image_access.log`
  - Explanation:
    - `grep '...'`: The command doing the filtering.
      - `GET .*`: Matches lines containing "GET " followed by any characters (`.*`).
      - `\.`: Matches a literal dot.
      - `\(jpg\|png\|gif\)`: Matches "jpg" OR "png" OR "gif". The parentheses `\(\)` group the alternatives `\|`. (Note: With `grep -E`, you could use `.(jpg|png|gif)`.)
      - `.* 200`: Matches any characters followed by the status code " 200".
    - `< weblog_sample.txt`: Redirects `grep`'s standard input to come from the `weblog_sample.txt` file.
    - `> image_access.log`: Redirects `grep`'s standard output (the matching lines) to the file `image_access.log`, overwriting it if it exists.
  - Verify the output file with `cat image_access.log` (it should contain the lines for logo.png, background.jpg, and icon.gif).
- Introduce a Pipe and Redirect Errors: Let's refine the previous step. Suppose we want to count how many successful image requests we found. We can pipe the output of `grep` to `wc -l`. We also want to capture any potential errors from the `grep` command itself (though unlikely here, we simulate the need): `grep 'GET .*\.\(jpg\|png\|gif\) .* 200' < weblog_sample.txt 2> processing_errors.log | wc -l`
  - Explanation:
    - `grep ... < weblog_sample.txt`: Reads from the file, filters.
    - `2> processing_errors.log`: Redirects `grep`'s standard error to `processing_errors.log`. If `grep` encountered an issue (e.g., an invalid pattern, though not in this case), the error message would go here.
    - `|`: Pipes `grep`'s standard output (the matching lines) to the standard input of `wc -l`.
    - `wc -l`: Reads the lines from the pipe and counts them, printing the result to its standard output (the terminal).
  - Observe the terminal: You should see the count (e.g., `3`).
  - Check the error file with `cat processing_errors.log` (this file should likely be empty, as our `grep` command was valid).
- Combine Input, Pipe, Output, and Error Redirection: Now, let's extract the IP addresses of the successful image requests, find the unique IPs, and save the list, while also logging any errors from the entire process (using a subshell):

  ```bash
  (grep 'GET .*\.\(jpg\|png\|gif\) .* 200' < weblog_sample.txt | awk '{print $1}' | sort | uniq) > unique_image_ips.log 2> pipeline_errors_combined.log
  ```

  - Explanation:
    - `(...)`: Groups the commands in a subshell.
    - `grep ... < weblog_sample.txt`: Reads from the file and filters for successful image requests. Its stdout goes to the first pipe.
    - `| awk '{print $1}'`: `awk` reads the filtered log lines from the pipe. For each line, it prints only the first field (`$1`, the IP address). Its stdout goes to the second pipe.
    - `| sort`: `sort` reads the IP addresses from the pipe and sorts them alphabetically/numerically. Its stdout goes to the third pipe.
    - `| uniq`: `uniq` reads the sorted IP addresses from the pipe and removes adjacent duplicates, printing only the unique IPs. Its stdout is the final output of the pipeline within the subshell.
    - `> unique_image_ips.log`: Redirects the final stdout of the subshell (which comes from `uniq`) to the file `unique_image_ips.log`.
    - `2> pipeline_errors_combined.log`: Redirects the combined stderr of all commands within the subshell to the file `pipeline_errors_combined.log`.
  - Verify the results with `cat unique_image_ips.log` (it should contain the unique IP addresses 192.168.1.12 and 192.168.1.15, sorted).
  - Check the combined error log with `cat pipeline_errors_combined.log` (it should still be empty if all commands ran successfully).
This workshop demonstrated how to weave together input redirection (<), pipes (|), output redirection (>), and error redirection (2>), using subshells (...) when necessary to capture output or errors from an entire pipeline. This enables complex data extraction and processing directly on the command line.
6. Essential Companion Commands for Pipes and Redirection
While pipes and redirection provide the mechanism for connecting commands and managing I/O streams, their true power comes from the rich ecosystem of Linux command-line utilities designed to work with text streams. These utilities often follow the Unix philosophy: do one thing, do it well, and work together. Here are some of the most essential commands you'll frequently use in pipelines:
Core Text Manipulation Tools
- `cat` (Concatenate):
  - Use Case: Display file contents, combine multiple files, or feed a file into a pipeline (though `command < file` is often preferred for single files).
  - Example: `cat file1.txt file2.txt | grep 'error'` (Combine files, then filter.)
- `grep` (Global Regular Expression Print):
  - Use Case: Search for lines matching a pattern (regular expression) in its input (stdin or files). Indispensable for filtering data. Options like `-i` (ignore case), `-v` (invert match), `-E` (extended regex), and `-o` (show only matching part) are very useful.
  - Example: `ps aux | grep 'nginx'` (Find processes related to nginx.)
- `sort`:
  - Use Case: Sort lines of text alphabetically, numerically (`-n`), in reverse order (`-r`), based on specific fields/keys (`-k`), or uniquely (`-u`, often less efficient than `sort | uniq`).
  - Example: `cat data.txt | sort -nr` (Sort data numerically, highest first.)
- `uniq`:
  - Use Case: Filter out duplicate adjacent lines from a sorted input stream. Often used after `sort`. The `-c` option counts occurrences, `-d` shows only duplicated lines, `-u` shows only unique lines.
  - Example: `sort access.log | uniq -c` (Count occurrences of each unique line in the log.)
- `wc` (Word Count):
  - Use Case: Count lines (`-l`), words (`-w`), characters (`-m`), or bytes (`-c`) in its input.
  - Example: `ls /usr/bin | wc -l` (Count the number of files in /usr/bin.)
- `head`:
  - Use Case: Display the first few lines (default 10) of its input. Use `-n <num>` to specify the number of lines.
  - Example: `ls -t | head -n 5` (Show the 5 most recently modified files.)
- `tail`:
  - Use Case: Display the last few lines (default 10) of its input. Use `-n <num>` to specify the number of lines. The `-f` option "follows" a file, continuously displaying new lines as they are added (great for monitoring logs).
  - Example: `dmesg | tail -n 20` (Show the last 20 kernel messages.)
  - Example: `tail -f /var/log/syslog` (Monitor the system log in real-time.)
- `tr` (Translate):
  - Use Case: Translate or delete characters from its input stream. Useful for case conversion, replacing characters, deleting characters (`-d`), squeezing repeats (`-s`).
  - Example: `cat file.txt | tr '[:upper:]' '[:lower:]'` (Convert file content to lowercase.)
  - Example: `echo "Hello World" | tr -s ' '` (Squeeze multiple spaces into single spaces.)
More Advanced Stream Editors and Processors
- `sed` (Stream Editor):
  - Use Case: Perform text transformations on an input stream based on script commands (most commonly substitution, `s/pattern/replacement/`). It processes input line by line. Powerful for search-and-replace, selective deletion, and more complex editing tasks within a pipeline.
  - Example: `cat file.txt | sed 's/error/warning/g'` (Replace all occurrences of "error" with "warning".)
- `awk`:
  - Use Case: A versatile pattern scanning and processing language. Excellent at processing text files structured into fields (columns). Can perform calculations, reformat output, filter based on field values, and much more. It reads input line by line, splitting each line into fields (`$1`, `$2`, etc.).
  - Example: `ls -l | awk '{print $1, $5, $9}'` (Print permissions, size, and filename from `ls -l` output.)
  - Example: `cat data.csv | awk -F',' '$3 > 100 {print $1}'` (From a comma-separated file, print the first field if the third field is greater than 100.)
Utility for Viewing and Splitting Streams
- `tee`:
  - Use Case: Reads from standard input and writes to both standard output and one or more files. Extremely useful for viewing data at an intermediate point in a pipeline while also saving it to a file, or for sending output to multiple destinations.
  - Example: `command1 | tee intermediate_output.log | command2` (Run `command1`, save its output to `intermediate_output.log`, and pass the same output via stdout to `command2` through the pipe.)
  - Example: `make | tee build.log` (Run `make`, save the entire build output to `build.log`, and also display it on the terminal.) Use `tee -a` to append to the file.
Mastering these companion commands alongside pipes and redirection unlocks the vast potential of the Linux command line for data manipulation and system administration.
Workshop: Text Processing Pipeline
In this workshop, we'll build a pipeline using several of the essential companion commands to process a raw text file, clean it up, and calculate word frequencies.
Objective: Construct a multi-stage pipeline using cat, tr, sort, uniq, and wc (implicitly via uniq -c) to perform a common text analysis task. We will also use tee.
Scenario: We have a text file containing mixed-case words, punctuation, and repeated words. We want to find the frequency of each word, ignoring case and punctuation, and display the most frequent words first.
Steps:
- Create Sample Text File: Create a file named `sample_article.txt` containing a short paragraph of mixed-case text with punctuation and repeated words (one possible version is given at the end of this workshop).
- View the Raw File: `cat sample_article.txt`
- Build the Pipeline Step-by-Step (and Observe):
  - Step 3a: Convert to Lowercase: Use `tr` to convert all uppercase letters to lowercase: `cat sample_article.txt | tr '[:upper:]' '[:lower:]'`. Observe the output: All text is now lowercase.
  - Step 3b: Remove Punctuation: Pipe the lowercase text to another `tr` command to delete punctuation characters: add `| tr -d '[:punct:]'` to the previous pipeline. Observe the output: Punctuation like '.', ',', '?' is gone.
  - Step 3c: Put Each Word on a New Line: Pipe the cleaned text to `tr` again to translate spaces into newline characters (`\n`), which prepares the text for word-based sorting: add `| tr ' ' '\n'`. Observe the output: Each word appears on its own line. (Note: This might create empty lines if there were multiple spaces; we'll handle that.)
  - Step 3d: Remove Empty Lines: Add `| grep .` to filter out any empty lines that might have resulted from the previous step. `.` matches any single character, so only lines with at least one character pass through. Observe the output: List of words, one per line, no empty lines.
  - Step 3e: Sort the Words: Pipe the word list to `sort` to group identical words together, which is necessary for `uniq`: add `| sort`. Observe the output: Alphabetically sorted list of words.
  - Step 3f: Count Unique Word Frequencies: Pipe the sorted list to `uniq -c`. `uniq` removes adjacent duplicates, and `-c` prepends the count of each word. The full pipeline so far is `cat sample_article.txt | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr ' ' '\n' | grep '.' | sort | uniq -c`. Observe the output: Lines now look like `4 sample`, `4 text`, etc.
  - Step 3g: Sort by Frequency: Pipe the counted list to `sort -nr` to sort numerically (`-n`) and in reverse (`-r`), putting the most frequent words first: append `| sort -nr` to the previous pipeline. Observe the final output: A list of word frequencies, sorted from most common to least common.
- Use `tee` to Save Intermediate Results: Let's modify the pipeline to save the cleaned, sorted word list before counting, while still getting the final frequency count.

  ```bash
  cat sample_article.txt | \
    tr '[:upper:]' '[:lower:]' | \
    tr -d '[:punct:]' | \
    tr ' ' '\n' | \
    grep '.' | \
    sort | \
    tee sorted_words.log | \
    uniq -c | \
    sort -nr > word_frequencies.log
  ```

  - Explanation:
    - We added `tee sorted_words.log` after the first `sort`.
    - `tee` receives the sorted word list via stdin.
    - It writes this list to the file `sorted_words.log`.
    - It also writes the same list to its stdout, which is connected via the next pipe to `uniq -c`.
    - The final output of `sort -nr` (the frequency list) is redirected using `>` to `word_frequencies.log`.
  - Verify the created files: You now have both the intermediate sorted word list and the final frequency count saved in separate files.
This workshop demonstrated how to chain multiple text processing utilities together using pipes to perform a complex task, and how tee can be used to capture intermediate data within a pipeline without interrupting the flow.
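If you'd like a ready-made input file, here is one possible `sample_article.txt`, created with a here document. The wording is an assumption (any mixed-case text with punctuation and repeated words works); it is written so that "sample" and "text" each occur four times, matching the example counts above:

```bash
cat > sample_article.txt << 'ARTICLE_EOF'
This is a sample text. The sample text is only a sample, but this text
will do. Processing this Sample Text with pipes, sorting it, and counting
word frequencies should be simple. Is it simple? Yes, it is simple.
ARTICLE_EOF
```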
Conclusion
Throughout this exploration, we've journeyed from the fundamental concept of standard streams (stdin, stdout, stderr) and their associated file descriptors (0, 1, 2) to the powerful techniques of I/O redirection and command pipelines.
Redirection (<, >, >>, 2>, 2>>, &>, 2>&1) gives you granular control over where a command's input comes from and where its output and errors go. Whether you need to read data from a file, save results, create logs, suppress messages, or handle errors gracefully, redirection operators are your essential tools. We saw how to direct streams to files, append to existing files, handle stdout and stderr independently or together, and use the special /dev/null device to discard unwanted output. Techniques like Here Documents (<<) and Here Strings (<<<) provide convenient ways to embed input directly in scripts or command lines.
Pipes (|) exemplify the power of the Unix philosophy, enabling you to connect the standard output of one command directly to the standard input of another. This allows you to build sophisticated data processing workflows by chaining together small, specialized utilities like grep, sort, uniq, wc, tr, sed, and awk. Each command acts as a filter or transformer, processing the data stream sequentially without the need for intermediate files, leading to efficient and elegant solutions.
Combining pipes and redirection allows for even more intricate command constructions, reading initial data from files, processing it through multi-stage pipelines, and saving final results and errors to designated locations. Utilities like tee further enhance flexibility by allowing you to "tap into" a pipeline, saving intermediate results while letting the data continue to flow.
Mastering pipes and redirection is not just about learning syntax; it's about adopting a powerful way of thinking about problem-solving on the command line. It encourages breaking down complex tasks into smaller, manageable steps and leveraging the existing toolkit of Linux commands. These concepts are foundational to effective shell scripting, system administration, data analysis, and automation in Linux environments. Continue to practice and experiment – the possibilities for combining these tools are vast and rewarding.