Created: February 5th 2025
Last updated: February 5th 2025
Categories: Linux
Author: Marcus Fleuti

Export and Analyze Postfix Mail Logs as CSV or Markdown Table with Subject Lines Using Bash and Python

Managing a mail server can be challenging, especially when you need to analyze email patterns or track specific communications. While Postfix provides robust logging capabilities, extracting meaningful information from these logs often requires complex parsing and processing. In this guide, we'll walk through a script that combines Bash and Python to extract mail log entries, including decoded subject lines, and export them as CSV or Markdown tables.

Why Track Mail Logs with Subjects?

Email log analysis is crucial for various scenarios. System administrators and IT professionals frequently need to analyze mail logs for troubleshooting delivery issues, monitoring email patterns and volume, auditing email communications, investigating security incidents, and generating usage reports.

The default Postfix configuration doesn't include subject lines in its logs, which makes these tasks harder. The setup below adds a header_checks rule so that Postfix logs each message's Subject header, and the script then exports the matching entries as CSV or Markdown tables.

Prerequisites and Setup

Before implementing the script, it's important to ensure your system meets all necessary requirements. Let's walk through each component needed for successful implementation.

System Requirements

Your server needs to have these core components installed and properly configured:

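  • Postfix mail server (installed and configured)
  • PCRE lookup table support for Postfix (on Debian and Ubuntu this is the separate postfix-pcre package)
  • Bash 4.0 or later (the script uses ${var,,} and ${var^^} case conversion; standard on current Linux distributions)
  • Python 3.x (for subject line decoding functionality)
  • GNU grep with PCRE support (the script calls grep -P) and zcat for compressed rotated logs
  • Basic command-line knowledge
  • Root or sudo access for configuration changes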

Required Python Modules

One of the advantages of our script is that it relies entirely on Python's standard library, so no additional packages need to be installed. The decoder script imports these built-in modules:

  • email.header - Decodes MIME-encoded (RFC 2047) headers; its decode_header() function does the actual work, for both Base64 and quoted-printable encoded words
  • quopri - Quoted-printable encoding support
  • base64 - Base64 encoding support
  • re - Regular expression operations
  • codecs - Character encoding support

Only email.header is called directly; the other four imports are currently unused.
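
The decoding step can be tried on its own. Below is a minimal sketch built on email.header.decode_header(), which returns a list of (part, charset) pairs: the original str when the header contains no encoded words, otherwise bytes for every part, with charset set to None for unencoded runs. The function name mirrors the script's helper, but this trimmed version (no charset alias table, UTF-8 assumed for unlabelled parts) is a simplification rather than the script's exact code.

```python
import email.header


def decode_subject(subject: str) -> str:
    """Decode an RFC 2047 subject such as '=?utf-8?q?...?=' into plain text."""
    decoded = []
    for part, charset in email.header.decode_header(subject):
        if isinstance(part, bytes):
            # Unencoded runs come back as bytes with charset None; assume UTF-8 for those.
            decoded.append(part.decode(charset or "utf-8", errors="replace"))
        else:
            # No encoded words at all: decode_header() hands back the original str.
            decoded.append(part)
    return "".join(decoded).strip()


if __name__ == "__main__":
    print(decode_subject("=?utf-8?q?Hello_W=C3=B6rld?="))  # Q-encoded UTF-8
    print(decode_subject("Re: =?iso-8859-1?q?Caf=E9?="))   # plain text mixed with Latin-1
    print(decode_subject("Plain subject"))                  # nothing to decode
```

Running it prints Hello Wörld, Re: Café and Plain subject, one per line.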

Configuring Postfix for Subject Logging

To enable subject line logging in Postfix, you'll need to make some configuration changes. Follow these steps carefully:

1. First, create or edit the header checks configuration file:

sudo mkdir -p /etc/postfix/maps/
sudo nano /etc/postfix/maps/header_checks.pcre

2. Add this content to enable subject logging:

/^Subject:/     WARN

3. Configure Postfix to use the header checks by editing the main configuration:

sudo nano /etc/postfix/main.cf

4. Add or modify this line in main.cf:

header_checks = pcre:/etc/postfix/maps/header_checks.pcre

5. Apply the changes by reloading Postfix:

sudo postfix reload
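
Once Postfix is reloaded, the cleanup daemon writes a warning line for every Subject header, and those lines are what the script extracts. The Python sketch below applies the same rules as the script's cut and grep -oP calls (the first 15 characters as the timestamp, the subject up to " from host[", then the from=<...> and to=<...> addresses) to a sample line. The sample is illustrative only: the host names, queue ID and addresses are made up, and the exact layout can vary with the Postfix version and syslog configuration.

```python
import re

# Illustrative header_checks WARN line (hypothetical host, queue ID and addresses).
SAMPLE = (
    "Feb  5 10:15:01 mail postfix/cleanup[12345]: 4F2A1C0123: warning: "
    "header Subject: Quarterly report from mail.example.com[203.0.113.5]; "
    "from=<alice@example.com> to=<bob@example.org> proto=ESMTP "
    "helo=<mail.example.com>"
)

# Same patterns as the Bash script; Python's re supports the lazy quantifier and lookahead.
SUBJECT_RE = re.compile(r"Subject: (.*?)(?= from [^ ]+\[|$)")
FROM_RE = re.compile(r"from=<([^>]+)")
TO_RE = re.compile(r"to=<([^>]+)")


def parse_line(line: str) -> dict:
    """Split one log line into the four columns the script exports."""

    def first(rx):
        match = rx.search(line)
        return match.group(1) if match else ""

    return {
        "datetime": line[:15],  # classic syslog timestamp, e.g. 'Feb  5 10:15:01'
        "subject": first(SUBJECT_RE),
        "from": first(FROM_RE),
        "to": first(TO_RE),
    }


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```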

Script Implementation

Below is the complete implementation of our Postfix mail log export script. The script combines Bash for log processing and Python for robust subject line decoding:

#!/bin/bash

# ===========================================
# Postfix Mail Log Export Script
# ===========================================
# This script processes Postfix mail logs and exports email information
# into either markdown tables or CSV format. It supports processing of
# both current and historical (rotated) log files, with options to
# handle compressed archives.

# ===========================================
# Configuration Variables
# ===========================================
# Base path for mail log files. Rotated logs will be searched as:
# LOG_PATH.1, LOG_PATH.2.gz, etc.
LOG_PATH="/var/log/mail.log"

# Character used to wrap subject text in markdown output.
# This helps prevent conflicts with markdown table separators
# and makes the output more reliable when copy/pasting.
SUBJECT_SEP=""

# When true, subjects starting with '=' will be prefixed with
# a single quote. This prevents Excel/OpenOffice from interpreting
# the subject as a formula when importing the data.
QUOTE_EQUAL=true

# ===========================================
# Help Function
# ===========================================
show_usage() {
    # Display script usage and available command line options
    echo "Usage: $0 -r <include_rejected> -f <format> -c <case_sensitive> -n <history_count>"
    echo "Options:"
    echo "  -r: Include rejected emails (true/false)"
    echo "  -f: Output format (markdown/csv)"
    echo "  -c: Case sensitive email matching (true/false)"
    echo "  -n: Number of history files to process (number or 'a' for all)"
    echo "  -h: Show this help"
    exit 1
}

# ===========================================
# Command Line Argument Processing
# ===========================================
# Parse command line options using getopts
while getopts "r:f:c:n:h" opt; do
    case $opt in
        r) include_rejected=$OPTARG ;;  # Control inclusion of rejected mails
        f) output_format=$OPTARG ;;     # Output format selection
        c) case_sensitive=$OPTARG ;;    # Email matching case sensitivity
        n) history_count=$OPTARG ;;     # Number of historic files to process
        h) show_usage ;;                # Show help
        *) show_usage ;;                # Invalid option
    esac
done

# Verify all required arguments are provided
if [[ -z $include_rejected || -z $output_format || -z $case_sensitive || -z $history_count ]]; then
    echo -e "\n\nERROR: Missing parameters\n\n"
    show_usage
fi

# Normalize option values: lowercase the true/false flags, uppercase the format
include_rejected=${include_rejected,,}
output_format=${output_format^^}
case_sensitive=${case_sensitive,,}

# Validate the various formats...
if [[ $output_format != "MARKDOWN" && $output_format != "CSV" ]]; then
    echo -e "\n\nERROR: Output format must be either 'markdown' or 'csv' (case insensitive)\n\n"
    show_usage
fi

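if [[ $include_rejected != "true" && $include_rejected != "false" ]]; then
    echo -e "\n\nERROR: Include rejected must be either 'true' or 'false'\n\n"
    show_usage
fi

if [[ $case_sensitive != "true" && $case_sensitive != "false" ]]; then
    echo -e "\n\nERROR: Case sensitive must be either 'true' or 'false'\n\n"
    show_usage
fi

if [[ $history_count != "a" && ! $history_count =~ ^[0-9]+$ ]]; then
    echo -e "\n\nERROR: History count must be a non-negative number or 'a'\n\n"
    show_usage
fi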

# ===========================================
# Log File Discovery
# ===========================================
# Count available history files by incrementing counter
# until no more log files (plain or compressed) are found
max_history=1
while [ -f "${LOG_PATH}.${max_history}.gz" ] || [ -f "${LOG_PATH}.${max_history}" ]; do
    ((max_history++))
done
((max_history--))

# Status messages go to stderr so they don't end up in redirected CSV/Markdown output
echo -e "\n\nNOTE: Found $max_history [ ${LOG_PATH} ] history files...\n\n" >&2

# Handle history file count selection
if [[ $history_count == "a" ]]; then
    # Process all available history files
    history_count=$max_history
elif [[ $history_count -gt $max_history ]]; then
    # Adjust if requested count exceeds available files
    echo "Warning: Only $max_history history files available. Using that instead." >&2
    history_count=$max_history
fi

# ===========================================
# Email Pattern Configuration
# ===========================================
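# Prompt for and process email patterns.
# The prompt goes to stderr so that redirecting stdout (e.g. > export.csv)
# captures only the table/CSV data.
{
    echo "╔═══════════════════════════════════════════════════╗"
    echo "║ Enter recipient email patterns (space-separated): ║"
    echo "║ Examples:                                         ║"
    echo "║  - user@domain.tld                                ║"
    echo "║  - *@domain.tld (all recipients at domain)        ║"
    echo "║  - user@* (specific user at any domain)           ║"
    echo "╚═══════════════════════════════════════════════════╝"
    echo ""
} >&2
# Read into an array so patterns like *@domain.tld are not glob-expanded
read -r -p "Recipient(s): " -a recipients

# Convert email patterns into a grep-compatible (ERE) regex
email_patterns=""
for recipient in "${recipients[@]}"; do
    # Add separator between multiple patterns
    if [[ -n $email_patterns ]]; then
        email_patterns+="|"
    fi
    # Escape regex metacharacters such as '.' and '+', then turn '*' into a
    # wildcard that cannot run past the closing '>' of to=<...>
    recipient=$(printf '%s\n' "$recipient" | sed -e 's/[.+?^$(){}|]/\\&/g' -e 's/\*/[^>]*/g')
    email_patterns+="($recipient)"
done

# Abort if no pattern was entered
if [[ -z $email_patterns ]]; then
    echo -e "\n\nERROR: At least one recipient pattern is required\n\n" >&2
    exit 1
fi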

# ===========================================
# Subject Decoder Setup
# ===========================================
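# Create temporary Python script for decoding email subjects.
# This handles various character encodings and MIME formats.
# The EXIT trap removes the file when the script ends, even if the run is interrupted.
trap 'rm -f /tmp/decode_subject.py' EXIT
cat > /tmp/decode_subject.py << 'EOF'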
import sys
import email.header
import quopri
import base64
import re
import codecs

def decode_subject(subject):
    # Map charset labels seen in the wild to Python codec names. Keys must be
    # lowercase because decode_header() returns lowercased charset names.
    charset_map = {
        'windows-1252': 'cp1252',
        'iso-8859-1': 'latin1',
        'iso-8859-2': 'latin2',
        'iso-8859-15': 'latin9',
        'ks_c_5601-1987': 'cp949',
        'gb2312': 'gb2312',
        'big5': 'big5',
        'shift_jis': 'cp932',
        'euc-jp': 'euc_jp',
        'koi8-r': 'koi8_r'
    }
    try:
        # Separate adjacent encoded words ("?==?") so each is decoded on its own
        subject = subject.replace('?==?', '?= =?')
        parts = email.header.decode_header(subject)
        decoded_parts = []
        for part, charset in parts:
            if isinstance(part, bytes):
                codec = charset_map.get(charset.lower(), charset) if charset else 'utf-8'
                try:
                    decoded_parts.append(part.decode(codec, errors='replace'))
                except LookupError:
                    # Unknown charset label: fall back to UTF-8 instead of giving up
                    decoded_parts.append(part.decode('utf-8', errors='replace'))
            else:
                decoded_parts.append(part)
        return ''.join(decoded_parts).strip()
    except Exception:
        # Malformed header: return the subject undecoded
        return subject

if __name__ == '__main__':
    if len(sys.argv) > 1:
        print(decode_subject(sys.argv[1]))
EOF

# ===========================================
# Output Header Generation
# ===========================================
# Print appropriate header based on chosen format

echo -e "\n\nNOTE: Starting [ ${output_format} ] log extraction process...\n\n" >&2
if [[ $output_format == "MARKDOWN" ]]; then
    printf "| DateTime | Subject | From | To |\n"
    printf "|----------|---------|------|----|\n"
elif [[ $output_format == "CSV" ]]; then
    printf "DateTime,Subject,From,To\n"
fi

# ===========================================
# Log Processing
# ===========================================
# Process each log file, starting with oldest
for ((i=history_count; i>=0; i--)); do
    # Determine current file path
    current_file="${LOG_PATH}.${i}"
    if [[ $i == 0 ]]; then
        current_file="${LOG_PATH}"  # Handle current log file
    fi
    
    # Choose appropriate cat command based on compression
    if [[ -f "${current_file}.gz" ]]; then
        cat_cmd="zcat"
        current_file="${current_file}.gz"
    else
        cat_cmd="cat"
    fi
    
    # Process file if it exists
    if [[ -f $current_file ]]; then
        # Configure grep options based on case sensitivity
        grep_opts=(-E)
        if [[ $case_sensitive == "false" ]]; then
            grep_opts=(-iE)
        fi

        # Select matching lines (optionally dropping rejected mails) and
        # process them one by one. The pipeline is written out directly
        # instead of going through eval, so quotes or other shell characters
        # in file names or recipient patterns cannot break the command.
        "$cat_cmd" "$current_file" \
            | grep "${grep_opts[@]}" "from=<[^>]*>[[:space:]]to=<(${email_patterns})>" \
            | if [[ $include_rejected == "false" ]]; then grep -v ': reject: '; else cat; fi \
            | while IFS= read -r line; do
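            # Extract datetime from log line (assumes the classic 15-character
            # syslog timestamp such as "Feb  5 10:15:01"; high-precision
            # RFC 3339 timestamps would be cut short)
            datetime=${line:0:15}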
            
            # Extract and decode subject
            raw_subject=$(echo "$line" | grep -oP 'Subject: \K.*?(?= from [^ ]+\[|$)' | tr -d '\n')
            
            # Decode MIME-encoded subjects
            if [[ $raw_subject == *"=?"* ]]; then
                decoded_subject=$(python3 /tmp/decode_subject.py "$raw_subject")
                [ -z "$decoded_subject" ] && decoded_subject=$raw_subject
            else
                decoded_subject=$raw_subject
            fi
            
            # Handle subjects starting with equals sign
            if [[ $QUOTE_EQUAL == true && $decoded_subject == "="* ]]; then
                decoded_subject="'$decoded_subject"
            fi
            
            # Extract email addresses
            from_email=$(echo "$line" | grep -oP 'from=<\K[^>]+')
            to_email=$(echo "$line" | grep -oP 'to=<\K[^>]+')
            
            # Output in selected format
            if [[ $output_format == "MARKDOWN" ]]; then
                # Escape markdown table separators and backticks in the subject.
                # Single quotes let sed receive '\\', which it writes as a literal backslash.
                decoded_subject=$(printf '%s\n' "$decoded_subject" | sed 's/|/\\|/g; s/`/\\`/g')
                printf "| %s | %s%s%s | %s | %s |\n" \
                    "$datetime" \
                    "$SUBJECT_SEP" \
                    "$decoded_subject" \
                    "$SUBJECT_SEP" \
                    "$from_email" \
                    "$to_email"
            elif [[ $output_format == "CSV" ]]; then
                # Escape quotes in subject for CSV (RFC 4180: embedded quotes are doubled)
                decoded_subject=$(printf '%s\n' "$decoded_subject" | sed 's/"/""/g')
                printf "%s,\"%s\",%s,%s\n" \
                    "$datetime" \
                    "$decoded_subject" \
                    "$from_email" \
                    "$to_email"
            fi
        done
    fi
done

# Cleanup temporary files
rm /tmp/decode_subject.py

Save this script as postfix-mail-log-export.sh and make it executable:

chmod +x postfix-mail-log-export.sh

Script Features and Capabilities

Our script offers a comprehensive set of features designed to make mail log analysis both powerful and flexible. Let's explore each major capability in detail.

Flexible Output Formats

The script provides two output format options, each serving different needs:

  • CSV format - Perfect for detailed analysis in spreadsheet applications
  • Markdown tables - Ideal for documentation and reporting
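
Both formats are built from the same four fields; only the escaping differs. Here is a minimal Python sketch of the escaping rules described in the script's comments (CSV: wrap the subject in double quotes and double any embedded quotes; Markdown: backslash-escape pipes and backticks so they cannot break the table). The helper names csv_row and markdown_row are illustrative, and subject_sep stands in for the script's SUBJECT_SEP setting.

```python
def csv_row(datetime: str, subject: str, sender: str, recipient: str) -> str:
    """Build one CSV line; only the subject is quoted, with embedded quotes doubled."""
    quoted_subject = '"' + subject.replace('"', '""') + '"'
    return ",".join([datetime, quoted_subject, sender, recipient])


def markdown_row(datetime: str, subject: str, sender: str, recipient: str,
                 subject_sep: str = "") -> str:
    """Build one Markdown table row with pipes and backticks escaped in the subject."""
    escaped = subject.replace("|", "\\|").replace("`", "\\`")
    return f"| {datetime} | {subject_sep}{escaped}{subject_sep} | {sender} | {recipient} |"


if __name__ == "__main__":
    print("DateTime,Subject,From,To")
    print(csv_row("Feb  5 10:15:01", 'Say "hi", please', "alice@example.com", "bob@example.org"))
    print("| DateTime | Subject | From | To |")
    print("|----------|---------|------|----|")
    print(markdown_row("Feb  5 10:15:01", "Build | status `ok`", "alice@example.com", "bob@example.org"))
```

Quoting only the subject is enough here because the timestamp and the addresses normally contain no commas, while subjects frequently do.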

Advanced Email Matching

The pattern matching system supports sophisticated filtering options:

  • Wildcard pattern support for flexible matching
  • Case-sensitive or case-insensitive matching options
  • Multiple recipient pattern matching in a single run
  • Domain-specific or user-specific filtering

Robust Subject Handling

The script excels at handling complex email subjects:

  • Automatic MIME-encoded subject decoding
  • Support for multiple character sets and encodings
  • Optional single-quote prefix for subjects starting with '=', so spreadsheet applications do not treat them as formulas (QUOTE_EQUAL)
  • Preservation of Unicode characters
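
The spreadsheet-safe rule is small enough to show in full. This Python sketch mirrors the script's QUOTE_EQUAL check; the function name spreadsheet_safe is illustrative. Spreadsheet applications may also interpret cells starting with +, - or @ as formulas, but the script only guards against '=', and so does this sketch.

```python
QUOTE_EQUAL = True  # same meaning as the QUOTE_EQUAL switch in the Bash script


def spreadsheet_safe(subject: str, quote_equal: bool = QUOTE_EQUAL) -> str:
    """Prefix subjects that start with '=' with a single quote so they import as text."""
    if quote_equal and subject.startswith("="):
        return "'" + subject
    return subject


if __name__ == "__main__":
    for s in ("=SUM(A1:A3)", "Meeting notes"):
        print(spreadsheet_safe(s))
```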

[Screenshots: CSV output, Markdown table output, and help output]

Comparison with Other Solutions

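| Feature                 | This Script | Manual Grep | Logwatch |
|-------------------------|-------------|-------------|----------|
| Subject line support    | Yes         | Limited     | No       |
| MIME decoding           | Yes         | No          | No       |
| Multiple output formats | Yes         | No          | Limited  |
| Pattern matching        | Advanced    | Basic       | Basic    |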

Usage Examples

Let's explore some practical examples of using the script in different scenarios.

Basic Usage

Here's a basic command that covers most common use cases:

./postfix-mail-log-export.sh -r false -f csv -c true -n 1

This command configuration:

  • Excludes rejected emails (-r false)
  • Outputs in CSV format (-f csv)
  • Uses case-sensitive matching (-c true)
  • Processes the current log plus one rotated (historical) log file (-n 1)
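
Because the processing loop counts down from the requested history count to 0, -n 1 reads mail.log.1 (or mail.log.1.gz) first and the live mail.log last. The following Python sketch reproduces that file-selection order under the same naming assumptions as the script (LOG_PATH.N or LOG_PATH.N.gz, with the compressed file preferred when both exist); files_to_process is a hypothetical helper name.

```python
import os
import tempfile


def files_to_process(log_path: str, history_count: int) -> list:
    """Return the log files the script would read, oldest first."""
    selected = []
    for i in range(history_count, -1, -1):
        base = log_path if i == 0 else f"{log_path}.{i}"
        # Like the script, prefer the .gz variant when it exists.
        candidate = base + ".gz" if os.path.isfile(base + ".gz") else base
        if os.path.isfile(candidate):
            selected.append(candidate)
    return selected


if __name__ == "__main__":
    # Build a throwaway directory that looks like /var/log with two rotated logs.
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "mail.log")
        for name in ("mail.log", "mail.log.1", "mail.log.2.gz"):
            open(os.path.join(tmp, name), "w").close()
        for path in files_to_process(log, 1):
            print(os.path.basename(path))
```

With the files created above, -n 1 selects mail.log.1 and then mail.log, while -n 2 would start with mail.log.2.gz.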

Advanced Pattern Matching

The script supports sophisticated email pattern matching that can handle various use cases:

  • user@domain.tld - Exact match for specific addresses
  • *@domain.tld - Matches all recipients at a specific domain
  • user@* - Tracks a specific user across all domains

Multiple patterns can be combined by separating them with spaces when entering them at the prompt.
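
Under the hood, each pattern is escaped and becomes one alternative of a single regular expression that is matched against the from=<...> to=<...> part of each log line. Here is a Python sketch of that conversion, using the hypothetical helper name build_recipient_regex. As a design choice of this sketch, * expands to [^>]* rather than a bare .*, so a wildcard cannot run past the closing > of the to=<...> field.

```python
import re


def build_recipient_regex(patterns: str, case_sensitive: bool = True):
    """Compile space-separated wildcard patterns into one regex for log lines."""
    alternatives = []
    for pattern in patterns.split():
        # Escape regex metacharacters such as '.' and '+', then re-enable '*' as a wildcard.
        alternatives.append("(" + re.escape(pattern).replace(r"\*", "[^>]*") + ")")
    if not alternatives:
        raise ValueError("at least one recipient pattern is required")
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(r"from=<[^>]*>\sto=<(" + "|".join(alternatives) + r")>", flags)


if __name__ == "__main__":
    rx = build_recipient_regex("*@example.org user@*")
    print(bool(rx.search("from=<alice@example.com> to=<bob@example.org> proto=ESMTP")))
    print(bool(rx.search("from=<alice@example.com> to=<bob@example.com> proto=ESMTP")))
```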

Common Issues and Solutions

While the script is designed to be robust, you might encounter some common issues. Here's how to address them:

Permission Issues

If you encounter permission-related problems:

  • Verify the script has read access to all log files
  • Use sudo when necessary for accessing system logs
  • Check and adjust log file ownership and permissions
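
To find out which files cause the problem before reaching for sudo, you can check read access for the live log and every rotated copy. The sketch below assumes the script's default LOG_PATH of /var/log/mail.log and uses Python's os.access(); unreadable_logs is an illustrative name. Keep in mind that os.access() reports every file as readable when run as root.

```python
import glob
import os
import sys


def unreadable_logs(log_path: str = "/var/log/mail.log") -> list:
    """Return the current and rotated log files the current user cannot read."""
    # Matches mail.log, mail.log.1, mail.log.2.gz, ... (glob.escape guards special characters).
    candidates = sorted(glob.glob(glob.escape(log_path) + "*"))
    return [path for path in candidates if not os.access(path, os.R_OK)]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/mail.log"
    for blocked in unreadable_logs(path):
        print(f"not readable: {blocked}")
```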

Encoding Problems

For character encoding-related issues:

  • The script handles various character encodings automatically
  • UTF-8 is recommended for all log files
  • Additional charset mappings can be added to the Python decoder script
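
Adding a mapping follows a simple pattern: look up the charset label in lowercase (decode_header() returns lowercased names, so new keys should be lowercase too), translate it to a Python codec name, and fall back to UTF-8 if Python still does not recognize it. A minimal sketch of that idea; CHARSET_MAP and decode_part are illustrative names, not part of the script.

```python
import codecs

# Illustrative alias table; keys are lowercase to match decode_header() output.
CHARSET_MAP = {
    "windows-1252": "cp1252",
    "ks_c_5601-1987": "cp949",
    "shift_jis": "cp932",
}


def decode_part(data: bytes, charset=None) -> str:
    """Decode one header part, mapping known aliases and falling back to UTF-8."""
    label = (charset or "utf-8").lower()
    codec = CHARSET_MAP.get(label, label)
    try:
        codecs.lookup(codec)  # raises LookupError for labels Python does not know
    except LookupError:
        codec = "utf-8"
    return data.decode(codec, errors="replace")


if __name__ == "__main__":
    print(decode_part(b"Caf\xe9", "Windows-1252"))
    print(decode_part(b"plain text", "x-unknown-charset"))
```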

Performance Considerations

To optimize performance when working with large log files:

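  • Use the history count option (-n) to limit the scope of processing
  • Expect longer runs on logs with many MIME-encoded subjects, since the script starts a separate python3 process to decode each one
  • Consider running the script during off-peak hours
  • Monitor system resources during execution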

Maintenance and Updates

To ensure optimal performance and reliability, consider these maintenance aspects:

Regular Maintenance

Implement these maintenance practices:

  • Regularly monitor log rotation settings to prevent excessive file buildup
  • Check disk space usage, especially when saving large exports to files
  • Update charset mappings as needed for new email sources

Script Updates

Keep the script current by:

  • Reviewing and updating regex patterns for new email formats
  • Adding support for new character encodings as needed
  • Maintaining Python compatibility with system updates

Conclusion

This Postfix mail log export script represents a powerful solution for email log analysis, combining the flexibility of Bash with Python's robust text processing capabilities. Whether you're troubleshooting mail delivery issues or conducting security audits, this tool simplifies the process of extracting and analyzing email log data.

Remember to configure Postfix for subject logging and confirm that all prerequisites are in place before running the script. With its flexible output options and wildcard recipient matching, it can take much of the manual work out of routine Postfix mail log analysis.