If you encounter Unicode/encoding errors like:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 6319: invalid start byte
This means your log files contain characters that aren't valid UTF-8. Here are several solutions:
Replace problematic characters with placeholder characters:

    cat /var/log/maillog | python postfix_log_parser.py --encoding-errors=replace

Skip problematic characters entirely:

    cat /var/log/maillog | python postfix_log_parser.py --encoding-errors=ignore

Many legacy systems use Latin-1 encoding:

    cat /var/log/maillog | python postfix_log_parser.py --encoding=latin-1

Let the parser try to detect the encoding automatically:

    cat /var/log/maillog | python postfix_log_parser.py --encoding=auto

Clean the log file before parsing:

    iconv -f latin-1 -t utf-8 /var/log/maillog | python postfix_log_parser.py
    sed 's/[^\x00-\x7F]//g' /var/log/maillog | python postfix_log_parser.py

Other example invocations:

    cat /var/log/maillog | python postfix_log_parser.py --encoding-errors=replace --line-mode
    tail -f /var/log/maillog | python postfix_log_parser.py --line-mode --encoding-errors=replace
    cat /var/log/maillog.1 | python postfix_log_parser.py --encoding=auto --flush-remaining
    zcat /var/log/maillog.*.gz | python postfix_log_parser.py --encoding-errors=replace

- --encoding=utf-8 (default): Expect UTF-8 encoding
- --encoding=latin-1: Use Latin-1/ISO-8859-1 encoding (common in older systems)
- --encoding=cp1252: Use Windows-1252 encoding
- --encoding=auto: Try to auto-detect the encoding
- --encoding-errors=strict (default for most): Fail on encoding errors
- --encoding-errors=replace: Replace problematic characters with �
- --encoding-errors=ignore: Skip problematic characters entirely
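To make the difference between these settings concrete, the sketch below runs the same bytes through Python's built-in codecs with each error handler and encoding. It illustrates standard `bytes.decode` behaviour only, not the parser's internals. The sample bytes and the `decode_with_fallback` helper are hypothetical, and the fallback order is just one plausible way an "auto" mode could work, not a description of what `--encoding=auto` actually does.

```python
# Illustration with Python's standard codecs; not postfix_log_parser.py's own code.
raw = b"status=sent \xb4ok"  # made-up sample; 0xb4 is not a valid UTF-8 start byte

# strict (Python's default handler): decoding fails outright
try:
    raw.decode("utf-8")
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason  # "invalid start byte"

# replace: the bad byte becomes U+FFFD, the replacement character
replaced = raw.decode("utf-8", errors="replace")

# ignore: the bad byte is silently dropped
ignored = raw.decode("utf-8", errors="ignore")

# latin-1 maps every byte value to a code point, so it never fails
latin1 = raw.decode("latin-1")


def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first codec that decodes cleanly."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Reached only if the candidate list omits latin-1.
    return data.decode("utf-8", errors="replace"), "utf-8"


text, used = decode_with_fallback(raw)
```

Note that `ignore` loses data without leaving a trace, while `replace` leaves a visible marker in the output, which is usually easier to audit later.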
For most production environments with mixed encoding issues:
    cat /var/log/maillog | python postfix_log_parser.py --encoding-errors=replace --line-mode

This will:
- Handle encoding errors gracefully by replacing bad characters
- Output each log line immediately (good for real-time processing)
- Continue processing even if some characters are corrupted
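The behaviour described above can be approximated in a few lines of Python. This is a minimal sketch of per-line decoding with immediate output, assuming the input arrives as a binary stream. The names `iter_decoded_lines` and `emit` are hypothetical and are not taken from `postfix_log_parser.py`.

```python
import io
import sys


def iter_decoded_lines(stream, encoding="utf-8", errors="replace"):
    """Yield text lines from a binary stream, decoding each line independently.

    Because every line is decoded on its own, a corrupted byte affects only
    the line it appears in and never aborts the whole run.
    """
    for raw_line in stream:
        yield raw_line.decode(encoding, errors=errors).rstrip("\r\n")


def emit(lines, out=sys.stdout):
    """Write each line and flush immediately, so `tail -f` style input is not held in a buffer."""
    for line in lines:
        out.write(line + "\n")
        out.flush()


# Demo on in-memory bytes (made-up sample); in a real pipeline the
# stream would be sys.stdin.buffer.
sample = io.BytesIO(b"line one \xb4\nline two\n")
lines = list(iter_decoded_lines(sample))
emit(lines)
```

Reading from `sys.stdin.buffer` rather than `sys.stdin` matters in a design like this: it keeps decoding under the script's control instead of relying on the interpreter's default text encoding.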