Created: December 13th 2024
Last updated: December 13th 2024
Categories: Cyber Security
Author: Marcus Fleuti

Advanced Email Fraud Detection: Using Regex Patterns to Catch Sophisticated DHL Phishing Attempts with Unicode Homoglyphs

Tags: DHL scam detection, email authentication, email fraud detection, email security, homoglyph attack prevention, phishing prevention, regex pattern matching, regex word boundaries, spam filtering, Spamassassin, SpamAssassin configuration, SpamAssassin custom rules, SpamAssassin rules, Unicode character detection

Introduction

Email fraud continues to evolve, with scammers employing increasingly sophisticated techniques to bypass traditional security measures. One common target is DHL, the international shipping company, whose brand is frequently exploited in phishing attempts. In this article, we'll dive deep into an advanced regex pattern designed to catch these fraudulent emails, even when they use Unicode homoglyphs and special characters to evade detection.

Understanding the Challenge

Modern phishing attempts often manipulate the email From header with techniques such as homoglyph attacks, where similar-looking characters replace legitimate ones. For example, scammers might replace the Latin "H" in "DHL" with a Cyrillic "Н" that looks identical to human eyes but has a different Unicode value.
Here's a real-world example:

ᎠНᏞ_Express <DHL_Express.13972@sinectis.com.ar>

Most people will not notice that the first "DHL" in the display name is not made of Latin letters at all: it uses Unicode characters from other scripts that merely look like "DHL" in order to impersonate the brand. For the same reason, most spam filters will not recognize this From header as spam. Try it yourself: press CTRL+F in your browser and search for "DHL". The browser will not highlight the first "DHL".
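To make the difference visible, the following minimal Python sketch prints the code point and Unicode name of every character in the genuine and the spoofed brand name. The variable and function names are illustrative and not part of the original rule; the spoofed string is copied from the example header above.

```python
import unicodedata

# Display name fragment copied from the example header, next to the ASCII brand name.
spoofed = "ᎠНᏞ"
genuine = "DHL"

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

for label, text in (("genuine", genuine), ("spoofed", spoofed)):
    print(label)
    for line in describe(text):
        print("   " + line)

# A plain substring search (roughly what CTRL+F does) does not find "DHL".
print("DHL" in spoofed)
```

The two strings render almost identically, yet none of the spoofed characters shares a code point with its Latin counterpart, which is why a literal text match fails.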

Breaking Down the Regex Pattern

Let's analyze our fraud detection pattern piece by piece:

(?:my-?)?(?:[_]|\b)(?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ).?(?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑).?(?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)(?:[_]|\b).*?<(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))

1. General description (in simple words)

The pattern implements an advanced detection mechanism for email sender verification, specifically targeting potential DHL impersonation attempts in email communications. It employs a sophisticated regular expression to analyze the sender information in email headers, with particular focus on the FROM field. The pattern is designed to identify instances where "DHL" appears in the sender's display name while simultaneously verifying that the associated email domain does not correspond to any of DHL's legitimate international domains.

This dual-verification approach creates an effective filter against common phishing tactics where attackers attempt to exploit DHL's brand recognition by using the company name in the sender's display name while sending from unrelated domains. The pattern is particularly powerful as it accounts for various Unicode homoglyphs and character substitutions that malicious actors might employ to bypass simpler detection methods.

When implemented as a SpamAssassin rule, this pattern assigns a high spam score to matching messages. Because mail sent from DHL's own domains is explicitly excluded, the false-positive rate stays low.

2. The Word Boundary Pattern Problem

One of the components of our pattern is (?:[_]|\b). This elegant solution addresses a specific limitation in regex word boundaries. Let's break down why this is important:

- The standard word boundary \b in regex treats the underscore (_) as a word character.
- As a result, \bDHL\b does not match within the string "DHL_Express", because there is no boundary between "L" and "_".
- Our pattern (?:[_]|\b) creates a custom boundary that also treats underscores as separators.
- This catches both traditional word boundaries and underscore-separated text.
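The boundary behaviour can be checked with a short Python sketch. Python's `re` module is used here only for illustration; it treats the underscore as a word character in the same way, but it is not the engine SpamAssassin uses. The sample strings are made up.

```python
import re

plain = re.compile(r"\bDHL\b")                   # standard word boundaries
custom = re.compile(r"(?:[_]|\b)DHL(?:[_]|\b)")  # underscore also counts as a separator

samples = ["DHL Express", "DHL_Express", "My_DHL_Team", "ADHLB"]
for s in samples:
    print(f"{s!r}: plain={bool(plain.search(s))} custom={bool(custom.search(s))}")
```

Only the custom boundary finds "DHL" in the underscore-separated samples, while "ADHLB" is rejected by both because the letters sit inside a longer word.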

3. Homoglyph Detection

The pattern includes extensive character sets for each letter in "DHL":
D: d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ
H: h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑
L: l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ
These character sets include:

Standard ASCII characters (both upper and lowercase). The rule also enables the /i flag, so listing both cases may look redundant, and for plain ASCII letters it is. Depending on how the regex engine and the message encoding interact, however, /i does not reliably fold non-ASCII pairs such as "é" and "É". To ensure proper matching, we combine a full set of characters with the /i mode.
Unicode mathematical variants
Similar-looking characters from different scripts
Decorated versions of the letters

Here's the full character map we have compiled over the past years, which may help you build similar regexes in the future:

a = (?:a|A|А|а|Α|α|ä|ą|𝐚|𝑎|𝒂|𝓪|𝔞|𝕒|𝖆|𝗮|𝘢|𝙖|𝚊)
b = (?:b|B|ʙ|В|в|β|ḃ|ḅ|𝐛|𝑏|𝒃|𝔟|𝕓|𝖇|𝗯|𝘣|𝙗|𝚋)
c = (?:c|C|С|ᥴ|с|ᴄ|ϲ|ċ|ç|𝐜|𝑐|𝒄|𝓬|𝔠|𝕔|𝖈|𝗰|𝘤|𝙘|𝚌)
d = (?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ)
e = (?:e|E|Е|з|є|ɘ|ɛ|ə|𝑒|𝖾|Ԑ|ԑ|ε|ë|ė|ẹ|𝐞|𝒆|𝓮|𝔢|𝖊|𝗲|𝘦|𝙚|𝚎)
f = (?:f|F|ꜰ|𝑓|𝖿|ƒ|𝘧|Ϝ|ϝ|ḟ|𝐟|𝒇|𝔣|𝕗|𝖋|𝗳|𝙛|𝚏)
g = (?:g|G|ɢ|Ԍ|ԍ|Ꮐ|ġ|ğ|𝐠|𝑔|𝒈|𝓰|𝔤|𝕘|𝖌|𝗴|𝘨|𝙜|𝚐)
h = (?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑)
i = (?:i|I|1|\||𝑖|𝖎|𝓲|!|і|í|ı|ɪ|ị|𝐢|𝒊|𝔦|𝕚|𝗶|𝘪|𝙞|𝚒)
j = (?:j|J|Ј|ј|ᴊ|ϳ|ɉ|𝐣|𝑗|𝒋|𝓳|𝔧|𝕛|𝖏|𝗷|𝘫|𝙟|𝚓)
k = (?:k|K|К|к|ᴋ|ĸ|ҟ|𝐤|𝑘|𝒌|𝓴|𝔨|𝕜|𝖐|𝗸|𝘬|𝙠|𝚔)
l = (?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)
m = (?:m|M|М|м|ᴍ|ṁ|𝐦|𝑚|𝒎|𝓶|𝔪|𝕞|𝖒|𝗺|𝘮|𝙢|𝚖)
n = (?:n|N|и|И|п|П|η|ɴ|ŋ|ɲ|ň|ṅ|𝐧|𝑛|𝒏|𝓷|𝔫|𝕟|𝖓|𝗻|𝘯|𝙣|𝚗|Ո)
o = (?:o|O|О|о|ᴏ|օ|σ|ȯ|ọ|𝐨|𝑜|𝒐|𝓸|𝔬|𝕠|𝖔|𝗼|𝘰|𝙤|𝚘)
p = (?:p|P|Р|р|ᴘ|ρ|ṗ|𝐩|𝑝|𝒑|𝓹|𝔭|𝕡|𝖕|𝗽|𝘱|𝙥|𝚙)
q = (?:q|Q|ԛ|զ|𝑞|𝖖|𝓆|ｑ|ϙ|𝐪|𝒒|𝓺|𝔮|𝕢|𝗾|𝘲|𝙦|𝚚)
r = (?:r|R|Я|я|ʀ|г|𝑟|𝖗|ｒ|ŕ|ṙ|𝐫|𝒓|𝓻|𝔯|𝕣|𝗋|𝘳|𝙧|𝚛)
s = (?:s|S|ꜱ|Ѕ|ѕ|ṡ|ș|𝐬|𝑠|𝒔|𝓼|𝔰|𝕤|𝖘|𝗌|𝘴|𝙨|𝚜)
t = (?:t|T|Т|𝑡|𝖙|𝓉|ŧ|𝘵|ｔ|τ|ṫ|𝐭|𝒕|𝓽|𝔱|𝕥|𝖙|𝗍|𝚝)
u = (?:u|U|ц|Ц|ս|ʊ|𝑢|𝖚|𝓊|ｕ|μ|ü|ụ|𝐮|𝒖|𝓾|𝔲|𝕦|𝖚|𝗎|𝘶|𝙪|𝚞)
v = (?:v|V|ѵ|ν|ⅴ|𝑣|𝖛|𝓋|ｖ|ṿ|𝐯|𝒗|𝓿|𝔳|𝕧|𝗏|𝘷|𝙫|𝚟)
w = (?:w|W|ѡ|ω|ш|Ш|𝑤|𝖜|𝓌|ｗ|ẇ|𝐰|𝒘|𝔀|𝔴|𝕨|𝖜|𝗐|𝘸|𝙬|𝚠)
x = (?:x|X|Х|х|𝑥|𝘹|ｘ|χ|𝖷|𝓍|ẋ|𝐱|𝒙|𝔁|𝔵|𝕩|𝖝|𝗑|𝙭|𝚡)
y = (?:y|Y|У|у|ʏ|ý|ɏ|𝐲|𝑦|𝒚|𝓎|𝔂|𝔶|𝕪|𝖞|𝗒|𝘺|𝙮|𝚢)
z = (?:z|Z|ᴢ|ʐ|ź|ż|ž|𝑧|𝖟|𝓏|ｚ|ẓ|𝐳|𝒛|𝔃|𝔷|𝕫|𝖟|𝗓|𝘻|𝙯|𝚣)
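One way to reuse the map is to generate a brand pattern from it instead of typing the alternations by hand. The sketch below is an assumption about how that could look in Python, not the author's tooling: `HOMOGLYPHS` holds only the three entries needed for "DHL", and `brand_pattern` is a hypothetical helper.

```python
import re

# Subset of the character map above; add further letters as needed.
HOMOGLYPHS = {
    "d": "d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ",
    "h": "h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑",
    "l": "l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ",
}

def brand_pattern(brand: str, separator: str = ".?") -> re.Pattern:
    """Build a homoglyph-tolerant pattern for brand (hypothetical helper)."""
    groups = []
    for letter in brand.lower():
        # Fall back to the escaped literal when the map has no entry for a letter.
        alternatives = HOMOGLYPHS.get(letter, re.escape(letter))
        groups.append(f"(?:{alternatives})")
    return re.compile(separator.join(groups), re.IGNORECASE)

dhl = brand_pattern("dhl")
for candidate in ["DHL", "dhl", "ᎠНᏞ", "D-H-L", "UPS"]:
    print(candidate, bool(dhl.fullmatch(candidate)))
```

Generating the pattern keeps the map in one place, so a newly discovered homoglyph only has to be added once.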

4. Flexible Matching with `.?`

Between each letter, we use `.?` to allow at most one arbitrary character of separation. This catches variants like:

- D-H-L
- D.H.L
- D H L
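A quick Python check of the separator logic, reduced to ASCII letters for readability (the reduced pattern and the sample strings are illustrative only):

```python
import re

# Reduced version of the brand pattern: one optional character between letters.
spaced = re.compile(r"d.?h.?l", re.IGNORECASE)

for text in ["DHL", "D-H-L", "D.H.L", "D H L", "DxHxL", "D--H--L"]:
    print(f"{text!r}: {bool(spaced.fullmatch(text))}")
```

Because `.` accepts any character, "DxHxL" matches as well, while "D--H--L" does not, since only a single separator is allowed between letters.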

Legitimate Domain Validation

The pattern ends with a negative lookahead to ensure the email doesn't come from legitimate DHL domains:

(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))

This checks that the email domain isn't:

dhl.com, dhl.de, etc.
dhl-news.com, dhl-news.de, etc.
dhlfreight-news.com, dhlfreight-news.de, etc.

The pattern's domain validation approach might seem counterintuitive at first, as it employs a negative lookahead to check for legitimate DHL domains. Instead of explicitly matching suspicious domains, we validate against a list of known legitimate DHL domains. This inverse logic serves a crucial purpose: it allows the pattern to match any sender that uses "DHL" in their display name while sending from a non-authorized domain.

This approach is particularly effective because:

The list of legitimate DHL domains is finite and well-known
The list of potential malicious domains is infinite and constantly changing
Any email claiming to be from DHL should only come from their official domains

By structuring the pattern this way, we create a more robust detection system that automatically flags any new, unauthorized domains that attackers might use, without requiring constant updates to our pattern's domain list.
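The inverse logic can be tried in isolation with the Python sketch below. It tests only the lookahead part of the pattern, written with the lookahead ending in `(?:>|$))`; the helper name `flagged` and all addresses except the sinectis.com.ar example from this article are made up for illustration.

```python
import re

LEGIT = r"(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)"

# Matches a "<" that is NOT followed by an address on a legitimate DHL domain.
not_dhl_sender = re.compile(
    r"<(?!.*?@(?:.*?\.)?" + LEGIT + r"(?:>|$))",
    re.IGNORECASE,
)

def flagged(from_header: str) -> bool:
    """True when the address part does not end in an allowed DHL domain."""
    return not_dhl_sender.search(from_header) is not None

headers = [
    "DHL Express <noreply@dhl.com>",                    # allowed domain
    "DHL <info@mail.dhl-news.de>",                      # allowed, with subdomain
    "DHL_Express <DHL_Express.13972@sinectis.com.ar>",  # unrelated domain
    "DHL <support@dhl.com.evil.example>",               # allowed name used as a prefix
]
for header in headers:
    print(f"{flagged(header)!s:5} {header}")
```

The last sample shows why the lookahead insists on `>` or the end of the line right after the top-level domain: "dhl.com" appearing as a mere prefix of another domain is still flagged.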

Alternative Approaches and Comparisons

Feature	Our Regex Pattern	Simple Text Match	Domain Whitelist
Homoglyph Detection	Yes	No	No
Special Character Handling	Yes	Limited	No
False Positive Rate	Low	High	Very Low
Implementation Complexity	Medium	Low	Low

Implementing the ruleset in SpamAssassin

Introduction

SpamAssassin is a powerful tool in the fight against email fraud, and with proper configuration, it can effectively detect sophisticated phishing attempts targeting DHL customers. In this guide, we'll walk through implementing a custom rule that catches homoglyph-based DHL phishing attempts while maintaining a low false-positive rate.

The SpamAssassin Rule

Let's start with the complete rule configuration that you'll add to your SpamAssassin setup:

header          PHISHING_DHL                       From =~ /(?:my-?)?(?:[_]|\b)(?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ).?(?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑).?(?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)(?:[_]|\b).*?<(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))/ims
describe        PHISHING_DHL                       High Probability DHL Phishing/Scam
score           PHISHING_DHL                       8.0
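Before deploying the rule, it can help to exercise the pattern against a few From headers offline. The following is a rough Python port for experimentation, transcribed with the lookahead ending in `(?:>|$))`. It is a sketch under assumptions: Python's `re` works on decoded Unicode strings, whereas SpamAssassin evaluates the rule with Perl, so results may differ in edge cases. `is_flagged` and all addresses other than the article's example are hypothetical.

```python
import re

D = "d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ"
H = "h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑"
L = "l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ"

PHISHING_DHL = re.compile(
    r"(?:my-?)?(?:[_]|\b)"                      # optional "my-" prefix, custom boundary
    rf"(?:{D}).?(?:{H}).?(?:{L})"               # D, H, L with optional separators
    r"(?:[_]|\b).*?<"                           # custom boundary, then the address part
    r"(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)"
    r"\.(?:com|ch|de|ru|it|fr|at)(?:>|$))",     # not an allowed DHL domain
    re.IGNORECASE | re.MULTILINE | re.DOTALL,   # Python counterparts of /ims
)

def is_flagged(from_header: str) -> bool:
    """Return True when the header would match the ported pattern."""
    return PHISHING_DHL.search(from_header) is not None

for header in [
    "ᎠНᏞ_Express <DHL_Express.13972@sinectis.com.ar>",
    "My-DHL <promo@parcel-tracking.example>",
    "DHL Express <noreply@dhl.com>",
    "John Smith <john@example.org>",
]:
    print(f"{is_flagged(header)!s:5} {header}")
```

The first two headers carry the brand in the display name but come from unrelated domains, so they are flagged; the last two are not, one because the domain is allowed and one because no "DHL" variant appears at all.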

Understanding the Components

1. Header Rule Definition

The rule consists of:

header: Indicates this is a header-based rule
PHISHING_DHL: The unique identifier for this rule
From =~ : Specifies that we're matching the From header against the regex that follows
describe: The description shown to the end user in the spam report
score: The number of points added when the rule matches (with the common threshold of 5 points, this rule alone is enough to have the e-mail filtered out as spam)

2. Pattern Flags

The pattern uses three important flags:

/i: CASE-INSENSITIVE mode
/m: MULTILINE mode (^ and $ match line start/end)
/s: DOTALL mode (Makes dot match any character INCLUDING newlines)
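The effect of each flag can be reproduced with Python's equivalents `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`. This is only an illustration of the flag semantics on made-up strings, not of SpamAssassin itself.

```python
import re

# /i: case-insensitive matching
case_off = re.search(r"dhl", "DHL")
case_on = re.search(r"dhl", "DHL", re.IGNORECASE)

# /m: ^ and $ also match at line starts and line ends inside the string
line_off = re.search(r"^second$", "first\nsecond")
line_on = re.search(r"^second$", "first\nsecond", re.MULTILINE)

# /s: the dot also matches a newline (relevant for folded headers)
dot_off = re.search(r"DHL.<", "DHL\n<")
dot_on = re.search(r"DHL.<", "DHL\n<", re.DOTALL)

for name, match in [("i off", case_off), ("i on", case_on),
                    ("m off", line_off), ("m on", line_on),
                    ("s off", dot_off), ("s on", dot_on)]:
    print(name, bool(match))
```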

3. Scoring Configuration

score PHISHING_DHL 8.0
The score of 8.0 is deliberately high because:

DHL phishing attempts are common and dangerous
The pattern is specifically designed to minimize false positives
Most legitimate DHL communication comes from official domains

Implementation Guide

1. Installation Location

Add the rule to your SpamAssassin configuration:

Locate your local.cf:

# Common locations:
- /etc/spamassassin/local.cf
- /etc/mail/spamassassin/local.cf
- /usr/local/etc/spamassassin/local.cf

Back up your existing configuration (example):

sudo cp /etc/spamassassin/local.cf /etc/spamassassin/local.cf.backup

Add the rule to local.cf

2. Testing the Configuration

After adding the rule:

Check syntax:

spamassassin --lint

Test with a sample email:

spamassassin -D --test-mode < sample-email.txt

Restart SpamAssassin (the service name varies by distribution; on some systems it is spamd):

sudo systemctl restart spamassassin

Performance Considerations

Aspect	Impact	Mitigation
Pattern Complexity	Medium	Pattern compilation caching
Memory Usage	Low	No action needed
Processing Time	Low-Medium	Early rule positioning

Monitoring and Maintenance

1. Regular Checks

Monitor SpamAssassin logs for rule hits
Track false positives/negatives
Review scoring effectiveness

2. Rule Updates

Keep the rule updated when:

New DHL domains are added
New Unicode homoglyphs are discovered
False positives are reported

3. Performance Monitoring

Monitor these metrics:

Rule processing time
Memory usage
Cache hit rates

Troubleshooting Common Issues

Rule Not Triggering

Verify rule syntax
Check SpamAssassin debug logs
Confirm character encoding support

False Positives

Review legitimate DHL domains
Adjust scoring if needed
Consider adding whitelist rules

Performance Issues

Monitor system resources
Consider rule optimization
Check regex engine performance

Conclusion

This advanced regex pattern demonstrates how to catch sophisticated email fraud attempts that use Unicode homoglyphs and special characters. By understanding and implementing these techniques, you can better protect your users from phishing attempts that try to impersonate trusted brands like DHL.

Remember that this pattern is just one part of a comprehensive email security strategy. It should be combined with other security measures like SPF, DKIM, and DMARC for optimal protection against email fraud.

The SpamAssassin configuration provides robust protection against sophisticated DHL phishing attempts while maintaining good performance. Regular monitoring and updates ensure continued effectiveness against evolving threats.
