Email fraud continues to evolve, with scammers employing increasingly sophisticated techniques to bypass traditional security measures. One common target is DHL, the international shipping company, whose brand is frequently exploited in phishing attempts. In this article, we'll dive deep into an advanced regex pattern designed to catch these fraudulent emails, even when they use Unicode homoglyphs and special characters to evade detection.
Modern phishing emails often manipulate the FROM header with techniques such as homoglyph attacks, where similar-looking characters replace legitimate ones. For example, scammers might replace the Latin "H" in "DHL" with a Cyrillic "Н" that looks identical to human eyes but has a different Unicode value.
Here's a real-world example:
ᎠНᏞ_Express <DHL_Express.13972@sinectis.com.ar>
Most people will not notice that the first "DHL" in this header is not written with Latin letters at all. It uses other Unicode characters that merely look like "DHL" in order to impersonate the brand. Most spam filters fail to flag this FROM header for the same reason. Try it yourself: press CTRL+F in your browser and search for "DHL". Your browser will not highlight the first "DHL".
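The difference becomes visible once the display name is inspected code point by code point. The following is a minimal Python sketch using only the standard library. It assumes that the three lookalike characters in the example are U+13A0, U+041D and U+13DE, which is why they are written as escapes; the helper name `describe` is made up for this illustration.

```python
import unicodedata

# Assumption: the spoofed "DHL" from the example header, written as escapes
# so that the exact code points are unambiguous.
spoofed = "\u13a0\u041d\u13de"
genuine = "DHL"


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


for line in describe(spoofed):
    print(line)

# A plain substring search, like CTRL+F, does not find the brand name.
print(genuine in spoofed)
```

None of the three characters is an ASCII letter, so any filter that only looks for the literal string "DHL" has nothing to match.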
Let's analyze our fraud detection pattern piece by piece:
(?:my-?)?(?:[_]|\b)(?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ).?(?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑).?(?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)(?:[_]|\b).*?<(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))
The pattern implements an advanced detection mechanism for email sender verification, specifically targeting potential DHL impersonation attempts in email communications. It employs a sophisticated regular expression to analyze the sender information in email headers, with particular focus on the FROM field. The pattern is designed to identify instances where "DHL" appears in the sender's display name while simultaneously verifying that the associated email domain does not correspond to any of DHL's legitimate international domains.
This dual-verification approach creates an effective filter against common phishing tactics where attackers attempt to exploit DHL's brand recognition by using the company name in the sender's display name while sending from unrelated domains. The pattern is particularly powerful as it accounts for various Unicode homoglyphs and character substitutions that malicious actors might employ to bypass simpler detection methods.
When implemented within SpamAssassin's rule framework, this pattern assigns a significant spam score to messages that match these characteristics, effectively filtering potential phishing attempts while maintaining a low false-positive rate through careful consideration of DHL's legitimate international domain portfolio.
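One of the components of our pattern is (?:[_]|\b). It works around a specific limitation of regex word boundaries: \b only matches between a word character and a non-word character, and the underscore counts as a word character. In a display name like "DHL_Express" there is therefore no word boundary between "L" and "_", so a plain \bDHL\b would miss it. The alternation accepts either an underscore or a regular word boundary on each side of the brand name.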
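The behaviour of the underscore at a word boundary is easy to check in isolation. This is a small Python sketch with a plain "DHL" in place of the homoglyph sets; the sample strings are invented for the comparison.

```python
import re

# Plain word boundaries versus the underscore-aware variant from the pattern.
plain_boundary = re.compile(r"\bDHL\b")
extended_boundary = re.compile(r"(?:[_]|\b)DHL(?:[_]|\b)")

samples = ["DHL Express", "DHL_Express", "my_DHL_team", "ADHLX"]

for s in samples:
    print(s, bool(plain_boundary.search(s)), bool(extended_boundary.search(s)))

# "DHL Express" matches both patterns.
# "DHL_Express" and "my_DHL_team" only match the extended variant.
# "ADHLX" matches neither, because the letters are embedded in a longer word.
```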
The pattern includes extensive character sets for each letter in "DHL":
D: d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ
H: h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑
L: l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ
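These character sets include the plain Latin letters in both cases, Latin letters with diacritics (ď, đ, ħ, ḣ, ŀ), small capitals (ᴅ, ʜ, ʟ), Cyrillic and Greek lookalikes (Н, н, ι), letterlike symbols (ℎ, ℓ), the styled mathematical alphabets (𝐝, 𝒉, 𝓵 and similar), and Cherokee letters (Ꭰ, Ꮮ).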
Here is the full character map we have compiled over the past few years, which may help you write your own regexes in the future:
a = (?:a|A|А|а|Α|α|ä|ą|𝐚|𝑎|𝒂|𝓪|𝔞|𝕒|𝖆|𝗮|𝘢|𝙖|𝚊)
b = (?:b|B|ʙ|В|в|β|ḃ|ḅ|𝐛|𝑏|𝒃|𝔟|𝕓|𝖇|𝗯|𝘣|𝙗|𝚋)
c = (?:c|C|С|ᥴ|с|ᴄ|ϲ|ċ|ç|𝐜|𝑐|𝒄|𝓬|𝔠|𝕔|𝖈|𝗰|𝘤|𝙘|𝚌)
d = (?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ)
e = (?:e|E|Е|з|є|ɘ|ɛ|ə|𝑒|𝖾|Ԑ|ԑ|ε|ë|ė|ẹ|𝐞|𝒆|𝓮|𝔢|𝖊|𝗲|𝘦|𝙚|𝚎)
f = (?:f|F|ꜰ|𝑓|𝖿|ƒ|𝘧|Ϝ|ϝ|ḟ|𝐟|𝒇|𝔣|𝕗|𝖋|𝗳|𝙛|𝚏)
g = (?:g|G|ɢ|Ԍ|ԍ|Ꮐ|ġ|ğ|𝐠|𝑔|𝒈|𝓰|𝔤|𝕘|𝖌|𝗴|𝘨|𝙜|𝚐)
h = (?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑)
i = (?:i|I|1|\||𝑖|𝖎|𝓲|!|і|í|ı|ɪ|ị|𝐢|𝒊|𝔦|𝕚|𝖏|𝗶|𝘪|𝙞|𝚒)
j = (?:j|J|Ј|ј|ᴊ|ϳ|ɉ|𝐣|𝑗|𝒋|𝓳|𝔧|𝕛|𝖏|𝗷|𝘫|𝙟|𝚓)
k = (?:k|K|К|к|ᴋ|ĸ|ҟ|𝐤|𝑘|𝒌|𝓴|𝔨|𝕜|𝖐|𝗸|𝘬|𝙠|𝚔)
l = (?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)
m = (?:m|M|М|м|ᴍ|ṁ|𝐦|𝑚|𝒎|𝓶|𝔪|𝕞|𝖒|𝗺|𝘮|𝙢|𝚖)
n = (?:n|N|и|И|п|П|η|ɴ|ŋ|ɲ|ň|ṅ|𝐧|𝑛|𝒏|𝓷|𝔫|𝕟|𝖓|𝗻|𝘯|𝙣|𝚗|Ո)
o = (?:o|O|О|о|ᴏ|օ|σ|ȯ|ọ|𝐨|𝑜|𝒐|𝓸|𝔬|𝕠|𝖔|𝗼|𝘰|𝙤|𝚘)
p = (?:p|P|Р|р|ᴘ|ρ|ṗ|𝐩|𝑝|𝒑|𝓹|𝔭|𝕡|𝖕|𝗽|𝘱|𝙥|𝚙)
q = (?:q|Q|ԛ|զ|𝑞|𝖖|𝓆|q|ϙ|𝐪|𝒒|𝓺|𝔮|𝕢|𝗾|𝘲|𝙦|𝚚)
r = (?:r|R|Я|я|ʀ|г|𝑟|𝖗|r|ŕ|ṙ|𝐫|𝒓|𝓻|𝔯|𝕣|𝗋|𝘳|𝙧|𝚛)
s = (?:s|S|ꜱ|Ѕ|ѕ|ṡ|ș|𝐬|𝑠|𝒔|𝓼|𝔰|𝕤|𝖘|𝗌|𝘴|𝙨|𝚜)
t = (?:t|T|Т|𝑡|𝖙|𝓉|ŧ|𝘵|t|τ|ṫ|𝐭|𝒕|𝓽|𝔱|𝕥|𝖙|𝗍|𝚝)
u = (?:u|U|ц|Ц|ս|ʊ|𝑢|𝖚|𝓊|u|μ|ü|ụ|𝐮|𝒖|𝓾|𝔲|𝕦|𝖚|𝗎|𝘶|𝙪|𝚞)
v = (?:v|V|ѵ|ν|ⅴ|𝑣|𝖛|𝓋|v|ṿ|𝐯|𝒗|𝓿|𝔳|𝕧|𝖝|𝗏|𝘷|𝙫|𝚟)
w = (?:w|W|ѡ|ω|ш|Ш|𝑤|𝖜|𝓌|w|ẇ|𝐰|𝒘|𝔀|𝔴|𝕨|𝖜|𝗐|𝘸|𝙬|𝚠)
x = (?:x|X|Х|х|𝑥|𝘹|x|χ|𝖷|𝓍|ẋ|𝐱|𝒙|𝔁|𝔵|𝕩|𝖝|𝗑|𝙭|𝚡)
y = (?:y|Y|У|у|ʏ|ý|ɏ|𝐲|𝑦|𝒚|𝓎|𝔂|𝔶|𝕪|𝖞|𝗒|𝘺|𝙮|𝚢)
z = (?:z|Z|ᴢ|ʐ|ź|ż|ž|𝑧|𝖟|𝓏|z|ẓ|𝐳|𝒛|𝔃|𝔷|𝕫|𝖟|𝗓|𝘻|𝙯|𝚣)
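A map like this lends itself to generating brand patterns instead of writing them by hand. The sketch below is one possible way to do that in Python, not the tooling used for this article: `HOMOGLYPHS` is a shortened, hypothetical excerpt of the map (three letters, a few variants each, written as escapes), and `brand_pattern` is an illustrative helper name.

```python
import re

# Assumption: a shortened excerpt of the character map above. A real
# deployment would carry the full alternation for every letter.
HOMOGLYPHS = {
    "d": "dD\u1d05\u010f\u0111\u13a0",        # d D ᴅ ď đ Ꭰ
    "h": "hH\u041d\u043d\u029c\u210e\u0127",  # h H Н н ʜ ℎ ħ
    "l": "lL\u029f\u03b9\u2113\u13de",        # l L ʟ ι ℓ Ꮮ
}


def brand_pattern(brand: str, sep: str = ".?") -> str:
    """Build a homoglyph-tolerant regex source string for a brand name."""
    parts = []
    for ch in brand.lower():
        # Letters without an entry in the map fall back to themselves.
        variants = HOMOGLYPHS.get(ch, ch)
        parts.append("(?:" + "|".join(re.escape(v) for v in variants) + ")")
    # Same underscore-or-word-boundary guard as in the DHL pattern.
    return r"(?:[_]|\b)" + sep.join(parts) + r"(?:[_]|\b)"


DHL = re.compile(brand_pattern("dhl"))

print(bool(DHL.search("\u13a0\u041d\u13de_Express")))  # spoofed display name
print(bool(DHL.search("D-H-L parcel")))
print(bool(DHL.search("DHX")))
```

Generating the pattern keeps the per-letter sets in one place, so a newly observed lookalike only has to be added to the map once.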
Between each letter, we use `.?` to allow for a character of separation. This catches variants like:
- D-H-L
- D.H.L
- D H L
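The effect of the optional separator can be tried with a reduced pattern. This Python sketch uses only the ASCII letters and a case-insensitive match instead of the full homoglyph sets, so it is an approximation of the real rule. The last sample is a deliberately chosen edge case.

```python
import re

# Reduced stand-in for the full pattern: ASCII letters only, one optional
# character between the letters.
spaced = re.compile(r"\bd.?h.?l\b", re.IGNORECASE)

results = {}
for name in ["D-H-L", "D.H.L", "D H L", "DHL", "Dahl"]:
    results[name] = bool(spaced.search(name))
    print(name, results[name])
```

All five samples match. The separator slot accepts any character, including a letter, so a surname such as "Dahl" also satisfies the pattern. Because the full rule uses the same `.?` gaps, a sender with that name and a non-DHL address would presumably be flagged as well, which is worth keeping in mind when reviewing false positives.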
The pattern ends with a negative lookahead to ensure the email doesn't come from legitimate DHL domains:
(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))
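This checks that the email domain isn't dhl, dhl-news, or dhlfreight-news (or any subdomain of these) under one of the top-level domains .com, .ch, .de, .ru, .it, .fr, or .at.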
The pattern's domain validation approach might seem counterintuitive at first, as it employs a negative lookahead to check for legitimate DHL domains. Instead of explicitly matching suspicious domains, we validate against a list of known legitimate DHL domains. This inverse logic serves a crucial purpose: it allows the pattern to match any sender that uses "DHL" in their display name while sending from a non-authorized domain.
This approach is particularly effective because attackers can switch to a new sending domain at any time, while the set of legitimate DHL domains is small and changes rarely. By structuring the pattern this way, we create a more robust detection system that automatically flags any new, unauthorized domain an attacker might use, without requiring constant updates to the pattern's domain list.
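The inverse logic can be exercised on its own. The following Python sketch isolates the address part of the pattern, from the `<` onward, and applies it to a few FROM headers. The function name `sent_from_unlisted_domain` and all addresses except the one from the real-world example are invented for illustration.

```python
import re

# Address part of the pattern: match a "<" that is NOT followed by an
# address on one of the listed DHL domains.
NOT_DHL_DOMAIN = re.compile(
    r"<(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)"
    r"\.(?:com|ch|de|ru|it|fr|at)(?:>|$))",
    re.IGNORECASE,
)


def sent_from_unlisted_domain(from_header: str) -> bool:
    """True if the address in angle brackets is not on a listed DHL domain."""
    return NOT_DHL_DOMAIN.search(from_header) is not None


headers = [
    "DHL <noreply@dhl.com>",                    # listed domain
    "DHL News <info@mail.dhl-news.de>",         # subdomain of a listed domain
    "DHL <DHL_Express.13972@sinectis.com.ar>",  # unrelated domain
    "DHL <support@dhl.com.example.org>",        # listed name used as a prefix
    "DHL <support@evil-dhl.com>",               # lookalike domain
]

for h in headers:
    print(h, sent_from_unlisted_domain(h))
```

Only the first two headers are treated as legitimate. The other three are flagged without any of their domains appearing in the pattern.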
Feature | Our Regex Pattern | Simple Text Match | Domain Whitelist |
---|---|---|---|
Homoglyph Detection | Yes | No | No |
Special Character Handling | Yes | Limited | No |
False Positive Rate | Low | High | Very Low |
Implementation Complexity | Medium | Low | Low |
SpamAssassin is a powerful tool in the fight against email fraud, and with proper configuration, it can effectively detect sophisticated phishing attempts targeting DHL customers. In this guide, we'll walk through implementing a custom rule that catches homoglyph-based DHL phishing attempts while maintaining a low false-positive rate.
Let's start with the complete rule configuration that you'll add to your SpamAssassin setup:
header PHISHING_DHL From =~ /(?:my-?)?(?:[_]|\b)(?:d|D|ᴅ|ď|đ|𝐝|𝑑|𝒅|𝓭|𝔡|𝕕|𝖉|𝗱|𝘥|𝙙|𝚍|Ꭰ).?(?:h|H|Н|н|ʜ|ℎ|ħ|ḣ|𝐡|𝒉|𝓱|𝔥|𝕙|𝖍|𝗵|𝘩|𝙝|𝚑).?(?:l|L|ʟ|ι|ℓ|ŀ|𝐥|𝑙|𝒍|𝓵|𝔩|𝕝|𝖑|𝗹|𝘭|𝙡|𝚕|Ꮮ)(?:[_]|\b).*?<(?!.*?@(?:.*?\.)?(?:dhl(?:-news)?|dhlfreight-news)\.(?:com|ch|de|ru|it|fr|at)(?:>|$))/ims
describe PHISHING_DHL High Probability DHL Phishing/Scam
score PHISHING_DHL 8.0
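The rule starts with `header PHISHING_DHL From =~`. Here `header` declares a header test, `PHISHING_DHL` is the rule name, `From` is the header to inspect, and `=~` applies the regular expression that follows.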
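The pattern uses three important flags: `i` makes the match case-insensitive, `m` lets `^` and `$` match at line boundaries inside the header value, and `s` lets `.` match newline characters as well.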
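Python's `re.I`, `re.M` and `re.S` correspond to the Perl modifiers `i`, `m` and `s`, so the effect of each flag can be shown with a short sketch. It assumes a header value that contains a line break between display name and address; the header text and the reduced pattern `dhl.*?<` are made up for the demonstration.

```python
import re

# Hypothetical header value with a line break before the address.
folded = "DHL Express\n <billing@example.org>"
pattern = r"dhl.*?<"

no_flags = bool(re.search(pattern, folded))                   # False: case differs
only_i = bool(re.search(pattern, folded, re.I))               # False: "." stops at "\n"
i_and_s = bool(re.search(pattern, folded, re.I | re.S))       # True: "." crosses "\n"

# The m flag changes where ^ can match.
line_start_plain = bool(re.search(r"^\s<", folded))           # False: ^ only at position 0
line_start_multi = bool(re.search(r"^\s<", folded, re.M))     # True: ^ also after "\n"

print(no_flags, only_i, i_and_s, line_start_plain, line_start_multi)
```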
score PHISHING_DHL 8.0
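The score of 8.0 is deliberately high because it exceeds SpamAssassin's default threshold (required_score) of 5.0 on its own, so a single match is enough to classify the message as spam.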
Add the rule to your SpamAssassin configuration:
Locate your local.cf:
# Common locations:
- /etc/spamassassin/local.cf
- /etc/mail/spamassassin/local.cf
- /usr/local/etc/spamassassin/local.cf
Back up your existing configuration (example):
sudo cp /etc/spamassassin/local.cf /etc/spamassassin/local.cf.backup
Add the rule to local.cf
After adding the rule:
Check syntax:
spamassassin --lint
Test with a sample email:
spamassassin -D --test-mode < sample-email.txt
Restart SpamAssassin:
sudo systemctl restart spamassassin
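The sample-email.txt used in the test step above has to exist first. One way to produce it is Python's standard `email` package, sketched below. The sender is the address from the real-world example, with the lookalike characters assumed to be U+13A0, U+041D and U+13DE. The recipient, subject, body and the function names `build_sample` and `write_sample` are placeholders for this illustration.

```python
from email import message_from_bytes, policy
from email.headerregistry import Address
from email.message import EmailMessage
from pathlib import Path


def build_sample() -> bytes:
    """Build a test message whose FROM header imitates the spoofed example."""
    msg = EmailMessage()
    msg["From"] = Address(
        display_name="\u13a0\u041d\u13de_Express",  # lookalike "DHL", not Latin
        username="DHL_Express.13972",
        domain="sinectis.com.ar",
    )
    # Placeholder recipient, subject and body.
    msg["To"] = Address(display_name="Test Recipient", username="user", domain="example.org")
    msg["Subject"] = "Your parcel is waiting"
    msg.set_content("Hypothetical message body for rule testing.\n")
    return msg.as_bytes()


def write_sample(path: str = "sample-email.txt") -> None:
    """Write the test message to disk for use with spamassassin --test-mode."""
    Path(path).write_bytes(build_sample())


# Round trip: the parsed message carries the same display name and domain.
parsed = message_from_bytes(build_sample(), policy=policy.default)
print(parsed["From"].addresses[0].display_name)
print(parsed["From"].addresses[0].domain)
```

Calling `write_sample()` creates the file in the current directory. The non-ASCII display name is stored as an encoded word in the raw message, so whether the rule fires in a test run depends on SpamAssassin matching against the decoded header.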
Aspect | Impact | Mitigation |
---|---|---|
Pattern Complexity | Medium | Pattern compilation caching |
Memory Usage | Low | No action needed |
Processing Time | Low-Medium | Early rule positioning |
Keep the rule updated when:
Monitor these metrics:
Rule Not Triggering
False Positives
Performance Issues
This advanced regex pattern demonstrates how to catch sophisticated email fraud attempts that use Unicode homoglyphs and special characters. By understanding and implementing these techniques, you can better protect your users from phishing attempts that try to impersonate trusted brands like DHL.
Remember that this pattern is just one part of a comprehensive email security strategy. It should be combined with other security measures like SPF, DKIM, and DMARC for optimal protection against email fraud.
The SpamAssassin configuration provides robust protection against sophisticated DHL phishing attempts while maintaining good performance. Regular monitoring and updates ensure continued effectiveness against evolving threats.