Homoglyph Filename
The filename contains Unicode characters that look identical to common characters but are different code points — used to visually spoof the file extension.
A homoglyph is a character that looks visually identical — or nearly identical — to a different character. The Latin letter a (U+0061) and the Cyrillic letter а (U+0430) are visually indistinguishable in most fonts, but they are different Unicode code points.
When used in filenames, homoglyphs can make a file appear to have a safe extension while actually having none — or a different one entirely. The file invoice.рdf (with Cyrillic р) renders identically to invoice.pdf in most filename displays, but the operating system sees no recognized extension. The file may open in a generic handler, execute directly, or be stored in a way that bypasses extension-based security controls.
| Target character | Homoglyph | Source |
|---|---|---|
Latin a (U+0061) |
Cyrillic а (U+0430) |
Cyrillic alphabet |
Latin e (U+0065) |
Cyrillic е (U+0435) |
Cyrillic alphabet |
Period . (U+002E) |
One Dot Leader ․ (U+2024) |
General punctuation |
Period . (U+002E) |
Fullwidth Full Stop . (U+FF0E) |
Fullwidth forms |
Latin p (U+0070) |
Cyrillic р (U+0440) |
Cyrillic alphabet |
The attack exploits a fundamental assumption: users and security tools trust what they can read. A filename that renders as invoice.pdf is assumed to be a PDF. Extension-based email filtering, endpoint security controls, and user training all rely on this assumption.
Homoglyph substitution defeats that assumption without requiring any exploit. The deception is at the display layer — the actual bytes contain a different character that the rendering font happens to draw the same way. No vulnerability is needed. No execution has occurred yet. The deception is complete before the file is even opened.
There is no legitimate software distribution use case for homoglyph characters in filenames. Their presence in a filename is itself sufficient to treat the file as suspect, independent of any other finding.
The Unicode code point values in the filename are directly readable from the filename byte sequence. A non-ASCII code point in an extension position is directly observable — no inference or pattern matching is required. The substitution either exists in the byte sequence or it does not.
- Treat as malicious. There is no legitimate reason for homoglyph characters in a filename. The visual deception alone is sufficient indicator — no further analysis is required to justify treating this file as a threat.
- Do not open, execute, or forward the file without security team review.
- If received via email, report the message as a phishing attempt and preserve the original for investigation. The sending account may be compromised or spoofed.
- Check the sending channel — homoglyph filenames often accompany social engineering designed to create urgency (unpaid invoices, contract documents, urgent requests).
- Inspect the actual file bytes using ShieldScope's File tab to understand what the file actually is, independent of the misleading filename.