HIGH RISK | OBSERVED | File Inspection

Homoglyph Filename

The filename contains Unicode characters that look identical to common characters but are different code points — used to visually spoof the file extension.

A homoglyph is a character that looks visually identical — or nearly identical — to a different character. The Latin letter a (U+0061) and the Cyrillic letter а (U+0430) are visually indistinguishable in most fonts, but they are different Unicode code points.
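As a minimal illustration of the point above, the following Python sketch prints the code point and official Unicode name of each of the two letters. The variable names are illustrative only; the characters are written as escapes so the snippet itself cannot be misread.

```python
import unicodedata

latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a, drawn the same in most fonts

for ch in (latin_a, cyrillic_a):
    # Print the code point in U+XXXX form alongside the Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The glyphs match on screen, but the strings do not compare equal.
print(latin_a == cyrillic_a)  # False
```

The comparison is on code points, so it returns False no matter how similar the two glyphs look in a given font.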

When used in filenames, homoglyphs can make a file appear to have a safe extension while actually having none, or a different one entirely. The file invoice.рdf (with Cyrillic р, U+0440) renders identically to invoice.pdf in most filename displays, but the operating system parses the extension as .рdf, which matches no registered file type. Depending on the platform, the file may open in a generic handler, be executed directly, or be stored in a way that bypasses extension-based security controls.
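To make the parsing difference concrete, here is a small sketch using Python's standard os.path.splitext. The helper name looks_like_pdf is hypothetical and stands in for any check that trusts the parsed extension.

```python
import os.path

def looks_like_pdf(filename: str) -> bool:
    """Naive check that trusts the parsed extension (hypothetical helper)."""
    _, ext = os.path.splitext(filename)
    return ext.lower() == ".pdf"

genuine = "invoice.pdf"
spoofed_letter = "invoice.\u0440df"  # Cyrillic er (U+0440) in place of Latin p
spoofed_dot = "invoice\u2024pdf"     # One Dot Leader (U+2024) in place of the period

for name in (genuine, spoofed_letter, spoofed_dot):
    ext = os.path.splitext(name)[1]
    # ascii() escapes non-ASCII characters, exposing what the parser really sees.
    print(ascii(name), "->", ascii(ext), looks_like_pdf(name))
```

The first spoofed name yields an extension that is not ".pdf", and the second yields no extension at all, because the only dot-like character is not U+002E.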

Target character      Homoglyph                       Source
Latin a (U+0061)      Cyrillic а (U+0430)             Cyrillic alphabet
Latin e (U+0065)      Cyrillic е (U+0435)             Cyrillic alphabet
Period . (U+002E)     One Dot Leader (U+2024)         General punctuation
Period . (U+002E)     Fullwidth Full Stop (U+FF0E)    Fullwidth forms
Latin p (U+0070)      Cyrillic р (U+0440)             Cyrillic alphabet
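
The rows above can be expressed as a lookup table. The sketch below covers only these five pairs and is not a complete confusables list; the names HOMOGLYPHS and reveal_target are illustrative.

```python
# Homoglyph -> target character, taken only from the five rows above.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u0440": "p",  # Cyrillic er
    "\u2024": ".",  # One Dot Leader
    "\uff0e": ".",  # Fullwidth Full Stop
}

def reveal_target(filename: str) -> str:
    """Return the name a reader would believe they are seeing."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in filename)

spoofed = "invoice.\u0440df"
print(reveal_target(spoofed))             # invoice.pdf
print(reveal_target(spoofed) == spoofed)  # False: a substitution was present
```

A result that differs from the input means at least one listed homoglyph was present in the name.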

The attack exploits a fundamental assumption: users and security tools trust what they can read. A filename that renders as invoice.pdf is assumed to be a PDF. Extension-based email filtering, endpoint security controls, and user training all rely on this assumption.

Homoglyph substitution defeats that assumption without any exploit. The deception happens at the display layer: the filename's bytes encode a different character that the rendering font happens to draw the same way. No vulnerability is needed and nothing has to execute; the deception is complete before the file is even opened.

There is no legitimate software distribution use case for homoglyph characters in filenames. Their presence in a filename is itself sufficient to treat the file as suspect, independent of any other finding.

DETECTION METHOD
ShieldScope applies Unicode NFKD normalization to the filename and inspects the code points of characters in positions that affect extension parsing — particularly the extension separator and characters within the extension itself. Non-ASCII characters in these positions are flagged. The comparison is performed against the raw Unicode code point values, not the rendered glyph, so visual similarity does not affect detection accuracy.
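
The following is one possible reading of that description, sketched in Python; it is not ShieldScope's actual implementation, and the function names are hypothetical. It assumes NFKD is used to find the extension separator, since NFKD folds One Dot Leader (U+2024) and Fullwidth Full Stop (U+FF0E) to U+002E. NFKD does not map Cyrillic letters to Latin ones, so the flagging step inspects the raw code points.

```python
import unicodedata

def is_separator(ch: str) -> bool:
    """True if ch is a period or folds to exactly one period under NFKD."""
    return unicodedata.normalize("NFKD", ch) == "."

def flag_extension_homoglyphs(filename: str):
    """Return (index, char, 'U+XXXX') for each non-ASCII character found
    at or after the last separator, i.e. in the extension region."""
    seps = [i for i, ch in enumerate(filename) if is_separator(ch)]
    if not seps:
        return []  # nothing that parses or renders as an extension
    start = seps[-1]
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(filename)
        if i >= start and ord(ch) > 0x7F  # raw code point, not the glyph
    ]

print(flag_extension_homoglyphs("invoice.pdf"))                 # []
print([f[2] for f in flag_extension_homoglyphs("invoice.\u0440df")])   # ['U+0440']
print([f[2] for f in flag_extension_homoglyphs("invoice\uff0epdf")])   # ['U+FF0E']
```

Non-ASCII characters before the separator are ignored in this sketch, which mirrors the description's focus on positions that affect extension parsing.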
OBSERVED

The code points in a filename can be read directly from its byte sequence, so a non-ASCII code point in an extension position is an observed fact, not an inference. No pattern matching or heuristics are required: the substitution either exists in the byte sequence or it does not.
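
A short sketch of what that looks like at the byte level, assuming the filename is stored as UTF-8 (other filesystems and transports may use a different encoding):

```python
genuine = "invoice.pdf".encode("utf-8")
spoofed = "invoice.\u0440df".encode("utf-8")  # Cyrillic er (U+0440) in place of Latin p

print(genuine.hex(" "))  # 69 6e 76 6f 69 63 65 2e 70 64 66
print(spoofed.hex(" "))  # 69 6e 76 6f 69 63 65 2e d1 80 64 66

# Latin p is the single byte 0x70; Cyrillic er is the two-byte sequence d1 80.
print(genuine.isascii(), spoofed.isascii())  # True False
```

The spoofed name is one byte longer and contains bytes above 0x7F, a difference visible without rendering the name at all.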

  • Treat as malicious. There is no legitimate reason for homoglyph characters in a filename. The visual deception alone is a sufficient indicator; no further analysis is required to justify treating this file as a threat.
  • Do not open, execute, or forward the file without security team review.
  • If received via email, report the message as a phishing attempt and preserve the original for investigation. The sending account may be compromised or spoofed.
  • Check the sending channel — homoglyph filenames often accompany social engineering designed to create urgency (unpaid invoices, contract documents, urgent requests).
  • Inspect the actual file bytes using ShieldScope's File tab to understand what the file actually is, independent of the misleading filename.
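
For the last step, identifying a file from its content rather than its name can be sketched as a check of the leading bytes against well-known file signatures. This is not how ShieldScope's File tab works internally; the signature list is a small illustrative subset and the function names are hypothetical.

```python
# A few well-known file signatures (magic numbers); far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"MZ", "Windows executable (PE/DOS)"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\x7fELF", "ELF executable"),
]

def sniff(head: bytes) -> str:
    """Classify a file by its leading bytes, ignoring its name entirely."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first few bytes of the file; never opens it in a handler."""
    with open(path, "rb") as fh:
        return sniff(fh.read(8))

print(sniff(b"%PDF-1.7\n"))    # PDF document
print(sniff(b"MZ\x90\x00"))    # Windows executable (PE/DOS)
```

A file whose name renders as invoice.pdf but whose leading bytes are "MZ" is an executable, whatever the filename suggests.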