Amcache FileId explained: the SHA-1 hash format Windows stores
The FileId value in Root\InventoryApplicationFile is one of the
most useful fields in the entire Amcache hive — and one of the most
misunderstood. It is the file's content hash, but it is not quite
a standard SHA-1, and it does not quite hash the whole file. This
post is the full reference: what the value is, how to use it, and
the traps that catch new analysts.
For the broader Amcache reference, see the Amcache complete reference; for the surrounding registry structure, see Amcache registry structure.
What the value looks like
A typical FileId from a real hive:
0000da39a3ee5e6b4b0d3255bfef95601890afd80709
44 characters total:
- The first 4 characters are always "0000", a fixed type tag.
- The remaining 40 characters are the file's SHA-1 hex digest.
The "0000" prefix is a historical artefact: early versions of the
appraiser anticipated multiple hash algorithms (with each prefix
indicating which) but in practice only SHA-1 was ever used. Today
the prefix is constant.
When AmcacheParser exposes this field in its CSV, it splits it into two columns:
| Column | Value |
|---|---|
| FileId | The full 44-character string, prefix included. |
| Hash | Just the 40 hex characters: the SHA-1 alone. |
Always use Hash (or strip the prefix yourself) when joining
against external hash feeds. VirusTotal, your TI feeds, and
hash-allowlist databases expect a 40-char SHA-1 — they will not
match anything if you include the "0000" prefix.
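The prefix stripping described above can be sketched as a small normalisation helper. This is a minimal sketch, not part of AmcacheParser: the function name `fileid_to_sha1` is hypothetical, and it assumes the input is either a full FileId or an already-bare SHA-1.

```python
import re

# A bare SHA-1 is exactly 40 lowercase hex characters.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def fileid_to_sha1(file_id: str) -> str:
    """Return the bare 40-char SHA-1 from an Amcache FileId.

    Hypothetical helper: accepts either the full FileId ("0000" + SHA-1)
    or a bare SHA-1, and raises ValueError for anything else.
    """
    value = file_id.strip().lower()
    # Strip the fixed "0000" type tag only when the length shows it is present.
    if len(value) == 44 and value.startswith("0000"):
        value = value[4:]
    if not _SHA1_RE.fullmatch(value):
        raise ValueError(f"not a FileId or SHA-1: {file_id!r}")
    return value
```

Running every value through one helper before it reaches a hash feed means a prefixed value can never be sent by accident, and malformed values fail loudly instead of quietly returning no match.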
What it actually hashes
This is the trap that catches almost every new Amcache analyst:
Amcache's SHA-1 hashes only the first 31,457,280 bytes of the file (30 MiB, roughly 31.4 MB), not the whole file.
For files at or under that limit (which is almost everything; most EXEs and DLLs are well under), the prefix hash equals the whole-file SHA-1. The two values are identical.
For files larger than the limit, the Amcache hash is a prefix
hash, not a full-content hash. It is still distinctive enough to
identify a specific build of a specific binary, but it is not the
same value you would get from sha1sum on the whole file.
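To see the difference on a binary you have on hand, both hashes can be computed in a single streaming pass. This is a minimal sketch: the function name `sha1_prefix_and_full` is hypothetical, and the default limit of 31,457,280 bytes is an assumption taken from the commonly cited Amcache figure, exposed as a parameter so it can be adjusted.

```python
import hashlib

# Assumption: the commonly cited Amcache hashing limit, in bytes.
AMCACHE_LIMIT = 31_457_280


def sha1_prefix_and_full(stream, limit=AMCACHE_LIMIT, chunk_size=1024 * 1024):
    """Return (prefix_sha1, full_sha1) for a binary file object.

    The prefix hash covers at most `limit` bytes; the full hash covers
    everything. Reading in chunks avoids loading large files into memory.
    """
    prefix = hashlib.sha1()
    full = hashlib.sha1()
    remaining = limit  # bytes still owed to the prefix hash
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        full.update(chunk)
        if remaining > 0:
            prefix.update(chunk[:remaining])
            remaining -= min(remaining, len(chunk))
    return prefix.hexdigest(), full.hexdigest()
```

Called as `sha1_prefix_and_full(open(path, "rb"))`, the two values are equal for any file within the limit. When they differ, the first is the one to compare against Amcache and the second is the one external feeds index.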
Why this matters
- VirusTotal matches. For files within the 31,457,280-byte limit, the Amcache SHA-1 matches the SHA-1 VirusTotal indexes. For larger files (installers, some game binaries, large enterprise software) it will not match, so a VirusTotal "no record" response tells you nothing.
- Custom hash databases. If you maintain an internal allowlist of known-good hashes, make sure you are storing the same kind of hash you'll compare against. Either store full-content SHA-1s (and accept that large-binary comparisons against Amcache will fail) or maintain a parallel prefix-hash column.
- Recomputing for verification. If you have the original binary on hand and want to verify that an Amcache hash matches, hash only the first 31,457,280 bytes:
import hashlib

PREFIX_BYTES = 30 * 1024 * 1024  # 31,457,280 bytes (30 MiB)

def amcache_sha1(path: str) -> str:
    # Hash only the leading PREFIX_BYTES, as Amcache does
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        h.update(f.read(PREFIX_BYTES))
    return h.hexdigest()

Real-world traps
A handful of pitfalls that come up on real cases:
Don't include the "0000" prefix in lookups
# Wrong
search_virustotal('0000da39a3ee5e6b4b0d3255bfef95601890afd80709')
# Right
search_virustotal('da39a3ee5e6b4b0d3255bfef95601890afd80709')
VirusTotal's API specifically expects the bare hash. Including the prefix returns an empty result silently, which looks identical to "this file is unknown" and is far more misleading.
Hash collisions are unlikely but not impossible
SHA-1 has known collision attacks: researchers have produced pairs of distinct files with the same SHA-1 (SHAttered, 2017). Those attacks craft both files together. Forging a file that matches the hash of one specific, already-inventoried binary is a second-preimage problem, for which no practical attack is known, so in most investigations the risk is negligible. But for high-confidence matching in a high-stakes investigation, do not treat a SHA-1 match as cryptographic identity. Pair it with file size, link date, and at least one other field.
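The pairing advice above can be sketched as a small comparison helper. This is an illustration only: the function name `corroborated_match` is hypothetical, and the `Size` and `LinkDate` column names and the dict-shaped reference record are assumptions about how your data is laid out.

```python
def corroborated_match(row, reference, extra_fields=("Size", "LinkDate")):
    """Return True only if the SHA-1 and every extra field agree.

    `row` is one Amcache CSV row as a dict; `reference` is a dict describing
    the known binary. Field names are assumptions; adjust to your data.
    """
    row_hash = str(row.get("Hash", "")).strip().lower()
    ref_hash = str(reference.get("Hash", "")).strip().lower()
    if not row_hash or row_hash != ref_hash:
        return False
    for field in extra_fields:
        a = str(row.get(field, "")).strip()
        b = str(reference.get(field, "")).strip()
        # A missing value cannot corroborate anything, so treat it as a mismatch.
        if not a or a != b:
            return False
    return True
```

Treating a missing field as a mismatch is a deliberately conservative choice: a hash-only match is still worth reviewing, but it should not be reported with the same confidence as a corroborated one.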
Don't trust an IsPeFile = False row's FileId
Amcache occasionally records FileId values for non-PE files
inventoried by the appraiser. The hash is still real, but the
context is different — it is hashing whatever the file is (a script,
a config file), and downstream tools that assume PE-file context
(VirusTotal's PE-aware searches, Yara rules against PE bytes) will
return less useful results.
Multiple rows, same hash
If you find the same Hash value across multiple
*_UnassociatedFileEntries.csv rows on the same host, that is
meaningful. It means the same binary content was inventoried at
multiple paths. Common reasons:
- The attacker copied a tool into several locations to test which one would execute.
- A legitimate installer dropped the same DLL into multiple product directories.
- A user copied a file around manually.
Cluster by Hash, then look at the FullPath set and
KeyLastWriteTimestamp for each instance. The timestamps tell you
the sequence of copies; the paths tell you the intent.
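The clustering step can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `cluster_by_hash` is hypothetical, rows are dicts such as those produced by `csv.DictReader`, and `KeyLastWriteTimestamp` is assumed to be in a format that sorts chronologically as text (for example `YYYY-MM-DD HH:MM:SS`).

```python
from collections import defaultdict


def cluster_by_hash(rows):
    """Map each Hash seen at more than one path to its sorted sightings.

    Returns {hash: [(timestamp, full_path), ...]}, ordered by timestamp.
    """
    sightings = defaultdict(set)
    for row in rows:
        file_hash = (row.get("Hash") or "").strip().lower()
        if not file_hash:
            continue
        sightings[file_hash].add(
            (row.get("KeyLastWriteTimestamp", ""), row.get("FullPath", ""))
        )
    return {
        h: sorted(seen)
        for h, seen in sightings.items()
        # Keep only hashes that appear at two or more distinct paths.
        if len({path for _, path in seen}) > 1
    }
```

Feed it `csv.DictReader(open(csv_path, newline=""))` for one host's CSV; each value in the result is the copy sequence for one binary, oldest first.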
Multiple hashes, same FullPath
The opposite pattern — the same path with different Hash values
across multiple rows — means the binary at that path changed
between inventories. This is a strong signal for binary
replacement:
- Legitimate: a software update overwrote the file.
- Suspicious: an attacker replaced a system binary or a regularly-run user tool with a trojanised copy.
Sort the rows by KeyLastWriteTimestamp to see when each new hash
appeared, then correlate with patch events or Sysmon File Create
events around those times.
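The same grouping idea works in the other direction. This is a sketch under the same assumptions as before: the function name `hash_history_by_path` is hypothetical, rows are dicts from `csv.DictReader`, timestamps sort chronologically as text, and paths are compared case-insensitively because Windows paths are.

```python
from collections import defaultdict


def hash_history_by_path(rows):
    """Map each path that has held more than one hash to its hash history.

    Returns {lowercased_path: [(timestamp, hash), ...]}, ordered by timestamp.
    """
    history = defaultdict(set)
    for row in rows:
        path = (row.get("FullPath") or "").strip().lower()
        file_hash = (row.get("Hash") or "").strip().lower()
        if path and file_hash:
            history[path].add((row.get("KeyLastWriteTimestamp", ""), file_hash))
    return {
        path: sorted(entries)
        for path, entries in history.items()
        # Keep only paths whose content changed between inventories.
        if len({h for _, h in entries}) > 1
    }
```

Each timestamp in the result is a candidate replacement time to line up against patch records or Sysmon File Create events.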
Pivots that use FileId / Hash
The pivots that earn their keep on real cases:
Cross-host hash hunting
# Pivot a known-bad SHA-1 across every host's Amcache CSV
$badHash = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
Get-ChildItem -Recurse -Filter *_UnassociatedFileEntries.csv |
    ForEach-Object {
        # Capture the host name here: inside Select-Object, $_ is the CSV row, not the file
        $hostName = $_.Name.Split('_')[0]
        Import-Csv $_.FullName |
            Where-Object { $_.Hash -eq $badHash } |
            Select-Object @{n='Host';e={$hostName}},
                FullPath, KeyLastWriteTimestamp, Size
    } |
    Sort-Object Host
This is how you go from "we found this hash on one host" to "tell me every host in the environment that has ever had this binary present, and when it appeared."
VirusTotal enrichment of a CSV
import csv
import time

import requests

API = 'https://www.virustotal.com/api/v3/files/'
HEADERS = {'x-apikey': '<your-key>'}
seen = set()  # hashes already looked up, so each is queried once
with open('HOST_amcache_UnassociatedFileEntries.csv', newline='') as f:
    for row in csv.DictReader(f):
        h = row['Hash']
        if not h or h in seen:
            continue
        seen.add(h)
        r = requests.get(API + h, headers=HEADERS, timeout=30)
        if r.status_code == 200:
            stats = r.json()['data']['attributes']['last_analysis_stats']
            if stats.get('malicious', 0) > 0:
                print(h, stats, row['FullPath'])
        time.sleep(15)  # VT public API rate limit
Even a low-volume lookup against the public API can yield a tight list of hashes that VirusTotal engines flag as malicious on a typical infected host.
Correlating with Sysmon Image Loaded events
Sysmon event ID 7 (Image Loaded) records the hashes of each DLL a
process loads, provided image-load logging is enabled in the Sysmon
configuration and SHA1 is among the configured hash algorithms.
Joining Amcache Hash to the SHA1 value in Sysmon 7's Hashes field
tells you which processes loaded a given attacker DLL, and when.
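The join can be sketched as follows. Sysmon writes the Hashes field as comma-separated `ALGORITHM=VALUE` pairs, so it has to be parsed before comparison. The function names are hypothetical, and the sketch assumes events have already been exported to dicts keyed by Sysmon field name (`Hashes`, `Image`, `ImageLoaded`, `UtcTime`); how you export them is up to your tooling.

```python
def parse_sysmon_hashes(field):
    """Parse 'SHA1=AB..,MD5=CD..' into {'sha1': 'ab..', 'md5': 'cd..'}."""
    result = {}
    for part in field.split(","):
        name, sep, value = part.partition("=")
        if sep:
            # Lowercase both sides so comparisons are case-insensitive.
            result[name.strip().lower()] = value.strip().lower()
    return result


def loads_matching_amcache(sysmon_events, amcache_sha1s):
    """Yield Sysmon event dicts whose SHA-1 appears in the Amcache hash set."""
    wanted = {h.strip().lower() for h in amcache_sha1s}
    for event in sysmon_events:
        sha1 = parse_sysmon_hashes(event.get("Hashes", "")).get("sha1")
        if sha1 in wanted:
            yield event
```

Because Sysmon hashes the whole file, this join inherits the large-file caveat: a binary above the Amcache hashing limit will carry different SHA-1 values in the two sources and will not match here.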
See also
- Amcache complete reference: the high-level overview.
- Amcache registry structure: where FileId sits in the hive.
- Amcache ProgramId explained: the other unique identifier in InventoryApplicationFile.
- Amcache timestamps explained: how to pivot FileId matches in time.
- AmcacheParser output columns explained: the surrounding CSV columns.
Want to see the FileId values in your own hive without
installing anything? Drop a hive on the
parser home page — it parses entirely in your browser.