What does FileId look like?

Forty-four characters: the prefix 0000 followed by a 40-character SHA-1 hex digest. Example: 0000da39a3ee5e6b4b0d3255bfef95601890afd80709. AmcacheParser splits this into FileId (the full string) and Hash (the 40 characters without the prefix).

Does Amcache hash the whole file?

No. The SHA-1 covers the first 31 MiB only. For files under 31 MiB (almost everything), the value equals the whole-file SHA-1. For larger files it is a prefix hash.

Can I submit an Amcache hash to VirusTotal directly?

Yes, but strip the 0000 first. Use the Hash column from AmcacheParser. Including the prefix returns an empty result that looks identical to 'unknown file', which is the worst kind of misleading. For files over 31 MiB the value is a prefix hash, so a miss there proves nothing either.

What does it mean if two rows share FullPath but have different Hash?

The binary at that path changed between inventories. That is a strong signal for binary replacement: either a legitimate software update, or an attacker swapping a system binary or a regularly-run user tool for a trojanised copy.

What about multiple rows with the same Hash but different paths?

Same content inventoried at multiple locations. Common reasons: attacker copying a tool around to test which location runs, an installer dropping the same DLL into multiple directories, or a user copying files manually.

Why the 0000 prefix?

Historical type tag. Early appraiser builds anticipated multiple hash algorithms with different prefixes. Only SHA-1 ever shipped. The prefix has been constant for years.

Amcache FileId explained: the SHA-1 hash format Windows stores

FileId in Root\InventoryApplicationFile is one of the most useful fields in the whole hive, and one of the most misunderstood. It is a content hash. It is not quite a standard SHA-1. It does not quite hash the whole file. This page is the full reference.

For the broader context, see the Amcache complete reference. For the surrounding registry structure inside the hive, see Amcache registry structure.

What the value looks like

A typical FileId from a real hive:

0000da39a3ee5e6b4b0d3255bfef95601890afd80709

44 characters total:

The first four are always "0000". Fixed type tag.
The remaining 40 are the SHA-1 hex digest.

The "0000" prefix is a historical artefact. Early appraiser builds anticipated multiple hash algorithms with different prefixes. Only SHA-1 ever shipped. The prefix has been constant for years.

AmcacheParser splits this into two CSV columns:

Column	Value
`FileId`	Full 44-character string with prefix.
`Hash`	40 hex characters, SHA-1 alone.

Always use Hash (or strip the prefix yourself) when joining against external feeds. VirusTotal, threat-intelligence (TI) feeds, and allowlist databases all expect the bare 40-character SHA-1. Include the "0000" and the join silently matches nothing.
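If you are working from raw FileId values rather than AmcacheParser's Hash column, the stripping step can be sketched as below. This is a minimal sketch, not part of any tool: `bare_sha1` is a hypothetical helper name, and it assumes the prefix is always the literal "0000" described above.

```python
import re

PREFIX = "0000"  # fixed type tag in front of the SHA-1
_SHA1_RE = re.compile(r"[0-9a-f]{40}")

def bare_sha1(file_id: str) -> str:
    """Return the 40-character SHA-1 from an Amcache FileId or a bare hash."""
    value = file_id.strip().lower()
    # Drop the type tag only when the value is exactly prefix + 40 hex characters.
    if len(value) == len(PREFIX) + 40 and value.startswith(PREFIX):
        value = value[len(PREFIX):]
    # Refuse anything that is not a plain SHA-1, so bad input fails loudly
    # instead of producing a lookup that silently matches nothing.
    if not _SHA1_RE.fullmatch(value):
        raise ValueError(f"not a SHA-1 or Amcache FileId: {file_id!r}")
    return value
```

Raising on malformed input is a deliberate choice here: a silent non-match is exactly the failure this section warns about.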

What it actually hashes

The trap that catches almost every new analyst:

The SHA-1 hashes the first 31 MiB of the file, not the whole file.

For files smaller than 31 MiB (which is most EXEs and DLLs), the prefix hash equals the whole-file SHA-1. Indistinguishable.

For larger files, Amcache's value is a prefix hash. Still distinctive enough to identify a specific build of a specific binary. But it is not what sha1sum would give you on the whole file.

Why this matters

VirusTotal matches. Under 31 MiB, Amcache SHA-1 matches the SHA-1 VT indexes. Larger files (installers, some game binaries, large enterprise software) often don't match, and a "no record" response from VT means nothing useful.
Custom hash databases. If you maintain an internal allowlist, store the same kind of hash you'll compare against. Either store full-content SHA-1s (and accept large-binary mismatches) or maintain a parallel prefix-hash column.
Verification. If you have the original binary and want to verify, hash only the first 31 MiB:

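import hashlib

PREFIX_BYTES = 31 * 1024 * 1024  # Amcache hashes only the first 31 MiB

def amcache_sha1(path: str) -> str:
    """Return the SHA-1 of the first 31 MiB of the file (no 0000 prefix)."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        # read() returns fewer bytes when the file is shorter than the prefix
        h.update(f.read(PREFIX_BYTES))
    return h.hexdigest()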
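For the parallel prefix-hash column suggested for internal allowlists, both digests can be computed in one pass over the file. The following is a sketch under the same 31 MiB assumption as above; `full_and_prefix_sha1` and its parameters are hypothetical names introduced for illustration.

```python
import hashlib

PREFIX_BYTES = 31 * 1024 * 1024  # prefix length described on this page

def full_and_prefix_sha1(path, prefix_bytes=PREFIX_BYTES, chunk_size=1024 * 1024):
    """Return (whole-file SHA-1, prefix SHA-1) reading the file once."""
    full = hashlib.sha1()
    prefix = hashlib.sha1()
    remaining = prefix_bytes  # bytes still owed to the prefix digest
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            full.update(chunk)
            if remaining > 0:
                # Feed only the part of this chunk that falls inside the prefix.
                prefix.update(chunk[:remaining])
                remaining -= len(chunk)
    return full.hexdigest(), prefix.hexdigest()
```

When the two values are equal, the file fits inside the prefix and the Amcache value can be compared against whole-file feeds. When they differ, only the second value is comparable with Amcache.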

Real-world traps

A handful of pitfalls that come up on actual cases.

Don't include the prefix in lookups

# search_virustotal() stands in for whatever lookup helper you use

# Wrong
search_virustotal('0000da39a3ee5e6b4b0d3255bfef95601890afd80709')

# Right
search_virustotal('da39a3ee5e6b4b0d3255bfef95601890afd80709')

VirusTotal's API expects the bare hash. Including the prefix returns an empty result silently, which looks identical to "this file is unknown". This wastes triage time and produces wrong findings.

SHA-1 collisions are theoretical but not impossible

Real SHA-1 collision attacks exist (SHAttered, 2017), but they require the attacker to craft both colliding files. In a non-adversarial context this is irrelevant. Forging a file to match a specific existing Amcache entry would need a second-preimage attack, which has not been demonstrated for SHA-1 and is wildly disproportionate to what an attacker gains. But for high-confidence matching in a high-stakes investigation, do not treat a SHA-1 match as cryptographic identity. Pair with file size, link date, and at least one other field.
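One way to enforce that pairing is to accept a match only when the hash and the corroborating fields all agree. This is a minimal sketch: `corroborated_match` is a hypothetical helper, rows are assumed to be dicts as produced by `csv.DictReader`, and the `Size` and `LinkDate` column names are assumptions to adjust to your CSV.

```python
def corroborated_match(row, reference, extra_fields=("Size", "LinkDate")):
    """True only if the SHA-1 and every extra field agree and are non-empty."""
    row_hash = (row.get("Hash") or "").strip().lower()
    ref_hash = (reference.get("Hash") or "").strip().lower()
    if not row_hash or row_hash != ref_hash:
        return False
    for field in extra_fields:
        a = (row.get(field) or "").strip()
        b = (reference.get(field) or "").strip()
        # A missing value cannot corroborate anything, so treat it as a mismatch.
        if not a or a != b:
            return False
    return True
```

Treating an empty field as a mismatch is the conservative choice: it downgrades the finding to "hash-only match" rather than overstating confidence.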

Don't trust a non-PE row's FileId

Amcache occasionally records FileId values for non-PE files the appraiser saw. The hash is still real, but downstream tools that assume PE context (VirusTotal's PE-aware searches, YARA rules written for PE files) return less useful results.

Multiple rows, same hash

If you find the same Hash across multiple *_UnassociatedFileEntries.csv rows on the same host, that is meaningful. The same binary content was inventoried at multiple paths. Common reasons:

Attacker copied the tool to several locations to test which one would execute.
Legitimate installer dropped the same DLL into multiple product directories.
User copied a file around manually.

Cluster by Hash, then look at FullPath and KeyLastWriteTimestamp for each instance. Timestamps tell you the sequence. Paths tell you the intent.
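The clustering step can be sketched as follows. It is an illustration only: `multi_path_hashes` is a hypothetical name, the column names follow this page and may need adjusting for your AmcacheParser output, and timestamps are assumed to be in a format that sorts correctly as text (for example `YYYY-MM-DD HH:MM:SS`).

```python
import csv
from collections import defaultdict

def multi_path_hashes(csv_path, hash_col="Hash", path_col="FullPath",
                      ts_col="KeyLastWriteTimestamp"):
    """Map each hash seen at more than one path to its (timestamp, path) rows."""
    sightings = defaultdict(list)
    # utf-8-sig tolerates a byte-order mark at the start of the file
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            h = (row.get(hash_col) or "").strip().lower()
            if h:
                sightings[h].append((row.get(ts_col) or "", row.get(path_col) or ""))
    # Keep only hashes with more than one distinct path (case-insensitive),
    # with each cluster sorted by timestamp to show the sequence.
    return {
        h: sorted(rows)
        for h, rows in sightings.items()
        if len({path.lower() for _, path in rows}) > 1
    }
```

Each value in the result is already in time order, so reading down a cluster gives the order in which the copies were inventoried.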

Multiple hashes, same FullPath

The opposite pattern. Same path, different Hash across multiple rows means the binary at that path changed between inventories. Strong signal for binary replacement:

Legitimate: software update overwrote the file.
Suspicious: attacker swapped a system binary or regularly-run user tool for a trojanised copy.

Sort the rows by KeyLastWriteTimestamp to see when each new hash appeared. Correlate with patch events or Sysmon FileCreate events (Event ID 11) around those times.
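A sketch of that per-path history, under the same assumptions as elsewhere on this page: `changed_binaries` is a hypothetical name, the column names are taken from this page, and timestamps are assumed to sort correctly as text.

```python
import csv
from collections import defaultdict

def changed_binaries(csv_path, hash_col="Hash", path_col="FullPath",
                     ts_col="KeyLastWriteTimestamp"):
    """Map each path that has carried more than one hash to its (timestamp, hash) rows."""
    history = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            # Windows paths are case-insensitive, so normalise before grouping.
            path = (row.get(path_col) or "").strip().lower()
            h = (row.get(hash_col) or "").strip().lower()
            if path and h:
                history[path].append((row.get(ts_col) or "", h))
    # Keep paths whose content changed, sorted so each new hash appears in order.
    return {
        path: sorted(rows)
        for path, rows in history.items()
        if len({h for _, h in rows}) > 1
    }
```

The timestamps in each returned list are the ones to line up against patch windows or file-creation telemetry.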

Pivots that earn their keep

Cross-host hash hunting

# Pivot a known-bad SHA-1 across every host's Amcache CSV
$badHash = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
Get-ChildItem -Recurse -Filter *_UnassociatedFileEntries.csv |
  ForEach-Object {
    # Capture the host name here: inside the nested pipeline $_ is the CSV row,
    # not the file. Assumes file names start with the host name (HOST_amcache_...).
    $hostName = $_.Name.Split('_')[0]
    Import-Csv $_.FullName |
      Where-Object { $_.Hash -eq $badHash } |
      Select-Object @{n='Host';e={$hostName}},
                    FullPath, KeyLastWriteTimestamp, Size
  } |
  Sort-Object Host

This is how you go from "we found this hash on one host" to "every host in the estate whose Amcache records this binary".

VirusTotal enrichment

import csv, requests, time

API = 'https://www.virustotal.com/api/v3/files/'
HEADERS = {'x-apikey': '<your-key>'}

seen = set()  # look each hash up only once
with open('HOST_amcache_UnassociatedFileEntries.csv', newline='') as f:
    for row in csv.DictReader(f):
        h = row['Hash']
        if not h or h in seen:
            continue
        seen.add(h)
        r = requests.get(API + h, headers=HEADERS, timeout=30)
        # 404 means VT has no record of this hash; anything but 200 is skipped
        if r.status_code == 200:
            stats = r.json()['data']['attributes']['last_analysis_stats']
            if stats.get('malicious', 0) > 0:
                print(h, stats, row['FullPath'])
        time.sleep(15)  # VT public API rate limit

Even at the public API's low request rate, this yields a tight list of confirmed-bad hashes on a typical infected host.

Sysmon Image Loaded correlation

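Sysmon Event ID 7 (Image loaded) records the hashes of each DLL a process loads, provided image-load logging is enabled in the Sysmon configuration (it is off by default) and SHA1 is among the configured hash algorithms. The Hashes field is a comma-separated list such as SHA1=...,MD5=..., so extract the SHA1 value and compare case-insensitively. Joining Amcache Hash to that value tells you exactly which processes loaded a given attacker DLL, and when.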
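The join can be sketched as below. Sysmon writes the Hashes field as comma-separated `ALGORITHM=VALUE` pairs, so the SHA1 entry has to be pulled out before comparing. Everything else here is an assumption for illustration: `sysmon_sha1` and `loads_of` are hypothetical helpers, and events are assumed to be already exported as dicts keyed by the Sysmon field names `UtcTime`, `Image`, `ImageLoaded` and `Hashes`.

```python
def sysmon_sha1(hashes_field):
    """Extract the SHA1 value from a Sysmon Hashes string, or None if absent."""
    for part in hashes_field.split(","):
        name, _, value = part.partition("=")
        if name.strip().upper() == "SHA1":
            return value.strip().lower()
    return None

def loads_of(amcache_hashes, events):
    """Return sorted (UtcTime, Image, ImageLoaded) for loads matching any hash."""
    wanted = {h.strip().lower() for h in amcache_hashes}
    hits = []
    for ev in events:
        sha1 = sysmon_sha1(ev.get("Hashes", ""))
        if sha1 in wanted:
            hits.append((ev.get("UtcTime", ""), ev.get("Image", ""),
                         ev.get("ImageLoaded", "")))
    return sorted(hits)
```

Both sides are lower-cased before comparison because the two sources do not necessarily agree on hex case. For DLLs larger than 31 MiB the Amcache value is a prefix hash and will not match Sysmon's whole-file SHA-1.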
