v1.3.4 Improve Archive Command and update UnityFileSystemAPI by SkowronskiAndrew · Pull Request #62 · Unity-Technologies/UnityDataTools

SkowronskiAndrew · 2026-04-02T19:49:35Z

Summary

Adds new archive inspection subcommands, self-contained archive metadata parsing, JSON output support, and C# wrappers for recently added UnityFileSystemApi functions.

New `archive` subcommands

archive info: High-level summary: Unity version, file size, data size, compression ratio, compression algorithm, block count, file count, and serialized file count.
archive header: Displays the raw archive header fields (version, sizes, metadata compression, archive flags). Only the UnityFS signature is supported; legacy signatures produce a clear error.
archive blocks: Lists each data block with its file offset, data offset, uncompressed/compressed sizes, and compression type.

Improved `archive list`

Now parses the archive directory directly (instead of using the native DllWrapper) so it can expose the data offset of each file within the uncompressed data.
Supports JSON output via -f Json.

Improved `archive extract`

New --filter option for case-insensitive substring matching on file paths inside the archive. Prints an "Extracted N out of M files" summary.
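A minimal sketch of the filter semantics, using a hypothetical list of paths in place of real archive nodes. The predicate mirrors the `Contains(filter, StringComparison.OrdinalIgnoreCase)` check that appears in the review diffs; `SelectForExtraction` is an illustrative name, not a function from the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the paths that `archive extract --filter` would write.
// A null filter means "extract everything", matching extract without --filter.
List<string> SelectForExtraction(IReadOnlyList<string> paths, string? filter) =>
    paths.Where(p => filter == null || p.Contains(filter, StringComparison.OrdinalIgnoreCase))
         .ToList();

// Illustrative paths, not taken from a real archive.
var nodes = new List<string> { "CAB-0123/Scene1.sharedAssets", "CAB-0123/scene1", "CAB-4567/Scene2" };

var selected = SelectForExtraction(nodes, "scene1");
foreach (var path in selected)
    Console.WriteLine($"... Extracting {path}");
Console.WriteLine($"Extracted {selected.Count} out of {nodes.Count} files.");
```

Because the match is a plain substring test, a filter such as "scene1" also selects "Scene1.sharedAssets"; narrowing to a single file means passing enough of its path to be unique.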

Self-contained archive metadata parsing

ArchiveDetector (in UnityBinaryFormat) now parses the full archive metadata section (BlocksInfo + DirectoryInfo), including LZ4/LZ4HC decompression via the new K4os.Compression.LZ4 NuGet dependency.
Handles both the default layout (metadata after the header) and the at-end layout (metadata after the data).
Includes consistency validation: directory nodes must be in order and non-overlapping, and block data must cover all file data.
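The two consistency checks can be sketched as follows. This is a hypothetical reconstruction from the description above, not the PR's ArchiveDetector code: directory nodes are modeled as (Offset, Size) tuples in the uncompressed data, and blocks are reduced to their uncompressed sizes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the two consistency checks. Directory nodes are (Offset, Size)
// spans in the uncompressed data; each block contributes its uncompressed size.
void ValidateSpans(IReadOnlyList<(long Offset, long Size)> nodes, IReadOnlyList<long> blockUncompressedSizes)
{
    long previousEnd = 0;
    for (int i = 0; i < nodes.Count; i++)
    {
        var (offset, size) = nodes[i];
        // Each node must start at or after the end of the previous one: in order, no overlap.
        if (offset < previousEnd)
            throw new InvalidOperationException($"Directory node {i} at offset {offset} is out of order or overlaps the previous node.");
        previousEnd = offset + size;
    }

    long totalBlockData = 0;
    foreach (long blockSize in blockUncompressedSizes)
        totalBlockData += blockSize;

    // Because nodes are ordered, previousEnd is now the end of the last file's data.
    if (previousEnd > totalBlockData)
        throw new InvalidOperationException($"File data ends at {previousEnd} but blocks only cover {totalBlockData} bytes.");
}

var exampleNodes = new List<(long Offset, long Size)> { (0, 100), (100, 50) };
var exampleBlocks = new List<long> { 128, 22 };
ValidateSpans(exampleNodes, exampleBlocks);
Console.WriteLine("Directory and block spans are consistent.");
```

Checking order first means the coverage test only needs the end of the last node, since ordered, non-overlapping spans end no later than that.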

WebBundle refactor

Extracted all WebBundle-specific code from Archive.cs into a dedicated WebBundleHelper class.

UnityFileSystemApi C# wrappers

Added P/Invoke declarations and public wrappers for 7 new native API functions: GetDllVersion, GetUnityVersion, GetSerializedFileVersion, GetTypeTreeCount, GetTypeTreeInfo, GetTypeTreeByIndex, RemoveTypeTreeSource.
Added TypeTreeCategory enum and TypeTreeInfo struct.

Examples

Example of the new info command with an AssetBundle:

$ UnityDataTool archive info .\scenes
Unity Version                  2019.4.0f1
File Size                      70,290 bytes
Data Size                      70,123 bytes
Uncompressed Data Size         307,872 bytes
Compression Ratio              4.39x
Compression                    Lzma
Block Count                    1
File Count                     4
Serialized File Count          4
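The numbers in this example suggest that Compression Ratio is Uncompressed Data Size divided by Data Size (307,872 / 70,123 ≈ 4.39). The sketch below computes the ratio under that inferred definition; `FormatCompressionRatio` is a hypothetical helper, not the tool's code.

```csharp
using System;
using System.Globalization;

// Assumed definition (inferred from the example output): ratio = uncompressed / compressed data size.
string FormatCompressionRatio(long uncompressedDataSize, long dataSize)
{
    if (dataSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(dataSize), "Data size must be positive.");
    double ratio = (double)uncompressedDataSize / dataSize;
    // Invariant culture keeps the decimal separator stable regardless of the machine locale.
    return ratio.ToString("0.00", CultureInfo.InvariantCulture) + "x";
}

Console.WriteLine(FormatCompressionRatio(307_872, 70_123));
```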

Example header output from a small compressed player build:

=== data.unity3d ===
--- header ---
Signature                      UnityFS
Version                        8
Unity Version                  6000.2.0a7
File Size                      3,033,272 bytes
Compressed Metadata Size       488
Uncompressed Metadata Size     934
Metadata Compression           Lz4HC
Flags                          BlocksAndDirectoryInfoCombined, BlockInfoNeedPaddingAtStart
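The Flags line is a decoded bit field. The sketch below uses flag values and alignment rules as used by open-source UnityFS readers such as AssetStudio and UnityPy, not values taken from this PR, so treat them as assumptions: the low 6 bits hold the metadata compression type (0 None, 1 Lzma, 2 Lz4, 3 Lz4HC, 4 Lzham), 0x40 is BlocksAndDirectoryInfoCombined, 0x80 marks the at-end layout, and 0x200 is BlockInfoNeedPaddingAtStart. It also estimates where block data would start for the default and at-end layouts; `DescribeFlags` and `DataStartOffset` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;

// Flag values as used by open-source UnityFS readers (assumed, not taken from this PR).
const uint CompressionTypeMask = 0x3F;          // low 6 bits: metadata compression type
const uint BlocksAndDirectoryInfoCombined = 0x40;
const uint BlocksInfoAtTheEnd = 0x80;           // at-end layout: metadata follows the data
const uint BlockInfoNeedPaddingAtStart = 0x200; // data starts on a 16-byte boundary

string[] compressionNames = { "None", "Lzma", "Lz4", "Lz4HC", "Lzham" };

long AlignTo16(long value) => (value + 15) & ~15L;

// Turns the raw flags value into a readable description.
string DescribeFlags(uint flags)
{
    uint compression = flags & CompressionTypeMask;
    string compressionName = compression < compressionNames.Length
        ? compressionNames[compression]
        : $"Unknown({compression})";
    var parts = new List<string> { "Metadata compression: " + compressionName };
    if ((flags & BlocksAndDirectoryInfoCombined) != 0) parts.Add("BlocksAndDirectoryInfoCombined");
    if ((flags & BlocksInfoAtTheEnd) != 0) parts.Add("BlocksInfoAtTheEnd");
    if ((flags & BlockInfoNeedPaddingAtStart) != 0) parts.Add("BlockInfoNeedPaddingAtStart");
    return string.Join(", ", parts);
}

// File offset of the first data block. `headerEnd` is the position just after the
// header fields; archive version 7 and later align that position to 16 bytes.
long DataStartOffset(long headerEnd, uint archiveVersion, uint compressedMetadataSize, uint flags)
{
    long offset = archiveVersion >= 7 ? AlignTo16(headerEnd) : headerEnd;
    if ((flags & BlocksInfoAtTheEnd) == 0)
        offset += compressedMetadataSize; // default layout: metadata sits between header and data
    if ((flags & BlockInfoNeedPaddingAtStart) != 0)
        offset = AlignTo16(offset);
    return offset;
}

Console.WriteLine(DescribeFlags(0x243));
Console.WriteLine(DataStartOffset(headerEnd: 49, archiveVersion: 8, compressedMetadataSize: 488, flags: 0x243));
```

Under these assumed values, a flags word of 0x243 decodes to Lz4HC metadata compression plus the two flags shown in the header example above. The header end of 49 bytes in the usage line is illustrative only.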

Example blocks output (JSON form):

--- blocks ---
{
  "blocks": [
    {
      "index": 0,
      "fileOffset": 272,
      "dataOffset": 0,
      "uncompressedSize": 131072,
      "compressedSize": 69431,
      "compression": "Lz4",
      "isStreamed": false
    },
    {
      "index": 1,
      "fileOffset": 69703,
      "dataOffset": 131072,
      "uncompressedSize": 131072,
      "compressedSize": 85598,
      "compression": "Lz4",
      "isStreamed": false
    },

Example blocks output (text form):

Blocks: 8
  #0    FileOffset: 272  DataOffset: 0  Uncompressed: 131,072  Compressed: 69,431  Compression: Lz4
  #1    FileOffset: 69,703  DataOffset: 131,072  Uncompressed: 131,072  Compressed: 85,598  Compression: Lz4
  #2    FileOffset: 155,301  DataOffset: 262,144  Uncompressed: 131,072  Compressed: 99,564  Compression: Lz4
  #3    FileOffset: 254,865  DataOffset: 393,216  Uncompressed: 131,072  Compressed: 54,236  Compression: Lz4
  #4    FileOffset: 309,101  DataOffset: 524,288  Uncompressed: 131,072  Compressed: 55,752  Compression: Lz4
  #5    FileOffset: 364,853  DataOffset: 655,360  Uncompressed: 131,072  Compressed: 55,286  Compression: Lz4
  #6    FileOffset: 420,139  DataOffset: 786,432  Uncompressed: 131,072  Compressed: 29,240  Compression: Lz4
  #7    FileOffset: 449,379  DataOffset: 917,504  Uncompressed: 45,613  Compressed: 10,275  Compression: Lz4
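In this output, each DataOffset is the running total of the previous blocks' uncompressed sizes, and each FileOffset is the first block's file offset plus the running total of compressed sizes (for example, 272 + 69,431 = 69,703). A sketch that recomputes both columns from the sizes alone, assuming compressed blocks are stored back to back with no padding between them (which matches these numbers); `ComputeBlockOffsets` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Recomputes (FileOffset, DataOffset) for each block from its sizes.
// Assumes compressed blocks are stored back to back starting at firstBlockFileOffset.
List<(long FileOffset, long DataOffset)> ComputeBlockOffsets(
    long firstBlockFileOffset, IReadOnlyList<(long Uncompressed, long Compressed)> blocks)
{
    var result = new List<(long FileOffset, long DataOffset)>(blocks.Count);
    long fileOffset = firstBlockFileOffset;
    long dataOffset = 0;
    foreach (var (uncompressed, compressed) in blocks)
    {
        result.Add((fileOffset, dataOffset));
        fileOffset += compressed;     // position in the archive file
        dataOffset += uncompressed;   // position in the uncompressed data stream
    }
    return result;
}

// Sizes copied from the example output above.
var sizes = new List<(long Uncompressed, long Compressed)>
{
    (131_072, 69_431), (131_072, 85_598), (131_072, 99_564), (131_072, 54_236),
    (131_072, 55_752), (131_072, 55_286), (131_072, 29_240), (45_613, 10_275),
};

var offsets = ComputeBlockOffsets(272, sizes);
for (int i = 0; i < offsets.Count; i++)
    Console.WriteLine($"#{i}  FileOffset: {offsets[i].FileOffset:N0}  DataOffset: {offsets[i].DataOffset:N0}");
```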

list now includes the data offset (the starting byte of each file within the uncompressed data):

--- list ---
[
  {
    "path": "globalgamemanagers",
    "dataOffset": 0,
    "size": 90352,
    "flags": "SerializedFile"
  },
  {
    "path": "Resources/unity_builtin_extra",
    "dataOffset": 90352,
    "size": 662392,
    "flags": "SerializedFile"
  },
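Combining a file's dataOffset and size from list with each block's DataOffset and uncompressed size from blocks shows which blocks must be decompressed to read that file. The list and blocks examples above may come from different archives, so the block layout in this sketch (eight full 131,072-byte blocks) is hypothetical, as is the helper name `BlocksForFile`.

```csharp
using System;
using System.Collections.Generic;

// Returns the indices of blocks whose uncompressed range overlaps [fileOffset, fileOffset + fileSize).
// Assumes the block list is ordered by DataOffset; a zero-size file maps to no blocks.
List<int> BlocksForFile(IReadOnlyList<(long DataOffset, long Uncompressed)> blocks, long fileOffset, long fileSize)
{
    var indices = new List<int>();
    long fileEnd = fileOffset + fileSize;
    for (int i = 0; i < blocks.Count; i++)
    {
        long blockStart = blocks[i].DataOffset;
        long blockEnd = blockStart + blocks[i].Uncompressed;
        if (blockStart < fileEnd && fileOffset < blockEnd)
            indices.Add(i);
        else if (blockStart >= fileEnd)
            break; // later blocks start even further along
    }
    return indices;
}

// Hypothetical layout: eight full 131,072-byte blocks.
var uniformBlocks = new List<(long DataOffset, long Uncompressed)>();
for (int k = 0; k < 8; k++)
    uniformBlocks.Add((k * 131_072L, 131_072L));

// Offset and size of Resources/unity_builtin_extra from the list example above.
var blocksForExtra = BlocksForFile(uniformBlocks, 90_352, 662_392);
Console.WriteLine($"Blocks: {string.Join(", ", blocksForExtra)}");
```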

Add P/Invoke declarations and public wrappers for 7 missing native API functions: GetDllVersion, GetUnityVersion, GetSerializedFileVersion, GetTypeTreeCount, GetTypeTreeInfo, GetTypeTreeByIndex, and RemoveTypeTreeSource. Also adds TypeTreeCategory enum and TypeTreeInfo struct. Fixes TypeTreeInfo name ambiguity in SerializedFileCommands.

Add -f/--format option to "archive list" supporting Text (default) and Json output, matching the existing serialized-file commands. Includes tests for both AssetBundle and WebBundle archive types in both formats, plus an extract test with file size verification.

Copilot

Pull request overview

This PR expands the archive command to provide richer inspection of Unity archive files (AssetBundles / UnityFS), adds JSON output options, and updates the UnityFileSystem API surface with new native interop calls.

Changes:

Add archive header, archive blocks, and archive info subcommands with -f/--format Text|Json, and enhance archive list output (incl. data offsets).
Add --filter support for archive extract and refactor web bundle handling into WebBundleHelper.
Introduce a managed UnityFS header/metadata parser (ArchiveDetector) including LZ4 metadata decompression, plus new tests and documentation.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 13 comments.

File	Description
UnityFileSystem/UnityFileSystem.cs	Adds native API wrappers (dll version, unity version, remove type tree source).
UnityFileSystem/SerializedFile.cs	Adds serialized file version and type-tree enumeration APIs.
UnityFileSystem/DllWrapper.cs	Adds new P/Invoke declarations and TypeTreeInfo/TypeTreeCategory definitions.
UnityBinaryFormat/ArchiveDetector.cs	Adds UnityFS header + metadata parsing and LZ4 metadata decompression.
UnityBinaryFormat/BinaryFileHelper.cs	Adds `ReadUInt16` helper used by archive parsing.
UnityBinaryFormat/UnityBinaryFormat.csproj	Adds LZ4 decompression package dependency.
UnityDataTool/Program.cs	Wires new archive subcommands and output format option into CLI.
UnityDataTool/Archive.cs	Implements new archive subcommands and JSON/text formatting; adds filter support for extract.
UnityDataTool/WebBundleHelper.cs	New helper for listing/extracting UnityWebData `.data(.gz/.br)` bundles.
UnityDataTool/UnityDataTool.csproj	Updates assembly/file/informational version values.
UnityDataTool.Tests/*	Adds/updates tests and expected values for new archive outputs and JSON formats.
Documentation/command-archive.md	Documents new `archive` subcommands and options.
TestCommon/Data/PlayerDataCompressed/README.md	Adds description of new compressed player data sample.

Copilot · 2026-04-08T22:32:38Z

UnityDataTool/WebBundleHelper.cs

+                Path = filePath,
+            });
+            // Advance byte offset, so we keep track of the position (to know when we're done reading the header).
+            currentByteOffset += 3 * sizeof(uint) + filePath.Length;


ParseWebBundleHeader advances currentByteOffset using filePath.Length (characters) instead of the byte count read from the file (filePathLength). This will mis-detect the end of the header for non-ASCII paths and can break parsing. Use filePathLength (and keep it as a byte count) when advancing currentByteOffset.

Suggested change

currentByteOffset += 3 * sizeof(uint) + filePath.Length;

currentByteOffset += 3 * sizeof(uint) + filePathLength;

Fixed — now uses filePathLength (byte count) instead of filePath.Length (character count).
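The distinction matters because the header stores the path length in bytes, while `string.Length` counts UTF-16 code units. A small illustration, assuming the entry layout implied by `3 * sizeof(uint) + filePathLength` (offset, size, and path length as 4-byte little-endian values, then the path bytes) and assuming the path is UTF-8 encoded; `ReadEntry` is a hypothetical helper, not the PR's parser.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical entry layout, inferred from `3 * sizeof(uint) + filePathLength` in the diff:
// offset, size and path length (each a 4-byte little-endian value), then the path bytes.
// UTF-8 for the path is an assumption.
(string Path, uint ByteOffset, uint Size, int BytesConsumed) ReadEntry(BinaryReader input)
{
    uint byteOffset = input.ReadUInt32();
    uint size = input.ReadUInt32();
    int filePathLength = input.ReadInt32();   // length of the path in bytes
    string filePath = Encoding.UTF8.GetString(input.ReadBytes(filePathLength));
    // Advance by the byte count; filePath.Length counts UTF-16 code units instead.
    return (filePath, byteOffset, size, 3 * sizeof(uint) + filePathLength);
}

// Build one entry whose path contains two non-ASCII characters (each 2 bytes in UTF-8).
byte[] pathBytes = Encoding.UTF8.GetBytes("Assets/R\u00e9sum\u00e9.txt");
using var ms = new MemoryStream();
using (var writer = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
{
    writer.Write(100u);
    writer.Write(42u);
    writer.Write(pathBytes.Length);
    writer.Write(pathBytes);
}
ms.Position = 0;

using var reader = new BinaryReader(ms);
var entry = ReadEntry(reader);
Console.WriteLine($"{entry.Path}: {entry.Path.Length} characters, {entry.BytesConsumed} header bytes consumed");
```

Here the path is 17 characters but 19 bytes, so advancing by `filePath.Length` would leave the offset 2 bytes short of the next entry.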

Copilot · 2026-04-08T22:32:39Z

UnityDataTool/WebBundleHelper.cs

+            // Always read the bytes to advance the stream position.
+            var data = ReadBytes(reader, (int)description.Size);
+
+            if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
+                continue;
+
+            Console.WriteLine($"... Extracting {description.Path}");
+            var path = Path.Combine(outputFolder.ToString(), description.Path);
+            Directory.CreateDirectory(Path.GetDirectoryName(path));
+            File.WriteAllBytes(path, data);
+            extracted++;
+        }
+
+        Console.WriteLine($"Extracted {extracted} out of {total} files.");
+    }
+


Extract() casts description.Size (uint) to int when reading bytes. This will overflow/throw for entries >2GB and can lead to incorrect reads. Consider validating the size fits in int, or stream-copy using long-sized reads instead of buffering the whole file into memory.

Suggested change

            // Always read the bytes to advance the stream position.
            var data = ReadBytes(reader, (int)description.Size);

            if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
                continue;

            Console.WriteLine($"... Extracting {description.Path}");
            var path = Path.Combine(outputFolder.ToString(), description.Path);
            Directory.CreateDirectory(Path.GetDirectoryName(path));
            File.WriteAllBytes(path, data);
            extracted++;
        }

        Console.WriteLine($"Extracted {extracted} out of {total} files.");
    }

            long size = description.Size;
            if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
            {
                CopyBytes(reader.BaseStream, Stream.Null, size);
                continue;
            }

            Console.WriteLine($"... Extracting {description.Path}");
            var path = Path.Combine(outputFolder.ToString(), description.Path);
            Directory.CreateDirectory(Path.GetDirectoryName(path));
            using var output = File.Create(path);
            CopyBytes(reader.BaseStream, output, size);
            extracted++;
        }

        Console.WriteLine($"Extracted {extracted} out of {total} files.");
    }

    private static void CopyBytes(Stream input, Stream output, long bytesToCopy)
    {
        byte[] buffer = new byte[81920];
        long remaining = bytesToCopy;
        while (remaining > 0)
        {
            int bytesRead = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead <= 0)
                throw new EndOfStreamException($"Unexpected end of stream while reading {bytesToCopy} bytes.");
            output.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
    }

Valid concern for very large files. This is pre-existing behavior (the code was moved from Archive.cs, not written new). Web bundle entries are practically always well under 2GB, so this is low-risk. Filed as a known limitation — not addressing in this PR.
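If buffering the whole entry is kept, a guard along these lines would make the limitation explicit instead of letting the unchecked cast wrap to a negative length. This is a sketch, not code from the PR; `ToInt32Size` is a hypothetical helper.

```csharp
using System;

// Converts an entry size from the header to an int buffer length, failing loudly
// instead of wrapping to a negative value for entries of 2 GB or more.
int ToInt32Size(uint size, string path)
{
    if (size > int.MaxValue)
        throw new NotSupportedException($"Entry '{path}' is {size:N0} bytes, too large to buffer in a single array.");
    return (int)size;
}

Console.WriteLine(ToInt32Size(1024, "example.data"));
```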

Copilot · 2026-04-08T22:32:39Z

UnityDataTool/WebBundleHelper.cs

+    static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
+    {
+        // This function assumes `reader` is at the start of the binary data representing the file contents.
+        Console.WriteLine($"... Extracting {description.Path}");
+        var path = Path.Combine(outputFolder.ToString(), description.Path);
+        Directory.CreateDirectory(Path.GetDirectoryName(path));
+        File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
+    }
+


ExtractFile() is declared but never used. Removing it (or wiring it into Extract()) would avoid dead code and keep the helper focused.

Suggested change

    static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
    {
        // This function assumes `reader` is at the start of the binary data representing the file contents.
        Console.WriteLine($"... Extracting {description.Path}");
        var path = Path.Combine(outputFolder.ToString(), description.Path);
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
    }

Fixed — removed the now-unused ExtractFile method. The extraction logic was inlined into Extract() when adding filter support.

Copilot · 2026-04-08T22:32:39Z

UnityBinaryFormat/ArchiveDetector.cs

+            // Read the metadata bytes (which may be compressed)
+            var compressedData = new byte[header.CompressedMetadataSize];
+            int bytesRead = stream.Read(compressedData, 0, compressedData.Length);
+            if (bytesRead != compressedData.Length)
+                throw new InvalidDataException("Could not read the full metadata section from the file.");


This allocates an array with a uint length (header.CompressedMetadataSize). Array lengths must be int, so this will not compile as-is. Cast with checked((int)...) after validating the value is <= int.MaxValue (and similarly anywhere else a uint size is used for allocations/reads).

This compiles and works correctly. C# array creation expressions accept uint dimensions (not just int) — the compiler handles the conversion. The metadata section will never get anywhere close to max int, so overflow is not a practical concern.
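The author's point can be checked directly: the C# specification allows array-creation dimensions of type int, uint, long, or ulong, so `new byte[someUInt]` compiles. As a separate aside not raised in the review, `Stream.Read` may legally return fewer bytes than requested; on .NET 7 and later (the project targets net9.0), `Stream.ReadExactly` reads the full count or throws `EndOfStreamException`. A minimal sketch with a hypothetical `ReadMetadata` helper:

```csharp
using System;
using System.IO;

// Hypothetical helper mirroring the reviewed code: read a metadata section whose size is a uint.
byte[] ReadMetadata(Stream source, uint compressedMetadataSize)
{
    // A uint dimension is accepted directly by the array-creation expression.
    var data = new byte[compressedMetadataSize];
    // ReadExactly (.NET 7+) keeps reading until the buffer is full and throws
    // EndOfStreamException if the stream ends first.
    source.ReadExactly(data, 0, data.Length);
    return data;
}

using var stream = new MemoryStream(new byte[1000]);
byte[] metadata = ReadMetadata(stream, 488u);
Console.WriteLine($"Read {metadata.Length} bytes; stream position is now {stream.Position}.");
```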

Copilot · 2026-04-08T22:32:39Z

UnityBinaryFormat/ArchiveDetector.cs

+            {
+                // LZ4 and LZ4HC use the same decompression algorithm.
+                uncompressedData = new byte[header.UncompressedMetadataSize];
+                int decoded = LZ4Codec.Decode(compressedData, 0, compressedData.Length,
+                    uncompressedData, 0, uncompressedData.Length);
+                if (decoded != header.UncompressedMetadataSize)
+                    throw new InvalidDataException($"LZ4 decompression produced {decoded} bytes, expected {header.UncompressedMetadataSize}.");


This allocates an array with a uint length (header.UncompressedMetadataSize), which will not compile. Add a checked cast to int with a range check (<= int.MaxValue) before allocating the buffer.

Same as above — uint is valid for C# array dimensions. No compilation issue.

Copilot · 2026-04-08T22:32:41Z

UnityDataTool/UnityDataTool.csproj

@@ -5,9 +5,9 @@
    <TargetFramework>net9.0</TargetFramework>
    <LangVersion>latest</LangVersion>
    <Version>1.3.2</Version>


The project version metadata is inconsistent: <Version> is still 1.3.2 while Assembly/File/InformationalVersion are 1.3.4. This can lead to incorrect NuGet/package/version reporting. Update <Version> to 1.3.4 (or align all version fields to the intended release).

Suggested change

<Version>1.3.2</Version>

<Version>1.3.4</Version>

Fixed — updated <Version> to 1.3.4 to match the other version fields.

Copilot · 2026-04-08T22:32:41Z

TestCommon/Data/PlayerDataCompressed/README.md

@@ -0,0 +1,5 @@
+This is an example of the format used for Player Data when compression is enabled.  
+
+It is a Unity Archive and can be examined with the "archive" command.  It as created with Unity 2021.3.20f1.


Typo: "It as created" should be "It was created".

Suggested change

It is a Unity Archive and can be examined with the "archive" command. It as created with Unity 2021.3.20f1.

It is a Unity Archive and can be examined with the "archive" command. It was created with Unity 2021.3.20f1.

Fixed — typo corrected.

Copilot · 2026-04-08T22:32:41Z

UnityDataTool/WebBundleHelper.cs

+        foreach (var description in fileDescriptions)
+        {
+            // Always read the bytes to advance the stream position.
+            var data = ReadBytes(reader, (int)description.Size);
+
+            if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
+                continue;
+
+            Console.WriteLine($"... Extracting {description.Path}");
+            var path = Path.Combine(outputFolder.ToString(), description.Path);
+            Directory.CreateDirectory(Path.GetDirectoryName(path));
+            File.WriteAllBytes(path, data);


FileDescription.ByteOffset is parsed/stored but never used, and Extract() always reads the next bytes from the stream rather than seeking to ByteOffset. If the web bundle header offsets are not strictly sequential (e.g., padding/alignment), extraction will produce incorrect file contents. Consider seeking to description.ByteOffset (or validating the current position matches it) before reading each entry.

The web bundle format stores file data sequentially immediately after the header, so sequential reading is correct. The ByteOffset field from the header could be used for validation (asserting it matches the current stream position), but the sequential read approach matches the original pre-refactor code and works correctly for all test files. Adding seek-based extraction would be a separate improvement.
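The validation the author mentions could look like the following sketch. It assumes, hypothetically, that each entry's ByteOffset is measured from the start of the bundle file, so it can be compared directly with the stream position; entries are modeled as tuples rather than the real FileDescription type, and `ReadEntriesSequentially` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Reads web bundle entries in order, checking that each one starts where the header says.
// Assumption: ByteOffset is absolute (measured from the start of the bundle file).
List<byte[]> ReadEntriesSequentially(Stream source, IReadOnlyList<(string Path, uint ByteOffset, uint Size)> entries)
{
    var contents = new List<byte[]>();
    foreach (var entry in entries)
    {
        if (source.Position != entry.ByteOffset)
            throw new InvalidDataException(
                $"Entry '{entry.Path}' should start at {entry.ByteOffset} but the stream is at {source.Position}.");
        var buffer = new byte[entry.Size];
        source.ReadExactly(buffer, 0, buffer.Length); // .NET 7+
        contents.Add(buffer);
    }
    return contents;
}

// A fake bundle: 16 bytes standing in for the header, followed by entries of 4 and 2 bytes.
var bundle = new MemoryStream(new byte[22]) { Position = 16 };
var headerEntries = new List<(string Path, uint ByteOffset, uint Size)> { ("a.data", 16, 4), ("b.data", 20, 2) };
var files = ReadEntriesSequentially(bundle, headerEntries);
Console.WriteLine($"Read {files.Count} entries.");
```

This keeps the sequential read the author prefers while turning any unexpected padding into an explicit error rather than silently wrong file contents.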

Copilot · 2026-04-08T22:32:41Z

UnityDataTool/Archive.cs

+    static void ExtractAssetBundle(FileInfo filename, DirectoryInfo outputFolder, string filter)
    {
-        Console.WriteLine($"Extracting asset bundle: {filename}");
+        Console.WriteLine($"Extracting files from archive: {filename}");
        using var archive = UnityFileSystem.MountArchive(filename.FullName, "/");
+
+        int total = archive.Nodes.Count;
+        int extracted = 0;
+
        foreach (var node in archive.Nodes)
        {
+            if (filter != null && !node.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
+                continue;
+
            Console.WriteLine($"... Extracting {node.Path}");


New behavior: --filter controls which files are extracted, but there are no tests covering filtered extraction (e.g., that only matching paths are written and that the exit code/output counts are correct). Add a test case that runs archive extract ... --filter <text> and asserts only the expected subset of files exist.

Agreed — filter tests would be valuable. The filter feature was manually tested and the overall extract test covers the non-filtered path. Adding dedicated filter tests is a good follow-up.

Claude was being lazy; I told it to add a test.

SkowronskiAndrew added 8 commits April 1, 2026 10:17

[CBD] Extract WebBundle logic from Archive into WebBundleHelper

f6a9590

Move all WebBundle-specific code (detection, parsing, extraction, listing) out of Archive.cs into a dedicated WebBundleHelper class. Archive now delegates to WebBundleHelper for web bundle operations.

Add "header" subcommand for archive

68a8cb2

Parse and print the header for Unity Archive files. This is similar to the header command already implemented for serialized files.

Archive command supports "blocks" command and adds offsets to "list" command

dc261de

Dump the version to 1.3.4

0d56463

Add filter argument to archive extract

d6b4c2a

Useful to extract just a single file or group of files out of an archive

Archive Info command

55e1726

Summarize the key metrics about the archive Add an example compressed player file (used to test info and useful for manual testing)

SkowronskiAndrew mentioned this pull request Apr 7, 2026

Support repacking Archives (LH4 compression configuration) #64

Open

SkowronskiAndrew added 2 commits April 8, 2026 17:52

Improve offset reporting for Blocks and Directory contents

4d97a18

Comments and add sanity check

35bf4d2

Add sanity check that the spans make sense in the Blocks and Directory data Add some more comments with some details of the format

SkowronskiAndrew requested a review from Copilot April 8, 2026 22:27

Copilot started reviewing on behalf of SkowronskiAndrew April 8, 2026 22:28

Copilot AI reviewed Apr 8, 2026

Claude fixes based on CoPilot review

50ab520

SkowronskiAndrew merged commit 148801b into main Apr 8, 2026
5 checks passed

2 participants
