v1.3.4 Improve Archive Command and update UnityFileSystemAPI #62

Merged
SkowronskiAndrew merged 11 commits into main from next
Apr 8, 2026

@SkowronskiAndrew SkowronskiAndrew commented Apr 2, 2026

Summary

Adds new archive inspection subcommands, self-contained archive metadata parsing, JSON output support, and C# wrappers for recently added UnityFileSystemApi functions.

New archive subcommands

  • archive info — High-level summary: Unity version, file size, data size, compression ratio, compression algorithm, block count, file count, and serialized file count.
  • archive header — Displays the raw archive header fields (version, sizes, metadata compression, archive flags). Only UnityFS signature is supported; legacy signatures produce a clear error.
  • archive blocks — Lists each data block with file offset, data offset, uncompressed/compressed sizes, and compression type.

Improved archive list

  • Now parses the archive directory directly (instead of using the native DllWrapper) so it can expose the data offset of each file within the uncompressed data.
  • Supports JSON output via -f Json.

Improved archive extract

  • New --filter option for case-insensitive substring matching on file paths inside the archive. Prints "Extracted N out of M files" summary.

Self-contained archive metadata parsing

  • ArchiveDetector (in UnityBinaryFormat) now parses the full archive metadata section (BlocksInfo + DirectoryInfo), including LZ4/LZ4HC decompression via the new K4os.Compression.LZ4 NuGet dependency.
  • Handles both default layout (metadata after header) and at-end layout (metadata after data).
  • Includes consistency validation: directory nodes must be in order and non-overlapping, and block data must cover all file data.
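The consistency rules in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in ArchiveDetector: the `ValidateLayout` name and the tuple shapes are hypothetical, and offsets are assumed to be expressed in uncompressed-data coordinates.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: nodes are (offset, size) pairs within the uncompressed data,
// blockUncompressedSizes holds the uncompressed size of each data block.
static void ValidateLayout(
    IReadOnlyList<(long Offset, long Size)> nodes,
    IReadOnlyList<long> blockUncompressedSizes)
{
    long totalBlockData = 0;
    foreach (long size in blockUncompressedSizes)
        totalBlockData += size;

    long previousEnd = 0;
    for (int i = 0; i < nodes.Count; i++)
    {
        var (offset, size) = nodes[i];

        // A node starting before the previous one ends is out of order or overlapping.
        if (offset < previousEnd)
            throw new InvalidDataException($"Directory node {i} is out of order or overlaps the previous node.");

        previousEnd = offset + size;
    }

    // The blocks must provide at least as much data as the last node needs.
    if (previousEnd > totalBlockData)
        throw new InvalidDataException("Block data does not cover all file data.");
}
```

Gaps between nodes are tolerated in this sketch; whether the real parser allows them is not stated in the description above.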

WebBundle refactor

  • Extracted all WebBundle-specific code from Archive.cs into a dedicated WebBundleHelper class.

UnityFileSystemApi C# wrappers

  • Added P/Invoke declarations and public wrappers for 7 new native API functions: GetDllVersion, GetUnityVersion, GetSerializedFileVersion, GetTypeTreeCount, GetTypeTreeInfo, GetTypeTreeByIndex, RemoveTypeTreeSource.
  • Added TypeTreeCategory enum and TypeTreeInfo struct.

Examples

Example of new info command with an AssetBundle:

$ UnityDataTool archive info .\scenes
Unity Version                  2019.4.0f1
File Size                      70,290 bytes
Data Size                      70,123 bytes
Uncompressed Data Size         307,872 bytes
Compression Ratio              4.39x
Compression                    Lzma
Block Count                    1
File Count                     4
Serialized File Count          4
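The reported ratio appears consistent with dividing the uncompressed data size by the data size (307,872 / 70,123 ≈ 4.39), rather than by the total file size. A small sketch of that arithmetic, using a hypothetical `FormatCompressionRatio` helper that is not part of the tool:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: ratio of uncompressed to compressed data, formatted like the output above.
static string FormatCompressionRatio(long uncompressedSize, long compressedSize)
{
    if (compressedSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(compressedSize));

    double ratio = (double)uncompressedSize / compressedSize;
    return ratio.ToString("F2", CultureInfo.InvariantCulture) + "x";
}

Console.WriteLine(FormatCompressionRatio(307_872, 70_123)); // prints "4.39x"
```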

Example output of the header command for a small compressed player build:

=== data.unity3d ===
--- header ---
Signature                      UnityFS
Version                        8
Unity Version                  6000.2.0a7
File Size                      3,033,272 bytes
Compressed Metadata Size       488
Uncompressed Metadata Size     934
Metadata Compression           Lz4HC
Flags                          BlocksAndDirectoryInfoCombined, BlockInfoNeedPaddingAtStart

Example blocks output (JSON form):

--- blocks ---
{
  "blocks": [
    {
      "index": 0,
      "fileOffset": 272,
      "dataOffset": 0,
      "uncompressedSize": 131072,
      "compressedSize": 69431,
      "compression": "Lz4",
      "isStreamed": false
    },
    {
      "index": 1,
      "fileOffset": 69703,
      "dataOffset": 131072,
      "uncompressedSize": 131072,
      "compressedSize": 85598,
      "compression": "Lz4",
      "isStreamed": false
    },

Example blocks output (text form):

Blocks: 8
  #0    FileOffset: 272  DataOffset: 0  Uncompressed: 131,072  Compressed: 69,431  Compression: Lz4
  #1    FileOffset: 69,703  DataOffset: 131,072  Uncompressed: 131,072  Compressed: 85,598  Compression: Lz4
  #2    FileOffset: 155,301  DataOffset: 262,144  Uncompressed: 131,072  Compressed: 99,564  Compression: Lz4
  #3    FileOffset: 254,865  DataOffset: 393,216  Uncompressed: 131,072  Compressed: 54,236  Compression: Lz4
  #4    FileOffset: 309,101  DataOffset: 524,288  Uncompressed: 131,072  Compressed: 55,752  Compression: Lz4
  #5    FileOffset: 364,853  DataOffset: 655,360  Uncompressed: 131,072  Compressed: 55,286  Compression: Lz4
  #6    FileOffset: 420,139  DataOffset: 786,432  Uncompressed: 131,072  Compressed: 29,240  Compression: Lz4
  #7    FileOffset: 449,379  DataOffset: 917,504  Uncompressed: 45,613  Compressed: 10,275  Compression: Lz4
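In the listing above, each block's FileOffset equals the previous FileOffset plus the previous compressed size, and each DataOffset equals the running total of uncompressed sizes. The sketch below reproduces that bookkeeping; `ComputeBlockOffsets` and its tuple shapes are hypothetical names for illustration, and the starting offset (272 here) is taken as an input rather than derived from the header.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: derives (file offset, data offset) for each block from its sizes.
static List<(long FileOffset, long DataOffset)> ComputeBlockOffsets(
    long firstBlockFileOffset,
    IReadOnlyList<(long Uncompressed, long Compressed)> blocks)
{
    var offsets = new List<(long FileOffset, long DataOffset)>(blocks.Count);
    long fileOffset = firstBlockFileOffset; // position of the block in the archive file
    long dataOffset = 0;                    // position of the block in the uncompressed data

    foreach (var (uncompressed, compressed) in blocks)
    {
        offsets.Add((fileOffset, dataOffset));
        fileOffset += compressed;
        dataOffset += uncompressed;
    }

    return offsets;
}

// First three blocks from the example output.
var sample = new[] { (131072L, 69431L), (131072L, 85598L), (131072L, 99564L) };
foreach (var (fileOffset, dataOffset) in ComputeBlockOffsets(272, sample))
    Console.WriteLine($"FileOffset: {fileOffset}  DataOffset: {dataOffset}");
```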

The list command now includes the data offset (the starting byte of the file within the uncompressed data):

--- list ---
[
  {
    "path": "globalgamemanagers",
    "dataOffset": 0,
    "size": 90352,
    "flags": "SerializedFile"
  },
  {
    "path": "Resources/unity_builtin_extra",
    "dataOffset": 90352,
    "size": 662392,
    "flags": "SerializedFile"
  },

  • Add P/Invoke declarations and public wrappers for 7 missing native API functions: GetDllVersion, GetUnityVersion, GetSerializedFileVersion, GetTypeTreeCount, GetTypeTreeInfo, GetTypeTreeByIndex, and RemoveTypeTreeSource. Also adds TypeTreeCategory enum and TypeTreeInfo struct. Fixes TypeTreeInfo name ambiguity in SerializedFileCommands.
  • Move all WebBundle-specific code (detection, parsing, extraction, listing) out of Archive.cs into a dedicated WebBundleHelper class. Archive now delegates to WebBundleHelper for web bundle operations.
  • Add -f/--format option to "archive list" supporting Text (default) and Json output, matching the existing serialized-file commands. Includes tests for both AssetBundle and WebBundle archive types in both formats, plus an extract test with file size verification.
  • Parse and print the header for Unity Archive files. This is similar to the header command already implemented for serialized files.
  • Useful to extract just a single file or group of files out of an archive
  • Summarize the key metrics about the archive
  • Add an example compressed player file (used to test info and useful for manual testing)
  • Add sanity check that the spans make sense in the Blocks and Directory data
  • Add some more comments with some details of the format

Copilot AI left a comment

Pull request overview

This PR expands the archive command to provide richer inspection of Unity archive files (AssetBundles / UnityFS), adds JSON output options, and updates the UnityFileSystem API surface with new native interop calls.

Changes:

  • Add archive header, archive blocks, and archive info subcommands with -f/--format Text|Json, and enhance archive list output (incl. data offsets).
  • Add --filter support for archive extract and refactor web bundle handling into WebBundleHelper.
  • Introduce a managed UnityFS header/metadata parser (ArchiveDetector) including LZ4 metadata decompression, plus new tests and documentation.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| UnityFileSystem/UnityFileSystem.cs | Adds native API wrappers (dll version, unity version, remove type tree source). |
| UnityFileSystem/SerializedFile.cs | Adds serialized file version and type-tree enumeration APIs. |
| UnityFileSystem/DllWrapper.cs | Adds new P/Invoke declarations and TypeTreeInfo/TypeTreeCategory definitions. |
| UnityBinaryFormat/ArchiveDetector.cs | Adds UnityFS header + metadata parsing and LZ4 metadata decompression. |
| UnityBinaryFormat/BinaryFileHelper.cs | Adds ReadUInt16 helper used by archive parsing. |
| UnityBinaryFormat/UnityBinaryFormat.csproj | Adds LZ4 decompression package dependency. |
| UnityDataTool/Program.cs | Wires new archive subcommands and output format option into CLI. |
| UnityDataTool/Archive.cs | Implements new archive subcommands and JSON/text formatting; adds filter support for extract. |
| UnityDataTool/WebBundleHelper.cs | New helper for listing/extracting UnityWebData .data(.gz/.br) bundles. |
| UnityDataTool/UnityDataTool.csproj | Updates assembly/file/informational version values. |
| UnityDataTool.Tests/* | Adds/updates tests and expected values for new archive outputs and JSON formats. |
| Documentation/command-archive.md | Documents new archive subcommands and options. |
| TestCommon/Data/PlayerDataCompressed/README.md | Adds description of new compressed player data sample. |

        Path = filePath,
    });
    // Advance byte offset, so we keep track of the position (to know when we're done reading the header).
    currentByteOffset += 3 * sizeof(uint) + filePath.Length;

Copilot AI Apr 8, 2026

ParseWebBundleHeader advances currentByteOffset using filePath.Length (characters) instead of the byte count read from the file (filePathLength). This will mis-detect the end of the header for non-ASCII paths and can break parsing. Use filePathLength (and keep it as a byte count) when advancing currentByteOffset.

Suggested change
- currentByteOffset += 3 * sizeof(uint) + filePath.Length;
+ currentByteOffset += 3 * sizeof(uint) + filePathLength;

Collaborator Author

Fixed — now uses filePathLength (byte count) instead of filePath.Length (character count).
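The difference between the two counts only shows up for non-ASCII paths. The sketch below assumes the path bytes are UTF-8 encoded (an assumption, not confirmed in this thread) and uses a hypothetical `HeaderEntrySize` helper that mirrors the `3 * sizeof(uint) + length` expression from the reviewed line.

```csharp
using System;
using System.Text;

// Hypothetical helper: size of one header entry when the path length is counted in bytes.
static int HeaderEntrySize(string path) =>
    3 * sizeof(uint) + Encoding.UTF8.GetByteCount(path);

string asciiPath = "data/level0";
string nonAsciiPath = "donn\u00E9es/niveau0"; // contains U+00E9, which takes two bytes in UTF-8

// For ASCII paths the character count and the byte count agree.
Console.WriteLine($"{asciiPath.Length} chars, {Encoding.UTF8.GetByteCount(asciiPath)} bytes");

// For non-ASCII paths they diverge, so advancing by string.Length would undercount.
Console.WriteLine($"{nonAsciiPath.Length} chars, {Encoding.UTF8.GetByteCount(nonAsciiPath)} bytes");
Console.WriteLine($"Entry size by bytes: {HeaderEntrySize(nonAsciiPath)}");
```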

Comment on lines +37 to +52
        // Always read the bytes to advance the stream position.
        var data = ReadBytes(reader, (int)description.Size);

        if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
            continue;

        Console.WriteLine($"... Extracting {description.Path}");
        var path = Path.Combine(outputFolder.ToString(), description.Path);
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        File.WriteAllBytes(path, data);
        extracted++;
    }

    Console.WriteLine($"Extracted {extracted} out of {total} files.");
}

Copilot AI Apr 8, 2026

Extract() casts description.Size (uint) to int when reading bytes. This will overflow/throw for entries >2GB and can lead to incorrect reads. Consider validating the size fits in int, or stream-copy using long-sized reads instead of buffering the whole file into memory.

Suggested change
-         // Always read the bytes to advance the stream position.
-         var data = ReadBytes(reader, (int)description.Size);
-         if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
-             continue;
-         Console.WriteLine($"... Extracting {description.Path}");
-         var path = Path.Combine(outputFolder.ToString(), description.Path);
-         Directory.CreateDirectory(Path.GetDirectoryName(path));
-         File.WriteAllBytes(path, data);
-         extracted++;
-     }
-     Console.WriteLine($"Extracted {extracted} out of {total} files.");
- }
+         long size = description.Size;
+         if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
+         {
+             CopyBytes(reader.BaseStream, Stream.Null, size);
+             continue;
+         }
+         Console.WriteLine($"... Extracting {description.Path}");
+         var path = Path.Combine(outputFolder.ToString(), description.Path);
+         Directory.CreateDirectory(Path.GetDirectoryName(path));
+         using var output = File.Create(path);
+         CopyBytes(reader.BaseStream, output, size);
+         extracted++;
+     }
+     Console.WriteLine($"Extracted {extracted} out of {total} files.");
+ }
+ private static void CopyBytes(Stream input, Stream output, long bytesToCopy)
+ {
+     byte[] buffer = new byte[81920];
+     long remaining = bytesToCopy;
+     while (remaining > 0)
+     {
+         int bytesRead = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
+         if (bytesRead <= 0)
+             throw new EndOfStreamException($"Unexpected end of stream while reading {bytesToCopy} bytes.");
+         output.Write(buffer, 0, bytesRead);
+         remaining -= bytesRead;
+     }
+ }

Collaborator Author

Valid concern for very large files. This is pre-existing behavior (the code was moved from Archive.cs, not written new). Web bundle entries are practically always well under 2GB, so this is low-risk. Filed as a known limitation — not addressing in this PR.
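For reference, the failure mode under discussion looks like this: in C#'s default unchecked context, casting a uint above int.MaxValue to int wraps to a negative number. A lighter-weight alternative to the streaming rewrite would be an explicit range check. The `ToBufferLength` helper below is hypothetical and only sketches that option; it is not part of the PR.

```csharp
using System;

// Hypothetical guard: rejects sizes that cannot be used as a single array length.
static int ToBufferLength(uint size)
{
    if (size > int.MaxValue)
        throw new NotSupportedException($"Entry of {size} bytes is too large to buffer in a single array.");
    return (int)size;
}

uint big = 3_000_000_000;
int wrapped = unchecked((int)big);
Console.WriteLine(wrapped);                 // prints -1294967296
Console.WriteLine(ToBufferLength(662_392)); // prints 662392
```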

Comment on lines +130 to +138
static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
{
    // This function assumes `reader` is at the start of the binary data representing the file contents.
    Console.WriteLine($"... Extracting {description.Path}");
    var path = Path.Combine(outputFolder.ToString(), description.Path);
    Directory.CreateDirectory(Path.GetDirectoryName(path));
    File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
}

Copilot AI Apr 8, 2026

ExtractFile() is declared but never used. Removing it (or wiring it into Extract()) would avoid dead code and keep the helper focused.

Suggested change
- static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
- {
-     // This function assumes `reader` is at the start of the binary data representing the file contents.
-     Console.WriteLine($"... Extracting {description.Path}");
-     var path = Path.Combine(outputFolder.ToString(), description.Path);
-     Directory.CreateDirectory(Path.GetDirectoryName(path));
-     File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
- }

Collaborator Author

Fixed — removed the now-unused ExtractFile method. The extraction logic was inlined into Extract() when adding filter support.

Comment on lines +301 to +305
// Read the metadata bytes (which may be compressed)
var compressedData = new byte[header.CompressedMetadataSize];
int bytesRead = stream.Read(compressedData, 0, compressedData.Length);
if (bytesRead != compressedData.Length)
    throw new InvalidDataException("Could not read the full metadata section from the file.");

Copilot AI Apr 8, 2026

This allocates an array with a uint length (header.CompressedMetadataSize). Array lengths must be int, so this will not compile as-is. Cast with checked((int)...) after validating the value is <= int.MaxValue (and similarly anywhere else a uint size is used for allocations/reads).

Collaborator Author

@SkowronskiAndrew SkowronskiAndrew Apr 8, 2026

This compiles and works correctly. C# array creation expressions accept uint dimensions (not just int) — the compiler handles the conversion. The metadata section will never get anywhere close to max int, so overflow is not a practical concern.

Comment on lines +314 to +320
{
    // LZ4 and LZ4HC use the same decompression algorithm.
    uncompressedData = new byte[header.UncompressedMetadataSize];
    int decoded = LZ4Codec.Decode(compressedData, 0, compressedData.Length,
        uncompressedData, 0, uncompressedData.Length);
    if (decoded != header.UncompressedMetadataSize)
        throw new InvalidDataException($"LZ4 decompression produced {decoded} bytes, expected {header.UncompressedMetadataSize}.");

Copilot AI Apr 8, 2026

This allocates an array with a uint length (header.UncompressedMetadataSize), which will not compile. Add a checked cast to int with a range check (<= int.MaxValue) before allocating the buffer.

Collaborator Author

Same as above — uint is valid for C# array dimensions. No compilation issue.
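The author's point can be checked with a few lines: the C# language allows array creation expressions whose dimension is of type int, uint, long, or ulong. The snippet below is a standalone illustration using the metadata size from the header example earlier in this PR; the variable names are chosen for the example only.

```csharp
using System;

// A uint dimension is accepted directly by the array creation expression.
uint compressedMetadataSize = 488;
var buffer = new byte[compressedMetadataSize];

Console.WriteLine(buffer.Length); // prints 488
```

A value too large for a single array would still fail at runtime rather than at compile time, which is the practical concern the reply dismisses for metadata-sized buffers.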

@@ -5,9 +5,9 @@
<TargetFramework>net9.0</TargetFramework>
<LangVersion>latest</LangVersion>
<Version>1.3.2</Version>

Copilot AI Apr 8, 2026

The project version metadata is inconsistent: <Version> is still 1.3.2 while Assembly/File/InformationalVersion are 1.3.4. This can lead to incorrect NuGet/package/version reporting. Update <Version> to 1.3.4 (or align all version fields to the intended release).

Suggested change
- <Version>1.3.2</Version>
+ <Version>1.3.4</Version>

Collaborator Author

Fixed — updated <Version> to 1.3.4 to match the other version fields.

@@ -0,0 +1,5 @@
This is an example of the format used for Player Data when compression is enabled.

It is a Unity Archive and can be examined with the "archive" command. It as created with Unity 2021.3.20f1.

Copilot AI Apr 8, 2026

Typo: "It as created" should be "It was created".

Suggested change
- It is a Unity Archive and can be examined with the "archive" command. It as created with Unity 2021.3.20f1.
+ It is a Unity Archive and can be examined with the "archive" command. It was created with Unity 2021.3.20f1.

Collaborator Author

Fixed — typo corrected.

Comment on lines +35 to +46
foreach (var description in fileDescriptions)
{
    // Always read the bytes to advance the stream position.
    var data = ReadBytes(reader, (int)description.Size);

    if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
        continue;

    Console.WriteLine($"... Extracting {description.Path}");
    var path = Path.Combine(outputFolder.ToString(), description.Path);
    Directory.CreateDirectory(Path.GetDirectoryName(path));
    File.WriteAllBytes(path, data);

Copilot AI Apr 8, 2026

FileDescription.ByteOffset is parsed/stored but never used, and Extract() always reads the next bytes from the stream rather than seeking to ByteOffset. If the web bundle header offsets are not strictly sequential (e.g., padding/alignment), extraction will produce incorrect file contents. Consider seeking to description.ByteOffset (or validating the current position matches it) before reading each entry.

Collaborator Author

The web bundle format stores file data sequentially immediately after the header, so sequential reading is correct. The ByteOffset field from the header could be used for validation (asserting it matches the current stream position), but the sequential read approach matches the original pre-refactor code and works correctly for all test files. Adding seek-based extraction would be a separate improvement.
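The validation option mentioned in this reply (asserting that the recorded offset matches the current stream position before each sequential read) could look roughly like the sketch below. `ReadEntry` and its parameters are hypothetical and stand in for the real FileDescription fields; `Stream.ReadExactly` requires .NET 7 or later.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads one entry sequentially, but first checks that the
// offset recorded in the header agrees with where the stream actually is.
static byte[] ReadEntry(Stream stream, long expectedOffset, int size)
{
    if (stream.Position != expectedOffset)
        throw new InvalidDataException(
            $"Entry expected at offset {expectedOffset}, but the stream is at {stream.Position}.");

    var data = new byte[size];
    stream.ReadExactly(data, 0, size); // throws EndOfStreamException if the file is truncated
    return data;
}
```

This keeps the sequential read path unchanged and turns any padding or alignment surprise into an explicit error instead of silently wrong file contents.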

Comment on lines +353 to 366
static void ExtractAssetBundle(FileInfo filename, DirectoryInfo outputFolder, string filter)
{
-    Console.WriteLine($"Extracting asset bundle: {filename}");
+    Console.WriteLine($"Extracting files from archive: {filename}");
    using var archive = UnityFileSystem.MountArchive(filename.FullName, "/");

    int total = archive.Nodes.Count;
    int extracted = 0;

    foreach (var node in archive.Nodes)
    {
        if (filter != null && !node.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
            continue;

        Console.WriteLine($"... Extracting {node.Path}");

Copilot AI Apr 8, 2026

New behavior: --filter controls which files are extracted, but there are no tests covering filtered extraction (e.g., that only matching paths are written and that the exit code/output counts are correct). Add a test case that runs archive extract ... --filter <text> and asserts only the expected subset of files exist.

Collaborator Author

Agreed — filter tests would be valuable. The filter feature was manually tested and the overall extract test covers the non-filtered path. Adding dedicated filter tests is a good follow-up.

Collaborator Author

Claude was being lazy, I told it to add a test.
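As a rough idea of what such a test would need to assert, the filter rule (case-insensitive substring match, null meaning "extract everything") can be isolated and checked on its own. `SelectPaths` is a hypothetical helper written for this sketch, not a function in the repository, and a real test would run archive extract with --filter and inspect the output folder instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: applies the same matching rule as the extract loop.
static List<string> SelectPaths(IEnumerable<string> paths, string filter) =>
    paths.Where(p => filter == null || p.Contains(filter, StringComparison.OrdinalIgnoreCase)).ToList();

var archivePaths = new[] { "globalgamemanagers", "Resources/unity_builtin_extra" };
var selected = SelectPaths(archivePaths, "RESOURCES");

Console.WriteLine($"Extracted {selected.Count} out of {archivePaths.Length} files."); // prints "Extracted 1 out of 2 files."
```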

@SkowronskiAndrew SkowronskiAndrew merged commit 148801b into main Apr 8, 2026
5 checks passed