Add Incremental Archive Updates #634
base: master
Conversation
Introduces seekable writer abstraction for random-access file operations. Uint8ArraySeekableWriter provides in-memory implementation for testing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add initialized check to writeUint8Array
- Cache writable stream instead of creating a new one on each access
- Validate negative offset in readAt

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
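As a rough illustration of the abstraction these commits describe, an in-memory seekable writer might look like the sketch below. The method names (`writeUint8Array`, `readAt`, `truncate`) follow the commit messages, but the shape is assumed; the real zip.js implementation is stream-based and differs in its signatures.

```javascript
// Minimal, hypothetical sketch of an in-memory seekable writer.
class Uint8ArraySeekableWriter {
	constructor() {
		this.data = new Uint8Array(0);
	}

	// Write bytes at an explicit offset, or append at the current end.
	writeUint8Array(array, offset = this.data.length) {
		const end = offset + array.length;
		if (end > this.data.length) {
			// Grow the backing buffer to fit the write.
			const grown = new Uint8Array(end);
			grown.set(this.data);
			this.data = grown;
		}
		this.data.set(array, offset);
	}

	// Random-access read; rejects negative offsets as the review fix above does.
	readAt(offset, length) {
		if (offset < 0) {
			throw new Error("Invalid offset");
		}
		// Clamp length so offset + length never runs past the end.
		return this.data.subarray(offset, Math.min(offset + length, this.data.length));
	}

	truncate(size) {
		this.data = this.data.subarray(0, size);
	}
}
```

The negative-offset check and the clamped read mirror the review fixes mentioned in the commits above.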
Comprehensive tests covering compression, encryption, dates, attributes, ZIP64, duplicate names, remove, data descriptors, and extended timestamps. Establishes safety net before extracting shared logic for openExisting(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Notes code duplication that should be extracted into shared helper. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allows updating entry metadata (dates, attributes) without rewriting the compressed data. Comment updates only work if new length <= old. The implementation handles duplicate extended timestamp extra fields by removing the old timestamp from rawExtraField when updating. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
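For illustration, stripping a duplicate extra field from `rawExtraField` before appending the updated one could look like the sketch below. The extra field layout (2-byte ID, 2-byte size, then data, all little-endian) and the `0x5455` extended timestamp ID come from the ZIP format; the helper name and exact handling in this PR are assumptions, not zip.js API.

```javascript
// "UT" extended timestamp extra field ID from the ZIP appnote.
const EXTENDED_TIMESTAMP_ID = 0x5455;

// Return a copy of rawExtraField with every record matching fieldId removed.
function removeExtraField(rawExtraField, fieldId) {
	const survivors = [];
	const view = new DataView(rawExtraField.buffer, rawExtraField.byteOffset, rawExtraField.byteLength);
	let offset = 0;
	while (offset + 4 <= rawExtraField.length) {
		const id = view.getUint16(offset, true);        // field ID, little-endian
		const size = view.getUint16(offset + 2, true);  // data size, little-endian
		const fieldEnd = offset + 4 + size;
		if (id !== fieldId) {
			survivors.push(rawExtraField.subarray(offset, fieldEnd));
		}
		offset = fieldEnd;
	}
	// Concatenate the surviving records into one array.
	const total = survivors.reduce((sum, part) => sum + part.length, 0);
	const output = new Uint8Array(total);
	let position = 0;
	for (const part of survivors) {
		output.set(part, position);
		position += part.length;
	}
	return output;
}
```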
Moves valid entries to fill gaps left by removed entries, then truncates the file. Supports progress callback and abort signal. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Allows checking how much dead space would be reclaimed without actually modifying the archive. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
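The dry-run calculation described above amounts to summing the gaps between live entries and before the central directory; a minimal sketch, with the entry shape (`offset`, `totalSize`) assumed for illustration:

```javascript
// Sum the dead space between live entries and before the central directory.
function reclaimableBytes(entries, centralDirectoryOffset) {
	const sorted = [...entries].sort((a, b) => a.offset - b.offset);
	let expectedOffset = 0;
	let gapTotal = 0;
	for (const entry of sorted) {
		gapTotal += entry.offset - expectedOffset;  // dead space before this entry
		expectedOffset = entry.offset + entry.totalSize;
	}
	// Gap between the last entry and the central directory, if any.
	gapTotal += centralDirectoryOffset - expectedOffset;
	return gapTotal;
}
```

compact() would then move each entry back by the accumulated gap before it and truncate the file by this total.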
…ypes

getEntryTotalSize() now sums all extra field components (ZIP64, AES, extended timestamp, NTFS, Unix) instead of just rawExtraField. This prevents size miscalculation during compact() for entries with complex extra fields.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
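Conceptually the fix sums every extra field component rather than `rawExtraField` alone; a hedged sketch, with illustrative property names that may not match zip.js's internals:

```javascript
// Sum the lengths of all extra field components on an entry.
// Property names here are illustrative, not necessarily zip.js's.
function getEntryExtraFieldLength(entry) {
	const parts = [
		entry.rawExtraFieldZip64,
		entry.rawExtraFieldAES,
		entry.rawExtraFieldExtendedTimestamp,
		entry.rawExtraFieldNTFS,
		entry.rawExtraFieldUnix,
		entry.rawExtraField
	];
	return parts.reduce((sum, part) => sum + (part ? part.length : 0), 0);
}
```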
Adds types for SeekableWriter, Uint8ArraySeekableWriter, openExisting(), updateEntry(), and compact(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests ZIP64, encrypted archives, empty archives, compact edge cases, concurrent updates, and large archive handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds usage examples and API documentation for openExisting(), updateEntry(), compact(), and SeekableWriter. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enables in-place ZIP modification on real files using Node.js fs.promises file handles. Also fixes SeekableWriter implementations to support creating a new writable stream after the previous one is closed. This is required for the incremental update API (openExisting) to work correctly when adding multiple files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…Access API

Enables in-place ZIP modification in Chrome/Edge using showSaveFilePicker(). Provides seek, truncate, and readAt support via FileSystemWritableFileStream.

- New FileSystemAccessSeekableWriter class in lib/core/io.js
- Export from lib/zip-core-base.js
- TypeScript definitions with usage examples
- Documentation with browser-specific examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add boundary validation in readAt() to clamp length when offset+length > size
- Auto-detect file size in init() when fileHandle is provided
- Remove inline comments for consistency with codebase style
- Update TypeScript definitions to document auto-detection behavior
- Simplify documentation example for opening existing files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…prependZip

When importing entries from existing archives via openExisting() or prependZip(), the CRC32 (signature) field was not being written to the header array. This caused the central directory to have zeroed CRC32 values for existing entries, corrupting the archive.

Root cause: getHeaderArrayData() creates the header but doesn't set CRC32. For new entries, setEntryInfo() writes the signature later. For existing entries, this never happened.

Fix: After getHeaderArrayData(), write entry.signature to headerView at HEADER_OFFSET_SIGNATURE for both prependZip() and openExisting().

Fixes #1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
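A minimal sketch of the fix described above: write the entry's CRC-32 into the already-built header bytes. ZIP stores the CRC-32 little-endian; the offset constant below is an assumption standing in for zip.js's internal `HEADER_OFFSET_SIGNATURE`, not the library's actual value.

```javascript
// Assumed CRC-32 offset within the header array; the real constant
// lives in zip.js internals and may differ.
const HEADER_OFFSET_SIGNATURE = 10;

// Patch the entry's CRC-32 signature into the header bytes in place.
function writeEntrySignature(headerArray, signature) {
	const headerView = new DataView(headerArray.buffer, headerArray.byteOffset, headerArray.byteLength);
	headerView.setUint32(HEADER_OFFSET_SIGNATURE, signature, true); // little-endian per the ZIP format
}
```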
When calculating the append offset in openExisting(), the code was using the central directory's extra field length. However, the local header can have a different extra field length (common with archives from external tools like Info-ZIP). This caused openExisting() to calculate the wrong append offset, resulting in new entries overlapping existing data and corrupting the archive.

Fix: Read the local header extra field length directly from offset 28 of each entry's local header, rather than using the central directory's extra field length.

Fixes #2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
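Offset 28 is fixed by the ZIP local file header layout: the 30-byte fixed header carries the filename length at offset 26 and the extra field length at offset 28, both little-endian. A sketch of computing the data offset from the local header itself (the function name is illustrative):

```javascript
// Compute where an entry's compressed data starts, trusting the local
// header's own extra field length rather than the central directory's.
function getLocalHeaderDataOffset(localHeaderBytes, entryOffset) {
	const view = new DataView(localHeaderBytes.buffer, localHeaderBytes.byteOffset, localHeaderBytes.byteLength);
	const filenameLength = view.getUint16(26, true);
	const extraFieldLength = view.getUint16(28, true); // may differ from the central directory value
	// Data follows the 30-byte fixed header, filename, and extra field.
	return entryOffset + 30 + filenameLength + extraFieldLength;
}
```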
Node.js fs operations require BigInt for file positions exceeding 2^31-1 (2147483647 bytes, ~2GB). Without this conversion, operations on large files crash with assertion errors.

Changes:

- Add toBigIntIfNeeded() helper to convert large offsets to BigInt
- Apply to readAt(), writeUint8Array(), and truncate() methods
- Handle BigInt stats.size in init() for very large existing files
- Add test for BigInt conversion logic and normal operations

Fixes #3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed from dd45a15 to 3dec97a
Testing revealed that Node.js v25+ handles Number positions correctly for >2GB file offsets. The BigInt conversion introduced in 3dec97a actually breaks writes on fresh files - the file position is ignored and writes go to position 0. This reverts the toBigIntIfNeeded() function and its usage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…y overflow

Previously, openExisting() loaded the entire archive into memory using readAt(0, size), which failed for archives >2GB due to Node.js buffer limits.

Changes:

- Add SeekableWriterReader adapter class that wraps SeekableWriter to provide Reader interface for ZipReader
- openExisting() now reads only metadata structures (EOCD, central directory, local headers) instead of entire archive
- FileHandleWriter uses options object form for read/write operations
- Remove auto-update of _size in writeUint8Array to avoid double-counting when pipeTo is used with bufferedWrite

Tested with 4.5GB archive - now works correctly.

Fixes #3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
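The adapter described above just forwards random-access reads so a ZipReader can consume a writer; roughly like the sketch below, with method names taken from the commit message and the actual zip.js Reader contract assumed.

```javascript
// Hypothetical adapter: expose a SeekableWriter's readAt() through
// a Reader-style interface (init/readUint8Array/size).
class SeekableWriterReader {
	constructor(seekableWriter) {
		this.writer = seekableWriter;
		this.size = seekableWriter.size;
	}

	async init() {
		// The writer is assumed to already know its current size.
		this.size = this.writer.size;
	}

	async readUint8Array(offset, length) {
		// Delegate random-access reads to the underlying writer.
		return this.writer.readAt(offset, length);
	}
}
```

This lets openExisting() read just the EOCD, central directory, and local headers through the reader instead of buffering the whole archive.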
Well, functionally, this works... Or at least creates archives that work perfectly with zip/unzip on Linux, even with gaps in them. I'll post another update once I get this integrated with my app. I'll be using this to use zip as a file container for a video editing application I'm building. I wanted to be able to edit all the small assets without it needing to completely rewrite potentially GBs of video on every save, so this was my solution. Another cool application for a feature like this might be a backup app with incremental backups. |
Thanks for the PR. I've skimmed through it and can confirm that the code was generated using AI and deserves to be studied closely. At first glance, here are a few more specific comments. The […]

Finally, I have already thought about this kind of feature and imagined implementing a class of type `ZipEditor`. I need to think about this slowly. For now, I'll keep your PR open so I can remember it, and think about this (probably slowly).

PS: Feel free to use my answer as a prompt to see if Claude agrees 😜
Here are the details on the TODO from the report:

## Technical Debt

### TD-1: Code Duplication in openExisting/prependZip

Location: […]

Issue: ~66 lines of entry import logic duplicated between `openExisting()` and `prependZip()`.

Resolution: Extract shared helper function.

Priority: Low - working correctly, refactor when convenient.

TODO added: commit […]
Here's Claude's response (I just got it to read the comment thread):

# Code Review Updates Plan

PR: #634 (gildas-lormeau/zip.js)

## Summary of Feedback

Gildas-lormeau provided the following feedback: […]
## Proposed Changes

### 1. Rename SeekableWriter → RandomAccessStream

Rationale: The current name […]

New name: `RandomAccessStream`

Files to modify: […]
### 2. Extract Shared Entry Import Logic

Issue: ~66 lines duplicated between `prependZip()` and `openExisting()`.

Solution: Create helper function:

```js
/**
 * Import entries from a ZipReader into this ZipWriter's internal state.
 * Used by both prependZip() and openExisting().
 *
 * @param {ZipWriter} zipWriter - Target ZipWriter instance
 * @param {Entry[]} entries - Entries from ZipReader
 * @param {number} baseOffset - Offset for zip64 calculations
 */
function importEntriesFromReader(zipWriter, entries, baseOffset) {
	zipWriter.filenames = new Set(entries.map(entry => entry.filename));
	zipWriter.files = new Map(entries.map(entry => {
		// ... shared entry import logic
		return [entry.filename, entry];
	}));
}
```

Files to modify: […]
### 3. Move Node.js-Specific Writers to Separate Module

Issue: […]

Solution: Create […]

New file structure: […]

Files to create/modify: […]
### 4. Consider ZipEditor Architecture (Future)

Gildas mentioned he imagined a `ZipEditor`-type class.

Option A: Current Approach (Extend ZipWriter)

```js
// Static factory method on ZipWriter
const zipWriter = await ZipWriter.openExisting(writer);
```

Option B: Separate ZipEditor Class (Gildas's Vision)

```js
// Dedicated class for editing existing archives
class ZipEditor {
	constructor(randomAccessStream) { ... }
	async open() { ... }
	async add(name, reader, options) { ... }
	async remove(entry) { ... }
	async close() { ... }
}
```

Recommendation: Keep current approach for now but structure code to allow easy migration: […]
### 5. Improve Tree Shaking Support

Current issue: All I/O classes bundled together.

Solution: Ensure each writer class can be imported independently:

```js
// Allow selective imports
import { ZipWriter } from "zip.js/core";
import { FileHandleRandomAccessStream } from "zip.js/node";
import { FileSystemAccessRandomAccessStream } from "zip.js/browser";
```

Files to modify: […]
## Implementation Checklist

Phase 1: Code Quality (Non-Breaking) […]

Phase 2: Naming (Minor Breaking) […]

Phase 3: Module Structure (Minor Breaking) […]

Phase 4: Documentation […]
## Backward Compatibility

To minimize breaking changes: […]

```js
// Deprecated alias for backward compatibility
export { FileHandleRandomAccessStream as FileHandleWriter };

// SeekableWriter keeps working but warns on construction
// (a plain `as SeekableWriter` re-export would conflict with this declaration)
export class SeekableWriter extends RandomAccessStream {
	constructor(...args) {
		console.warn("SeekableWriter is deprecated, use RandomAccessStream");
		super(...args);
	}
}
```

## Priority

[…]
## Questions for Maintainer

Before implementing major changes, clarify with @gildas-lormeau: […]
Hi, I needed this feature in an app I'm building, and I let Claude loose on this. I haven't gotten a chance to test this out yet (damn session limits 😅), but I wanted to submit this to see if you're interested in it.
This could be total garbage, but I'll add an update sometime over the next few days if it works with my app. I realize this is a huge branch, so if you just close it, I'll understand completely.
Cheers!