feat(handler): add support for minix filesystems #1359

jstucke · 2026-01-29T09:46:56Z

- little endian test files were created with mkfs.minix
- since there does not seem to be a tool to create big endian MINIX filesystems, those test files were created by byte swapping the little endian test files
- known firmware samples with MINIX FS:
  - Trendnet IP cameras, e.g. TV-IP110W and TV-IP422WN (both MINIX FS v1 LE with 30 byte filenames)
  - Netgear switches, e.g. FSM7326P and GSM7248 (both MINIX FS v2 BE with 30 byte filenames)
- I have not encountered MINIX v3 filesystems in firmware images, but adding it anyway was little overhead

jstucke · 2026-01-29T10:22:38Z

hmm I noticed that I should probably add a few sanity checks to calculate_chunk to avoid false positives

qkaiser

First review pass over lunch, will get back with a more detailed review.

qkaiser · 2026-01-29T11:24:35Z

python/unblob/handlers/filesystem/minixfs.py

+                yield self._read_zone_data(index)
+
+    def _read_directory(self, inode) -> Iterator[cstruct]:
+        for entry in chunked(self._read_file_data(inode), self.dir_entry_size):


We have iterate_file in file_utils that could help you if you're doing what I think you're doing. Lots of things are already implemented in file_utils, so you probably don't need the more-itertools dependency either.

I changed _read_file_data to stream chunks of data, but these chunks have block size and not dir_entry_size, so chunked() is still beneficial (making _read_file_data yield chunks of size dir_entry_size is a bit much overhead because of the indirect data zone structure).

I could replace the chunked with slices. That would remove the dependency, but would not really improve readability:

for dir_data in chunked(zone_data, self.dir_entry_size):

for index in range(0, len(zone_data), self.dir_entry_size):
    dir_data = zone_data[index : index + self.dir_entry_size]
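
As a point of comparison, the slice-based variant can be wrapped in a small helper so the call site stays as short as the `chunked` version. This is a minimal sketch using only the standard library; the name `iter_fixed_size` is hypothetical and not part of unblob or the PR.

```python
from collections.abc import Iterator


def iter_fixed_size(data: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive `size`-byte slices of `data`; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    for index in range(0, len(data), size):
        yield data[index : index + size]
```

Slicing `bytes` yields `bytes` objects directly, so each slice can be handed to a struct parser without conversion.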

qkaiser · 2026-01-29T11:25:28Z

python/unblob/handlers/filesystem/minixfs.py

+
+    def __init__(self):
+        super().__init__()
+        self.EXTRACTOR = MinixFSExtractor(self)


I understand the attempt at generalization but I don't think this self passing is a good design. I'll try to come up with something better when I'm through the review.

You should define a _MinixFSExtractor class that does not override the __init__ function of Extractor. Then you create 3 extractors:

MinixFSv1Extractor(_MinixFSExtractor)

MinixFSv2Extractor(_MinixFSExtractor)

MinixFSv3Extractor(_MinixFSExtractor)

None of these extractors should rely on the handler they're attached to. No handlers argument or attribute here.

You should define a _MinixFSHandler(StructHandler) that does not override the __init__ function of StructHandler. This generic handler should define:

HEADER_STRUCT as minix_super_block

Then, create three different handlers:

MinixFSv1Handler(_MinixFSHandler)

MinixFSv2Handler(_MinixFSHandler)

MinixFSv3Handler(_MinixFSHandler)

Each of these handlers must override the C_DEFINITIONS entry with their own. To do so I would split the C_DEFINITIONS in 3 blocks:

C_DEFINITIONS_V1 (only minix_inode, minix_super_block, minix_dir_entry)

C_DEFINITIONS_V2 (only minix2_inode, minix_super_block, minix_dir_entry)

C_DEFINITIONS_V3 (only minix_inode, minix3_super_block, minix3_dir_entry)

In each of these C definitions, use the same names. Do not use minix2_inode vs minix_inode. Use generic names without version info. That's the reason we can define HEADER_STRUCT in the handler parent class.

With this approach you can get rid of DIR_STRUCT and INODE_STRUCT since they will be generic. Just use the generic name.

You can also drop the VERSION class attribute. The VERSION is currently used for two things:

derive the block_size: just check whether your header has an s_blocksize attribute. If it does, use it; otherwise default to BLOCK_SIZE

derive the inode_size: this can be done through the cstruct parser, which exposes the size of its structures (e.g. self._struct_parser.cparser_le.minix_inode.size)

Do not define DOC in __init__, just define it in the class with a version that's not resolved at runtime.

Each handler must set its extractor explicitly like:

EXTRACTOR = MinixFSv3Extractor()

This should make the code easier to follow and reduce complexity.

Did you come up with anything? I'm still a bit puzzled about how to solve this (since the handler and the extractor both need many of the fields, including the C_DEFINITIONS). Is overriding Handler.extract an option? 😅
edit: it seems I did not refresh the page before commenting

I tried to change everything as you suggested. I couldn't completely get rid of VERSION, though.

qkaiser · 2026-01-29T11:30:53Z

python/unblob/handlers/filesystem/minixfs.py

+        zone_count = self._get_zone_count(header)
+        if zone_count != 0:


Since zone_count is in the header, you can write patterns that do not match on headers with a null zone count. This way you don't need to get it here or check the value since it won't be matched in the first place.

How would I do that? There is the not (~) syntax in YARA but this does not seem to work. At least I get an error when using it in a pattern:

>       raise InvalidHexString(str(e)) from e
E       unblob.parser.InvalidHexString: No terminal matches '~' in the current parser context, at line 1 col 44
E
E       ] (7f | 8f) 13 (00 | 01 | 02) 00 [2] 00 ~00 00 00
E                                               ^
E       Expected one of:
E           * START_ANCHOR
E           * SECONDNIBLE
E           * JUMP
E           * END_ANCHOR
E           * LPAR
E           * WILDCARD
E           * RANGE_JUMP
E           * RPAR
E           * FIRSTNIBLE
E           * LITERAL
E           * ALTERNATIVE_SEPARATOR
E
E       Previous tokens: Token('LITERAL', '00')

That's something we may want to fix in unblob itself. A naive approach is to accept everything but 00:

( 01 | 02 | 03 | ... | ff)

You may have an easier time writing patterns as Regex rather than HexString for that purpose.
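
The naive alternation does not have to be typed out by hand. A minimal sketch that generates it is shown below; the function name is hypothetical, and whether unblob's hex string grammar accepts an alternation with 255 branches has not been verified here.

```python
def non_zero_byte_alternation() -> str:
    """Build "( 01 | 02 | ... | ff )", i.e. any byte value except 00."""
    branches = " | ".join(f"{value:02x}" for value in range(1, 256))
    return f"( {branches} )"


pattern_fragment = non_zero_byte_alternation()
```

The resulting fragment could then be interpolated at the position of the zone count byte that must not be zero.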

I tried writing a Regex instead, but it seems that it does not like null bytes:

@lru_cache
def build_hyperscan_database(handlers: Handlers) -> StreamDatabase:
    patterns = []
    for handler_class in handlers:
        handler = handler_class()
        for pattern in handler.PATTERNS:
            try:
                patterns.append(
                    Pattern(
                        pattern.as_regex(),
                        Flag.SOM_LEFTMOST,
                        Flag.DOTALL,
                        tag=handler,
                    )
                )
            except InvalidHexString as e:
                logger.error(
                    "Invalid pattern",
                    handler=handler.NAME,
                    pattern=pattern,
                    error=str(e),
                )
                raise
>       return StreamDatabase(*patterns)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
E       ValueError: Pattern expression contains NULL byte

Maybe it is best to do sanity checking during chunk extraction and change it as soon as it is supported?
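
A sanity check of that kind could look roughly like the sketch below. Only the non-zero zone count comes from this discussion; the other two conditions, the `SuperBlock` dataclass, and its field names are assumptions made for illustration and do not mirror the structs in the PR.

```python
from dataclasses import dataclass


@dataclass
class SuperBlock:
    """Hypothetical subset of parsed superblock values."""

    inode_count: int
    zone_count: int
    first_data_zone: int


def is_plausible(header: SuperBlock) -> bool:
    """Reject headers that cannot describe a usable filesystem."""
    # A filesystem without zones or inodes is treated as a false positive.
    if header.zone_count == 0 or header.inode_count == 0:
        return False
    # Assumption: the first data zone has to lie inside the filesystem.
    return 0 < header.first_data_zone < header.zone_count
```

A chunk calculation could call such a function first and refuse to produce a chunk when it returns False.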

qkaiser · 2026-01-29T12:03:26Z

python/unblob/handlers/filesystem/minixfs.py

+        endianness_fmt = "<" if self._handler.ENDIANNESS == Endian.LITTLE else ">"
+        return list(struct.unpack(f"{endianness_fmt}{count}{ptr_fmt}", data))
+
+    def _read_file_data(self, inode: cstruct) -> bytes:


Not sure about the cstruct typing here.

Is there some better way to do this? I removed it for the time being

Actual type is Structure, see https://github.com/fox-it/dissect.cstruct/blob/main/dissect/cstruct/types/structure.py#L384

I changed it to use Structure. Since that riddled the code with missing-attribute warnings, I added more concrete subclasses as type hints (maybe that was overkill, though).
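
For context, the `struct.unpack` call in the diff above decodes a block of zone pointers in one go. A standalone sketch of the same idea follows; the function name and its parameters are hypothetical, and it assumes 2-byte pointers for v1 and 4-byte pointers for v2 and v3.

```python
import struct


def unpack_zone_pointers(
    data: bytes, pointer_size: int, little_endian: bool
) -> list[int]:
    """Decode a run of fixed-size unsigned zone pointers."""
    ptr_fmt = {2: "H", 4: "I"}[pointer_size]  # unsigned 16 or 32 bit
    endianness_fmt = "<" if little_endian else ">"
    count = len(data) // pointer_size
    # Ignore trailing bytes that do not form a whole pointer.
    usable = data[: count * pointer_size]
    return list(struct.unpack(f"{endianness_fmt}{count}{ptr_fmt}", usable))
```

Passing the endianness as a plain argument keeps the helper independent of the handler it is used from.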

qkaiser · 2026-01-29T12:06:52Z

python/unblob/handlers/filesystem/minixfs.py

+            entry_inode = self._read_inode(entry.inode)
+
+            if self._is_file(entry_inode):
+                fs.write_bytes(Path(entry_path), self._read_file_data(entry_inode))


I'd recommend using write_chunks and implementing _read_file_data as a generator that yields chunks of bytes. This way we don't have to load large files into memory.

I changed it, so that _read_file_data streams the file data instead (and renamed it to _stream_file_data)
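
The streaming approach can be illustrated with the following sketch. It is not the code from the PR: `stream_file_data`, its parameters, and the dictionary standing in for the filesystem image are hypothetical, and indirect zones and holes are left out.

```python
from collections.abc import Callable, Iterable, Iterator


def stream_file_data(
    zones: Iterable[int],
    read_zone: Callable[[int], bytes],
    file_size: int,
) -> Iterator[bytes]:
    """Yield a file's data zone by zone, trimmed to `file_size` bytes."""
    remaining = file_size
    for zone in zones:
        if remaining <= 0:
            break
        # The last zone is usually only partially filled.
        chunk = read_zone(zone)[:remaining]
        remaining -= len(chunk)
        yield chunk


# Toy image with two 4-byte zones, indexed by zone number.
image = {3: b"aaaa", 7: b"bbbb"}
chunks = list(stream_file_data([3, 7], image.__getitem__, file_size=6))
```

Because only one zone is held in memory at a time, the generator can be passed to a chunk-writing function without materializing the whole file.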

qkaiser · 2026-01-29T12:08:15Z

@jstucke your recent contributions are highly appreciated. Keep them coming ! :)

jstucke · 2026-01-29T17:17:37Z

@jstucke your recent contributions are highly appreciated. Keep them coming ! :)

Haha, thank you for taking the time for a thorough review. I added some validity checks and tests with invalid headers. I also found the original tool from the MINIX project and was able to create samples with block size > 1024 and even log_zone_size != 0 (which is not possible with the mkfs.minix included in Linux), but sadly I am unable to mount the filesystems and therefore could not add any files. The validity checks seem to work, though. There are some images on the MINIX download page which use v3 with block size 4096, and they seem to be unpacked without problems.

I will probably fix the remaining issues tomorrow.

qkaiser · 2026-02-02T08:45:08Z

@jstucke I'll do some cleanups on this branch sometime this week, but not sure when. Hang on :)

qkaiser · 2026-02-12T17:38:05Z

Cleaning up the code and doing cross-validation as we speak. Will push my changes when I'm done.

qkaiser self-requested a review January 29, 2026 09:48

qkaiser added enhancement New feature or request format:filesystem python Pull requests that update Python code labels Jan 29, 2026

qkaiser reviewed Jan 29, 2026

jstucke force-pushed the minixfs branch from 5502d1e to 152a053 Compare January 29, 2026 17:03

jstucke force-pushed the minixfs branch 2 times, most recently from f17d888 to 57475ad Compare January 30, 2026 09:58

qkaiser reviewed Jan 30, 2026

jstucke force-pushed the minixfs branch 3 times, most recently from 91dc5a2 to 7dc5667 Compare January 30, 2026 14:33

jstucke force-pushed the minixfs branch from 7dc5667 to 74bf52c Compare January 30, 2026 14:53
