-
Notifications
You must be signed in to change notification settings - Fork 98
feat(handler): add support for minix filesystems #1359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
hmm I noticed that I should probably add a few sanity checks to |
qkaiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First review pass over lunch, will get back with a more detailed review.
| yield self._read_zone_data(index) | ||
|
|
||
| def _read_directory(self, inode) -> Iterator[cstruct]: | ||
| for entry in chunked(self._read_file_data(inode), self.dir_entry_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have iterate_file in file_utils that could help you if you're doing what I think you're doing. Lots of things are already implemented in file_utils, so you probably don't need the more-itertools dependency either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed _read_file_data to stream chunks of data, but these chunks have block size and not dir_entry_size, so chunked() is still beneficial (making _read_file_data yield chunks of size dir_entry_size is a bit much overhead because of the indirect data zone structure).
I could replace the chunked with slices. That would remove the dependency, but would not really improve readability:
for dir_data in chunked(zone_data, self.dir_entry_size): for index in range(0, len(zone_data), self.dir_entry_size):
dir_data = zone_data[index : index + self.dir_entry_size]|
|
||
| def __init__(self): | ||
| super().__init__() | ||
| self.EXTRACTOR = MinixFSExtractor(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the attempt at generalization but I don't think this self passing is a good design. I'll try to come up with something better when I'm thru the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should define a _MinixFSExtractor class that do not override the __init__ function of Extractor. Then you create 3 extractors:
MinixFSv1Extractor(_MinixFSExtractor)MinixFSv2Extractor(_MinixFSExtractor)MinixFSv3Extractor(_MinixFSExtractor)
None of these extractors should rely on the handler they're attached to. No handlers argument or attribute here.
You should define a _MinixFSHandler(StructHandler) that does not override the __init__ function of StructHandler. This generic handler should define:
HEADER_STRUCTasminix_super_block
Then, create three different handlers:
MinixFSv1Handler(_MinixFSHandler)MinixFSv2Handler(_MinixFSHandler)MinixFSv3Handler(_MinixFSHandler)
Each of these handlers must override the C_DEFINITIONS entry with their own. To do so I would split the C_DEFINITIONS in 3 blocks:
- C_DEFINITIONS_V1 (only
minix_inode,minix_super_block,minix_dir_entry) - C_DEFINITIONS_V2 (only
minix2_inode,minix_super_block,minix_dir_entry) - C_DEFINITIONS_V3 (only
minix_inode,minix3_super_block,minix3_dir_entry)
In each of these C definitions, use the same names. Do not use minix2_inode vs minix_inode. Use generic names without version info. That's the reason we can define HEADER_STRUCT in the handler parent class.
With this approach you can get rid of DIR_STRUCT and INODE_STRUCT since they will be generic. Just use the generic name.
You can also drop the VERSION class attribute. The VERSION is currently used for two things:
- derive the
block_size, just check if your header has ans_blocksizeattribute. If it does, use it otherwise default toBLOCK_SIZE - derive the
inode_size, this can be done through the cstruct parser that expose its fields size (e.g.self._struct_parser.cparser_le.minix_inode.size
Do not define DOC in __init__, just define it in the class with a version that's not resolved at runtime.
Each handler must set its extractor explicitly like:
EXTRACTOR = MinixFSv3Extractor()
This should make the code easier to follow and reduce complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you come up with anything? I'm still a bit puzzled how to solve this (since both handler and extractor both need many of the fields including the C_DEFINITIONS). Is overriding Handler.extract an option? 😅
edit: it seems I did not refresh the page before commenting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to change everything like you suggested. I couldn't get completely rid of VERSION, though.
| zone_count = self._get_zone_count(header) | ||
| if zone_count != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since zone_count is in the header, you can write patterns that do not match on headers with a null zone count. This way you don't need to get it here or check the value since it won't be matched in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I do that? There is the not (~) syntax in YARA but this does not seem to work. At least I get an error when using it in a pattern:
> raise InvalidHexString(str(e)) from e
E unblob.parser.InvalidHexString: No terminal matches '~' in the current parser context, at line 1 col 44
E
E ] (7f | 8f) 13 (00 | 01 | 02) 00 [2] 00 ~00 00 00
E ^
E Expected one of:
E * START_ANCHOR
E * SECONDNIBLE
E * JUMP
E * END_ANCHOR
E * LPAR
E * WILDCARD
E * RANGE_JUMP
E * RPAR
E * FIRSTNIBLE
E * LITERAL
E * ALTERNATIVE_SEPARATOR
E
E Previous tokens: Token('LITERAL', '00')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's something we may want to fix in unblob itself. Naive approach is to accept everything but 00:
( 01 | 02 | 03 | ... | ff)
You may have an easier time writing patterns as Regex rathern than HexString for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried writing a Regex instead, but it seems that it does not like null bytes:
@lru_cache
def build_hyperscan_database(handlers: Handlers) -> StreamDatabase:
patterns = []
for handler_class in handlers:
handler = handler_class()
for pattern in handler.PATTERNS:
try:
patterns.append(
Pattern(
pattern.as_regex(),
Flag.SOM_LEFTMOST,
Flag.DOTALL,
tag=handler,
)
)
except InvalidHexString as e:
logger.error(
"Invalid pattern",
handler=handler.NAME,
pattern=pattern,
error=str(e),
)
raise
> return StreamDatabase(*patterns)
^^^^^^^^^^^^^^^^^^^^^^^^^
E ValueError: Pattern expression contains NULL byteMaybe it is best to do sanity checking during chunk extraction and change it as soon as it is supported?
| endianness_fmt = "<" if self._handler.ENDIANNESS == Endian.LITTLE else ">" | ||
| return list(struct.unpack(f"{endianness_fmt}{count}{ptr_fmt}", data)) | ||
|
|
||
| def _read_file_data(self, inode: cstruct) -> bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the cstruct typing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some better way to do this? I removed it for the time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual type is Structure, see https://github.com/fox-it/dissect.cstruct/blob/main/dissect/cstruct/types/structure.py#L384
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to use Structure (but since that riddled the code with missing attribute warnings, I added more concrete subclasses as type hints (maybe that was overkill, though))
| entry_inode = self._read_inode(entry.inode) | ||
|
|
||
| if self._is_file(entry_inode): | ||
| fs.write_bytes(Path(entry_path), self._read_file_data(entry_inode)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend using write_chunks and implement _read_file_data as a generator that yields chunks of bytes. This way we don't have to load large files in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it, so that _read_file_data streams the file data instead (and renamed it to _stream_file_data)
|
@jstucke your recent contributions are highly appreciated. Keep them coming ! :) |
Haha thank you for taking the time for a thorough review. I added some validity checks and tests with invalid headers. I also found the original tool from the minix project and was also able to create samples with block size > 1024 and even log_zone_size != 0 (which is not possible with the mkfs.minix included in linux), but sadly I am unable to mount the filesystems and therefore could not add any files. The validity checks seem to work, though. There is are some images on the minix download page which uses v3 with blocksize 4096 and it seems to be unpacked without problems. I will probably fix the remaining issues tomorrow. |
f17d888 to
57475ad
Compare
91dc5a2 to
7dc5667
Compare
* little endian test files were created with mkfs.minix * since there does not seem to be a tool to create big endian MINIX filesystems, those test files were created by byte swapping the little endian test files * known firmware samples with MINIX FS: * Trendnet IP cameras, e.g. TV-IP110W and TV-IP422WN (both MINIX FS v1 LE with 30 byte filenames) * Netgear switches, e.g. FSM7326P and GSM7248 (both MINIX FS v2 BE with 30 byte filenames)
|
@jstucke I'll do some cleanups on this branch sometimes this week, but not sure when. Hang on :) |
|
Cleaning up the code and doing cross-validation as we speak. Will push my changes when I'm done. |
Uh oh!
There was an error while loading. Please reload this page.