Fix slurp_raw for files larger than 2GB #317
igor-raits wants to merge 1 commit into dagolden:master
Conversation
Thank you. I'll take a look but it may not be until next week.
For the second hunk I'm somewhat uncomfortable with sometimes switching to:

    if ( defined($binmode) and $binmode eq ":unix" ) {
        # Use syswrite in a loop to handle write() syscall size limit (~2GB)
        for my $data ( map { ref eq 'ARRAY' ? @$_ : $_ } @data ) {
            my $total_left = length $data;
            my $total_written = 0;
            my $rc = 0;
            while ( $total_left and ( $rc = syswrite $fh, $data, $total_left, $total_written ) ) {
                $total_left -= $rc;
                $total_written += $rc;
            }
            $self->_throw('syswrite', $temp->[PATH]) unless defined $rc;
        }
    }
Force-pushed from 48a9d7e to 161da60.
@ap thanks for the review & feedback. I have incorporated it and force-pushed.
Thanks. More thoughts, this time about the following:

    if ( defined($binmode) and $binmode eq ":unix"
        and my $total_left = -s $fh )
    {
        # Read in a loop to handle read() syscall size limit (~2GB)
        my $buf = "";
        my $total_read = 0;
        my $rc = 0;
        while ( $rc = read $fh, $buf, $total_left, $total_read ) {
            $total_read += $rc;
            # Ensure we will keep read()ing until we get 0 or undef
            # even if someone else changed the file length from under us
            $total_left = ( -s $fh ) - $total_read;
            $total_left = 1 if $total_left < 1;
        }
        $self->_throw('read') unless defined $rc;
        return $buf;
    }
The read() and write() system calls on Linux have a maximum single
operation limit of approximately SSIZE_MAX (~2.1GB). When using the
:unix PerlIO layer (which bypasses buffering), this limit caused silent
data truncation for large files.

Affected methods:

- slurp_raw / slurp with binmode => ":unix"
- spew_raw / spew_utf8 (when Unicode::UTF8 is available)

For example, reading or writing a 3GB file would silently truncate to
~2.1GB.

Fix by using loops that continue reading/writing until all data is
processed:

- slurp: loop with 4-argument read() to append at the correct offset
- spew: loop with 4-argument syswrite() over each data element,
  avoiding unnecessary data copying

The buffered PerlIO path (regular slurp/spew without :unix) was not
affected, as PerlIO handles chunking internally.
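For context, a minimal reproduction sketch (not part of the PR): it creates a sparse 3GB file and reads it back with slurp_raw, which goes through the :unix path described above. It assumes a filesystem that supports sparse files and roughly 3GB of free memory.

    use strict;
    use warnings;
    use Path::Tiny;

    my $file = Path::Tiny->tempfile;
    my $size = 3 * 1024 * 1024 * 1024;    # 3 GiB, past the ~2GB syscall limit

    # Create a sparse file of the desired size without writing 3GB of data
    open my $out, '>', $file or die "open: $!";
    truncate $out, $size or die "truncate: $!";
    close $out or die "close: $!";

    my $content = $file->slurp_raw;
    printf "expected %d bytes, got %d\n", $size, length $content;
    # Without the fix this reports roughly 2.1GB; with it, the full 3 GiB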
Force-pushed from 161da60 to f08d5d6.
@ap thanks again, adjusted!
The read part looks more or less the same as what I'm doing in File::Slurper. I never got around to implementing the fast writer path because it is a PITA.
FYI: I am not ignoring this, but given some of the complexity, I'm going to move slowly on this and haven't had a chance to do a close read. Thank you, everyone, for your feedback so far.
The read() system call on Linux has a maximum single-read limit of
approximately SSIZE_MAX (~2.1GB). For files larger than this limit,
a single read() call returns fewer bytes than requested.
The previous implementation assumed a single read() would return all
requested bytes, causing silent data truncation for large files.
For example, a 3GB file would only return ~2.1GB of data.
Fix by looping until all bytes are read or EOF is reached, using the
4-argument form of read() to append at the correct buffer offset.
Fixes: #316
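For illustration, a standalone sketch of the looped 4-argument read() pattern the commit describes; read_all is a hypothetical helper, not Path::Tiny API. The OFFSET argument makes each read() append to the buffer after the bytes already received.

    use strict;
    use warnings;

    sub read_all {
        my ($fh) = @_;
        my $left = -s $fh;           # bytes we still expect
        my $buf  = "";
        my $got  = 0;                # bytes read so far == append offset
        while ( $left > 0 ) {
            my $rc = read( $fh, $buf, $left, $got );
            die "read failed: $!" unless defined $rc;
            last if $rc == 0;        # EOF (file shrank under us)
            $got  += $rc;
            $left -= $rc;
        }
        return $buf;
    }

    open my $fh, '<:raw', $ARGV[0] or die "open: $!";
    my $data = read_all($fh);
    printf "read %d bytes\n", length $data;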