
@Nekrolm
Contributor

@Nekrolm Nekrolm commented Feb 3, 2026

I was looking into #10662.

While most of the time is spent in the getdents and lstat syscalls, allocations still account for around 3-5% of the time.

[Image: perf flamegraph]

@sylvestre
Contributor

Could you please share the hyperfine benchmark results (without the patch, with it, and with GNU)?
Thanks!

@Nekrolm
Contributor Author

Nekrolm commented Feb 3, 2026

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ sudo hyperfine --warmup 3 -N -i  "target/release/coreutils ls -R /proc/ > /dev/null" "./coreutils_main ls -R /proc/ > /dev/null" "ls -R /proc/ > /dev/null"
Benchmark 1: target/release/coreutils ls -R /proc/ > /dev/null
  Time (mean ± σ):     362.8 ms ±  15.2 ms    [User: 138.4 ms, System: 217.8 ms]
  Range (min … max):   348.1 ms … 399.4 ms    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 2: ./coreutils_main ls -R /proc/ > /dev/null
  Time (mean ± σ):     382.8 ms ±  24.1 ms    [User: 150.1 ms, System: 226.6 ms]
  Range (min … max):   356.4 ms … 419.2 ms    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 3: ls -R /proc/ > /dev/null
  Time (mean ± σ):     297.0 ms ±  26.5 ms    [User: 110.6 ms, System: 184.5 ms]
  Range (min … max):   255.7 ms … 351.4 ms    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  ls -R /proc/ > /dev/null ran
    1.22 ± 0.12 times faster than target/release/coreutils ls -R /proc/ > /dev/null
    1.29 ± 0.14 times faster than ./coreutils_main ls -R /proc/ > /dev/null
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ ls --version
ls (GNU coreutils) 9.4
Copyright (C) 2023 Free Software Foundation, Inc.
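
As a sanity check on the summary (assuming `target/release/coreutils` is the build with this patch and `./coreutils_main` is built from `main`), the ratios follow directly from the means, and the quoted ± values are consistent with combining the two relative standard deviations in quadrature:

```latex
\begin{aligned}
\frac{362.8}{297.0} &\approx 1.22, &
\sigma &\approx 1.22\sqrt{\left(\tfrac{15.2}{362.8}\right)^2 + \left(\tfrac{26.5}{297.0}\right)^2} \approx 0.12 \\
\frac{382.8}{297.0} &\approx 1.29, &
\sigma &\approx 1.29\sqrt{\left(\tfrac{24.1}{382.8}\right)^2 + \left(\tfrac{26.5}{297.0}\right)^2} \approx 0.14 \\
\frac{382.8}{362.8} &\approx 1.06, &
\sigma &\approx 1.06\sqrt{\left(\tfrac{24.1}{382.8}\right)^2 + \left(\tfrac{15.2}{362.8}\right)^2} \approx 0.08
\end{aligned}
```

The last line compares the two uutils builds: about 5% faster with the patch, but that difference is smaller than the combined σ, so this run alone does not clearly separate them.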

I don't expect much from these changes.
There is more potential to reduce allocations, but it would take more significant changes (e.g. refactoring the output-width computation so that it does not allocate a formatted string for each visited inode).
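
For numeric columns (sizes, link counts, inode numbers), the column width can be computed without formatting each value into a `String`. A minimal sketch of the idea, assuming the values are already available as `u64`; `decimal_width` is a hypothetical helper, not code from this PR, and non-numeric columns (names, owners) would still need their own display-width handling:

```rust
/// Number of characters needed to print `n` in base 10, computed
/// arithmetically rather than with `n.to_string().len()`, which would
/// allocate a temporary `String` for every value.
fn decimal_width(mut n: u64) -> usize {
    let mut width = 1;
    while n >= 10 {
        n /= 10;
        width += 1;
    }
    width
}

fn main() {
    // Hypothetical column values (e.g. file sizes for `ls -l`).
    let sizes: [u64; 4] = [0, 9, 4096, u64::MAX];

    // Column width = widest value; no String is built per entry.
    let column_width = sizes.iter().map(|&s| decimal_width(s)).max().unwrap_or(0);

    // Sanity check against the allocating approach.
    for &s in &sizes {
        assert_eq!(decimal_width(s), s.to_string().len());
    }
    println!("column width: {column_width}"); // column width: 20
}
```

On Rust 1.67 or newer, `n.checked_ilog10().map_or(1, |d| d as usize + 1)` gives the same result without the loop.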

But maybe optimising the getdents calls (tweaking the buffer size?) is a more promising direction.

@Nekrolm
Contributor Author

Nekrolm commented Feb 3, 2026

The getdents buffer is not the issue, actually:

strace -o gnu.trace  ls -R /proc/ > /dev/null 
strace -o coreutils.trace ./coreutils_main ls -R /proc/ > /dev/null 

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ wc -l gnu.trace 
90005 gnu.trace
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ wc -l coreutils.trace 
104756 coreutils.trace

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep getdent coreutils.trace | head -1
getdents64(3, 0x575ac6349440 /* 352 entries */, 32768) = 9408
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep getdent gnu.trace | head -1
getdents64(3, 0x57444ada0fd0 /* 352 entries */, 32768) = 9408

The GNU version simply makes about 67% fewer fstat calls (newfstatat):

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep fstat gnu.trace | wc -l
13279
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep fstat coreutils.trace | wc -l
39813

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep fstatat coreutils.trace | wc -l
26539
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep fstatat gnu.trace | wc -l
0
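
Note that `grep fstat` matches both `fstat(` and `newfstatat(` lines, so the uutils total of 39813 includes its 26539 `newfstatat` lines. Working through the counts above:

```latex
\begin{aligned}
1 - \frac{13279}{39813} &\approx 1 - 0.334 = 0.666
  &&\text{(about 67\% fewer lines matching \texttt{fstat} for GNU)} \\
39813 - 26539 &= 13274 \approx 13279
  &&\text{(uutils minus its \texttt{newfstatat} lines, vs.\ GNU)}
\end{aligned}
```

Once the `newfstatat` lines are removed, the two counts are nearly identical, which suggests the gap comes almost entirely from the `newfstatat` calls.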

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep getdents gnu.trace | wc -l
26564
dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ grep getdents coreutils.trace | wc -l
26564
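
Because substring greps like `grep fstat` lump different syscalls together, an exact per-name tally is easier to compare. A minimal Rust sketch, assuming a plain `strace -o` file recorded without `-f` (so there are no PID prefixes or `resumed` lines); the program name `syscall_tally` is hypothetical:

```rust
use std::collections::BTreeMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Returns the syscall name at the start of one `strace -o` line, e.g.
/// `getdents64(3, 0x... /* 352 entries */, 32768) = 9408` -> `getdents64`.
/// Lines that do not start with `name(` (signal and exit markers) yield None.
fn syscall_name(line: &str) -> Option<&str> {
    let end = line.find('(')?;
    let name = &line[..end];
    let valid = !name.is_empty()
        && name.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_');
    valid.then_some(name)
}

/// Counts exact syscall names, so `fstat` and `newfstatat` stay separate
/// (unlike `grep fstat`, which matches both).
fn tally<I, S>(lines: I) -> BTreeMap<String, usize>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut counts = BTreeMap::new();
    for line in lines {
        if let Some(name) = syscall_name(line.as_ref()) {
            *counts.entry(name.to_owned()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() -> io::Result<()> {
    // Hypothetical usage: syscall_tally coreutils.trace
    let Some(path) = std::env::args().nth(1) else {
        eprintln!("usage: syscall_tally <strace-output-file>");
        return Ok(());
    };
    let lines = BufReader::new(File::open(path)?)
        .lines()
        .collect::<io::Result<Vec<String>>>()?;
    for (name, count) in tally(&lines) {
        println!("{count:>8} {name}");
    }
    Ok(())
}
```

For a quick overview without writing code, `strace -c` also prints a per-syscall summary table.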

The fstatat calls account for around 10% of the flamegraph I posted above.
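
One general way to avoid a per-entry stat during a recursive walk is to decide whether to descend from the entry type that getdents64 already returns (`d_type`). A minimal sketch of that technique using only Rust's standard library; it has not been checked against how uutils `ls` currently decides when to stat, `count_entries` is a hypothetical helper, and for long listings (`-l`) the metadata is needed anyway:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively counts (files, directories) below `dir`.
///
/// The descend/don't-descend decision uses `DirEntry::file_type()`. On Linux,
/// Rust's standard library answers this from the `d_type` field returned by
/// getdents64, falling back to a stat-family call only when the filesystem
/// reports DT_UNKNOWN; `DirEntry::metadata()` always issues a stat-family call.
fn count_entries(dir: &Path) -> io::Result<(usize, usize)> {
    let (mut files, mut dirs) = (0, 0);
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks, so symlinked directories are
        // not descended into (plain `ls -R` behaves the same way).
        if entry.file_type()?.is_dir() {
            dirs += 1;
            let (f, d) = count_entries(&entry.path())?;
            files += f;
            dirs += d;
        } else {
            files += 1;
        }
    }
    Ok((files, dirs))
}

fn main() -> io::Result<()> {
    // Build a small throwaway tree so the example is self-contained.
    let root = std::env::temp_dir().join(format!("walk-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("a").join("b"))?;
    fs::write(root.join("top.txt"), "x")?;
    fs::write(root.join("a").join("b").join("inner.txt"), "y")?;

    let (files, dirs) = count_entries(&root)?;
    println!("{files} files, {dirs} directories"); // 2 files, 2 directories
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Running the sketch under `strace` on a filesystem that fills in `d_type` should show getdents64 calls but no per-entry `newfstatat`.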

}
}

impl ExtendPad for String {
Contributor


please add comments explaining why you are doing this :)

Contributor Author


Comments are already there for this trait; updated them.
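
The trait's definition is not shown in this thread. As a hypothetical illustration of the general idea (appending padded text into one reusable `String` instead of allocating a temporary per field with `format!`), the names `PadInto`, `push_right_aligned`, and `push_left_aligned` below are invented and are not the PR's `ExtendPad` API:

```rust
use std::fmt::Write;

/// Hypothetical helper trait: append `s` padded to `width`, in place.
trait PadInto {
    fn push_right_aligned(&mut self, s: &str, width: usize);
    fn push_left_aligned(&mut self, s: &str, width: usize);
}

impl PadInto for String {
    fn push_right_aligned(&mut self, s: &str, width: usize) {
        // Formats straight into `self`; writing to a String cannot fail.
        write!(self, "{s:>width$}").unwrap();
    }

    fn push_left_aligned(&mut self, s: &str, width: usize) {
        write!(self, "{s:<width$}").unwrap();
    }
}

fn main() {
    // One buffer reused for the whole line; no per-field temporaries.
    let mut line = String::with_capacity(64);
    line.push_right_aligned("4096", 6);
    line.push(' ');
    line.push_left_aligned("file.txt", 10);
    line.push('|');
    assert_eq!(line, "  4096 file.txt  |");
    println!("{line}");
}
```

`std::fmt` pads by `char` count, not terminal columns, so wide or combining characters in file names would still need a display-width calculation.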

@Nekrolm Nekrolm force-pushed the allocate-less-strings-in-ls branch from 4e2410b to 6243fdd February 4, 2026 16:25
@github-actions

github-actions bot commented Feb 4, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@sylvestre
Contributor

please run rustfmt

@Nekrolm Nekrolm force-pushed the allocate-less-strings-in-ls branch from 6243fdd to 2935836 February 4, 2026 19:47

2 participants