-
Notifications
You must be signed in to change notification settings - Fork 387
Report correct L2 cache size on ARM (Neoverse V1/V2) #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Radu2k
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks for addressing the issue! The only suggestion I would have is to add a check that level is matching used index or checking indexes and matching by level. This slipped through my fingers when I wrote the implementation in vllm and could lead to wrong cache values.
Here is tree -L 2 at /sys/devices/system/cpu/cpu0/cache and if you do a cat on level should see 2 at index2:
.
├── index0
│ ├── allocation_policy
│ ├── coherency_line_size
│ ├── level
│ ├── number_of_sets
│ ├── shared_cpu_list
│ ├── shared_cpu_map
│ ├── size
│ ├── type
│ ├── uevent
│ ├── ways_of_associativity
│ └── write_policy
├── index1
│ ├── allocation_policy
│ ├── coherency_line_size
│ ├── level
│ ├── number_of_sets
│ ├── shared_cpu_list
│ ├── shared_cpu_map
│ ├── size
│ ├── type
│ ├── uevent
│ ├── ways_of_associativity
│ └── write_policy
├── index2
│ ├── allocation_policy
│ ├── coherency_line_size
│ ├── level
│ ├── number_of_sets
│ ├── shared_cpu_list
│ ├── shared_cpu_map
│ ├── size
│ ├── type
│ ├── uevent
│ ├── ways_of_associativity
│ └── write_policy
├── index3
│ ├── allocation_policy
│ ├── coherency_line_size
│ ├── level
│ ├── number_of_sets
│ ├── shared_cpu_list
│ ├── shared_cpu_map
│ ├── size
│ ├── type
│ ├── uevent
│ ├── ways_of_associativity
│ └── write_policy
└── uevent
|
Thanks @Radu2k! That's a good catch. I can error out with
I tried this as well, although it can get a little involved, with a few more potential file opens. If you and the other reviewers deem it necessary, I can push that fix next. cc: @malfet |
Radu2k
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, LGTM 👍
|
Will fix the linter soon. Anything else of note? @malfet |
|
@Rohanjames1997 I wonder why reading from sysfs is an arm specific thing rather than a generic Linux one? I know that many sysfs entries are not available from AWS lambdas for security reasons |
malfet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please:
- Fix lint
- Use existing APIs/abstractions
- Add some links to the docs explaining why sysfs should be used, which kernel versions added support for this entry, and whether it's arch specific or generic
src/arm/linux/init.c
Outdated
|
|
||
| /* Verify the index actually corresponds to the requested cache level */ | ||
| snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/cache/index%u/level", cpu_id, cache_level); | ||
| FILE* file = fopen(path, "r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why cpuinfo_linux_parse_small_file should not be used there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I had no idea about that function 🙂
|
@malfet Thanks much for the review! As you can tell this is my first contrib to the repo. It looks like x86 CPUs have a CPUID instruction that directly returns cache properties - src/x86/cache/deterministic.c:14-95. ARM has no equivalent, so it used hardcoded values before. Does that sound reasonable? I will fix lint, use existing APIs & update docs next. |
| uint32_t value = 0; | ||
| const char* p = text_start; | ||
| while (p < text_end && *p >= '0' && *p <= '9') { | ||
| value = value * 10 + (*p - '0'); | ||
| p++; | ||
| } | ||
| if (p == text_start || value == 0) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A) this code looks identical to code on lines 25-29, why not move it to a shared function
B) Isn't this function already exists in standard library and called strtoull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I've extracted it into a helper function and used strtoul
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looking at it more closely, I see that a similar parsing logic is present in processors.c
One probable reason could be that cpuinfo_linux_parse_small_file passes a non-null-terminated buffer (text_start to text_end), while strtoul requires a null-terminated string and would read past the buffer.
What do you think @malfet ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is a document somewhere explaining how to properly query cache sizes? I.e. couldn't one use cpuid to uniquely identify whether l2 cache is shared or not?
I.e. shouldn't want be able to detect cache configuration during
cpuinfo/src/arm/linux/aarch64-isa.c
Line 103 in 84818a4
| switch (midr & (CPUINFO_ARM_MIDR_IMPLEMENTER_MASK | CPUINFO_ARM_MIDR_PART_MASK)) { |
or
somewhere inside chipset.c ?
Fixes #369
This is my first attempt at fixing the L2 size reporting issue. Comments and feedback are welcome.
I tested on Arm Neoverse V1 and V2 EC2 instances. I reused the reproducer in #369
This now returns the expected results on Arm Neoverse V1 and V2
Additionally, here is the output of
./cache-info.