Skip to content

Conversation

@Rohanjames1997
Copy link

Fixes #369

This is my first attempt at fixing the L2 size reporting issue. Comments and feedback are welcome.

I tested on Arm Neoverse V1 and V2 EC2 instances. I reused the reproducer in #369

#include <cstddef>
#include <cstdio>
#include <cpuinfo.h>
size_t l2_bytes() {
  if (!cpuinfo_initialize()) return 0;
  const cpuinfo_processor* p = cpuinfo_get_current_processor();
  if (!p || !p->cache.l2) return 0;
  return p->cache.l2->size; // bytes
}
int main() { std::printf("%zu\n", l2_bytes()); }

This now returns the expected results on Arm Neoverse V1 and V2


Additionally, here is the output of ./cache-info.

  1. On Neoverse-V1:
Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
  1. On Neoverse-V2:
Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 64 x 2 MB (inclusive), 8-way set associative (4096 sets), 64 byte lines, shared by 1 processors

Copy link

@Radu2k Radu2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks for addressing the issue! The only suggestion I would have is to add a check that level is matching used index or checking indexes and matching by level. This slipped through my fingers when I wrote the implementation in vllm and could lead to wrong cache values.

Here is tree -L 2 at /sys/devices/system/cpu/cpu0/cache and if you do a cat on level should see 2 at index2:

.
├── index0
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index1
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index2
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index3
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
└── uevent

@Rohanjames1997
Copy link
Author

Thanks @Radu2k! That's a good catch.

I can error out with 0 and use the previous hardcoded values if level of index2 does not show 2. Will push that commit now.


checking indexes and matching by level

I tried this as well, although it can get a little involved, with a few more potential file opens. If you and the other reviewers deem it necessary, I can push that fix next.

cc: @malfet

@Rohanjames1997 Rohanjames1997 requested a review from Radu2k February 5, 2026 17:59
Copy link

@Radu2k Radu2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, LGTM 👍

@Rohanjames1997
Copy link
Author

Will fix the linter soon. Anything else of note? @malfet

@malfet
Copy link
Contributor

malfet commented Feb 9, 2026

@Rohanjames1997 I wonder why reading from sysfs is an arm specific thing rather than a generic Linux one?
And if those entries are not available, what would it return now (0 it seems), compared to previous behavior

I know that many sysfs entries are not available from AWS lambdas for security reasons

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please:

  • Fix lint
  • Use existing APIs/abstractions
  • Add some links to the docs explaining why sysfs should be used, which kernel versions added support for this entry, and whether it's arch specific or generic


/* Verify the index actually corresponds to the requested cache level */
snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/cache/index%u/level", cpu_id, cache_level);
FILE* file = fopen(path, "r");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why cpuinfo_linux_parse_small_file should not be used there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had no idea about that function 🙂

@Rohanjames1997
Copy link
Author

Rohanjames1997 commented Feb 9, 2026

@malfet Thanks much for the review! As you can tell this is my first contrib to the repo.

It looks like x86 CPUs have a CPUID instruction that directly returns cache properties - src/x86/cache/deterministic.c:14-95. ARM has no equivalent, so it used hardcoded values before.
If sysfs is unavailable, the function returns 0 and the code uses existing hardcoded values - same behavior as before.

Does that sound reasonable?

I will fix lint, use existing APIs & update docs next.

Comment on lines +39 to +47
uint32_t value = 0;
const char* p = text_start;
while (p < text_end && *p >= '0' && *p <= '9') {
value = value * 10 + (*p - '0');
p++;
}
if (p == text_start || value == 0) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A) this code looks identical to code on lines 25-29, why not move it to a shared function
B) Isn't this function already exists in standard library and called strtoull?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've extracted it into a helper function and used strtoul

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looking at it more closely, I see that a similar parsing logic is present in processors.c

One probable reason could be that cpuinfo_linux_parse_small_file passes a non-null-terminated buffer (text_start to text_end), while strtoul requires a null-terminated string and would read past the buffer.

What do you think @malfet ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there is a document somewhere explaining how to properly query cache sizes? I.e. couldn't one use cpuid to uniquely identify whether l2 cache is shared or not?

I.e. shouldn't want be able to detect cache configuration during

switch (midr & (CPUINFO_ARM_MIDR_IMPLEMENTER_MASK | CPUINFO_ARM_MIDR_PART_MASK)) {

or
somewhere inside chipset.c ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cpuinfo reports incorrect L2 cache size on ARM (Neoverse V1/V2)

3 participants