Add L1 block cache and direct-mapped BHT #711

Merged: jserv merged 1 commit into `master` from `block-cache` on Jan 30, 2026

@jserv commented on Jan 30, 2026

This introduces two lookup optimizations:

  1. L1 Direct-Mapped Block Cache (256 entries, interpreter only):
  • Separated tag/pointer arrays: the tag array (1KB) is checked first, and the pointer is loaded only on a hit, for cache efficiency
  • Invalid tag sentinel (0xFFFFFFFF) for clean miss detection
  • EXT_C-aware index shift (1 vs 2 bits) to reduce conflict misses
  • New block_lookup_or_find() as primary lookup with hash fallback
  2. Direct-Mapped Branch History Table (both JIT and non-JIT):
  • O(1) lookup replacing O(n) linear search: index = (PC >> 2) & mask
  • Remove the idx field (non-JIT) and bht_find_min_idx(), which are no longer needed
  • Update bht_find_max_idx() to scan all entries, since zeros can appear at any index with the direct-mapped scheme
  • Add static assert for power-of-2 HISTORY_SIZE
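
A minimal C sketch of the L1 block cache in item 1, using hypothetical names (`block_l1_t`, `block_l1_index()`, `lookup_or_find()`, `slow_find()`) rather than rv32emu's actual code. It shows the tag array checked first, the pointer read only on a hit, the all-ones invalid-tag sentinel, the EXT_C-aware index shift, and the fallback on a miss. The linear `slow_find()` merely stands in for the existing hash-table lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; rv32emu's real structures and helpers differ. */
#define BLOCK_L1_SIZE 256u /* must be a power of two */
#define BLOCK_L1_MASK (BLOCK_L1_SIZE - 1)
/* RISC-V PCs are at least 2-byte aligned, so this odd value never matches. */
#define BLOCK_L1_INVALID_TAG 0xFFFFFFFFu

typedef struct {
    uint32_t pc_start; /* first guest PC covered by this block */
} block_t;

typedef struct {
    uint32_t tag[BLOCK_L1_SIZE]; /* 256 * 4 B = 1 KB, checked first */
    block_t *ptr[BLOCK_L1_SIZE]; /* 256 * 8 B = 2 KB on LP64, read on hit */
} block_l1_t;

static unsigned l1_hits; /* instrumentation for this example only */

static void block_l1_clear(block_l1_t *c)
{
    for (uint32_t i = 0; i < BLOCK_L1_SIZE; i++) {
        c->tag[i] = BLOCK_L1_INVALID_TAG;
        c->ptr[i] = NULL;
    }
}

/* With EXT_C, blocks may start on any 2-byte boundary, so drop 1 bit;
 * otherwise PCs are 4-byte aligned and the low 2 bits are always zero. */
static inline uint32_t block_l1_index(uint32_t pc, bool ext_c)
{
    return (pc >> (ext_c ? 1 : 2)) & BLOCK_L1_MASK;
}

/* Stand-in for the hash-table lookup (linear here for brevity). */
static block_t *slow_find(block_t *blocks, size_t n, uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (blocks[i].pc_start == pc)
            return &blocks[i];
    return NULL;
}

static block_t *lookup_or_find(block_l1_t *c, bool ext_c,
                               block_t *blocks, size_t n, uint32_t pc)
{
    assert((pc & 1u) == 0); /* keeps the all-ones sentinel unambiguous */
    uint32_t idx = block_l1_index(pc, ext_c);
    if (c->tag[idx] == pc) { /* hit: only now touch the pointer array */
        l1_hits++;
        return c->ptr[idx];
    }
    block_t *b = slow_find(blocks, n, pc); /* miss: fall back */
    if (b) { /* fill the slot, evicting whatever was there */
        c->tag[idx] = pc;
        c->ptr[idx] = b;
    }
    return b;
}
```

Two PCs that share the same index bits (for example 0x1000 and 0x2000 without EXT_C) evict each other; that conflict-miss cost is the price of a single compare on the hot path.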

Benchmarks on Intel Xeon E5-2650:

  • CoreMark: 804.66 -> ~887 iterations/sec (+10.2%)
  • Dhrystone: 1366 -> ~1382 DMIPS (+1.2%)

Summary by cubic

Adds an L1 direct-mapped block cache (interpreter only) and a direct-mapped branch history table to speed up hot-path lookups. On Xeon E5-2650 this yields ~+10% CoreMark and ~+1% Dhrystone.

  • New Features
    • L1 block cache: 256 entries, split tag/pointer arrays, invalid tag sentinel, EXT_C-aware index shift; new block_lookup_or_find() with hash fallback; proper init/clear.
    • Direct-mapped BHT (JIT and non-JIT): O(1) index = (PC >> 2) & mask; remove idx and bht_find_min_idx(); update bht_find_max_idx() to scan all entries; require power-of-2 HISTORY_SIZE; update templates to use direct lookup and replacement, with SATP checks for JIT.
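
A similar sketch of the direct-mapped BHT, again with hypothetical field and function names (`bht_t`, `bht_lookup()`, `bht_update()`, `bht_max_idx()`) and an assumed `HISTORY_SIZE` of 16; the JIT-only SATP check is omitted. It shows the `(PC >> 2) & mask` index, replacement by overwriting the slot, the full scan for the maximum, and the power-of-2 static assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real HISTORY_SIZE may differ. */
#define HISTORY_SIZE 16u
_Static_assert(HISTORY_SIZE != 0 && (HISTORY_SIZE & (HISTORY_SIZE - 1)) == 0,
               "HISTORY_SIZE must be a power of two for mask indexing");

/* Hypothetical layout; target == NULL marks an empty slot. */
typedef struct {
    uint32_t pc[HISTORY_SIZE];
    uint32_t times[HISTORY_SIZE];
    void *target[HISTORY_SIZE];
} bht_t;

/* O(1): one shift and mask instead of a linear scan over all entries. */
static inline uint32_t bht_index(uint32_t pc)
{
    return (pc >> 2) & (HISTORY_SIZE - 1);
}

static void *bht_lookup(bht_t *bht, uint32_t pc)
{
    uint32_t i = bht_index(pc);
    if (bht->target[i] && bht->pc[i] == pc) {
        bht->times[i]++;
        return bht->target[i];
    }
    return NULL;
}

/* Replacement is implicit: the new entry overwrites its slot, so no
 * search for a minimum-use victim is needed. */
static void bht_update(bht_t *bht, uint32_t pc, void *target)
{
    assert(target != NULL); /* NULL is reserved for empty slots */
    uint32_t i = bht_index(pc);
    bht->pc[i] = pc;
    bht->times[i] = 0;
    bht->target[i] = target;
}

/* Empty (zero) slots can sit at any index, so scan every entry. */
static size_t bht_max_idx(const bht_t *bht)
{
    size_t max_i = 0;
    for (size_t i = 1; i < HISTORY_SIZE; i++)
        if (bht->times[i] > bht->times[max_i])
            max_i = i;
    return max_i;
}
```

With 16 entries, 0x100 and 0x140 both map to slot 0, so updating one evicts the other.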

Written for commit f89a114. Summary will update on new commits.

@jserv added this to the release-2026.1 milestone on Jan 30, 2026

@cubic-dev-ai bot left a comment


1 issue found across 5 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/riscv_private.h">

<violation number="1" location="src/riscv_private.h:251">
P3: The block_l1 cache comment lists incorrect sizes (128B/256B). With BLOCK_L1_SIZE=256, the tag array is 1KB and the pointer array is 2KB, so the comment should match the actual layout to avoid confusion during tuning or future changes.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@jserv left a comment


Benchmarks

| Benchmark suite | Current: f89a114 | Previous: a37c812 | Ratio (Previous / Current) |
|---|---|---|---|
| Dhrystone | 1656.333 DMIPS | 1584 DMIPS | 0.96 |
| CoreMark | 953.17 iterations/sec | 939.552 iterations/sec | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@jserv merged commit 4830758 into master on Jan 30, 2026
32 checks passed
@jserv deleted the block-cache branch on January 30, 2026 at 10:05