Fixed and Improved Gitlab Project Metadata in-memory cache #4727
Conversation
```diff
@@ -1022,77 +1028,6 @@ func (s *Source) WithScanOptions(scanOptions *git.ScanOptions) {
 	s.scanOptions = scanOptions
 }
```
These funcs were moved to the end of the file.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```go
return &projectMetadataCache{
	cache: expirable.NewLRU[string, *project](
		15000,          // up to 15,000 entries
		nil,
		60*time.Minute, // time-based expiration - 1 hour
	),
}
```
I'm interested to know about the thought process that went into deciding these numbers. Is that based on our past experience about the rate at which we scan gitlab projects?
Can there be a possibility of an entry getting expired before we might want to use it?
There isn’t any deep science behind these numbers. They’re rough, initial choices. The 15K limit is something most organizations won’t hit at all. For organizations with more than 15K projects, I think we should be able to process roughly 15K repositories per hour. In practice, it’s very unlikely that a repository would be enumerated and not scanned within an hour. We need to start with baseline numbers, and if we run into issues, we can always adjust them based on observed behavior.
If someone has a strong alternative (though I don’t think we do) for how many repositories we should process per hour, we can start with that number instead.
Using an expirable LRU cache so that entries are automatically cleaned up after their TTL, via lazy deletion and a background cleanup routine. Additionally, if the cache reaches the 15K-entry limit, it evicts the least recently used entries by design, so new inserts are never blocked.
I see. Thanks for the explanation 👍
mustansir14 left a comment
I have some questions
Description:
Earlier, we used a map to temporarily store GitLab project metadata. While maps work well for small datasets, they don’t scale efficiently for larger ones. There was also a bug in the caching logic: when storing entries, we used the GitLab HTTPURLToRepo field as the cache key, but when retrieving entries, we used the normalized URL. As a result, cache lookups almost never succeeded, and the cache kept growing without being effectively used.
With this fix, we’ve replaced the map with an LRU cache, which is better suited for this use case. The cache now stores up to 15,000 entries for one hour, after which the LRU mechanism automatically evicts old items, keeping memory usage under control. We also consistently use the normalized URL for both setting and fetching cache entries.
I also added some comments to improve the readability :)
Checklist:
- [ ] Tests passing (`make test-community`)?
- [ ] Lint passing (`make lint`; this requires golangci-lint)?