
Fix #946: Add standard iterator traits to CharPointer_UTF8 for compat… #1562

Open
killerdevildog wants to merge 1 commit into juce-framework:master from killerdevildog:fix-issue-946

Conversation

@killerdevildog

Fixes #946: Add standard iterator traits to CharPointer_UTF8 for compatibility with std algorithms

  • Added value_type, pointer, reference, iterator_category, and difference_type typedefs
  • Enables CharPointer_UTF8 to work with standard library algorithms like std::all_of
  • Uses random_access_iterator_tag as the most appropriate category for UTF-8 character iteration (since operations like + and - are supported, even if O(n) due to decoding)
  • Maintains full backward compatibility with existing code
  • Tested with both GCC and Clang compilers, including the repro from the issue and edge cases like empty strings and multi-byte UTF-8 characters
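For readers unfamiliar with what the typedefs in the list above buy you, here is a minimal sketch. It is not the code from this PR and not JUCE's CharPointer_UTF8: `Utf8Cursor` and `allAscii` are hypothetical names invented for illustration, the decoder assumes valid UTF-8, and the category shown is `std::input_iterator_tag` (the one named in the commit message), which is an assumption about what a conservative choice might look like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <type_traits>

// Hypothetical stand-in for a UTF-8 character pointer (NOT JUCE's CharPointer_UTF8).
// Assumes the underlying bytes are valid, null-terminated UTF-8.
class Utf8Cursor
{
public:
    // The five member typedefs that std::iterator_traits looks for.
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;               // operator* returns by value
    using iterator_category = std::input_iterator_tag; // assumption: conservative category

    explicit Utf8Cursor (const char* p) : data (p) { assert (p != nullptr); }

    // Decodes the code point at the current position without moving.
    char32_t operator*() const
    {
        const auto lead = static_cast<unsigned char> (*data);
        const int length = sequenceLength (lead);

        if (length == 1)
            return lead;

        // Keep only the payload bits of the lead byte, then append 6 bits per continuation byte.
        char32_t codePoint = lead & (0xFFu >> (length + 1));

        for (int i = 1; i < length; ++i)
            codePoint = (codePoint << 6) | (static_cast<unsigned char> (data[i]) & 0x3Fu);

        return codePoint;
    }

    // Steps over one whole code point, however many bytes it occupies.
    Utf8Cursor& operator++()
    {
        data += sequenceLength (static_cast<unsigned char> (*data));
        return *this;
    }

    Utf8Cursor operator++ (int)
    {
        Utf8Cursor previous (*this);
        ++*this;
        return previous;
    }

    bool operator== (const Utf8Cursor& other) const { return data == other.data; }
    bool operator!= (const Utf8Cursor& other) const { return data != other.data; }

private:
    // Number of bytes in the sequence, derived from the lead byte's high bits.
    static int sequenceLength (unsigned char lead)
    {
        if (lead < 0x80)        return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        return 4;
    }

    const char* data;
};

// With the typedefs in place, std::iterator_traits can see the category.
static_assert (std::is_same<std::iterator_traits<Utf8Cursor>::iterator_category,
                            std::input_iterator_tag>::value,
               "iterator_traits should pick up the member typedefs");

// Hypothetical helper: true if every code point in the string is plain ASCII.
inline bool allAscii (const char* text)
{
    const Utf8Cursor first (text);
    const Utf8Cursor last (text + std::strlen (text));
    return std::all_of (first, last, [] (char32_t c) { return c < 0x80; });
}
```

Under these assumptions, `allAscii ("hello")` holds, while a string containing ë (the bytes `\xC3\xAB`) does not pass. Whether the real class should advertise a stronger category is exactly the question raised in the review further down.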

Thank you for submitting a pull request.

Please make sure you have read and followed our contribution guidelines (.github/contributing.md in this repository). Your pull request will not be accepted if you have not followed the instructions.

As a side note, I think that the policy described in CONTRIBUTING.md—where PRs are not merged directly into public branches but reproduced in the private repository—could potentially undervalue contributors by not preserving their commit history, authorship, or direct recognition in the public repo. While I understand the need for controlled integration in a hybrid open-source/commercial project, this approach might feel a bit unethical or discouraging to community members who invest time in fixes, as it limits visibility of their contributions (e.g., no GitHub credit for merges). Perhaps considering ways to attribute reproduced changes publicly, like mentioning contributors in changelogs or commit messages, could help maintain motivation and fairness. I'd be interested in the team's thoughts on this if possible!

…TF8 for compatibility with std algorithms

- Added value_type, pointer, reference, iterator_category, and difference_type typedefs
- Enables CharPointer_UTF8 to work with standard library algorithms like std::all_of
- Uses input_iterator_tag as the most appropriate category for UTF-8 character iteration
- Maintains full backward compatibility with existing code
- Tested with both GCC and Clang compilers
@jrlanglois

jrlanglois commented Dec 29, 2025

This adds iterator traits to CharPointer_UTF8 only. If a user of JUCE were to change JUCE_STRING_UTF_TYPE to 16 or 32, these changes as-is would break the build for them (if they were to try and use STL algos). Also, iterator semantics will differ: UTF-16 is variable-width for some characters, and UTF-32 is fixed-width.

Another practical concern is the use of random_access_iterator_tag. UTF-8 is variable-width, so character positions don’t map 1:1 to byte offsets outside pure ASCII. For example, a single visible character like ë or an emoji occupies multiple bytes, so it + n or indexing can’t provide constant-time character access in the usual STL sense.

Anyway, this is why getAndAdvance exists.
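The variable-width concern in the comment above can be illustrated with a short sketch. This is not JUCE code: `isContinuationByte`, `countCodePoints`, and `advanceCodePoints` are hypothetical helpers written for this example, and they assume valid, null-terminated UTF-8. The point is only that the equivalent of `it + n` has to walk the bytes one code point at a time.

```cpp
#include <cassert>
#include <cstddef>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool isContinuationByte (char c)
{
    return (static_cast<unsigned char> (c) & 0xC0) == 0x80;
}

// Hypothetical helper: counts code points by counting the bytes that start a sequence.
inline std::size_t countCodePoints (const char* text)
{
    assert (text != nullptr);

    std::size_t count = 0;

    for (; *text != '\0'; ++text)
        if (! isContinuationByte (*text))
            ++count;

    return count;
}

// Hypothetical helper: the equivalent of `it + n` for code points.
// There is no way to jump straight to the nth character, so the cost is
// proportional to the number of bytes skipped. Stops early at the terminator.
inline const char* advanceCodePoints (const char* text, std::size_t n)
{
    assert (text != nullptr);

    while (n > 0 && *text != '\0')
    {
        ++text;                              // step past the lead byte

        while (isContinuationByte (*text))   // then past any continuation bytes
            ++text;

        --n;
    }

    return text;
}
```

For instance, ë encoded as `\xC3\xAB` is two bytes but one code point, and advancing two code points through `"a\xC3\xAB" "b"` lands three bytes in. A sequential decode-and-step loop, which is the style the reviewer points to with getAndAdvance, does the same walk without promising random access.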


Development

Successfully merging this pull request may close these issues.

Some custom iterators are incompatible with standard algorithms

2 participants