Skip to content

Conversation

@benITo47
Copy link
Contributor

@benITo47 benITo47 commented Feb 3, 2026

Description

This PR migrates us from tokenisers-cpp to PyTorch tokenisers bundled with executorch

Introduces a breaking change?

  • Yes
  • No - User faces no changes

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

This changes need to be tested manually. Try running all our apps that consume tokenizers and see whether the output is ok.
By "OK" I mean

  • no unwanted special tokens
  • response that is related to the question (if model start answering about aquariums after being asked about ketchup, then it's not ok)
  • Output has proper punctuation, no missing or added spaces.

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

@benITo47 benITo47 requested review from chmjkb and msluszniak and removed request for chmjkb February 3, 2026 22:04
@benITo47 benITo47 force-pushed the @bo/change_tokenizers branch from e896b8e to cc00c58 Compare February 3, 2026 22:07
@benITo47 benITo47 changed the title Migrate to PyTorch tokenisers Migrate to PyTorch tokenizers Feb 3, 2026
@msluszniak msluszniak added executorch Issues and tasks related specifically to ExecuTorch chore PRs that are chores labels Feb 4, 2026
Copy link
Collaborator

@chmjkb chmjkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix tests b4 merging 👍🏻

Copy link
Collaborator

@chmjkb chmjkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the Error.get() calls

Comment on lines +356 to +358
inline jsi::Value getJsiValue(bool val, jsi::Runtime &runtime) {
return jsi::Value(val);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be above the template and below

inline jsi::Value getJsiValue(int val, jsi::Runtime &runtime) {
  return {runtime, val};
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell the reason for that, aesthetic one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for msluszniak.
Could you elaborate?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aesthetic one

if (result.ok()) {
return static_cast<uint64_t>(result.get()); // TODO: CHANGE RETURN TYPE
}
return -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like above, maybe reject promise on js side

@benITo47 benITo47 force-pushed the @bo/change_tokenizers branch from 752f379 to 8aaf940 Compare February 4, 2026 14:15
Copy link
Collaborator

@chmjkb chmjkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good on my end, thank u 🙏🏻

@msluszniak
Copy link
Member

@chmjkb I guess you wanted to approve, not add another review ;)

@benITo47 benITo47 merged commit b4770df into main Feb 4, 2026
4 checks passed
@benITo47 benITo47 deleted the @bo/change_tokenizers branch February 4, 2026 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore PRs that are chores executorch Issues and tasks related specifically to ExecuTorch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants