[MEDI] Extend IngestionChunk with TokenCount property #7263

@adamsitnik

Description

  1. Extend the IngestionChunk type that can be found here with a public read-only property that returns the number of tokens used to represent the given chunk:
     public int TokenCount { get; }
  2. Extend the existing public constructor of this type with a mandatory int tokenCount parameter. Throw ArgumentOutOfRangeException when the value is negative.
  3. Update all the chunkers that we ship in the Microsoft.Extensions.DataIngestion project to provide this value at creation time.
  4. Update all the tests in the Microsoft.Extensions.DataIngestion.Tests project to provide this value as well.
  5. Add new tests or extend existing ones with explicit token count verification.
  6. Ensure there are no build errors.
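
The property and constructor changes requested above could look roughly like the sketch below. It is illustrative only: IngestionChunkSketch is a hypothetical, simplified stand-in and not the real IngestionChunk type, whose other members and constructor parameters are not shown in this issue. The string content parameter and the exception message are assumptions as well.

```csharp
using System;

// Hypothetical, simplified stand-in for IngestionChunk (assumption: the real
// type has more members and a different constructor signature).
public sealed class IngestionChunkSketch
{
    public IngestionChunkSketch(string content, int tokenCount)
    {
        if (content is null)
        {
            throw new ArgumentNullException(nameof(content));
        }

        // The issue asks for ArgumentOutOfRangeException on negative values;
        // zero is accepted because only negative counts are called out.
        if (tokenCount < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(tokenCount), tokenCount, "Token count must not be negative.");
        }

        Content = content;
        TokenCount = tokenCount;
    }

    // Assumed content property, present only to make the sketch self-contained.
    public string Content { get; }

    // Number of tokens used to represent this chunk; get-only, set at construction.
    public int TokenCount { get; }
}
```

Under these assumptions, new IngestionChunkSketch("some text", 3).TokenCount returns 3, while passing -1 throws ArgumentOutOfRangeException with ParamName set to "tokenCount". A test with explicit token count verification would assert both behaviors.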

This is the first step toward implementing #6971.
