Conversation
|
hi @ProgrammerIn-wonderland, could you help review this PR? thank you 🙏
|
This looks fine, but I want to make sure there aren't weird effects with max_tokens first, since max_tokens means something completely different for Gemini (maximum output tokens) than it does for the OpenAI models (max tokens in the whole conversation). I think this change has the potential to cause issues.
|
noted. btw, for the gemini 3 flash, i followed the number in the docs, which shows 65,536
okay, i'm not sure how the internals work for this yet, but just bringing it to your attention 🫡
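to illustrate, a rough sketch of how the entry might look if the two limits were kept separate (field names are just illustrative here, not our actual schema; the 65,536 is the max-output-tokens figure from Google's docs, and the context window value is a placeholder):

```python
# hypothetical model-config entry; field names are illustrative only
GEMINI_3_FLASH = {
    "model": "gemini-3-flash",
    # max tokens the model may generate in a single response
    # (the 65,536 figure from Google's docs)
    "max_output_tokens": 65_536,
    # total context window (prompt + response), a separate limit for Gemini;
    # placeholder value, check Google's docs for the real number
    "context_window": 1_000_000,
}
```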
|
so I think these should be fine being different from the OpenAI ones, but I won't merge in case @ProgrammerIn-wonderland finds something off with it
|
Yeah, the issue here is still that we associate max_tokens with context length internally and do math based on that; we probably need a secondary, alternate calculation to properly follow Gemini's semantics.
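roughly what I mean, as a sketch (the helper name and style tags are hypothetical, not our actual internals):

```python
def available_output_tokens(style: str, max_tokens: int, prompt_tokens: int) -> int:
    # openai-style (per the comment above): max_tokens is treated as the
    # total conversation budget, so output room shrinks as the prompt grows
    if style == "openai":
        return max(0, max_tokens - prompt_tokens)
    # gemini-style: max_tokens is already the output cap, independent of
    # prompt size (the context window is a separate limit)
    if style == "gemini":
        return max_tokens
    raise ValueError(f"unknown style: {style}")
```

e.g. with a 100k-token prompt, `available_output_tokens("openai", 128_000, 100_000)` leaves 28,000, while `available_output_tokens("gemini", 65_536, 100_000)` stays at 65,536, so reusing the openai math for Gemini would under- or over-count.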
for some reason the numbers we have are inaccurate compared to Google's own docs
i found these while adding the gemini 3 flash
please help cross-check and review this
although nothing is wrong/broken yet with these models?
they still work, but idk what the implications are of having this incorrect information
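a throwaway sketch of the kind of cross-check i mean (every number below is a placeholder to fill in from Google's docs, not a verified value):

```python
# compare the repo's configured limits against the values in Google's docs
configured = {
    "gemini-3-flash": {"max_output_tokens": 65_536},  # from this PR
}
documented = {
    "gemini-3-flash": {"max_output_tokens": 65_536},  # fill in from the docs
}
for name, fields in documented.items():
    for field, doc_value in fields.items():
        ours = configured.get(name, {}).get(field)
        status = "OK" if ours == doc_value else f"MISMATCH (ours={ours})"
        print(f"{name}.{field}: docs={doc_value} -> {status}")
```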