Add maybe_update_max_tokens for class SpyrePlatform by gmarinho2 · Pull Request #179 · vllm-project/vllm-spyre · GitHub

Draft. gmarinho2 wants to merge 4 commits into main.
gmarinho2
Contributor

FIX: #148

Currently, SpyrePlatform uses the default max_tokens set in the OpenAI frontend code. This PR adds a method, maybe_update_max_tokens, that considers the warmup shapes that fit the prompt and selects the one with the largest shape['new_tokens']. As a result, when max_tokens is not set in the request body, SpyrePlatform uses the largest number of new tokens that a fitting warmup shape allows.
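To illustrate the selection rule described above, here is a minimal sketch. It is not the code from this PR: the function name `pick_default_max_tokens`, the example shapes, and the `prompt_length` key are assumptions made for illustration. Only `shape['new_tokens']` is taken from the description.

```python
from typing import Optional

# Hypothetical warmup shapes. Only the 'new_tokens' key is named in the PR
# description; 'prompt_length' is an assumed key for the longest prompt a
# shape can hold.
WARMUP_SHAPES = [
    {"prompt_length": 64, "new_tokens": 20},
    {"prompt_length": 64, "new_tokens": 50},
    {"prompt_length": 128, "new_tokens": 30},
]


def pick_default_max_tokens(
    prompt_len: int,
    shapes: list[dict],
    requested_max_tokens: Optional[int] = None,
) -> Optional[int]:
    """Return a default max_tokens value for a request (illustrative only)."""
    # An explicit max_tokens in the request body is left untouched.
    if requested_max_tokens is not None:
        return requested_max_tokens

    # Keep only the shapes whose prompt capacity fits this prompt.
    fitting = [s for s in shapes if prompt_len <= s["prompt_length"]]
    if not fitting:
        # No shape fits; this sketch leaves the decision to the caller.
        return None

    # Among the fitting shapes, use the largest number of new tokens.
    return max(s["new_tokens"] for s in fitting)
```

With these example shapes, `pick_default_max_tokens(50, WARMUP_SHAPES)` returns 50, because all three shapes fit and the largest `new_tokens` is 50. A prompt of length 100 fits only the 128-token shape, so the result is 30.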

Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks; otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

gmarinho2 added 3 commits May 22, 2025 14:44
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
@wallashss
Collaborator

I think you should probably set this PR as draft and wait for feedback from the vllm community on vllm-project/vllm#18557. This change depends on that.

gmarinho2 marked this pull request as draft May 22, 2025 19:15
@maxdebayser left a comment
Collaborator

LGTM, once the upstream PR gets merged.

Development

Successfully merging this pull request may close these issues.

Incorrect default max_completion_tokens being set
3 participants







