Add support for non-AWS models - OpenAI + Gemini #206
madhurprash wants to merge 24 commits into main
Conversation
aarora79 left a comment:

I think we also have to make a change in the metrics calculation notebook, because it also does some pricing-related calculations.
docs/benchmarking_non_aws_models.md (Outdated)

```
@@ -0,0 +1,27 @@
# Benchmark non AWS models on FMBench
```

Since this is specific to OpenAI and Gemini, I would mention that directly instead of saying "non AWS", because any 3P or open-source model is non AWS in that sense. Change to "Benchmark OpenAI and Gemini models".
docs/benchmarking_non_aws_models.md (Outdated)

```
@@ -0,0 +1,27 @@
# Benchmark non AWS models on FMBench

This feature enables users to benchmark non AWS models on FMBench, such as OpenAI and Gemini models. Current models that are tested with this feature are: `gpt-4o`, `gpt-4o-mini`, `gemini-1.5-pro` and `gemini-1.5-flash`.
```

"...users to benchmark non AWS..." -> "users to benchmark external models such as OpenAI and Gemini models on FMBench"
docs/benchmarking_non_aws_models.md (Outdated)

```
### Prerequisites

To benchmark a non AWS model, the configuration file requires an **API Key**. Mention your custom API key within the `inference_spec` section in the `experiments` within the configuration file. View an example below:
```

"API Key" -> "a model provider provided API key (such as an OpenAI key or a Gemini key)"

We should not configure the key directly, but rather the path to the API key. This should be handled in the same way as we handle hf_token.txt to read the HF token.
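A minimal sketch of what this comment describes, assuming a hypothetical helper name `read_api_key`; the config would carry a path to a key file (mirroring the existing hf_token.txt handling) rather than the key itself:

```python
from pathlib import Path

def read_api_key(key_file_path: str) -> str:
    # Hypothetical helper, not FMBench's actual API: the config points at a
    # key file (like hf_token.txt does for the HF token) instead of embedding
    # the key itself; we read the file and strip surrounding whitespace.
    return Path(key_file_path).read_text().strip()
```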
```yaml
read_bucket: {read_bucket}
scripts_prefix: scripts ## add your own scripts in case you are using anything that is not on jumpstart
script_files:
- hf_token.txt ## add your script files you have in s3 (including inference files, serving stacks, if any)
```

I would add the paths to openai_key.txt and gemini_key.txt to this list (see the sketch below).
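For illustration, the list might then look like this (the key file names are the ones suggested in this review, not confirmed FMBench conventions):

```yaml
script_files:
- hf_token.txt     ## HF token, as before
- openai_key.txt   ## assumed name for the OpenAI key file
- gemini_key.txt   ## assumed name for the Gemini key file
```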
```yaml
  max_length_in_tokens: 6000
  payload_file: payload_en_5000-6000.jsonl
- language: en
  min_length_in_tokens: 305

metrics:
  dataset_of_interest: en_500-1000 # en_5000-6000
```
```yaml
inference_script: external_predictor.py
inference_spec:
  split_input_and_parameters: no
  api_key: <your-api-key>
```

Remove this parameter. If an external_predictor is being used, it should automatically check whether openai_key.txt or gemini_key.txt is present and, if so, set it into env vars (see the sketch below); we do not need to have this parameter here.
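A hedged sketch of that behavior, assuming the key files live in the scripts directory (file names and location are assumptions from this review) and using the `OPENAI_API_KEY`/`GEMINI_API_KEY` environment variables that LiteLLM reads:

```python
import os
from pathlib import Path

# Assumed mapping from provider key files to the env vars LiteLLM reads;
# the file names and their location are assumptions, not FMBench's actual layout.
KEY_FILE_TO_ENV_VAR = {
    "openai_key.txt": "OPENAI_API_KEY",
    "gemini_key.txt": "GEMINI_API_KEY",
}

def set_provider_api_keys(scripts_dir: str) -> None:
    """Export any provider API keys found on disk as environment variables."""
    for file_name, env_var in KEY_FILE_TO_ENV_VAR.items():
        key_path = Path(scripts_dir) / file_name
        if key_path.is_file():
            os.environ[env_var] = key_path.read_text().strip()
```

With something like this in place, `api_key` could be dropped from `inference_spec` entirely.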
src/fmbench/globals.py (Outdated)

```python
# NOTE: if tokenizer files are provided in the tokenizer directory then they take precedence;
# if the files are not present then we load the tokenizer for this model id from Hugging Face
if config['experiments'][0].get('model_id', None) is not None:
    TOKENIZER_MODEL_ID = config['experiments'][0]['model_id']
```
```python
# The inference format for each option (OpenAI/Gemini) is the same using LiteLLM
# for streaming/non-streaming.
# Set the environment for the specific model.
if 'gemini' in self.endpoint_name:
```

We should just do this based on the presence of a file and not rely on the endpoint name.
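A sketch of that file-presence approach, reusing the assumed key file names from the earlier comments:

```python
from pathlib import Path
from typing import Optional

def detect_provider(scripts_dir: str) -> Optional[str]:
    # Decide the provider from which key file exists rather than from
    # substrings in the endpoint name (file names are assumptions).
    if (Path(scripts_dir) / "gemini_key.txt").is_file():
        return "gemini"
    if (Path(scripts_dir) / "openai_key.txt").is_file():
        return "openai"
    return None
```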
Force-pushed from 90950d4 to daf7a2b
Force-pushed from 1a00eb6 to 542179b
Force-pushed from aa98452 to eefa311
This PR contains the following: