Some code is missing

The code for processing sharegpt in the script prepare_train_data.sh uses sharegpt conversation splitter **(split_sharegpt_conversations.py)** , but there is no such code in the corresponding directory. Where can I find it? As shown below.
```
echo "Downloading ShareGPT dataset..."
wget -nc -P data/raw_train/sharegpt/ https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/HTML_cleaned_raw_dataset/sg_90k_part1_html_cleaned.json
wget -nc -P data/raw_train/sharegpt/ https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/HTML_cleaned_raw_dataset/sg_90k_part2_html_cleaned.json
echo "Splitting the ShareGPT dataset with 2048 max tokens per conversation..."
python split_sharegpt_conversations.py \
    --in-files data/raw_train/sharegpt/sg_90k_part1_html_cleaned.json data/raw_train/sharegpt/sg_90k_part2_html_cleaned.json \
    --out-file data/raw_train/sharegpt/sharegpt_html_cleaned_and_split_2048.json \
    --model-name-or-path /data3/MODELS/llama-7b-hf \
    --max-length 2048
echo "Splitting the ShareGPT dataset with 4096 max tokens per conversation..."
python split_sharegpt_conversations.py \
    --in-files data/raw_train/sharegpt/sg_90k_part1_html_cleaned.json data/raw_train/sharegpt/sg_90k_part2_html_cleaned.json \
    --out-file data/raw_train/sharegpt/sharegpt_html_cleaned_and_split_4096.json \
    --model-name-or-path /data3/MODELS/llama-7b-hf \
    --max-length 4096
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some code is missing #1

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Some code is missing #1

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions