What are the details of the data used for the reward model OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1? #3686

REIGN12 · 2023-09-13T02:53:34Z

Many thanks for your great open-source effort! I am new to this field and particularly interested in the data used to train the reward model. I noticed that there is a simple dataset config for it, but I am a little confused about the details.

  datasets:
    - oasst_export:
        lang: "en,es,de,fr"
        input_file_path: 2023-03-27_oasst_research_ready_synth.jsonl.gz
        val_split: 0.1
    - anthropic_rlhf:
        fraction: 0.1
        max_val_set: 1000
    - shp:
        max_val_set: 1000
    - hellaswag:
        fraction: 0.5
        max_val_set: 1000
    - webgpt:
        val_split: 0.05
        max_val_set: 1000
    - hf_summary_pairs:
        fraction: 0.1
        max_val_set: 250

How can hellaswag be used as a comparison dataset? Each example seems to have multiple choices (rather than 2).
Is there any experimental evidence supporting the fraction settings currently in use?
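
To make the hellaswag question concrete, here is a minimal sketch of one way a multiple-choice item could be turned into pairwise comparisons: treat the labeled ending as "chosen" and pair it with each remaining ending as "rejected". This is only my guess at an approach, not something confirmed from the repo. The function name `multiple_choice_to_pairs` is hypothetical, the field names `ctx`, `endings`, and `label` are assumed from the public HellaSwag dataset, and the example item is made up.

```python
from typing import List, Sequence, Tuple


def multiple_choice_to_pairs(
    context: str, endings: Sequence[str], label: str
) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: expand one multiple-choice item into
    (prompt, chosen, rejected) triples for pairwise reward training.

    Assumes `label` is the index of the correct ending (stored as a
    string in the public dataset, hence the int() conversion).
    """
    correct = int(label)
    if not 0 <= correct < len(endings):
        raise ValueError(f"label {label!r} is out of range for {len(endings)} endings")
    chosen = endings[correct]
    # One comparison per incorrect ending: the correct ending is always preferred.
    return [
        (context, chosen, ending)
        for i, ending in enumerate(endings)
        if i != correct
    ]


# Made-up item in the assumed HellaSwag layout (not real dataset content).
item = {
    "ctx": "She picks up the guitar and",
    "endings": [
        "eats it for breakfast.",
        "starts tuning the strings.",
        "throws it into the ocean and swims after it.",
        "turns into a bird.",
    ],
    "label": "1",
}

pairs = multiple_choice_to_pairs(item["ctx"], item["endings"], item["label"])
for prompt, chosen, rejected in pairs:
    print(f"{prompt} | chosen: {chosen} | rejected: {rejected}")
```

With four endings this gives three comparisons per item, all sharing the same chosen ending. Whether the actual training code builds pairs this way or uses a ranked list instead is exactly what I would like to confirm.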

Many thanks for any responses in advance!
