
[Question]: How to use LabelledRagDataset with RetrieverEvaluator or BatchEvalRunner #10445

Open · 1 task done
austinmw opened this issue Feb 4, 2024 · 5 comments
Labels: question (Further information is requested)

Comments

austinmw (Contributor) commented Feb 4, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hi, I have a LabelledRagDataset created with RagDatasetGenerator. Now how can I use RetrieverEvaluator or BatchEvalRunner with this? Are they compatible?

The documentation only mentions using it with a RagEvaluatorPack, which is not customizable enough for my needs. My goal is to measure Hit Rate, MRR, Context Relevance, and Faithfulness using Bedrock LLMs.

austinmw added the question (Further information is requested) label on Feb 4, 2024

dosubot bot commented Feb 4, 2024

🤖

Based on the information provided in the context, it seems that the LabelledRagDataset can be used with RagEvaluatorPack as shown in the code snippet below. However, RetrieverEvaluator and BatchEvalRunner are not mentioned in the context, so it's unclear how they can be used with a LabelledRagDataset.

As for customizing the LLM, embedding model, and prompts, and adding hit rate and MRR metrics, the context does not provide enough information to give a complete answer. It's possible that these customizations would require modifications to the RagEvaluatorPack or the creation of a custom evaluator.

Here is the code snippet from the context that shows how to use LabelledRagDataset with RagEvaluatorPack:

from llama_index.llama_pack import download_llama_pack

# download the pack's source into ./pack and import the pack class
RagEvaluatorPack = download_llama_pack("RagEvaluatorPack", "./pack")

rag_evaluator = RagEvaluatorPack(
    query_engine=query_engine,  # built with the same source Documents as the rag_dataset
    rag_dataset=rag_dataset,
)
benchmark_df = await rag_evaluator.run()  # DataFrame of mean scores per evaluation measure

The query_engine should be built with the same source Documents as the rag_dataset. The benchmark_df contains the mean scores for evaluation measures such as Correctness, Relevancy, Faithfulness and Context Similarity.
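
For illustration, building that query engine from the same source documents could look roughly like this (a minimal sketch in the same pre-0.10 import style as the snippet above; the "./data" directory is just a placeholder):

from llama_index import SimpleDirectoryReader, VectorStoreIndex

# load the same source Documents that were used to generate rag_dataset
documents = SimpleDirectoryReader("./data").load_data()

# index them and expose the index as a query engine for evaluation
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()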

You can find more information in the evaluating with LlamaDatasets guide.

If you could provide more details about your use case or the specific customizations you're trying to make, I might be able to give a more precise answer.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

austinmw (Contributor, Author) commented Feb 5, 2024

It looks like:

  • BaseRetrievalEvaluator.aevaluate_dataset expects a EmbeddingQAFinetuneDataset dataset object
  • BatchEvalRunner.aevaluate_responses expects both queries and responses lists
  • RagEvaluatorPack expects a LabelledRagDataset

So we currently have three evaluation mechanisms that require three different dataset formats. Since LabelledRagDataset seems to be the newest abstraction, could it serve as the single dataset format across all of these, so that multiple formats of essentially the same information are no longer needed?
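
For concreteness, bridging the formats by hand currently looks something like the rough sketch below (using the pre-0.10 import paths from the snippet above, and assuming a query_engine built from the same source documents; the Bedrock model id, the per-context doc-id scheme, and the default embedding model are only illustrative assumptions, not a confirmed recipe):

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    RetrieverEvaluator,
)
from llama_index.finetuning import EmbeddingQAFinetuneDataset
from llama_index.llms import Bedrock
from llama_index.schema import TextNode

# 1) Hit Rate / MRR: rebuild an EmbeddingQAFinetuneDataset from the LabelledRagDataset,
#    treating each example's reference_contexts as that query's relevant "documents".
queries, corpus, relevant_docs = {}, {}, {}
for i, example in enumerate(rag_dataset.examples):
    query_id = f"q{i}"
    queries[query_id] = example.query
    doc_ids = []
    for j, ctx in enumerate(example.reference_contexts or []):
        doc_id = f"q{i}_ctx{j}"  # illustrative id scheme
        corpus[doc_id] = ctx
        doc_ids.append(doc_id)
    relevant_docs[query_id] = doc_ids

qa_dataset = EmbeddingQAFinetuneDataset(
    queries=queries, corpus=corpus, relevant_docs=relevant_docs
)

# hit_rate / mrr compare retrieved node ids against relevant_docs, so the retriever
# has to come from an index whose node ids match the corpus ids built above
nodes = [TextNode(text=text, id_=doc_id) for doc_id, text in corpus.items()]
retriever = VectorStoreIndex(nodes).as_retriever(similarity_top_k=5)
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever
)
retrieval_results = await retriever_evaluator.aevaluate_dataset(qa_dataset)

# 2) Faithfulness / Relevancy: run the same queries through BatchEvalRunner with a Bedrock LLM.
service_context = ServiceContext.from_defaults(
    llm=Bedrock(model="anthropic.claude-v2")  # assumed model id
)
runner = BatchEvalRunner(
    {
        "faithfulness": FaithfulnessEvaluator(service_context=service_context),
        "relevancy": RelevancyEvaluator(service_context=service_context),
    },
    workers=4,
)
eval_results = await runner.aevaluate_queries(
    query_engine, queries=[ex.query for ex in rag_dataset.examples]
)

It works, but it means maintaining two parallel representations of the same labelled data, which is exactly the duplication I'd like to avoid.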

austinmw (Contributor, Author) commented Feb 8, 2024

Bump 😅

dosubot bot added the stale (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) label on May 9, 2024

dosubot bot commented May 9, 2024

Hi, @austinmw,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you were inquiring about using a LabelledRagDataset with RetrieverEvaluator or BatchEvalRunner and whether they are compatible. There was a discussion about the documentation mentioning the use of LabelledRagDataset with RagEvaluatorPack and the need for more customization. It was suggested that customizations might require modifications to the RagEvaluatorPack or the creation of a custom evaluator. You later pointed out that there are currently three different evaluation mechanisms that require three different dataset formats and suggested that LabelledRagDataset could be used as a single dataset format across all of these.

Is this issue still relevant to the latest version of the LlamaIndex repository? If so, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contributions to the LlamaIndex project.

Dosu

austinmw (Contributor, Author) commented May 9, 2024

not stale.

dosubot bot removed the stale label on May 9, 2024