[Feature Request]: Add additional metrics to METRIC_REGISTRY for Retrieval and support for different K-s #11030
Comments
🤖 Thank you for your detailed questions. Let's address them one by one:
To include these additional metrics, new classes would need to be created for each metric, similar to the existing `HitRate` and `MRR` classes. I hope this answers your questions. If you have any further queries, feel free to ask.
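For concreteness, here is a rough sketch of what such a class could look like — a hypothetical `PrecisionAtK`, not the library's actual implementation, assuming the `BaseRetrievalMetric` / `RetrievalMetricResult` interface used by the existing metrics in `metrics_base.py`:

```python
# Hypothetical sketch of a Precision@K metric, modeled on the existing
# HitRate/MRR classes. Assumes the BaseRetrievalMetric interface in
# llama_index/core/evaluation/retrieval/metrics_base.py.
from typing import Any, List, Optional

from llama_index.core.evaluation.retrieval.metrics_base import (
    BaseRetrievalMetric,
    RetrievalMetricResult,
)


class PrecisionAtK(BaseRetrievalMetric):
    """Fraction of the top-K retrieved ids that appear in the expected ids."""

    metric_name: str = "precision_at_k"
    k: int = 10  # cutoff for the top-K window; illustrative default

    def compute(
        self,
        query: Optional[str] = None,
        expected_ids: Optional[List[str]] = None,
        retrieved_ids: Optional[List[str]] = None,
        expected_texts: Optional[List[str]] = None,
        retrieved_texts: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> RetrievalMetricResult:
        if not expected_ids or not retrieved_ids:
            raise ValueError("Both expected_ids and retrieved_ids are required")
        expected = set(expected_ids)
        top_k = retrieved_ids[: self.k]
        hits = sum(1 for doc_id in top_k if doc_id in expected)
        return RetrievalMetricResult(score=hits / len(top_k))
```

Registering it, e.g. `METRIC_REGISTRY["precision_at_k"] = PrecisionAtK`, would then make it resolvable by name.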
It is a pity that there are no explicit plans for the integration of further metrics like MAP@K or NDCG@K, as I now have to implement these metrics locally myself, although I would like to use the built-in `RetrievalEvaluator`.
EDIT: @hkristof03 If it is suitable, you could use the …
@aoezdTchibo I solved it by getting the …
@hkristof03 Thanks for the link to Torchmetrics! I hadn't heard of it before and it looks very promising.
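In case it helps others landing here, a minimal, self-contained sketch of computing MAP@K and NDCG@K locally with Torchmetrics (all tensors below are made-up illustrative data; `indexes` assigns each row to a query):

```python
import torch
from torchmetrics.retrieval import RetrievalMAP, RetrievalNormalizedDCG

# Two queries (ids 0 and 1), three retrieved documents each.
indexes = torch.tensor([0, 0, 0, 1, 1, 1])  # which query each row belongs to
preds = torch.tensor([0.9, 0.7, 0.3, 0.8, 0.4, 0.2])  # retrieval scores
target = torch.tensor([True, False, True, False, True, False])  # relevance

map_at_2 = RetrievalMAP(top_k=2)
ndcg_at_2 = RetrievalNormalizedDCG(top_k=2)

print(map_at_2(preds, target, indexes=indexes))   # MAP@2, averaged over queries
print(ndcg_at_2(preds, target, indexes=indexes))  # NDCG@2, averaged over queries
```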
Hey @hkristof03, quick note in case you are still interested: my PR was recently merged, adding an MRR@K and HitRate@K option through a "use_granular_..." attribute, so we now have more native flexibility for our evals. Hopefully you find it valuable 💪
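For reference, metric names resolve through the registry via `RetrieverEvaluator.from_metric_names`; a minimal usage sketch (the document text, query, and expected ids are illustrative, and building the index assumes a configured embedding model):

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator

# Tiny illustrative index; a real evaluation would use your own corpus.
index = VectorStoreIndex.from_documents([Document(text="DNA is a molecule.")])
retriever = index.as_retriever(similarity_top_k=2)

evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever
)
result = evaluator.evaluate(
    query="What is DNA?",
    expected_ids=["..."],  # node ids of the expected (relevant) chunks
)
print(result.metric_vals_dict)
```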
Feature Description
Hi, thanks for this awesome library.
I found a few inconsistencies (at least from my perspective) in the evaluation across modules.
The `RetrievalEvaluator` only supports Hit Rate and MRR (and Cohere, which is paid):

llama_index/llama-index-core/llama_index/core/evaluation/retrieval/evaluator.py, line 19 in ccc0b85
llama_index/llama-index-core/llama_index/core/evaluation/retrieval/metrics.py, line 130 in ccc0b85
At the same time, `BeirEvaluator` supports NDCG, MAP, Recall and Precision at different K-s (see the usage sketch below):

llama_index/llama-index-core/llama_index/core/evaluation/benchmarks/beir.py, line 100 in ccc0b85
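For comparison, a `BeirEvaluator` run looks roughly like this, mirroring the BEIR benchmark example in the docs (dataset name and k values are illustrative, and the retriever setup assumes a configured embedding model):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.evaluation.benchmarks import BeirEvaluator


def create_retriever(documents):
    # Build an in-memory vector index over the BEIR corpus
    # and return a top-30 retriever for it.
    index = VectorStoreIndex.from_documents(documents)
    return index.as_retriever(similarity_top_k=30)


# Reports NDCG@K, MAP@K, Recall@K and Precision@K for each requested k.
BeirEvaluator().run(
    create_retriever,
    datasets=["nfcorpus"],
    metrics_k_values=[3, 10, 30],
)
```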
That said, BEIR itself supports additional metrics as well:
https://github.com/beir-cellar/beir/blob/f062f038c4bfd19a8ca942a9910b1e0d218759d4/beir/retrieval/evaluation.py#L94
So my questions are:

1. For the `RetrievalEvaluator`, why do you only support MRR and Hit Rate, and not NDCG, MAP, Recall and Precision at different K-s?
2. Do you plan to add these additional metrics to the `METRIC_REGISTRY`?

Thanks!
Reason
Only basic metrics are supported for retrieval, independent of the number K of retrieved documents.
Value of Feature
Additional metrics would give greater insight into retrieval and ranking quality.