
[Feature Request]: Add additional metrics to METRIC_REGISTRY for Retrieval and support for different K-s #11030

Open
hkristof03 opened this issue Feb 20, 2024 · 5 comments
Labels: enhancement (New feature or request), triage (Issue needs to be triaged/prioritized)

Comments

@hkristof03
Contributor

Feature Description

Hi, thanks for this awesome library.

I found a few things that seem inconsistent (at least to me) in the evaluation across modules.

The RetrievalEvaluator only supports Hit Rate and MRR (and Cohere rerank relevancy, which is a paid service).

class RetrieverEvaluator(BaseRetrievalEvaluator):

METRIC_REGISTRY: Dict[str, Type[BaseRetrievalMetric]] = {
    "hit_rate": HitRate,
    "mrr": MRR,
    "cohere_rerank_relevancy": CohereRerankRelevancyMetric,
}

At the same time, BeirEvaluator supports NDCG, MAP, Recall, and Precision at different K values.

BEIR itself supports additional metrics as well:

https://github.com/beir-cellar/beir/blob/f062f038c4bfd19a8ca942a9910b1e0d218759d4/beir/retrieval/evaluation.py#L94

So my questions are:

  1. For the RetrievalEvaluator, why do you only support MRR and Hit Rate, and not NDCG, MAP, Recall, and Precision at different K values?
  2. For BEIR, why don't you support the additional metrics?
  3. Are you planning to add these metrics to the METRIC_REGISTRY?

Thanks!

Reason

Only basic metrics are supported for retrieval, independent of the number K of retrieved documents.

Value of Feature

Additional metrics would give greater insight into retrieval and ranking quality.

@hkristof03 added the enhancement (New feature or request) and triage (Issue needs to be triaged/prioritized) labels on Feb 20, 2024

dosubot bot commented Feb 20, 2024

🤖

Thank you for your detailed questions. Let's address them one by one:

  1. The RetrievalEvaluator currently supports MRR (Mean Reciprocal Rank) and Hit Rate because these are the metrics that have been implemented in the METRIC_REGISTRY. The METRIC_REGISTRY is a dictionary that maps metric names to their corresponding classes, and the resolve_metrics function uses this registry to create instances of the metrics for evaluation. Additional metrics such as NDCG (Normalized Discounted Cumulative Gain), MAP (Mean Average Precision), Recall, and Precision at different K values are not currently supported simply because they have not been implemented in the METRIC_REGISTRY.

  2. The BEIR evaluator supports a different set of metrics because it targets a different context: it evaluates retrieval models on the BEIR benchmark, which spans a variety of tasks and datasets, and the metrics it exposes are the ones relevant to that benchmark.

  3. As for plans to add these additional metrics to the METRIC_REGISTRY, the available context does not say, so I cannot tell you whether such plans exist.

To include these additional metrics, new classes would need to be created for each metric, similar to the HitRate, MRR, and CohereRerankRelevancyMetric classes. These new classes would need to inherit from BaseRetrievalMetric and implement the compute method to calculate the metric. Once these classes are created, they can be added to the METRIC_REGISTRY dictionary.
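
To make that concrete, here is a rough sketch of such a class. It is only an illustration: the import path and the exact compute signature are assumptions based on the existing HitRate/MRR implementations, and PrecisionAtK with its k field is a hypothetical metric, not something that exists in the library.

from typing import Any, List, Optional

# Assumed import path; it differs between llama_index versions.
from llama_index.core.evaluation.retrieval.metrics_base import (
    BaseRetrievalMetric,
    RetrievalMetricResult,
)


class PrecisionAtK(BaseRetrievalMetric):
    """Hypothetical Precision@K: fraction of the top-k retrieved ids that are relevant."""

    metric_name: str = "precision_at_k"
    k: int = 10

    def compute(
        self,
        query: Optional[str] = None,
        expected_ids: Optional[List[str]] = None,
        retrieved_ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> RetrievalMetricResult:
        if not expected_ids or not retrieved_ids:
            raise ValueError("Both expected_ids and retrieved_ids are required")
        top_k = retrieved_ids[: self.k]
        hits = sum(1 for doc_id in top_k if doc_id in set(expected_ids))
        return RetrievalMetricResult(score=hits / self.k)


# Registering it would then be a one-line change in the metrics module:
# METRIC_REGISTRY["precision_at_k"] = PrecisionAtK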

I hope this answers your questions. If you have any further queries, feel free to ask.


@hkristof03 changed the title from "[Feature Request]:" to "[Feature Request]: Add additional metrics to METRIC_REGISTRY for Retrieval and support for different K-s" on Feb 20, 2024
@aoezdTchibo

aoezdTchibo commented Apr 10, 2024

It is a pity that there are no explicit plans to integrate further metrics like MAP@K or NDCG@K, as I now have to implement these metrics locally myself, although I would like to use the RetrievalEvaluator out of the box...

EDIT: @hkristof03 If it suits your use case, you could use the InformationRetrievalEvaluator from the sentence-transformers framework.
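
For reference, usage of that evaluator looks roughly like the sketch below (this reflects my understanding of the sentence-transformers API; the metric-selection arguments and the return type of the call vary between versions, and the model name and toy data are just placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Queries and corpus are plain dicts keyed by id; relevant_docs maps each
# query id to the set of corpus ids that count as relevant for that query.
queries = {"q1": "What is retrieval augmented generation?"}
corpus = {
    "d1": "RAG combines a retriever with a generator ...",
    "d2": "Some unrelated text.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    ndcg_at_k=[5, 10],
    map_at_k=[10],
    precision_recall_at_k=[1, 5, 10],
    mrr_at_k=[10],
)

model = SentenceTransformer("all-MiniLM-L6-v2")
scores = evaluator(model)  # NDCG@K, MAP@K, Precision/Recall@K, MRR@K over the toy corpus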

@hkristof03
Contributor Author

@aoezdTchibo I solved it by taking the node ids (keys) and embeddings (values) from the "embedding_dict" returned by the vector store's ".to_dict()" method, building the node id to doc id map from "text_id_to_ref_doc_id", then creating my own faiss index and evaluating retrieval with the retrieval metrics from Torchmetrics. It sounds complicated, but it is not that much work in practice. I did have to write several workarounds for this library to be able to test the RAG system's retrieval component (and its subcomponents) on several datasets; for experimentation and extensive evaluation, LlamaIndex is not yet well set up.
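
A rough sketch of that approach is below. Assumptions: the vector store's .to_dict() exposes "embedding_dict" and "text_id_to_ref_doc_id" as described above, query_embeddings is a float32 array of shape (num_queries, dim) produced elsewhere, and relevant_doc_ids[i] is the set of ground-truth doc ids for query i; all variable names are illustrative.

import faiss
import numpy as np
import torch
from torchmetrics.retrieval import RetrievalMRR, RetrievalNormalizedDCG

# Pull node ids and embeddings out of the vector store dump.
store_dict = vector_store.to_dict()
embedding_dict = store_dict["embedding_dict"]
node_to_doc = store_dict["text_id_to_ref_doc_id"]
node_ids = list(embedding_dict.keys())
embeddings = np.asarray([embedding_dict[n] for n in node_ids], dtype="float32")

# Build a flat inner-product index over the node embeddings.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Retrieve the top-k nodes per query and flatten the results into the
# (preds, target, indexes) format that Torchmetrics expects.
k = 10
sim_scores, neighbors = index.search(query_embeddings, k)

preds, targets, indexes = [], [], []
for q_idx, row in enumerate(neighbors):
    for rank, node_pos in enumerate(row):
        doc_id = node_to_doc[node_ids[node_pos]]
        preds.append(float(sim_scores[q_idx][rank]))      # similarity as the relevance score
        targets.append(doc_id in relevant_doc_ids[q_idx])  # ground-truth relevance
        indexes.append(q_idx)                               # groups rows by query

preds = torch.tensor(preds)
targets = torch.tensor(targets)
indexes = torch.tensor(indexes)

print("MRR:", RetrievalMRR()(preds, targets, indexes=indexes))
print("NDCG@10:", RetrievalNormalizedDCG(top_k=k)(preds, targets, indexes=indexes))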

@aoezdTchibo

@hkristof03 Thanks for the link to Torchmetrics! I hadn't heard of it before and it looks very promising.

@AgenP
Contributor

AgenP commented May 13, 2024

Hey @hkristof03,

Quick note in case you are still interested:

My PR adding MRR@K and HitRate@K options was recently merged; they are enabled through a "use_granular_..." attribute.

So we now have more native flexibility for our evals.

Hopefully you find it valuable 💪
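
For anyone landing here later, usage presumably looks something like the sketch below. The attribute names "use_granular_hit_rate" / "use_granular_mrr", the import path, and the behaviour shown are my assumptions about the "use_granular_..." flag mentioned above, so check the merged PR for the real API.

from llama_index.core.evaluation.retrieval.metrics import MRR, HitRate

# Assumed: the granular flag scores partial matches within the retrieved
# list instead of returning a single binary hit / reciprocal-rank value.
hit_rate_at_k = HitRate(use_granular_hit_rate=True)  # assumed attribute name
mrr_at_k = MRR(use_granular_mrr=True)                # assumed attribute name

result = hit_rate_at_k.compute(
    expected_ids=["doc_1", "doc_2"],
    retrieved_ids=["doc_3", "doc_1", "doc_4"],
)
print(result.score)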
