[Question]: Hybrid search? #12381

chengyjonathan · 2024-03-28T17:53:46Z

Question Validation

I have searched both the documentation and discord for an answer.

Question

Hi there!

I'm wondering where the best place to get started would be if I wanted to implement a hybrid of lexical, semantic, and graph search.

I see the examples for doing lexical + semantic retrieval.

I was also wondering if there are any guides for doing semantic + KG (knowledge graph) retrieval.

dosubot · 2024-03-28T17:54:00Z

Hey @chengyjonathan! Great to see you venturing into new territories again 🚀

Based on the information provided in the LlamaIndex codebase, it appears that the framework already supports a hybrid search mechanism involving lexical, semantic, and knowledge graph retrieval. This is primarily facilitated through the KGTableRetriever class, where the retriever_mode parameter can be set to KGRetrieverMode.HYBRID, indicating a hybrid approach combining both keywords and embeddings to find relevant triplets in a knowledge graph.

The hybrid search functionality is implemented in the _retrieve method of the KGTableRetriever class. This method uses both lexical search (via keywords) and semantic search (via embeddings) to retrieve relevant nodes from the knowledge graph. If the retriever_mode is set to HYBRID, the method combines results from both lexical and semantic searches to enhance the retrieval process.

Here's a brief overview of how the hybrid search works:

1. Keyword Extraction: The method starts by extracting keywords from the query string using the _get_keywords method, which leverages a language model for keyword extraction.
2. Lexical Search: For each extracted keyword, it searches the knowledge graph for relevant nodes using the keyword search functionality (self._index_struct.search_node_by_keyword). This represents the lexical aspect of the search.
3. Semantic Search: If embeddings are used (self._retriever_mode != KGRetrieverMode.KEYWORD), the method computes the query embedding using the provided embedding model (self._embed_model.get_text_embedding). It then retrieves relevant text snippets based on their semantic similarity to the query embedding, using the embeddings stored in self._index_struct.embedding_dict.
4. Combining Results: If the retriever_mode is HYBRID, the method combines nodes found through both lexical and semantic searches. It ensures uniqueness of the results and prioritizes them based on relevance.
5. Final Node Retrieval: The method retrieves the final set of nodes based on the combined results from both searches, along with their associated text snippets if include_text is set to True.
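To make the steps above concrete, here is a minimal, library-free sketch of the same idea: keyword lookup plus embedding similarity over triplets, merged and de-duplicated. It is not the KGTableRetriever implementation. The toy graph, the 2-d embeddings, and every function name (extract_keywords, keyword_search, embedding_search, hybrid_search) are assumptions made up for illustration, and a simple word match stands in for LLM keyword extraction.

```python
import math

# Toy knowledge graph (hypothetical data): subject -> list of triplets.
TRIPLETS = {
    "paris": [("paris", "capital_of", "france")],
    "france": [("france", "located_in", "europe")],
    "berlin": [("berlin", "capital_of", "germany")],
}

# Hypothetical 2-d embeddings, one per triplet.
TRIPLET_EMBEDDINGS = {
    ("paris", "capital_of", "france"): [1.0, 0.0],
    ("france", "located_in", "europe"): [0.8, 0.6],
    ("berlin", "capital_of", "germany"): [0.0, 1.0],
}


def extract_keywords(query):
    """Stand-in for LLM keyword extraction: keep words that are graph subjects."""
    words = [w.strip("?.,!").lower() for w in query.split()]
    return [w for w in words if w in TRIPLETS]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(query):
    """Lexical step: collect the triplets attached to each extracted keyword."""
    hits = []
    for keyword in extract_keywords(query):
        hits.extend(TRIPLETS[keyword])
    return hits


def embedding_search(query_embedding, top_k=2):
    """Semantic step: rank triplets by similarity to the query embedding."""
    ranked = sorted(
        TRIPLET_EMBEDDINGS,
        key=lambda t: cosine(query_embedding, TRIPLET_EMBEDDINGS[t]),
        reverse=True,
    )
    return ranked[:top_k]


def hybrid_search(query, query_embedding, top_k=2):
    """Hybrid step: merge both result lists, dropping duplicates in order."""
    combined = keyword_search(query) + embedding_search(query_embedding, top_k)
    return list(dict.fromkeys(combined))


results = hybrid_search("What is the capital near Berlin?", [1.0, 0.1])
```

In this toy run the keyword step contributes the Berlin triplet and the embedding step contributes the two triplets closest to the query vector, so the merged list covers hits that either method alone would miss.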

You can find this implementation in the llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py file.

I hope this helps! If you have any further questions or need more clarification, feel free to ask.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


logan-markewich · 2024-03-28T18:06:56Z

I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix

https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion
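For readers who want the intuition behind score-based fusion before opening the linked example, here is a rough standalone sketch: min-max normalize each retriever's scores so they are comparable, then take a weighted sum per document. This is an assumption about the general technique, not a copy of LlamaIndex's implementation, and the function names (min_max_normalize, relative_score_fusion) and sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(result_sets, weights=None, top_k=3):
    """Fuse several {doc_id: score} mappings into one ranked list of (doc_id, score)."""
    if weights is None:
        # Default to equal weight per retriever.
        weights = [1.0 / len(result_sets)] * len(result_sets)
    fused = {}
    for scores, weight in zip(result_sets, weights):
        for doc, normalized in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weight * normalized
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up scores on very different scales, as BM25, vector, and KG retrievers might return.
bm25_scores = {"a": 12.0, "b": 4.0, "c": 8.0}
vector_scores = {"b": 0.9, "c": 0.7, "d": 0.5}
kg_scores = {"c": 2.0, "a": 1.0}

fused = relative_score_fusion([bm25_scores, vector_scores, kg_scores])
```

Normalizing first matters because raw BM25 scores and cosine similarities live on different scales; without it, one retriever would dominate the sum.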

chengyjonathan · 2024-03-28T18:11:10Z

> I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix
>
> https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion

Forgive me if this is wrong, but would I need two separate indexes, one of them composed of KG triplets?

Or could I just mix a BM25 retriever, an embedding-based retriever, and a KG-based retriever?

logan-markewich · 2024-03-28T18:23:08Z

@chengyjonathan Three retrievers, which then get composed into one. How you get those retrievers is up to you 👍🏻
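As a rough illustration of "three retrievers composed into one", the sketch below wraps any number of retrievers behind a single retrieve call and merges their rankings with reciprocal rank fusion, which needs only rank positions and not comparable scores. It deliberately avoids the LlamaIndex API: the FusedRetriever class, the "callable that returns ranked doc ids" interface, and the stand-in retrievers are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a retriever is any callable mapping a query to ranked doc ids.
Retriever = Callable[[str], List[str]]


class FusedRetriever:
    """Compose several retrievers into one using reciprocal rank fusion (RRF)."""

    def __init__(self, retrievers: List[Retriever], k: int = 60):
        self.retrievers = retrievers
        self.k = k  # Smoothing constant; 60 is a commonly used default for RRF.

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        scores: Dict[str, float] = {}
        for retriever in self.retrievers:
            # Each retriever contributes 1 / (k + rank) for every doc it returns.
            for rank, doc_id in enumerate(retriever(query), start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (self.k + rank)
        return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Stand-ins for BM25, embedding-based, and KG-based retrievers (fixed outputs for the demo).
def bm25_retriever(query: str) -> List[str]:
    return ["d1", "d2", "d3"]


def vector_retriever(query: str) -> List[str]:
    return ["d2", "d1", "d4"]


def kg_retriever(query: str) -> List[str]:
    return ["d2", "d5"]


hybrid = FusedRetriever([bm25_retriever, vector_retriever, kg_retriever])
top_docs = hybrid.retrieve("hybrid search")
```

Where each underlying retriever comes from (one shared index, separate indexes, or a graph store) does not matter to the composer, which matches the point that how you obtain the retrievers is up to you.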

anhthyngo · 2024-05-12T15:30:43Z

> I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix
>
> https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion

@logan-markewich

If I had a KnowledgeGraphRAGRetriever and a VectorIndexAutoRetriever and wanted to fuse them, would something like the SQLAutoVectorQueryEngine approach, adapted for knowledge graphs, be a better fit than this fusion retriever?

https://docs.llamaindex.ai/en/stable/examples/query_engine/SQLAutoVectorQueryEngine/

chengyjonathan added the "question" label (Further information is requested) on Mar 28, 2024
