[Question]: Hybrid search? #12381

Open
1 task done
chengyjonathan opened this issue Mar 28, 2024 · 5 comments
Labels
question Further information is requested

Comments

@chengyjonathan

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Hi there!

I'm wondering where the best place to start would be if I wanted to implement a hybrid of lexical, semantic, and graph search.

I see the examples for doing lexical + semantic retrieval.

I was also wondering whether there are any guides for combining semantic and knowledge graph (KG) retrieval.

@chengyjonathan chengyjonathan added the question Further information is requested label Mar 28, 2024

dosubot bot commented Mar 28, 2024

Hey @chengyjonathan! Great to see you venturing into new territories again 🚀

Based on the information provided in the LlamaIndex codebase, it appears that the framework already supports a hybrid search mechanism involving lexical, semantic, and knowledge graph retrieval. This is primarily facilitated through the KGTableRetriever class, where the retriever_mode parameter can be set to KGRetrieverMode.HYBRID, indicating a hybrid approach combining both keywords and embeddings to find relevant triplets in a knowledge graph.

The hybrid search functionality is implemented in the _retrieve method of the KGTableRetriever class. This method uses both lexical search (via keywords) and semantic search (via embeddings) to retrieve relevant nodes from the knowledge graph. If the retriever_mode is set to HYBRID, the method combines results from both lexical and semantic searches to enhance the retrieval process.

Here's a brief overview of how the hybrid search works:

  1. Keyword Extraction: The method starts by extracting keywords from the query string using the _get_keywords method, which leverages a language model for keyword extraction.

  2. Lexical Search: For each extracted keyword, it searches the knowledge graph for relevant nodes using the keyword search functionality (self._index_struct.search_node_by_keyword). This represents the lexical aspect of the search.

  3. Semantic Search: If embeddings are used (self._retriever_mode != KGRetrieverMode.KEYWORD), the method computes the query embedding using the provided embedding model (self._embed_model.get_text_embedding). It then retrieves relevant text snippets based on their semantic similarity to the query embedding, using the embeddings stored in self._index_struct.embedding_dict.

  4. Combining Results: If the retriever_mode is HYBRID, the method combines nodes found through both lexical and semantic searches. It ensures uniqueness of the results and prioritizes them based on relevance.

  5. Final Node Retrieval: The method retrieves the final set of nodes based on the combined results from both searches, along with their associated text snippets if include_text is set to True.

You can find this implementation in the llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py file.
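The five steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the KGTableRetriever code: the triplets are made up, keyword extraction is a simple whitespace split rather than an LLM call, and the "embedding" is a bag-of-words count vector standing in for a real embedding model. All function names here are hypothetical.

```python
import math
from collections import Counter

# Toy knowledge graph of (subject, relation, object) triplets. Hypothetical data.
TRIPLETS = [
    ("llamaindex", "supports", "hybrid retrieval"),
    ("bm25", "is a", "lexical ranking function"),
    ("embeddings", "capture", "semantic similarity"),
    ("knowledge graph", "stores", "triplets"),
]


def tokenize(text):
    # Assumption: whitespace tokenization replaces LLM keyword extraction.
    return text.lower().split()


def embed(text):
    # Assumption: a bag-of-words count vector replaces a real embedding model.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_search(query):
    # Lexical step: keep triplets whose subject shares a token with the query.
    keywords = set(tokenize(query))
    return [t for t in TRIPLETS if set(tokenize(t[0])) & keywords]


def embedding_search(query, top_k=2):
    # Semantic step: rank triplets by cosine similarity to the query vector.
    q = embed(query)
    ranked = sorted(TRIPLETS, key=lambda t: cosine(q, embed(" ".join(t))), reverse=True)
    return ranked[:top_k]


def hybrid_search(query, top_k=2):
    # Combine both result lists, keeping only the first occurrence of each triplet.
    seen, combined = set(), []
    for triplet in keyword_search(query) + embedding_search(query, top_k):
        if triplet not in seen:
            seen.add(triplet)
            combined.append(triplet)
    return combined


results = hybrid_search("how does bm25 ranking compare to semantic similarity")
```

In this sketch the keyword hits come first and the embedding hits fill in anything the keywords missed, which is one simple way to merge the two lists without duplicates.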

I hope this helps! If you have any further questions or need more clarification, feel free to ask.


@logan-markewich
Collaborator

I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix

https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion
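To give a rough idea of what fusing scores from different retrievers can look like, here is a minimal pure-Python sketch of relative score fusion. It assumes each retriever returns a mapping from document id to raw score, min-max normalizes each mapping so the scales are comparable, and then sums the weighted scores. The function names and the sample scores are hypothetical, and this is not the library's implementation.

```python
def min_max_normalize(results):
    # results: dict mapping doc id -> raw score from a single retriever.
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        # All scores equal: treat every hit as equally relevant.
        return {doc: 1.0 for doc in results}
    return {doc: (score - lo) / (hi - lo) for doc, score in results.items()}


def relative_score_fusion(result_sets, weights=None, top_k=3):
    # Default to equal weights across retrievers.
    weights = weights or [1.0 / len(result_sets)] * len(result_sets)
    fused = {}
    for results, weight in zip(result_sets, weights):
        for doc, score in min_max_normalize(results).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical raw scores on very different scales.
bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.7}

fused = relative_score_fusion([bm25_scores, vector_scores])
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales, so adding them raw would let one retriever dominate.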

@chengyjonathan
Author

I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix

https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion

Forgive me if this is wrong, but would I need two separate indexes? One composed of kg triplets?

Or could I just mix BM25, embedding-based, and KG-based retrievers?

@logan-markewich
Collaborator

@chengyjonathan 3 retrievers, that then get composed into one. How you get those retrievers is up to you 👍🏻
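As a rough sketch of "three retrievers composed into one", the wrapper below takes any number of retriever callables and merges their ranked lists with reciprocal rank fusion (RRF). RRF is swapped in here because it only needs ranks, not comparable scores. The `FusedRetriever` class, the three stand-in retrievers, and their return values are all hypothetical, and the constant `k=60` is a commonly used default rather than anything stated in this thread.

```python
from typing import Callable, Dict, List

# A retriever here is any callable mapping a query to a ranked list of doc ids.
Retriever = Callable[[str], List[str]]


class FusedRetriever:
    """Compose several retrievers into one using reciprocal rank fusion."""

    def __init__(self, retrievers: List[Retriever], k: int = 60):
        self.retrievers = retrievers
        self.k = k

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        scores: Dict[str, float] = {}
        for retriever in self.retrievers:
            # Each hit contributes 1 / (k + rank), with rank starting at 1.
            for rank, doc in enumerate(retriever(query), start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (self.k + rank)
        return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical stand-ins for a BM25, an embedding-based, and a KG-based retriever.
def bm25_retriever(query: str) -> List[str]:
    return ["doc_a", "doc_b", "doc_c"]


def vector_retriever(query: str) -> List[str]:
    return ["doc_b", "doc_c", "doc_d"]


def kg_retriever(query: str) -> List[str]:
    return ["doc_c", "doc_b", "doc_e"]


fused = FusedRetriever([bm25_retriever, vector_retriever, kg_retriever])
top_docs = fused.retrieve("hybrid search")
```

How each underlying retriever is built (one index or several) does not matter to the wrapper, as long as each one returns a ranked list for the same query.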

@anhthyngo

I think any custom retriever could do this. Or using the query fusion retriever and passing in all the retrievers you want to mix

https://docs.llamaindex.ai/en/stable/examples/retrievers/relative_score_dist_fusion/?h=query+fusion

@logan-markewich

If I had a KnowledgeGraphRAGRetriever and a VectorIndexAutoRetriever and wanted to fuse them, would an approach like the SQLAutoVectorQueryEngine, but for knowledge graphs, be more appropriate than this fusion retriever?

https://docs.llamaindex.ai/en/stable/examples/query_engine/SQLAutoVectorQueryEngine/
