
[Bug]: Elasticsearch index creation #10109

Closed
americanthinker opened this issue Jan 18, 2024 · 6 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@americanthinker

Bug Description

When trying to create an Elasticsearch index using the LlamaIndex VectorStoreIndex.from_documents method, I consistently get a TypeError.

Version

0.9.31

Steps to Reproduce

# create ES client
password = os.environ['ELASTICSEARCH_PASSWORD']
endpoint = 'https://localhost:9200'
cert_path = '/home/elastic/notebooks/vectorsearch-applications/http_ca.crt'
client = Elasticsearch(hosts=endpoint, ca_certs=cert_path, basic_auth=('elastic', password))

# create storage context
store = ElasticsearchStore(index_name="paul_graham", es_client=client)
storage_context = StorageContext.from_defaults(vector_store=store)

# create service context
splitter = SentenceSplitter(chunk_overlap=0, chunk_size=128)
embed_model = "local:BAAI/bge-small-en-v1.5"
service_context = ServiceContext.from_defaults(embed_model=embed_model, text_splitter=splitter)

# create index (or at least try to, anyway...)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_context=service_context, show_progress=True)

Relevant Logs/Tracebacks

TypeError                                 Traceback (most recent call last)
Cell In[23], line 15
     12 embed_model = "local:BAAI/bge-small-en-v1.5"
     13 service_context = ServiceContext.from_defaults(embed_model=embed_model, text_splitter=splitter)
---> 15 index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_context=service_context, show_progress=True)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/base.py:107, in BaseIndex.from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
     98     docstore.set_document_hash(doc.get_doc_id(), doc.hash)
    100 nodes = run_transformations(
    101     documents,  # type: ignore
    102     service_context.transformations,
    103     show_progress=show_progress,
    104     **kwargs,
    105 )
--> 107 return cls(
    108     nodes=nodes,
    109     storage_context=storage_context,
    110     service_context=service_context,
    111     show_progress=show_progress,
    112     **kwargs,
    113 )

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:52, in VectorStoreIndex.__init__(self, nodes, index_struct, service_context, storage_context, use_async, store_nodes_override, insert_batch_size, show_progress, **kwargs)
     50 self._store_nodes_override = store_nodes_override
     51 self._insert_batch_size = insert_batch_size
---> 52 super().__init__(
     53     nodes=nodes,
     54     index_struct=index_struct,
     55     service_context=service_context,
     56     storage_context=storage_context,
     57     show_progress=show_progress,
     58     **kwargs,
     59 )

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/base.py:72, in BaseIndex.__init__(self, nodes, index_struct, storage_context, service_context, show_progress, **kwargs)
     70 if index_struct is None:
     71     assert nodes is not None
---> 72     index_struct = self.build_index_from_nodes(nodes)
     73 self._index_struct = index_struct
     74 self._storage_context.index_store.add_index_struct(self._index_struct)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:271, in VectorStoreIndex.build_index_from_nodes(self, nodes, **insert_kwargs)
    263 if any(
    264     node.get_content(metadata_mode=MetadataMode.EMBED) == "" for node in nodes
    265 ):
    266     raise ValueError(
    267         "Cannot build index from nodes with no content. "
    268         "Please ensure all nodes have content."
    269     )
--> 271 return self._build_index_from_nodes(nodes, **insert_kwargs)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:243, in VectorStoreIndex._build_index_from_nodes(self, nodes, **insert_kwargs)
    241     run_async_tasks(tasks)
    242 else:
--> 243     self._add_nodes_to_index(
    244         index_struct,
    245         nodes,
    246         show_progress=self._show_progress,
    247         **insert_kwargs,
    248     )
    249 return index_struct

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:197, in VectorStoreIndex._add_nodes_to_index(self, index_struct, nodes, show_progress, **insert_kwargs)
    195 for nodes_batch in iter_batch(nodes, self._insert_batch_size):
    196     nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
--> 197     new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
    199     if not self._vector_store.stores_text or self._store_nodes_override:
    200         # NOTE: if the vector store doesn't store text,
    201         # we need to add the nodes to the index struct and document store
    202         for node, new_id in zip(nodes_batch, new_ids):
    203             # NOTE: remove embedding from node to avoid duplication

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:316, in ElasticsearchStore.add(self, nodes, create_index_if_not_exists, **add_kwargs)
    293 def add(
    294     self,
    295     nodes: List[BaseNode],
   (...)
    298     **add_kwargs: Any,
    299 ) -> List[str]:
    300     """Add nodes to Elasticsearch index.
    301 
    302     Args:
   (...)
    314         BulkIndexError: If AsyncElasticsearch async_bulk indexing fails.
    315     """
--> 316     return asyncio.get_event_loop().run_until_complete(
    317         self.async_add(nodes, create_index_if_not_exists=create_index_if_not_exists)
    318     )

File /anaconda/envs/openai/lib/python3.10/site-packages/nest_asyncio.py:99, in _patch_loop.<locals>.run_until_complete(self, future)
     96 if not f.done():
     97     raise RuntimeError(
     98         'Event loop stopped before Future completed.')
---> 99 return f.result()

File /anaconda/envs/openai/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception.with_traceback(self._exception_tb)
    202 return self._result

File /anaconda/envs/openai/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:356, in ElasticsearchStore.async_add(self, nodes, create_index_if_not_exists, **add_kwargs)
    354 if create_index_if_not_exists:
    355     dims_length = len(nodes[0].get_embedding())
--> 356     await self._create_index_if_not_exists(
    357         index_name=self.index_name, dims_length=dims_length
    358     )
    360 embeddings: List[List[float]] = []
    361 texts: List[str] = []

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:246, in ElasticsearchStore._create_index_if_not_exists(self, index_name, dims_length)
    237 async def _create_index_if_not_exists(
    238     self, index_name: str, dims_length: Optional[int] = None
    239 ) -> None:
    240     """Create the AsyncElasticsearch index if it doesn't already exist.
    241 
    242     Args:
    243         index_name: Name of the AsyncElasticsearch index to create.
    244         dims_length: Length of the embedding vectors.
    245     """
--> 246     if await self.client.indices.exists(index=index_name):
    247         logger.debug(f"Index {index_name} already exists. Skipping creation.")
    249     else:

TypeError: object HeadApiResponse can't be used in 'await' expression
@americanthinker americanthinker added bug Something isn't working triage Issue needs to be triaged/prioritized labels Jan 18, 2024

dosubot bot commented Jan 18, 2024


Thank you for providing detailed information about the issue you're encountering. It seems like there's a TypeError when trying to create an Elasticsearch index using the VectorStoreIndex.from_documents method in LlamaIndex v0.9.31.

To better assist you, could you please provide the following additional information:

  1. What is the exact version of Elasticsearch you're using?
  2. Could you provide a sample of the docs you're trying to index? Please make sure to anonymize or remove any sensitive data before sharing.
  3. Are there any other error messages or logs that appear before the TypeError?

This information will help me understand the context better and provide a more accurate solution.



@logan-markewich
Collaborator

It's doing if await self.client.indices.exists(index=index_name): but the error tells me that this isn't actually an async operation for some reason 🤔
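The failure mode logan-markewich describes can be reproduced without Elasticsearch at all: a synchronous client returns its response object (here, HeadApiResponse) directly, and awaiting a plain object raises exactly this TypeError. A minimal sketch, with SyncResponse as a hypothetical stand-in for HeadApiResponse:

```python
import asyncio

class SyncResponse:
    """Hypothetical stand-in for elasticsearch's HeadApiResponse."""

def exists_sync():
    # A sync client returns the response object directly.
    return SyncResponse()

async def exists_async():
    # An async client method is a coroutine; the caller must await it.
    return SyncResponse()

async def create_index_broken():
    # Awaiting a plain object reproduces the reported error:
    # "object SyncResponse can't be used in 'await' expression"
    return await exists_sync()

async def create_index_ok():
    return await exists_async()

try:
    asyncio.run(create_index_broken())
    raised = False
except TypeError:
    raised = True

print(raised)  # True: the sync response is not awaitable
```

This suggests (though the thread does not confirm it) that the await only succeeds if the store is holding an async client, e.g. AsyncElasticsearch rather than the sync Elasticsearch client constructed in the repro steps above.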

@americanthinker
Author

Why is async being used here to create an index anyway?

@logan-markewich
Copy link
Collaborator

@americanthinker to avoid code duplication, it looks like all the sync methods run the async version using asyncio.run(..)

If you have time to debug or make a PR, it would be greatly appreciated (I don't have access to elasticsearch to test at the moment)
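The sync-delegates-to-async pattern described above can be sketched generically as follows (the class and method names here are illustrative, not LlamaIndex's actual internals):

```python
import asyncio
from typing import List

class Store:
    async def async_add(self, items: List[str]) -> List[str]:
        # The real work lives in the async method (pretend I/O here).
        await asyncio.sleep(0)
        return [f"id-{i}" for i, _ in enumerate(items)]

    def add(self, items: List[str]) -> List[str]:
        # The sync wrapper just runs the coroutine to completion,
        # avoiding a duplicated sync code path.
        return asyncio.run(self.async_add(items))

print(Store().add(["a", "b"]))  # ['id-0', 'id-1']
```

One caveat of this pattern: asyncio.run (or run_until_complete) cannot be called from inside an already-running event loop such as Jupyter's, which is presumably why nest_asyncio appears in the traceback above.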

@sreenivasanm6

Team, I am also seeing the same issue when an Elasticsearch server (version 8.8) is run on the local machine, but it works when I use cloud keys. Is there a configuration issue when using it locally? Please let me know if you need any further details.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 10, 2024

dosubot bot commented May 10, 2024

Hi, @americanthinker,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. The issue you opened is related to a TypeError when creating an Elasticsearch index using the VectorStoreIndex.from_documents method in the llamaindex library version 0.9.31. There were discussions with dosubot requesting additional information about the Elasticsearch version, sample docs being indexed, and any other error messages. logan-markewich pointed out that the async operation is not being recognized as such, and there was a discussion about the use of async for index creation. sreenivasanm6 also reported a similar issue with Elasticsearch server version 8.8 when run locally.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!

@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale May 17, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 17, 2024