
[Bug]: Elasticsearch index creation #10109

Closed
americanthinker opened this issue Jan 18, 2024 · 6 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@americanthinker

Bug Description

When trying to create an Elasticsearch index using the LlamaIndex VectorStoreIndex.from_documents method, I consistently get a TypeError.

Version

0.9.31

Steps to Reproduce

# create ES client
password = os.environ['ELASTICSEARCH_PASSWORD']
endpoint = 'https://localhost:9200'
cert_path = '/home/elastic/notebooks/vectorsearch-applications/http_ca.crt'
client = Elasticsearch(hosts=endpoint, ca_certs=cert_path, basic_auth=('elastic', password))

# create storage context
store = ElasticsearchStore(index_name="paul_graham", es_client=client)
storage_context = StorageContext.from_defaults(vector_store=store)

# create service context
splitter = SentenceSplitter(chunk_overlap=0, chunk_size=128)
embed_model = "local:BAAI/bge-small-en-v1.5"
service_context = ServiceContext.from_defaults(embed_model=embed_model, text_splitter=splitter)

# create index (or at least try to, anyway...)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_context=service_context, show_progress=True)

Relevant Logs/Tracebacks

TypeError                                 Traceback (most recent call last)
Cell In[23], line 15
     12 embed_model = "local:BAAI/bge-small-en-v1.5"
     13 service_context = ServiceContext.from_defaults(embed_model=embed_model, text_splitter=splitter)
---> 15 index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_context=service_context, show_progress=True)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/base.py:107, in BaseIndex.from_documents(cls, documents, storage_context, service_context, show_progress, **kwargs)
     98     docstore.set_document_hash(doc.get_doc_id(), doc.hash)
    100 nodes = run_transformations(
    101     documents,  # type: ignore
    102     service_context.transformations,
    103     show_progress=show_progress,
    104     **kwargs,
    105 )
--> 107 return cls(
    108     nodes=nodes,
    109     storage_context=storage_context,
    110     service_context=service_context,
    111     show_progress=show_progress,
    112     **kwargs,
    113 )

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:52, in VectorStoreIndex.__init__(self, nodes, index_struct, service_context, storage_context, use_async, store_nodes_override, insert_batch_size, show_progress, **kwargs)
     50 self._store_nodes_override = store_nodes_override
     51 self._insert_batch_size = insert_batch_size
---> 52 super().__init__(
     53     nodes=nodes,
     54     index_struct=index_struct,
     55     service_context=service_context,
     56     storage_context=storage_context,
     57     show_progress=show_progress,
     58     **kwargs,
     59 )

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/base.py:72, in BaseIndex.__init__(self, nodes, index_struct, storage_context, service_context, show_progress, **kwargs)
     70 if index_struct is None:
     71     assert nodes is not None
---> 72     index_struct = self.build_index_from_nodes(nodes)
     73 self._index_struct = index_struct
     74 self._storage_context.index_store.add_index_struct(self._index_struct)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:271, in VectorStoreIndex.build_index_from_nodes(self, nodes, **insert_kwargs)
    263 if any(
    264     node.get_content(metadata_mode=MetadataMode.EMBED) == "" for node in nodes
    265 ):
    266     raise ValueError(
    267         "Cannot build index from nodes with no content. "
    268         "Please ensure all nodes have content."
    269     )
--> 271 return self._build_index_from_nodes(nodes, **insert_kwargs)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:243, in VectorStoreIndex._build_index_from_nodes(self, nodes, **insert_kwargs)
    241     run_async_tasks(tasks)
    242 else:
--> 243     self._add_nodes_to_index(
    244         index_struct,
    245         nodes,
    246         show_progress=self._show_progress,
    247         **insert_kwargs,
    248     )
    249 return index_struct

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:197, in VectorStoreIndex._add_nodes_to_index(self, index_struct, nodes, show_progress, **insert_kwargs)
    195 for nodes_batch in iter_batch(nodes, self._insert_batch_size):
    196     nodes_batch = self._get_node_with_embedding(nodes_batch, show_progress)
--> 197     new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
    199     if not self._vector_store.stores_text or self._store_nodes_override:
    200         # NOTE: if the vector store doesn't store text,
    201         # we need to add the nodes to the index struct and document store
    202         for node, new_id in zip(nodes_batch, new_ids):
    203             # NOTE: remove embedding from node to avoid duplication

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:316, in ElasticsearchStore.add(self, nodes, create_index_if_not_exists, **add_kwargs)
    293 def add(
    294     self,
    295     nodes: List[BaseNode],
   (...)
    298     **add_kwargs: Any,
    299 ) -> List[str]:
    300     """Add nodes to Elasticsearch index.
    301 
    302     Args:
   (...)
    314         BulkIndexError: If AsyncElasticsearch async_bulk indexing fails.
    315     """
--> 316     return asyncio.get_event_loop().run_until_complete(
    317         self.async_add(nodes, create_index_if_not_exists=create_index_if_not_exists)
    318     )

File /anaconda/envs/openai/lib/python3.10/site-packages/nest_asyncio.py:99, in _patch_loop.<locals>.run_until_complete(self, future)
     96 if not f.done():
     97     raise RuntimeError(
     98         'Event loop stopped before Future completed.')
---> 99 return f.result()

File /anaconda/envs/openai/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception.with_traceback(self._exception_tb)
    202 return self._result

File /anaconda/envs/openai/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:356, in ElasticsearchStore.async_add(self, nodes, create_index_if_not_exists, **add_kwargs)
    354 if create_index_if_not_exists:
    355     dims_length = len(nodes[0].get_embedding())
--> 356     await self._create_index_if_not_exists(
    357         index_name=self.index_name, dims_length=dims_length
    358     )
    360 embeddings: List[List[float]] = []
    361 texts: List[str] = []

File /anaconda/envs/openai/lib/python3.10/site-packages/llama_index/vector_stores/elasticsearch.py:246, in ElasticsearchStore._create_index_if_not_exists(self, index_name, dims_length)
    237 async def _create_index_if_not_exists(
    238     self, index_name: str, dims_length: Optional[int] = None
    239 ) -> None:
    240     """Create the AsyncElasticsearch index if it doesn't already exist.
    241 
    242     Args:
    243         index_name: Name of the AsyncElasticsearch index to create.
    244         dims_length: Length of the embedding vectors.
    245     """
--> 246     if await self.client.indices.exists(index=index_name):
    247         logger.debug(f"Index {index_name} already exists. Skipping creation.")
    249     else:

TypeError: object HeadApiResponse can't be used in 'await' expression
@americanthinker americanthinker added bug Something isn't working triage Issue needs to be triaged/prioritized labels Jan 18, 2024

dosubot bot commented Jan 18, 2024


Thank you for providing detailed information about the issue you're encountering. It seems like there's a TypeError when trying to create an Elasticsearch index using the VectorStoreIndex.from_documents method in LlamaIndex v0.9.31.

To better assist you, could you please provide the following additional information:

  1. What is the exact version of Elasticsearch you're using?
  2. Could you provide a sample of the docs you're trying to index? Please make sure to anonymize or remove any sensitive data before sharing.
  3. Are there any other error messages or logs that appear before the TypeError?

This information will help me understand the context better and provide a more accurate solution.



@logan-markewich
Collaborator

It's doing if await self.client.indices.exists(index=index_name): but the error tells me that this isn't actually an async operation for some reason 🤔
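The failure mode logan-markewich describes can be reproduced without Elasticsearch at all: a synchronous client returns its response object (here, HeadApiResponse) directly, and awaiting a plain object raises exactly this TypeError. A minimal sketch, with SyncResponse as a hypothetical stand-in for HeadApiResponse:

```python
import asyncio

class SyncResponse:
    """Hypothetical stand-in for elasticsearch's HeadApiResponse."""

def exists_sync():
    # A sync client returns the response object directly.
    return SyncResponse()

async def exists_async():
    # An async client method is a coroutine; the caller must await it.
    return SyncResponse()

async def create_index_broken():
    # Awaiting a plain object reproduces the reported error:
    # "object SyncResponse can't be used in 'await' expression"
    return await exists_sync()

async def create_index_ok():
    return await exists_async()

try:
    asyncio.run(create_index_broken())
    raised = False
except TypeError:
    raised = True

print(raised)  # True: the sync response is not awaitable
```

This suggests (though the thread does not confirm it) that the await only succeeds if the store is holding an async client, e.g. AsyncElasticsearch rather than the sync Elasticsearch client constructed in the repro steps above.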

@americanthinker
Author

Why is async being used here to create an index anyway?

@logan-markewich
Copy link
Collaborator

@americanthinker to avoid code duplication, it looks like all the sync methods run the async version using asyncio.run(..)

If you have time to debug or make a PR, it would be greatly appreciated (I don't have access to elasticsearch to test at the moment)
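The sync-delegates-to-async pattern described above can be sketched generically as follows (the class and method names here are illustrative, not LlamaIndex's actual internals):

```python
import asyncio
from typing import List

class Store:
    async def async_add(self, items: List[str]) -> List[str]:
        # The real work lives in the async method (pretend I/O here).
        await asyncio.sleep(0)
        return [f"id-{i}" for i, _ in enumerate(items)]

    def add(self, items: List[str]) -> List[str]:
        # The sync wrapper just runs the coroutine to completion,
        # avoiding a duplicated sync code path.
        return asyncio.run(self.async_add(items))

print(Store().add(["a", "b"]))  # ['id-0', 'id-1']
```

One caveat of this pattern: asyncio.run (or run_until_complete) cannot be called from inside an already-running event loop such as Jupyter's, which is presumably why nest_asyncio appears in the traceback above.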

@sreenivasanm6

Team, I am also seeing the same issue when an Elasticsearch server (version 8.8) is run on the local machine, but it works when I use cloud keys. Is there a configuration issue when using it locally? Please let me know if you need any further details.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 10, 2024

dosubot bot commented May 10, 2024

Hi, @americanthinker,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. The issue you opened is related to a TypeError when creating an Elasticsearch index using the VectorStoreIndex.from_documents method in the llamaindex library version 0.9.31. There were discussions with dosubot requesting additional information about the Elasticsearch version, sample docs being indexed, and any other error messages. logan-markewich pointed out that the async operation is not being recognized as such, and there was a discussion about the use of async for index creation. sreenivasanm6 also reported a similar issue with Elasticsearch server version 8.8 when run locally.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!

@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale May 17, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 17, 2024