[Bug]: Unable to delete pinecone stored vectors with new pinecone serverless index architecture #13451
Comments
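To adapt to Pinecone's new serverless index architecture and handle deletions without direct metadata filtering, you can follow a two-step approach: first query for the IDs of the vectors that match your metadata filters, then delete those vectors by ID.

Here's a code snippet to guide you through this process:

```python
from llama_index.core.vector_stores.types import MetadataFilters
# Private helper from the llama_index Pinecone integration; its location may differ between versions
from llama_index.vector_stores.pinecone.base import _to_pinecone_filter


def delete_vectors_by_metadata(self, metadata_filters: MetadataFilters) -> None:
    # Query to get vector IDs based on metadata filters (converted to Pinecone's format)
    pinecone_filter = _to_pinecone_filter(metadata_filters)
    dimension = self._pinecone_index.describe_index_stats()["dimension"]
    query_result = self._pinecone_index.query(
        vector=[0.0] * dimension,
        filter=pinecone_filter,
        top_k=10000,  # Pinecone caps top_k, so larger result sets need repeated passes
        include_metadata=False,
        namespace=self.namespace,
    )
    vector_ids = [match.id for match in query_result.matches]
    # Delete vectors by ID, up to 1,000 IDs per request instead of one request per ID
    for start in range(0, len(vector_ids), 1000):
        self._pinecone_index.delete(ids=vector_ids[start:start + 1000], namespace=self.namespace)
```

This is a shift from the previous method, where a metadata filter could be passed directly to the delete call.

Regarding the `add` method:

```python
from typing import Any, List

from llama_index.core.schema import BaseNode
from llama_index.core.vector_stores.utils import node_to_metadata_dict


def add(self, nodes: List[BaseNode], **add_kwargs: Any) -> List[str]:
    ids = []
    entries = []
    for node in nodes:
        node_id = node.node_id
        # Prepare the entry with node ID, vector, and metadata
        metadata = node_to_metadata_dict(node, remove_text=False, flat_metadata=self.flat_metadata)
        entries.append({"id": node_id, "values": node.get_embedding(), "metadata": metadata})
        ids.append(node_id)
    self._pinecone_index.upsert(
        entries,
        namespace=self.namespace,
        batch_size=self.batch_size,
        **self.insert_kwargs,
    )
    return ids
```

Because `add` returns the node IDs it used as Pinecone vector IDs, you can keep them and still remove vectors from a serverless index by ID, even though deletion by metadata filter is no longer available.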
@dosu Although the `add` method returns the vector IDs, the ingestion pipeline's `run` method, which calls it, does not return those IDs. The approach you provided also seems rather convoluted for something as simple as deleting vectors. I think a better solution is to modify the `PineconeVectorStore` class so that `delete` does the same thing the modified `__init__` method already does: identify the Pinecone client version and whether the index is serverless, and act accordingly.
To address the challenge of deleting vectors in Pinecone's serverless architecture without direct metadata filtering, especially when vector IDs are not returned by the ingestion pipeline's
Focusing on these areas should give you a more efficient way to manage vector deletions in Pinecone's serverless architecture: it addresses the immediate problem of deleting vectors without metadata filtering and makes your overall vector management more robust and flexible.
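The change proposed above, letting `delete` pick a strategy by index type much as `__init__` already checks the client version and serverless setting, could look roughly like the sketch below. It is a standalone function rather than a patch to `PineconeVectorStore`, and several parts are assumptions. The `serverless` flag is supplied by the caller, since this sketch does not detect the index type. The serverless branch assumes vector IDs were written as `<ref_doc_id>#<suffix>` so they can be listed by prefix. The `doc_id` metadata key in the pod-based branch is an assumption about how the documents were indexed.

```python
from typing import Any


def delete_ref_doc(index: Any, ref_doc_id: str, namespace: str = "", serverless: bool = True) -> None:
    """Delete all vectors belonging to `ref_doc_id`, choosing a strategy by index type.

    `index` is expected to behave like a Pinecone `Index`; `serverless` is a
    caller-supplied flag (hypothetical), not something this function detects.
    """
    if serverless:
        # Serverless: no delete-by-metadata, so list matching IDs page by page.
        # ASSUMPTION: vector IDs have the form "<ref_doc_id>#<suffix>".
        for id_page in index.list(prefix=f"{ref_doc_id}#", namespace=namespace):
            if id_page:
                index.delete(ids=id_page, namespace=namespace)
    else:
        # Pod-based: delete by metadata filter.
        # ASSUMPTION: each vector stores its document ID under the "doc_id" key.
        index.delete(filter={"doc_id": {"$eq": ref_doc_id}}, namespace=namespace)
```

Keeping both branches behind one function means callers do not need to know which kind of index they are talking to, which is the point of the suggestion. The serverless branch only works if the ID convention was followed at upsert time.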
Bug Description
Pinecone is forcing all free plans to migrate to serverless indexes. According to their release, with serverless you are no longer able to delete vectors by metadata filtering, which is the primary way to delete using llama_index:
```python
from typing import Any


def delete(self, ref_doc_id: str, **delete_kwargs: Any) -> None:
    """
    Delete nodes using with ref_doc_id.
    """
```
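Currently, the new way to remove the IDs belonging to the same document is to list them by ID prefix and delete them page by page:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key
index = pc.Index("your-index-name")  # placeholder index name

# To iterate over all result pages using a generator function
for ids in index.list(prefix='doc1#', namespace='ns1'):
    print(ids)
    index.delete(ids=ids, namespace='ns1')
```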
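Listing by prefix only finds a document's vectors if their IDs actually start with that document's prefix, and llama_index node IDs are random UUIDs by default. Below is a minimal sketch that rewrites node IDs into a `<ref_doc_id>#<node_id>` form before they are upserted, following the `doc1#` convention in the Pinecone snippet. The helper names are hypothetical. The sketch assumes node objects expose a settable `node_id` and a `ref_doc_id` attribute, as llama_index nodes do. Where to call it in a pipeline (for example, inside a custom transformation) is left open.

```python
from typing import Any, Iterable


def prefixed_vector_id(ref_doc_id: str, node_id: str, sep: str = "#") -> str:
    """Build an ID like 'doc1#<node_id>' so all chunks of a document share a prefix."""
    if sep in ref_doc_id:
        # A separator inside the document ID would make prefix matching ambiguous
        raise ValueError(f"ref_doc_id must not contain {sep!r}: {ref_doc_id!r}")
    return f"{ref_doc_id}{sep}{node_id}"


def apply_prefixed_ids(nodes: Iterable[Any], sep: str = "#") -> None:
    """Rewrite each node's ID in place before the nodes are upserted.

    Nodes without a source document (ref_doc_id is None) are left unchanged,
    and nodes that already carry the prefix are not prefixed twice.
    """
    for node in nodes:
        if node.ref_doc_id is None:
            continue
        if not node.node_id.startswith(f"{node.ref_doc_id}{sep}"):
            node.node_id = prefixed_vector_id(node.ref_doc_id, node.node_id, sep)
```

Because the separator is part of the prefix, listing `doc1#` does not pick up `doc10#…` IDs. Rejecting `#` inside the document ID closes the remaining gap, where `doc1#` would otherwise also match a document whose ID is `doc1#x`.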
Listing by prefix like this requires knowing the vector IDs that llama_index used when it called the `vector_store.add` method, which ends with:

```python
        self._pinecone_index.upsert(
            entries,
            namespace=self.namespace,
            batch_size=self.batch_size,
            **self.insert_kwargs,
        )
        return ids
```
However, when you use a vector store as part of the `IngestionPipeline` class, the vector IDs are not returned by its `run` method.
How should one delete from Pinecone now? Storing the Pinecone vector IDs seems pretty useful, both for tracking and for deletion, now that having them is essential.
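Until `run` surfaces the IDs, one workaround is to record them from the nodes themselves. In the `add` method above, each node's `node_id` becomes its Pinecone vector ID. So, assuming `IngestionPipeline.run` returns the nodes it processed (including those it sent to the vector store), grouping those nodes by `ref_doc_id` gives the IDs to delete for each document. The function names and the JSON file are illustrative choices, not part of llama_index.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def ids_by_ref_doc(nodes: Iterable[Any]) -> Dict[str, List[str]]:
    """Group node IDs (used as Pinecone vector IDs by `add`) by source document ID."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        # Nodes without a source document cannot be deleted "by document", so skip them
        if node.ref_doc_id is not None:
            mapping[node.ref_doc_id].append(node.node_id)
    return dict(mapping)


def save_id_map(mapping: Dict[str, List[str]], path: str) -> None:
    """Persist the mapping as JSON so deletions can be done later by ID."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)
```

For example, `id_map = ids_by_ref_doc(pipeline.run(documents=documents))` right after ingestion, and later `index.delete(ids=id_map["doc1"], namespace="ns1")` removes one document's vectors. Split the list into chunks of at most 1,000 IDs for large documents.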
Version
0.10.30
Steps to Reproduce
Index a document to Pinecone and try to remove it.
Relevant Logs/Tracebacks
No response