I have searched both the documentation and Discord for an answer.
Question
Hello,
I've got a question that I suspect is a configuration issue rather than a bug. When I run my ingestion pipeline (code below) repeatedly on the same set of source documents, my vector database grows on each run, which suggests that some of the information is being duplicated rather than upserted. I believe the vector store, docstore, and cache are set up correctly with Postgres, since all three are populated on ingestion and queries work fine. I've reviewed these issues: 1, 2. Both suggest explicitly saving and loading the docstore. However, I don't know whether that applies in my case, since I am using a database rather than a local directory.
I'd be very appreciative of any help, either in advancing my understanding of how repeated ingestions on the same source material are supposed to work or in directly fixing my problem. Thanks in advance.
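As a rough mental model of what an upsert-style rerun is expected to do, here is a minimal in-memory sketch. It assumes the docstore maps each document ID to a hash of its content and that the pipeline compares hashes before writing to the vector store. The `upsert` function and the dict-based stores are hypothetical stand-ins, not LlamaIndex's actual implementation.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash the document text so unchanged content can be recognized."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert(docstore: dict, vector_store: dict, doc_id: str, text: str) -> str:
    """Hypothetical upsert keyed by document ID.

    docstore maps doc_id -> content hash; vector_store maps doc_id -> text.
    Returns the action taken so the behavior is easy to inspect.
    """
    new_hash = content_hash(text)
    old_hash = docstore.get(doc_id)
    if old_hash is None:
        # Unseen ID: add to both stores.
        action = "inserted"
    elif old_hash != new_hash:
        # Known ID, changed content: overwriting the dict entry stands in for
        # deleting the old vectors and embedding the new content.
        action = "updated"
    else:
        # Known ID, identical content: nothing is written.
        return "skipped"
    docstore[doc_id] = new_hash
    vector_store[doc_id] = text
    return action


docstore, vectors = {}, {}
print(upsert(docstore, vectors, "report.txt", "hello"))   # inserted
print(upsert(docstore, vectors, "report.txt", "hello"))   # skipped
print(upsert(docstore, vectors, "report.txt", "hello!"))  # updated
print(len(vectors))                                       # 1
```

Note that the lookup is keyed by ID: identical text arriving under a new ID takes the `inserted` branch, so in this model the vector store grows on every run unless the IDs are stable.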
Thanks @logan-markewich. For some reason, the IDs do not stay the same. I tested the ingestion on a single file: running it twice a few seconds apart on the same file, without modifying the file between runs, gives me two different IDs in the docstore table.
The hash is the same, but the ID is different. I cannot figure out why this happens, but I wonder whether it is because I call get_docstore multiple times rather than calling it once and passing the result to the other functions.
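One common way to get the same hash but a new ID on each run is a loader that assigns a fresh random ID to every document it loads. Whether the loader used here does that is an assumption; for reference, LlamaIndex `Document` objects default to a random UUID for their ID, and `SimpleDirectoryReader` accepts `filename_as_id=True` to derive IDs from file names instead. The sketch below contrasts the two ID strategies in plain Python; `random_doc_id` and `stable_doc_id` are hypothetical helpers.

```python
import uuid
from pathlib import Path


def random_doc_id(path: str) -> str:
    """Hypothetical loader behavior: a new random ID on every load."""
    return str(uuid.uuid4())


def stable_doc_id(path: str) -> str:
    """Hypothetical alternative: derive the ID deterministically from the file path."""
    return Path(path).as_posix()


path = "data/report.txt"

# Two "runs" over the same, unmodified file.
print(random_doc_id(path) == random_doc_id(path))  # False: looks like a new document each run
print(stable_doc_id(path) == stable_doc_id(path))  # True: the stored hash can be matched
```

With stable IDs, a hash comparison keyed by ID can recognize unchanged files and skip them; with random IDs, every run looks like new documents even though the content hash is unchanged.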
Perform ingestion:
Functions to create document store, vector store and cache: