Future improvements and possibilities
We have only scratched the surface. There is a lot more that can be done with this tool.
Improve accuracy
We have already built a fairly complex pipeline that includes a FAISS wrapper, embeddings, and RAG, but accuracy can be improved further.
- Some documents may exist not as PDF files but as images or in some other medium. We can add features to extract text from these as well.
- Some documents cannot be classified into any category. For these, the tool can request guidance from the admin.
- A custom LLM can be trained on thousands of documents from these categories to specialize in this task.
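The human-guidance idea above could be sketched as a confidence threshold: if no category scores high enough, the document goes to a review queue for the admin. This is only a minimal sketch. The names `ReviewQueue`, `route`, and `REVIEW_THRESHOLD`, and the assumption that the classifier returns a score per category, are hypothetical and not part of the current pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the right value would have to be tuned on real data.
REVIEW_THRESHOLD = 0.6


@dataclass
class ReviewQueue:
    """Hypothetical holding area for documents that need admin guidance."""
    pending: list = field(default_factory=list)

    def add(self, doc_id, scores):
        self.pending.append((doc_id, scores))


def route(doc_id, scores, queue, threshold=REVIEW_THRESHOLD):
    """Return the best category, or None if the document is sent to review.

    `scores` is assumed to map category name -> confidence in [0, 1].
    """
    if not scores:
        # No category matched at all: ask the admin.
        queue.add(doc_id, scores)
        return None
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        # Best guess is too weak to trust: ask the admin.
        queue.add(doc_id, scores)
        return None
    return best
```

For example, `route("b.pdf", {"invoice": 0.4, "contract": 0.35}, queue)` would return `None` and leave the document in `queue.pending` for the admin to label.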
Chatbot
This is already at the conceptual stage, and I am currently working on it.
We can build a chatbot that answers document-retention queries and can also edit the retention schedule or manipulate documents through verbal prompts.
Automation
- The entire process of classifying a document and computing its expiry can be automated, so that whenever a new document is added to the repository, its expiry statistics will automatically be reflected in the dashboard.
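One way this automation might look is a single handler that runs whenever a document is added: classify it, look up the retention period, compute the expiry date, and update the dashboard record. The sketch below rests on stated assumptions: `RETENTION_YEARS`, `on_document_added`, the `classify` callable, and the dictionary standing in for the dashboard are all hypothetical, and the retention periods are placeholder values.

```python
from datetime import date

# Hypothetical retention schedule: category -> retention period in years.
RETENTION_YEARS = {"invoice": 7, "contract": 10}


def add_years(start, years):
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def on_document_added(path, created, classify, dashboard):
    """Classify a new document, compute its expiry, and record both.

    `classify` is any callable mapping a path to a category name;
    `dashboard` is a plain dict standing in for the real dashboard store.
    """
    category = classify(path)
    years = RETENTION_YEARS.get(category)
    # Unknown categories get no expiry rather than a guessed one.
    expiry = add_years(created, years) if years is not None else None
    dashboard[path] = {"category": category, "expiry": expiry}
    return dashboard[path]
```

In a real deployment this handler would be triggered by whatever mechanism the repository offers for detecting new files, such as a scheduled scan or an upload hook.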