The requirement for in-memory and disk storage helps clarify a number of the {hardware} specs for Copilot+ PCs, with double the earlier Home windows base reminiscence necessities in addition to bigger, quicker SSDs. Usefully, there’s a decrease CPU requirement over different vector search algorithms, with at-scale implementations in Azure companies requiring solely 5% of the CPU conventional strategies use.
You’ll want a separate retailer for the info that’s being listed. Having separate shops for each your indexes and the supply of your embeddings does have its points. Should you’re working with personally identifiable info or different regulated information, you possibly can’t neglect guaranteeing that the supply information is encrypted. This will add overhead on queries, however apparently Microsoft is engaged on software-based safe enclaves that may each encrypt information at relaxation and in use, decreasing the danger of PII leaking or prompts being manipulated by malware.
DiskANN is an implementation of an approximate nearest neighbor search, utilizing a Vamana graph index. It’s designed to work with information that modifications often, which makes it a great tool for agent-like AI purposes that have to index native information or information held in companies like Microsoft 365, resembling electronic mail or Groups chats.