High level: Rethink the embedding database structure #5

Open
0hq opened this issue Jul 3, 2023 · 0 comments

0hq commented Jul 3, 2023

tinyvector should aim to be as simple as possible while remaining as powerful as possible. What is the best abstraction for an embedding database that minimizes the complexity and size of the codebase?

We've made a few assumptions so far:

  1. Each table can have at most one index, since we're assuming most users won't need multiple indexes on the same data.
  2. An index can only be tied to a single table; it cannot span multiple tables or use special clauses. This might need to change in the future. Do we want to allow indexes to be built on multiple tables or with complex filtering?
  3. Indexes should be immutable where possible, so that changing one means manually deleting and recreating it. We may want to offer a few mutable index types for compatibility, but it seems more straightforward (from both a performance and a user experience perspective) for most indexes to be immutable.
  4. Holding all indexes in memory and relying on vertical scaling seems like the simplest way to build tinyvector. In most common use cases, vectors fit in memory on reasonable hardware. If needed, you can apply dimensionality reduction to your vectors to reduce memory use and improve performance. Is this the right direction?
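
To make the four assumptions above concrete, here is a minimal sketch in plain Python. It is not tinyvector's actual API: `Database`, `BruteForceIndex`, `create_index`, `delete_index` and `vector_bytes` are hypothetical names, the index is a brute-force cosine scan chosen only for brevity, and the 1536 and 256 dimension counts are illustrative figures rather than numbers from this project.

```python
import math
from dataclasses import dataclass


def _unit(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity.

    Assumes the vector is non-zero; a zero vector would divide by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after build (assumption 3)
class BruteForceIndex:
    """Hypothetical immutable index over exactly one table (assumption 2)."""
    table: str
    ids: tuple
    vectors: tuple  # unit-length rows, stored as tuples so they cannot be edited

    @classmethod
    def build(cls, table, rows):
        ids = tuple(rows)
        return cls(table, ids, tuple(_unit(rows[i]) for i in ids))

    def query(self, vec, k=1):
        """Return the ids of the k rows most similar to vec, best first."""
        q = _unit(vec)
        scored = sorted(
            ((sum(a * b for a, b in zip(q, v)), i)
             for i, v in zip(self.ids, self.vectors)),
            reverse=True,
        )
        return [i for _, i in scored[:k]]


class Database:
    """Hypothetical in-memory store (assumption 4)."""

    def __init__(self):
        self.tables = {}   # table name -> {row id -> vector}
        self.indexes = {}  # table name -> index, at most one per table (assumption 1)

    def create_index(self, table):
        if table in self.indexes:
            raise ValueError(f"table {table!r} already has an index; delete it first")
        self.indexes[table] = BruteForceIndex.build(table, self.tables[table])

    def delete_index(self, table):
        del self.indexes[table]

    def query(self, table, vec, k=1):
        return self.indexes[table].query(vec, k)


def vector_bytes(n_vectors, dim, bytes_per_value=4):
    """Raw size of the vectors alone, assuming 4-byte floats by default."""
    return n_vectors * dim * bytes_per_value


db = Database()
db.tables["docs"] = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
db.create_index("docs")
print(db.query("docs", [0.9, 0.1], k=2))  # ['a', 'c']

# New rows are invisible to the existing index until it is deleted and rebuilt.
db.tables["docs"]["d"] = [0.9, 0.1]
db.delete_index("docs")
db.create_index("docs")
print(db.query("docs", [0.9, 0.1], k=1))  # ['d']

print(vector_bytes(1_000_000, 1536))  # 6144000000
print(vector_bytes(1_000_000, 256))   # 1024000000
```

Under these assumptions, a million 1536-dimensional float32 vectors take roughly 6 GB before any index overhead, and reducing them to 256 dimensions brings that to about 1 GB, which is the kind of trade-off assumption 4 relies on. The delete-and-recreate step also shows the cost of assumption 3: every change to the table means a full rebuild.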