The explosion of AI-powered natural language apps that began with the launch of ChatGPT last year has driven an unprecedented increase in demand for vector databases throughout the software industry.
Why vector databases? Because they are the most efficient storage option for embeddings, which are numerical representations of data that AI models like GPT-4 use to process and understand text.
Why not relational databases? While it is possible to store embeddings in them, they are not as efficient as vector databases for this kind of work. We'll see exactly how later in this article.
So, without further ado, let's define what vector databases are and explore how you can use them for your RAG app.
Relational (SQL) vs. Vector Databases
Relational databases are your best bet if you're working with structured data that requires simple or complex querying capabilities. Vector databases excel at handling unstructured data such as images, audio, and text, which makes them the best option for tasks involving machine learning and similarity search.
Traditional Database Management Systems
We've all worked with SQL at some point. So I'm assuming you're familiar with how traditional databases handle storing and retrieving information. If not, here's a quick link to get started.
They are great at what they're built for, storing and querying structured data, but not so much at handling high-dimensional numerical data, such as the vector representations of text, images, and audio.
For that kind of data, we're going to take a look at Vector Databases.
Introduction to Vector Databases
What are Vectors
As we've discussed earlier, vectors are numerical representations of data. They can encode various types of information, such as images, audio, geolocation, or other data types.
For example, using OpenAI's embedding model, we can see how the text "canine companions say" is transformed into a vector representation.
The great thing about vectors is that we can find similarities between them relatively easily using techniques like Euclidean distance or cosine similarity. This helps a lot in fields like data analysis and machine learning.
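To make the two measures concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration (real embeddings typically have hundreds or thousands of dimensions), and the function names are my own, not part of any library.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0 means identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction, 0 means orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up toy vectors, NOT real model output.
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Related items should score closer than unrelated ones.
print(cosine_similarity(dog, puppy), cosine_similarity(dog, car))
print(euclidean_distance(dog, puppy), euclidean_distance(dog, car))
```

With these toy values, "dog" and "puppy" come out both closer (smaller Euclidean distance) and more aligned (higher cosine similarity) than "dog" and "car", which is the property similarity search relies on.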
What are Vector Databases
Vector databases are specialized database management systems built to handle vector data efficiently.
Benefits of a Vector Database
There are numerous benefits to using a vector database. Here are the most obvious ones:
- Efficient storage and retrieval: Vectorized data can be stored and compared at low computational cost, making it easier to manage and retrieve large amounts of data.
- Support for similarity search: Vector queries in vector databases typically search for similar vectors using one or several query vectors. This allows for applications such as reverse image search, recommender systems, and chatbots.
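As a rough illustration of what a similarity query does, the sketch below ranks a tiny in-memory collection against a query vector and returns the closest matches. It is a brute-force toy: the `catalog` data and the `top_k` helper are hypothetical, and production vector databases generally rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up vectors standing in for stored embeddings.
catalog = {
    "dog_photo": [0.9, 0.1, 0.0],
    "puppy_photo": [0.8, 0.2, 0.1],
    "car_photo": [0.0, 0.1, 0.9],
}

# The query vector would normally come from embedding the user's input.
results = top_k([1.0, 0.0, 0.0], catalog, k=2)
print([item_id for item_id, _ in results])
```

The same pattern sits behind reverse image search and recommendations: embed the input, then ask for the nearest stored vectors.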
Popular Vector DBMS Choices
- Chroma (Open source and free)
Coming from a SQL Background
Okay, so you're coming from a SQL background and you're looking to add AI capabilities to your app. You have two options:
- Integrate with a Vector Database
- If you already use Postgres, give pgvector a try
I recommend you take a look at the most popular Vector database choices (above) to start experimenting with integration into your application.
If you're building or planning to build an AI-powered app, I suggest you get familiar with some of the popular vector databases. Sooner or later you'll find yourself working with embeddings and vectorized data.
Frequently asked questions
Which database do I use for my new project?
It depends. But you'll most likely be working with both traditional and vector databases if you need to store and retrieve vector data.
Do I need to move my data from an SQL to a Vector Database?
As mentioned above, the answer is no. You can (and should) keep your existing SQL database. However, you may decide to run a separate vector database if you need to store and retrieve high-dimensional vector data.
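One way to picture the two living side by side: the SQL database keeps the rows, and the vector store only keeps an ID and an embedding per row. The sketch below assumes an in-memory SQLite database in place of your real SQL database and a plain dictionary in place of a vector database. The table, the titles, and the vectors are all made up for illustration.

```python
import math
import sqlite3

# Existing relational data stays put (in-memory SQLite stands in for your SQL database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Caring for puppies"), (2, "Dog training basics"), (3, "Changing a car tire")],
)

# Hypothetical stand-in for a vector database: row id -> made-up embedding.
vector_index = {
    1: [0.8, 0.2, 0.1],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search_articles(query_vector, k=2):
    """Rank ids by vector similarity, then fetch the matching rows from SQL."""
    ranked_ids = sorted(
        vector_index,
        key=lambda row_id: cosine_similarity(query_vector, vector_index[row_id]),
        reverse=True,
    )[:k]
    titles = []
    for row_id in ranked_ids:
        row = conn.execute("SELECT title FROM articles WHERE id = ?", (row_id,)).fetchone()
        titles.append(row[0])
    return titles

print(search_articles([1.0, 0.0, 0.0]))
```

Nothing moves out of the SQL database here. The vector side is an extra index over the same records, joined back through the primary key.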
Drop your comments or questions below and follow me on X (formerly Twitter) for more updates!
Thanks for reading.