Why LlamaIndex, Gemini, and Pinecone? Why not! Some blog members requested that I ditch OpenAI and write about other models, so there you have it.
What we're going to build
For this tutorial, I will show you how to build a simple Python app that uses Gemini Pro, LlamaIndex, and Pinecone to do the following:
- Scrape the contents of this post using LlamaIndex's BeautifulSoupWebReader data loader
- Convert the contents to vector embeddings using Gemini's embedding model
- Store the embeddings on a Pinecone index
- Load the Pinecone index and perform a similarity search
- Use LlamaIndex to query the index and generate an answer with Gemini Pro
But wait! Before we get our hands dirty, let's go over the following essential definitions so you know what I'm talking about.
A quick roundup of the basics
What is Google Gemini?
Google Gemini is a multimodal large language model (or LLM) developed by Google DeepMind. It comes in three flavors (Ultra, Pro, and Nano) and understands information across various data types: text, code, audio, image, and video. It's the GPT killer from Google.
What is LlamaIndex?
LlamaIndex is a data framework that helps integrate private data with LLMs to build RAG apps. It offers tools such as data ingestion, indexing, and querying. The framework is available in Python and TypeScript.
What is Pinecone?
Pinecone is a high-performance vector database for machine learning, natural language processing, and computer vision. It offers low-latency vector search and is known for its scalability and ability to handle high-dimensional data.
What is RAG (Retrieval-Augmented Generation)?
RAG is a technique that enhances large language models by adding external knowledge, such as information from PDFs or websites (like we're doing today).
The architecture retrieves relevant facts from a data source outside the model's training data, expanding the model's knowledge.
Here's my best attempt at designing a diagram that shows the RAG architecture:
From theory into practice
Great! Now that the boring stuff is out of the way, it's time to get our toolkit and write some code.
We're going to start by creating a free Pinecone account.
How to create a Pinecone index
Let's set up our Pinecone vector database and index. We'll use the index to store and retrieve the data.
Step 1: Create a free Pinecone account
Head over to the Pinecone site and create a free account. Pinecone offers a free tier, which is limited to one project and one index.
Step 2: Create a Pinecone project
Pinecone automatically creates a starter project for you. We're going to use it to create our index.
To create the index, click on the Indexes tab on the sidebar, then click on the Create Index button as shown in the image below:
Next, choose any name for the index, make sure to enter 768 as the value for the Dimensions property, and choose cosine from the Metric dropdown. You can leave everything else as-is.
Here's what your index configuration should look like:
Now, click on the Create Index button to create your first Pinecone index.
Alternatively, you can create the index programmatically by calling the create_index method of the Pinecone client and passing the configuration as parameters.
Step 3: Grab your Pinecone API key
Okay, with our index set up and ready to go, we'll need to grab our API key from the Pinecone console.
Go to the API Keys tab from the sidebar as shown below and copy your key:
How to get a Google Gemini API key
Head over to Google AI Studio and sign in with your Google account. Once inside, click on Get API key from the sidebar and copy your API key as shown below:
Good! Now we have our Gemini and Pinecone API keys, which is all we need to jump to the next part and write some code.
Building a RAG pipeline using LlamaIndex
I'm using macOS Ventura 13.6.4, but the steps below should be more or less the same for you. In case you run into problems, drop a comment below and I'll try to help.
Step 1: Prepare the environment
Let's create our project directory and our main.py file. In your terminal and from your working directory, type the following commands:
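Something like the following should do; the directory name gemini-rag-app is just a suggestion:

```shell
mkdir gemini-rag-app && cd gemini-rag-app
touch main.py
```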
Now, let's create and activate a virtual environment. This step is optional, but I highly recommend it. To do so, type the following commands in your terminal:
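On macOS or Linux, that typically looks like this (on Windows, the activate script lives under venv\Scripts instead):

```shell
# Create a virtual environment named "venv" and activate it
python3 -m venv venv
source venv/bin/activate
```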
Next, we'll need to download and install the required packages. To install them using pip, type the following command in your terminal:
It may take a few seconds for pip to download the packages and set everything up. If all goes well, you should see a success message.
Step 2: Import libraries and define API keys
We'll need to import a few libraries and take care of some basics. Copy the code below into your main.py file:
Make sure to replace the placeholder values with the respective API keys that you copied earlier.
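A minimal version of that setup might look like this (the environment variable names are the ones the Pinecone and Gemini libraries conventionally read):

```python
import os

# Replace the placeholder values with the keys you copied earlier
os.environ["PINECONE_API_KEY"] = "YOUR_PINECONE_API_KEY"
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
```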
If you prefer, you can use the export command to assign your keys to environment variables like so:
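For example, in your terminal before running the app:

```shell
export PINECONE_API_KEY="YOUR_PINECONE_API_KEY"
export GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
```

This keeps the keys out of your source code, which is generally the safer habit.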
Step 3: Create a Pinecone client
To send data back and forth between the app and Pinecone, we'll need to instantiate a Pinecone client. It's a one-liner:
Step 4: Select the Pinecone index
Using our Pinecone client, we can select the index that we previously created and assign it to a variable:
Step 5: Download webpage contents
Next, we'll need to download the contents of this post. To do this, we will use the LlamaIndex BeautifulSoupWebReader data loader:
Step 6: Generate embeddings using Gemini
By default, LlamaIndex assumes you're using OpenAI to generate embeddings. To configure it to use Gemini instead, we need to set up the service context, which lets LlamaIndex know which llm and which embedding model to use.
Calling set_global_service_context sets the service context as the global default, meaning that we will not need to pass it to every step in the LlamaIndex pipeline.
Step 7: Generate and store embeddings in the Pinecone index
Using the VectorStoreIndex class, LlamaIndex takes care of sending the data chunks to the embedding model and then handles storing the vectorized data in the Pinecone index.
Alternatively, if you want to load your existing index, you can use the from_vector_store method of the VectorStoreIndex class as follows:
Notice the storage_context: it contains the vector store information, which includes the Pinecone index we created earlier.
Step 8: Query Pinecone vector store
Now the contents of the URL are converted to embeddings and stored in the Pinecone index.
Let's perform a similarity search by querying the index:
The query_engine uses the query to find similar information from the index. Once it does, the information is passed to the Gemini Pro model, which generates an answer.
Now you've officially built a RAG pipeline using LlamaIndex. Congrats!
Step 9: Print the response
And that's all! Now let's run our Python app. In your terminal window, type the following command:
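Assuming your script is named main.py as above (use python3 if that's how Python is invoked on your system):

```shell
python main.py
```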
Here's what Gemini Pro has to say about what I think of LlamaIndex:
The author thinks that LlamaIndex is a powerful data framework that supports a wide range of large language models. It is incredibly simple to extract and store custom data in a Pinecone vector store. Moreover, LlamaIndex simplifies querying the index and makes communicating with Google's Gemini Pro model easy.
Full source code
Beautiful! You made it this far, which means you like the content, which in turn means that you MUST subscribe to the blog. It's free and you can unsubscribe anytime.
LlamaIndex is a powerful data framework that supports a wide range of large language models. As you've seen, it is incredibly simple to extract and store your custom data in a Pinecone vector store.
Moreover, LlamaIndex simplifies querying your index and makes communicating with Google's Gemini Pro model easy.
If you have any questions, suggestions, or anything else please drop a comment below. Also, follow me on X (Twitter) for more updates.
More from Getting Started with AI
- An introduction to RAG tools and frameworks: Haystack, LangChain, and LlamaIndex
- The beginner's guide to start using the new Gemini Pro models in Google AI Studio
- Big news: Gemini the GPT rival from Google DeepMind is finally here
- LlamaIndex: A closer look into storage customization, persisting and loading data
- LlamaIndex: Using data connectors to build a custom ChatGPT for private documents
- Introduction to augmenting LLMs with private data using LlamaIndex
- From traditional SQL to vector databases in the age of artificial intelligence