What is the difference between fine-tuning and vector embeddings

In this post, we're going to define what fine-tuning and vector embeddings are and look at which approach could be better suited for your specific use case.

What is the difference between fine-tuning and vector embeddings
Fine-tuning vs. Vector Embeddings: Which one to choose?
This is an introductory post about the difference between fine-tuning and vector embeddings in the context of natural language processing (NLP) applications. If you want to go straight to the conclusion, just scroll to the "The Big Question" part of this post.


This question pops up a lot, especially when you aim to enhance the output of an LLM, whether by providing it with additional information beyond what it already knows, or when you intend to make it more specialized in one or more specific areas.

Fine-tuning vs. Embeddings

  • Which one is better?
  • Which one should I choose?
  • Can I use them together?

Well to answer these questions, we're going start by defining what fine-tuning and embeddings are, then look at some common use cases so that we have a better top-level understanding of the ideas and concepts behind these terms.

What is Fine-tuning?

Large Language Models (or LLMs for short) are trained on a vast amount of data, and while models (like GPT-4) are able to generate great output to your queries they sometimes do not respond as you'd want them to.

Let's assume you want ChatGPT to respond as if it were "you". And let's also assume that you're a young sarcastic person and into tech.

In order to do that, we'll need to improve our query by including this information so that the LLM knows how to respond.

Let's see how.

Writing a Better Prompt

To get what we want, we can usually improve our prompt by doing at least the below:

  1. Ask the model to answer in a specific tone.
  2. Give the model a persona (such as: "You are a software engineer") 

Take this query for example: "Is AI going to take over the world?". If we should submit it as-is, ChatGPT will respond as it always does. To tweak it to behave like we want it to, we'll change the query to become:

"You are a sarcastic young person who likes software, and technology. You tend to make jokes in your answers. Respond to this: Is AI going to take over the world".

To which ChatGPT answered:

Oh, definitely! World domination by calculators is just around the corner. 🤖😉

Ok, that's not the greatest example in the world, but as you can see this is definitely a different answer than if we were to query ChatGPT without all the added context.

Fine-Tuning Process

Basically, fine-tuning teaches the model to always answer like you. Just like we did in the previous example, except we won't provide it with the context in each query but we'll instead train it using examples of how we want it to respond so it can always do that by itself.

Fine-tuning involves the following steps:

  1. Preparing and uploading training data
  2. Training a new fine-tuned model
  3. Using the fine-tuned model
Fine-tuning takes considerable time and effort. That is why, it is generally recommended to improve results using prompt engineering and then implement fine-tuning if the model is unable to yield the required results.

Fine-Tuning Use Cases

See the official OpenAI documentation:

Some common use cases where fine-tuning can improve results:

Setting the style, tone, format, or other qualitative aspectsImproving reliability at producing a desired outputCorrecting failures to follow complex promptsHandling many edge cases in specific waysPerforming a new skill or task that’s hard to articulate in a prompt

What are Vector Embeddings?

Vector Embeddings are numerical vector representation of words. They are not only limited to text but can also represent images, videos, and other types of data.

The main benefits of using Vector Embeddings, are:

  1. Mathematical representation of our data.
  2. Relationships within our data are preserved.
  3. Data is compressed.

Great, next we're going to take a high-level look at how embeddings are created, and used in the context of Natural Language Processing (or NLP).

Embedding Models

Embedding Models are specialized models that are able to convert a collection of data into vector representations (or Embeddings). They take in unstructured data and then convert it to a numerical representation.

So for example the vectorized version of the word "hello" is: [0.2, 0.5, -0.1]

The added value of this, is that we'll be able to quickly look up similarities with other vector embeddings. Some common mathematical measures for doing that are the cosine similarity and the dot product.

Embeddings are key components in various NLP applications that range from sentiment analysis to text classification and more.

Popular embedding models include OpenAI Embeddings, and Word2vec.

Embedding Databases

Just like there are SQL and NoSQL databases that handle storing data. There are specialized embedding databases that make working with embeddings efficient. Some popular embedding databases include:

Vector Embeddings Use Cases

  • Search engines recommend similar queries to yours that may also be relevant.
  • Sentiment analysis uses vector embeddings to determine the polarity of a given text (positive, negative, or neutral).
  • Q&A over documents such can easily look up similar content based on your query input.

LlamaIndex and LangChain

LlamaIndex or LangChain are tools that enable easy integration between a Large Language Model (such as GPT) and your private data, such as PDF, Databases, APIs, and more. They achieve this by using embedding models to build embeddings from your data sources during the ingestion phase, and then look for similarities during the query phase which retrieves related information that is then finally passed to the LLM as context to generate a specific answer to a question.

This process makes vector embeddings ideal in cases where you'd want LLMs to generate specific answers to a question from your own data sources.

If you've ever stumbled upon an AI tool that lets you chat with your documents, it's definitely using Vector Embeddings.

Further Reading

  • For a complete list of posts about LangChain, please check out this page.
  • For a complete list of posts about LlamaIndex, please check out this page.

The Big Question

So, which one should you choose?

If you're looking for exact answers to your queries go with Vector Embeddings. However, if you're looking to make a model respond in a certain way, or style a better approach would be to fine-tune that model to your liking.

Keep in mind that vector embeddings and fine-tuning are not mutually exclusive; in fact, they are often used together in various natural language processing (NLP) applications.

I hope this clears up any confusion that you may have had and please do share your comments or questions below and I'll be more than happy to respond!

I'm also posting about similar topics on X (formally Twitter). If you're on the platform, follow me for more!