How to Integrate and Handle LLM Memory using LangChain

LLMs are stateless, meaning they do not have memory that lets them keep track of conversations. However, using LangChain we'll see how to integrate and manage memory easily.


Introduction

So, you've been playing around with LangChain and LLMs, or you're in the process of building your AI-powered app. Awesome! If you're looking to find out how to manage LLM memory in LangChain, this post is about just that.

πŸ‘‹
New to LangChain? Start with this introductory post first. It'll give you a great overview of everything you need to know before diving in. Come back when you're done!

The Need for Memory

Suppose you're talking to your friend about a movie they recently saw. However, your friend has no memory: they can only respond to the question in front of them, without remembering anything you were just talking about.

Here's the conversation:

You: "Hey, did you catch a movie last night?"
Your Friend: "Oh, absolutely! best movie I've seen."
You: "What was it about?"
Your Friend: "What was what about?"

You get the picture. Your friend can't maintain a conversation because they have no memory.

(My wife says I don't have a memory as well...) πŸ‘€

Since LLMs are stateless by default, they are completely unaware of any context or conversations you've previously had with them, so they don't remember what you've been talking about. That's why it's critical to tell the Large Language Model, with each new prompt, what has already been said so that it can maintain an actual conversation with you.
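
To make this concrete, here's a rough, framework-free sketch of what that could look like: we keep the history ourselves and prepend it to every new prompt. The ask_llm function below is just a stand-in for whatever LLM client call you'd actually use.

# A naive sketch: keep the history ourselves and prepend it to each prompt
# so the model "remembers" the conversation.

history = []  # list of (speaker, text) tuples

def ask_llm(prompt):
    # Stand-in for a real LLM call (OpenAI, Anthropic, etc.)
    return "..."

def ask_with_memory(user_input):
    # Rebuild the whole conversation as plain text
    context = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    prompt = f"{context}\nHuman: {user_input}\nAI:"
    answer = ask_llm(prompt)
    # Store both sides of the exchange for the next turn
    history.append(("Human", user_input))
    history.append(("AI", answer))
    return answer

Keeping track of all this by hand gets tedious quickly, which is exactly what LangChain's memory classes take care of for us.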

LLM Memory in LangChain

The code below is written in Python, so you'll just need to be a little familiar with the syntax.

Okay, let's get started.

Loading OpenAI API Key

We're going to load our API key from a .env file, which is the recommended way of doing it. You can of course just hard-code it directly in your application, but that's best avoided.

  1. In your project directory, run touch .env
  2. Open the .env file, add OPENAI_API_KEY="..." and save.
  3. Install python-dotenv from the terminal by running: pip install python-dotenv
  4. Create an app.py file and paste the following:
import os

from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()
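
Once load_dotenv() runs, the key is available as a regular environment variable, which is where ChatOpenAI looks for it by default. As an optional sanity check, you can read it back with os.getenv:

# Optional sanity check: make sure the key was actually loaded.
# ChatOpenAI picks up OPENAI_API_KEY from the environment automatically,
# so there's no need to pass it in explicitly.
if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("OPENAI_API_KEY not found - check your .env file")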

Importing Modules

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
  • ChatOpenAI creates a new chat model instance with the arguments you pass.
  • ConversationChain runs a conversation and loads context from memory.
  • ConversationBufferMemory stores the conversation history.

Conversing with the Model

llm = ChatOpenAI(temperature=0.2)
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=False)

We've set up our llm using the default OpenAI settings. The temperature parameter controls how random the responses are: lower values make the output more focused and deterministic, while higher values make it more creative.

Okay, now let's have the same conversation we had with our friend earlier:

print(conversation.predict(input="Hey, did you catch a movie last night?"))

>>> Yes, I watched the Batman movie last night. It was quite entertaining and action-packed.


Great answer. Okay, let's follow up with another related question now:

print(conversation.predict(input="What was it about?"))

>>> The Batman movie was about Batman's quest to protect Gotham City from a new villain.


As you can see, the LLM was able to respond to a follow-up question. How did this happen? Well, let's take a look at this line of code again:

conversation = ConversationChain(llm=llm, memory=memory, verbose=False)

You can see here that we're passing a ConversationBufferMemory() object to the ConversationChain's memory parameter. This enables LangChain to keep track of each exchange (our queries and the model's responses) by storing it in a list.

πŸ’‘
Since LangChain is storing all queries in memory using ConversationBufferMemory(), it is able to send the full context with each subsequent query.
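
If you want to watch this happening, recreate the chain with verbose=True. LangChain will then print the full prompt, including the accumulated history, every time you call predict:

# Same chain as before, but verbose=True prints the full prompt
# (conversation history included) that is sent to the LLM on each call
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
print(conversation.predict(input="Was it worth watching?"))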

Manipulating Memory

Now let's assume that we want to manipulate the memory variables. Luckily LangChain provides a few methods that can help us do that.

We're going to look at these ConversationBufferMemory methods, specifically:

  • load_memory_variables
  • save_context
  • clear

memory.load_memory_variables

By calling the memory.load_memory_variables({}) method, we can take a closer look at exactly what the memory contains: all the inputs and outputs, meaning everything we've sent to the LLM as well as all of the LLM's responses.

Let's do that below:

print(memory.load_memory_variables({}))

Here's the history:

{'history': "Human: Hey, did you catch a movie last night?\nAI: Yes, I watched the Batman movie last night. It was quite entertaining and action-packed.\nHuman: What was it about?\nAI: The Batman movie was about Batman's quest to protect Gotham City from a new villain."}
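
Under the hood, the buffer is a list of message objects. If you'd rather work with those than with the single history string, you should be able to reach them through the memory's chat_memory attribute (this is where ConversationBufferMemory keeps them in recent LangChain releases):

# The same history, as the raw HumanMessage/AIMessage objects
# that ConversationBufferMemory stores internally
for message in memory.chat_memory.messages:
    print(type(message).__name__, "->", message.content)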

memory.save_context

Now let's assume you want to manually add context to the memory. We can make use of the save_context method like this:

memory.save_context({"input": "Assume Batman was actually a chicken."}, {"output": "OK"})

Since we manually added context to the memory, LangChain will append this new information to the conversation history and pass it along to the LLM with the next query.

To test this, we can run the predict method and wait for the answer:

print(conversation.predict(input="Is Batman a human?"))

>>> No, Batman is not a human. He is a chicken in this scenario.

Awesome. Sorry Batman. πŸ”


memory.clear

Calling the clear method wipes the context and removes everything stored in memory.
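
Let's clear the memory and check the buffer again:

# Wipe the entire conversation history
memory.clear()

print(memory.load_memory_variables({}))

>>> {'history': ''}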


Managing Memory Size

You might be wondering: what happens if a user has a very long conversation with the LLM? At some point the context becomes too large, which makes every query more expensive. One way to deal with this is to use ConversationBufferWindowMemory instead of ConversationBufferMemory.

From the official docs:

ConversationBufferWindowMemory keeps a list of the interactions of the conversation over time. It only uses the last K interactions. This can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

That's exactly what we need, let's take a look at how we can implement this:

# Import from LangChain memory
from langchain.memory import ConversationBufferWindowMemory

# Instantiate the memory object from `ConversationBufferWindowMemory` instead of `ConversationBufferMemory`
llm = ChatOpenAI(temperature=0.2)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(llm=llm, memory=memory, verbose=False)

In this example, the context will only hold one interaction (hence the k=1 parameter). Meaning that, if we took our previous conversation, LangChain would only have information about the most recent exchange.

Of course, you can manipulate the k value to suit your specific needs.
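
To see the window in action, have a couple of exchanges and then inspect the buffer. With k=1, only the most recent exchange survives:

conversation.predict(input="Hey, did you catch a movie last night?")
conversation.predict(input="What was it about?")

# With k=1, only the last Human/AI exchange remains in the buffer
print(memory.load_memory_variables({}))

The printed history will contain only the second question and its answer; the first exchange has already been dropped from the window.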

Wrapping Up

That's all folks! In this post, we've seen how to work with memory using LangChain and how we can manipulate and customize the context used to query the LLM.

If you've enjoyed this post, please go ahead and subscribe so you get the latest information before anyone else does! Once you subscribe, you'll get notified by email as soon as a new post is available, so you're always up to date.

⚠️
There are many more memory features that LangChain provides, for a complete list please go through the official docs here.

Membership is completely free and just requires your email and name. Most importantly, I'll never send you spam.

Thanks for reading!



Frequently Asked Questions

How does memory work in LangChain?

LangChain stores human input and AI output in a list known as memory. Since LLMs are stateless and are only aware of the latest query, LangChain sends the context (which means the full conversation history) with every new query to the LLM.

Does LangChain manage memory out of the box?

Yes, it does. LangChain takes care of memory out of the box. It keeps track of human input and AI output so that conversations remain relevant.

Can I customize memory using LangChain?

Yes, you can. You can easily customize memory storage by using the load_memory_variables, save_context, and clear methods of the ConversationBufferMemory class. You can also limit the buffer size by passing a k=int value to the ConversationBufferWindowMemory class.