All things GPT in the age of AI with ChatGPT, AutoGPT, and AgentGPT

In this post, we take a look at what ChatGPT is, its potential, and how agents could be used in the future to perform a wide range of tasks.

We have now entered into the GPT and Agents era

Unless you've been living under a rock for the past couple of months, you must have heard about ChatGPT. But if you haven't, I'll ask ChatGPT to introduce itself:

Hello! I am ChatGPT, a large language model developed by OpenAI, based on the GPT-3.5 architecture. I am designed to understand natural language input and generate human-like responses to a wide range of prompts and questions. Essentially, I am an AI-powered conversational agent that can chat with users to answer their queries, provide information, and assist them in various tasks. I have been trained on a massive amount of text data from the internet and other sources, which allows me to understand and generate responses to a diverse set of topics.

That introduction was written entirely by ChatGPT, and the first thing you notice is its strikingly human-like writing style, which gives us a glimpse of its copywriting capability.

Just to be clear, ChatGPT itself is just an interface sitting on top of a Large Language Model (LLM) called GPT. The model takes text as input and produces a well-structured response by repeatedly predicting the most likely next word (more precisely, the next token, which can be a word or a fragment of one) given everything written so far. (We'll dive deeper into the technicalities in another post.)
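To make next-word prediction a bit more concrete, here is a toy sketch in Python. It is not how GPT works internally: the probability table is invented for this example, whereas a real model computes those probabilities with a neural network. It only shows the basic loop of picking a likely next word, appending it, and repeating.

```python
# Toy illustration of greedy next-word prediction.
# NEXT_WORD_PROBS is made up for this example; a real LLM computes
# these probabilities with a neural network over tokens.
NEXT_WORD_PROBS = {
    "<start>": {"hello": 0.7, "goodbye": 0.3},
    "hello": {"world": 0.6, "there": 0.4},
    "world": {"<end>": 1.0},
    "there": {"<end>": 1.0},
    "goodbye": {"<end>": 1.0},
}


def generate(max_words: int = 10) -> str:
    """Repeatedly pick the most likely next word until <end> is reached."""
    words = []
    current = "<start>"
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS[current]
        # Greedy choice: take the word with the highest probability.
        current = max(candidates, key=candidates.get)
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)


print(generate())
```

Real models usually sample from the probabilities rather than always taking the top choice, which is one reason the same prompt can produce different answers.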

GPT and the Internet

The model itself is not connected to the internet directly, and it was trained on data up until September 2021. So if you ask it about anything after that date, it'll just tell you that it's stuck in 2021:

I'm sorry, but I cannot answer this question as it is currently the year 2021, and I am not capable of predicting future events or developments. As an AI language model, my knowledge cutoff is in September 2021, so any events that occur after that time are unknown to me.

Some people argue that since the model is not connected to the internet, it has limited capabilities and is unable to go outside of its "box". Recently, however, OpenAI, the company behind the technology, announced that it is adding plugin support to ChatGPT. Essentially, plugins will connect ChatGPT to specialized services. If you think about it, ChatGPT already has all it needs to understand and interpret human language, so it can act as a brain that drives the connected tools and services around it. If we take a look at the official website, we can already see plugins from Expedia, Instacart, OpenTable, and others.

Take a look at Expedia's tweet showcasing the plugin:

At this point, it's safe to say goodbye to your travel agent. Actually, you will say goodbye to many of the other assistant-style services that we know today. This is just one example, but these models, and by extension the tools around them, will eventually take over most, if not all, jobs.

Experimentation Time

OK, so you're unable to decide where to eat out on Friday with your spouse. In the coming months, we'll probably be able to send this to ChatGPT: "Book us a nice table for two this Friday at 6:00 PM. Choose any cuisine, it doesn't matter. Send the confirmation by email." - And it'll be done, just like that. We're not there yet, but it's really just a matter of time.

Let's kick it up a notch and ask ChatGPT to interpret the intent of our prompt:

ChatGPT explaining the intent of the prompt

OK, impressive but still, how will it make the reservation? Well, let's ask it to convert the intent to something that another service can understand, typically a JSON object.

💡
If you're not familiar with the term, a JSON object is a popular format that is used by computers to talk to each other. Think of it like a common computer language that enables the exchange of information between different websites, mobile apps, and other services.
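For the curious, here is a small Python sketch of that exchange. The reservation fields are invented for illustration; the point is only that one side turns its data into JSON text, and the other side turns that text back into data it can work with.

```python
import json

# Hypothetical reservation request, made up for this example.
request = {"intent": "book_reservation", "party_size": 2, "time": "18:00:00"}

# The sender serializes its data into a JSON string...
text = json.dumps(request)

# ...and the receiver parses that string back into structured data.
received = json.loads(text)

print(text)
print(received["party_size"])
```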

So let's go ahead and ask ChatGPT to convert our intent to a JSON object:

ChatGPT converting a chat message to a JSON like a champ!

I then asked it to add some actions (or steps) that can be taken in order to achieve the goal of making a reservation. It said "Certainly!" and adjusted the JSON to this:

{
  "intent": "book_reservation",
  "steps": [
    {
      "description": "Search for available restaurants",
      "service": "restaurant_api",
      "action": "search",
      "parameters": {
        "date": "2023-05-05",
        "time": "18:00:00",
        "party_size": 2,
        "cuisine": "any"
      }
    },
    {
      "description": "Select a restaurant",
      "service": "restaurant_api",
      "action": "select",
      "parameters": {
        "restaurant_id": "<restaurant_id>"
      }
    },
    {
      "description": "Book a table",
      "service": "reservation_api",
      "action": "book",
      "parameters": {
        "date": "2023-05-05",
        "time": "18:00:00",
        "party_size": 2,
        "restaurant_id": "<restaurant_id>"
      }
    },
    {
      "description": "Send confirmation email",
      "service": "email_api",
      "action": "send",
      "parameters": {
        "to": "example@example.com",
        "subject": "Reservation Confirmation",
        "body": "Your reservation for 2 at <restaurant_name> on <date> at <time> has been confirmed."
      }
    }
  ]
}

For the non-technical people reading this, it basically gave us four steps to complete to achieve our goal:

  1. Search for available restaurants
  2. Select a restaurant
  3. Book a table
  4. Send confirmation email

For each step above, it provided a service and an action. The service represents a plugin, and the action represents what that plugin should do. ChatGPT took the requirements from our simple prompt, analyzed them, and transformed them into something computers can work with. In other words, it bridged the gap between human language and computer language, removing the barrier between the two.
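For the technical readers, here is a minimal sketch of how a program might consume a plan like this one. Everything in it is an assumption made for illustration: the handler functions are stubs, the restaurant ID is invented, and the plan is trimmed to two steps. It does not reflect how ChatGPT plugins are actually implemented; it only shows the idea of mapping each service and action pair to a function and running the steps in order.

```python
import json

# Trimmed version of the plan above (two of the four steps).
PLAN_JSON = """
{
  "intent": "book_reservation",
  "steps": [
    {"service": "restaurant_api", "action": "search",
     "parameters": {"date": "2023-05-05", "time": "18:00:00",
                    "party_size": 2, "cuisine": "any"}},
    {"service": "reservation_api", "action": "book",
     "parameters": {"date": "2023-05-05", "time": "18:00:00",
                    "party_size": 2, "restaurant_id": "<restaurant_id>"}}
  ]
}
"""


# Stub handlers standing in for real plugins (hypothetical, no network calls).
def search_restaurants(params, context):
    context["restaurant_id"] = "r-123"  # invented ID for the example
    return f"found restaurant {context['restaurant_id']}"


def book_table(params, context):
    # Use the ID found by an earlier step, else the plan's placeholder.
    restaurant_id = context.get("restaurant_id", params["restaurant_id"])
    return f"booked table for {params['party_size']} at {restaurant_id}"


# Each (service, action) pair maps to the function that performs it.
HANDLERS = {
    ("restaurant_api", "search"): search_restaurants,
    ("reservation_api", "book"): book_table,
}


def run_plan(plan):
    """Run each step in order, sharing results through a context dict."""
    context = {}
    log = []
    for step in plan["steps"]:
        key = (step["service"], step["action"])
        handler = HANDLERS.get(key)
        if handler is None:
            raise ValueError(f"no handler registered for {key}")
        log.append(handler(step["parameters"], context))
    return log


results = run_plan(json.loads(PLAN_JSON))
for line in results:
    print(line)
```

The shared context dictionary is one simple way to fill in placeholders such as the restaurant ID, which is only known once the search step has run.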

This is huge! Today it's a chat window, tomorrow it's an implant in your brain that can process your thoughts and take physical actions on your behalf. Like turning on a light bulb by just thinking about it, or maybe opening the garage door, or sending an email using not Outlook or Gmail, but your mind...

💡
You might think that this scenario is light years away, but you'd be wrong. Elon Musk's Neuralink brain implants were recently approved for human trials. It's almost impossible not to see these two technologies working together at some point in the future.

All things GPT

Recently, I came across a few noteworthy projects, namely AutoGPT and AgentGPT, which implement a basic version of the concept discussed in this article. These projects take an end goal as input, devise a plan to accomplish it, and execute each step to the best of their ability. You won't be able to use them to generate a billion dollars yet.

Steps are executed using plugins or third-party API integrations. You can check them out right now if you'd like. For AutoGPT, you'll need some technical knowledge since it does not have a web interface.

AgentGPT, on the other hand, offers a web interface but requires you to sign up to try the service.
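The goal, plan, and execute loop that projects like these implement can be sketched roughly as follows. This is not code from AutoGPT or AgentGPT: the planner is hard-coded where a real agent would ask the language model, and the tools are stubs that return fixed strings. All names here are hypothetical.

```python
from typing import Callable, Dict, List


def plan_for(goal: str) -> List[str]:
    """Stand-in for the LLM call that would break a goal into steps."""
    # Hard-coded for illustration; a real agent would ask the model.
    return ["search", "book", "email"]


# Stub tools standing in for plugins or third-party API integrations.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda goal: "picked a restaurant",
    "book": lambda goal: "table booked",
    "email": lambda goal: "confirmation sent",
}


def run_agent(goal: str) -> List[str]:
    """Plan the goal, then execute each step with the matching tool."""
    results = []
    for step in plan_for(goal):
        tool = TOOLS.get(step)
        if tool is None:
            # Best effort: record the gap and carry on with the next step.
            results.append(f"{step}: no tool available, skipped")
            continue
        results.append(f"{step}: {tool(goal)}")
    return results


outcome = run_agent("Book a table for two this Friday at 6:00 PM")
for line in outcome:
    print(line)
```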

Conclusion

We're at a critical turning point for humanity: we can now interact with computers in a completely natural way, using our own language. The power of these tools will grow rapidly as more and more people, businesses, and governments adopt them. This will definitely have a positive impact on our lives, but at what cost?