
The efficient use of tokens for multi-agent systems

Jonathan Aston
Oct 1, 2024

What are multi-agent systems?

Multi-agent systems with AI are systems in which autonomous agents equipped with AI capabilities work together to achieve a desired outcome. An agent in this context can be as generic as an entity acting on another entity’s behalf. In multi-agent AI systems, AI agents (bots) cooperate to help achieve the goals of the people who own the processes and tasks.

How do tokens work?

Put simply, a token is a piece of a word or text that serves as input for a large language model (LLM) like ChatGPT. All passages of text are broken into tokens, but not every word maps to a single token; some words are broken down further. For example, the word “chat” is one token, but a longer word like “tokenization” might be split into several tokens.

When you input text into ChatGPT, for example, the text is converted into a sequence of tokens in a process called tokenization. The model processes these tokens and generates a sequence of output tokens, which are then converted back into text.
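
If you want to see exactly how a piece of text is split, OpenAI’s open-source tiktoken library lets you tokenize locally. Here is a minimal sketch (assuming tiktoken has been installed with pip); the exact split depends on the tokenizer used by the model:

```python
# Minimal sketch: counting tokens locally with OpenAI's tiktoken library.
# The exact split depends on the tokenizer used by the model.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

for text in ["chat", "tokenization"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([tid]) for tid in token_ids]  # text fragment for each token
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```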

Why does understanding tokenization matter?

Tokenization matters because models have token limits, and because pricing is determined by the number of tokens in the input and output.

Models like GPT-3.5-turbo have a maximum number of tokens they can process in a single request. For instance, the original GPT-3.5-turbo could handle up to 4,096 tokens, which is around 3,000 words (input and output combined). These limits are placed on the models to ensure they work effectively and can respond quickly.

The number of tokens processed affects the computational resources required and the cost of using the model: the more tokens, the higher the cost.

Token limits are one of the reasons why retrieval-augmented generation (RAG) uses traditional search tools to select only the relevant information for the prompt, so that vast quantities of information can be processed efficiently.
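
As a naive illustration of that idea (not a full RAG pipeline), the sketch below assumes passages have already been retrieved and ranked most-relevant first, and simply keeps as many as fit within a token budget, counting tokens with tiktoken. The example passages are made up for illustration:

```python
# Naive sketch of the RAG idea: keep only as many retrieved passages as fit
# within a fixed token budget for the prompt. Retrieval and ranking are out of
# scope here; the passages are assumed to be ordered most-relevant first.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fit_to_budget(passages, budget_tokens):
    selected, used = [], 0
    for passage in passages:
        n = len(enc.encode(passage))
        if used + n > budget_tokens:
            break
        selected.append(passage)
        used += n
    return selected

# Hypothetical pre-ranked passages, most relevant first.
ranked_passages = [
    "Sensor 42 reported intermittent voltage drops over the last 24 hours.",
    "Maintenance history: sensor 42 was last recalibrated six months ago.",
    "Unrelated note: building B parking is closed this week.",
]
print(fit_to_budget(ranked_passages, budget_tokens=30))
```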

What are the costs?

The costs on paper may not seem high, but they can mount up, especially in multi-agent systems. Let’s explore the costs a little.

Here are the costs of two common models offered by OpenAI. You can see that there are substantial differences in cost between the two models (prices correct August 2024).

  • GPT-3.5-turbo-0125: $0.50 / 1M input tokens, $1.50 / 1M output tokens (Batch API: $0.25 / 1M input tokens, $0.75 / 1M output tokens)
  • GPT-4o: $5.00 / 1M input tokens, $15.00 / 1M output tokens (Batch API: $2.50 / 1M input tokens, $7.50 / 1M output tokens)

Source: https://openai.com/api/pricing
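
As a rough sketch, a per-request cost can be worked out directly from these per-million-token prices. The function below is illustrative only (the prices are hard-coded from the table above and will change over time):

```python
# Illustrative only: per-request cost from per-million-token prices (see table above).
PRICES_PER_1M = {
    "gpt-3.5-turbo-0125": {"input": 0.50, "output": 1.50},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request with the given token counts."""
    price = PRICES_PER_1M[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# Example: a small request of 13 input and 15 output tokens on GPT-4o.
print(f"${request_cost('gpt-4o', 13, 15):.6f}")  # fractions of a cent per request
```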

What do tokens look like in reality and how do they count up?

Here is a single-agent example with a total of 13 input tokens and 15 output tokens, which is not many:

Prompt: “I think we should replace the sensor immediately to avoid any issues.” (13 tokens)

Answer: “I agree, let’s replace the sensor right away to prevent any problems.” (15 tokens)

Here is a multi-agent example with five agents debating over three rounds. It uses a total of 1005 input tokens (approximately 77 times as many as the single-agent example, because the responses of all agents must be fed back to every agent each round) and 153 output tokens (around 10 times as many as the single-agent example); a short sketch after the three rounds shows how these totals add up:

Round 1 prompt: “I think we should replace the sensor immediately to avoid any issues.” (13 tokens × 5 agents = 65 tokens)

Round 1 answers: Initial opinions

  • Agent 1: “I think we should replace the sensor immediately to avoid any issues.” (13 tokens)
  • Agent 2: “I suggest we first run a diagnostic test to determine the fault.” (13 tokens)
  • Agent 3: “We might need to recalibrate the sensor instead of replacing it.” (11 tokens)
  • Agent 4: “Let’s review the sensor’s error logs before deciding.” (10 tokens)
  • Agent 5: “Perhaps the issue is with the wiring, not the sensor itself.” (11 tokens)

Round 2 prompt: Includes all the past answers from round 1 and the initial prompt (71 tokens × 5 agents = 355 tokens).

Round 2 answers: Arguments and counterarguments

  • Agent 1: “Replacing the sensor ensures we eliminate the fault quickly.” (10 tokens)
  • Agent 2: “Diagnostics can reveal if the sensor really needs replacement.” (10 tokens)
  • Agent 3: “Recalibration is cost-effective and might fix the issue.” (9 tokens)
  • Agent 4: “Error logs can provide insights into the root cause.” (8 tokens)
  • Agent 5: “Checking wiring first can save time if it’s not the sensor.” (9 tokens)

Round 3 prompt: Includes all the past answers from rounds 1 and 2 and the initial prompt (117 tokens × 5 agents = 585 tokens).

Round 3 answers: Further discussion and attempt to reach consensus

  • Agent 1: “If diagnostics show a major fault, we can then replace it.” (12 tokens)
  • Agent 2: “Agreed, diagnostics first, then decide the next steps.” (9 tokens)
  • Agent 3: “Can we perform diagnostics and recalibration simultaneously?” (10 tokens)
  • Agent 4: “Yes, and while doing so, review the error logs.” (9 tokens)
  • Agent 5: “And inspect the wiring as a part of the diagnostics.” (9 tokens)
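
To make the counting concrete, here is a short sketch that replays the structure above: in each round, every agent’s prompt is the initial prompt plus every answer produced so far, which is where the rapid growth in input tokens comes from. The token counts are the ones listed in the rounds above:

```python
# Replaying the token counting for the 5-agent, 3-round debate above.
# Each round, every agent receives the initial prompt plus all answers so far.
N_AGENTS = 5
INITIAL_PROMPT_TOKENS = 13
ANSWER_TOKENS = [              # per-agent answer lengths listed above
    [13, 13, 11, 10, 11],      # round 1
    [10, 10, 9, 8, 9],         # round 2
    [12, 9, 10, 9, 9],         # round 3
]

total_input = total_output = 0
history_tokens = 0             # tokens of all answers shared so far

for round_no, answers in enumerate(ANSWER_TOKENS, start=1):
    prompt_tokens = INITIAL_PROMPT_TOKENS + history_tokens
    round_input = prompt_tokens * N_AGENTS
    round_output = sum(answers)
    total_input += round_input
    total_output += round_output
    history_tokens += round_output
    print(f"Round {round_no}: {prompt_tokens} prompt tokens x {N_AGENTS} agents "
          f"= {round_input} input tokens, {round_output} output tokens")

print(f"Total: {total_input} input tokens, {total_output} output tokens")  # 1005 and 153
```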

Conclusions and the challenges of scale

We can see that the multi-agent example uses 1005 input tokens and 153 output tokens, a huge increase on the 13 input tokens and 15 output tokens of the single agent. The multi-agent system may well be worth the extra cost for the additional value in its responses, but costs can climb quickly when they are determined by token volume, so the architectural design of multi-agent systems should take this spiralling token use and cost into account. This becomes a much bigger issue when we have proactive agents seeking out work and frequently holding discussions with each other.

If we have one sensor and we need to make a decision every minute, what do these costs look like?

  • Single-agent (tokens per day and cost):
    • Input tokens: 13 × 60 × 24 = 18,720 (GPT-4o, non-Batch: 18,720 × $5 / 1,000,000 = $0.09)
    • Output tokens: 15 × 60 × 24 = 21,600 (GPT-4o, non-Batch: 21,600 × $15 / 1,000,000 = $0.32)
  • Multi-agent (tokens per day and cost):
    • Input tokens: 1005 × 60 × 24 = 1,447,200 (GPT-4o, non-Batch: 1,447,200 × $5 / 1,000,000 = $7.24)
    • Output tokens: 153 × 60 × 24 = 220,320 (GPT-4o, non-Batch: 220,320 × $15 / 1,000,000 = $3.30)

So, the cost per day of these two systems is $0.41 for the single-agent system and $10.54 for the multi-agent system, which is approximately 26 times more expensive. The difference becomes even greater when viewed per week or per month, and a larger number of sensors may well push volumes and costs up even further. So, do we abandon multi-agent systems, or can we mitigate these spiralling token costs?
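
The daily figures above can be reproduced with a few lines of arithmetic (GPT-4o non-Batch prices; small differences come from when the rounding is applied):

```python
# Reproducing the daily cost comparison: one decision per minute for a whole day,
# GPT-4o non-Batch prices ($5 / 1M input tokens, $15 / 1M output tokens).
DECISIONS_PER_DAY = 60 * 24
INPUT_PRICE = 5.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000

def daily_cost(input_tokens: int, output_tokens: int) -> float:
    return DECISIONS_PER_DAY * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)

single = daily_cost(13, 15)      # ~$0.42 (matches the $0.41 above when input and output costs are rounded separately)
multi = daily_cost(1005, 153)    # ~$10.54
print(f"single-agent: ${single:.2f}/day, multi-agent: ${multi:.2f}/day")
```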

Top tips

  • Use GPT-3.5-turbo instead of GPT-4o. This is a good option for simple tasks, and we have already seen that costs can be much lower for simpler models.
  • Use a model hosted by someone else for free. Services such as Groq offer free access to hosted models.
  • Use a local model such as LLaMA 7B. This involves downloading a model and running it locally, so the compute runs on your own infrastructure, can be managed yourself and could be cheaper. However, the models available for download today tend to be smaller and simpler, so a compromise on performance might have to be made with this option.
  • Use token limits. Most LLM APIs have a setting for capping the output tokens of a response, and this can have a significant downstream effect, especially if you are passing the entire dialogue to the next agent in a multi-agent system (see the sketch after this list).
  • Be careful when you use frameworks like CrewAI, as they employ a quality and context self-checking mechanism that updates the context and runs an extra query to check whether the agent has answered the question properly. This can double the token use in the system.
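
As an example of the token-limit tip, here is a minimal sketch using the OpenAI Python SDK (assuming an API key is set in the OPENAI_API_KEY environment variable); the max_tokens parameter caps the number of output tokens the model can generate for a single response:

```python
# Minimal sketch: capping output tokens with the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Should we replace the faulty sensor?"}],
    max_tokens=60,  # hard cap on output tokens for this response
)
print(response.choices[0].message.content)
```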

While multi-agent systems can deliver a lot of value, that increase in value and performance usually comes at a cost. Our conclusion is that good architectural design is essential if multi-agent systems are to be cost-effective.

About Generative AI Lab:

We are the Generative AI Lab, expert partners that help you confidently visualize and pursue a better, sustainable, and trusted AI-enabled future. We do this by understanding, pre-empting, and harnessing emerging trends and technologies. Ultimately, making possible trustworthy and reliable AI that triggers your imagination, enhances your productivity, and increases your efficiency. We will support you with the business challenges you know about and the emerging ones you will need to know to succeed in the future. One of our three key focus areas is multi-agent systems, alongside small language models (SLM) and hybridAI. This blog is part of a series of blogs, Points of View (POVs) and demos around multi-agency to start a conversation about how multi-agency will impact us in the future. For more information on the AI Lab and more of the work we have done visit this page: AI Lab.

Meet the author

Jonathan Aston

Data Scientist, AI Lab, Capgemini’s Insights & Data
Jonathan Aston specialized in behavioral ecology before transitioning to a career in data science. He has been actively engaged in the fields of data science and artificial intelligence (AI) since the mid-2010s. Jonathan possesses extensive experience in both the public and private sectors, where he has successfully delivered solutions to address critical business challenges. His expertise encompasses a range of well-known and custom statistical, AI, and machine learning techniques.