Understanding Token Consumption

This article explains what tokens are and how token consumption works when Kleo’s AI creates content for you.

If you've used Kleo to generate content, you may have come across the term “token.” We’ll break down what a token means in simple terms, and show how both your input and the AI’s output contribute to the total tokens used. By the end, you’ll have a clear sense of how your input length and Kleo’s response length affect token usage.

What is a Token?

Tokens are the basic building blocks of text that Kleo’s AI processes. In simple terms, a token is a small piece of text – it could be a whole word, part of a word, or just a few characters. For example, the sentence “AI is amazing” can be broken into three tokens: “AI”, “ is”, and “ amazing”. Even spaces and punctuation count as tokens – the AI takes those into account when processing text. In other words, tokens don’t always align exactly with whole words; a long word might be split into multiple tokens, and a short word or a single character can be a token by itself.

Here are a few rough rules of thumb for understanding tokens in everyday language:

  • 1 token ≈ 4 characters of English text (including letters, spaces, and punctuation).

  • 1 token ≈ 0.75 words (so 100 tokens is roughly 75 English words).

  • Common or short words often correspond to one token, while a long or complex word might be broken into several. For instance, the long word “supercalifragilisticexpialidocious” would be split into multiple tokens, but a short word like “AI” would be just one token.
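The rules of thumb above can be turned into a quick back-of-the-envelope estimator. This is only a sketch of the approximation, not the tokenizer Kleo actually uses – real token counts will differ:

```python
# Rough token estimates based on the rules of thumb above.
# These are approximations only; the AI's real tokenizer decides actual counts.

def estimate_tokens_by_chars(text: str) -> int:
    """Estimate token count as roughly 1 token per 4 characters."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """Estimate token count as roughly 1 token per 0.75 words."""
    words = len(text.split())
    return max(1, round(words / 0.75))

sentence = "AI is amazing"
print(estimate_tokens_by_chars(sentence))  # 13 characters -> ~3 tokens
print(estimate_tokens_by_words(sentence))  # 3 words -> ~4 tokens
```

The two estimates rarely agree exactly, which is why these are described as rough rules of thumb rather than a formula.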

Because tokens are a more consistent unit than words (which can vary greatly in length across languages and phrases), they are used to measure how much text the AI is handling. Large language models like the one behind Kleo break down all text into tokens in order to understand your input and generate a response. You can think of tokens as the “language units” the AI works with under the hood – the AI reads and writes text one token at a time.

How Does Token Usage Work in Kleo?

Whenever you use Kleo’s AI, tokens are consumed in two ways: through your prompt (input) and the completion (output) that the AI generates. In each interaction, total token consumption = tokens in your input + tokens in the AI’s response. Here’s how it breaks down:

  1. Your Input (Prompt) Tokens: This is the text you type or provide to Kleo. When you submit a request or question, the AI converts your input into tokens before processing it. For example, if you ask “How does AI work?” the system might break that into tokens like “How”, “ does”, “ AI”, and “ work?”. A short question like this uses only a handful of tokens.

  2. The AI’s Response (Completion) Tokens: After analyzing your prompt, Kleo’s AI generates an answer token by token. The output text is also composed of tokens. For instance, if the answer to the above question is “AI works by analyzing data.”, that reply would be tokenized into pieces like “AI”, “ works”, “ by”, “ analyzing”, and “ data.”. Just like the input, each part of the output counts toward the token total.

Because both your query and the AI’s answer use up tokens, longer inputs or more detailed outputs will increase the token count for a single request. In practical terms, if you send a lengthy prompt, it uses more tokens, and if you request a long or complex answer, the AI will produce more tokens to fulfill that. The total tokens “consumed” for that interaction is the sum of the prompt tokens and the completion tokens.
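Putting the two parts together, the per-request arithmetic is simple addition. The token counts below are illustrative figures for the example question and answer above, not output from a real tokenizer:

```python
# Total token consumption for one interaction = prompt tokens + completion tokens.
# The counts used here are illustrative, not real tokenizer output.

def total_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """Sum the input and output token counts for a single request."""
    return prompt_tokens + completion_tokens

# e.g. "How does AI work?" (~4 tokens) answered by
# "AI works by analyzing data." (~5 tokens)
print(total_tokens(4, 5))  # 9 tokens consumed for this interaction
```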

Important: The AI model behind Kleo has a limit on how many tokens it can handle in a single request (often called the context window). This limit is shared between your input and the AI’s output. For example, if the model could process up to about 4,000 tokens in one go and your prompt used 3,500 of them, the response could only use up to ~500 tokens before hitting the limit. In normal usage you won’t usually hit this ceiling, but extremely long prompts or responses may be truncated to stay within the allowed number of tokens.
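The shared-budget idea can be sketched as follows. The 4,000-token window is the illustrative figure from the example above, not Kleo’s actual limit:

```python
# Sketch of the shared context-window budget.
# CONTEXT_WINDOW is the illustrative 4,000-token figure, not Kleo's real limit.

CONTEXT_WINDOW = 4000

def max_response_tokens(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the AI's response once the prompt is counted."""
    return max(0, window - prompt_tokens)

print(max_response_tokens(3500))  # 500 tokens remain for the response
print(max_response_tokens(4200))  # 0 -- an over-long prompt leaves no room
```

Because input and output draw from the same budget, trimming a long prompt is often the easiest way to leave room for a fuller answer.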
