OpenAI — Best Practices of Using Tokens

Tony
6 min read · Apr 5, 2024

What Is an OpenAI Token?

In OpenAI’s language models, such as GPT-3.5 and GPT-4, a “token” is a sequence of characters that commonly occurs together in text. These models are trained to understand and predict the statistical relationships between tokens.

The process of breaking down text into tokens can vary between different models. For instance, GPT-3.5 and GPT-4 utilize a different tokenization process compared to their predecessors, resulting in different tokens for the same input text.
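You can observe this directly with tiktoken, OpenAI’s open-source tokenizer library. The sketch below assumes a recent tiktoken release; GPT-3.5 and GPT-4 map to the cl100k_base encoding, while an older model like text-davinci-003 maps to p50k_base, so the same text can yield different token IDs and counts:

```python
# pip install tiktoken
import tiktoken

text = "OpenAI is great!"

# GPT-3.5/GPT-4 use the cl100k_base encoding; older models such as
# text-davinci-003 use p50k_base, so the same input text can produce
# different token IDs and even a different number of tokens.
for model in ("gpt-4", "text-davinci-003"):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    print(model, enc.name, len(tokens), tokens)
```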

As a general guideline, one token is approximately equivalent to four characters in English text, which is roughly three-quarters of a word. Therefore, 100 tokens would be approximately equal to 75 words.
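If you only need a ballpark figure, that rule of thumb is easy to encode; here is a minimal sketch (the helper name estimate_tokens is my own, not part of any OpenAI API) that compares the heuristic against the exact count from tiktoken:

```python
import tiktoken

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, round(len(text) / 4))

enc = tiktoken.encoding_for_model("gpt-4")
sample = "OpenAI is great!"
print("estimated:", estimate_tokens(sample))  # heuristic guess
print("exact:    ", len(enc.encode(sample)))  # actual token count
```

The heuristic is only reliable for typical English prose; code, non-English text, and unusual punctuation can tokenize much less efficiently, so use the exact count when it matters for billing or context limits.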

For example, let’s consider the sentence “OpenAI is great!”. In this sentence, the tokens could be broken down as follows:

[“Open”, “AI”, “ is”, “ great”, “!”]

Each of these is considered a token. The exact breakdown can vary depending on the specific tokenization process used by the model. For instance, some models might treat “OpenAI” as a single token, while others split it into “Open” and “AI”.
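Rather than guessing, you can inspect the exact breakdown for a given model by decoding each token ID back to its text. A sketch using tiktoken (the pieces you see may differ from the illustration above, since it was only an example):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
for token_id in enc.encode("OpenAI is great!"):
    # decode_single_token_bytes returns the raw bytes behind one token
    piece = enc.decode_single_token_bytes(token_id)
    print(token_id, repr(piece.decode("utf-8", errors="replace")))
```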
