To interact with traditional IT systems, an LLM has to consume and produce data in a strictly defined format. Consuming is typically not the problem: LLMs happily ingest all sorts of mixed data – not only do they accept natural language, that language may mix several tongues and contain spelling errors! An LLM will typically produce natural-language answers in the language it was trained or fine-tuned on. Multilingual models may produce answers in various languages, and can sometimes even follow instructions to output the answer in a particular natural language.
Getting LLMs to output data in a structured format is not so easy, however. While LLMs may be fine-tuned to output data in certain formats, there is no guarantee that the output conforms to a particular format (e.g. JSON, CSV, YAML) or schema (JSON Schema, for instance).
There is, however, a neat trick to force such conformance: output token biasing. To understand it, it is crucial to first understand how LLMs generate their output in the first place. An LLM produces output in the form of tokens – typically short or common words, or fragments of words. Each token the user sees as output is sampled from a distribution: in each iteration, the LLM assigns, to each token in its vocabulary, a probability of that token being the next one. From this distribution a token is then sampled, and that token is fed back into the model for the next iteration. There are various sampling approaches, and the method chosen has great impact on the ‘creativity’ of the output: a sampler that more frequently picks a token other than the most probable one will produce output that is ‘wilder’, in a sense.
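To make the sampling step concrete, here is a minimal sketch in Rust. It is illustrative only: the logits are made up, and real samplers draw randomly from the (temperature-reshaped) distribution rather than always picking the maximum.

```rust
// Greedy sampling picks the single highest-probability token; a temperature
// parameter reshapes the distribution before sampling. Values are made up.

/// Index of the most probable token (greedy sampling).
fn argmax(probs: &[f32]) -> usize {
    probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

/// Softmax with temperature: a temperature above 1 flattens the distribution,
/// making less likely tokens more probable – i.e. ‘wilder’ output.
fn apply_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let logits = vec![2.0, 1.0, 0.5];
    let cold = apply_temperature(&logits, 0.5);
    let hot = apply_temperature(&logits, 2.0);
    println!("greedy pick: token {}", argmax(&cold));
    println!("cold: {:?}", cold);
    println!("hot:  {:?}", hot);
}
```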
In order to force a certain data format and schema, the distribution for the next token, as produced by the LLM, can be influenced. For instance, if we want to generate JSON and the output thus far is `{"status":`, we know that the next token must be (the start of) a valid JSON value and cannot be, say, a `}` character. The probability of every token the JSON grammar disallows in this situation can be set to zero, which forces the sampler to choose a valid token.
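A minimal sketch of this biasing step in Rust (the toy vocabulary, probabilities, and validity rule are made up for illustration):

```rust
// Output token biasing: before sampling, set the probability of every token
// the grammar disallows to zero, so the sampler can only pick valid tokens.

/// Zero out the probability of every token the predicate rejects.
fn bias(probs: &mut [f32], tokens: &[&str], is_allowed: impl Fn(&str) -> bool) {
    for (p, tok) in probs.iter_mut().zip(tokens) {
        if !is_allowed(tok) {
            *p = 0.0;
        }
    }
}

fn main() {
    // Toy vocabulary; after `{"status":` a `}` would be invalid JSON.
    let tokens = ["\"", "}", "123", ","];
    let mut probs = vec![0.1, 0.6, 0.2, 0.1];
    // Toy rule: only the start of a JSON value may follow `{"status":`.
    bias(&mut probs, &tokens, |t| t != "}" && t != ",");
    // The sampler can no longer pick the (previously most likely) `}` token.
    assert_eq!(probs, vec![0.1, 0.0, 0.2, 0.0]);
    println!("biased distribution: {:?}", probs);
}
```

In a real implementation the zeroed distribution would be renormalized (or the biasing applied to the logits before the softmax), but the effect is the same: disallowed tokens can never be sampled.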
It is even possible to take a schema into account. In the previous example, if we know the only valid values for the `status` field are `"ok"` and `"error"`, we can allow only those tokens that keep the output a prefix of one of these values: `"` would be allowed, and so would `"err`, but not `"err"` or `"foo`.
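The prefix check above can be sketched as a small Rust function (a hypothetical helper, not the actual crate API; the values are JSON-encoded, i.e. they include the surrounding quotes):

```rust
// A token is allowed if appending it to the output generated so far keeps the
// output a prefix of at least one of the permitted JSON-encoded values.

fn token_allowed(generated: &str, token: &str, values: &[&str]) -> bool {
    let candidate = format!("{generated}{token}");
    // `starts_with` is also true on an exact match, covering completed values.
    values.iter().any(|v| v.starts_with(&candidate))
}

fn main() {
    let values = ["\"ok\"", "\"error\""];
    assert!(token_allowed("", "\"", &values));       // start of either value
    assert!(token_allowed("\"", "err", &values));    // prefix of "error"
    assert!(!token_allowed("\"", "err\"", &values)); // "err" is not a valid value
    assert!(!token_allowed("\"", "foo", &values));   // matches nothing
    println!("prefix checks passed");
}
```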
I wrote an implementation of such a biaser in Rust using JSON Schema, a language for specifying the schema of a JSON document. The schema for the above example would look like this:
```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["status"],
  "properties": {
    "status": {
      "type": "string",
      "enum": ["ok", "error"]
    }
  }
}
```
The implementation can be found in the poly_bias crate. The crate contains a parser state machine that is updated for each generated token.
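To give an idea of what such a state machine looks like (this is a hedged sketch of the general technique, not the actual poly_bias API), here is a tiny state machine for an enum-constrained string that advances one token at a time and rejects tokens that break the match:

```rust
// A minimal per-token parser state for an enum-constrained string field:
// it tracks which JSON-encoded values are still compatible with the output
// generated so far. Names and structure are illustrative only.

struct EnumState<'a> {
    values: Vec<&'a str>, // values still compatible with the output so far
    consumed: String,     // text generated so far
}

impl<'a> EnumState<'a> {
    fn new(values: &[&'a str]) -> Self {
        Self { values: values.to_vec(), consumed: String::new() }
    }

    /// Try to advance the state with one token; returns false (leaving the
    /// state unchanged) if the token is not a valid continuation.
    fn advance(&mut self, token: &str) -> bool {
        let candidate = format!("{}{}", self.consumed, token);
        let remaining: Vec<&str> = self
            .values
            .iter()
            .copied()
            .filter(|v| v.starts_with(&candidate))
            .collect();
        if remaining.is_empty() {
            return false;
        }
        self.values = remaining;
        self.consumed = candidate;
        true
    }
}

fn main() {
    let mut state = EnumState::new(&["\"ok\"", "\"error\""]);
    assert!(state.advance("\"e"));    // narrows the match down to "error"
    assert!(!state.advance("x"));     // `"ex` matches neither value; rejected
    assert!(state.advance("rror\"")); // completes "error"
    println!("accepted: {}", state.consumed);
}
```

At sampling time, such a state is used to compute the bias: every token for which `advance` would fail gets probability zero.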