Parsing Outputs

Under agent.parsers we implement some interfaces to simplify parsing outputs from LLMs. The simplest example, SimpleOutputParser class is designed to parse structured text outputs based on predefined keys.

Overview

The SimpleOutputParser class extends the OutputParser class, providing methods to format instructions and parse raw outputs into a structured dictionary. It is particularly useful for parsing outputs where data is organized in a “key: value” format.

Formatting Instructions

The formatting_instructions method provides a clear guideline on how the output should be structured for the parser to understand. These instructions can be fed into the LLM. It dynamically constructs instructions based on the expected keys provided during the class initialization.

Example:

from agent.parsers import SimpleOutputParser

parser = SimpleOutputParser(["Thought", "Answer"])
print(parser.formatting_instructions())

This would output:

Strictly answer in the format:
Thought: <thought>
Answer: <answer>

We recommend passing this into your jinja templates if you wish to automate the process of changing parsers (e.g., using the JsonOutputParser)

Parsing Raw Output

The parse_raw_output method takes a raw string output and extracts information based on the predefined keys. It uses regular expressions to find matches and constructs a dictionary with keys normalized to lowercase.

Example Usage:

raw_output = """
Name: John Doe
Age: 30
Location: New York
"""

parser = SimpleOutputParser(["Name", "Age", "Location"])
parsed_data = parser.parse_raw_output(raw_output)
print(parsed_data)

This would result in:

{
    'name': 'John Doe',
    'age': '30',
    'location': 'New York'
}

Note

When keys are repeated, SimpleOutputParser will join the values with a newline. For example an output of: “Thought: XX\nThought: YY” will result in a parsed response {“thought”: “XX\nYY”}.

Implementation Details

The parsing is achieved using the re.findall function from the Python standard library, with a dynamically constructed pattern based on the expected_keys provided during initialization of the parser. This allows for flexible parsing of various output formats while maintaining a consistent dictionary structure for the results.

Note

The keys in the resulting dictionary are normalized to lowercase and stripped of trailing characters like “:” and spaces for consistency.

Conclusion

The SimpleOutputParser class offers a straightforward way to parse and structure raw text outputs based on predefined keys. By following the formatting instructions and utilizing the parse_raw_output method, developers can easily extract and manipulate data from structured text outputs in their applications.