# BalatroLLM

> LLM-powered bot that plays Balatro using strategic decision making

BalatroLLM is a bot that uses Large Language Models (LLMs) to play [Balatro](https://www.playbalatro.com/), the popular roguelike poker deck-building game.
The bot analyzes game states, makes strategic decisions, and executes actions through the [BalatroBot](https://github.com/coder/balatrobot) client.


# Documentation

# Setup

This guide will help you install and configure BalatroLLM.

## Prerequisites

- **[uv](https://docs.astral.sh/uv/)**: Manages the Python environment and dependencies
- **[Balatro Game](https://www.playbalatro.com/)**: An installed copy of Balatro
- **[BalatroBot](https://github.com/coder/balatrobot)**: The underlying framework for Balatro automation
- **Access to an LLM**: Served through an OpenAI-compatible chat completions API

### BalatroBot Setup

Setting up Balatro with the BalatroBot mod requires careful configuration. Please refer to the [BalatroBot](https://github.com/coder/balatrobot) documentation and follow the instructions step by step. Ensure that BalatroBot is installed and running before proceeding with the BalatroLLM installation.

## Installation

1. Clone the repository

```bash
git clone --depth 1 https://github.com/coder/balatrollm.git
cd balatrollm
```

2. Create environment and install dependencies

```bash
uv sync --no-dev
```

When running `uv sync`, `uv` automatically downloads the required Python version, creates a new environment at `.venv`, and installs the project dependencies.

3. Activate environment

```bash
source .venv/bin/activate
```

4. Test that the new commands are available

```bash
balatrollm --help
balatrobench --help
```

### Auto venv activation & Environment Variables

You can use [direnv](https://direnv.net/) to automatically activate the environment when you enter the project directory. The `.envrc.example` file contains an example configuration for direnv.

## Provider Configuration

You need to configure your chosen provider. We recommend doing this through environment variables set in `.envrc` (see `.envrc.example`):

- `BALATROLLM_BASE_URL`: API base URL
- `BALATROLLM_API_KEY`: API key for LLM provider

Now you should be able to run

```bash
balatrollm --list-models
```

to see the available models. You can now set the model environment variable:

- `BALATROLLM_MODEL`: Model to use

### CLI precedence

All `BALATROLLM_*` environment variables have a corresponding CLI argument. The environment variables are used as defaults when running `balatrollm`. CLI arguments take precedence over the corresponding environment variable. For example, you can set a default model with `BALATROLLM_MODEL` but use another one: `balatrollm --model "..."`.
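
The precedence rule can be sketched with `argparse` by using the environment variable as the default value of the flag. This is an illustration of the pattern only, not BalatroLLM's actual parser: the `build_parser` helper, the `balatrollm-sketch` program name, and the model names are made up for the example.

```python
import argparse


def build_parser(env: dict[str, str]) -> argparse.ArgumentParser:
    """Build a parser whose --model default comes from the environment."""
    parser = argparse.ArgumentParser(prog="balatrollm-sketch")
    # The environment variable only supplies the default value.
    parser.add_argument("--model", default=env.get("BALATROLLM_MODEL"))
    return parser


# Hypothetical environment; a real program would pass os.environ instead.
env = {"BALATROLLM_MODEL": "vendor/model-a"}
parser = build_parser(env)

# No flag given: the value from the environment is used.
from_env = parser.parse_args([]).model
# Flag given: the CLI argument overrides the environment.
from_cli = parser.parse_args(["--model", "vendor/model-b"]).model
print(from_env, from_cli)
```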

# Usage

Learn how to run BalatroLLM, configure strategies, and customize gameplay parameters.

## Simple Run

Assuming that you have followed the [setup guide](../setup/) and configured the provider, you can run BalatroLLM with the following steps:

1. Start Balatro with the BalatroBot mod using the utility script:

```bash
bash balatro.sh
```

2. Start BalatroLLM:

```bash
balatrollm --model openai/gpt-oss-20b
```

3. Watch the gameplay!

## Advanced Usage

### BalatroBot

The `BALATROBOT_*` variables are used to configure Balatro and BalatroBot. It is recommended to set the variables that you don't change often in the `.envrc`.

- `BALATROBOT_HOST`: The host to run the server on. Defaults to `127.0.0.1`.
- `BALATROBOT_PORT`: The port to run the server on. Defaults to `12346`.
- `BALATROBOT_HEADLESS`: Avoid rendering the game on the screen. Set to `1` to enable.
- `BALATROBOT_FAST`: Faster animations and gameplay. Set to `1` to enable.
- `BALATROBOT_AUDIO`: Enable audio. Set to `1` to enable.
- `BALATROBOT_RENDER_ON_API`: Render the frame only on an API call.

These are the environment variables set by `balatro.sh` using its flags. For example, to run the game in fast mode, you can run: `bash balatro.sh --fast`.

Usually, you don't need to set these variables manually.

### BalatroLLM

The `BALATROLLM_*` variables are used as defaults for the BalatroLLM CLI. It is recommended to set the variables that you don't change often in the `.envrc`.

- `BALATROLLM_BASE_URL`: The base URL to use. (required)
- `BALATROLLM_API_KEY`: The API key to use. (usually required)
- `BALATROLLM_MODEL`: The model to use. (required)
- `BALATROLLM_STRATEGY`: The strategy to use. (default: `default`)
- `BALATROLLM_RUNS_PER_SEED`: The number of runs per seed. (default: `1`)
- `BALATROLLM_SEEDS`: The seeds to use. If empty, a random seed is used. You can also use a comma-separated list of seeds.
- `BALATROLLM_NO_SCREENSHOT`: Whether to disable screenshots. Set to `1` to disable them. Screenshots are not available in headless mode. (default: `0`, i.e. take screenshots)
- `BALATROLLM_USE_DEFAULT_PATHS`: Whether to use BalatroBot's default storage paths. It's not recommended to change this. (default: `0`)

Each of these variables has a corresponding BalatroLLM CLI flag. For example, `--model` is the BalatroLLM CLI flag for `BALATROLLM_MODEL`.
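
To make the interaction between seeds and runs per seed concrete, the sketch below expands a comma-separated seed list into individual runs. The helper names `parse_seeds` and `plan_runs` are hypothetical, and the ordering of the runs is an assumption; how BalatroLLM schedules runs internally is not described here. The sketch also does not cover the empty case, where BalatroLLM picks a random seed.

```python
from itertools import product


def parse_seeds(raw: str) -> list[str]:
    """Split a comma-separated seed string, ignoring blank entries."""
    return [seed.strip() for seed in raw.split(",") if seed.strip()]


def plan_runs(raw_seeds: str, runs_per_seed: int) -> list[tuple[str, int]]:
    """Return (seed, run_number) pairs; run numbers start at 1."""
    seeds = parse_seeds(raw_seeds)
    return list(product(seeds, range(1, runs_per_seed + 1)))


# Two seeds with three runs each gives six runs in total.
plan = plan_runs("AAAAAAA,BBBBBBB", 3)
print(len(plan))
print(plan[0], plan[-1])
```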

### Examples

- Run on two seeds (3 runs each):

```bash
bash balatro.sh --fast
balatrollm \
  --model openai/gpt-oss-20b \
  --seeds AAAAAAA,BBBBBBB \
  --runs-per-seed 3
```

- Run faster across 2 Balatro instances in parallel:

```bash
bash balatro.sh --fast --ports 12346,12347
balatrollm \
  --model openai/gpt-oss-20b \
  --seeds AAAAAAA,BBBBBBB \
  --runs-per-seed 3 \
  --ports 12346,12347
```

- Run even faster in headless mode:

```bash
bash balatro.sh --headless --fast --ports 12346,12347
balatrollm \
  --model openai/gpt-oss-20b \
  --seeds AAAAAAA,BBBBBBB \
  --runs-per-seed 3 \
  --ports 12346,12347 \
  --no-screenshot
```

- Run on resource-constrained devices with screenshots:

```bash
bash balatro.sh --fast --render-on-api
balatrollm \
  --model openai/gpt-oss-20b \
  --seeds AAAAAAA,BBBBBBB \
  --runs-per-seed 3
```

## Models

If you have configured the provider, you should be able to see the available models by running

```bash
balatrollm --list-models
```

Before using a model, make sure its exact name is listed in `config/models.yaml`, the YAML file used for model configuration. If the model you want is missing, add it there before running `balatrollm`.

The current model configuration includes some of the models supported by [OpenRouter](https://openrouter.ai/). This is the provider we are using to run balatrollm with various models.

## Strategies

Strategies define how the LLM bot approaches decision-making during gameplay. Each strategy consists of Jinja2 templates that generate the prompts sent to the language model, providing different playing styles and approaches.

BalatroLLM ships with two built-in strategies:

- **default**: Conservative, financially disciplined approach
- **aggressive**: High-risk, high-reward strategy with aggressive spending

To run balatrollm with a specific strategy, use the `--strategy` flag:

```bash
balatrollm --strategy default
```

```bash
balatrollm --strategy aggressive
```

For detailed information about how strategies work, their structure, and how to contribute your own strategies, see the [Strategies documentation](../strategies/).

# Analysis

Analyze run data, generate performance benchmarks, and visualize results with BalatroBench.

## Run Data Collection

When you run BalatroLLM, all game data is automatically collected and organized in the `runs` directory. Each run is stored in a hierarchical structure that makes it easy to compare different models, strategies, and game sessions.

```text
runs/
└── v0.13.2/                          # Version
    └── default/                      # Strategy
        └── openai/                   # Vendor
            └── gpt-oss-20b/          # Model
                └── 20251024_120206_331_RedDeck_s1__AAAAAAA/  # Run directory
                    ├── config.json           # Model configuration and API settings
                    ├── strategy.json         # Strategy template used
                    ├── stats.json            # Aggregated performance metrics
                    ├── gamestates.jsonl      # Game state at each decision point
                    ├── requests.jsonl        # Prompts sent to the LLM
                    ├── responses.jsonl       # Model responses and actions
                    ├── run.log               # Complete text log
                    └── screenshots/          # PNG images of game states
```

Each run directory contains several files that capture different aspects of the game session. The configuration and strategy files record the setup used for the run. The stats file contains aggregated performance metrics like total rounds completed, token usage, and costs. The three JSONL files log every step of the game, recording game states, LLM prompts, and model responses. The run log provides a complete text record, and the screenshots directory contains PNG images of the game state at each step (when screenshot mode is enabled).
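
For quick ad hoc inspection, the layout above can be walked with the standard library. The sketch below assumes only the directory nesting shown in the tree (version, strategy, vendor, model, run) and that the `.jsonl` files hold one JSON object per line. It makes no assumption about the fields inside each record, and the helper names `iter_runs` and `read_jsonl` are hypothetical.

```python
import json
from pathlib import Path


def iter_runs(root: Path):
    """Yield (version, strategy, vendor, model, run_dir) for each run."""
    # Runs sit five levels below the root: version/strategy/vendor/model/run
    for run_dir in sorted(root.glob("*/*/*/*/*")):
        if run_dir.is_dir():
            version, strategy, vendor, model = run_dir.relative_to(root).parts[:4]
            yield version, strategy, vendor, model, run_dir


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    for version, strategy, vendor, model, run_dir in iter_runs(Path("runs")):
        states = read_jsonl(run_dir / "gamestates.jsonl")
        print(f"{version} {strategy} {vendor}/{model}: {len(states)} game states")
```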

## Benchmark Analysis

The `balatrobench` CLI tool processes run data to generate comprehensive benchmark statistics and leaderboards. Benchmarks can be generated in two different modes depending on what you want to analyze.

### Models

Use the models mode when you want to compare how different models perform within the same strategy. This mode is useful for answering questions like "Which model plays the default strategy best?" or "How do different vendors' models compare on the aggressive strategy?"

```bash
balatrobench --models
```

The results are organized with leaderboards for each strategy, making it easy to identify the top-performing models:

```text
benchmarks/models/
├── manifest.json
└── v0.13.2/                          # Version
    └── default/                      # Strategy
        ├── leaderboard.json          # Models ranked for this strategy
        └── openai/                   # Vendor
            ├── gpt-oss-20b/          # Model
            │   └── 20251024_120206_331_RedDeck_s1__AAAAAAA/  # Run
            │       └── request-00001/         # Individual request
            │           ├── request.md         # Full LLM prompt
            │           ├── reasoning.md       # Model reasoning
            │           ├── tool_call.json     # Action taken
            │           └── screenshot.png     # Game state
            └── gpt-oss-20b.json      # Aggregated model statistics
```

### Strategies

Use the strategies mode when you want to compare how different strategies perform for the same model. This mode helps answer questions like "Does the aggressive strategy work better than the default for GPT-4?" or "Which strategy should I use with Claude?"

```bash
balatrobench --strategies
```

The strategies mode generates leaderboards organized by model, with statistics for each strategy:

```text
benchmarks/strategies/
├── manifest.json
└── v0.13.2/                          # Version
    └── openai/                       # Vendor
        └── gpt-oss-20b/              # Model
            ├── leaderboard.json      # Strategies ranked for this model
            ├── default/              # Strategy
            │   ├── stats.json        # Aggregated statistics
            │   └── gpt-oss-20b/      # Run details
            │       └── 20251024_120206_331_RedDeck_s1__AAAAAAA/  # Run
            │           └── request-00001/         # Individual request
            │               ├── request.md         # Full LLM prompt
            │               ├── reasoning.md       # Model reasoning
            │               ├── tool_call.json     # Action taken
            │               └── screenshot.png     # Game state
            └── aggressive/           # Other strategies
                └── [similar structure]
```

Both modes preserve detailed request-level data including the full LLM prompts, reasoning output, tool calls, and screenshots for in-depth analysis.

## BalatroBench Integration

[BalatroBench](https://coder.github.io/balatrobench/) is a web-based dashboard for visualizing benchmark results. You can run it locally to explore your data through interactive charts and leaderboards.

First, clone the BalatroBench repository:

```bash
git clone https://github.com/coder/balatrobench.git
```

Next, move or symlink your benchmark data into the BalatroBench data directory. You can move the benchmarks directly:

```bash
mv benchmarks /path/to/balatrobench/data/benchmarks
```

Or create a symbolic link to keep the data in your BalatroLLM directory:

```bash
ln -s "$(pwd)/benchmarks" /path/to/balatrobench/data/benchmarks
```

Finally, start a local web server to view the dashboard:

```bash
cd /path/to/balatrobench
python3 -m http.server 8001
```

Open your browser to `http://localhost:8001` to explore the interactive visualization of your benchmark results.

# Strategies

Learn how strategies work in BalatroLLM, including their structure, implementation using Jinja2 templates, and how to contribute your own.

## Overview

Strategies in BalatroLLM define how the LLM bot approaches decision-making during gameplay. Each strategy consists of Jinja2 templates that generate the prompts sent to the language model, along with metadata and configuration files.

The strategy system supports different playing styles, from conservative, financially disciplined approaches to aggressive, high-risk strategies, by modifying the context, guidance, and available tools provided to the LLM.

## Strategy Structure

Each strategy is a directory under `src/balatrollm/strategies/` containing exactly **5 required files**:

```text
src/balatrollm/strategies/{strategy_name}/
├── manifest.json          # Strategy metadata
├── STRATEGY.md.jinja      # Strategy-specific guide and approach
├── GAMESTATE.md.jinja     # Game state representation template
├── MEMORY.md.jinja        # Response history tracking template
└── TOOLS.json             # Strategy-specific function definitions
```

### Strategy Naming Requirements

Strategy names must follow these rules:

- **Lowercase letters, numbers, and underscores only** (e.g., `aggressive`, `value_based`, `risky2`)
- **Valid Python identifier** (cannot start with a number)
- **Underscores allowed, hyphens forbidden** (e.g., `high_risk` ✓, `high-risk` ✗)
- **No spaces or special characters**
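
The rules listed above can be checked with a single regular expression. This is a minimal sketch rather than BalatroLLM's own check: `is_valid_strategy_name` is a hypothetical helper, and it assumes a name must start with a lowercase letter (the rules only say it cannot start with a number, so a leading underscore is rejected here by assumption).

```python
import re

# A lowercase letter first, then lowercase letters, digits, or underscores.
# Any string matching this pattern is also a syntactically valid Python identifier.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_strategy_name(name: str) -> bool:
    """Check a strategy name against the naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ("high_risk", "high-risk", "risky2", "2risky"):
    print(candidate, is_valid_strategy_name(candidate))
```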

## manifest.json

The `manifest.json` file defines strategy metadata with **5 required fields**:

```json
{
  "name": "Default",
  "description": "Conservative, financially disciplined approach to Balatro",
  "author": "BalatroBench",
  "version": "0.1.0",
  "tags": ["conservative", "financial"]
}
```

### Required Fields

- **name** (string): Human-readable strategy name displayed to users
- **description** (string): Brief description of the strategy's approach and philosophy
- **author** (string): Author identifier or organization name
- **version** (string): Strategy version in semantic versioning format (e.g., "0.1.0")
- **tags** (array of strings): Categorization tags for filtering and organization

### Versioning

Strategy versions are **independent** from BalatroLLM versions. Increment the strategy version when making changes:

- **Patch** (0.1.0 → 0.1.1): Bug fixes, typo corrections
- **Minor** (0.1.0 → 0.2.0): New features, significant prompt improvements
- **Major** (0.1.0 → 1.0.0): Complete strategy overhaul, breaking changes
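
The three increments follow the usual semantic versioning arithmetic: bumping a part resets every part to its right to zero. The `bump_version` helper below is hypothetical and only illustrates the rule; it assumes a plain `MAJOR.MINOR.PATCH` string without pre-release suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


for part in ("patch", "minor", "major"):
    print(part, bump_version("0.1.0", part))
```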

## Jinja2 Templates

Strategies use [Jinja2](https://jinja.palletsprojects.com/) templating to dynamically generate prompts based on the current game state. Templates are compiled at runtime when the bot makes decisions.

### Available Context Variables

All Jinja2 templates have access to:

- **`G`**: The complete game state dictionary containing:

  - Current hand, jokers, consumables
  - Money, remaining hands/discards
  - Blind information, ante level
  - Deck composition, played cards
  - And more...

- **`constants`**: Balatro game constants including:

  - `constants.jokers`: All joker definitions
  - `constants.consumables`: Tarot, Planet, and Spectral cards
  - `constants.vouchers`: Available vouchers
  - `constants.tags`: Tag definitions
  - `constants.editions`: Card editions (Foil, Holographic, Polychrome)
  - `constants.enhancements`: Card enhancements (Bonus, Mult, Wild, Glass, Steel, Stone, Gold, Lucky)
  - `constants.seals`: Card seals (Gold, Red, Blue, Purple)

### Custom Filters

The template environment includes a custom `from_json` filter for parsing JSON strings within templates:

```jinja
{{ some_json_string | from_json }}
```

### Template Files

#### STRATEGY.md.jinja

Defines the strategy's core philosophy and decision-making approach. This template provides high-level guidance to the LLM about how to play the game.

Example structure:

```jinja
You are an expert Balatro player. Analyze the game state and make strategic decisions...

# Strategy Philosophy

Your approach is [conservative/aggressive/balanced]...

# Decision-Making Priorities

1. [Priority 1]
2. [Priority 2]
...
```

#### GAMESTATE.md.jinja

Presents the current game state in a format optimized for LLM comprehension. This template formats all relevant game information.

Example access patterns:

```jinja
## Current Situation

- Money: ${{ G.dollars }}
- Hands remaining: {{ G.hands }}
- Current score: {{ G.chips }} / {{ G.blind.chips }}

## Your Hand

{% for card in G.hand %}
- {{ card.rank }} of {{ card.suit }}
{% endfor %}
```

#### MEMORY.md.jinja

Tracks previous responses and errors to provide context for decision-making. This template helps the LLM learn from mistakes and maintain consistency.

Context variables:

- `responses`: List of previous LLM responses
- `last_error_call_msg`: Last error message from failed tool calls
- `last_failed_call_msg`: Last failed tool call details

## TOOLS.json

Defines the function calls available to the LLM during different game phases. The structure maps game states to available tools:

```json
{
  "SELECTING_HAND": [
    {
      "type": "function",
      "function": {
        "name": "play_hand_or_discard",
        "description": "Play cards as a poker hand or discard them",
        "parameters": {
          "type": "object",
          "properties": {
            "action": {
              "type": "string",
              "enum": ["play_hand", "discard"]
            },
            "cards": {
              "type": "array",
              "items": {"type": "integer"}
            }
          },
          "required": ["action", "cards"]
        }
      }
    }
  ],
  "SHOP": [...]
}
```

### Available Game States

- **SELECTING_HAND**: During hand selection phase (playing/discarding)
- **SHOP**: During shop phase (buying, selling, using consumables)

### Common Tools

**SELECTING_HAND phase:**

- `play_hand_or_discard`: Play or discard selected cards
- `rearrange_hand`: Reorder cards in hand
- `rearrange_jokers`: Reorder jokers for optimal scoring
- `sell_joker`: Sell a joker for money
- `sell_consumable`: Sell a consumable for money
- `use_consumable`: Use a Tarot/Planet/Spectral card

**SHOP phase:**

- `shop`: Perform shop actions (next_round, reroll, buy_card, redeem_voucher)
- `sell_joker`: Sell a joker
- `sell_consumable`: Sell a consumable
- `use_consumable`: Use a Tarot/Planet/Spectral card
- `rearrange_jokers`: Reorder jokers
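
Because `TOOLS.json` is keyed by game state, picking the tools to offer the LLM comes down to a dictionary lookup. The sketch below uses a trimmed-down stand-in for the file (parameter schemas omitted) and a hypothetical `tools_for_state` helper; it is not the code BalatroLLM uses.

```python
import json

# Trimmed-down stand-in for a TOOLS.json file; parameter schemas are omitted.
TOOLS_JSON = """
{
  "SELECTING_HAND": [
    {"type": "function", "function": {"name": "play_hand_or_discard"}},
    {"type": "function", "function": {"name": "rearrange_hand"}}
  ],
  "SHOP": [
    {"type": "function", "function": {"name": "shop"}},
    {"type": "function", "function": {"name": "sell_joker"}}
  ]
}
"""


def tools_for_state(tools: dict, state: str) -> list[dict]:
    """Return the tool definitions offered in the given game state."""
    if state not in tools:
        raise KeyError(f"No tools defined for game state {state!r}")
    return tools[state]


tools = json.loads(TOOLS_JSON)
shop_names = [tool["function"]["name"] for tool in tools_for_state(tools, "SHOP")]
print(shop_names)
```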

## Strategy Validation

BalatroLLM performs **two-stage validation** when loading strategies:

1. **Template Validation** (via `StrategyManager`):

   - Verifies all 4 template files exist (STRATEGY.md.jinja, GAMESTATE.md.jinja, MEMORY.md.jinja, TOOLS.json)
   - Raises `FileNotFoundError` if any template file is missing

1. **Metadata Validation** (via `StrategyManifest`):

   - Verifies manifest.json exists
   - Validates all 5 required fields are present
   - Raises `FileNotFoundError` if manifest.json is missing
   - Raises `ValueError` if required fields are missing

Validation occurs at runtime when a strategy is selected.
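
To check a strategy directory before running it, the two stages can be approximated with the standard library. This is a sketch of the checks as described above, not the actual `StrategyManager` or `StrategyManifest` code; `validate_strategy` and the two constants are hypothetical names.

```python
import json
from pathlib import Path

TEMPLATE_FILES = ("STRATEGY.md.jinja", "GAMESTATE.md.jinja", "MEMORY.md.jinja", "TOOLS.json")
MANIFEST_FIELDS = ("name", "description", "author", "version", "tags")


def validate_strategy(strategy_dir: Path) -> dict:
    """Run both checks and return the parsed manifest."""
    # Stage 1: every template file must exist.
    for filename in TEMPLATE_FILES:
        if not (strategy_dir / filename).is_file():
            raise FileNotFoundError(f"Missing template file: {filename}")

    # Stage 2: manifest.json must exist and carry all required fields.
    manifest_path = strategy_dir / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError("Missing manifest.json")
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    missing = [field for field in MANIFEST_FIELDS if field not in manifest]
    if missing:
        raise ValueError(f"Missing manifest fields: {', '.join(missing)}")
    return manifest
```

Calling `validate_strategy(Path("src/balatrollm/strategies/your_strategy_name"))` returns the parsed manifest on success and raises the same exception types named in the list above otherwise.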

## Contributing Your Own Strategy

### 1. Study Existing Strategies

Review the built-in strategies to understand structure and best practices:

- `src/balatrollm/strategies/default/`: Conservative approach
- `src/balatrollm/strategies/aggressive/`: High-risk approach

### 2. Create Strategy Directory

```bash
mkdir src/balatrollm/strategies/your_strategy_name
```

### 3. Create Required Files

Create all 5 required files using existing strategies as templates:

1. **manifest.json**: Define metadata
1. **STRATEGY.md.jinja**: Define strategy philosophy and approach
1. **GAMESTATE.md.jinja**: Format game state presentation
1. **MEMORY.md.jinja**: Format response history
1. **TOOLS.json**: Define available functions (usually copied from existing strategies)

### 4. Test Locally

Test your strategy to ensure it works correctly:

```bash
balatrollm --strategy your_strategy_name
```

Common issues:

- Jinja2 syntax errors in templates
- Missing required fields in manifest.json
- Invalid JSON in TOOLS.json
- File naming mismatches

### 5. Submit Pull Request

1. Fork the BalatroLLM repository
1. Create a feature branch: `git checkout -b feat/add-strategy-your_strategy_name`
1. Add your strategy directory with all required files
1. Commit following conventional commits: `feat: add [strategy_name] strategy`
1. Open a pull request with:
   - Clear title describing your strategy
   - Brief description of the strategy's approach
   - Any notable differences from existing strategies

### Quality Standards

Submissions must meet these standards:

- **Complete**: All 5 files present and functional
- **Valid**: Templates compile without errors, JSON is well-formed
- **Documented**: Clear strategy philosophy and decision-making approach
- **Unique**: Offers meaningfully different gameplay from existing strategies
- **Tested**: Locally verified to work with at least one complete game

### Review Process

Strategy contributions are reviewed for:

- Compliance with naming and structure requirements
- Template functionality and Jinja2 compatibility
- `manifest.json` completeness and validity
- Strategy uniqueness and gameplay value
- Code quality and documentation clarity

Once approved, your strategy will be available to all BalatroLLM users via the `--strategy` flag.

## Best Practices

### Template Design

- **Be concise**: LLMs work better with clear, focused prompts
- **Provide context**: Include relevant game information without overwhelming
- **Use formatting**: Headers, lists, and emphasis help LLM comprehension
- **Test iteratively**: Run games and refine based on bot behavior

### Strategy Philosophy

- **Define clear priorities**: What matters most? (economy, joker synergies, risk management)
- **Explain trade-offs**: Help the LLM understand when to break rules
- **Provide examples**: Concrete scenarios guide decision-making
- **Stay consistent**: Maintain the same approach throughout templates

### Debugging

If your strategy produces errors:

1. **Check template syntax**: Ensure valid Jinja2 (matching braces, proper filters)
1. **Verify manifest.json**: All fields present, valid JSON format
1. **Test TOOLS.json**: Valid JSON, matches OpenAI function calling format
1. **Review game logs**: Check `runs/` directory for detailed error messages
1. **Compare with defaults**: See how built-in strategies handle similar situations

## Examples

### Example manifest.json

```json
{
  "name": "Aggressive",
  "description": "High-risk, high-reward strategy with aggressive spending and bold decisions",
  "author": "BalatroBench",
  "version": "0.1.0",
  "tags": ["aggressive", "high-risk", "bold"]
}
```

### Example Jinja2 Template Snippet

```jinja
## Financial Status

Current money: ${{ G.dollars }}
Interest rate: $1 per $5 saved (max $5 at $25+)

{% if G.dollars >= 25 %}
You're earning maximum interest ($5). Consider strategic spending.
{% elif G.dollars >= 20 %}
Almost at max interest! Saving ${{ 25 - G.dollars }} more will maximize returns.
{% else %}
Focus on immediate power upgrades over interest at this stage.
{% endif %}
```
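
The interest rule quoted in the template ($1 per $5 held, capped at $5) is simple integer arithmetic. The hypothetical `interest` helper below works through it; the default values come from the template text above, and the function is only an illustration, not part of BalatroLLM.

```python
def interest(dollars: int, per: int = 5, cap: int = 5) -> int:
    """Interest earned: $1 for every `per` dollars held, capped at `cap`."""
    return min(max(dollars, 0) // per, cap)


for money in (0, 12, 24, 25, 40):
    print(money, interest(money))
```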

### Example Tool Definition

```json
{
  "type": "function",
  "function": {
    "name": "shop",
    "description": "Perform shop actions including buying cards, rerolling, or proceeding to next round",
    "parameters": {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "enum": ["next_round", "reroll", "buy_card", "redeem_voucher"],
          "description": "The shop action to perform"
        },
        "index": {
          "type": "integer",
          "description": "Index of card to buy (0-indexed) or voucher to redeem"
        },
        "reasoning": {
          "type": "string",
          "description": "Brief explanation of why you're taking this action"
        }
      },
      "required": ["action"]
    }
  }
}
```
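
A tool definition like this one also documents which arguments a call must carry. The sketch below checks a set of arguments against the `required` list and any `enum` constraints, using a trimmed copy of the `shop` definition. `check_arguments` is a hypothetical helper; whether BalatroLLM validates calls this way is not stated here.

```python
# Trimmed copy of the `shop` tool definition (descriptions omitted).
SHOP_TOOL = {
    "type": "function",
    "function": {
        "name": "shop",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "enum": ["next_round", "reroll", "buy_card", "redeem_voucher"],
                },
                "index": {"type": "integer"},
                "reasoning": {"type": "string"},
            },
            "required": ["action"],
        },
    },
}


def check_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = tool["function"]["parameters"]
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


print(check_arguments(SHOP_TOOL, {"action": "buy_card", "index": 0}))
print(check_arguments(SHOP_TOOL, {"action": "steal"}))
```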