Name	Name	Last commit message	Last commit date
parent directory ..
__pycache__	__pycache__
README.md	README.md
__init__.py	__init__.py
eval_sql_optim.py	eval_sql_optim.py
fake_database.py	fake_database.py
initial_code.py	initial_code.py
run_iterx.py	run_iterx.py
task_id.txt	task_id.txt
test_queries.py	test_queries.py

Task: SQL Query Optimization

Difficulty Level: ⭐⭐⭐ (3 stars)

Background

In this task, we build SQL query transformers that rewrite inefficient queries into optimized versions. Database query optimization is a fundamental problem in computer science where the same logical query can have vastly different execution times depending on how it's structured.

The evaluation uses a simulated e-commerce database with 1.6M+ rows across 5 tables. Your transformer receives SQL queries containing common inefficiencies (correlated subqueries, redundant operations, suboptimal joins) and must produce equivalent queries that execute faster. The schema and test queries can be easily adjusted to match your real-world database for domain-specific optimization.

The get_reward(code_path) function returns (reward, error_msg, details):

reward: Average speedup across ALL queries with harsh penalty for failures. Successful queries contribute speedup = original_time / transformed_time. Failed/incorrect queries contribute -1.0 penalty each. Final reward is clamped to minimum 0. Higher is better (>1 means faster).
error_msg: Error message if evaluation failed, None otherwise.
details: String with per-query results (success/failure, speedup, error messages) for learning from specific failures.

Task Description

Goal: Generate a Python function transform_query(sql: str) -> str that transforms SQL queries into optimized versions.

Setup:

Database: SQLite with 1.6M+ rows
Tables: users (100K), categories (100), products (10K), orders (500K), order_items (1M)
Test queries: 25 queries across 8 optimization categories
Database is cached at /tmp/sql_query_optim_db.sqlite (created on first run)

Requirements:

Function must be named transform_query
Must take a SQL string and return a SQL string
Transformed query must return identical results to the original
Transformed query should execute faster than the original
Must handle edge cases (empty results, NULL values)

Things to Avoid:

Returning queries that produce different results
Returning empty or invalid SQL
Infinite loops in transformation logic
Hardcoded transformations for specific query IDs

Database Schema:

-- Users table (100K rows)
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    city TEXT NOT NULL,
    age INTEGER NOT NULL,
    created_at DATE NOT NULL
);

-- Categories table (100 rows, hierarchical)
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER  -- NULL for top-level
);

-- Products table (10K rows)
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category_id INTEGER NOT NULL,
    price REAL NOT NULL,
    stock INTEGER NOT NULL,
    created_at DATE NOT NULL
);

-- Orders table (500K rows)
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    total_amount REAL NOT NULL,
    status TEXT NOT NULL,
    created_at DATE NOT NULL
);

-- Order items table (1M rows)
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price REAL NOT NULL
);

Reward Description

The reward formula penalizes failures harshly:

reward = max(0, (total_speedup + num_failures × FAILURE_PENALTY) / num_queries)

Where:

total_speedup = sum of speedups from successful queries (original_time / transformed_time)
num_failures = count of failed + incorrect queries
FAILURE_PENALTY = -1.0 (each failure subtracts 1.0 from the total)
num_queries = 25 total test queries

Examples:

Scenario	Calculation	Reward
25 success (2x avg)	50 / 25	2.0
20 success (2x avg), 5 failed	(40 - 5) / 25	1.4
5 success (2x avg), 20 failed	max(0, (10 - 20) / 25)	0.0

Higher is better (>1 means overall speedup). Failures are harshly penalized to encourage correctness.

Initial Code

import re

def transform_query(sql: str) -> str:
    """
    Transform an SQL query into an optimized version.
    
    This baseline implementation returns the query unchanged.
    
    Args:
        sql: Input SQL query string
        
    Returns:
        Optimized SQL query string (same query in this baseline)
    """
    # Baseline: return query unchanged
    return sql

File Structure

iterx_code/sql_query_optim/
├── README.md              # This file
├── __init__.py            # Package exports
├── fake_database.py       # Database creation and caching
├── test_queries.py        # 25 test queries with hints
├── eval_sql_optim.py      # Evaluation function
├── initial_code.py        # Baseline transformer
└── run_iterx.py           # IterX evaluation runner

Running iterX

python run_iterx.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Task: SQL Query Optimization

Background

Task Description

Reward Description

Initial Code

File Structure

Running iterX

FilesExpand file tree

sql_query_optim

Directory actions

More options

Directory actions

More options

Latest commit

History

sql_query_optim

Folders and files

parent directory

README.md

Task: SQL Query Optimization

Background

Task Description

Reward Description

Initial Code

File Structure

Running iterX