Reaction Space Generation

This section describes how AMLRO constructs the reaction search space and selects the initial training conditions using the get_reaction_scope entry-point function.

This is the first step in any AMLRO workflow.

Overview

Reaction space generation serves two purposes:

Construct the full combinatorial reaction space based on user-defined continuous and categorical parameters.
Select an initial subset of reactions for training using a chosen sampling strategy.

The reaction scope is generated using:

get_reaction_scope(
    config=config,
    sampling="sobol",
    training_size=10,
    write_files=True,
    exp_dir=exp_dir,
)

This function enumerates all reaction combinations and writes them to full_combo.csv, which serves as the reaction grid for active learning optimization.
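The idea of a full combinatorial reaction space can be illustrated with a short sketch. This is not AMLRO's implementation: the function name build_full_combo and the two input dictionaries are hypothetical, and continuous variables are assumed to be already discretized into a list of levels (for example, from a min/max/step range). It shows how a full factorial grid is enumerated and how categorical values become integer indices, matching the encoding described for full_combo.csv.

```python
import itertools

import pandas as pd


def build_full_combo(continuous, categorical):
    """Enumerate every combination of variable levels (full factorial grid).

    continuous:  hypothetical dict, variable name -> list of numeric levels
    categorical: hypothetical dict, variable name -> list of category labels
    Returns (encoded, decoded) DataFrames; `encoded` stores each categorical
    value as its integer index in the label list.
    """
    names = list(continuous) + list(categorical)
    # Categorical variables are enumerated over integer indices, not labels.
    levels = list(continuous.values()) + [range(len(v)) for v in categorical.values()]
    # itertools.product varies the last variable fastest.
    encoded = pd.DataFrame(list(itertools.product(*levels)), columns=names)

    # Restore the original labels for a human-readable copy.
    decoded = encoded.copy()
    for name, labels in categorical.items():
        decoded[name] = encoded[name].map(dict(enumerate(labels)))
    return encoded, decoded


encoded, decoded = build_full_combo(
    continuous={"temperature": [25, 50, 75], "residence_time": [1.0, 2.0]},
    categorical={"solvent": ["water", "ethanol"]},
)
print(len(encoded))  # 3 * 2 * 2 = 12 combinations
print(decoded.head(2))
```

The grid size is the product of the number of levels per variable, so it grows quickly as variables or levels are added.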

Function Arguments

Configuration Dictionary

config defines the reaction parameters and objectives; its structure is described in detail in the configuration documentation.

Only the reaction variables and their values are read from config at this stage.

Sampling Strategy

The sampling argument controls how the initial training reactions are selected from the full reaction space.

Supported options include:

"lhs" – Latin Hypercube Sampling
"sobol" – Sobol low-discrepancy sequence
"random" – Random sampling

The sampling method affects only the initial training set and does not alter the full reaction space.

Implementation details:

Latin Hypercube Sampling (LHS) is implemented using PyDOE2 with a min–max criterion, 1000 iterations, and a fixed random seed for reproducibility.
Sobol sampling is implemented using scipy.stats.qmc.
Random sampling uses uniform random selection.
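The three strategies can be sketched with NumPy and SciPy as follows. This is an illustration under stated assumptions, not AMLRO's code: select_training_indices is a hypothetical helper, LHS here uses scipy.stats.qmc.LatinHypercube instead of the pyDOE2 design described above, and mapping unit-cube samples onto grid levels (redrawing when a grid point is already chosen) is an assumption about how samples become rows of the full reaction space.

```python
import numpy as np
from scipy.stats import qmc


def select_training_indices(level_counts, training_size, sampling="sobol", seed=0):
    """Return `training_size` distinct row indices into a full factorial grid.

    level_counts: number of levels per variable, in full_combo column order.
    Rows are assumed to follow itertools.product ordering (last column fastest).
    """
    dims = tuple(int(c) for c in level_counts)
    counts = np.array(dims)
    n_total = int(np.prod(counts))
    if not 0 < training_size <= n_total:
        raise ValueError("training_size must be between 1 and the reaction space size")

    if sampling == "random":
        # Uniform selection of rows without replacement.
        rng = np.random.default_rng(seed)
        return np.sort(rng.choice(n_total, size=training_size, replace=False))

    if sampling == "sobol":
        # Scrambled Sobol sequence; SciPy warns when n is not a power of 2.
        sampler = qmc.Sobol(d=len(dims), scramble=True, seed=seed)
    elif sampling == "lhs":
        # SciPy LHS, used here in place of the pyDOE2 design.
        sampler = qmc.LatinHypercube(d=len(dims), seed=seed)
    else:
        raise ValueError(f"unknown sampling strategy: {sampling!r}")

    chosen = []
    while len(chosen) < training_size:
        unit = sampler.random(training_size)  # points in [0, 1)^d
        # Bin each coordinate into a level index, then flatten to a row index.
        level_idx = np.minimum((unit * counts).astype(int), counts - 1)
        for row in np.ravel_multi_index(level_idx.T, dims):
            if int(row) not in chosen:  # skip grid points already selected
                chosen.append(int(row))
            if len(chosen) == training_size:
                break
    return np.sort(np.array(chosen))
```

Because only the row indices change between strategies, the full reaction space itself is unaffected by the choice of sampling method.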

Training Size

training_size specifies the number of reaction conditions selected for the initial training set.

This value should reflect experimental or computational budget constraints. Typical values range from 5 to 50 reactions, depending on problem complexity. Do not use fewer than 5.
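A guard that encodes this guidance might look like the following. check_training_size is a hypothetical helper, and whether AMLRO itself rejects values below 5 is not stated here; the sketch simply enforces the recommended minimum and the obvious upper bound of the reaction space size.

```python
def check_training_size(training_size, n_reactions, minimum=5):
    """Hypothetical validation of training_size against the guidance above."""
    if training_size < minimum:
        raise ValueError(f"training_size must be at least {minimum}, got {training_size}")
    if training_size > n_reactions:
        # Cannot select more distinct reactions than the space contains.
        raise ValueError(
            f"training_size ({training_size}) exceeds the reaction space ({n_reactions})"
        )
    return training_size
```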

Experiment Directory

exp_dir defines the directory where all generated reaction space and training files are written.

If the directory does not exist, it is created automatically. Use the same directory for the full optimization cycle, especially if you add reaction_data.csv to it manually.

Example:

exp_dir = "exp_data"
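The "create if missing" behavior and the check for a manually added results file can be sketched with pathlib. prepare_exp_dir is a hypothetical helper, not part of AMLRO; it only mirrors the directory handling described above.

```python
from pathlib import Path


def prepare_exp_dir(exp_dir="exp_data"):
    """Create exp_dir if needed and report whether reaction_data.csv is already there."""
    path = Path(exp_dir)
    path.mkdir(parents=True, exist_ok=True)  # no error if the directory already exists
    has_results = (path / "reaction_data.csv").exists()
    return path, has_results
```

Reusing one exp_dir value for every step keeps the generated grid, the training set, and any manually added results in the same place.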

File Generation

When write_files=True, the reaction space generation step produces the following files inside exp_dir:

full_combo.csv – Encoded representation of the complete reaction space. Categorical variables are stored as integer indices.
full_combo_decoded.csv – Human-readable version of the full reaction space with the original categorical values restored. This file is generated only if the reaction space contains 20,000 or fewer combinations.
training_combo.csv – Initial subset of reaction conditions selected using the specified sampling strategy. These reactions are intended to be performed experimentally or evaluated via simulation.

These files allow users to inspect, visualize, or modify reaction conditions prior to experimentation.
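The file layout above, including the 20,000-row limit for the decoded file, can be sketched with pandas. write_scope_files is hypothetical, and writing the training subset in encoded form is an assumption; AMLRO's actual column layout may differ.

```python
from pathlib import Path

import pandas as pd

DECODED_ROW_LIMIT = 20_000  # decoded file is only written for spaces this size or smaller


def write_scope_files(exp_dir, encoded, decoded, training_idx):
    """Write the reaction-space CSVs and return the sorted file names in exp_dir."""
    out = Path(exp_dir)
    out.mkdir(parents=True, exist_ok=True)
    encoded.to_csv(out / "full_combo.csv", index=False)
    if len(encoded) <= DECODED_ROW_LIMIT:
        decoded.to_csv(out / "full_combo_decoded.csv", index=False)
    # Assumption: training rows are written in the same encoded form as full_combo.csv.
    encoded.iloc[training_idx].to_csv(out / "training_combo.csv", index=False)
    return sorted(p.name for p in out.glob("*.csv"))
```

Skipping the decoded copy for large spaces avoids writing a second, label-expanded file that would be too large to inspect by hand anyway.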

Inspecting the Reaction Space

Users may inspect the generated reaction combinations using standard data analysis tools.

Example:

import pandas as pd

df = pd.read_csv("exp_data/training_combo.csv")
print(df.head())

This is particularly useful for verifying parameter ranges, sampling behavior, and categorical encoding.
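Two small checks can make that verification more systematic: a per-column summary of ranges and distinct values, and a list of levels that the training set has not yet covered. Both helpers are hypothetical and operate on ordinary DataFrames; the in-memory frames below stand in for full_combo.csv and training_combo.csv.

```python
import pandas as pd


def summarize_scope(df):
    """Per-column minimum, maximum, and number of distinct values."""
    return pd.DataFrame({"min": df.min(), "max": df.max(), "n_unique": df.nunique()})


def uncovered_levels(full, training):
    """For each column, the values present in the full space but absent from training."""
    return {col: sorted(set(full[col]) - set(training[col])) for col in full.columns}


# Small in-memory stand-ins for full_combo.csv and training_combo.csv.
full = pd.DataFrame({"temperature": [25, 25, 50, 50, 75, 75], "solvent": [0, 1, 0, 1, 0, 1]})
training = full.iloc[[0, 3]]
print(summarize_scope(training))
print(uncovered_levels(full, training))  # temperature 75 is not yet covered
```

In practice, the two frames would come from pd.read_csv on the files in exp_dir.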

Relationship to the AMLRO Workflow

Reaction space generation is a one-time initialization step. Once completed, users proceed to:

Perform experiments or simulations for conditions listed in training_combo.csv
Invoke active learning to propose new reaction conditions

For details on how experimental feedback is incorporated, see training_data.rst.

For details on batch selection and optimization, see active_learning.rst.