.. _reaction_space:

Reaction Space Generation
=========================

This section describes how AMLRO constructs the **reaction search space** and
selects the **initial training conditions** using the ``get_reaction_scope``
entry-point function. This is the **first step** in any AMLRO workflow.

Overview
--------

Reaction space generation serves two purposes:

1. Construct the **full combinatorial reaction space** based on user-defined
   continuous and categorical parameters.
2. Select an **initial subset of reactions** for training using a chosen
   sampling strategy.

The reaction scope is generated using:

.. code-block:: python

   get_reaction_scope(
       config=config,
       sampling="sobol",
       training_size=10,
       write_files=True,
       exp_dir=exp_dir,
   )

This function generates all reaction combinations and writes them to
``full_combo.csv``, which serves as the reaction grid for active learning
optimization.

Function Arguments
------------------

Configuration Dictionary
~~~~~~~~~~~~~~~~~~~~~~~~

``config`` defines the **reaction parameters and objectives** and is described
in detail in :doc:`configuration`. Only the reaction variables and their values
are read from ``config`` at this stage.

Sampling Strategy
~~~~~~~~~~~~~~~~~

The ``sampling`` argument controls how the **initial training reactions** are
selected from the full reaction space.

Supported options include:

- ``"lhs"`` – Latin Hypercube Sampling
- ``"sobol"`` – Sobol low-discrepancy sequence
- ``"random"`` – Random sampling

The sampling method affects **only the initial training set** and does not
alter the full reaction space.

Implementation details:

- Latin Hypercube Sampling (LHS) is implemented using **PyDOE2** with a
  min–max criterion, 1000 iterations, and a fixed random seed for
  reproducibility.
- Sobol sampling is implemented using ``scipy.stats.qmc``.
- Random sampling uses uniform random selection.

Training Size
~~~~~~~~~~~~~

``training_size`` specifies the number of reaction conditions selected for the
**initial training set**. This value should reflect experimental or
computational budget constraints.

Typical values range from 5 to 50 reactions, depending on problem complexity.
Do not use fewer than 5.

Experiment Directory
~~~~~~~~~~~~~~~~~~~~

``exp_dir`` defines the directory where all generated reaction space and
training files are written. If the directory does not exist, it is created
automatically.

Use the same directory for the full optimization cycle. In particular, if you
add ``reaction_data.csv`` manually, place it in this directory.

Example:

.. code-block:: python

   exp_dir = "exp_data"

File Generation
---------------

When ``write_files=True``, the reaction space generation step produces the
following files inside ``exp_dir``:

- ``full_combo.csv``
  Encoded representation of the complete reaction space. Categorical variables
  are stored as integer indices.

- ``full_combo_decoded.csv``
  Human-readable version of the full reaction space with the original
  categorical values restored. This file is generated only if the reaction
  space contains 20,000 combinations or fewer.

- ``training_combo.csv``
  Initial subset of reaction conditions selected using the specified sampling
  strategy. These reactions are intended to be performed experimentally or
  evaluated via simulation.

These files allow users to **inspect, visualize, or modify** reaction
conditions prior to experimentation.

Inspecting the Reaction Space
-----------------------------

Users may inspect the generated reaction combinations using standard data
analysis tools.

Example:

.. code-block:: python

   import pandas as pd

   df = pd.read_csv("exp_data/training_combo.csv")
   print(df.head())

This is particularly useful for verifying parameter ranges, sampling behavior,
and categorical encoding.

Relationship to the AMLRO Workflow
----------------------------------

Reaction space generation is a **one-time initialization step**. Once
completed, users proceed to:

1. Perform experiments or simulations for the conditions listed in
   ``training_combo.csv``
2. Invoke active learning to propose new reaction conditions

For details on how experimental feedback is incorporated, see
:doc:`training_data`. For details on batch selection and optimization, see
:doc:`active_learning`.
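
Illustrative Sketch
-------------------

The two purposes listed in the Overview (building the full combinatorial grid
and picking an initial subset from it) can be sketched as follows. This is a
minimal illustration and not AMLRO's actual implementation: the parameter names
and values, and the helpers ``build_full_space`` and ``sobol_select``, are
hypothetical and chosen only for this example. It assumes that categorical
values are encoded as integer indices, as described for ``full_combo.csv``, and
uses the Sobol generator from ``scipy.stats.qmc``.

.. code-block:: python

   import itertools

   import numpy as np
   import pandas as pd
   from scipy.stats import qmc

   # Hypothetical reaction parameters (NOT the AMLRO config schema).
   continuous = {
       "temperature": list(range(20, 101, 20)),  # 5 levels
       "time": list(range(10, 61, 10)),          # 6 levels
   }
   categorical = {"solvent": ["water", "ethanol", "dmso"]}  # 3 levels


   def build_full_space(continuous, categorical):
       """Return (encoded, decoded) DataFrames with every combination."""
       names = list(continuous) + list(categorical)
       levels = list(continuous.values()) + list(categorical.values())
       decoded = pd.DataFrame(list(itertools.product(*levels)), columns=names)
       encoded = decoded.copy()
       for name, values in categorical.items():
           # Replace each categorical value by its integer index.
           lookup = {value: i for i, value in enumerate(values)}
           encoded[name] = decoded[name].map(lookup)
       return encoded, decoded


   def sobol_select(level_counts, n_samples, seed=0):
       """Pick distinct grid row indices guided by a scrambled Sobol sequence."""
       counts = np.asarray(level_counts)
       if n_samples > counts.prod():
           raise ValueError("n_samples exceeds the size of the reaction space")
       sampler = qmc.Sobol(d=len(counts), scramble=True, seed=seed)
       chosen, seen = [], set()
       while len(chosen) < n_samples:
           u = sampler.random(1)[0]  # one point in [0, 1)^d
           # Map each coordinate to a level index for that parameter.
           idx = np.minimum((u * counts).astype(int), counts - 1)
           # Row number in the product grid (last parameter varies fastest).
           row = int(np.ravel_multi_index(tuple(idx), tuple(counts)))
           if row not in seen:
               seen.add(row)
               chosen.append(row)
       return chosen


   encoded, decoded = build_full_space(continuous, categorical)
   level_counts = [len(v) for v in continuous.values()] + [
       len(v) for v in categorical.values()
   ]
   rows = sobol_select(level_counts, n_samples=10)
   training = encoded.iloc[rows]
   print(len(encoded), "combinations in total;", len(training), "selected")

Snapping Sobol points to the nearest grid level and discarding duplicates keeps
the selection inside the enumerated reaction space, which mirrors the statement
above that sampling affects only the initial training set and not the full
space.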