amlro package

Submodules

amlro.const module

amlro.generate_reaction_conditions module

amlro.generate_reaction_conditions.generate_reaction_grid(config: Dict) DataFrame[source]

Generates all reaction conditions in the requested reaction space.

Given all continuous and categorical input parameters defining a reaction space, every reaction parameter combination is generated, forming a uniform grid of points over the reaction space. From the config dictionary, config["continuous"]["feature_names"], config["continuous"]["feature_bounds"], and config["continuous"]["resolutions"] define the reaction space. If there are categorical parameters, config["categorical"]["feature_names"] and config["categorical"]["values"] are also required. The parameters may be only continuous, only categorical, or a mixture of both.

An encoded variant is also returned with categorical parameter values represented as numbers for ease-of-use by other algorithms and optimizers.

Parameters:

config (Dict) – Dictionary of parameters, their bounds and resolution.

Returns:

The reaction space and the encoded reaction space.

Return type:

pd.DataFrame
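As a hedged illustration, the grid construction can be sketched directly with itertools and pandas; the config values below, and the reading of resolutions as the number of grid points per feature, are assumptions for the sketch, not taken from amlro itself.

```python
import itertools
import numpy as np
import pandas as pd

# Hypothetical config following the key layout described above.
config = {
    "continuous": {
        "feature_names": ["temperature", "time"],
        "feature_bounds": [[25.0, 100.0], [1.0, 5.0]],
        "resolutions": [4, 5],  # assumed: number of grid points per feature
    },
    "categorical": {
        "feature_names": ["solvent"],
        "values": [["DMSO", "THF"]],
    },
}

# One axis per continuous feature...
axes = [
    np.linspace(lo, hi, res)
    for (lo, hi), res in zip(config["continuous"]["feature_bounds"],
                             config["continuous"]["resolutions"])
]
# ...plus one axis per categorical feature.
axes += config["categorical"]["values"]
names = config["continuous"]["feature_names"] + config["categorical"]["feature_names"]

# Cartesian product of all axes forms the uniform grid.
grid = pd.DataFrame(list(itertools.product(*axes)), columns=names)
print(len(grid))  # 4 * 5 * 2 = 40 combinations
```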

amlro.generate_reaction_conditions.get_reaction_scope(config: Dict, sampling: str = 'random', write_files: bool = False, exp_dir: str = None) DataFrame[source]

Generate the full reaction space and training reaction conditions.

If required, the write option can be enabled to generate the full combo file and the training combo file.

Parameters:
  • config (Dict) – Dictionary of parameters, their bounds and resolution.

  • sampling (str, optional) – Sampling method for generating training reaction conditions, defaults to ‘random’

  • write_files (bool, optional) – Option to enable writing files, defaults to False

  • exp_dir (str, optional) – experimental directory for saving data files, defaults to None

Returns:

Two dataframes: the full reaction space and the training conditions.

Return type:

pd.DataFrame

Raises:

ValueError – incorrect sampling method.

amlro.generate_reaction_conditions.write_reaction_scope(full_combo_df: DataFrame, full_combo_encoded_df: DataFrame, training_combo_df: DataFrame, exp_dir: str)[source]

Writes reaction scopes into files.

This function generates two files in the given experimental directory: the full combo file (the full reaction space) and the training combo file (a sub-sample of reaction conditions used to generate the training data file).

Parameters:
  • full_combo_df (pd.DataFrame) – full reaction space dataframe

  • full_combo_encoded_df (pd.DataFrame) – full reaction space dataframe with categorical variables encoded as numerical values

  • training_combo_df (pd.DataFrame) – training reaction conditions dataframe

  • exp_dir (str) – experimental directory for saving data files

amlro.generate_training_data module

amlro.generate_training_data.generate_training_data(exp_dir: str, config: Dict, parameters: List = [], obj_values: List = [], filename: str = 'reactions_data.csv', termination: bool = False) List[Any][source]

Generates a training dataset for the ML model.

This function handles the generation of training data points for an optimization process involving experimental reactions. It is designed to work iteratively, where each iteration generates a new training reaction. Depending on the iteration number (itr) and the termination flag, it writes the experimental data to files and provides the next set of reaction conditions.

Parameters:
  • exp_dir (str) – experimental directory for saving data files

  • config (Dict) – Dictionary of parameters

  • parameters (List, optional) – Previous experiment parameters or initial parameters, defaults to [].

  • obj_values (List, optional) – experimental yield from the previous experiment, defaults to [].

  • filename (str, optional) – filename for the reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.

  • termination (bool, optional) – terminates the training function after the last iteration without returning the next reaction conditions, defaults to False

Returns:

parameter set for next experiment.

Return type:

List
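The iterative workflow can be sketched as a closed loop; run_experiment and the in-memory queue below are hypothetical stand-ins for the real measurement step and for generate_training_data's file-based bookkeeping.

```python
# Hypothetical stand-in for the actual experiment/measurement.
def run_experiment(conditions):
    return [sum(conditions)]  # e.g. a measured yield

# Stand-in training reaction space (would come from get_reaction_scope).
training_conditions = [[25.0, 1.0], [50.0, 2.0], [75.0, 3.0]]

parameters, obj_values, history = [], [], []
queue = list(training_conditions)
while queue:
    # generate_training_data's role: hand out the next training reaction.
    next_conditions = queue.pop(0)
    obj_values = run_experiment(next_conditions)
    history.append((next_conditions, obj_values))
    parameters = next_conditions

print(len(history))  # one record per training reaction
```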

amlro.generate_training_data.get_next_training_conditions(exp_dir: str, config: Dict, filename: str = 'reactions_data.csv') List[Any][source]

Returns next reaction conditions from the training reaction space.

Parameters:
  • exp_dir (str) – experimental directory for saving data files

  • config (Dict) – Dictionary of parameters

  • filename (str, optional) – filename for the reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.

Returns:

parameter set for next experiment.

Return type:

List

amlro.generate_training_data.load_training_conditions(training_combo: str = 'training_combo.csv') List[Any][source]

Loads the training data.

Reads the combination file as a pandas dataframe and returns the reaction combination data as a list.

Parameters:

training_combo (str) – training parameter combination file path, defaults to the value of amlro.const.TRAINING_COMBO_FILE.

Returns:

training parameter combination list

Return type:

List
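A minimal sketch of what loading a combination file amounts to, assuming a plain CSV with one reaction condition per row; the column names here are hypothetical.

```python
from io import StringIO
import pandas as pd

# In-memory stand-in for a training combination CSV file.
csv_text = "temperature,time,solvent\n25.0,1.0,DMSO\n50.0,2.0,THF\n"
df = pd.read_csv(StringIO(csv_text))
combinations = df.values.tolist()  # one list per reaction condition
print(combinations[0])  # [25.0, 1.0, 'DMSO']
```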

amlro.generate_training_data.write_data_to_training_files(exp_dir: str, config: Dict, parameters: List = [], obj_values: List = [], filename: str = 'reactions_data.csv')[source]

Writes experimental reaction data into the reaction data files.

Parameters:
  • exp_dir (str) – experimental directory for saving data files

  • config (Dict) – Dictionary of parameters

  • parameters (List, optional) – reaction parameters from the previous experiment, defaults to [].

  • obj_values (List, optional) – experimental yield from the previous experiment, defaults to [].

  • filename (str, optional) – filename for the reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.

amlro.generate_training_data.write_line_to_a_file(training_file: str, reaction_results: str) None[source]

Writes a data line to a file.

Appends the previous reaction conditions and experimental objective values to the end of the training data file.

Parameters:
  • training_file (str) – training set file path

  • reaction_results (str) – previous reaction conditions and objective values

amlro.generate_training_data.write_training_data(exp_dir: str, objectives: List[List], config: Dict, filename: str = 'reactions_data.csv')[source]

Writes the experiment data.

This function can be used when all the training conditions are performed together and the experimental results are written together; the batch of training experiment results/objectives must be collected first.

Parameters:
  • exp_dir (str) – experimental directory for saving data files

  • objectives (List[List]) – objective values for training reaction conditions

  • filename (str, optional) – filename for the reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.

Raises:

ValueError – if the lengths of training conditions and objectives do not match.

amlro.ml_models module

This module provides a framework for retrieving various regression models along with their corresponding hyperparameter grids for hyperparameter tuning. Regression models:
  • ElasticNet
  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • XGBoost
  • AdaBoost
  • Support Vector Regressor
  • K-Nearest Neighbors (KNN)
  • Bayesian Ridge

Each function returns a regressor model object and a hyperparameter grid for use in hyperparameter optimization (e.g., using grid search in optimizer.model_training).

The main function get_regressor_model(model: str) serves as the entry point to select a model based on user input, returning the chosen model and grid for training and tuning.

Usage:

model, param_grid = get_regressor_model(‘xgb’)
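The returned pair is intended for hyperparameter search. A hedged sketch of how such a (model, grid) pair might be tuned with scikit-learn's GridSearchCV; the grid values and data below are illustrative assumptions, not amlro's actual grids.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Stand-ins for what get_regressor_model('gb') might return.
model = GradientBoostingRegressor(random_state=0)
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 3]}  # assumed values

# Small synthetic regression dataset for the sketch.
rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Exhaustive grid search with 3-fold cross-validation.
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, y)
pred = search.best_estimator_.predict(X[:5])
print(pred.shape)  # (5,)
```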

amlro.ml_models.adaboost_regressor() Tuple[AdaBoostRegressor, Dict][source]

Creates an AdaBoost regressor with a grid of hyperparameters.

Returns:

The AdaBoostRegressor model and hyperparameter grid.

Return type:

Tuple[AdaBoostRegressor, Dict]

amlro.ml_models.bayesian_ridge_regressor() Tuple[BayesianRidge, Dict][source]

Creates a BayesianRidge regressor with a grid of hyperparameters.

Returns:

The BayesianRidge model and hyperparameter grid.

Return type:

Tuple[BayesianRidge, Dict]

amlro.ml_models.decision_tree_regressor() Tuple[DecisionTreeRegressor, Dict][source]

Creates a DecisionTree regressor with a grid of hyperparameters.

Returns:

The DecisionTreeRegressor model and hyperparameter grid.

Return type:

Tuple[DecisionTreeRegressor, Dict]

amlro.ml_models.elastic_net_regressor() Tuple[ElasticNet, Dict][source]

Creates an ElasticNet regressor with a grid of hyperparameters.

Returns:

The ElasticNet model and hyperparameter grid.

Return type:

Tuple[ElasticNet, Dict]

amlro.ml_models.get_regressor_model(model: str, seed: int = None) Tuple[object, Dict][source]

Retrieves the specified regressor model and its hyperparameter grid.

Parameters:
  • model (str) – The name of the model to retrieve. Supported values are: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’.

  • seed (int, optional) – Random seed used to set the random state of the ML model, defaults to None

Returns:

The selected model and its hyperparameter grid.

Return type:

Tuple[model object, Dict]

amlro.ml_models.gradient_boost_regressor() Tuple[GradientBoostingRegressor, Dict][source]

Creates a GradientBoosting regressor with a grid of hyperparameters.

Returns:

The GradientBoostingRegressor model and hyperparameter grid.

Return type:

Tuple[GradientBoostingRegressor, Dict]

amlro.ml_models.knn_regressor() Tuple[KNeighborsRegressor, Dict][source]

Creates a K-Nearest Neighbors (KNN) regressor with a grid of hyperparameters.

Returns:

The KNeighborsRegressor model and hyperparameter grid.

Return type:

Tuple[KNeighborsRegressor, Dict]

amlro.ml_models.random_forest_regressor() Tuple[RandomForestRegressor, Dict][source]

Creates a RandomForest regressor with a grid of hyperparameters.

Returns:

The RandomForestRegressor model and hyperparameter grid.

Return type:

Tuple[RandomForestRegressor, Dict]

amlro.ml_models.support_vector_regressor() Tuple[SVR, Dict][source]

Creates a Support Vector Regressor (SVR) with a grid of hyperparameters.

Returns:

The SVR model and hyperparameter grid.

Return type:

Tuple[SVR, Dict]

amlro.ml_models.xgboost_regressor() Tuple[XGBRegressor, Dict][source]

Creates an XGBoost regressor with a grid of hyperparameters.

Returns:

The XGBoost model and hyperparameter grid.

Return type:

Tuple[xgboost.XGBRegressor, Dict]

amlro.optimizer module

amlro.optimizer.categorical_feature_decoding(config: Dict, best_combo: List[Any]) List[Any][source]

Converts an encoded parameter list into a decoded list, mapping categorical feature values back to their names.

Parameters:
  • config (Dict) – Initial reaction feature configurations

  • best_combo (List[Any]) – parameter list required for decoding

Returns:

Decoded parameter list

Return type:

List[Any]

amlro.optimizer.categorical_feature_encoding(config: Dict, prev_parameters: List[Any]) List[Any][source]

Converts a decoded parameter list into an encoded list, mapping categorical feature values to their numerical codes.

Parameters:
  • config (Dict) – Initial reaction feature configurations

  • prev_parameters (List[Any]) – parameter list required for encoding

Returns:

encoded parameter list

Return type:

List[Any]
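The encode/decode round trip can be sketched as index lookups into config["categorical"]["values"]; whether amlro uses exactly this index-based scheme is an assumption.

```python
# Hypothetical config following the categorical key layout used above.
config = {
    "categorical": {
        "feature_names": ["solvent", "base"],
        "values": [["DMSO", "THF"], ["KOH", "NaOH"]],
    }
}

def encode(values, config):
    # Each categorical value becomes its index in the allowed-values list.
    return [config["categorical"]["values"][i].index(v)
            for i, v in enumerate(values)]

def decode(codes, config):
    # Reverse lookup: index back to the original name.
    return [config["categorical"]["values"][i][c]
            for i, c in enumerate(codes)]

encoded = encode(["THF", "KOH"], config)
print(encoded)                  # [1, 0]
print(decode(encoded, config))  # ['THF', 'KOH']
```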

amlro.optimizer.get_optimized_parameters(exp_dir: str, config: Dict, parameters_list: List[List] = [[]], objectives_list: List[List] = [[]], model: str = 'gb', filename: str = 'reactions_data.csv', batch_size: int = 1, termination: bool = False) List[List][source]

Trains a machine learning model using the provided training data and configuration, then predicts and retrieves the next best reaction parameters based on the optimization objectives. It then writes the training data and predictions to files.

Parameters:
  • exp_dir (str) – experimental directory for saving data files

  • config (Dict) – Configuration dictionary specifying features, objectives, directions, and other settings for optimizer.

  • parameters_list (List[List], optional) – List of reaction conditions for training, defaults to [[]].

  • objectives_list (List[List], optional) – List of objective values corresponding to the reaction conditions, defaults to [[]].

  • model (str, optional) – The machine learning model type to use, e.g., ‘gb’ for Gradient Boosting, defaults to ‘gb’.

  • filename (str, optional) – The name of the file where reaction data is stored, defaults to REACTION_DATA_FILENAME.

  • batch_size (int, optional) – The number of best reaction conditions to return, defaults to 1.

  • termination (bool, optional) – If True, the function will terminate early after saving data, defaults to False.

Returns:

A list of lists containing the next predicted reaction conditions.

Return type:

List[List]

amlro.optimizer.load_data(reactions_data: str, combination_file: str, config: Dict) Tuple[DataFrame, DataFrame, DataFrame][source]

Loads the dataset files.

Reads the training set file and the full combination file as pandas dataframes and splits them into x-train, y-train, and test datasets. When loading the combination file, rows that already appear in the training file are removed.

Parameters:
  • reactions_data (str) – path to the training set (reaction data) file.

  • combination_file (str) – path to the combination file.

  • config (Dict) – Dictionary of optimizer parameters.

Returns:

x and y training datasets and test dataset.

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
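One way to get the row-removal behavior described above is an anti-join via pandas merge with indicator=True; the column names are hypothetical and this is a sketch, not amlro's implementation.

```python
import pandas as pd

# Full combination grid and the rows already measured (assumed columns).
full = pd.DataFrame({"temperature": [25.0, 50.0, 75.0],
                     "time": [1.0, 2.0, 3.0]})
trained = pd.DataFrame({"temperature": [50.0], "time": [2.0]})

# Left merge with indicator marks rows present only in the full grid.
merged = full.merge(trained, how="left", indicator=True)
test_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(len(test_df))  # 2 rows remain as the test dataset
```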

amlro.optimizer.mo_model_training(x_train: DataFrame, y_train: DataFrame, model: str = 'gb') object[source]

Train the multi output regressor model and return the best model.

Parameters:
  • x_train (pd.DataFrame) – training dataset that contains feature values.

  • y_train (pd.DataFrame) – target dataframe that contains objective values.

  • model (str) – Regressor model name. Valid options: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’. Default is ‘gb’ (Gradient Boosting).

Returns:

trained regressor model

Return type:

model object with a predict method (e.g., sklearn regressor)

amlro.optimizer.model_training(x_train: DataFrame, y_train: DataFrame, model: str = 'gb') object[source]

Train the regressor model and return the best model.

Parameters:
  • x_train (pd.DataFrame) – training dataset that contains feature values.

  • y_train (pd.DataFrame) – target dataframe that contains objective values.

  • model (str) – Regressor model name. Valid options: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’. Default is ‘gb’ (Gradient Boosting).

Returns:

trained regressor model

Return type:

model object with a predict method (e.g., sklearn regressor)

amlro.optimizer.predict_next_parameters(regr, data: DataFrame, n_curr: int, config: Dict, batch_size: int = 1) DataFrame[source]

Predicts the yield for all combinations using a trained regressor model and returns the best combinations.

The function handles both single-objective and multi-objective optimization. In the single-objective case, it sorts the predicted results based on the specified direction (maximization or minimization). In the multi-objective case, it identifies the Pareto front and ranks solutions by computing weighted sums of normalized objectives. Weights are defined as -1 for min and +1 for max.

Parameters:
  • regr (model object with a predict method (e.g., sklearn regressor)) – trained regressor model

  • data (pd.DataFrame) – test dataset that contains the full reaction space

  • n_curr (int) – current size of the training data

  • config (Dict) – Dictionary of optimizer parameters

  • batch_size (int, optional) – Number of reaction conditions to return as predictions, defaults to 1.

Returns:

batch of best predicted parameter

Return type:

pd.DataFrame
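The multi-objective ranking step described above (normalize each objective, then weight by direction with -1 for min and +1 for max) can be sketched with NumPy; the data below is illustrative.

```python
import numpy as np

# Predicted objective values: columns are yield (max) and cost (min).
predictions = np.array([[0.9, 12.0],
                        [0.5,  4.0],
                        [0.7,  6.0]])
directions = ["max", "min"]
weights = np.array([1.0 if d == "max" else -1.0 for d in directions])

# Normalize each objective to [0, 1], then score with directional weights.
mins, maxs = predictions.min(axis=0), predictions.max(axis=0)
normalized = (predictions - mins) / (maxs - mins)
scores = normalized @ weights
best = np.argsort(scores)[::-1]  # indices, highest score first
print(best[0])  # index of the best trade-off
```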

amlro.optimizer.stringify_parameters_objectives(parameters_list: List[List], objectives_list: List[List], config: Dict) Tuple[List[str], List[str]][source]

Combines reaction parameters and their respective objective values into comma-separated strings for writing into the reaction data file.

Convert parameters and their respective objective values into comma-separated strings, both encoded and decoded, for writing into a reaction data file. The function encodes categorical features based on the provided configuration.

Parameters:
  • parameters_list (List[List]) – A list of reaction conditions, where each condition is a list of parameter values.

  • objectives_list (List[List]) – A list of corresponding objective values for each reaction condition.

  • config (Dict) – Dictionary of optimizer parameters

Returns:

Two lists of strings: (1) the parameters with encoded categorical features combined with objective values, and (2) the original (decoded) parameter values combined with objective values.

Return type:

Tuple[List[str], List[str]]

amlro.optimizer.write_data_to_training(training_file: str, parameter_list: List[str]) None[source]

Appends the previous best predicted combination and experimental yield to the end of the training set file.

Parameters:
  • training_file (str) – training set file path

  • parameter_list (List[str]) – previous best combination and yield

amlro.pareto module

amlro.pareto.calculate_frontier_depth(config: Dict, nfeatures: int, n_curr: int, batch_size: int) int[source]

Calculates how many Pareto fronts to explore based on data volume.

Uses a stateless exponential decay function to dynamically balance exploration and exploitation. To prevent premature convergence on sparse datasets, the decay is strictly bounded by the ratio of initial training data to collected datapoints.

Parameters:
  • config (Dict) – Dictionary of optimizer parameters

  • nfeatures (int) – Length of the feature space

  • n_curr (int) – current size of the training data

  • batch_size (int) – Number of reaction conditions to return as predictions.

Returns:

number of Pareto fronts (frontier depth)

Return type:

int
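The exact decay formula is not given above, so the sketch below assumes a simple exponential decay in the number of collected datapoints, clamped to at least one front; treat every constant here as a placeholder, not amlro's actual parameters.

```python
import math

def frontier_depth(n_initial, n_curr, max_depth=5, rate=0.05):
    # Assumed form: more collected data -> fewer fronts explored
    # (shifting from exploration toward exploitation).
    depth = max_depth * math.exp(-rate * (n_curr - n_initial))
    return max(1, min(max_depth, round(depth)))

print(frontier_depth(n_initial=20, n_curr=20))   # full exploration at start
print(frontier_depth(n_initial=20, n_curr=120))  # decays toward a single front
```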

amlro.pareto.get_ranked_pareto_fronts(config: Dict, nfeatures: int, pred_df: DataFrame, frontier_depth: int) DataFrame[source]

Selects and ranks layers of Pareto fronts according to the frontier depth.

Performs non-dominated sorting on the prediction grid. Iteratively extracts the current Pareto optimal front, assigns it a rank, removes it from the grid, and repeats to expose deeper, sub-optimal frontiers.

Parameters:
  • config (Dict) – Dictionary of optimizer parameters

  • nfeatures (int) – Length of the feature space

  • pred_df (pd.DataFrame) – Dataframe with predictions for the full grid space

  • frontier_depth (int) – number of Pareto fronts (frontier depth)

Returns:

Ranked Pareto fronts

Return type:

pd.DataFrame

amlro.pareto.identify_pareto_front(prediction_data: List, directions: List, nfeatures: int) array[source]

Identifies the Pareto front from a list of points in a multi-objective space.

The Pareto front is a set of non-dominated points in a multi-objective optimization problem. A point is considered to be on the Pareto front if no other point in the set dominates it. This function examines each point and determines whether it should be included in the Pareto front.

Parameters:
  • prediction_data (List) – A list of points/predictions, where each point contains both feature values and objective values. The objective values should follow the features in each point.

  • directions (List) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.

  • nfeatures (int) – Length of the feature space

Returns:

List of Pareto-optimal solutions

Return type:

np.array

amlro.pareto.is_dominated_by_any(target_point: ndarray, all_points: ndarray, directions: List[str]) bool[source]

Determines whether a single point is Pareto-dominated by ANY point in a dataset.

Pareto dominance is a concept used in multi-objective optimization to compare solutions based on multiple objectives. A point A is said to Pareto-dominate another point B if A is no worse than B in all objectives and strictly better than B in at least one objective.

Instead of one-to-one comparison, this function utilizes high-speed NumPy broadcasting to simultaneously check if target_point is Pareto-dominated by ANY of the solutions in the all_points array. It handles mixed optimization goals by mathematically aligning ‘min’ and ‘max’ directions into a uniform scale.

Parameters:
  • target_point (np.ndarray) – A 1D array of the objective values of the target point.

  • all_points (np.ndarray) – A 2D array containing the objective values of the entire grid.

  • directions (List[str]) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.

Returns:

True if target_point is dominated by at least one other point in the dataset. False if it is non-dominated (it belongs on the Pareto front).

Return type:

bool
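The broadcasting check can be sketched by flipping "min" columns so that larger is uniformly better, then comparing the target against the whole grid at once; this is a sketch consistent with the description above, not the package's exact code.

```python
import numpy as np

def is_dominated_by_any(target_point, all_points, directions):
    # Align directions: negate "min" columns so larger is always better.
    signs = np.array([1.0 if d == "max" else -1.0 for d in directions])
    t = target_point * signs
    pts = all_points * signs
    # Broadcast: each candidate no worse everywhere and strictly better somewhere.
    no_worse = (pts >= t).all(axis=1)
    strictly_better = (pts > t).any(axis=1)
    return bool((no_worse & strictly_better).any())

# Columns: yield (maximize), cost (minimize).
points = np.array([[0.9, 2.0], [0.5, 5.0], [0.7, 3.0]])
print(is_dominated_by_any(np.array([0.6, 4.0]), points, ["max", "min"]))  # True
```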

amlro.pareto.is_pareto_dominant(point1: List, point2: List, directions: List) bool[source]

Determines whether one point dominates another in a multi-objective space according to Pareto dominance.

Pareto dominance is a concept used in multi-objective optimization to compare two points (or solutions) based on multiple objectives. A point A is said to Pareto-dominate another point B if A is no worse than B in all objectives and better than B in at least one objective.

This function checks whether point1 Pareto-dominates point2 considering the specified optimization directions for each objective.

Parameters:
  • point1 (List) – List of objective values for the first point

  • point2 (List) – List of objective values for the second point

  • directions (List) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.

Returns:

True if point1 Pareto-dominates point2, otherwise False.

Return type:

bool
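A plain-Python sketch of the pairwise test, written directly from the definition above (no worse in all objectives, better in at least one).

```python
def is_pareto_dominant(point1, point2, directions):
    def better(a, b, d):
        # "Better" depends on the optimization direction of the objective.
        return a > b if d == "max" else a < b

    # point1 is no worse than point2 in every objective...
    no_worse = all(not better(b, a, d)
                   for a, b, d in zip(point1, point2, directions))
    # ...and strictly better in at least one.
    strictly_better = any(better(a, b, d)
                          for a, b, d in zip(point1, point2, directions))
    return no_worse and strictly_better

# Objectives: yield (max) and cost (min).
print(is_pareto_dominant([0.9, 2.0], [0.7, 3.0], ["max", "min"]))  # True
print(is_pareto_dominant([0.9, 4.0], [0.7, 3.0], ["max", "min"]))  # False
```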

amlro.pareto.select_batch_from_ranked_fronts(ranked_df: DataFrame, config: Dict, nfeatures: int, batch_size: int, feature_columns: List[str]) DataFrame[source]

Scores a pool of ranked Pareto fronts and selects a batch using stratified sampling.

Normalizes the objective predictions and applies the user-defined directional weights to calculate a single score. To increase exploration, it explicitly samples the top candidate from each available frontier depth before filling additional datapoints from frontier rank 1 to meet the required batch size.

Parameters:
  • ranked_df (pd.DataFrame) – Data frame with ranked pareto_fronts

  • config (Dict) – Dictionary of optimizer parameters

  • nfeatures (int) – Length of the feature space

  • batch_size (int) – Number of reaction conditions to return as predictions.

  • feature_columns (List[str]) – Parameter names for continuous and categorical features

Returns:

batch of best predicted parameter

Return type:

pd.DataFrame

amlro.sampling_methods module

amlro.sampling_methods.feature_scaling(samples: List[List], config: Dict, res_factor: float = 1.0) DataFrame[source]

Scales and maps the continuous and categorical features from the Latin hypercube or Sobol sampling space. These sampling methods generate coordinates in [0, 1] for each dimension.

From the config dictionary, config["continuous"]["feature_names"], config["continuous"]["feature_bounds"], and config["continuous"]["resolutions"] are used to rescale the continuous features, and config["categorical"]["feature_names"] and config["categorical"]["values"] are used to rescale the categorical features.

Parameters:
  • samples (List[List]) – sub sample generated from sampling method

  • config (Dict) – Dictionary of parameters, their bounds and resolution.

  • res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0

Returns:

scaled training reaction conditions dataframe

Return type:

pd.DataFrame
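Mapping unit-cube samples onto the continuous bounds is an affine rescale; a minimal sketch with assumed bounds and sample values.

```python
import numpy as np

# (low, high) per continuous feature -- assumed values for the sketch.
bounds = np.array([[25.0, 100.0], [1.0, 5.0]])
# Unit-cube coordinates as produced by LHS/Sobol sampling.
samples = np.array([[0.0, 0.5], [1.0, 0.25]])

lo, hi = bounds[:, 0], bounds[:, 1]
scaled = lo + samples * (hi - lo)  # affine map [0, 1] -> [lo, hi]
print(scaled[0])  # [25.  3.]
```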

amlro.sampling_methods.latin_hypercube_sampling(config: Dict, sample_size: int = 20, res_factor: float = 1.0) DataFrame[source]

Generate a subsample of the full reaction space using Latin hypercube sampling.

Parameters:
  • config (Dict) – Dictionary of parameters, their bounds and resolution.

  • sample_size (int, optional) – sub sample size, defaults to 20

  • res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0

Returns:

sub sample of reaction space needed for training set generation

Return type:

pd.DataFrame
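A sketch using SciPy's quasi-Monte Carlo module; whether amlro relies on scipy.stats.qmc internally is an assumption, and the bounds below are illustrative.

```python
from scipy.stats import qmc

# Latin hypercube sample of 20 points in the 2D unit cube.
sampler = qmc.LatinHypercube(d=2, seed=0)
unit_samples = sampler.random(n=20)

# Rescale onto assumed feature bounds (temperature, time).
scaled = qmc.scale(unit_samples, [25.0, 1.0], [100.0, 5.0])
print(scaled.shape)  # (20, 2)
```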

amlro.sampling_methods.random_sampling(df: DataFrame, sample_size: int = 20) DataFrame[source]

Generate subsample from full reaction space using random sampling.

Parameters:
  • df (pd.DataFrame) – Dataframe with full reaction space

  • sample_size (int, optional) – sub sample size, defaults to 20

Returns:

sub sample of reaction space needed for training set generation

Return type:

pd.DataFrame

amlro.sampling_methods.sobol_sequnce_sampling(config: Dict, sample_size: int = 20, res_factor: float = 1.0) DataFrame[source]

Generate a subsample of the full reaction space using Sobol sequence sampling.

Parameters:
  • config (Dict) – Dictionary of parameters, their bounds and resolution.

  • sample_size (int, optional) – sub sample size, defaults to 20

  • res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0

Returns:

sub sample of reaction space needed for training set generation

Return type:

pd.DataFrame

amlro.validations module

amlro.validations.validate_optimizer_config(config: Dict) None[source]

Validates the config dict for the optimizer and training set generation.

Parameters:

config (Dict) – Configuration to be checked

Raises:
  • ValueError – If lengths of directions and objectives do not match.

  • ValueError – If any direction is not ‘min’ or ‘max’.
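A minimal sketch of the two checks listed above; the exact key names and error messages are assumptions for the sketch.

```python
def validate_optimizer_config(config):
    directions = config["directions"]
    objectives = config["objectives"]
    # Check 1: one direction per objective.
    if len(directions) != len(objectives):
        raise ValueError("lengths of directions and objectives do not match")
    # Check 2: every direction must be 'min' or 'max'.
    if any(d not in ("min", "max") for d in directions):
        raise ValueError("each direction must be 'min' or 'max'")

validate_optimizer_config({"objectives": ["yield"], "directions": ["max"]})  # passes
```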

amlro.validations.validate_reaction_scope_config(config: Dict) None[source]

Validates the part of the configuration dictionary for generating grids.

Parameters:

config (Dict) – Configuration to be checked

Raises:
  • ValueError – At least one given bound is invalid.

  • ValueError – At least one given resolution is invalid.

Module contents