amlro package
Submodules
amlro.const module
amlro.generate_reaction_conditions module
- amlro.generate_reaction_conditions.generate_reaction_grid(config: Dict) DataFrame[source]
Generates all reaction conditions in the requested reaction space.
Given all continuous and categorical input parameters defining a reaction space, every combination of reaction parameters is generated, forming a uniform grid of points over the reaction space. From the config dictionary, config["continuous"]["feature_names"], config["continuous"]["feature_bounds"], and config["continuous"]["resolutions"] are used to define the reaction space. If there are categorical parameters, then config["categorical"]["feature_names"] and config["categorical"]["values"] are also required. The parameters may be only continuous, only categorical, or a mixture of both types. An encoded variant is also returned, with categorical parameter values represented as numbers for ease of use by other algorithms and optimizers.
- Parameters:
config (Dict) – Dictionary of parameters, their bounds and resolution.
- Returns:
reaction space and encoded reaction space.
- Return type:
pd.DataFrame
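The grid construction described above can be sketched with itertools.product. This is a minimal stand-in, not the library implementation: make_grid is a hypothetical name, and it assumes each entry of resolutions gives the number of grid points per continuous axis.

```python
from itertools import product

import numpy as np
import pandas as pd


def make_grid(config):
    """Sketch: uniform grid over continuous axes crossed with categorical values."""
    axes, names = [], []
    cont = config.get("continuous", {})
    for name, (lo, hi), res in zip(cont.get("feature_names", []),
                                   cont.get("feature_bounds", []),
                                   cont.get("resolutions", [])):
        names.append(name)
        axes.append(np.linspace(lo, hi, res))  # res evenly spaced points
    cat = config.get("categorical", {})
    for name, values in zip(cat.get("feature_names", []), cat.get("values", [])):
        names.append(name)
        axes.append(values)
    # every combination of axis values forms one row of the reaction space
    return pd.DataFrame(list(product(*axes)), columns=names)


config = {
    "continuous": {"feature_names": ["temperature", "time"],
                   "feature_bounds": [(20, 100), (1, 5)],
                   "resolutions": [5, 3]},
    "categorical": {"feature_names": ["solvent"], "values": [["DMF", "EtOH"]]},
}
grid = make_grid(config)  # 5 * 3 * 2 = 30 combinations
```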
- amlro.generate_reaction_conditions.get_reaction_scope(config: Dict, sampling: str = 'random', write_files: bool = False, exp_dir: str = None) DataFrame[source]
Generate the full reaction space and training reaction conditions.
If required, file writing can be enabled to generate the full combo file and the training combo file.
- Parameters:
config (Dict) – Dictionary of parameters, their bounds and resolution.
sampling (str, optional) – Sampling method for generating training reaction conditions, defaults to ‘random’
write_files (bool, optional) – Option to enable writing files, defaults to False
exp_dir (str, optional) – experimental directory for saving data files, defaults to None
- Returns:
Two dataframes: the full reaction space and the training conditions.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the sampling method is not recognized.
- amlro.generate_reaction_conditions.write_reaction_scope(full_combo_df: DataFrame, full_combo_encoded_df: DataFrame, training_combo_df: DataFrame, exp_dir: str)[source]
Writes the reaction scopes into files.
Generates the following two files in the given experimental directory: the full combo file (full reaction space) and the training combo file (subsample of reaction conditions used to generate the training data file).
- Parameters:
full_combo_df (pd.DataFrame) – full reaction space dataframe
full_combo_encoded_df (pd.DataFrame) – full reaction space dataframe with categorical variables encoded as numerical values
training_combo_df (pd.DataFrame) – training reaction conditions dataframe
exp_dir (str) – experimental directory for saving data files
amlro.generate_training_data module
- amlro.generate_training_data.generate_training_data(exp_dir: str, config: Dict, parameters: List = [], obj_values: List = [], filename: str = 'reactions_data.csv', termination: bool = False) List[Any][source]
Generates a training dataset for the ML model.
This function handles the generation of training data points for the optimization process involving experimental reactions. The function is designed to work iteratively, where each iteration generates a new training reaction. Depending on the iteration number (itr) and the termination flag, it writes the experimental data to files and provides the next set of reaction conditions.
- Parameters:
exp_dir (str) – experimental directory for saving data files
config (Dict) – Dictionary of parameters
parameters (List, optional) – Previous experiment parameters or initial parameters, defaults to [].
obj_values (List, optional) – experimental yields from the previous experiment, defaults to [].
filename (str, optional) – filename for reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.
termination (bool, optional) – terminates the training function after the last iteration without returning next reaction conditions, defaults to False
- Returns:
parameter set for next experiment.
- Return type:
List
- amlro.generate_training_data.get_next_training_conditions(exp_dir: str, config: Dict, filename: str = 'reactions_data.csv') List[Any][source]
Returns next reaction conditions from the training reaction space.
- Parameters:
exp_dir (str) – experimental directory for saving data files
config (Dict) – Dictionary of parameters
filename (str, optional) – filename for reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.
- Returns:
parameter set for next experiment.
- Return type:
List
- amlro.generate_training_data.load_training_conditions(training_combo: str = 'training_combo.csv') List[Any][source]
Loads the training data.
Reads the combination file as a pandas data frame and returns the reaction combination data as a list.
- Parameters:
training_combo (str) – training parameter combination file path, defaults to the value of amlro.const.TRAINING_COMBO_FILE.
- Returns:
training parameter combination list
- Return type:
List
- amlro.generate_training_data.write_data_to_training_files(exp_dir: str, config: Dict, parameters: List = [], obj_values: List = [], filename: str = 'reactions_data.csv')[source]
Writes experimental reaction data into the reaction data files.
- Parameters:
exp_dir (str) – experimental directory for saving data files
config (Dict) – Dictionary of parameters
parameters (List, optional) – reaction condition parameters, defaults to [].
obj_values (List, optional) – experimental yields from the previous experiment, defaults to [].
filename (str, optional) – filename for reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.
- amlro.generate_training_data.write_line_to_a_file(training_file: str, reaction_results: str) None[source]
Writes a data line to a file.
Writes the previous reaction conditions and experimental objective values at the end of the training data file.
- amlro.generate_training_data.write_training_data(exp_dir: str, objectives: List[List], config: Dict, filename: str = 'reactions_data.csv')[source]
Writes the experiment data.
This function can be used when all training conditions are performed together and the experimental results are written together. The batch of training experiment results/objectives must be collected beforehand.
- Parameters:
exp_dir (str) – experimental directory for saving data files
objectives (List[List]) – objective values collected for the training conditions
config (Dict) – Dictionary of parameters
filename (str, optional) – filename for reaction data file, defaults to the value of amlro.const.REACTION_DATA_FILENAME.
- Raises:
ValueError – If the lengths of training conditions and objectives do not match.
amlro.ml_models module
This module provides a framework for retrieving various regression models along with their corresponding hyperparameter grids for hyperparameter tuning.
Regression models:
- ElasticNet
- Decision Tree
- Random Forest
- Gradient Boosting
- XGBoost
- AdaBoost
- Support Vector Regressor
- K-Nearest Neighbors (KNN)
- Bayesian Ridge
Each function returns a regressor model object and a hyperparameter grid for use in hyperparameter optimization (e.g., using grid search in optimizer.model_training).
The main function get_regressor_model(model: str) serves as the entry point to select a model based on user input, returning the chosen model and grid for training and tuning.
- Usage:
model, param_grid = get_regressor_model('xgb')
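The returned (model, grid) pair is meant to feed into grid search, as noted above. The sketch below assumes a plausible shape for the pair, using scikit-learn directly; the actual grids in amlro.ml_models may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical (model, grid) pair in the shape this module returns.
model = GradientBoostingRegressor(random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# Small synthetic regression problem to exercise the tuning loop.
rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, y)
best = search.best_estimator_  # tuned regressor with a predict method
```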
- amlro.ml_models.adaboost_regressor() Tuple[AdaBoostRegressor, Dict][source]
Creates an AdaBoost regressor with a grid of hyperparameters.
- Returns:
The AdaBoostRegressor model and hyperparameter grid.
- Return type:
Tuple[AdaBoostRegressor, Dict]
- amlro.ml_models.bayesian_ridge_regressor() Tuple[BayesianRidge, Dict][source]
Creates a BayesianRidge regressor with a grid of hyperparameters.
- Returns:
The BayesianRidge model and hyperparameter grid.
- Return type:
Tuple[BayesianRidge, Dict]
- amlro.ml_models.decision_tree_regressor() Tuple[DecisionTreeRegressor, Dict][source]
Creates a DecisionTree regressor with a grid of hyperparameters.
- Returns:
The DecisionTreeRegressor model and hyperparameter grid.
- Return type:
Tuple[DecisionTreeRegressor, Dict]
- amlro.ml_models.elastic_net_regressor() Tuple[ElasticNet, Dict][source]
Creates an ElasticNet regressor with a grid of hyperparameters.
- Returns:
The ElasticNet model and hyperparameter grid.
- Return type:
Tuple[ElasticNet, Dict]
- amlro.ml_models.get_regressor_model(model: str, seed: int = None) Tuple[object, Dict][source]
Retrieves the specified regressor model and its hyperparameter grid.
- Parameters:
model (str) – Regressor model name. Valid options: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’.
seed (int, optional) – random seed for the model, defaults to None.
- Returns:
The selected model and its hyperparameter grid.
- Return type:
Tuple[model object, Dict]
- amlro.ml_models.gradient_boost_regressor() Tuple[GradientBoostingRegressor, Dict][source]
Creates a GradientBoosting regressor with a grid of hyperparameters.
- Returns:
The GradientBoostingRegressor model and hyperparameter grid.
- Return type:
Tuple[GradientBoostingRegressor, Dict]
- amlro.ml_models.knn_regressor() Tuple[KNeighborsRegressor, Dict][source]
Creates a K-Nearest Neighbors (KNN) regressor with a grid of hyperparameters.
- Returns:
The KNeighborsRegressor model and hyperparameter grid.
- Return type:
Tuple[KNeighborsRegressor, Dict]
- amlro.ml_models.random_forest_regressor() Tuple[RandomForestRegressor, Dict][source]
Creates a RandomForest regressor with a grid of hyperparameters.
- Returns:
The RandomForestRegressor model and hyperparameter grid.
- Return type:
Tuple[RandomForestRegressor, Dict]
amlro.optimizer module
- amlro.optimizer.categorical_feature_decoding(config: Dict, best_combo: List[Any]) List[Any][source]
Converts an encoded parameter list into a decoded list, mapping categorical feature values back to their names.
- Parameters:
config (Dict) – Initial reaction feature configurations
best_combo (List[Any]) – parameter list required for decoding
- Returns:
Decoded parameter list
- Return type:
List[Any]
- amlro.optimizer.categorical_feature_encoding(config: Dict, prev_parameters: List[Any]) List[Any][source]
Converts a decoded parameter list into an encoded list, mapping categorical feature values to their numerical values.
- Parameters:
config (Dict) – Initial reaction feature configurations
prev_parameters (List[Any]) – parameter list required for encoding
- Returns:
encoded parameter list
- Return type:
List[Any]
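A plausible sketch of this encode/decode pair, assuming an index-based scheme where each categorical value is replaced by its position in config["categorical"]["values"], and assuming categorical parameters occupy the trailing positions of the list. Both the functions and these assumptions are illustrative; the actual scheme in amlro.optimizer may differ.

```python
def encode(config, params):
    """Replace each trailing categorical value with its index in the value list."""
    out = list(params)
    cat = config["categorical"]
    offset = len(out) - len(cat["feature_names"])  # assumed trailing layout
    for i, values in enumerate(cat["values"]):
        out[offset + i] = values.index(out[offset + i])
    return out


def decode(config, params):
    """Inverse of encode: map indices back to categorical value names."""
    out = list(params)
    cat = config["categorical"]
    offset = len(out) - len(cat["feature_names"])
    for i, values in enumerate(cat["values"]):
        out[offset + i] = values[int(out[offset + i])]
    return out


config = {"categorical": {"feature_names": ["solvent"],
                          "values": [["DMF", "EtOH"]]}}
row = [80, 2.5, "EtOH"]
encoded = encode(config, row)  # "EtOH" becomes 1
```

Round-tripping a row through encode and then decode returns the original values, which is the contract the two library functions imply.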
- amlro.optimizer.get_optimized_parameters(exp_dir: str, config: Dict, parameters_list: List[List] = [[]], objectives_list: List[List] = [[]], model: str = 'gb', filename: str = 'reactions_data.csv', batch_size: int = 1, termination: bool = False) List[List][source]
Trains a machine learning model using the provided training data and configuration, then predicts and retrieves the next best reaction parameters based on the optimization objectives. Then, writes the training data and predictions to files.
- Parameters:
exp_dir (str) – experimental directory for saving data files
config (Dict) – Configuration dictionary specifying features, objectives, directions, and other settings for optimizer.
parameters_list (List[List], optional) – List of reaction conditions for training, defaults to [[]].
objectives_list (List[List], optional) – List of objective values corresponding to the reaction conditions, defaults to [[]].
model (str, optional) – The machine learning model type to use, e.g., ‘gb’ for Gradient Boosting, defaults to ‘gb’.
filename (str, optional) – The name of the file where reaction data is stored, defaults to REACTION_DATA_FILENAME.
batch_size (int, optional) – The number of best reaction conditions to return, defaults to 1.
termination (bool, optional) – If True, the function will terminate early after saving data, defaults to False.
- Returns:
A list of lists containing the next predicted reaction conditions.
- Return type:
List[List]
- amlro.optimizer.load_data(reactions_data: str, combination_file: str, config: Dict) Tuple[DataFrame, DataFrame, DataFrame][source]
Loads the dataset files.
Reads the training set file and the all-combination file as pandas data frames and splits them into x-train, y-train, and test datasets. When loading the combination file, rows that already appear in the training file are removed.
- amlro.optimizer.mo_model_training(x_train: DataFrame, y_train: DataFrame, model: str = 'gb') object[source]
Train the multi output regressor model and return the best model.
- Parameters:
x_train (pd.DataFrame) – training dataset that contains feature values.
y_train (pd.DataFrame) – target dataframe that contains objective values.
model (str) – Regressor model name. Valid options: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’. Default is ‘gb’ (Gradient Boosting).
- Returns:
trained regressor model
- Return type:
model object with a predict method (e.g., sklearn regressor)
- amlro.optimizer.model_training(x_train: DataFrame, y_train: DataFrame, model: str = 'gb') object[source]
Train the regressor model and return the best model.
- Parameters:
x_train (pd.DataFrame) – training dataset that contains feature values.
y_train (pd.DataFrame) – target dataframe that contains objective values.
model (str) – Regressor model name. Valid options: ‘ela_net’, ‘dtree’, ‘rf’, ‘gb’, ‘xgb’, ‘aboost’, ‘svr’, ‘knn’, ‘bayesian_ridge’. Default is ‘gb’ (Gradient Boosting).
- Returns:
trained regressor model
- Return type:
model object with a predict method (e.g., sklearn regressor)
- amlro.optimizer.predict_next_parameters(regr, data: DataFrame, n_curr: int, config: Dict, batch_size: int = 1) DataFrame[source]
Predicts the yield for all the combination data using a trained regressor model and returns the best combinations.
The function handles both single-objective and multi-objective optimization. In the single-objective case, it sorts the predicted results based on the specified direction (maximization or minimization). In the multi-objective case, it identifies the Pareto front and ranks solutions by computing weighted sums of normalized objectives. Weights are defined as -1 for min and +1 for max.
- Parameters:
regr (model object with a predict method (e.g., sklearn regressor)) – trained regressor model
data (pd.Dataframe) – test dataset that contains full reaction space
n_curr (int) – current size of the training data
config (Dict) – Dictionary of optimizer parameters
batch_size (int, optional) – Number of reaction conditions needed as predictions, defaults to 1.
- Returns:
Batch of best predicted parameters.
- Return type:
pd.DataFrame
- amlro.optimizer.stringify_parameters_objectives(parameters_list: List[List], objectives_list: List[List], config: Dict) Tuple[List[str], List[str]][source]
Combines reaction parameters and their respective objective values into comma-separated strings for writing into the reaction data file.
Convert parameters and their respective objective values into comma-separated strings, both encoded and decoded, for writing into a reaction data file. The function encodes categorical features based on the provided configuration.
- Parameters:
parameters_list (List[List]) – A list of reaction conditions, where each condition is a list of parameter values.
objectives_list (List[List]) – A list of lists of corresponding objective values for each reaction condition.
config (Dict) – Dictionary of optimizer parameters
- Returns:
Two lists of strings: (1) the parameters with encoded categorical features combined with objective values, and (2) the original (decoded) parameter values combined with objective values.
- Return type:
Tuple[List[str], List[str]]
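The string construction can be sketched as follows; stringify is a hypothetical stand-in that shows only the joining step and omits the categorical encoding described above.

```python
def stringify(parameters_list, objectives_list):
    """Join each parameter row with its objective values as one CSV line."""
    lines = []
    for params, objs in zip(parameters_list, objectives_list):
        lines.append(",".join(str(v) for v in list(params) + list(objs)))
    return lines


rows = stringify([[80, 2.5, "EtOH"]], [[0.91]])
```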
amlro.pareto module
- amlro.pareto.calculate_frontier_depth(config: Dict, nfeatures: int, n_curr: int, batch_size: int) int[source]
Calculates how many Pareto fronts to explore based on data volume.
Uses a stateless exponential decay function to dynamically balance exploration and exploitation. To prevent premature convergence on sparse datasets, the decay is strictly bounded by the ratio of initial training data to collected data points.
- amlro.pareto.get_ranked_pareto_fronts(config: Dict, nfeatures: int, pred_df: DataFrame, frontier_depth: int) DataFrame[source]
Selects and ranks layers of Pareto fronts according to the frontier depth.
Performs non-dominated sorting on the prediction grid. Iteratively extracts the current Pareto optimal front, assigns it a rank, removes it from the grid, and repeats to expose deeper, sub-optimal frontiers.
- Parameters:
config (Dict) – Dictionary of optimizer parameters
nfeatures (int) – Length of the feature space
pred_df (pd.DataFrame) – prediction grid dataframe
frontier_depth (int) – number of Pareto fronts to extract and rank
- Returns:
Ranked Pareto fronts
- Return type:
pd.DataFrame
- amlro.pareto.identify_pareto_front(prediction_data: List, directions: List, nfeatures: int) array[source]
Identifies the Pareto front from a list of points in a multi-objective space.
The Pareto front is a set of non-dominated points in a multi-objective optimization problem. A point is considered to be on the Pareto front if no other point in the set dominates it. This function examines each point and determines whether it should be included in the Pareto front.
- Parameters:
prediction_data (List) – A list of points/predictions, where each point contains both feature values and objective values. The objective values should follow the features in each point.
directions (List) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.
nfeatures (int) – Length of the feature space
- Returns:
List of Pareto solutions
- Return type:
np.array
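The non-dominated filter described above can be sketched in NumPy. pareto_front here is an illustrative re-implementation under the stated layout (objectives follow the features in each point), not the library code itself.

```python
import numpy as np


def pareto_front(points, directions, nfeatures):
    """Return the non-dominated rows; objectives follow the features."""
    arr = np.asarray(points, dtype=float)
    objs = arr[:, nfeatures:]
    # Flip 'min' objectives so every column is maximized.
    sign = np.array([1.0 if d == "max" else -1.0 for d in directions])
    objs = objs * sign
    keep = []
    for p in objs:
        # p is dominated if some row is >= everywhere and > somewhere.
        dominated = np.any(np.all(objs >= p, axis=1) & np.any(objs > p, axis=1))
        keep.append(not dominated)
    return arr[np.array(keep)]


pts = [[0, 1.0, 1.0],   # feature id 0, objectives (1.0, 1.0)
       [1, 2.0, 0.5],
       [2, 0.5, 0.5]]   # dominated by the first point
front = pareto_front(pts, ["max", "max"], nfeatures=1)
```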
- amlro.pareto.is_dominated_by_any(target_point: ndarray, all_points: ndarray, directions: List[str]) bool[source]
Determines whether a single point is Pareto-dominated by ANY point in a dataset.
Pareto dominance is a concept used in multi-objective optimization to compare solutions based on multiple objectives. A point A is said to Pareto-dominate another point B if A is no worse than B in all objectives and strictly better than B in at least one objective.
Instead of one-to-one comparison, this function utilizes high-speed NumPy broadcasting to simultaneously check if target_point is Pareto-dominated by ANY of the solutions in the all_points array. It handles mixed optimization goals by mathematically aligning ‘min’ and ‘max’ directions into a uniform scale.
- Parameters:
target_point (np.ndarray) – A 1D array of the objective values of the target point.
all_points (np.ndarray) – A 2D array containing the objective values of the entire grid.
directions (List[str]) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.
- Returns:
True if target_point is dominated by at least one other point in the dataset. False if it is non-dominated (it belongs on the Pareto front).
- Return type:
bool
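The broadcast-based check can be sketched as below; this is an assumed equivalent of the described behavior, aligning ‘min’ objectives by sign flip so that every column is maximized before comparing.

```python
import numpy as np


def is_dominated_by_any(target, all_points, directions):
    """Vectorized check: is `target` Pareto-dominated by any row of `all_points`?"""
    sign = np.array([1.0 if d == "max" else -1.0 for d in directions])
    a = np.asarray(all_points, dtype=float) * sign  # align all goals to "maximize"
    t = np.asarray(target, dtype=float) * sign
    # A row dominates t if it is >= in every objective and > in at least one;
    # broadcasting compares t against all rows at once.
    return bool(np.any(np.all(a >= t, axis=1) & np.any(a > t, axis=1)))


# Objectives: yield (max) and cost (min).
grid = [[0.9, 10.0], [0.7, 8.0]]
```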
- amlro.pareto.is_pareto_dominant(point1: List, point2: List, directions: List) bool[source]
Determines whether one point dominates another in a multi-objective space according to Pareto dominance.
Pareto dominance is a concept used in multi-objective optimization to compare two points (or solutions) based on multiple objectives. A point A is said to Pareto-dominate another point B if A is no worse than B in all objectives and better than B in at least one objective.
This function checks whether point1 Pareto-dominates point2 considering the specified optimization directions for each objective.
- Parameters:
point1 (List) – List of objective values for the first point
point2 (List) – List of objective values for the second point
directions (List) – Optimization direction for each objective. Each entry should be “min” for minimization or “max” for maximization.
- Returns:
True if point1 Pareto-dominates point2, otherwise False.
- Return type:
bool
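The pairwise rule stated above fits in a few lines; is_pareto_dominant here is a hypothetical re-implementation for illustration.

```python
def is_pareto_dominant(p1, p2, directions):
    """True if p1 is no worse than p2 everywhere and strictly better somewhere."""
    no_worse, better = True, False
    for a, b, d in zip(p1, p2, directions):
        a, b = (a, b) if d == "max" else (-a, -b)  # align 'min' to 'max'
        if a < b:
            no_worse = False
        elif a > b:
            better = True
    return no_worse and better
```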
- amlro.pareto.select_batch_from_ranked_fronts(ranked_df: DataFrame, config: Dict, nfeatures: int, batch_size: int, feature_columns: List[str]) DataFrame[source]
Scores a pool of ranked Pareto fronts and selects a batch using stratified sampling.
Normalizes the objective predictions and applies the user-defined directional weights to calculate a single score. To increase exploration, it explicitly samples the top candidate from each available frontier depth before filling additional data points from frontier rank 1 to meet the required batch size.
- Parameters:
ranked_df (pd.DataFrame) – Data frame with ranked pareto_fronts
config (Dict) – Dictionary of optimizer parameters
nfeatures (int) – Length of the feature space
batch_size (int) – Number of reaction conditions needed as predictions.
feature_columns (List[str]) – Parameter names for continuous and categorical features
- Returns:
batch of best predicted parameter
- Return type:
pd.DataFrame
amlro.sampling_methods module
- amlro.sampling_methods.feature_scaling(samples: List[List], config: Dict, res_factor: float = 1.0) DataFrame[source]
This function scales and maps the continuous and categorical features from the Latin hypercube and Sobol sampling space. These methods generate coordinates in [0, 1] for each dimension.
From the config dictionary, config["continuous"]["feature_names"], config["continuous"]["feature_bounds"], and config["continuous"]["resolutions"] are used to rescale the continuous features, and config["categorical"]["feature_names"] and config["categorical"]["values"] are used to rescale the categorical features.
- Parameters:
samples (List[List]) – sub sample generated from sampling method
config (Dict) – Dictionary of parameters, their bounds and resolution.
res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0
- Returns:
scaled training reaction conditions dataframe
- Return type:
pd.DataFrame
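The mapping from unit-cube samples to the configured space can be sketched as below. scale_samples is a hypothetical stand-in: it assumes a simple linear rescale for continuous bounds and an even partition of [0, 1] for categorical values, and it omits the resolution-based rounding.

```python
import pandas as pd


def scale_samples(samples, config):
    """Map [0, 1] sample coordinates onto the configured feature bounds/values."""
    cont = config.get("continuous", {})
    cat = config.get("categorical", {})
    names = list(cont.get("feature_names", [])) + list(cat.get("feature_names", []))
    rows = []
    for s in samples:
        row, i = [], 0
        for (lo, hi) in cont.get("feature_bounds", []):
            row.append(lo + s[i] * (hi - lo))  # linear rescale to [lo, hi]
            i += 1
        for values in cat.get("values", []):
            # Partition [0, 1] evenly across the discrete category values.
            row.append(values[min(int(s[i] * len(values)), len(values) - 1)])
            i += 1
        rows.append(row)
    return pd.DataFrame(rows, columns=names)


config = {"continuous": {"feature_names": ["temp"], "feature_bounds": [(0, 100)]},
          "categorical": {"feature_names": ["solvent"], "values": [["DMF", "EtOH"]]}}
df = scale_samples([[0.5, 0.9], [0.0, 0.1]], config)
```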
- amlro.sampling_methods.latin_hypercube_sampling(config: Dict, sample_size: int = 20, res_factor: float = 1.0) DataFrame[source]
Generate subsample from full reaction space using Latin hypercube sampling.
- Parameters:
config (Dict) – Dictionary of parameters, their bounds and resolution.
sample_size (int, optional) – sub sample size, defaults to 20
res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0
- Returns:
sub sample of reaction space needed for training set generation
- Return type:
pd.DataFrame
- amlro.sampling_methods.random_sampling(df: DataFrame, sample_size: int = 20) DataFrame[source]
Generate subsample from full reaction space using random sampling.
- Parameters:
df (pd.DataFrame) – Dataframe with full reaction space
sample_size (int, optional) – sub sample size, defaults to 20
- Returns:
sub sample of reaction space needed for training set generation
- Return type:
pd.DataFrame
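Random subsampling maps directly onto pandas; the sketch below assumes a fixed seed for reproducibility, which the library function may or may not expose.

```python
import pandas as pd


def random_subsample(df, sample_size=20, seed=0):
    """Draw a reproducible random subsample of the full reaction space."""
    return df.sample(n=sample_size, random_state=seed).reset_index(drop=True)


space = pd.DataFrame({"temp": range(100), "time": range(100)})
train = random_subsample(space, sample_size=5)
```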
- amlro.sampling_methods.sobol_sequnce_sampling(config: Dict, sample_size: int = 20, res_factor: float = 1.0) DataFrame[source]
Generate subsample from full reaction space using Sobol sequence sampling.
- Parameters:
config (Dict) – Dictionary of parameters, their bounds and resolution.
sample_size (int, optional) – sub sample size, defaults to 20
res_factor (float, optional) – resolution factor to define rounding decimal places, defaults to 1.0
- Returns:
sub sample of reaction space needed for training set generation
- Return type:
pd.DataFrame
amlro.validations module
- amlro.validations.validate_optimizer_config(config: Dict) None[source]
Validates the config dict for the optimizer and training set generation.
- Parameters:
config (Dict) – Configuration to be checked
- Raises:
ValueError – If lengths of directions and objectives do not match.
ValueError – If any direction is not ‘min’ or ‘max’.
- amlro.validations.validate_reaction_scope_config(config: Dict) None[source]
Validates the part of the configuration dictionary for generating grids.
- Parameters:
config (Dict) – Configuration to be checked
- Raises:
ValueError – If at least one given bound is invalid.
ValueError – If at least one given resolution is invalid.