Question
I'm currently using Ax to apply Bayesian optimization (BO) to hyperparameter optimization of machine-learning models.
I chose SAASBO as the surrogate model because I'm working with 30+ parameters, and qNEHVI (in its log form, `qLogNoisyExpectedHypervolumeImprovement`) as the acquisition function because it should be the state of the art for multi-objective optimization (MOO) tasks. From the qNEHVI paper (Daulton et al., 2021), it's clear that optimizing more than 5 objectives is not realistic, but my use case requires tracking 10 objectives, so, for the sake of knowledge, I tried running this setup anyway.
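To illustrate why exact hypervolume computations get expensive as the number of objectives grows, here is a naive grid-based box decomposition. This is an illustrative sketch only, not the algorithm BoTorch uses (its partitioning algorithms are considerably smarter), and the function names are hypothetical. It assumes minimization and a reference point `ref`; each axis is split at the points' coordinates, so n points in general position produce n^m grid cells for m objectives.

```python
import itertools
import math


def grid_cell_count(points, ref):
    """Number of cells in the naive grid built from points strictly better than ref."""
    kept = [p for p in points if all(pj < rj for pj, rj in zip(p, ref))]
    return math.prod(len({p[j] for p in kept} | {ref[j]}) - 1 for j in range(len(ref)))


def naive_hypervolume(points, ref):
    """Exact hypervolume dominated by `points` (minimization) w.r.t. `ref`.

    Enumerates every grid cell, so the cost grows like n**m.
    """
    kept = [p for p in points if all(pj < rj for pj, rj in zip(p, ref))]
    if not kept:
        return 0.0
    m = len(ref)
    # Sorted grid coordinates per axis, always ending at the reference point.
    grids = [sorted({p[j] for p in kept} | {ref[j]}) for j in range(m)]
    volume = 0.0
    for cell in itertools.product(*(range(len(g) - 1) for g in grids)):
        lower = [grids[j][cell[j]] for j in range(m)]
        # A cell lies inside the dominated region iff some point is weakly
        # better than the cell's lower corner in every objective.
        if any(all(pj <= lj for pj, lj in zip(p, lower)) for p in kept):
            volume += math.prod(grids[j][cell[j] + 1] - lower[j] for j in range(m))
    return volume


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(naive_hypervolume(front, ref=(4.0, 4.0)))  # 6.0
print(grid_cell_count(front, ref=(4.0, 4.0)))  # 9
```

For comparison, 39 points in general position with 10 objectives would give up to 39**10 (about 8 × 10^15) cells, far beyond what can be enumerated. Exact algorithms do much better than this grid, but exact hypervolume computation is still known to become expensive as the number of objectives grows.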
To my surprise, the model can actually optimize a few of the objectives. Specifically, with 65 observations (each containing values for all 10 objectives), the next batch of 5 candidates is computed in about 30 minutes on a Threadripper PRO 9995WX.
For more context on the task: I'm minimizing all 10 objectives, and I use outcome constraints to prevent any objective from exceeding its baseline value by more than 10% (the baseline is obtained by evaluating the ML model with its default hyperparameters). In addition, the search space is mixed (continuous, discrete, and choice parameters), and many linear parameter constraints are applied. Before switching to BO, I initialize the run with 20 quasi-random points from a Sobol generator.
My hypotheses for why I'm able to compute qNEHVI with 10 objectives are:
- Since I'm using outcome constraints, the Pareto front used to compute the hypervolume contains so few points that the box decomposition stays tractable. To validate this, I used Ax's `Client.compute_analyses()`, whose card titled "Pareto Frontier Trials for Experiment" shows only the baseline data point. This is unexpected, because when I analyze the data myself, I can identify at least 3 data points that improve on the baseline without regressing any objective by more than 10%.
  However, when I remove the outcome constraints, the same card shows 39 points on the Pareto front, and I'm still able to compute the acquisition function, so this is probably not the correct explanation.
- The acquisition function relies on heavy approximations. I tried to understand the implementation by reading the source code, but as far as I can tell, the default behavior is not to use any approximation for the box decomposition (at least according to the docstring at https://github.com/meta-pytorch/botorch/blob/054a0417fc2a2790f60bb6262195ee3f5f5814e9/botorch/utils/multi_objective/hypervolume.py#L509).
- The issue is related to how I rebuild Ax's Client between iterations. For compatibility with a previous version of my pipeline, I create a new Client at every iteration, read all observed data points from the filesystem, and re-attach each one with a simple loop like:
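```python
from ax import Client

# Rebuild the experiment from scratch at every iteration.
client = Client()
client.configure_experiment(...)
client.configure_optimization(...)
client.set_generation_strategy(generation_strategy=get_generation_strategy(...))

# read_experiments() is my own helper that loads past observations from disk.
params, losses = read_experiments()
for pars, loss in zip(params, losses):
    # Attach this point's parameters and complete it with its observed losses.
    idx = client.attach_trial(parameters=pars)
    client.complete_trial(trial_index=idx, raw_data=loss)

new_points = client.get_next_trials(max_trials=5)
```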
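To cross-check the "Pareto Frontier Trials for Experiment" card by hand (the manual analysis mentioned in the first hypothesis), a brute-force non-dominated filter over the observed objective vectors is enough at this scale (65 points). A minimal sketch, assuming all objectives are minimized and that infeasible points have already been discarded; the function name is hypothetical and this is not Ax's own implementation:

```python
def non_dominated_indices(points):
    """Indices of points not dominated by any other point (all objectives minimized)."""

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


observed = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
print(non_dominated_indices(observed))  # [0, 1, 2]
```

The cost is O(n^2 m), which is negligible for 65 points and 10 objectives.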
I would really appreciate any advice that helps me understand what is actually happening.
Even though I plan to reduce the number of objectives, my goal is to write a paper about this pipeline, so understanding what happens under the hood of Ax and BoTorch is important to me. Thanks in advance to anyone who takes the time to look into this!
Below is the function I use to create the generation strategy.
```python
import logging

import torch
from botorch.acquisition.multi_objective.logei import (
    qLogNoisyExpectedHypervolumeImprovement,
)

# GenerationStrategy, GenerationNode, GeneratorSpec, Generators, MinTrials and
# get_derelativize_config come from Ax; their module paths differ between Ax
# versions, so they are assumed to be imported elsewhere.

logger = logging.getLogger(__name__)


def get_generation_strategy(
    initialization_budget: int,
    initialization_random_seed: int | None = None,
    no_optimization: bool = False,
) -> GenerationStrategy:
    """
    Constructs a GenerationStrategy with the following nodes:
    - Sobol node for initial random sampling
    - Bayesian optimization node using a SAAS Fully Bayesian GP model.

    This strategy is a simplified version of the default Ax generation strategy,
    tailored for multi-objective optimization.

    Args:
        - initialization_budget: number of initial Sobol samples
        - initialization_random_seed: random seed for Sobol initialization
        - no_optimization: flag to disable optimization, using only Sobol sampling

    Returns:
        GenerationStrategy: configured generation strategy
    """
    if no_optimization:
        logger.info("No optimization mode: using only Sobol sampling.")
        return GenerationStrategy(
            name="SobolOnly",
            nodes=[
                GenerationNode(
                    name="Sobol",
                    generator_specs=[
                        GeneratorSpec(
                            generator_enum=Generators.SOBOL,
                            model_kwargs={"seed": initialization_random_seed},
                        ),
                    ],
                )
            ],
        )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    logger.info("Using device: %s for optimization.", device)

    # BO phase: fully Bayesian SAAS GP with qLogNEHVI as the acquisition function.
    mobo_node = GenerationNode(
        name="SAASBO",
        generator_specs=[
            GeneratorSpec(
                generator_enum=Generators.SAASBO,
                model_kwargs={
                    "torch_device": device,
                    "botorch_acqf_class": qLogNoisyExpectedHypervolumeImprovement,
                    "transform_configs": get_derelativize_config(
                        derelativize_with_raw_status_quo=True
                    ),
                },
            )
        ],
        should_deduplicate=True,
    )

    # Sobol initialization; hands over to SAASBO once the experiment holds
    # `initialization_budget` trials (use_all_trials_in_exp counts all trials).
    sobol_node = GenerationNode(
        name="Sobol",
        generator_specs=[
            GeneratorSpec(
                generator_enum=Generators.SOBOL,
                model_kwargs={"seed": initialization_random_seed},
            ),
        ],
        transition_criteria=[
            MinTrials(
                threshold=initialization_budget,
                transition_to=mobo_node.name,
                use_all_trials_in_exp=True,
            )
        ],
    )

    nodes = [sobol_node, mobo_node]
    logger.info(
        "Configured generation strategy with %s initial Sobol samples followed by SAASBO optimization.",
        initialization_budget,
    )
    return GenerationStrategy(name="Sobol+SAASBO", nodes=nodes)
```