Currently, HistFromNtupleProducerTask runs on (nDatasets * nVariables) branches. This is very wasteful:
- Branch 0: a large dataset (say TTto2L) goes to a node, localizes 100+ input HistTuple files, and creates a single new file containing only the chosen variable.
- Branch 1: the same dataset goes to another node, localizes the same HistTuple files again, and creates a different new file for a different variable.
Repeated across a data/mc production of 50+ variables, this adds up to a large amount of waste that should be easily avoidable. Proposal: make HistFromNtupleProducerTask run on (nDatasets) branches instead. A single node would then localize all inputs once and create several output files, reducing nJobs and saving localization time. This will add some work to untangle the new task structure in the next step (HistMergerTask), but will likely save a large amount of time for data/mc tasks.
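A minimal sketch of the two branch layouts, to make the saving concrete. This is plain Python and not the actual task code: the function names, the `DYto2L` dataset, the variable names, and the file counts are all hypothetical, chosen only for illustration.

```python
from itertools import product


def branch_map_per_variable(datasets, variables):
    """Current layout: one branch per (dataset, variable) pair."""
    return {
        i: {"dataset": d, "variables": [v]}
        for i, (d, v) in enumerate(product(datasets, variables))
    }


def branch_map_per_dataset(datasets, variables):
    """Proposed layout: one branch per dataset, handling every variable."""
    return {
        i: {"dataset": d, "variables": list(variables)}
        for i, d in enumerate(datasets)
    }


def count_localizations(branch_map, n_input_files):
    """Total number of input files localized, summed over all branches."""
    return sum(n_input_files[b["dataset"]] for b in branch_map.values())


# Hypothetical inputs: two datasets, 50 variables.
datasets = ["TTto2L", "DYto2L"]
variables = [f"var{i}" for i in range(50)]
n_input_files = {"TTto2L": 100, "DYto2L": 40}

old_map = branch_map_per_variable(datasets, variables)
new_map = branch_map_per_dataset(datasets, variables)

print(len(old_map), count_localizations(old_map, n_input_files))  # 100 7000
print(len(new_map), count_localizations(new_map, n_input_files))  # 2 140
```

With these made-up numbers the job count drops by a factor of nVariables, and so does the number of file localizations. The cost is that each branch now has several outputs, which is what HistMergerTask would have to account for.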