Improvement to HistFromNtupleProducerTask nJobs Management

Currently, HistFromNtupleProducerTask runs on (nDatasets * nVariables) branches. This is very wasteful:
1. Branch 0: for a large dataset (say TTto2L), the job goes to a node, localizes 100+ input HistTuple files, and creates a single new file containing only the chosen variable.
2. Branch 1: the next job goes to a node, localizes the same HistTuple files again, and creates a different new file for a different variable.
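
To put rough numbers on this, here is a minimal sketch of the cost under the current branching. It assumes every dataset has the same number of input files, and the figures used (1 dataset, 50 variables, 100 files) are only illustrative, taken from the "100+" and "50+" above. `count_localizations` is a hypothetical helper, not part of the framework.

```python
def count_localizations(n_datasets, n_variables, files_per_dataset):
    """Return (n_branches, n_file_localizations) for per-(dataset, variable) branching.

    Assumption: each branch localizes every input file of its dataset,
    and all datasets have the same number of input files.
    """
    n_branches = n_datasets * n_variables
    return n_branches, n_branches * files_per_dataset


# One dataset with 100 input files, histogrammed for 50 variables
n_branches, n_transfers = count_localizations(1, 50, 100)
print(n_branches, n_transfers)
```

With these illustrative inputs, the same 100 files are localized 50 times over.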

If this is repeated for a data/MC production of 50+ variables, the repeated localization adds up to a large amount of waste that should be easily avoidable. Proposal: make HistFromNtupleProducerTask run on (nDatasets) branches. A single node would then localize all inputs for a dataset once and create several output files (one per variable), reducing nJobs and saving localization time. This will add some work to untangle the new task structure in the next step (HistMergerTask), but will likely save a large amount of time for data/MC tasks.
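
A rough sketch of what the change could look like at the branch-map level, in plain Python rather than the actual task code. Everything here is an assumption for illustration: the function names, the `{dataset}/{variable}.root` output naming, the second dataset name `DY`, and the variable names. The real task and HistMergerTask may organize outputs differently.

```python
from itertools import product


def current_branch_map(datasets, variables):
    """Current layout: one branch per (dataset, variable) pair."""
    return dict(enumerate(product(datasets, variables)))


def proposed_branch_map(datasets):
    """Proposed layout: one branch per dataset."""
    return dict(enumerate(datasets))


def outputs_for_branch(dataset, variables):
    """Outputs of one proposed branch: one file per variable.

    The path pattern is hypothetical.
    """
    return {var: f"{dataset}/{var}.root" for var in variables}


def merger_inputs(branch_map, variables, variable):
    """Inputs a downstream merge step would collect for a single variable:
    the matching output file from every per-dataset branch."""
    return [outputs_for_branch(ds, variables)[variable] for ds in branch_map.values()]


datasets = ["TTto2L", "DY"]          # "DY" is a made-up second dataset
variables = ["pt", "eta", "mass"]    # made-up variable names

print(len(current_branch_map(datasets, variables)))  # branches today
print(len(proposed_branch_map(datasets)))            # branches after the change
print(merger_inputs(proposed_branch_map(datasets), variables, "eta"))
```

The extra work mentioned for HistMergerTask corresponds to something like `merger_inputs`: instead of depending on one producer branch per variable, the merger has to pick the right file out of each dataset branch's set of outputs.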


Improvement to HistFromNtupleProducerTask nJobs Management #257
