Improvement to HistFromNtupleProducerTask nJobs Management #257

@aebid

Currently, HistFromNtupleProducerTask runs on (nDatasets * nVariables) branches. This is very wasteful:

  1. Branch 0: For a large dataset (say TTto2L), the job goes to a node, localizes 100+ input HistTuple files, and creates a single new file containing only the chosen variable.
  2. The next branch (Branch 1) then goes to a node, localizes the same HistTuple files, and creates a different new file for a different variable.

If this is repeated for a data/mc production with 50+ variables, it introduces a large amount of waste that should be easily avoidable: make HistFromNtupleProducerTask run on (nDatasets) branches. A single node would then localize all inputs once and create several output files, reducing nJobs and saving localization time. This adds some work to disentangle the new task structure in the next step (HistMergerTask), but will likely save a large amount of time for data/mc tasks.
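To make the saving concrete, here is a rough sketch of the two branch layouts in plain Python. It does not use the actual task code: the function names, the second dataset (DYto2L), the variable names, and the file counts are all made up for illustration, and the real branch map in HistFromNtupleProducerTask may be structured differently.

```python
from itertools import product


def branch_map_per_variable(datasets, variables):
    """Current layout: one branch per (dataset, variable) pair."""
    return {
        i: {"dataset": d, "variables": [v]}
        for i, (d, v) in enumerate(product(datasets, variables))
    }


def branch_map_per_dataset(datasets, variables):
    """Proposed layout: one branch per dataset, producing all variables."""
    return {
        i: {"dataset": d, "variables": list(variables)}
        for i, d in enumerate(datasets)
    }


def files_localized(branch_map, n_input_files):
    """Total input files copied to nodes, summed over all branches."""
    return sum(n_input_files[b["dataset"]] for b in branch_map.values())


# Hypothetical numbers, chosen only to illustrate the scaling.
datasets = ["TTto2L", "DYto2L"]
variables = [f"var{i}" for i in range(50)]
n_input_files = {"TTto2L": 100, "DYto2L": 40}

current = branch_map_per_variable(datasets, variables)
proposed = branch_map_per_dataset(datasets, variables)

print(len(current), files_localized(current, n_input_files))
print(len(proposed), files_localized(proposed, n_input_files))
```

With these assumed numbers, nJobs drops by a factor of nVariables and so does the number of localized files. The cost is that each branch now writes a list of outputs, which is the part HistMergerTask would need to handle.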
