
Does streamed output make sense? #7

Description

@hoijui

Basically, this is a CLI tool that:

  1. scans files,
  2. collects info / applies filters,
  3. and writes the results to 3 output/log files (which could later become a multiple of 3).

The input is usually many files (say, 100), which are scanned in sequence,
and maybe 3% of the input lines are selected by the filters and written to the 3 output files.
The question is which of these methods I should use:

  1. bulk scan & write at the end:
    Scan and filter all input files, storing the selected data in memory (e.g. in a variable),
    and at the end write everything out to the output files at once.
  2. stream: scan the input, filter it while reading, and write to the output continuously, whenever something is selected for output.
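To make the two options concrete, here is a minimal Python sketch of both. The classifier `example_classify` (routing lines that start with `ERROR`, `WARN` or `TODO` to output 0, 1 or 2) and the function names are hypothetical, chosen only for illustration; the real tool's filters are not known here.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Hypothetical filter type: maps a line to the index (0, 1 or 2) of the output
# file it belongs to, or None if the line is not selected.
Classifier = Callable[[str], Optional[int]]


def example_classify(line: str) -> Optional[int]:
    """Hypothetical example filter, for illustration only."""
    for idx, prefix in enumerate(("ERROR", "WARN", "TODO")):
        if line.startswith(prefix):
            return idx
    return None


def bulk_scan_then_write(inputs: Iterable[Path], outputs: List[Path],
                         classify: Classifier) -> None:
    """Method 1: keep all selected lines in memory, write them at the end."""
    collected: List[List[str]] = [[] for _ in outputs]
    for path in inputs:
        with path.open(encoding="utf-8") as f:
            for line in f:
                idx = classify(line)
                if idx is not None:
                    collected[idx].append(line)
    # Only now are the output files opened and written.
    for out_path, lines in zip(outputs, collected):
        with out_path.open("w", encoding="utf-8") as out:
            out.writelines(lines)


def stream_scan_and_write(inputs: Iterable[Path], outputs: List[Path],
                          classify: Classifier) -> None:
    """Method 2: write each selected line as soon as it is found."""
    with ExitStack() as stack:
        # All output files stay open for the whole run; each file object has
        # its own write buffer, so a write() call does not go straight to disk.
        handles = [stack.enter_context(p.open("w", encoding="utf-8"))
                   for p in outputs]
        for path in inputs:
            with path.open(encoding="utf-8") as f:
                for line in f:
                    idx = classify(line)
                    if idx is not None:
                        handles[idx].write(line)
```

Both functions produce identical output files. One caveat on the streaming variant: because regular files are fully buffered by default, selected lines only become visible on disk when a buffer fills or the file is closed, so "output appears right away" holds only at buffer granularity unless you flush explicitly.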

I much prefer the second option, as it uses less memory and is stream-based,
so output could start appearing as soon as scanning begins. The question is whether this potentially decreases overall performance, because we constantly switch between reading input and writing to one of the 3 output files.

I do not expect the higher memory usage of method 1 to ever be a problem,
and I am not sure how often the streaming approach of method 2 is really an advantage in practice.
I do know that file-system access is the main performance bottleneck of this software,
both because this is generally the case
and because the computation done here is minimal.

Maybe I need not worry, and the OS and buffering will handle the second (stream-based) method just fine?
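One way to check the buffering question, at least for Python, is to count how often many small `write()` calls actually reach the lowest I/O layer. The sketch below stacks the standard `io.TextIOWrapper` and `io.BufferedWriter` on top of a hypothetical in-memory `CountingRaw` stream that stands in for a real file; for a real file, each raw write corresponds roughly to one `write` system call. The 8192-byte buffer size is an assumption passed explicitly.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts raw write() calls."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        self.data += b  # copy the bytes; the caller may reuse its buffer
        return len(b)


def count_raw_writes(n_lines: int, line: str, buffer_size: int = 8192):
    """Write n_lines small lines through Python's buffered text layers and
    return (number of raw writes, bytes that reached the raw stream)."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    with io.TextIOWrapper(buffered, encoding="utf-8") as text:
        for _ in range(n_lines):
            text.write(line)  # one small write per selected line
        text.flush()          # push everything down to the raw stream
        calls, data = raw.calls, bytes(raw.data)
    return calls, data
```

For example, `count_raw_writes(1000, "WARN some selected line\n")` issues 1000 text-level writes, but only a handful of raw writes occur, because the buffers coalesce them into chunks of a few KiB.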

For now, the tool will be run at most 100 times a day, globally,
with ~1 MB of input text per run.
So it is not very critical either way,
but I have come across this issue a few times already
and would like to tackle it once and for all.
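A rough per-run estimate puts the numbers above in perspective. It assumes 1 MB means 1,000,000 bytes, that selected lines are about as long as average lines, that output is split evenly across the 3 files, and that each output file has an 8192-byte write buffer. All of these are assumptions, not measurements.

```python
import math


def estimate_output(input_bytes: int = 1_000_000, selected_percent: int = 3,
                    n_outputs: int = 3, buffer_size: int = 8192):
    """Back-of-envelope estimate for one run (all defaults are assumptions)."""
    output_bytes = input_bytes * selected_percent // 100  # total selected data
    per_file = output_bytes // n_outputs                  # assuming an even split
    # Buffer-sized chunks written per file, including the final partial one.
    chunks = n_outputs * math.ceil(per_file / buffer_size)
    return output_bytes, per_file, chunks


print(estimate_output())  # (30000, 10000, 6)
```

Under these assumptions, a streaming run writes about 30 KB in roughly 6 buffer-sized chunks, so switching between reading and writing should cost little compared to reading the 1 MB of input.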
