Multithreading the input in this library is a complex task, because we'd have only one encode_dictionary to keep consistent across all threads. But the output is another matter: each document could be written to disk as soon as it is hashed. As I/O is our main bottleneck, this could improve performance up to a point. (But how many threads can we have before we are bound by the machine's maximum I/O capability?)
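
The idea could be sketched roughly as follows, assuming the library is in Python. This is not the library's actual code: encode_document, write_document and encode_and_write are hypothetical names, and the token-to-integer scheme is only a stand-in for whatever encode_dictionary really stores. Hashing stays on the main thread, so the dictionary has a single writer, and only the disk writes are handed to a thread pool.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the library's shared dictionary: token -> integer id.
encode_dictionary = {}


def encode_document(text):
    """Hash one document on the calling thread, so encode_dictionary has a single writer."""
    return [encode_dictionary.setdefault(token, len(encode_dictionary))
            for token in text.split()]


def write_document(path, codes):
    """Write one encoded document to disk; this part runs in a worker thread."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(" ".join(map(str, codes)))
    return path


def encode_and_write(documents, out_dir, max_workers=4):
    """Encode sequentially, handing each finished document to a pool of writer threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for index, text in enumerate(documents):
            codes = encode_document(text)  # never touched by the worker threads
            path = os.path.join(out_dir, f"{index}.txt")
            futures.append(pool.submit(write_document, path, codes))
        # result() re-raises any I/O error that happened in a worker
        return [future.result() for future in futures]


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    print(encode_and_write(["a b a", "b c"], out_dir, max_workers=2))
```

The open question about the thread count is probably best answered by measurement: timing encode_and_write with increasing max_workers on a representative corpus should show where throughput stops improving, which will depend on the disk and filesystem.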