Multi-phrase text search using the Aho-Corasick algorithm. Useful when you need to find many strings in a piece of text simultaneously, especially when those phrases are long. No runtime dependencies; works in the browser.
When to use this: Aho-Corasick processes each character of the text once regardless of how many phrases you're searching for. The longer your phrases, the larger the advantage over repeated
indexOfcalls. See the benchmarks below.
npm install ok-text-searchimport Search from 'ok-text-search';
const search = new Search(['something', 'another thing']);
search.add('yet another thing');
search.build();
const results = search.exec('Lorem ipsum dolor something amet');
// [{ index: 18, text: 'something' }]Call order matters: add all phrases, then build() once, then call exec() as many times as needed. Adding phrases after build() throws.
const search = new Search(['Hello World'], { caseInsensitive: true });
search.build();
search.exec('say hello world today');
// [{ index: 4, text: 'Hello World' }]The original phrase casing is preserved in results. Indices are in Unicode code points, so emoji and non-BMP characters count as one position each.
Two approaches are compared:
- simple — a naive loop calling
String.indexOffor each phrase separately - custom — this library (Aho-Corasick)
The benchmark does not include setup time (build()), only the per-search cost.
Run them yourself:
npm run benchmarkThe phrase list is a large dictionary of food names. Short phrases are where V8's native indexOf is hardest to beat.
| Test | Text | Matches | simple | custom | Result |
|---|---|---|---|---|---|
| short foods 1 | 48-char sentence containing one food name | 1 | 959K ops/s | 2,480K ops/s | custom 61% faster |
| short foods 2 | 490-char paragraph with four food names | 4 | 382K ops/s | 203K ops/s | simple 47% faster |
| medium foods 1 | 1,000-char text; includes "Grape" and "Grapefruit" at the same position (overlapping prefix match) | 5 | 197K ops/s | 68K ops/s | simple 65% faster |
Phrases are movie titles like One Flew Over the Cuckoo's Nest and The Lord of the Rings: The Two Towers. Longer than food names but still under 40 chars.
| Test | Text | Matches | simple | custom | Result |
|---|---|---|---|---|---|
| medium movies 1 | 3,800-char text with 8 title occurrences | 8 | 40K ops/s | 24K ops/s | simple 40% faster |
Phrases are complete quotations, e.g. Strive not to be a success, but rather to be of value. –Albert Einstein. This is where Aho-Corasick's advantage becomes decisive: indexOf must compare up to 85 characters per candidate position in the text, while Aho-Corasick touches each character of the text exactly once.
| Test | Text | Matches | simple | custom | Result |
|---|---|---|---|---|---|
| short quotes 1 | 200-char text containing one quote | 1 | 12.9K ops/s | 486K ops/s | custom 37× faster |
| medium quotes 1 | 1,800-char text containing four quotes | 4 | 2.4K ops/s | 53.7K ops/s | custom 22× faster |
| long quotes 1 | 12,000-char text containing three quotes | 3 | 348 ops/s | 6,717 ops/s | custom 19× faster |
The crossover is phrase length, not text length or phrase count:
- Short phrases (< ~20 chars):
indexOfwins in most cases. V8's implementation is highly tuned for short patterns and the trie traversal overhead outweighs the algorithmic benefit. - Long phrases (40+ chars): Aho-Corasick wins decisively, with 19–37× speedups observed. The gain grows with text length because the O(n) guarantee compounds over a longer scan.
The one exception in the food tests (short foods 1) is because only a single phrase was present in the result set — the trie lookup for a single pattern is essentially free compared to the cost of scanning even a short string.