This fork can be useful if all of the conditions below apply to you:
The fork was born out of necessity to extract a bunch of fediverse addresses from scraped web pages.
Here we only describe the differences between this fork and the original
ripgrep. Please refer to the original
readme for complete information.
Compared to the original ripgrep, we only offer basic installation facilities, e.g. using
cargo. You can install
cargo if you don’t have it already.
Beforehand, please refer to the original section of the readme
Don’t forget to install the
hyperscan library and sources on your system first. Most distributions provide ready-to-go packages (e.g.
libhyperscan-dev on Debian/Ubuntu) or you can compile it from source.
Note that in some environments (e.g. AWS EC2), if you compile from source you need to add
-fPIC to the library compilation flags.
Finally, check out this repository and compile the fork:
$ git clone http://git.sr.ht/~pierrenn/ripgrep
$ cd ripgrep
$ git submodule update --init --recursive
$ cargo install --path . --features 'hyperscan,pcre2' # if you want all 3 engines: default,pcre2,hyperscan
$ # cargo install --path . --features 'hyperscan' # or if you want only 2 engines: default,hyperscan
And don’t forget to add Cargo’s bin directory to your path.
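As a minimal sketch of the PATH step, assuming a default Cargo setup where binaries are installed into `$CARGO_HOME/bin` (or `$HOME/.cargo/bin` when `CARGO_HOME` is unset), the snippet below prepends that directory only if it is not already listed. The variable name `CARGO_BIN` is only illustrative.

```shell
# Assumption: cargo installs binaries into $CARGO_HOME/bin,
# falling back to $HOME/.cargo/bin when CARGO_HOME is not set.
CARGO_BIN="${CARGO_HOME:-$HOME/.cargo}/bin"

# Prepend the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$CARGO_BIN:"*) ;;                     # already present, nothing to do
    *) export PATH="$CARGO_BIN:$PATH" ;;
esac
```

Adding these lines to your shell startup file makes the change persistent. Afterwards, `command -v rg` should print a path inside that directory.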
Note that the binary name for this fork of ripgrep is also
rg, so it will overwrite the original binary (since we only add functionality, this shouldn’t be a problem).
TLDR: We just add a new engine named
hyperscan to ripgrep.
To use it:
$ rg --engine hyperscan "my pattern" my_file
or via a file:
$ rg --engine hyperscan -f myregexps my_file
myregexps is a compiled hyperscan DB or a list of regexps in the standard format or the hyperscan format, e.g.:
some default regexp
/some hyperscan regexp/imsHV8WcQ
imsHV8WcQ can be any subset of the following (case-sensitive) options:
We also provide options (e.g.
--hyper-ucp) to override the flags of each textual regular expression provided to hyperscan (ignored if you provide a compiled DB, as we don’t support editing a DB).
ripgrep options also override all regexp options (again, except when using a compiled hyperscan DB).
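The pattern file format described above can be sketched as follows. Both patterns are hypothetical and only loosely modelled on the fediverse-address use case; the first line uses the standard format and the second the hyperscan format. The `i` flag is assumed here to request case-insensitive matching, as it does in hyperscan's own tools.

```shell
# Write a two-line pattern file named myregexps (hypothetical patterns).
# Line 1: a regexp in the standard format.
# Line 2: a regexp in the hyperscan format, /regexp/flags, with the i flag.
cat > myregexps <<'EOF'
@[a-z0-9_]+@[a-z0-9.-]+
/@[a-z0-9_]+@[a-z0-9.-]+\.social/i
EOF

# The file can then be passed to the fork with:
#   rg --engine hyperscan -f myregexps my_file
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell leaves backslashes and other special characters in the patterns untouched.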
Finally, you can also save a compiled database to disk. This can be useful because ripgrep sometimes spends most of its time compiling the DB (on a single core).
Use the -d/--hyper-write parameter to save the DB to disk before starting the search:
$ # tell rg to read the myregexps text file, compile the regexps, write them to db.hs and finally search my_file
$ rg --engine hyperscan -f myregexps -d db.hs my_file
$
$ # now tell rg to directly read the compiled DB and search my_file2 - this will be quicker
$ rg --engine hyperscan -f db.hs my_file2
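The two commands above can be combined into a small wrapper that recompiles the DB only when needed. This is a sketch, not part of the fork: the function names `needs_rebuild` and `hs_search` are hypothetical, and it assumes a shell whose `test` supports the `-nt` (newer than) operator, as bash and dash do.

```shell
# Return success when the compiled DB ($2) is missing or older than
# the pattern file ($1).
needs_rebuild() {
    [ ! -f "$2" ] || [ "$1" -nt "$2" ]
}

# Usage: hs_search PATTERN_FILE DB_FILE [rg arguments...]
# Compiles and saves the DB on the first run or after the pattern
# file changed, otherwise loads the compiled DB directly.
hs_search() {
    patterns=$1
    db=$2
    shift 2
    if needs_rebuild "$patterns" "$db"; then
        rg --engine hyperscan -f "$patterns" -d "$db" "$@"
    else
        rg --engine hyperscan -f "$db" "$@"
    fi
}
```

For example, `hs_search myregexps db.hs my_file` compiles and writes `db.hs` on the first call, and later calls reuse it until `myregexps` is modified.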
Please refer to the original