Add built-in ability to limit search to first N bytes of file/stream (like with `head -c`) #3035

#### Describe your feature request

I recently needed to search an nginx cache (using slicing) containing several hundred GiB of data. Each file contains the HTTP response header for its slice. A slice can be over a MiB in size (this varies with the nginx config), but the content is irrelevant: everything I needed was within the first 1-2 KiB of each file.

It was a pain to find certain cache slices because ripgrep has to stream the entire file whenever there is no match, which is the case for the vast majority of files. (Each file starts with a binary header, so binary mode must be used.)

I thought of having a built-in equivalent of `head -c` that would prevent needless examination of file contents beyond a known header.
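To illustrate the idea outside of ripgrep, here is a minimal Python sketch of a search that only looks at the first N bytes of each file. It is not ripgrep's code and not what the draft branch does; the function names (`head_contains`, `search_heads`, `demo`), the 2 KiB limit, and the fake slice contents are all made up for illustration.

```python
import os
import tempfile


def head_contains(path: str, needle: bytes, limit: int) -> bool:
    """Return True if `needle` occurs within the first `limit` bytes of `path`."""
    with open(path, "rb") as f:
        # read() returns at most `limit` bytes, so the payload beyond
        # the header is never scanned.
        head = f.read(limit)
    return needle in head


def search_heads(root: str, needle: bytes, limit: int) -> list:
    """Return the sorted paths under `root` whose first `limit` bytes contain `needle`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if head_contains(path, needle, limit):
                matches.append(path)
    return sorted(matches)


def demo() -> list:
    """Build two fake cache slices (hypothetical layout) and search their first 2 KiB."""
    with tempfile.TemporaryDirectory() as root:
        body = b"\x00" * (1024 * 1024)  # 1 MiB of irrelevant payload
        # slice_a: the needle sits in the header.
        with open(os.path.join(root, "slice_a"), "wb") as f:
            f.write(b"\x01\x02KEY: /video.mp4\r\n" + body)
        # slice_b: the needle only appears after the payload, beyond the limit.
        with open(os.path.join(root, "slice_b"), "wb") as f:
            f.write(b"\x01\x02KEY: /other.bin\r\n" + body + b"KEY: /video.mp4")
        hits = search_heads(root, b"/video.mp4", limit=2048)
        return [os.path.basename(p) for p in hits]


if __name__ == "__main__":
    print(demo())
```

In this sketch only `slice_a` is reported, because the occurrence in `slice_b` lies past the 2 KiB limit. The point is that a non-matching file costs one short read instead of a full scan.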

An anecdotal benchmark on my system shows search time lower by a factor of 2000 with a cold filesystem cache and by a factor of 400 with a warm cache. Note that this was measured on a completely different dataset with fewer, larger files; the original system is unfortunately no longer accessible to me.

I implemented a draft of this feature and would love a review and, hopefully, a merge: https://github.com/moschroe/ripgrep/tree/feat_head-bytes (should I open a PR?)
It is most likely not as clean as it could be, so I'd be happy to improve it until it is acceptable.
