Text Search
Greph's text mode is a line-oriented byte search over a walked file tree. It is the default mode of the greph CLI and the Greph::searchText() facade. Output uses the canonical file:line:content format that grep, ripgrep, and most editor integrations parse.
Quick examples
```shell
# Regex search across the current directory
./vendor/bin/greph "function\s+\w+"
# Fixed-string search across src/
./vendor/bin/greph -F "function" src
# Case-insensitive whole-word search
./vendor/bin/greph -F -i -w "function" src
# Two lines of context around each match
./vendor/bin/greph -F -C 2 "TODO" src
# Stop after the first three matches per file
./vendor/bin/greph -F -m 3 "TODO" src
# Files that contain the pattern
./vendor/bin/greph -F -l "TODO" src
# Files that do not contain the pattern
./vendor/bin/greph -F -L "TODO" src
# Count matches per file
./vendor/bin/greph -F -c "TODO" src
# Inverted match
./vendor/bin/greph -F -v "TODO" src/notes.txt
```
Patterns
By default the pattern is treated as a PCRE2 regular expression. PHP's preg_* functions back the engine, so the syntax matches the PHP PCRE reference.
Switch off the regex engine with -F to treat the pattern as a literal byte sequence. Fixed-string mode is the fast path: it uses strpos() for matching and skips the PCRE2 compile step entirely.
-i makes both modes case-insensitive. -w adds whole-word boundaries on either side of the match.
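The sketch below shows one way a fixed-string matcher could layer -i and -w on top of a plain byte scan. It is a minimal illustration, not Greph's code: matchesFixed() and isWordByte() are hypothetical names, the case-insensitive branch uses stripos(), and it assumes word characters are ASCII [A-Za-z0-9_], since this page does not define how Greph draws word boundaries.

```php
<?php
// Hypothetical fixed-string matcher showing how -F, -i, and -w can combine.
// Assumption: word bytes are ASCII [A-Za-z0-9_], like \w in a non-Unicode PCRE pattern.

const WORD_BYTES = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_';

function isWordByte(string $line, int $pos): bool
{
    // Positions before the start or past the end of the line act as boundaries.
    if ($pos < 0 || $pos >= strlen($line)) {
        return false;
    }
    return strpos(WORD_BYTES, $line[$pos]) !== false;
}

function matchesFixed(string $line, string $needle, bool $caseInsensitive, bool $wholeWord): bool
{
    if ($needle === '') {
        return true; // an empty literal matches every line
    }
    $offset = 0;
    while (true) {
        // Plain byte scans: no regex is compiled on this path.
        $pos = $caseInsensitive ? stripos($line, $needle, $offset) : strpos($line, $needle, $offset);
        if ($pos === false) {
            return false;
        }
        // -w: the bytes on both sides of the hit must not be word bytes.
        if (!$wholeWord || (!isWordByte($line, $pos - 1) && !isWordByte($line, $pos + strlen($needle)))) {
            return true;
        }
        $offset = $pos + 1; // retry: a later occurrence may be properly bounded
    }
}
```

Note the retry loop: with -w, a line such as "functional function" must still match on its second occurrence even though the first one is not a whole word.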
Filtering files
Greph walks the file tree before searching. The walker is shared with AST mode and indexed mode, so the same filtering flags apply everywhere.
| Flag | Effect |
|---|---|
| --type NAME | Include only files matching the named type alias (php, js, md, ...) |
| --type-not NAME | Exclude files matching the named type alias |
| --glob GLOB | Include only files whose paths match GLOB. Repeatable. |
| --no-ignore | Do not respect .gitignore or .grephignore |
| --hidden | Include hidden files |
The walker respects .gitignore and .grephignore by default, skips binary files (detected by null bytes in the first 512 bytes), and caps file size at 10 MiB. Both limits are configurable through WalkOptions.
See API / FileTypeFilter for the full list of built-in type aliases.
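The default binary and size checks described above can be sketched as a small per-file filter. This is a hypothetical illustration, not the WalkOptions API: looksBinary(), shouldSearch(), MAX_FILE_BYTES, and SNIFF_BYTES are made-up names, and fnmatch() stands in for --glob matching, whose exact semantics in Greph may differ.

```php
<?php
// Hypothetical file filter mirroring the documented defaults: skip files over
// 10 MiB and files with a NUL byte in their first 512 bytes. fnmatch() stands in
// for --glob matching; Greph's own glob rules may differ.

const MAX_FILE_BYTES = 10 * 1024 * 1024; // 10 MiB
const SNIFF_BYTES = 512;

function looksBinary(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return true; // unreadable: treat as not searchable
    }
    $head = fread($handle, SNIFF_BYTES);
    fclose($handle);
    return $head !== false && str_contains($head, "\0");
}

/** @param list<string> $globs include-only patterns (like repeated --glob) */
function shouldSearch(string $path, array $globs = []): bool
{
    if ($globs !== []) {
        $included = false;
        foreach ($globs as $glob) {
            if (fnmatch($glob, $path)) {
                $included = true;
                break;
            }
        }
        if (!$included) {
            return false;
        }
    }
    $size = filesize($path);
    if ($size === false || $size > MAX_FILE_BYTES) {
        return false; // too large (or unreadable) to search
    }
    return !looksBinary($path);
}
```

Checking the glob and the size before opening the file keeps the cheap rejections first; only files that survive both pay for the 512-byte read.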
Context, counts, and file-only modes
```shell
# Two lines before and after each match
./vendor/bin/greph -F -C 2 "needle" src
# Two lines after, no lines before
./vendor/bin/greph -F -A 2 "needle" src
# Two lines before, no lines after
./vendor/bin/greph -F -B 2 "needle" src
# Count matches per file
./vendor/bin/greph -F -c "needle" src
# List matching files only
./vendor/bin/greph -F -l "needle" src
# List non-matching files only
./vendor/bin/greph -F -L "needle" src
```

-c, -l, and -L short-circuit the formatter so they avoid materializing every line. They are the right modes for "how many" and "where" queries.
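To make the context and file-only behavior concrete, here is a minimal sketch that works on an in-memory list of lines (Greph's engine streams through a buffered reader instead). selectWithContext(), countMatchingLines(), and hasMatch() are hypothetical names. The counter follows grep -c and counts matching lines; whether Greph's -c counts lines or individual matches is an assumption here.

```php
<?php
// Hypothetical sketch of context selection and the count/file-only shortcuts.
// It takes an in-memory list of lines for clarity; a real engine would stream.

/**
 * Maps each line to print (1-based) to true for a match or false for context.
 * Overlapping -B/-A windows are merged, so no line appears twice.
 * @param list<string> $lines
 * @return array<int,bool>
 */
function selectWithContext(array $lines, callable $isMatch, int $before, int $after): array
{
    $selected = [];
    $total = count($lines);
    foreach ($lines as $i => $line) {
        if (!$isMatch($line)) {
            continue;
        }
        $n = $i + 1;
        for ($k = max(1, $n - $before); $k <= min($total, $n + $after); $k++) {
            $selected[$k] = ($selected[$k] ?? false) || $k === $n;
        }
    }
    ksort($selected);
    return $selected;
}

/** -c: keeps a counter only (counts matching lines, as grep -c does). */
function countMatchingLines(iterable $lines, callable $isMatch): int
{
    $count = 0;
    foreach ($lines as $line) {
        if ($isMatch($line)) {
            $count++;
        }
    }
    return $count;
}

/** -l lists files where this is true, -L files where it is false; stops at the first hit. */
function hasMatch(iterable $lines, callable $isMatch): bool
{
    foreach ($lines as $line) {
        if ($isMatch($line)) {
            return true;
        }
    }
    return false;
}
```

Because hasMatch() returns on the first hit, a lazily read file is never consumed past that line, which is why the file-only modes are cheap.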
Parallel scans
Pass -j N to dispatch the scan across N workers. Greph uses pcntl_fork() for the worker pool and falls back to single-process execution when ext-pcntl is unavailable.
```shell
./vendor/bin/greph -F -j 8 "function" src
```

The facade decides whether parallel execution is worth the overhead based on file count, mode, and pattern shape. Below its threshold it stays single-process. See Advanced / Parallel Workers for the heuristics.
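The heuristics live elsewhere, but the fork-and-collect pattern itself can be sketched. The following is a minimal illustration under stated assumptions, not Greph's worker pool: work is split by file into at most N chunks, each child returns per-file counts of matching lines as a serialized array over a Unix socket pair, and the "too little work" threshold is a trivial stand-in (fewer than two files). scanFiles() and parallelScan() are hypothetical names.

```php
<?php
// Minimal sketch of a fork-based worker pool with a single-process fallback.
// Assumptions (not Greph's actual code): work is split by file, each worker
// returns per-file counts of matching lines, and results travel back to the
// parent as serialized arrays over a Unix socket pair.

/** @param list<string> $files @return array<string,int> */
function scanFiles(array $files, string $needle): array
{
    $counts = [];
    foreach ($files as $path) {
        $lines = file($path, FILE_IGNORE_NEW_LINES) ?: [];
        $counts[$path] = count(array_filter($lines, fn(string $l): bool => str_contains($l, $needle)));
    }
    return $counts;
}

/** @param list<string> $files @return array<string,int> */
function parallelScan(array $files, string $needle, int $jobs): array
{
    // Fallback: one job requested, too little work to split, or no ext-pcntl.
    if ($jobs <= 1 || count($files) < 2 || !function_exists('pcntl_fork')) {
        return scanFiles($files, $needle);
    }

    $results = [];
    $workers = [];
    foreach (array_chunk($files, (int) ceil(count($files) / $jobs)) as $chunk) {
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        $pid = $pair === false ? -1 : pcntl_fork();
        if ($pid === -1) {
            if ($pair !== false) {
                fclose($pair[0]);
                fclose($pair[1]);
            }
            $results += scanFiles($chunk, $needle); // could not fork: scan inline
            continue;
        }
        if ($pid === 0) {
            // Child: scan its chunk, send the serialized result, then exit.
            fclose($pair[0]);
            fwrite($pair[1], serialize(scanFiles($chunk, $needle)));
            fclose($pair[1]);
            exit(0);
        }
        fclose($pair[1]); // parent keeps only its own end
        $workers[] = [$pid, $pair[0]];
    }

    // Drain each socket before reaping so no child blocks on a full buffer.
    foreach ($workers as [$pid, $socket]) {
        $part = unserialize(stream_get_contents($socket));
        fclose($socket);
        pcntl_waitpid($pid, $status);
        if (!is_array($part)) {
            throw new RuntimeException("worker $pid returned no result");
        }
        $results += $part;
    }
    return $results;
}
```

The parent reads every payload before calling pcntl_waitpid(); waiting first could deadlock if a child's result exceeded the socket buffer.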
JSON output
--json emits a structured payload that downstream tools can parse without screen-scraping the grep format:
```shell
./vendor/bin/greph --json -F "final class" src
```

```json
[
  {
    "file": "src/Greph.php",
    "matches": [
      {
        "line": 31,
        "column": 1,
        "content": "final class Greph",
        "matched_text": "final class",
        "captures": []
      }
    ]
  }
]
```

The rg wrapper emits ripgrep-event-shaped JSON instead. Use it when an existing consumer expects the ripgrep schema.
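Because the payload is plain JSON, a consumer needs nothing beyond json_decode(). The sketch below assumes only the field names shown in the example above (file, matches, line, column, content); jsonToLocations() is a hypothetical helper that turns the payload back into path:line:column:content strings, the layout ripgrep prints with --column.

```php
<?php
// Hypothetical consumer of greph's --json payload. It reads only the fields
// shown in the documented example: file, matches[].line, .column, .content.

/** @return list<string> one "path:line:column:content" string per match */
function jsonToLocations(string $json): array
{
    $files = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $locations = [];
    foreach ($files as $file) {
        foreach ($file['matches'] as $match) {
            $locations[] = sprintf(
                '%s:%d:%d:%s',
                $file['file'],
                $match['line'],
                $match['column'],
                $match['content'],
            );
        }
    }
    return $locations;
}
```

In a pipeline you would feed it the CLI output, for example by reading it with stream_get_contents(STDIN).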
Programmatic use
```php
<?php

use Greph\Greph;
use Greph\Text\TextSearchOptions;

// Composer autoloader; assumes this script sits in the project root.
require __DIR__ . '/vendor/autoload.php';

$results = Greph::searchText(
    'function',
    'src',
    new TextSearchOptions(
        fixedString: true,
        caseInsensitive: true,
        beforeContext: 1,
        afterContext: 1,
        jobs: 4,
    ),
);

// Print each match in the file:line:content format.
foreach ($results as $file) {
    foreach ($file->matches as $match) {
        echo "{$file->file}:{$match->line}:{$match->content}\n";
    }
}
```

The result objects are documented in API / Result objects.
How it works
The text engine has two paths:
- Literal path: when -F is set, the search runs strpos() over a 64 KiB buffered reader and tracks newlines for line numbering. There is no regex compile step.
- Regex path: when -F is not set, Greph extracts literal substrings from the pattern using a small PCRE-shaped analyzer (Greph\Text\LiteralExtractor) and uses them as a strpos() pre-filter. Lines that pass the pre-filter are then matched against the compiled PCRE2 pattern with JIT enabled.
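The regex path's pre-filter can be sketched as follows. This is a hypothetical stand-in, not Greph\Text\LiteralExtractor, whose rules this page does not spell out: requiredLiteral() and searchLines() are made-up names, and the extractor is deliberately conservative, returning '' (no pre-filter) for alternation, groups, classes, counted repeats, and unusual escapes. The one invariant it must keep is that any extracted literal appears in every possible match; otherwise the pre-filter would silently drop real hits.

```php
<?php
// Hypothetical regex path with a literal pre-filter. requiredLiteral() is a
// deliberately conservative stand-in for Greph\Text\LiteralExtractor: it returns
// '' (meaning "no pre-filter") whenever the pattern is not trivial to analyze,
// because a wrong literal would make the pre-filter drop real matches.

const SIMPLE_ESCAPES = 'sSwWdDbBhHvVRAzZntr'; // escapes that consume one letter

function requiredLiteral(string $pattern): string
{
    $runs = [];
    $run = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $c = $pattern[$i];
        if ($c === '\\') {
            if ($i + 1 === $len) {
                return ''; // dangling backslash: leave it to the regex compiler
            }
            $next = $pattern[++$i];
            if (preg_match('/^[A-Za-z0-9]$/', $next) !== 1) {
                $run .= $next; // escaped punctuation such as \. is a literal byte
            } elseif (str_contains(SIMPLE_ESCAPES, $next)) {
                $runs[] = $run; // \s, \w, \d ... end the current literal run
                $run = '';
            } else {
                return ''; // \x41, \p{L}, \1, \Q ... : give up
            }
        } elseif (str_contains('|([{', $c)) {
            return ''; // alternation, groups, classes, counted repeats: give up
        } elseif ($c === '*' || $c === '?') {
            $runs[] = substr($run, 0, -1); // the quantified byte may be absent
            $run = '';
        } elseif (str_contains('+.^$', $c)) {
            $runs[] = $run; // '+' keeps its byte; '.', '^', '$' are not literals
            $run = '';
        } else {
            $run .= $c;
        }
    }
    $runs[] = $run;
    // The longest required run is the most selective pre-filter.
    usort($runs, fn(string $a, string $b): int => strlen($b) <=> strlen($a));
    return $runs[0];
}

/**
 * Returns 1-based numbers of matching lines. Assumes the pattern contains no
 * unescaped '/', which is used as the delimiter here.
 * @param list<string> $lines
 * @return list<int>
 */
function searchLines(array $lines, string $pattern, bool $caseInsensitive = false): array
{
    $regex = '/' . $pattern . '/' . ($caseInsensitive ? 'i' : '');
    $literal = requiredLiteral($pattern);
    $hits = [];
    foreach ($lines as $i => $line) {
        // Cheap byte scan first; only lines containing the literal reach PCRE.
        if ($literal !== '') {
            $found = $caseInsensitive ? stripos($line, $literal) : strpos($line, $literal);
            if ($found === false) {
                continue;
            }
        }
        if (preg_match($regex, $line) === 1) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}
```

For the page's own example, function\s+\w+, the extracted literal is "function", so lines without that word never reach preg_match(). PHP caches compiled patterns, and the PCRE JIT is used when pcre.jit is on, which is the default.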
Both paths use the same BufferedReader for I/O. The reader reads in 64 KiB chunks instead of fgets()-per-line, which reduces the syscall count dramatically on large files.
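A chunked literal scan with line numbering might look like the sketch below. It is a hypothetical illustration, not Greph's BufferedReader: scanLiteral() and scanBlock() are made-up names, and the sketch assumes the needle contains no newline, so every hit lies inside one complete line. Partial lines at the end of a chunk are carried into the next read, which is what keeps hits that straddle a 64 KiB boundary from being missed.

```php
<?php
// Hypothetical literal-path scanner: fixed-size chunk reads, strpos() for hits,
// and newline counting for line numbers. Not Greph's BufferedReader; it assumes
// the needle contains no newline, so every hit lies inside one complete line.

/** @return list<array{int,string}> [lineNumber, line] for each matching line */
function scanLiteral(string $path, string $needle, int $chunkSize = 65536): array
{
    if ($needle === '') {
        throw new InvalidArgumentException('needle must not be empty');
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("cannot open $path");
    }
    $hits = [];
    $lineNo = 1;  // line number at the start of $carry
    $carry = '';  // trailing partial line left over from the previous chunk
    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $buffer = $carry . $chunk;
        $lastNl = strrpos($buffer, "\n");
        if ($lastNl === false) {
            $carry = $buffer; // no complete line yet: keep reading
            continue;
        }
        $carry = substr($buffer, $lastNl + 1);
        $lineNo = scanBlock(substr($buffer, 0, $lastNl + 1), $needle, $lineNo, $hits);
    }
    fclose($handle);
    if ($carry !== '') {
        scanBlock($carry . "\n", $needle, $lineNo, $hits); // last line had no "\n"
    }
    return $hits;
}

/** Scans whole lines in $block (ends with "\n"); returns the next line number. */
function scanBlock(string $block, string $needle, int $lineNo, array &$hits): int
{
    $lineStart = 0; // offset of line $lineNo within $block
    $pos = strpos($block, $needle);
    while ($pos !== false) {
        // Jump to the line containing the hit, counting the newlines passed.
        $gap = substr($block, $lineStart, $pos - $lineStart);
        $lineNo += substr_count($gap, "\n");
        $lastNl = strrpos($gap, "\n");
        $hitStart = $lastNl === false ? $lineStart : $lineStart + $lastNl + 1;
        $lineEnd = strpos($block, "\n", $pos);
        $hits[] = [$lineNo, substr($block, $hitStart, $lineEnd - $hitStart)];
        // Resume after this line so a line with several hits is reported once.
        $lineStart = $lineEnd + 1;
        $lineNo++;
        $pos = strpos($block, $needle, $lineStart);
    }
    return $lineNo + substr_count(substr($block, $lineStart), "\n");
}
```

strpos() jumps straight from hit to hit, and newlines are only counted in the gaps between hits, so lines without a match are never split out individually.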
For repeated workloads, the indexed text mode replaces the buffered reader with a warmed trigram + identifier postings store and skips most files entirely.
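This page does not describe the index format, so the following is only a generic illustration of trigram pre-selection for a literal query. buildTrigramIndex() and candidateFiles() are hypothetical names, and the identifier postings half of Greph's store is not modeled. The idea: a file can contain a literal only if it contains every 3-byte substring of that literal, so intersecting the posting lists for those trigrams yields a candidate set. Everything outside it is skipped unread, and candidates still need a real scan to confirm the match.

```php
<?php
// Hypothetical trigram pre-selection, illustrating how an index lets a search
// skip files. Not Greph's index format.

/**
 * @param array<string,string> $docs path => file contents
 * @return array<string,list<string>> trigram => paths containing it
 */
function buildTrigramIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $path => $text) {
        $seen = [];
        for ($i = 0, $n = strlen($text) - 2; $i < $n; $i++) {
            $seen[substr($text, $i, 3)] = true; // dedupe trigrams per file
        }
        foreach (array_keys($seen) as $trigram) {
            $index[$trigram][] = $path;
        }
    }
    return $index;
}

/**
 * Returns the files that could contain $literal; all others can be skipped.
 * @param list<string> $allPaths
 * @return list<string>
 */
function candidateFiles(array $index, array $allPaths, string $literal): array
{
    if (strlen($literal) < 3) {
        return $allPaths; // no trigram to look up: every file stays a candidate
    }
    $candidates = null;
    for ($i = 0, $n = strlen($literal) - 2; $i < $n; $i++) {
        $postings = $index[substr($literal, $i, 3)] ?? [];
        $candidates = $candidates === null
            ? $postings
            : array_values(array_intersect($candidates, $postings));
        if ($candidates === []) {
            break; // some trigram occurs nowhere: no file can match
        }
    }
    return $candidates;
}
```

A false positive is possible (a file may hold every trigram without the full literal, as "functional style" does for "function"), which is why the candidate files are still searched afterwards.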