|
|
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="A module for building and searching with deterministic finite automata (DFAs)."><title>regex_automata::dfa - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/SourceSerif4-Regular-46f98efaafac5295.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/FiraSans-Regular-018c141bf0843ffd.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/FiraSans-Medium-8f9a781e4970d388.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/SourceCodePro-Regular-562dcc5011b6de7d.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../static.files/SourceCodePro-Semibold-d899c5a5c4aeb14a.ttf.woff2"><link rel="stylesheet" href="../../static.files/normalize-76eba96aa4d2e634.css"><link rel="stylesheet" href="../../static.files/rustdoc-ac92e1bbe349e143.css"><meta name="rustdoc-vars" data-root-path="../../" data-static-root-path="../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.76.0 (07dca489a 2024-02-04)" data-channel="1.76.0" data-search-js="search-2b6ce74ff89ae146.js" data-settings-js="settings-4313503d2e1961c2.js" ><script src="../../static.files/storage-f2adc0d6ca4d09fb.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../static.files/main-305769736d49e732.js"></script><noscript><link rel="stylesheet" href="../../static.files/noscript-feafe1bb7466e4bd.css"></noscript><link rel="alternate icon" type="image/png" href="../../static.files/favicon-16x16-8b506e7a72182f1c.png"><link rel="alternate icon" type="image/png" href="../../static.files/favicon-32x32-422f7d1d52889060.png"><link rel="icon" type="image/svg+xml" href="../../static.files/favicon-2c020d218678b618.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle">☰</button></nav><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../regex_automata/index.html">regex_automata</a><span class="version">0.4.9</span></h2></div><h2 class="location"><a href="#">Module dfa</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#modules">Modules</a></li></ul></section><h2><a href="../index.html">In crate regex_automata</a></h2></div></nav><div class="sidebar-resizer"></div>
|
|
|
<main><div class="width-limiter"><nav class="sub"><form class="search-form"><span></span><div id="sidebar-button" tabindex="-1"><a href="../../regex_automata/all.html" title="show sidebar"></a></div><input class="search-input" name="search" aria-label="Run search in the documentation" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" tabindex="-1"><a href="../../help.html" title="help">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../static.files/wheel-7b819b6101059cd0.svg"></a></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1>Module <a href="../index.html">regex_automata</a>::<wbr><a class="mod" href="#">dfa</a><button id="copy-path" title="Copy item path to clipboard"><img src="../../static.files/clipboard-7571035ce49a181d.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="src" href="../../src/regex_automata/dfa/mod.rs.html#1-360">source</a> · <button id="toggle-all-docs" title="collapse all docs">[<span>−</span>]</button></span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>A module for building and searching with deterministic finite automata (DFAs).</p>
|
|
|
<p>Like other modules in this crate, DFAs support a rich regex syntax with Unicode
|
|
|
features. DFAs also have extensive options for configuring the best space vs
|
|
|
time trade off for your use case and provides support for cheap deserialization
|
|
|
of automata for use in <code>no_std</code> environments.</p>
|
|
|
<p>If you’re looking for lazy DFAs that build themselves incrementally during
|
|
|
search, then please see the top-level <a href="../hybrid/index.html" title="mod regex_automata::hybrid"><code>hybrid</code> module</a>.</p>
|
|
|
<h2 id="overview"><a href="#overview">Overview</a></h2>
|
|
|
<p>This section gives a brief overview of the primary types in this module:</p>
|
|
|
<ul>
|
|
|
<li>A [<code>regex::Regex</code>] provides a way to search for matches of a regular
|
|
|
expression using DFAs. This includes iterating over matches with both the start
|
|
|
and end positions of each match.</li>
|
|
|
<li>A [<code>dense::DFA</code>] provides low level access to a DFA that uses a dense
|
|
|
representation (uses lots of space, but fast searching).</li>
|
|
|
<li>A [<code>sparse::DFA</code>] provides the same API as a <code>dense::DFA</code>, but uses a sparse
|
|
|
representation (uses less space, but slower searching).</li>
|
|
|
<li>An [<code>Automaton</code>] trait that defines an interface that both dense and sparse
|
|
|
DFAs implement. (A <code>regex::Regex</code> is generic over this trait.)</li>
|
|
|
<li>Both dense DFAs and sparse DFAs support serialization to raw bytes (e.g.,
|
|
|
[<code>dense::DFA::to_bytes_little_endian</code>]) and cheap deserialization (e.g.,
|
|
|
[<code>dense::DFA::from_bytes</code>]).</li>
|
|
|
</ul>
|
|
|
<p>There is also a <a href="onepass/index.html" title="mod regex_automata::dfa::onepass"><code>onepass</code></a> module that provides a <a href="onepass/struct.DFA.html" title="struct regex_automata::dfa::onepass::DFA">one-pass
|
|
|
DFA</a>. The unique advantage of this DFA is that, for the class
|
|
|
of regexes it can be built with, it supports reporting the spans of matching
|
|
|
capturing groups. It is the only DFA in this crate capable of such a thing.</p>
|
|
|
<h2 id="example-basic-regex-searching"><a href="#example-basic-regex-searching">Example: basic regex searching</a></h2>
|
|
|
<p>This example shows how to compile a regex using the default configuration
|
|
|
and then use it to find matches in a byte string:</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
|
|
|
|
|
<span class="kw">let </span>re = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>)<span class="question-mark">?</span>;
|
|
|
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
|
|
]);</code></pre></div>
|
|
|
<h2 id="example-searching-with-regex-sets"><a href="#example-searching-with-regex-sets">Example: searching with regex sets</a></h2>
|
|
|
<p>The DFAs in this module all fully support searching with multiple regexes
|
|
|
simultaneously. You can use this support with standard leftmost-first style
|
|
|
searching to find non-overlapping matches:</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
|
|
|
|
|
<span class="kw">let </span>re = Regex::new_many(<span class="kw-2">&</span>[<span class="string">r"\w+"</span>, <span class="string">r"\S+"</span>])<span class="question-mark">?</span>;
|
|
|
<span class="kw">let </span>text = <span class="string">b"@foo bar"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">1</span>, <span class="number">0</span>..<span class="number">4</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">5</span>..<span class="number">8</span>),
|
|
|
]);</code></pre></div>
|
|
|
<h2 id="example-use-sparse-dfas"><a href="#example-use-sparse-dfas">Example: use sparse DFAs</a></h2>
|
|
|
<p>By default, compiling a regex will use dense DFAs internally. This uses more
|
|
|
memory, but executes searches more quickly. If you can abide slower searches
|
|
|
(somewhere around 3-5x), then sparse DFAs might make more sense since they can
|
|
|
use significantly less space.</p>
|
|
|
<p>Using sparse DFAs is as easy as using <code>Regex::new_sparse</code> instead of
|
|
|
<code>Regex::new</code>:</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
|
|
|
|
|
<span class="kw">let </span>re = Regex::new_sparse(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
|
|
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
|
|
]);</code></pre></div>
|
|
|
<p>If you already have dense DFAs for some reason, they can be converted to sparse
|
|
|
DFAs and used to build a new <code>Regex</code>. For example:</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
|
|
|
|
|
<span class="kw">let </span>dense_re = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
|
|
<span class="kw">let </span>sparse_re = Regex::builder().build_from_dfas(
|
|
|
dense_re.forward().to_sparse()<span class="question-mark">?</span>,
|
|
|
dense_re.reverse().to_sparse()<span class="question-mark">?</span>,
|
|
|
);
|
|
|
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = sparse_re.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
|
|
]);</code></pre></div>
|
|
|
<h2 id="example-deserialize-a-dfa"><a href="#example-deserialize-a-dfa">Example: deserialize a DFA</a></h2>
|
|
|
<p>This shows how to first serialize a DFA into raw bytes, and then deserialize
|
|
|
those raw bytes back into a DFA. While this particular example is a
|
|
|
bit contrived, this same technique can be used in your program to
|
|
|
deserialize a DFA at start up time or by memory mapping a file.</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::{dense, regex::Regex}};
|
|
|
|
|
|
<span class="kw">let </span>re1 = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
|
|
<span class="comment">// serialize both the forward and reverse DFAs, see note below
|
|
|
</span><span class="kw">let </span>(fwd_bytes, fwd_pad) = re1.forward().to_bytes_native_endian();
|
|
|
<span class="kw">let </span>(rev_bytes, rev_pad) = re1.reverse().to_bytes_native_endian();
|
|
|
<span class="comment">// now deserialize both---we need to specify the correct type!
|
|
|
</span><span class="kw">let </span>fwd: dense::DFA<<span class="kw-2">&</span>[u32]> = dense::DFA::from_bytes(<span class="kw-2">&</span>fwd_bytes[fwd_pad..])<span class="question-mark">?</span>.<span class="number">0</span>;
|
|
|
<span class="kw">let </span>rev: dense::DFA<<span class="kw-2">&</span>[u32]> = dense::DFA::from_bytes(<span class="kw-2">&</span>rev_bytes[rev_pad..])<span class="question-mark">?</span>.<span class="number">0</span>;
|
|
|
<span class="comment">// finally, reconstruct our regex
|
|
|
</span><span class="kw">let </span>re2 = Regex::builder().build_from_dfas(fwd, rev);
|
|
|
|
|
|
<span class="comment">// we can use it like normal
|
|
|
</span><span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = re2.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
|
|
]);</code></pre></div>
|
|
|
<p>There are a few points worth noting here:</p>
|
|
|
<ul>
|
|
|
<li>We need to extract the raw DFAs used by the regex and serialize those. You
|
|
|
can build the DFAs manually yourself using [<code>dense::Builder</code>], but using
|
|
|
the DFAs from a <code>Regex</code> guarantees that the DFAs are built correctly. (In
|
|
|
particular, a <code>Regex</code> constructs a reverse DFA for finding the starting
|
|
|
location of matches.)</li>
|
|
|
<li>To convert the DFA to raw bytes, we use the <code>to_bytes_native_endian</code> method.
|
|
|
In practice, you’ll want to use either [<code>dense::DFA::to_bytes_little_endian</code>]
|
|
|
or [<code>dense::DFA::to_bytes_big_endian</code>], depending on which platform you’re
|
|
|
deserializing your DFA from. If you intend to deserialize on either platform,
|
|
|
then you’ll need to serialize both and deserialize the right one depending on
|
|
|
your target’s endianness.</li>
|
|
|
<li>Safely deserializing a DFA requires verifying the raw bytes, particularly if
|
|
|
they are untrusted, since an invalid DFA could cause logical errors, panics
|
|
|
or even undefined behavior. This verification step requires visiting all of
|
|
|
the transitions in the DFA, which can be costly. If cheaper verification is
|
|
|
desired, then [<code>dense::DFA::from_bytes_unchecked</code>] is available that only does
|
|
|
verification that can be performed in constant time. However, one can only use
|
|
|
this routine if the caller can guarantee that the bytes provided encoded a
|
|
|
valid DFA.</li>
|
|
|
</ul>
|
|
|
<p>The same process can be achieved with sparse DFAs as well:</p>
|
|
|
|
|
|
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::{sparse, regex::Regex}};
|
|
|
|
|
|
<span class="kw">let </span>re1 = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
|
|
<span class="comment">// serialize both
|
|
|
</span><span class="kw">let </span>fwd_bytes = re1.forward().to_sparse()<span class="question-mark">?</span>.to_bytes_native_endian();
|
|
|
<span class="kw">let </span>rev_bytes = re1.reverse().to_sparse()<span class="question-mark">?</span>.to_bytes_native_endian();
|
|
|
<span class="comment">// now deserialize both---we need to specify the correct type!
|
|
|
</span><span class="kw">let </span>fwd: sparse::DFA<<span class="kw-2">&</span>[u8]> = sparse::DFA::from_bytes(<span class="kw-2">&</span>fwd_bytes)<span class="question-mark">?</span>.<span class="number">0</span>;
|
|
|
<span class="kw">let </span>rev: sparse::DFA<<span class="kw-2">&</span>[u8]> = sparse::DFA::from_bytes(<span class="kw-2">&</span>rev_bytes)<span class="question-mark">?</span>.<span class="number">0</span>;
|
|
|
<span class="comment">// finally, reconstruct our regex
|
|
|
</span><span class="kw">let </span>re2 = Regex::builder().build_from_dfas(fwd, rev);
|
|
|
|
|
|
<span class="comment">// we can use it like normal
|
|
|
</span><span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
|
|
<span class="kw">let </span>matches: Vec<Match> = re2.find_iter(text).collect();
|
|
|
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
|
|
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
|
|
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
|
|
]);</code></pre></div>
|
|
|
<p>Note that unlike dense DFAs, sparse DFAs have no alignment requirements.
|
|
|
Conversely, dense DFAs must be aligned to the same alignment as a
|
|
|
<a href="../util/primitives/struct.StateID.html" title="struct regex_automata::util::primitives::StateID"><code>StateID</code></a>.</p>
|
|
|
<h2 id="support-for-no_std-and-alloc-only"><a href="#support-for-no_std-and-alloc-only">Support for <code>no_std</code> and <code>alloc</code>-only</a></h2>
|
|
|
<p>This crate comes with <code>alloc</code> and <code>std</code> features that are enabled by default.
|
|
|
When the <code>alloc</code> or <code>std</code> features are enabled, the API of this module will
|
|
|
include the facilities necessary for compiling, serializing, deserializing
|
|
|
and searching with DFAs. When only the <code>alloc</code> feature is enabled, then
|
|
|
implementations of the <code>std::error::Error</code> trait are dropped, but everything
|
|
|
else generally remains the same. When both the <code>alloc</code> and <code>std</code> features are
|
|
|
disabled, the API of this module will shrink such that it only includes the
|
|
|
facilities necessary for deserializing and searching with DFAs.</p>
|
|
|
<p>The intended workflow for <code>no_std</code> environments is thus as follows:</p>
|
|
|
<ul>
|
|
|
<li>Write a program with the <code>alloc</code> or <code>std</code> features that compiles and
|
|
|
serializes a regular expression. You may need to serialize both little and big
|
|
|
endian versions of each DFA. (So that’s 4 DFAs in total for each regex.)</li>
|
|
|
<li>In your <code>no_std</code> environment, follow the examples above for deserializing
|
|
|
your previously serialized DFAs into regexes. You can then search with them as
|
|
|
you would any regex.</li>
|
|
|
</ul>
|
|
|
<p>Deserialization can happen anywhere. For example, with bytes embedded into a
|
|
|
binary or with a file memory mapped at runtime.</p>
|
|
|
<p>The <code>regex-cli</code> command (found in the same repository as this crate) can be
|
|
|
used to serialize DFAs to files and generate Rust code to read them.</p>
|
|
|
<h2 id="syntax"><a href="#syntax">Syntax</a></h2>
|
|
|
<p>This module supports the same syntax as the <code>regex</code> crate, since they share the
|
|
|
same parser. You can find an exhaustive list of supported syntax in the
|
|
|
<a href="https://docs.rs/regex/1/regex/#syntax">documentation for the <code>regex</code> crate</a>.</p>
|
|
|
<p>There are two things that are not supported by the DFAs in this module:</p>
|
|
|
<ul>
|
|
|
<li>Capturing groups. The DFAs (and <a href="regex::Regex"><code>Regex</code></a>es built on top
|
|
|
of them) can only find the offsets of an entire match, but cannot resolve
|
|
|
the offsets of each capturing group. This is because DFAs do not have the
|
|
|
expressive power necessary.</li>
|
|
|
<li>Unicode word boundaries. These present particularly difficult challenges for
|
|
|
DFA construction and would result in an explosion in the number of states.
|
|
|
One can enable [<code>dense::Config::unicode_word_boundary</code>] though, which provides
|
|
|
heuristic support for Unicode word boundaries that only works on ASCII text.
|
|
|
Otherwise, one can use <code>(?-u:\b)</code> for an ASCII word boundary, which will work
|
|
|
on any input.</li>
|
|
|
</ul>
|
|
|
<p>There are no plans to lift either of these limitations.</p>
|
|
|
<p>Note that these restrictions are identical to the restrictions on lazy DFAs.</p>
|
|
|
<h2 id="differences-with-general-purpose-regexes"><a href="#differences-with-general-purpose-regexes">Differences with general purpose regexes</a></h2>
|
|
|
<p>The main goal of the <a href="https://docs.rs/regex"><code>regex</code></a> crate is to serve as a
|
|
|
general purpose regular expression engine. It aims to automatically balance low
|
|
|
compile times, fast search times and low memory usage, while also providing
|
|
|
a convenient API for users. In contrast, this module provides a lower level
|
|
|
regular expression interface based exclusively on DFAs that is a bit less
|
|
|
convenient while providing more explicit control over memory usage and search
|
|
|
times.</p>
|
|
|
<p>Here are some specific negative differences:</p>
|
|
|
<ul>
|
|
|
<li><strong>Compilation can take an exponential amount of time and space</strong> in the size
|
|
|
of the regex pattern. While most patterns do not exhibit worst case exponential
|
|
|
time, such patterns do exist. For example, <code>[01]*1[01]{N}</code> will build a DFA
|
|
|
with approximately <code>2^(N+2)</code> states. For this reason, untrusted patterns should
|
|
|
not be compiled with this module. (In the future, the API may expose an option
|
|
|
to return an error if the DFA gets too big.)</li>
|
|
|
<li>This module does not support sub-match extraction via capturing groups, which
|
|
|
can be achieved with the regex crate’s “captures” API.</li>
|
|
|
<li>While the regex crate doesn’t necessarily sport fast compilation times,
|
|
|
the regexes in this module are almost universally slow to compile, especially
|
|
|
when they contain large Unicode character classes. For example, on my system,
|
|
|
compiling <code>\w{50}</code> takes about 1 second and almost 15MB of memory! (Compiling
|
|
|
a sparse regex takes about the same time but only uses about 1.2MB of
|
|
|
memory.) Conversely, compiling the same regex without Unicode support, e.g.,
|
|
|
<code>(?-u)\w{50}</code>, takes under 1 millisecond and about 15KB of memory. For this
|
|
|
reason, you should only use Unicode character classes if you absolutely need
|
|
|
them! (They are enabled by default though.)</li>
|
|
|
<li>This module does not support Unicode word boundaries. ASCII word bondaries
|
|
|
may be used though by disabling Unicode or selectively doing so in the syntax,
|
|
|
e.g., <code>(?-u:\b)</code>. There is also an option to
|
|
|
<a href="crate::dfa::dense::Config::unicode_word_boundary">heuristically enable Unicode word boundaries</a>,
|
|
|
where the corresponding DFA will give up if any non-ASCII byte is seen.</li>
|
|
|
<li>As a lower level API, this module does not do literal optimizations
|
|
|
automatically. Although it does provide hooks in its API to make use of the
|
|
|
<a href="../util/prefilter/struct.Prefilter.html" title="struct regex_automata::util::prefilter::Prefilter"><code>Prefilter</code></a> trait. Missing literal
|
|
|
optimizations means that searches may run much slower than what you’re
|
|
|
accustomed to, although, it does provide more predictable and consistent
|
|
|
performance.</li>
|
|
|
<li>There is no <code>&str</code> API like in the regex crate. In this module, all APIs
|
|
|
operate on <code>&[u8]</code>. By default, match indices are
|
|
|
guaranteed to fall on UTF-8 boundaries, unless either of
|
|
|
<a href="../util/syntax/struct.Config.html#method.utf8" title="method regex_automata::util::syntax::Config::utf8"><code>syntax::Config::utf8</code></a> or
|
|
|
<a href="../nfa/thompson/struct.Config.html#method.utf8" title="method regex_automata::nfa::thompson::Config::utf8"><code>thompson::Config::utf8</code></a> are disabled.</li>
|
|
|
</ul>
|
|
|
<p>With some of the downsides out of the way, here are some positive differences:</p>
|
|
|
<ul>
|
|
|
<li>Both dense and sparse DFAs can be serialized to raw bytes, and then cheaply
|
|
|
deserialized. Deserialization can be done in constant time with the unchecked
|
|
|
APIs, since searching can be performed directly on the raw serialized bytes of
|
|
|
a DFA.</li>
|
|
|
<li>This module was specifically designed so that the searching phase of a
|
|
|
DFA has minimal runtime requirements, and can therefore be used in <code>no_std</code>
|
|
|
environments. While <code>no_std</code> environments cannot compile regexes, they can
|
|
|
deserialize pre-compiled regexes.</li>
|
|
|
<li>Since this module builds DFAs ahead of time, it will generally out-perform
|
|
|
the <code>regex</code> crate on equivalent tasks. The performance difference is likely
|
|
|
not large. However, because of a complex set of optimizations in the regex
|
|
|
crate (like literal optimizations), an accurate performance comparison may be
|
|
|
difficult to do.</li>
|
|
|
<li>Sparse DFAs provide a way to build a DFA ahead of time that sacrifices search
|
|
|
performance a small amount, but uses much less storage space. Potentially even
|
|
|
less than what the regex crate uses.</li>
|
|
|
<li>This module exposes DFAs directly, such as [<code>dense::DFA</code>] and
|
|
|
[<code>sparse::DFA</code>], which enables one to do less work in some cases. For example,
|
|
|
if you only need the end of a match and not the start of a match, then you can
|
|
|
use a DFA directly without building a <code>Regex</code>, which always requires a second
|
|
|
DFA to find the start of a match.</li>
|
|
|
<li>This module provides more control over memory usage. Aside from choosing
|
|
|
between dense and sparse DFAs, one can also choose a smaller state identifier
|
|
|
representation to use less space. Also, one can enable DFA minimization
|
|
|
via [<code>dense::Config::minimize</code>], but it can increase compilation times
|
|
|
dramatically.</li>
|
|
|
</ul>
|
|
|
</div></details><h2 id="modules" class="section-header"><a href="#modules">Modules</a></h2><ul class="item-table"><li><div class="item-name"><a class="mod" href="onepass/index.html" title="mod regex_automata::dfa::onepass">onepass</a></div><div class="desc docblock-short">A DFA that can return spans for matching capturing groups.</div></li></ul></section></div></main></body></html> |