Matching Regexps 200 Times Faster

I love reading optimization stories so thank you!

So essentially the code got faster by switching to smarter regexp engine that can transform (hopefully) the regexp into a FSM. Cool, but what about replacing the regexp with straightforward parsing code written manually? That way it is statically known that this function is fast all of the time.

At least that’s what I would do in a natively compiled language with access to simd, not sure if ruby has access to it and can detect the best variant at runtime.

And I’d argue plain code is simpler to understand, debug, troubleshoot than a complex regexp.

The regexp engine part reminds me of writing scalar code, relying on the compiler doing autovectorization to make it fast. And what very often happens is that the new compiler version changed their heuristics and now there is no autovectorization that kicks in and the performance falls off a cliff . It’s a risky bet.

eregon 4 months ago

Thanks for the comment.
> Cool, but what about replacing the regexp with straightforward parsing code written manually?
If you take a look at the linked snippets of C code, I think it's clear it's all but straightforward. The regexps OTOH are really short and expressive.
Ruby has no access to SIMD. And writing SIMD is basically writing inline assembly, so it's really tedious and messy.
> relying on the compiler doing autovectorization to make it fast.
I can relate to that, but Regexps are a much smaller domain, and there it's clear SIMD is always a win, so if the regexp engine uses SIMD it's very unlikely it would ever stop using it (unless something faster comes up).
SR2Z 4 months ago

> Cool, but what about replacing the regexp with straightforward parsing code written manually?
It would not be faster - an FSM can trivially be created as a few jump tables which fit in cache. This is one of the rare cases where a compiler can be provably optimal.
Hell, a regex compiler could even transform the FSM to read a 4-character sequence at a time for SIMD if it made sense.

Validark 4 months ago

47 bytes per nanosecond? That's 47 GB/s. My machine couldn't stream data into the CPU that fast. Though if you're exclusively operating within cache then I suppose that's possible.

Honestly, I've never really understood why people care so much about portable binaries unless they're running their OS off a thumb drive on multiple different computers. My computer doesn't change instruction sets. And if I did put a new CPU in my computer, I'd want to get new binaries compiled for a more powerful instruction set.

wtallis 4 months ago

That's not really an extreme amount of bandwidth. DDR4-2933 was mainstream five years ago, and at that speed the two channels consumer CPUs are equipped with adds up to 47 GB/s.
- Validark 4 months ago
  
  Sure, if you are streaming data to ALL cores simultaneously you can get that kind of number. But on a single-thread, I don't think so.

rurban 4 months ago

Using a pcre2 gem would be even faster. Like about 1000x

Validark 4 months ago

Well, the article reports a speed of 47 GB/s. There is no way to be 1000x faster than that.
eregon 4 months ago

No it wouldn't.

JonChesterfield 4 months ago

No backreference support in the state machine, disappointing https://github.com/oracle/graal/blob/master/regex/README.md

conradludgate 4 months ago

Backreferences are not part of the original regular expression theories. Regular expressions are supposed to be equivalent to DFAs, but backrefs upgrade their status to full Turing machines.
Algorithms for running regexes as non-deterministic finite automata just happen to be easy to add backrefs too, but it completely breaks the DFA principle.
Interesting video on "How regexes got catastrophic": https://youtu.be/gITmP0IWff0
HackerThemAll 4 months ago

and lack of backreferences makes them fast, but crippled.