Improve state machine codegen #258

Open
1 of 6 tasks
nikomatsakis opened this issue Feb 18, 2025 · 6 comments

@nikomatsakis (Contributor) commented Feb 18, 2025

Metadata
Point of contact: @folkertdev
Team(s): compiler, lang
Goal document: 2025h1/improve-rustc-codegen

Summary

We want to improve rustc codegen, based on this initiative by the Trifecta Tech Foundation. The work focuses on improving state machine code generation, and on finding (and hopefully fixing) cases where clang produces better code than rustc for roughly equivalent input.
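
For context, the pattern in question is a loop whose body dispatches on a state value. The sketch below is illustrative (the State variants and the run function are made up, not taken from the goal document): on stable Rust today, every transition writes the new state and control goes back through the central match, which re-dispatches at runtime instead of jumping directly to the code for the next state.

enum State {
    Start,
    Middle,
    Done,
}

fn run() {
    let mut state = State::Start;
    loop {
        match state {
            // Each arm picks the next state; control then returns to the
            // `match`, which dispatches on `state` again at runtime.
            State::Start => state = State::Middle,
            State::Middle => state = State::Done,
            State::Done => break,
        }
    }
}

fn main() {
    run();
}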

Tasks and status

@nikomatsakis (Contributor, Author)

This issue is intended for status updates only.

For general questions or comments, please contact the owner(s) directly.

@traviscross (Contributor)

This is, I believe, mostly waiting on us on the lang team to have a look, probably in a design meeting, to feel out what's in the realm of possibility for us to accept.

@folkertdev (Contributor)

@traviscross how would we make progress on that? So far we've mostly been talking to @joshtriplett, under the assumption that a #[loop_match] attribute on loops combined with a #[const_continue] attribute on "jumps to the next iteration" will be acceptable as a language experiment.

Our current implementation handles the following:

#![feature(loop_match)]

enum State {
    A,
    B,
}

fn main() {
    let mut state = State::A;
    #[loop_match]
    'outer: loop {
        state = 'blk: {
            match state {
                State::A => {
                    #[const_continue]
                    break 'blk State::B
                }
                State::B => break 'outer,
            }
        }
    }
}

Crucially, this does not add any new syntax: only the attributes, plus internal logic in MIR lowering that performs the pattern match statically to pick the right branch to jump to.

The main challenge is then to implement this in the compiler itself, which we've been working on (I'll post our tl;dr update shortly).

@folkertdev (Contributor)

Some benchmarks (as of March 18th)

A benchmark of https://github.com/bjorn3/comrak/blob/loop_match_attr/autolink_email.rs, basically a big state machine that is a perfect fit for loop match

Benchmark 1: ./autolink_email
  Time (mean ± σ):      1.126 s ±  0.012 s    [User: 1.126 s, System: 0.000 s]
  Range (min … max):    1.105 s …  1.141 s    10 runs
 
Benchmark 2: ./autolink_email_llvm_dfa
  Time (mean ± σ):     583.9 ms ±   6.9 ms    [User: 581.8 ms, System: 2.0 ms]
  Range (min … max):   575.4 ms … 591.3 ms    10 runs
 
Benchmark 3: ./autolink_email_loop_match
  Time (mean ± σ):     411.4 ms ±   8.8 ms    [User: 410.1 ms, System: 1.3 ms]
  Range (min … max):   403.2 ms … 430.4 ms    10 runs
 
Summary
  ./autolink_email_loop_match ran
    1.42 ± 0.03 times faster than ./autolink_email_llvm_dfa
    2.74 ± 0.07 times faster than ./autolink_email

#[loop_match] not only beats the status quo, it also beats the LLVM flag by a large margin.


A benchmark of zlib decompression with chunks of 16 bytes (this makes the impact of loop_match more visible)

Benchmark 1 (65 runs): target/release/examples/uncompress-baseline rs-chunked 4
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          77.7ms ± 3.04ms    74.6ms … 88.9ms          9 (14%)        0%
  peak_rss           24.1MB ± 64.6KB    24.0MB … 24.2MB          0 ( 0%)        0%
  cpu_cycles          303M  ± 11.8M      293M  …  348M           9 (14%)        0%
  instructions        833M  ±  266       833M  …  833M           0 ( 0%)        0%
  cache_references   3.62M  ±  310K     3.19M  … 4.93M           1 ( 2%)        0%
  cache_misses        209K  ± 34.2K      143K  …  325K           1 ( 2%)        0%
  branch_misses      4.09M  ± 10.0K     4.08M  … 4.13M           5 ( 8%)        0%
Benchmark 2 (68 runs): target/release/examples/uncompress-llvm-dfa rs-chunked 4
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          74.0ms ± 3.24ms    70.6ms … 85.0ms          4 ( 6%)        🚀-  4.8% ±  1.4%
  peak_rss           24.1MB ± 27.1KB    24.0MB … 24.1MB          3 ( 4%)          -  0.1% ±  0.1%
  cpu_cycles          287M  ± 12.7M      277M  …  330M           4 ( 6%)        🚀-  5.4% ±  1.4%
  instructions        797M  ±  235       797M  …  797M           0 ( 0%)        🚀-  4.3% ±  0.0%
  cache_references   3.56M  ±  439K     3.08M  … 5.93M           2 ( 3%)          -  1.8% ±  3.6%
  cache_misses        144K  ± 32.5K     83.7K  …  249K           2 ( 3%)        🚀- 31.2% ±  5.4%
  branch_misses      4.09M  ± 9.62K     4.07M  … 4.12M           1 ( 1%)          -  0.1% ±  0.1%
Benchmark 3 (70 runs): target/release/examples/uncompress-loop-match rs-chunked 4
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          71.6ms ± 2.43ms    69.3ms … 78.8ms          6 ( 9%)        🚀-  7.8% ±  1.2%
  peak_rss           24.1MB ± 72.8KB    23.9MB … 24.2MB         20 (29%)          -  0.0% ±  0.1%
  cpu_cycles          278M  ± 9.59M      270M  …  305M           7 (10%)        🚀-  8.5% ±  1.2%
  instructions        779M  ±  277       779M  …  779M           0 ( 0%)        🚀-  6.6% ±  0.0%
  cache_references   3.49M  ±  270K     3.15M  … 4.17M           4 ( 6%)        🚀-  3.8% ±  2.7%
  cache_misses        142K  ± 25.6K     86.0K  …  197K           0 ( 0%)        🚀- 32.0% ±  4.8%
  branch_misses      4.09M  ± 7.83K     4.08M  … 4.12M           1 ( 1%)          +  0.0% ±  0.1%
Benchmark 4 (69 runs): target/release/examples/uncompress-llvm-dfa-loop-match rs-chunked 4
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.8ms ± 2.57ms    69.7ms … 80.0ms          7 (10%)        🚀-  6.3% ±  1.2%
  peak_rss           24.1MB ± 35.1KB    23.9MB … 24.1MB          2 ( 3%)          -  0.1% ±  0.1%
  cpu_cycles          281M  ± 10.1M      269M  …  312M           5 ( 7%)        🚀-  7.5% ±  1.2%
  instructions        778M  ±  243       778M  …  778M           0 ( 0%)        🚀-  6.7% ±  0.0%
  cache_references   3.45M  ±  277K     2.95M  … 4.14M           0 ( 0%)        🚀-  4.7% ±  2.7%
  cache_misses        176K  ± 43.4K      106K  …  301K           0 ( 0%)        🚀- 15.8% ±  6.3%
  branch_misses      4.16M  ± 96.0K     4.08M  … 4.37M           0 ( 0%)        💩+  1.7% ±  0.6%

The important points: loop-match is faster than llvm-dfa, and combining the two performs worse than using loop-match on its own.

@folkertdev (Contributor)

TL;DR:

We've started work on implementing #[loop_match] on this branch. For the time being, integer and enum patterns are supported. The benchmarks are extremely encouraging, showing large improvements over the status quo and significant improvements versus -Cllvm-args=-enable-dfa-jump-thread.
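
As a rough illustration of the integer-pattern case, here is a hedged sketch mirroring the enum example earlier in this thread; the concrete state values (0, 1, 2) are made up for illustration and are not taken from the branch:

#![feature(loop_match)]

fn main() {
    let mut state = 0u32;
    #[loop_match]
    'outer: loop {
        state = 'blk: {
            match state {
                0 => {
                    // The target state is a constant, so the lowering can
                    // jump straight to the `1` arm instead of re-dispatching.
                    #[const_continue]
                    break 'blk 1
                }
                1 => {
                    #[const_continue]
                    break 'blk 2
                }
                _ => break 'outer,
            }
        }
    }
}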

Our next steps can be found in the todo file and focus mostly on improving code quality and robustness.

@traviscross (Contributor)

Thanks for that update. Have reached out separately.
