[wasm] Introduce jiterpreter control flow pass #83247

Merged: 5 commits, Mar 10, 2023

@kg (Member) commented Mar 10, 2023

This PR introduces a control flow graph (CFG) pass to the jiterpreter that runs after the initial code generation pass. Things that are currently generated inline, like branch target blocks and branches, are now recorded in a list of segments. In a second pass, all the segments are stitched together with the necessary WebAssembly flow control logic inserted in between. This turns forward branches into direct jumps and turns backward branches into a direct jump paired with a table dispatch. It is theoretically possible to avoid the table dispatch for backward branches, but I'm not smart enough to figure out how to do it in a general way :-)
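To make the two-pass idea concrete, here is a minimal sketch of the forward-branch case only. It is not the jiterpreter's actual code: the `Segment` type, the `stitchForward` function, and the textual "instructions" are all hypothetical, chosen for illustration. The first pass is assumed to have produced the segment list. The second pass opens one WebAssembly `block` per branch target and closes one at each target, so a forward branch becomes a plain `br` with a relative depth and needs no runtime check.

```typescript
// Hypothetical segment model; names are illustrative, not from the PR.
type Segment =
  | { kind: "code"; text: string }        // straight-line code from the first pass
  | { kind: "target"; ip: number }        // a branch target starts here
  | { kind: "branch"; targetIp: number }; // forward branch to a later target

// Second pass: stitch segments together and insert block/end/br around them.
function stitchForward(segments: Segment[]): string[] {
  // Collect branch targets in order of appearance.
  const targets: number[] = [];
  for (const seg of segments) {
    if (seg.kind === "target") targets.push(seg.ip);
  }

  const out: string[] = [];
  // Open one block per target. The innermost block ends at the first target.
  for (let i = 0; i < targets.length; i++) out.push("block");

  let closed = 0; // number of target blocks already closed
  for (const seg of segments) {
    switch (seg.kind) {
      case "code":
        out.push(seg.text);
        break;
      case "target":
        // Reaching a target closes the innermost block that is still open.
        out.push("end");
        closed++;
        break;
      case "branch": {
        // Only targets that have not been reached yet are valid here.
        const index = targets.indexOf(seg.targetIp, closed);
        if (index < 0) throw new Error("not a forward branch: " + seg.targetIp);
        // Relative depth counts the blocks between here and the target's block.
        out.push("br " + (index - closed));
        break;
      }
    }
  }
  return out;
}

// Usage: a branch that skips over the code at ip 10 and lands at ip 20.
const stitched = stitchForward([
  { kind: "code", text: "a" },
  { kind: "branch", targetIp: 20 },
  { kind: "target", ip: 10 },
  { kind: "code", text: "b" },
  { kind: "target", ip: 20 },
  { kind: "code", text: "c" },
]);
console.log(stitched.join("\n"));
```

Backward branches are deliberately left out of this sketch. As described above, they need a loop plus a table dispatch, which this simplified model does not attempt to reproduce.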

This should provide large speedups for traces that contain many branch targets, since right now we pay the cost of an eip check for each branch target. The CFG is able to omit all of those checks. Traces containing backward branches still pay some overhead for the dispatch table, but it's not as bad as the per-target checks.

This will probably regress startup time slightly (as visible in the "AppStart, Page show" timing, though I think that difference is probably noise) due to the second pass and the overhead of tracking segments, but it's possible to optimize that.

Initial browser-bench measurements, compared vs main:

| measurement | cfg | main |
| --- | --- | --- |
| AppStart, Page show | 15.6019ms | 15.3393ms |
| AppStart, Reach managed | 295.5263ms | 296.5789ms |
| Json, non-ASCII text serialize | 2.2458ms | 3.1807ms |
| Json, non-ASCII text deserialize | 3.8506ms | 5.7064ms |
| Json, small serialize | 0.1135ms | 0.1263ms |
| Json, small deserialize | 0.1924ms | 0.2150ms |
| Json, large serialize | 30.3450ms | 34.3533ms |
| Json, large deserialize | 51.4950ms | 58.3636ms |
| Span, Reverse bytes | 0.0886ms | 0.1198ms |
| Span, Reverse chars | 0.1697ms | 0.4297ms |
| Span, IndexOf bytes | 0.5563us | 0.8598us |
| Span, IndexOf chars | 0.0404ms | 0.0830ms |
| Span, SequenceEqual bytes | 0.0385ms | 0.1165ms |
| Span, SequenceEqual chars | 0.0769ms | 0.2322ms |
| String, Normalize | 0.9573ms | 1.3945ms |
| String, Normalize ASCII | 0.0449ms | 0.1718ms |
| Vector, Create Vector128 | 0.0900us | 0.0878us |
| Vector, Add 2 Vector128's | 0.1052us | 0.1046us |
| Vector, Multiply 2 Vector128's | 0.1057us | 0.1054us |
| Vector, Dot product int | 0.0996us | 0.0983us |
| Vector, Dot product ulong | 0.0876us | 0.0877us |
| Vector, Dot product float | 0.1106us | 0.1092us |
| Vector, Dot product double | 0.0905us | 0.0898us |
| Vector, Sum sbyte | 0.1397us | 0.1386us |
| Vector, Sum short | 0.1102us | 0.1085us |
| Vector, Sum uint | 0.0905us | 0.0891us |
| Vector, Sum double | 0.0895us | 0.0883us |
| Vector, Min float | 0.1364us | 0.1512us |
| Vector, Max float | 0.1378us | 0.1669us |
| Vector, Min double | 0.1116us | 0.1179us |
| Vector, Max double | 0.1136us | 0.1206us |
| Vector, Normalize float | 0.4070us | 0.4039us |
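
For readers who want the relative change rather than raw timings, the ratio main / cfg can be computed per row. The snippet below is a small sketch using a handful of rows copied from the measurements above. The `Row` type and `speedup` helper are hypothetical names introduced here, not part of browser-bench. A ratio above 1 means the cfg build was faster, so "Span, Reverse chars" comes out at roughly 2.5x, while "AppStart, Page show" lands slightly below 1.

```typescript
// Hypothetical helper types; the numbers are copied from the measurements above (ms).
interface Row {
  name: string;
  cfg: number;  // time with the CFG pass
  main: number; // time on main
}

const rows: Row[] = [
  { name: "Span, Reverse chars", cfg: 0.1697, main: 0.4297 },
  { name: "String, Normalize ASCII", cfg: 0.0449, main: 0.1718 },
  { name: "Json, large deserialize", cfg: 51.495, main: 58.3636 },
  { name: "AppStart, Page show", cfg: 15.6019, main: 15.3393 },
];

// Speedup > 1 means the cfg build took less time than main.
function speedup(row: Row): number {
  return row.main / row.cfg;
}

for (const row of rows) {
  console.log(row.name + ": " + speedup(row).toFixed(2) + "x");
}
```

Single-run timings like these are noisy, so small ratios in either direction (such as the startup rows) should not be read as meaningful on their own.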

@kg added the arch-wasm and area-Codegen-Jiterpreter-mono labels Mar 10, 2023
@ghost assigned kg Mar 10, 2023

@ghost commented Mar 10, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: kg
Assignees: -
Labels: arch-wasm, area-Codegen-Jiterpreter-mono
Milestone: -

kg added 3 commits March 9, 2023 17:02
Generate fallthrough in CFG

Generate branch block header partially in cfg

Emit branches in CFG

Checkpoint: Emit loop and exit return in CFG (broken)

Fix CFG emitting function header in the wrong place

Improve accuracy of cfg size estimation

Remove log messages

Checkpoint: Forward branches partially working

Fix non-conditional branches not being added to target table
Remove fallthrough
@kg force-pushed the wasm-jiterpreter-cfg branch from 33dd63a to 17cca49 March 10, 2023 01:02
Use copyWithin to implement appendBytes where possible
More accurate overhead calculation
@kg marked this pull request as ready for review March 10, 2023 03:26
@kg requested review from lewing and pavelsavara as code owners March 10, 2023 03:26
@lewing merged commit b900242 into dotnet:main Mar 10, 2023
@ghost locked as resolved and limited conversation to collaborators Apr 9, 2023