[wasm] Introduce jiterpreter control flow pass #83247

kg · 2023-03-10T00:59:42Z

This PR introduces a control flow graph (CFG) pass to the jiterpreter that runs after the initial code generation pass. Things that are currently generated inline, like branch target blocks and branches, are now recorded in a list of segments, and in a second pass all the segments are stitched together with the necessary WebAssembly flow control logic inserted in between. This turns forward branches into direct jumps and backward branches into a direct jump paired with a table dispatch. It is theoretically possible to avoid the table dispatch for backward branches, but I'm not smart enough to figure out how to do it in a general way :-)
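To make the two-pass idea concrete, here is a minimal sketch of how recorded segments could be stitched into nested WebAssembly blocks so that every forward branch becomes a plain `br`. The `Segment` type and `stitchForward` function are hypothetical names invented for this illustration, and the output is WAT-like text rather than the bytes the jiterpreter actually emits, so treat it as a model of the technique and not the PR's implementation.

```typescript
// Hypothetical segment shapes for illustration; not the jiterpreter's real data structures.
type Segment =
    | { kind: "code"; text: string }                        // straight-line code from the first pass
    | { kind: "target"; ip: number }                        // a branch target starts here
    | { kind: "branch"; to: number; conditional: boolean }; // branch to the target whose ip is `to`

// Second pass: open one block per branch target up front, close each block at its target,
// and turn every forward branch into a br/br_if with the correct relative depth.
function stitchForward(segments: Segment[]): string[] {
    const targets: number[] = [];
    for (const s of segments)
        if (s.kind === "target") targets.push(s.ip);

    const out: string[] = [];
    // The outermost block belongs to the last target, the innermost to the first.
    for (let i = targets.length - 1; i >= 0; i--)
        out.push(`block ;; target ${targets[i]}`);

    let closed = 0; // number of target blocks already closed
    for (const s of segments) {
        switch (s.kind) {
            case "code":
                out.push(s.text);
                break;
            case "target":
                out.push("end");
                closed++;
                break;
            case "branch": {
                const index = targets.indexOf(s.to);
                if (index < closed)
                    throw new Error(`${s.to} is not a forward branch to a known target`);
                // Depth 0 is the innermost open block, which belongs to target `closed`.
                out.push(`${s.conditional ? "br_if" : "br"} ${index - closed}`);
                break;
            }
        }
    }
    return out;
}
```

Because exiting a block lands execution right after its `end`, no per-target check is needed at run time; the branch depth alone selects the destination.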

This should provide large speedups for traces that contain many branch targets, since right now we pay the cost of an eip check for each branch target. The CFG is able to omit all of those checks. Traces containing backward branches still pay some overhead for the dispatch table, but it's not as bad.
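The backward-branch case can be sketched in the same style. This is one common way to express "direct jump paired with a table dispatch" in WebAssembly: wrap the trace in a `loop`, put a `br_table` at the loop head, and have a backward branch store a dispatch index before jumping to the loop. The `Seg` type, the `$disp` local, and `stitchWithDispatch` are assumptions made for this example; the PR does not spell out its exact encoding, and conditional branches are left out to keep the sketch short.

```typescript
// Hypothetical segment shapes, invented for this sketch.
type Seg =
    | { kind: "code"; text: string }
    | { kind: "target"; ip: number }
    | { kind: "branch"; to: number };

function stitchWithDispatch(segs: Seg[]): string[] {
    const targets: number[] = [];
    for (const s of segs)
        if (s.kind === "target") targets.push(s.ip);
    const n = targets.length;

    const out: string[] = ["loop"];
    // One block per target (outermost = last target) plus an innermost dispatch block.
    for (let i = 0; i <= n; i++)
        out.push("block");
    // $disp == 0 falls through to the start of the trace; $disp == k + 1 resumes at target k.
    const labels = Array.from({ length: n + 1 }, (_, i) => i);
    out.push("local.get $disp", `br_table ${labels.join(" ")} 0`, "end");

    let closed = 0; // number of target blocks already closed
    for (const s of segs) {
        if (s.kind === "code") {
            out.push(s.text);
        } else if (s.kind === "target") {
            out.push("end");
            closed++;
        } else {
            const index = targets.indexOf(s.to);
            if (index < 0)
                throw new Error(`unknown branch target ${s.to}`);
            if (index >= closed) {
                // Forward branch: exit the still-open block that ends at the target.
                out.push(`br ${index - closed}`);
            } else {
                // Backward branch: pick the table entry, then jump to the loop head.
                // With n - closed target blocks still open, the loop sits at that depth.
                out.push(`i32.const ${index + 1}`, "local.set $disp", `br ${n - closed}`);
            }
        }
    }
    out.push("end");
    return out;
}
```

The remaining overhead mentioned above shows up here as the `local.set` plus the `br_table` executed on every backward jump, while forward branches stay as cheap as before.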

This will probably regress startup time slightly due to the second pass and the overhead of tracking segments (the Page show timing below is slower, though I think that is probably noise), but it's possible to optimize that.

Initial browser-bench measurements, compared vs main:

measurement	cfg	main
AppStart, Page show	15.6019ms	15.3393ms
AppStart, Reach managed	295.5263ms	296.5789ms
Json, non-ASCII text serialize	2.2458ms	3.1807ms
Json, non-ASCII text deserialize	3.8506ms	5.7064ms
Json, small serialize	0.1135ms	0.1263ms
Json, small deserialize	0.1924ms	0.2150ms
Json, large serialize	30.3450ms	34.3533ms
Json, large deserialize	51.4950ms	58.3636ms
Span, Reverse bytes	0.0886ms	0.1198ms
Span, Reverse chars	0.1697ms	0.4297ms
Span, IndexOf bytes	0.5563us	0.8598us
Span, IndexOf chars	0.0404ms	0.0830ms
Span, SequenceEqual bytes	0.0385ms	0.1165ms
Span, SequenceEqual chars	0.0769ms	0.2322ms
String, Normalize	0.9573ms	1.3945ms
String, Normalize ASCII	0.0449ms	0.1718ms
Vector, Create Vector128	0.0900us	0.0878us
Vector, Add 2 Vector128's	0.1052us	0.1046us
Vector, Multiply 2 Vector128's	0.1057us	0.1054us
Vector, Dot product int	0.0996us	0.0983us
Vector, Dot product ulong	0.0876us	0.0877us
Vector, Dot product float	0.1106us	0.1092us
Vector, Dot product double	0.0905us	0.0898us
Vector, Sum sbyte	0.1397us	0.1386us
Vector, Sum short	0.1102us	0.1085us
Vector, Sum uint	0.0905us	0.0891us
Vector, Sum double	0.0895us	0.0883us
Vector, Min float	0.1364us	0.1512us
Vector, Max float	0.1378us	0.1669us
Vector, Min double	0.1116us	0.1179us
Vector, Max double	0.1136us	0.1206us
Vector, Normalize float	0.4070us	0.4039us

ghost · 2023-03-10T00:59:50Z

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	kg
Assignees:	-
Labels:	`arch-wasm`, `area-Codegen-Jiterpreter-mono`
Milestone:	-

Generate fallthrough in CFG
Generate branch block header partially in cfg
Emit branches in CFG
Checkpoint: Emit loop and exit return in CFG (broken)
Fix CFG emitting function header in the wrong place
Improve accuracy of cfg size estimation
Remove log messages
Checkpoint: Forward branches partially working
Fix non-conditional branches not being added to target table
Remove fallthrough

Use copyWithin to implement appendBytes where possible
More accurate overhead calculation

kg added the arch-wasm and area-Codegen-Jiterpreter-mono labels Mar 10, 2023

ghost assigned kg Mar 10, 2023

kg added 3 commits March 9, 2023 17:02

Implement backward branches via a dispatch table

e0debad

Cleanup

17cca49

kg force-pushed the wasm-jiterpreter-cfg branch from 33dd63a to 17cca49 Compare March 10, 2023 01:02

Cleanup

a414fb6

This was referenced Mar 10, 2023

[wasm][debugger] DebuggerTests.AssignmentTests.InspectVariableBeforeAndAfterAssignment failing #64188

Closed

Roslyn source generator crash on mono/linux/arm64 #81123

Closed

Remove use of DataView since it has a hazard around heap growth

46211fa
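For context on the hazard named in that commit title: when a WebAssembly memory grows, its previous `ArrayBuffer` is detached, so any `DataView` (or typed array) created earlier stops working. A minimal sketch of one way to sidestep this is to create a view at the point of use and write bytes manually. `MemoryLike`, `writeU32`, and `readU32` are hypothetical names for this illustration, and the commit's actual replacement code may look different.

```typescript
// Structural stand-in for WebAssembly.Memory so the sketch runs anywhere (an assumption).
interface MemoryLike { buffer: ArrayBuffer }

// Writes a little-endian u32 using a view created at the point of use,
// so it always refers to the memory's current buffer.
function writeU32(memory: MemoryLike, offset: number, value: number): void {
    const heap = new Uint8Array(memory.buffer);
    heap[offset] = value & 0xff;
    heap[offset + 1] = (value >>> 8) & 0xff;
    heap[offset + 2] = (value >>> 16) & 0xff;
    heap[offset + 3] = (value >>> 24) & 0xff;
}

// Reads a little-endian u32 the same way; `>>> 0` converts the result back to unsigned.
function readU32(memory: MemoryLike, offset: number): number {
    const heap = new Uint8Array(memory.buffer);
    return (heap[offset] | (heap[offset + 1] << 8) | (heap[offset + 2] << 16) | (heap[offset + 3] << 24)) >>> 0;
}
```

Creating a small view per call costs a little, so real code would more likely cache the view and refresh it whenever `memory.buffer` changes identity.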

Use copyWithin to implement appendBytes where possible
More accurate overhead calculation
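As a rough illustration of the `copyWithin` idea: when the bytes being appended already live in the builder's own storage, `Uint8Array.prototype.copyWithin` can move them without going through a second view. The `ByteBuilder` class below is a hypothetical stand-in written for this sketch; only the `appendBytes` name and the use of `copyWithin` come from the commit title.

```typescript
// Hypothetical builder, invented for this sketch.
class ByteBuilder {
    readonly buffer: Uint8Array;
    size = 0;

    constructor(capacity: number) {
        this.buffer = new Uint8Array(capacity);
    }

    // Appends `bytes` at the current end and returns the offset they were written to.
    appendBytes(bytes: Uint8Array): number {
        if (this.size + bytes.length > this.buffer.length)
            throw new Error("buffer full");
        const dest = this.size;
        // Where the source would start if it were expressed as an index into this.buffer.
        const srcStart = bytes.byteOffset - this.buffer.byteOffset;
        const sameStorage = bytes.buffer === this.buffer.buffer
            && srcStart >= 0
            && srcStart + bytes.length <= this.buffer.length;
        if (sameStorage)
            this.buffer.copyWithin(dest, srcStart, srcStart + bytes.length);
        else
            this.buffer.set(bytes, dest);
        this.size += bytes.length;
        return dest;
    }
}
```

The fallback to `set` covers the "where possible" part: sources that come from a different buffer cannot use `copyWithin`.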

kg marked this pull request as ready for review March 10, 2023 03:26

kg requested review from lewing and pavelsavara as code owners March 10, 2023 03:26

lewing approved these changes Mar 10, 2023


lewing merged commit b900242 into dotnet:main Mar 10, 2023

ghost locked as resolved and limited conversation to collaborators Apr 9, 2023
