CRT Pow function has bad performance on Windows #10798

fiigii · 2018-07-30T20:34:38Z

While benchmarking the AoS/SoA ray tracer (dotnet/coreclr#18839), we found that the Vector3 benchmark (RayTracer) is much slower on Windows than on Linux.

| Execution time | Windows | Linux |
| --- | --- | --- |
| Baseline (RayTracer) | 6.00s | 4.13s |
| PacketTracer | 1.20s | 1.35s |
| Performance gains | 5.00x | 3.06x |
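
The "Performance gains" row is simply the baseline time divided by the PacketTracer time, and the same division gives the Windows/Linux gap for the baseline. A minimal C++ sketch of that arithmetic follows; the function names are illustrative and are not part of the benchmark.

```cpp
#include <cassert>

// Speedup of an optimized run over a baseline run: baseline time / optimized time.
double speedup(double baseline_seconds, double optimized_seconds) {
    assert(optimized_seconds > 0.0);  // guard against division by zero
    return baseline_seconds / optimized_seconds;
}

// How many times slower one platform is than another on the same benchmark.
double platform_gap(double slower_seconds, double faster_seconds) {
    assert(faster_seconds > 0.0);
    return slower_seconds / faster_seconds;
}
```

With the table's numbers, `speedup(6.00, 1.20)` gives 5.00 on Windows, `speedup(4.13, 1.35)` gives about 3.06 on Linux, and `platform_gap(6.00, 4.13)` shows the Windows baseline taking roughly 1.45 times as long as the Linux one.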

According to VTune analysis, this gap comes from the CRT math library: RayTracer calls Math.Pow at https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Performance/CodeQuality/SIMD/RayTracer/Raytracer.cs#L153

[VTune screenshot: Windows]

[VTune screenshot: Linux]

On the left side (AoS means RayTracer), we can see that ucrtbase.dll on Windows consumes much more time and retires many more instructions than libm-2.23.so on Linux.

The data was collected on a Core i9 with VS2017, but a Core i7 with VS2015 shows the same performance gap.
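
One way to check the CRT cost independently of the ray tracer is to time the native `pow` directly on each platform. The following is a rough C++ sketch under stated assumptions: the exponent `50.0` and the input range are placeholders rather than the values RayTracer passes, and `time_pow` is a hypothetical helper, not part of any benchmark in the repository.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

struct PowTiming {
    double checksum;              // sum of results, kept so the loop is not optimized away
    double nanoseconds_per_call;  // average wall-clock cost of one pow call
};

// Times `calls` invocations of std::pow with varying bases.
PowTiming time_pow(std::size_t calls) {
    using clock = std::chrono::steady_clock;
    double checksum = 0.0;
    const auto start = clock::now();
    for (std::size_t i = 0; i < calls; ++i) {
        // Vary the base in [0.25, 0.75) so the call cannot be hoisted or folded.
        const double base = 0.25 + 0.5 * static_cast<double>(i % 1024) / 1024.0;
        checksum += std::pow(base, 50.0);  // placeholder exponent (assumption)
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    const double per_call =
        calls == 0 ? 0.0 : elapsed.count() / static_cast<double>(calls);
    return PowTiming{checksum, per_call};
}
```

Building this with each platform's toolchain and comparing `nanoseconds_per_call` would show whether the gap reproduces outside the managed runtime. A loop like this measures throughput under one input distribution only, so it complements the VTune data rather than replacing it.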


tannergooding · 2018-07-30T20:41:33Z

@fiigii, do you have any metrics for what time is spent in ucrtbase.dll (as well as RayTracer.dll and libm-2.23.so)? I wonder if the metrics are partially "skewed" due to different inlining characteristics of the math libraries/etc.

fiigii · 2018-07-30T20:46:10Z

@tannergooding Here is the VTune data for the CRT

fiigii · 2018-07-30T20:48:12Z

The CPI of pow on Windows looks healthy, so the issue may lie in its algorithm/implementation.
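
The reasoning here can be made explicit: CPI is cycles divided by instructions retired, so execution time is roughly instructions × CPI / clock frequency. If CPI is similar on both platforms, the extra time has to come from executing more instructions. A small C++ sketch of that relationship, with illustrative function names and no measured data:

```cpp
#include <cassert>

// Cycles per instruction, as reported by profilers such as VTune.
double cycles_per_instruction(double cycles, double instructions_retired) {
    assert(instructions_retired > 0.0);
    return cycles / instructions_retired;
}

// Estimated run time from instruction count, CPI, and clock frequency in Hz.
double estimated_seconds(double instructions_retired, double cpi, double clock_hz) {
    assert(clock_hz > 0.0);
    return instructions_retired * cpi / clock_hz;
}
```

At an equal CPI and clock frequency, a routine that retires twice as many instructions takes about twice as long, which points at the algorithm rather than at stalls or cache misses.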

fiigii · 2018-07-30T20:49:46Z

cc @AndyAyersMS @CarolEidt @jkotas

tannergooding · 2018-07-30T20:58:23Z

> the issue may be from its algorithm/implementation.

@fiigii, that would be a bit surprising. I'm looking at the implementation and it is some fairly heavily optimized FMA3 code (there is also an SSE2 code path, but you shouldn't hit that).

Unfortunately, that implementation is closed source, so I can't share it here.

tannergooding · 2018-07-30T20:58:43Z

I'm trying to collect a trace locally as well, to see if I get the same.

fiigii · 2018-07-30T21:01:11Z

> is some fairly heavily optimized FMA3 code

Right, I also saw that in the disassembly.

tannergooding · 2018-07-30T21:20:43Z

@fiigii, how was CoreCLR compiled for you?

I'm testing on only an i7-7700 @ 3.6 GHz, and the run completes in just 1.231s (compared to the nearly 6s shown above). This is for Baseline (AoS).

fiigii · 2018-07-30T21:25:57Z

> I'm testing on only an i7-7700 @ 3.6 GHz, and the run completes in just 1.231s (compared to the nearly 6s shown above). This is for Baseline (AoS).

I changed the image size from 250x250 to 2480x2480 and rendered just one image (not using RenderLoop) to avoid the collection pool's impact on the profiling.
https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Performance/CodeQuality/SIMD/RayTracer/RayTracerBench.cs#L33-L34
Sorry for the lack of clarity.

tannergooding · 2018-07-30T21:28:39Z

> rendered just one image (not using RenderLoop) to avoid the collection pool's impact on the profiling.

So, to clarify, you changed just the following?

-private const int Width = 250;
-private const int Height = 250;
-private const int Iterations = 7;
+private const int Width = 2480;
+private const int Height = 2480;
+private const int Iterations = 1;
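
For a sense of scale, the change above alters the total number of pixels rendered per run. A short C++ sketch of the arithmetic follows; `pixels_rendered` is an illustrative helper, and the assumption that work scales with pixel count ignores per-image setup cost.

```cpp
#include <cstdint>

// Total pixels rendered for a run: width x height x iterations.
std::uint64_t pixels_rendered(std::uint64_t width, std::uint64_t height,
                              std::uint64_t iterations) {
    return width * height * iterations;
}

// Ratio between two workloads, as a floating-point factor.
double workload_ratio(std::uint64_t larger, std::uint64_t smaller) {
    return static_cast<double>(larger) / static_cast<double>(smaller);
}
```

The original settings give `pixels_rendered(250, 250, 7)` = 437,500 pixels, and the modified settings give `pixels_rendered(2480, 2480, 1)` = 6,150,400 pixels, about 14 times as many.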

fiigii · 2018-07-30T21:34:02Z

Similar, but I did not use ObjectPool: https://github.com/fiigii/PacketTracer/blob/master/baseline/RayTracer/RayTracerBench.cs#L120-L140

tannergooding · 2018-07-31T00:30:52Z

Looks like glibc recently (07 AUG 2017) made a few changes: https://sourceware.org/git/?p=glibc.git;a=commit;h=57a72fa3502673754d14707da02c7c44e83b8d20

Namely, they still use the IBM Accurate Mathematical Library as their root source code. However, they now have new logic that additionally compiles that code with the -mfma and -mavx2 flags, which enables some automatic transformations/optimizations. It looks like they do a cached CPUID check at runtime and jump to the appropriate code.
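
The "cached CPUID check, then jump to the appropriate code" pattern can be sketched in C++ as a function pointer selected once and reused. This is only an illustration of the general technique, not glibc's actual mechanism: all names here are hypothetical, both variants simply call `std::pow` as stand-ins, and the feature check relies on the GCC/Clang builtin `__builtin_cpu_supports`, falling back to the generic path elsewhere.

```cpp
#include <cmath>

using PowFn = double (*)(double, double);

// Stand-in for the generic build of the routine.
double pow_generic(double x, double y) { return std::pow(x, y); }

// Stand-in for the same source compiled with -mfma -mavx2 (assumption:
// in a real library this would be a separately compiled variant).
double pow_fma_avx2(double x, double y) { return std::pow(x, y); }

// Queries the CPU for FMA and AVX2 support where the compiler builtin exists.
bool cpu_has_fma_and_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("fma") && __builtin_cpu_supports("avx2");
#else
    return false;  // unknown platform: take the generic path
#endif
}

// Performs the CPU check once and caches the chosen implementation.
PowFn select_pow() {
    static const PowFn selected =
        cpu_has_fma_and_avx2() ? pow_fma_avx2 : pow_generic;
    return selected;
}

// Entry point that callers use; dispatch costs one indirect call.
double dispatched_pow(double x, double y) { return select_pow()(x, y); }
```

The point of the design is that the feature check runs once rather than on every call, so the per-call overhead is a single indirect jump.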

Additionally, it looks like, since the calling conventions match up, they generally end up calling libm-2.27.so~__pow directly, rather than going through an intermediate call to COMDouble::Pow.

CC. @CarolEidt, @AndyAyersMS, @jkotas

roterdam · 2018-07-31T02:36:06Z

Does CoreCLR not JIT methods with the platform calling convention?

tannergooding · 2018-07-31T15:43:59Z

@roterdam, it does. However, the backing implementations for most System.Math functions aren't managed code; they are "FCALLs" to the underlying C Runtime implementation (through the COMDouble and COMSingle classes).
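
The layering described here can be pictured as a thin native wrapper that forwards to the C runtime. The C++ sketch below is a deliberate simplification: it does not reproduce the runtime's actual FCALL macros or contracts, and the `sketch` namespace and `managed_math_pow` function are hypothetical names used only to show the call chain.

```cpp
#include <cmath>

namespace sketch {

// Simplified stand-in for the native wrapper class: it adds no logic of
// its own and forwards straight to the C runtime's pow.
struct COMDouble {
    static double Pow(double x, double y) { return std::pow(x, y); }
};

// Hypothetical stand-in for what a managed Math.Pow call resolves to:
// one extra hop through the wrapper before reaching the CRT.
inline double managed_math_pow(double x, double y) {
    return COMDouble::Pow(x, y);
}

}  // namespace sketch
```

Because the wrapper only forwards, the performance of `Math.Pow` is dominated by whichever `pow` the platform's C runtime provides.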

AaronRobinsonMSFT · 2019-06-14T21:58:06Z

@tannergooding Were you able to reproduce this issue? Is this something we can do or do we need to loop in the VC++ team?

tannergooding · 2019-06-14T22:09:55Z

@AaronRobinsonMSFT. Yes, I was able to reproduce this.

I believe this is already tracked by one of the C++ bugs I logged internally, but I will double-check and log a new one if not.

This is also part of a bigger picture with System.Math/MathF that is being tracked internally. I can share more details offline if necessary.

AaronRobinsonMSFT · 2019-06-14T22:40:00Z

@tannergooding Not necessary. My main goal was simply to set milestones for issues tagged with VM. If this is something that is post-3.0, then feel free to tag as needed. If 3.0, do we have a plan to deliver it?

ghost · 2020-06-23T21:39:49Z

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

dotnet-policy-service · 2024-12-24T02:15:13Z

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

fiigii changed the title ~~CRT math library has bad performance on Windows~~ CRT Pow function has bad performance on Windows Jul 31, 2018

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

maryamariyan added the untriaged label Feb 26, 2020

tannergooding added the area-System.Numerics label and removed the area-System.Runtime and untriaged labels Jun 23, 2020

tarekgh mentioned this issue Sep 25, 2020: Math.Pow and Math.Exp produce inconsistent results on x64 #42747 (Open)

sfiruch mentioned this issue Nov 22, 2024: System.Math and System.MathF should be implemented in managed code, rather than as FCALLs to the C runtime #9001 (Open)

dotnet-policy-service bot added the backlog-cleanup-candidate and no-recent-activity labels Dec 24, 2024
