-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safer stackallocs #110864
base: main
Are you sure you want to change the base?
Safer stackallocs #110864
Conversation
…k-struct # Conflicts: # src/libraries/System.Private.CoreLib/src/System/Globalization/CalendarData.Browser.cs
I don't know how much of a win it is to use less stack,but in general I think it is preferable to keep a const in for the stack size. It makes it much easier to reason about "how much is going to be stack allocated" during code review and static analysis. cc @GrabYourPitchforks Since we were just chatting about this. |
I don't have a strong preference here, but in some cases it's possible that e.g. |
Related discussion: #97895 |
#110843 is still not fully fixed. It changes a potential stack overflow in to an access violation. This computation is done unchecked: Lines 208 to 209 in b6038a4
Since it is unsigned and unchecked, can overflow to a small positive value When we try to access memory here: Line 234 in b6038a4
The address will be invalid. Can be reproduced on this PR with [Fact]
public static void BigNumberOfCounters()
{
CounterSet counterSet = new(Guid.NewGuid(), Guid.NewGuid(), CounterSetInstanceType.Single);
for (int i = 0; i <= 134_217_726; i++)
{
counterSet.AddCounter(i, CounterType.ElapsedTime);
}
counterSet.CreateCounterSetInstance("potato");
} |
Thanks. Just wrapping the compution into |
Seems reasonable.
I don't see a problem with it. Many will probably be unnecessary, but I don't think it introduces any overhead that is problematic. |
Hardcoded ad-hoc policies like this are never going be "right". The proper fix would be an API like #52065 so that one can just ask for a scratch buffer and leave the choice of strategy to the system. |
I think this is a case where we should be ensuring our code is correctly tuned. If we know the vast majority of cases will be Dynamic stackalloc is also "expensive", it introduces a loop that has to probe pages. All our usages will be a single page probe, which means that will be mispredicted by the CPU in the default scenario, incurring an approx 20 cycle penalty. Since stackalloc should not be getting used in known recursive functions or directly in loops, it is unlikely the predictor will ever get that trained correctly and this will be a "permanent" cost. Dynamic stackalloc also means the JIT cannot easily optimize around it, like it currently does with some small fixed allocation sizes. I also don't think the JIT implicitly doing things to the primitive Notably dynamic length stackallocs are often considered dangerous as well. The pattern we're using here is safe (relatively speaking), but as the pattern gets copied around and used its also easy for people to misunderstand and do the wrong thing. Since we're trying to reduce unsafe code, I think being explicit and adding documentation around these areas and why certain thresholds are used/correct is a better investment; as is pushing for the helper intrinsic that generally simplifies the pattern and abstracts away the "right" sizes to use so that it can be correctly tuned, potentially participate in PGO, etc. |
Reverted that part. Added |
@@ -158,7 +158,7 @@ internal static unsafe bool EvpAeadCipherFinalEx( | |||
notNullOutput = (stackalloc byte[1]).Slice(1); | |||
} | |||
|
|||
fixed (byte* pOutput = &MemoryMarshal.GetReference(notNullOutput)) | |||
fixed (byte* pOutput = notNullOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is introducing a bug. It looks like the code used MemoryMarshal.GetReference
to avoid normalization of empty spans to null. You would have to also delete .Slice(1)
above to make this change work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
Co-authored-by: Miha Zupan <[email protected]>
This is ready for review. Here is the code analyzer I used to find all unbound stackallocs: https://github.com/EgorBo/UnsafeCodeAnalyzer/blob/main/src/Analyzer/UnboundStackallocAnalyzer.cs (reports 263 places in Libraries and Corelib excluding tests) |
I think if you already use "stackalloc", you should already have a good reason, you work low-level, and should have knowledge about when it's reasonable to do so. Could maybe make an analyzer that warns on stackallocing unverified dynamic lengths. Imo it's better longterm to improve cases where normal new operations are recognized by JIT to not escape, do the length check there and automatically stackalloc them. And leave stackalloc as-is, a programmer controlled thing. Benefits existing code without changing heuristics. If this gets good enough in the future, at some point ImmutableArray.Builder with simple valuetypes populated by a loop, could occur only on the stack up to a certain number of items even putting the builder class on the stack. I don't think simplifying ops like this is needed, except maybe being able to create scratch buffers inside methods that get allocated at callsite (such as ArrayBuilder.Create(int initialCapacity), that could return a ref struct and decide internally if put on stack or heap). Instead of passing in scratch area. |
Presumably, that requires quite complex range check analysis like nullability. Our general strategy is to avoid spending efforts on low-level/unsafe things, if someone decides to go the unsafe route, they're on their own
Nobody argues with that, but it might take a while till we get there, the most complicated part is the Inter-procedure analysis on top of functions which might change their behavior any time (arguments suddenly start escaping) e.g. due to profilers with ReJIT attached in the middle of the flight.
I think till we get to the proper escape analysis it could be a reasonable API to simplify these patterns, I personally don't have a strong opinion on it. Another "cons" of the escape analysis is that it might be fragile and you will never know whether it kicks in or not |
Your reasoning makes very much sense, I agree. For me personally, I think example below would increase my productivity/safety most writing low-level / high-performant code. It would help more than any language word or implicit API. Being able to write methods like this: static MyRefStructArrayBuilder MyRefStructArrayBuilder.CreateOnStackOrHeap();
static MyRefStructArrayBuilder MyRefStructArrayBuilder.CreateOnStackOrArrayPool(); Basically moving responsability for making the stack scratch area, and where to allocate if not fitting the stack, into a common reusable place. Within the helper / builder that's used elsewhere. Rather than having this allocation at each callsite and pass it in. No idea how it would be solved though, as you can't stack alloc inside those and return that area without some weird caller/callee contract. It'd have to be like a macro that's always inlined to the callsite or something. Just some thoughts, maybe someone else can spin off on it, or confirm if they have the same use case as me. |
First batch of changes around various
stackalloc
expressions in BCL.The rules are:
(uint)
casts to handle negative/overflows as well (I've checked all places that it's only applied toint
, no long, etc.). In many places it's redundant, but it shouldn't affect the performance and reduce number of false-positives reported by automated tooling.Also, add
Debug.Assert
to make it more clear.2) For patterns likelen > CONST : stackalloc T[CONST] : new T[len]
we changestackalloc T[CONST]
tostackalloc T[len]
if that CONST is >= 1024 bytes in order to consume less stack. It shouldn't hurt performance since our libs are compiled with SkipLocalsInit (although, a small overhead still there, so we leave it as is for small const sizes).UPD: although, maybe we should do it only when it's saved to Span to make sure nobody relies on some minimal size..
Closes #110843