Significant performance improvement#668
Conversation
There was a problem hiding this comment.
Pull request overview
This PR targets analyzer performance hotspots in NestedInvocationWalker by reducing redundant Roslyn semantic symbol queries and avoiding duplicate traversal work, with an additional (unrelated) addition of Rider/JetBrains project configuration files.
Changes:
- Introduced
SymbolInfoCacheto cacheSemanticModel.GetSymbolInfo(...)results perExpressionSyntax. - Updated
NestedInvocationWalker.GetSymbol<T>to use the cache and reduced redundant processing inVisitMemberAccessExpressionfor invocation callees. - Added Rider/JetBrains
.ideaconfiguration files undersrc/.idea/.idea.Acuminator/.idea/.
Reviewed changes
Copilot reviewed 2 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Acuminator/Acuminator.Utilities/Roslyn/SymbolInfoCache.cs | New symbol-info cache (and optional statistics collection) used to reduce repeated semantic queries. |
| src/Acuminator/Acuminator.Utilities/Roslyn/NestedInvocationWalker.cs | Uses the cache in GetSymbol<T> and avoids redundant member-access processing for foo.Bar() patterns. |
| src/.idea/.idea.Acuminator/.idea/vcs.xml | Adds Rider VCS mapping configuration. |
| src/.idea/.idea.Acuminator/.idea/indexLayout.xml | Adds Rider index layout configuration. |
| src/.idea/.idea.Acuminator/.idea/.name / .gitignore | Adds Rider project name and IDE-local ignore rules. |
Files not reviewed (4)
- src/.idea/.idea.Acuminator/.idea/.gitignore: Language not supported
- src/.idea/.idea.Acuminator/.idea/.name: Language not supported
- src/.idea/.idea.Acuminator/.idea/indexLayout.xml: Language not supported
- src/.idea/.idea.Acuminator/.idea/vcs.xml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public StatisticsSnapshot GetSnapshot() | ||
| { | ||
| var lookupsTotal = _hitsTotal + _missesTotal; | ||
| return new StatisticsSnapshot( | ||
| _cacheId, | ||
| _owner, | ||
| _hitsTotal, | ||
| _missesTotal, | ||
| _setsTotal, | ||
| _byTypeCounter, | ||
| CalcRatio(_hitsTotal, lookupsTotal), | ||
| CalcRatio(_missesTotal, lookupsTotal) | ||
| ); |
There was a problem hiding this comment.
@gelverpl this is a good point. It's not really a snapshot if you are passing the original reference which will be mutated after the creation of the snapshot.
| public void Hit() => _hitsTotal++; | ||
|
|
||
| public void Miss() => _missesTotal++; | ||
|
|
||
| public void Set(Type type) | ||
| { | ||
| if (!_byTypeCounter.ContainsKey(type)) | ||
| { | ||
| _byTypeCounter[type] = 1; | ||
| } | ||
| else | ||
| { | ||
| _byTypeCounter[type]++; | ||
| } | ||
| _setsTotal++; | ||
| } |
There was a problem hiding this comment.
This shouldn't be a problem for now since each instance of symbol cache and related statistic is owned by only one dedicated nested invocation walker which is executed only by one thread.
Later, when this cache will be expanded to be shared with all analyzers, this indeed will be a problem.
| <?xml version="1.0" encoding="UTF-8"?> | ||
| <project version="4"> | ||
| <component name="VcsDirectoryMappings"> | ||
| <mapping directory="$PROJECT_DIR$/.." vcs="Git" /> | ||
| </component> | ||
| </project> No newline at end of file |
There was a problem hiding this comment.
@gelverpl this seems like IDE specific. I think it's better to update the .gitignore file in the repo to not track the entire .idea folder.
| public sealed class SymbolInfoCache | ||
| { | ||
| private readonly Dictionary<ExpressionSyntax, SymbolInfo> _map = new(); | ||
| #if SYMBOL_INFO_CACHE_STATISTICS |
There was a problem hiding this comment.
@gelverpl we spoke today about using some kind of feature flags for performance profiling features. Did you decide not to use it in the end?
| namespace Acuminator.Utilities.Roslyn; | ||
|
|
||
| /// <summary> | ||
| /// Caches Roslyn symbol lookup results for expression syntax nodes visited by a syntax walker. |
There was a problem hiding this comment.
IMHO, this can be already a bit generalized to be lookup for syntax nodes, not just expression syntax nodes. There are many other types of nodes that can be passed to the semantic model:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.semanticmodel?view=roslyn-dotnet-5.0.0
| private static double CalcRatio(long value, long total) => total == 0 ? 0 : (double)value / total; | ||
| } | ||
|
|
||
| public static class SymbolInfoCacheStatisticsContext |
There was a problem hiding this comment.
@gelverpl please use 1 type per file convention (except maybe 1 line records types) and move this to a separate file.
|
|
||
| public static class SymbolInfoCacheStatisticsContext | ||
| { | ||
| private static readonly object Gate = new(); |
There was a problem hiding this comment.
For private fields the current convention is to use underscore-prefixed camel case.
Also it's not very important, but the established naming convention for these objects is to call them _locker
| } | ||
|
|
||
| #if SYMBOL_INFO_CACHE_STATISTICS | ||
| public sealed class SymbolInfoCacheStatistics |
There was a problem hiding this comment.
Should be moved to a separate file.
The type should probably be marked as internal instead of public.
| } | ||
| } | ||
|
|
||
| public record StatisticsSnapshot(int CacheId, Type Owner, long HitsTotal, long MissesTotal, long SetsTotal, Dictionary<Type, long> ByTypeCounter, double HitRatio, double MissRatio); |
There was a problem hiding this comment.
Imho, this DTO can also be in a separate file, but I won't mind if it will be placed in a file with the SymbolInfoCacheStatistics type.
And it also should be internal.
|
|
||
| public static StatisticsSnapshot[] GetStatistics() | ||
| { | ||
| lock (Gate) |
There was a problem hiding this comment.
This is not a big improvement, since GetSnapshot doesn't do a lot currently, but you can reduce the time under the lock in this algorithm, if you make under the lock a copy of the Statistics list. Then you can call GetSnapshot on the copy of the list without locking the Statistics collection.
| var result = new StatisticsSnapshot[Statistics.Count]; | ||
| for (var i = 0; i < Statistics.Count; i++) | ||
| { | ||
| result[i] = Statistics[i].GetSnapshot(); |
There was a problem hiding this comment.
Since it's a static context, it can be shared by multiple analyzers running on different threads. Which means that while the call to GetStatistics is executed on one thread, the other thread may update the counters for a concrete statistic. This make this collection a bit different from a full snapshot and makes the measurement a bit inaccurate - snapshot for one statistic is collected at the moment X, snapshot for the next statistic will be collected for the moment X + some small delta.
For full snapshots you would need to use sync the updates of the counters with the retrieval of statistics... which will make things more difficult for us and introduce undesirable threads synchronization.
I think it's OK to leave it as is since we are not micro benchmarking and looking for milliseconds. But this should be a conscious decision, so I think a comment about this decision should be added here describing it.
|
|
||
| private readonly ISet<(SyntaxNode, DiagnosticDescriptor)> _reportedDiagnostics = new HashSet<(SyntaxNode, DiagnosticDescriptor)>(); | ||
|
|
||
| private readonly SymbolInfoCache _cache; |
There was a problem hiding this comment.
There are other caches in this type, so IMHO a more specific name should be used like _symbolsCache.
| private static readonly List<SymbolInfoCacheStatistics> Statistics = new(); | ||
| private static int _nextCacheId; | ||
|
|
||
| internal static SymbolInfoCacheStatistics Register(Type owner) |
There was a problem hiding this comment.
@gelverpl there can be multiple statistics registered for the same owner (walker). Is it okay to not aggregate them into one statistic?
| /// </summary> | ||
| public sealed class SymbolInfoCache | ||
| { | ||
| private readonly Dictionary<ExpressionSyntax, SymbolInfo> _map = new(); |
There was a problem hiding this comment.
@gelverpl we should cache ISymbol?, there is no need to keep intermediate symbol info DTO.
| var semanticModel = GetSemanticModel(node.SyntaxTree); | ||
|
|
||
| if (semanticModel != null) | ||
| SymbolInfo? cached = _cache.GetOrCreate(node, () => |
There was a problem hiding this comment.
Here should be the retrieval of the symbol from cache, not SymbolInfo.
As I wrote in https://github.com/Acumatica/Acuminator/pull/668/changes#r3197690911, cache should keep ISymbol?.
Also, since the symbol is not always cached, IMHO it's better to call it just symbol
SENya1990
left a comment
There was a problem hiding this comment.
Overall, the PR content is good, there is no need to significantly change anything.
But I have some remarks that should be processed:
- about the type of cached data,
- some minor remarks about the concurrent code
- remarks about the code style.
| _cacheId = cacheId; | ||
| } | ||
|
|
||
| public void Hit() => _hitsTotal++; |
There was a problem hiding this comment.
All counting methods should be annotated with the System.Diagnostics.ConditionalAttribute instead of using pragma directives at their every usage.
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.conditionalattribute?view=netstandard-2.0
| { | ||
| if (_map.TryGetValue(key, out SymbolInfo cached)) | ||
| { | ||
| #if SYMBOL_INFO_CACHE_STATISTICS |
There was a problem hiding this comment.
All counting methods can be annotated with the System.Diagnostics.ConditionalAttribute instead of using pragma directives at their every usage.
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.conditionalattribute?view=netstandard-2.0
It won't compile without the pragma directive as is though. So, to make it compile and get rid of to many pragmas, maybe you can always keep the field with statistics object and for regular builds initialize it to null or an empty static object.
| #if SYMBOL_INFO_CACHE_STATISTICS | ||
| public sealed class SymbolInfoCacheStatistics | ||
| { | ||
| private readonly Dictionary<Type, long> _byTypeCounter = new(); |
There was a problem hiding this comment.
Imho, the name can indicate that the keys are type of syntax nodes. For example, it can be _byNodeTypeCounter
|
|
||
| public void Miss() => _missesTotal++; | ||
|
|
||
| public void Set(Type type) |
There was a problem hiding this comment.
Imho, nodeType is clearer
| #endif | ||
|
|
||
| SymbolInfo? potentialValue = factory(); | ||
| if (potentialValue is not null) |
There was a problem hiding this comment.
Why not caching null results? It's not like the semantic model will return a different result with another call.
Proposed changes
NestedInvocationWalkernow caches retrievedISymbolinstances using a newSymbolInfoCache.ISymbol.Motivation
While profiling the analyzer, I noticed extremely high memory usage, and running solution analysis from the CLI took more than 40 minutes.


dotTrace pointed to
Acuminator.Utilities.Roslyn.NestedInvocationWalker.GetSymbol(ExpressionSyntax)as a primary hotspot:Memory profiling confirmed the same method:

Caching
As a load-reducing measure, I implemented caching. This significantly reduced both execution time and memory consumption.
Yes, caching is often a last-resort approach and ideally shouldn't be here. However, when implemented carefully it can also provide valuable diagnostic data that helps guide further optimization work.
Cache statistics
SymbolInfoCachecan optionally collect statistics. Enable it by defining theSYMBOL_INFO_CACHE_STATISTICSconstant, then inspectAcuminator.Utilities.Roslyn.SymbolInfoCache.SymbolInfoCacheStatisticsContext.GetStatisticsunder the debugger.I intentionally did not add any logging of it because the statistics can become very large. It's much more practical to analyze it interactively (e.g., via LINQ in a debug session).
While digging into the stats, I noticed something weird: the same expression was cached under two different key types:
InvocationExpressionSyntaxandMemberAccessExpressionSyntax. Debugging led toVisitMemberAccessExpression.foo.Bar()is bothMemberAccessExpressionSyntaxandInvocationExpressionSyntaxBy filtering out redundant invocation processing inside
MemberAccessExpressionSyntax, I added an additional fix. This change, similarly to caching, delivered a substantial improvement in both time and memory.Benchmarks
I measured the impact of each fix independently, as well as both together.
Total runtime (wall-clock):

Memory allocation / GC pressure:

With both fixes applied, traces no longer show


GetSymbolas a top hotspot; instead, the cache path (GetOrCreate) appears, and overall memory usage is dramatically improved.Conclusion
Even though caching contributes less after the second fix, I'd still like to keep it (at least for upcoming optimization work) as a way to collect detailed statistics. The goal is to reach a hit rate that is 0 or as close to 0 as possible (i.e., eliminate the underlying redundant symbol queries altogether).
I plan to continue working on this area and believe we can achieve even more significant improvements.