
Reduce string allocations when formatting file system objects. #8831

Merged 3 commits into PowerShell:master on Feb 7, 2019

Conversation

@powercode (Collaborator) commented Feb 5, 2019

PR Summary

Cache the FullName property in the ProviderInfo class. Recomputing it causes extra string allocations for every item in the formatting pipeline.
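The caching described above can be sketched roughly like this (a minimal illustration, not the actual ProviderInfo source; `ProviderInfoSketch` and `SetModuleName` are made-up names standing in for the real members):

```csharp
using System;

// Sketch: a property whose value is composed from other strings allocates a
// new string on every access. Caching the composed value in a backing field
// avoids that, as long as the cache is invalidated when an input changes.
public class ProviderInfoSketch
{
    private string _fullName; // cached composed name

    public string Name { get; }
    public string ModuleName { get; private set; }

    public ProviderInfoSketch(string moduleName, string name)
    {
        ModuleName = moduleName;
        Name = name;
    }

    // Compose once, then return the cached instance on later accesses.
    public string FullName =>
        _fullName ??= string.IsNullOrEmpty(ModuleName)
            ? Name
            : ModuleName + "\\" + Name;

    public void SetModuleName(string moduleName)
    {
        ModuleName = moduleName;
        _fullName = null; // invalidate the cache, as SetModule does in the PR
    }
}
```

With this pattern, formatting 50,000 items reads the cached string 50,000 times instead of composing 50,000 fresh ones.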


@adityapatwardhan (Member)

@powercode Please have a look at the test failure.

@@ -136,6 +143,7 @@ public string ModuleName
internal void SetModule(PSModuleInfo module)
{
Module = module;
_fullName = null; // clear the cached FullName
@iSazonov (Collaborator):

Please remove the comment. It would be better to use the name _cacheFullModuleName.

@powercode (Collaborator, Author):

But it isn't a full module name. That name would be misleading. But sure, I can remove the comment.

@iSazonov (Collaborator), Feb 6, 2019:

If it is not a full name, what about _cachedModuleName?

I also see the comment "Gets the name of the provider.", so _cachedProviderName could be appropriate as well.

@powercode (Collaborator, Author):

The property is FullName, so it makes sense to me that the backing field is _fullName.

@iSazonov (Collaborator):

I was trying to fold your comment into the name, so that the name itself says it is a cached value :-)

@bergmeister (Contributor) commented Feb 6, 2019

Awesome. Out of curiosity: Do you have time measurements of cases where this improvement is noticeable?

@powercode (Collaborator, Author) commented Feb 6, 2019

@bergmeister It's mostly about allocations: it generates lots of extra strings, roughly two per item in the pipeline.

My reasoning is to steadily remove needless allocations and fix the worst performance offenders; eventually we will have a snappier shell.

The image below shows the dotMemory view of allocated strings, grouped by call stack, when looking at the first 50,000 items in the Windows directory. It wasn't the biggest offender, but it was one of the bigger ones, and it was low-hanging fruit. Does it make sense?

(image: dotMemory view of string allocations, grouped by call stack)

@bergmeister (Contributor)

Thanks @powercode. Yes, that makes sense. I agree that continuous improvement will make it better over time. :)

@powercode (Collaborator, Author) commented Feb 6, 2019

@bergmeister It may seem like I'm just doing random changes, but I actually measure things now and then :)

dotTrace and dotMemory are almost always running on my machine.

@iSazonov iSazonov self-assigned this Feb 7, 2019
@iSazonov iSazonov added the CL-CodeCleanup Indicates that a PR should be marked as a Code Cleanup change in the Change Log label Feb 7, 2019
@iSazonov iSazonov merged commit ec88045 into PowerShell:master Feb 7, 2019
@bergmeister (Contributor) commented Feb 7, 2019

@powercode I was not questioning it, I was just curious for my own education (knowing this also means knowing the scenarios where PSCore is stronger than Windows PowerShell, so I could recommend an upgrade to clients).
Would you mind having a look at issue #7603 about Import-Csv, please? It is causing actual OutOfMemory problems in some workflows where CSV files are a couple of gigabytes large.

@iSazonov (Collaborator) commented Feb 8, 2019

@bergmeister Windows PowerShell is still more efficient than PowerShell Core in many scenarios. The main reason is that the .NET Framework engine works differently from the .NET Core one. Optimizing individual cmdlets would be fine, but maybe we need something more general.

@powercode (Collaborator, Author) commented Feb 10, 2019

@bergmeister I'm currently working on optimizations for the formatting system, the filesystem provider, etc. that make it outperform Windows PowerShell by a huge margin, often around 4x, with a memory footprint reduction of similar size.

Import-Csv is problematic, since our property abstraction is heavier than I would like it to be. Especially in cases like that, where we actually have a table, we wouldn't need to store all the metadata for each object. I haven't given it much thought; I just passed by it last fall while doing related work.

I have an idea I'd like to try out for Import-Csv that could drastically reduce the memory footprint: generate a dynamic assembly with a class containing the fields of the CSV.
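That dynamic-assembly idea could be sketched with System.Reflection.Emit, roughly as follows (a minimal illustration of the approach, not code from the PR; `CsvTypeBuilder` and `CsvRow` are made-up names):

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Sketch: build one dynamic type whose public string fields match the CSV
// header, so each row can be a compact typed object instead of carrying
// per-row property metadata.
public static class CsvTypeBuilder
{
    public static Type Build(string[] header)
    {
        var asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("CsvRows"), AssemblyBuilderAccess.Run);
        var mod = asm.DefineDynamicModule("CsvRows");
        var type = mod.DefineType("CsvRow", TypeAttributes.Public);

        // One public string field per CSV column.
        foreach (var column in header)
        {
            type.DefineField(column, typeof(string), FieldAttributes.Public);
        }

        return type.CreateType();
    }
}
```

Each parsed row would then be an instance of the generated type (created once per file via Activator.CreateInstance and populated through FieldInfo.SetValue or a compiled setter), so the column names are stored once on the type rather than once per row.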

@iSazonov (Collaborator):

I was thinking about compiling to classes too, and found some problems with the approach. One problem could be solved by #8852 (not trivial). After finding IDataView (#8855), I suppose that is the preferred way; we should start researching there.
