Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by having better primitives on PSObject #8785

Merged
merged 24 commits into from
Mar 21, 2019

Conversation

powercode
Copy link
Collaborator

@powercode powercode commented Jan 29, 2019

By not doing excessive amounts of extra work, formatting can be sped up quite significantly.
The main change comes from adding new, more efficient, primitive to query an object for the existence of an instance member.
The formatting system has been checking for if an object has properties other than some decorated properties added by PS remoting, and it doesn't this by retrieving all properties which results in heavy allocations and wasted cycles.
By adding GetFirstOrDefault to PSObject and similar primitives to the underlying Adapters, we are able to return early, without having to get all properties back.

Test script:

$files = gci -ea:0 c:\windows -Recurse -File
$files | format-table Name, LastAccessTime, Length | out-null
version time speedup
6.1 35.5s 1.0x
PR 8785 4.5s 8x

PR Summary

PR Context

PR Checklist

@powercode powercode changed the title Perf: Faster formatting by having better primitives on PSObject [feature] Faster formatting by having better primitives on PSObject Jan 29, 2019
Copy link
Member

@SteveL-MSFT SteveL-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to take a bit of time to review all the changes. @powercode did you test your changes remotely with WMI and CIM instances?

@powercode powercode changed the title [feature] Faster formatting by having better primitives on PSObject Faster formatting by having better primitives on PSObject Jan 30, 2019
@powercode
Copy link
Collaborator Author

powercode commented Jan 30, 2019

@SteveL-MSFT To be honest, I've done an embarrassingly small amount of testing. Except for running Start-PSPester.

@powercode
Copy link
Collaborator Author

powercode commented Jan 31, 2019

Note to reviewers: @BrucePay @daxian-dbw

I tried another route in this second PR, by adding another delegate for member lookup. It is less intrusive in the previous functionality, so the risk should be lower.
I wonder if you, when you review this, can try to think about if the same things could be accomplished in a smarter way? I'm adding a way to check for the existence of any property in a reasonably efficient way. But in reality, we only use it to check if there exist any property other than our magic remoting note properties. for very specific properties. (The different stream names, and the remoting properties). In the stream case, it is also important which stream it is, but we don't need a PSNoteProperty for that. For example, the 7 boolean flags in PSObject could become bit fields. Maybe we could encode this information there?

I have refactored PSObject to use flags instead of Booleans. I also removed some unused fields, and have plans for using the available padding for speeding up the handling of the remoting properties. I added a field for the stream to use by the formatting system - this was previously magically named PSNoteProperties - and the lookup of these was expensive during formatting.

As you can see, I've implemented a special case of a the Linq operator FirstOrDefault, but I chose not to make this an iterator as I wanted to keep allocations to a minimum. Not sure it's the right call, just trying to explain where I come from :)

I'm a bit weak on the serialization, so that is an area that could need some scrutiny to see if I've messed anything up.

@iSazonov
Copy link
Collaborator

iSazonov commented Feb 2, 2019

the 7 boolean flags in PSObject could become bit fields. Maybe we could encode this information there?

It can break remoting/serialization and backward compatibility.

@powercode
Copy link
Collaborator Author

powercode commented Feb 3, 2019

It can break remoting/serialization and backward compatibility.

I think that is just an implementation detail. We may need to handle that in the deserializer - but good to keep in mind when doing it.
But I don't think these fields are serialized at all - they seem to be set in the constructor when deserializing.

@powercode
Copy link
Collaborator Author

powercode commented Feb 3, 2019

I have a sample implementation here - https://github.com/powercode/PowerShell/tree/PSObjectFlags

Edit:
All but the remoting implementation is merged into this PR. I know a bit to little about the remoting to be confident to make that change now.

@iSazonov
Copy link
Collaborator

iSazonov commented Feb 4, 2019

I have a sample implementation here - https://github.com/powercode/PowerShell/tree/PSObjectFlags

Looks great at first short review! (We'll need measurement of CPU/memory benefits. And main complicity is that we have to manually test remoting and serialization.)

Cleaning up PSObject, making the bools into flag fields, leaving room for using the remaing padding for additional flags.
@powercode
Copy link
Collaborator Author

@iSazonov @SteveL-MSFT Any chance we can turn of the hungarian notation warnings from Code Factor? It has nearly 100% false positives.

I don't intend to fix the remaining code factor issues.

@SteveL-MSFT
Copy link
Member

@powercode I believe we're eventually moving off CodeFactor and over to Codacy. Agree the false positives on hungarian are annoying.

@iSazonov
Copy link
Collaborator

iSazonov commented Feb 7, 2019

Codacy reports tons of false-positives. We need to merge Dongbo's PR that resolve most CodeFactor issues.

Copy link
Collaborator

@vexx32 vexx32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this is way out of my depth, but I hope some of this is helpful! 😊

Nothing serious, mostly nitpicks.

@iSazonov
Copy link
Collaborator

iSazonov commented Feb 8, 2019

@vexx32 I don't look but if you comments is for copy-pasted (not new) code we shouldn't fix it in the PR - it is complicate code review.

@daxian-dbw
Copy link
Member

@powercode I submitted another PR to your branch for some fixes and refactoring: powercode#2. Please take a look when you have time.

@powercode
Copy link
Collaborator Author

Looks good!

@iSazonov
Copy link
Collaborator

It seems we need rename the PR from

Faster formatting by having better primitives on PSObject

to

Improve performance by having better primitives on PSObject

@powercode powercode changed the title Faster formatting by having better primitives on PSObject Improve performance by having better primitives on PSObject Mar 16, 2019
Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Now we need to add tests.
We can add xUnit tests to exercise PSObject.GetFirstPropertyOrDefault.

Add [Feature] tag to the last commit to kick off the feature tests on Linux and macOS.

Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 20, 2019

Can we move style changes to #9149 to follow our best practice?

@daxian-dbw
Copy link
Member

daxian-dbw commented Mar 20, 2019

@iSazonov I have reviewed all changes in this PR and all look good. Separating the refactoring changes will not only cost extra efforts to resolve conflicts but also require another pass of review to make sure no mistake made during those changes. I don't think it worth the effort.

@daxian-dbw
Copy link
Member

All issues reported by Codacy are false alarms.

@iSazonov
Copy link
Collaborator

I don't think it worth the effort.

We agreed to write this in our guide. This allows us to maintain a clear history of functional changes without style ones. Of course it takes effort.
It is not blocking comment.

@daxian-dbw
Copy link
Member

This allows us to maintain a clear history of functional changes without style ones. Of course it takes effort.

I agree that will get us a clear view of the functional change history. But I think it's more appropriate to bring this up at very early stage of the PR life cycle.
When we reach the end stage of a PR, it's too costly to do this.

@daxian-dbw daxian-dbw merged commit 42c289f into PowerShell:master Mar 21, 2019
@iSazonov
Copy link
Collaborator

I think it's more appropriate to bring this up at very early stage of the PR life cycle.

At early stage we haven't all style changes.
From my experience we ready to do this near to finish - and rebase (1) to get new tests, (2) remove unrelated changes.

The main benefit is that the commits are clean.
Otherwise in a few months the details will be forgotten and we will have to figure it out again, not to mention those who will do code analysis in this place for the first time.

When we reach the end stage of a PR, it's too costly to do this.

In previous powercode's PR you did this :-)

@iSazonov
Copy link
Collaborator

@powercode Thanks for the great improvement!

@iSazonov iSazonov added the CL-Engine Indicates that a PR should be marked as an engine change in the Change Log label Mar 21, 2019
@kborowinski
Copy link

@powercode This is outstanding work here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-Engine Indicates that a PR should be marked as an engine change in the Change Log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants