Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse numeric strings as numbers again during conversions #8681

Merged
merged 15 commits into from
Feb 4, 2019

Conversation

vexx32
Copy link
Collaborator

@vexx32 vexx32 commented Jan 18, 2019

PR Summary

Standardizes numeric conversions by invoking Parser.ScanNumber() during conversions, allowing numeric conversions to recognise and appropriately handle suffixed numerals that are specific to the parser and otherwise not known to PS's standard casting and conversion methods.

  • Added shouldTryCoercion parameter to Parser.ScanNumber() which has a default value of true. When the method is called without this parameter, it will throw instead of attempting to recurse back into LanguagePrimitives.ConvertTo()
  • Added code paths to ConvertStringToInteger(), ConvertStringToReal(), and ConvertStringToDecimal() in LanguagePrimitives to attempt to parse numbers using the PS parser itself.
    • This will allow operations like [int]"1ulgb" where the cast string would otherwise not be recognised as a number.
    • If this parse fails, it will fall back to the current conversion methods to account for things like [int]"1,000" which can be converted with more conventional methods but not recognised by the parser.
  • Altered existing method binder to include the default parameter to prevent weird behaviour.
    • The binder mismatch was causing errors to be completely hidden from normal console display... somehow.

Resolves #7710

/cc @lzybkr @rjmholt @SteveL-MSFT

However, attempting to convert invalid values (that normally result in a parsing exception e.g., [int]"200y") instead triggers a StackOverflowException, crashing PowerShell. I don't know enough about the internals to figure out exactly what's going wrong here, would appreciate some guidance!

@SeeminglyScience helped me find out where it was recursing until stack overflow; the issue is that calling Parser.ScanNumber() will eventually attempt to invoke LanguagePrimitives.ConvertTo() which will eventually attempt to re-invoke the method that originally called Parser.ScanNumber(). Resolved by adding a default parameter which instead throws an exception, and the conversion methods fall back to their original code paths.

PR Context

See #7710

PR Checklist

@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 18, 2019

@lzybkr @rjmholt It seems that by calling ScanNumber() in here, we're potentially triggering recursion when the value is not a valid number value. ScanNumber() calls LanguagePrimitives.ConvertTo() which eventually ends up calling the ConvertStringToInteger or another related conversion method, which just re-invokes Parser.ScanNumber() 😕

if (token == null || !tokenizer.IsAtEndOfScript(token.Extent))
{
// We call ConvertTo, primarily because we expect it will throw an exception,
// but it's possible it could succeed, e.g. if the string had commas, our lexer
// will fail, but Convert.ChangeType could succeed.
return LanguagePrimitives.ConvertTo(str, toType, CultureInfo.InvariantCulture);

This seems to be the source of the stackoverflow I'm encountering here. Not really sure how best to avoid this one, though.

(ty @SeeminglyScience for double-checking my blindness and pointing this out!)

Per @iSazonov's comment in #7710 I'll look to see if we can keep a Parser handy in LanguagePrimitives for this purpose, and perhaps along the way I'll find a good way to circumvent the issue here.

@rjmholt
Copy link
Collaborator

rjmholt commented Jan 18, 2019

Per @iSazonov's comment in #7710 I'll look to see if we can keep a Parser handy in LanguagePrimitives for this purpose, and perhaps along the way I'll find a good way to circumvent the issue here.

That's a great idea!

ScanNumber() calls LanguagePrimitives.ConvertTo() which eventually ends up calling the ConvertStringToInteger or another related conversion method, which just re-invokes Parser.ScanNumber() 😕

The only way to stop it is going to be some way of communicating to ScanNumber() that LanguagePrimitives.ConvertTo() has called it and that it should bottom out. However, LanguagePrimitives.ConvertTo() is public, whereas ScanNumber() is not. So perhaps the ticket is adding a default parameter to the internal ScanNumber() method: Parser.ScanNumber(string strToConvert, Type resultType, bool shouldTryCoercion = true). Then when it's called from LanguagePrimitives.ConvertTo(), you can use Parser.ScanNumber(str, t, shouldTryCoercion: false) while leaving all other calls the same.

The relevant Parser code would then be:

 if (shouldTryCoercion && (token == null || !tokenizer.IsAtEndOfScript(token.Extent))) 
 { 
     // We call ConvertTo, primarily because we expect it will throw an exception, 
     // but it's possible it could succeed, e.g. if the string had commas, our lexer 
     // will fail, but Convert.ChangeType could succeed. 
  
    return LanguagePrimitives.ConvertTo(str, toType, CultureInfo.InvariantCulture); 

@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 19, 2019

@rjmholt @lzybkr appreciate the suggestions! It now no longer overflows.

However, despite also attempting to throw an exception here when handling the recursion, it seems that somehow that's getting swallowed and not emitted like other conversion failures do. Not sure what's happening there?

Hmm. Also, attempting to create a static Parser object for this purpose seems to trigger an AMSI uninitialization failure?

The shell cannot be started. A failure occurred during initialization:
The type initializer for 'System.Management.Automation.Language.Compiler' threw an exception.
Assertion Failed
AMSI should have been uninitialized.

@vexx32 vexx32 closed this Jan 19, 2019
@vexx32 vexx32 reopened this Jan 19, 2019
Why is it so easy to break error handling?
No idea. But there it is.
@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 19, 2019

Okay, all fixed. I think. Let's make sure that runs correctly now... Seems OK locally.

It was very surprising to me to completely break console error display by adding a default parameter to ScanNumber() but now I know for the future, I guess.

If this runs OK I'll look at adding some tests. 😄

@vexx32 vexx32 changed the title WIP: Parse numeric strings as numbers again during conversions Parse numeric strings as numbers again during conversions Jan 19, 2019
@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 20, 2019

🤔 OK, not sure what I broke in this test.

Failure is:

Describing Trusted module on locked down machine should not expose private functions to script debugger command processing
[-] Verifies that private trusted module function is not available in script debugger 11.97s
Expected regular expression 'DebuggerStopHandled' to match NoResult, but it did not match.
172:             $debuggerTester.TestResult | Should -Match "DebuggerStopHandled"
 at <ScriptBlock>, D:\a\1\s\test\powershell\Modules\Microsoft.PowerShell.Security\ConstrainedLanguageDebugger.Tests.ps1: line 172

I can't really figure out how/why this is failing here, unfortunately. Looking at the test, it seems like it originates from a helper module that I don't have any experience with. 😕

@iSazonov
Copy link
Collaborator

iSazonov commented Jan 21, 2019

@PaulHigin Could you please help @vexx32 with the debugger security issue? Is it possible to resolve the issue in the PR context?

@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 22, 2019

@rjmholt @BrucePay @iSazonov I've looked at coding in a static parser instance to reuse for the static class methods, but it might be more trouble than it's worth.

Was discussing it with @SeeminglyScience and he brought up a thread-safety concern; with a static parser instance in place there's no guarantee multiple threads won't try to use it at the same time. 😕

Thread safety isn't something I have a lot of awareness of at the moment, so I think I'll probably not try to get too deep into that here as that's more of a tangent than anything else in this PR. 😄

(Also, I did try it out and it seems to work, but it is breaking something somewhere in the builds, and I'm not sure where... 🙁 It seems that it would have to be declared in the actual Parser class, though, as declaring it in LanguagePrimitives triggers the AMSI uninit failure message.)

@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 31, 2019

Per @lzybkr's comment in #7710, we're not going the static parser instance route here unless there's a compelling reason to do so.

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests.

@@ -2907,6 +2907,17 @@ private static CultureInfo GetCultureFromFormatProvider(IFormatProvider formatPr
TypeConverter integerConverter = LanguagePrimitives.GetIntegerSystemConverter(resultType);
try
{
try
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is used identically three times I think we should place it in a helper method. Something like "TryScanNumber()".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely could... it's used in three contexts, though (int, decimal, real), and the behaviour when it hits a ParseException differs in all three. I'll see what I can come up with.

resultType,
System.Globalization.CultureInfo.InvariantCulture.NumberFormat);
}
catch (ParseException)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation says Convert.ChangeType() can throw InvalidCastException, FormatException, etc. To prevent more breakage than necessary, it seems like we should catch a generic Exception exception here and let the original code path run.

Copy link
Collaborator Author

@vexx32 vexx32 Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional (I should probably comment it).

The original code path has an outer catch (Exception e) block below, where the general exceptions from ChangeType() are handled. This isn't the clearest code form, I'll readily admit, but it's the simplest way to handle both errors in the right way, I think. I guess I could move the latter return into the catch (ParseException) block instead, but that would make it more difficult to turn into a helper method.

If there's a parse exception, the method goes on to call the general integerConverter.ConvertFrom() method as it was before, but if there's a general exception either there or during the ChangeType() an entirely different code path is taken... assuming I am understanding my try/catch logic there correctly. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that the code now first tries the new TryScanNumber() method before reverting back to the old code path (integerconverter or ChangeType directly on strToConvert). It seems possible that TryScanNumber() could throw an exception when the old code path(s) might have succeeded, causing a regression. To be safe it seems to me it is best to catch generic Exception instead of ParseException in TryScanNumber() to ensure fallback code paths always get a chance at the conversion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean, I think. So the only case I can see that happening is if ScanNumber() succeeds in getting a value, but the subsequent Convert.ChangeType() fails for some reason... and the subsequent code paths wouldn't have necessarily failed anyway on that error.

I would generally expect that if the ChangeType() fails, so too would integerConverter's method anyway, but I suppose there are potential edge cases where this wouldn't go so well... better safe than sorry, and at least in theory it shouldn't have to do the double-check too often, so it should be OK.

Will change, thank you!

@vexx32
Copy link
Collaborator Author

vexx32 commented Jan 31, 2019

I'll take care of the CodeFactor issues tomorrow. 😄

iSazonov and others added 2 commits February 1, 2019 08:22
Co-Authored-By: vexx32 <32407840+vexx32@users.noreply.github.com>
Co-Authored-By: vexx32 <32407840+vexx32@users.noreply.github.com>
Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vexx32 After @PaulHigin update code review please look CodeFactor issues and re-format code.

else
{
return Convert.ChangeType(strToConvert, resultType,
System.Globalization.CultureInfo.InvariantCulture.NumberFormat);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format code.

@@ -2977,7 +3022,7 @@ private static CultureInfo GetCultureFromFormatProvider(IFormatProvider formatPr

throw new PSInvalidCastException("InvalidCastFromStringToDecimal", e,
ExtendedTypeSystem.InvalidCastExceptionWithInnerException,
valueToConvert.ToString(), resultType.ToString(), e.Message);
strToConvert, resultType.ToString(), e.Message);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format code.

}
catch (Exception e)
{
typeConversion.WriteLine("Exception converting to double or single: \"{0}\".", e.Message);
throw new PSInvalidCastException("InvalidCastFromStringToDoubleOrSingle", e,
ExtendedTypeSystem.InvalidCastExceptionWithInnerException,
valueToConvert.ToString(), resultType.ToString(), e.Message);
strToConvert, resultType.ToString(), e.Message);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format code.

@vexx32
Copy link
Collaborator Author

vexx32 commented Feb 1, 2019

@iSazonov @PaulHigin that ought to do it. The remaining CodeFactor issues are all "this looks like Hungarian notation" complaints; unless we feel the need to rename half the variables in this file, I don't think it's worth pursuing. 😄

Copy link
Contributor

@PaulHigin PaulHigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iSazonov iSazonov added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Feb 2, 2019
@iSazonov
Copy link
Collaborator

iSazonov commented Feb 2, 2019

@vexx32 Thanks for your contribution!

@anmenaga The PR is ready to merge.

@anmenaga anmenaga merged commit 130298b into PowerShell:master Feb 4, 2019
@vexx32 vexx32 deleted the ConsolidateConversions branch February 4, 2019 20:23
@iSazonov
Copy link
Collaborator

iSazonov commented Feb 5, 2019

@vexx32 Please create Docs issue and link in the PR description. Thanks!

@vexx32
Copy link
Collaborator Author

vexx32 commented Feb 5, 2019

@iSazonov all done!

@iSazonov iSazonov removed the Documentation Needed in this repo Documentation is needed in this repo label Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent support for converting number strings with binary-multiplier / number-type suffixes to numbers
5 participants