Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce race condition probability for config file access #8779

Merged
merged 5 commits into from
Mar 10, 2019

Conversation

iSazonov
Copy link
Collaborator

@iSazonov iSazonov commented Jan 29, 2019

PR Summary

Related #8715.

  1. Replace EnterReadLock() with TryEnterReadLock().
    EnterReadLock() might never return.
  2. Increase the number of read file lock attempts.
    Reduce inter-process race condition probability.
  3. Use WaitForFile() for write file lock.
    Reduce inter-process race condition probability.
  4. Replace EnterWriteLock() with TryEnterWriteLock().
    EnterWriteLock() might never return.
  5. Return default if the config file can not be read (added catch (IOExeption)).

PR Context

Currently we see race condition failures in xUnit tests. The race conditions can be inter-thread and inter-process.

PR Checklist

@iSazonov
Copy link
Collaborator Author

iSazonov commented Feb 7, 2019

@SteveL-MSFT I think we can not release 6.2 without fixing the race condition.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Feb 8, 2019

@SteveL-MSFT @adityapatwardhan @daxian-dbw Friendly ping - I think it is important to fix before RC.

@daxian-dbw daxian-dbw self-assigned this Feb 19, 2019
Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better approach to reduce race conditions among pwsh processes is to read the file once on the first read request, and then cache the JObject for future read requests. In this way, all subsequent read operations won't need to touch the file. Update operation still needs to touch the file, and may need to invalidate the cache afterwards depending on the specific operation (updating experimental features doesn't need to invalidate the cache). In this way, we can minimize the need to touch the file, and thus minimize the chance of race condition. Thoughts?

@@ -389,36 +389,42 @@ internal PSKeyword GetLogKeywords()
if (!File.Exists(fileName)) { return defaultValue; }

// Open file for reading, but allow multiple readers
fileLock.EnterReadLock();
try
if (fileLock.TryEnterReadLock(millisecondsTimeout: 10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fileLock is for synchronization within a process, and thus won't help the conflicts between processes. With this context, is the TryEnterReadLock necessary?
Besides, you use 10 ms timeout, but WaitForFile itself can take up to 400 ms (20 * 20) to finish, this doesn't seem to make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this context, is the TryEnterReadLock necessary?

We have race conditions in xUnit tests. It is related threads. And I am trying to address only the thread race conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You saw EnterReadLock() got blocked for a long time in xUnit test run? Do you still have the link to that CI run?
That's possible only if there are a lot write operations and each write operation takes a relatively long time. I doubt this would be possible in real world scenarios.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some issues #8715 #8784 #8838 where we tracking xUnit test failures and there is links to CIs.

That's possible only if there are a lot write operations and each write operation takes a relatively long time. I doubt this would be possible in real world scenarios.

If somebody open the config file with read lock PowerShell doesn't start.
Try:

$path = "powershell.config.json "
$mode = "Open"
$access = "Read"
$share = "None"

$file = [System.IO.File]::Open($path, $mode, $access, $share)

then try to start new PowerShell Core. It is DoS attack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the issues you mentioned, and they all are related to intra-process conflicts, not race conditions within a process.
For the DoS attack example, it's still an intra-process scenario, where fileLock doesn't play a role.

We discussed about this DoS attack in #8249 (comment). Returning default value when the config file is not accessible will cause security settings to not be enforced, which would be another security hole. @TravisEz13 and I chatted about this offline. We have some thoughts but immature overall. Need to think about this problem more.

But this is not a release blocker, as it's an existing issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the issues you mentioned, and they all are related to intra-process conflicts

Sorry, it seems I am setting it up unclear. Yes, the issues is related to intra-process. Root case is file locks. My idea for the PR is to decrease file lock times internally in the process - it will reduce probability of intra-process locks and waitings too. If you agree let's consider the PR only in the context to simplify PR. Of cause we would fix all in one PR although it is not clear how to do it in better way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's focus on this PR specifically.

My idea for the PR is to decrease file lock times internally in the process - it will reduce probability of intra-process locks and waitings too.

IMHO, I don't think filelock affects the race condition among processes, and making it timeout in 10 ms doesn't make sense given waitfile can block for 400 ms within the file lock.
I think for this PR, maybe the only change that should be made is to use WaitForFile in UpdateValueInFile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replace fileLock.EnterReadLock() because it can fall in infinite wait. I do not know whether this can happen. Since testing race conditions is too complicated it will be more safe. I'd don't want to change current logic of the code until we push follow PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replace fileLock.EnterReadLock() because it can fall in infinite wait

I would argue that the possibility is extremely slim unless we have data to prove otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rely on our experience.
Reverted.

@iSazonov
Copy link
Collaborator Author

@daxian-dbw The PR is aimed only at intra-thread race conditions that we catch in xUnit tests. I want to keep the PR as simple as possible and move intra-process lock fixes to follow PR.
I believe it is important that these two PR must be made before we release 6.2.
Perhaps the second PR will replace most of changes of the PR. I didn't think it over deep. For now, in short, I think we could do the same as in AnalisisCache.

@daxian-dbw
Copy link
Member

I think we could do the same as in AnalisisCache

We cannot do the same as in AnalysisCache. For AnalysisCache, the cache file is only for perf, and we are able to get the accurate metadata even without the cache file. Therefore, we can ignore the cache file if read/write operation fails.

@iSazonov
Copy link
Collaborator Author

Therefore, we can ignore the cache file if read/write operation fails.

What should be the behavior for the configuration file?
My main concern now is that if someone has blocked this file, then all PowerShell Core processes will terminate abnormally (We should consider side-by-side and hosting scenarios too). This means that the only reliable scenario is to always rely on the default settings and/or current settings and ignore the settings file if we cannot read it (or write) for the allotted time (preferably in the background process).

@daxian-dbw
Copy link
Member

My main concern now is that if someone has blocked this file, then all PowerShell Core processes will terminate abnormally

Using default values for settings is fine except for security settings. So one option is to use the most secure values for the security settings when the file is not accessible, and the default values for other settings. So when the file is deliberately locked, a new pwsh instance can be started, but with all security features configurable via powershell.config.json turned on. But say the new pwsh instance want to set value for a setting, that will fail, and how to deal with that would be another problem.

There are two set of problems:

  1. Reduce chance of race condition among processes, and mitigate the race condition when it happens.
    • Reading the config file only once and cache the whole JObject for future use is a way to reduce race condition
    • The WaitForFile change you made is a way to mitigate the race condition when it happens.
  2. Handle the worst scenario where it's impossible to access the file within the reasonable time period.

The problem 2 is a hard one, we definitely cannot get to it for 6.2.
Currently, RC release is targeting end of this month, so I don't think there's enough time for the 'caching JObject' idea.

@iSazonov
Copy link
Collaborator Author

Using default values for settings is fine except for security settings.

The default hard-coded config must be secure by default - yes?
In other words, if a process cannot read a file at startup, it uses internal default values ​​that are safe. If the process cannot read them while running, it uses the current values.

Reading the config file only once and cache the whole JObject for future use is a way to reduce race condition

We need to look LastWriteTime and try read only if the file was changed.

The problem 2 is a hard one, we definitely cannot get to it for 6.2.

I think we can address this as I said above.

@daxian-dbw
Copy link
Member

The default hard-coded config must be secure by default - yes?

There will be two set of default values for security settings

  • One set is used when we are able to read the config file and the security settings are not configured. In this case, powershell assumes the security settings are not enforced.
  • The other set is used when we are not able to read the config file. In this case, powershell assumes all security settings should be enforced.

We need to look LastWriteTime and try read only if the file was changed.

I'm not super sure about this. I would love to always use the cache data (some settings only care about the first-read-in values anyways), but it might be wrong for some other settings. This is just an idea that need to be further explored.

fileLock.ExitReadLock();
catch (IOException)
{
return defaultValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, might cause security settings to be bypassed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the security settings? I don't understand why it could cause a problem if defaults is secure too. If we want force security we should use other means (group policy).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All settings under PowerShellPolicies are security settings. An example is pasted below.
Currently, powershell doesn't enforce them by default when the config file doesn't exists or those settings are not defined in the config file.
If we want to stick to the default value solution for the race condition problem, additional work is needed to make it return the appropriate value when the file exists but not accessible.
But still, it's impossible to have an appropriate value for some of the settings in case the file is not accessible, for example, ProtectedEventLogging, you won't be able to know what certificate to use to encrypt the logging.

If we want force security we should use other means (group policy).

Linux/macOS doesn't have group policy.

{
  "Microsoft.PowerShell:ExecutionPolicy": "RemoteSigned",
  "PowerShellPolicies": {
    "ScriptExecution": {
      "ExecutionPolicy": "RemoteSigned",
      "EnableScripts": true
    },
    "ScriptBlockLogging": {
      "EnableScriptBlockInvocationLogging": true,
      "EnableScriptBlockLogging": true
    },
    "ModuleLogging": {
      "EnableModuleLogging": false,
      "ModuleNames": [
        "PSReadline",
        "PowerShellGet"
      ]
    },
    "ProtectedEventLogging": {
      "EnableProtectedEventLogging": false,
      "EncryptionCertificate": [
        "Joe"
      ]
    },
    "Transcription": {
      "EnableTranscripting": true,
      "EnableInvocationHeader": true,
      "OutputDirectory": "F:\\tmp\\new"
    },
    "UpdatableHelp": {
      "DefaultSourcePath": "f:\\temp"
    },
    "ConsoleSessionConfiguration": {
      "EnableConsoleSessionConfiguration": false,
      "ConsoleSessionConfigurationName": "name"
    }
  },
  "LogLevel": "verbose"
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProtectedEventLogging, you won't be able to know what certificate to use to encrypt the logging

Thanks! My question was just about such settings. I think we need a compromise. This is an open project and if someone needs a highly secure configuration, they can hardcode needed parameters in fork repo and safely ignore the config file at all. In the repo we can hardcode default parameters as needed too and allow PowerShell start without reading the config file. Also users can check current config in script/module/host application and terminate if they see wrong config.

@daxian-dbw
Copy link
Member

@iSazonov I looked at the current changes in this PR again, and I think the only change that should be kept is changing new FileStream to WaitForFile in UpdateValueInFile. All rest changes should be reverted, at least for now. Would you like to make that only change in this PR? Or, do you want to close this one and submit a new one around the "default value" idea at a later time?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 6, 2019

Would you like to make that only change in this PR?

Ok, will do.

@iSazonov iSazonov requested a review from daxian-dbw March 7, 2019 03:59
@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 7, 2019

Done.

@@ -380,7 +380,7 @@ internal PSKeyword GetLogKeywords()
/// <typeparam name="T">The type of the value</typeparam>
/// <param name="scope">The ConfigScope of the configuration file to update.</param>
/// <param name="key">The string key of the value.</param>
/// <param name="defaultValue">The default value to return if the key is not present.</param>
/// <param name="defaultValue">The default value to return if the key is not present or the config file is not available.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -451,96 +451,90 @@ private void UpdateValueInFile<T>(ConfigScope scope, string key, T value, bool a
{
string fileName = GetConfigFilePath(scope);
fileLock.EnterWriteLock();
try
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the try/finally block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@daxian-dbw
Copy link
Member

daxian-dbw commented Mar 7, 2019

@iSazonov Can you please resolve the CI failure?

The failure seems not related to the change. Restarted the build.

@daxian-dbw daxian-dbw merged commit 3439411 into PowerShell:master Mar 10, 2019
@iSazonov iSazonov deleted the xunit-config-lock branch March 10, 2019 05:48
@iSazonov
Copy link
Collaborator Author

@daxian-dbw Do you want me to continue with defaults?

So if I have questions before I begin.
We need read the config at start time. Current waiting timeouts is good. Yes? If we can not read we use precompiled defaults.
How should we re-read the config - can we do this in background while monitoring the file is changed?
If we can not re-read the file we use current used values.
Thoughts?

@daxian-dbw
Copy link
Member

@iSazonov Let me read your RFC again before commenting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-Engine Indicates that a PR should be marked as an engine change in the Change Log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants