Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UseQuotes parameter #8951

Merged
merged 3 commits into from
Mar 6, 2019
Merged

Conversation

iSazonov
Copy link
Collaborator

@iSazonov iSazonov commented Feb 22, 2019

PR Summary

Close #8890

Add new parameter -UseQuotes to Export-Csv and ConvertTo-Csv cmdlets:

  • Never - don't quote anything.
  • Always - quote everything (current and default behavior).
  • AsNeeded - only quote fields that need it (they contain a delimiter character).

The PR doesn't change default to AsNeeded as requested in #8890. We could do this later because it is breaking change and we need to discuss this.

The PR doesn't add StringsOnly. This allows to produce broken output csv files where values contain unquoted delimiter. AsNeeded is best choice. If I wrong we can add this later.

PR Context

Please look #8890.
Some application don't like quotes in cvs files.

/cc @DavidBerg-MSFT

PR Checklist

@iSazonov iSazonov added the CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log label Feb 22, 2019
@DavidBerg-MSFT
Copy link

            dest.Append(c);

FYI, building the string up one at a time is really slow... (I know, not new code).

I was curious how slow, so I came up with four alternatives:

  • Append One Char at a time - current approach
  • Append All or One at a time - check for quote, then pick Append String or Char
  • Append Incrementally - run through the string char for char, but append in chunks
  • Append All or Incrementally - check for quote, then chunk

For test data without quotes, the times are:

  • Append One Char at a time: 374
  • Append All or One at a time: 163
  • Append Incrementally: 331
  • Append All or Incrementally: 163

In other words, it really pays off (2x faster) to add the check (if indexof('"') == -1 then append string). But what if the string has a quote (or more)? How much time does it add?

For test data with a quote, the times are:

  • Append One Char at a time: 382
  • Append All or One at a time: 404
  • Append Incrementally: 348
  • Append All or Incrementally: 379

So adding the extra check added ~6% overhead in this case. (which we can make back up by using an incremental chunked approach). Probably worth it since the bulk of strings probably won't have embedded quotes, so we'll probably average 2x faster.


Refers to: src/Microsoft.PowerShell.Commands.Utility/commands/utility/CSVCommands.cs:1081 in fe379e2. [](commit_id = fe379e2, deletion_comment = False)

@DavidBerg-MSFT
Copy link

            dest.Append(c);

FYI, here's the optimized code:

        private static void AddStringAllOrIncremental(StringBuilder dest, string source)
        {
            if (source == null)
            {
                return;
            }

            // Adding Double quote to all strings
            dest.Append('"');
            if (source.IndexOf('"') >= 0) // shortcut if escape not necessary
            {
                int startIndex = 0;
                for (int i = 0; i < source.Length; i++)
                {
                    char c = source[i];

                    // Double quote in the string is escaped with double quote
                    if (c == '"')
                    {
                        dest.Append(source, startIndex, i - startIndex);  // copy everything up to this point
                        dest.Append('"');
                        startIndex = i;
                    }
                }
                dest.Append(source, startIndex, source.Length - startIndex); // copy the rest of the string
            }
            else
            {
                dest.Append(source);
            }

            dest.Append('"');
        }

    }


In reply to: 466544670 [](ancestors = 466544670)


Refers to: src/Microsoft.PowerShell.Commands.Utility/commands/utility/CSVCommands.cs:1081 in fe379e2. [](commit_id = fe379e2, deletion_comment = False)

@DavidBerg-MSFT
Copy link

            dest.Append(c);

In fairness, might be better as a separate change, since it's an optimization of existing code, only indirectly related to the new code.


In reply to: 466545320 [](ancestors = 466545320,466544670)


Refers to: src/Microsoft.PowerShell.Commands.Utility/commands/utility/CSVCommands.cs:1081 in fe379e2. [](commit_id = fe379e2, deletion_comment = False)

@iSazonov
Copy link
Collaborator Author

@DavidBerg-MSFT Many thanks! It is very interesting!
I hope the PR will fast reviewed and we get the new, very useful feature in 6.2 version. Then we could work on optimizations (also we plan to move to .Net Core 3.0 after releasing 6.2 that give us performance benefits too).

@SteveL-MSFT If your team is interesting to have this in 6.2 please review.

/// <summary>
/// Never escape output.
/// </summary>
internal static void AppendStringWithEscapeNever(StringBuilder dest, string source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't actually used. I understand the original intent, but I think just calling .Append() rather than calling this method is preferred. So you should remove this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@SteveL-MSFT
Copy link
Member

@DavidBerg-MSFT agree that the code optimization should be a separate PR.

@iSazonov can you address the Codacy issues?

Having taken a look at the code, the risk is small so we may be able to take this for 6.2-GA, but it won't make it into RC.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 1, 2019

can you address the Codacy issues?

Done.

Copy link
Member

@SteveL-MSFT SteveL-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anmenaga
Copy link
Contributor

anmenaga commented Mar 6, 2019

@iSazonov Looks like there is still 2 minor Codacy updates needed.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 6, 2019

@anmenaga First was fixed. Second is false positive.

@anmenaga anmenaga merged commit 2953959 into PowerShell:master Mar 6, 2019
@iSazonov iSazonov deleted the export-csv-quotes branch March 7, 2019 03:33
@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 7, 2019

@SteveL-MSFT Will this be added to 6.2-GA?

@TravisEz13
Copy link
Member

Can you ask this in the maintainers teams channel with a link to the PR with your reason why and an analysis of the regression risk?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Mar 8, 2019

@TravisEz13 This is only an informational question to the previous Steve's comment. I don't need to have it in 6.2.

@TravisEz13
Copy link
Member

We have already forked for 6.2-GA based no RC.1. So, this won't be in the build.

@iSazonov
Copy link
Collaborator Author

@DavidBerg-MSFT I tested your code on .Net Core 3.0 Preview3 and can not confirm results. In my tests original code is faster.
What .Net Core did you use? Could you share code of your tests?

@iSazonov
Copy link
Collaborator Author

I looked the code again. StringBuilder delays only on reallocating buffers. We reuse StringBuilder and exclude such re-allocations. Underlying method in StringBuilder copies very efficient by means of indexer for both char and strings. So I can not find how we could speed up the code - IndexOf() slow down the code and my tests confirm this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Export-Csv should have option to suppress quotes
5 participants