Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Literal and folded strings cannot handle "\n \n" at the end of a string #1944

Open
jonasfj opened this issue Jun 27, 2024 · 8 comments
Open
Labels
package:yaml_edit type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@jonasfj
Copy link
Member

jonasfj commented Jun 27, 2024

Adding the test case 'whitespace between line breaks at the end\n \n' to test/string_test.dart causes tests to fail.

diff --git a/test/string_test.dart b/test/string_test.dart
index b92c20a..a18f57a 100644
--- a/test/string_test.dart
+++ b/test/string_test.dart
@@ -10,6 +10,7 @@ final _testStrings = [
   "this is a fairly' long string with\nline breaks",
   'whitespace\n after line breaks',
   'whitespace\n \nbetween line breaks',
+  'whitespace between line breaks at the end\n \n',
   '\n line break at the start',
   'word',
   'foo bar',

Tests failures:

$ dart test
00:01 +400 -1: loading test/golden_test.dart                                                                                                                                                                                           
Successfully tested 20 inputs against golden files, created 0 golden files
00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E]                                                                                                                                                                       
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.
  
  # YAML before edit:
  > 
  
  # YAML after edit:
  > >-
  >   whitespace between line breaks at the end
  >    
  >   
  
  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
  
  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>
  

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)'
00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E]                                                                                                                                                                      
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.
  
  # YAML before edit:
  > 
  
  # YAML after edit:
  > |-
  >   whitespace between line breaks at the end
  >    
  >   
  
  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
  
  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>
  

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)'
00:10 +624 -2: Some tests failed.                                                                                                                                                                                                      

Consider enabling the flag chain-stack-traces to receive more detailed exceptions.
For example, 'dart test --chain-stack-traces'.
@jonasfj jonasfj added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label Jun 27, 2024
@jonasfj
Copy link
Member Author

jonasfj commented Jun 27, 2024

@kekavc24 I'm not sure this is a new bug caused by dart-lang/yaml_edit#87 but it's something we should fix.

At a minimum we should fallback to double quoted string encoded, if nothing else.

Indeed, I'd suggest the following steps:

  1. PR that makes _tryYamlEncodeLiteral and _tryYamlEncodeFolded for strings that end in \n \n and similar.
    • Figure out what strings aren't allowed at the end.
    • Maybe, just start with a safe thing, like if (string.trimRight() != string) return null
    • Add additional test cases that exercise all sorts of ways whitespace and linebreaks can be added at the end.
  2. Explore in what scenarios we can encode strings ending with whitespace / line breaks as:
    • Literal strings
    • Folded strings
  3. PRs and tests for literal and folded strings respectively.

I think (1) is the most important. if (string.trimRight() != string) return null is a big hammer, and maybe we can do better, but it's a good start, and it's better to safe than wrong 🤣

@jonasfj
Copy link
Member Author

jonasfj commented Jun 27, 2024

Btw, I can be found on dart_community discord, I'm jonasfj, feel free to ping me.

@kekavc24
Copy link
Contributor

Adding the test case 'whitespace between line breaks at the end\n \n' to test/string_test.dart causes tests to fail.

diff --git a/test/string_test.dart b/test/string_test.dart
index b92c20a..a18f57a 100644
--- a/test/string_test.dart
+++ b/test/string_test.dart
@@ -10,6 +10,7 @@ final _testStrings = [
   "this is a fairly' long string with\nline breaks",
   'whitespace\n after line breaks',
   'whitespace\n \nbetween line breaks',
+  'whitespace between line breaks at the end\n \n',
   '\n line break at the start',
   'word',
   'foo bar',

Tests failures:

$ dart test
00:01 +400 -1: loading test/golden_test.dart                                                                                                                                                                                           
Successfully tested 20 inputs against golden files, created 0 golden files
00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E]                                                                                                                                                                       
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.
  
  # YAML before edit:
  > 
  
  # YAML after edit:
  > >-
  >   whitespace between line breaks at the end
  >    
  >   
  
  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
  
  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>
  

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)'
00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E]                                                                                                                                                                      
  Assertion failed: (package:yaml_edit) Modification did not result in expected result.
  
  # YAML before edit:
  > 
  
  # YAML after edit:
  > |-
  >   whitespace between line breaks at the end
  >    
  >   
  
  Please file an issue at:
  https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
  
  package:yaml_edit/src/editor.dart 579:7   YamlEditor._performEdit
  package:yaml_edit/src/editor.dart 249:14  YamlEditor.update
  test/string_test.dart 39:20               main.<fn>
  

To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)'
00:10 +624 -2: Some tests failed.                                                                                                                                                                                                      

Consider enabling the flag chain-stack-traces to receive more detailed exceptions.
For example, 'dart test --chain-stack-traces'.

Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.

I wanted to drop the chomping indicator changes I made in between for dart-lang/yaml_edit#87 so that the squashed commits make sense when merged.

I’ll work on it 🫡

@kekavc24
Copy link
Contributor

kekavc24 commented Jul 1, 2024

I have a branch with the changes you suggested, I'll make a PR.

@jonasfj Additionally, I also managed to come up with a PoC that solves this issue permanently (hopefully) but comes with some breaking but solvable trade offs in another branch.


Issue One

After we wrongly tried using keep chomping indicator in dart-lang/yaml_edit#87, it got me wondering why it existed in YamlEditor in the first place. What issue was it trying to solve before my fix in dart-lang/yaml_edit#87?

The investigation led me back here 😅 , to this very issue we are trying to solve. The chomping indicator was meant to preserve the trailing \n in the initially buggy _tryYamlEncodeFolded and _tryYamlEncodeLiteral with hits and misses before my fix. And now it won't work. How come and why?

I investigated yamlEncodeBlockScalar which calls _tryYamlEncodeFolded and any other function that encodes block scalars. It looked okay. You helped me refactor it. The culprit ended up being yamlEncodeBlock which calls yamlEncodeBlockScalar.

If a YamlNode is a :

  1. YamlScalar, it calls yamlEncodeBlockScalar and returns the encoded string.
  2. YamlList or YamlMap, it recursively calls yamlEncodeBlock until all YamlScalars are encoded and joins then with a \n and returns the string.

No issue, right? Well, that's where the issue is.

It means our YamlScalar will never be given the chance to have an additional \n we need for _tryYamlEncodeFolded and _tryYamlEncodeLiteral to prevent issues.

Okay, issue found. Just apply the additional \n we need. I did this and quickly ended up having duplicate \n or even \nx where x is the number of YamlList or YamlMap are encountered before the next YamlScalar is found.

So I refactored that function to apply line-breaks correctly. After this 2 issues bubbled up:

  1. Dangling multiline comments messing with modified yaml!
  2. Dangling line breaks for YamlScalars that aren't folded/stripped. (I expected this but not 1 above 😅)

Issue Two: Comments

Once this occurred, I backed off to see if this was a known issue that needed fixing. That's when I found #1935. In the issue, YOU outline ways we can/should handle this which ended up being what I had in mind. Thanks! (for the incredible foresight)

However, a few issues you overlooked that ended up being the trade-offs I mentioned:

  1. What happens to white-space after the comment(s)?
  2. Do we always look for comments greedily or lazily?
    • lazily - Should we just retain the white space we skipped after scanning ahead for comments and not finding any?
    • greedily - Discard any white space we find ahead irrespective of whether we found comments or not.

I ended up looking for and skipping comments greedily. At least to have a PoC that works. Armed with this, I continued my investigation started in issue one above.

I noticed that both _replaceInBlockMap and updateInList only look ahead up-to the first comment. While refactoring code here, I realised:

  • yamlEncodeBlock intentionally leaves out the trailing line-break we needed in issue one above and applies it later but only in updateInList and never for top-level scalars or _replaceInBlockMap. A bug which we can't solve here. Only at the yamlEncodeBlock level
  • I needed to compensate for the indent (belonging to the next node) I greedily consume. I did this using the current YamlList or YamlMap being mutated.

Dangling line-breaks

At this point, the issues I had were:

  1. How do I prune the additional line breaks without messing up folded and literal strings.
  2. How do I make sure folded/literal strings don't mess up the next element's indent. I ignored this here and instead solved it in issue_two.

I implemented the function that does that and also added some tricks when adding folded/literal strings.


It would be nice if you are able to review it and make suggestions for improvements. I can make a draft PR.

@kekavc24
Copy link
Contributor

kekavc24 commented Jul 1, 2024

@jonasfj sorry for the extremely long write up. I didn't want to create a PR without outlining the PoC itself since I make so many changes to the existing code without redefining its intention that may need your review.

@jonasfj
Copy link
Member Author

jonasfj commented Jul 1, 2024

Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.

Yeah, the PR was getting pretty complicated, we should probably have done fewer things.
No worries, we can still fix bugs now :D

Also it's not like there weren't any bugs before this PR 🙈 🤣

@jonasfj
Copy link
Member Author

jonasfj commented Jul 1, 2024

Okay, this might take a while to review.

Thinking about it, I do wonder, is it really a good idea to use folded or literal strings without chomping -?

Certainly the + modify is a but dangerous, in scenarios like:

Example 1

A: |+
  foo
  
B: true
  • Removing B could change the value of A if we're not careful.

Example 2

A: |+
  foo

  • Adding top-level key B: true, could change the value of A if we're not careful.

I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use "foo\n\n".

In practice I suspect it's rare that yaml_edit will be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.

If we have to fallback to double quoted encoding, that's not nearly as bad as throwing a runtime exception because of an internal error, which is what will happen if we modify the YAML document incorrectly (_performEdit makes sure of that).

I'm just asking the question here:

  • How much do we want yaml_edit to do?
  • Do we think yaml_edit will be asked to add literal or folded strings often?
  • Won't most literal or folded strings be trimmed for whitespace in practice?
    (And if they are not, would it perhaps not be better that the tools that use yaml_edit figure out that trimming the strings make yaml_edit produce nicer looking output)

@kekavc24
Copy link
Contributor

kekavc24 commented Jul 1, 2024

I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use "foo\n\n".

True. That's why I still fielded dart-lang/yaml_edit#91 that should precede dart-lang/yaml_edit#90. dart-lang/yaml_edit#90 is just a PoC that I believe can get better once you take a look and help me nitpick the rough edges.


Certainly the + modify is a but dangerous, in scenarios like:

That's why I never attempt to use it at any point within the PoC. I discovered a nifty trick that I strongly highlighted and documented. I was surprised it passes most existing tests including random_test.dart since the PoC ensures no dangling comments are left behind.


In practice I suspect it's rare that yaml_edit will be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.

Yeah, that makes sense

@mosuem mosuem transferred this issue from dart-lang/yaml_edit Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:yaml_edit type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants