[#64] Refactor markdown scanner #238

YuriRomanowski · 2022-12-14T12:07:10Z

Description

Problem: The current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor: improve annotation handling, remove the IMSAll state, rename functions, and isolate annotation processing for different types of nodes, sharing common behaviour through factored-out functions.

Related issue(s)

Needed to fix #64

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

Tests
- If I added new functionality, I added tests covering it.
- If I fixed a bug, I added a regression test to prevent the bug from
  silently reappearing again.
Documentation
- I checked whether I should update the docs and did so if necessary:
  - README
  - Haddock
Public contracts
- Any modifications of public contracts comply with the Evolution
  of Public Contracts policy.
- I added an entry to the changelog if my changes are visible to the users, and
  - provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

- My commits comply with the policy used in Serokell.
- My code complies with the style guide.

✓ Release Checklist

- I updated the version number in package.yaml.
- I updated the changelog and moved everything under the "Unreleased" section to a new section for this release version.
- (After merging) I edited the auto-release.
  - Change the tag and title using the format vX.Y.Z.
  - Write a summary of all user-facing changes.
  - Deselect the "This is a pre-release" checkbox at the bottom.
- (After merging) I updated xrefcheck-action.
- (After merging) I uploaded the package to hackage.

Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; improve handling annotations, remove IMSAll state as it's not required, rename functions.

Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; isolate processing annotations for different types of nodes.

aeqz

I like this refactor. It will allow us to create new annotations more easily. I have added just a few comments.

Also, there may be a need for another small refactor if at some point we extend the ScannerState with stateful annotations that are not ignore annotations, but that can be done at that point.

aeqz · 2022-12-14T19:00:07Z

src/Xrefcheck/Scanners/Markdown.hs

-getPosition :: Node -> Maybe PosInfo
-getPosition node@(Node pos _ _) = do
+getPosition :: C.Node -> Maybe PosInfo
+getPosition node@(C.Node pos _ _) = do
  annLength <- length . T.strip <$> getHTMLText node
  PosInfo sl sc _ _ <- pos
  pure $ PosInfo sl sc sl (sc + annLength - 1)

 -- | Extract `IgnoreMode` if current node is xrefcheck annotation.


I think that this comment should be updated:

IgnoreMode -> GetAnnotation
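For reference, the `getPosition` hunk above narrows a node's position to a single-line span whose length equals the stripped annotation text. A minimal pure sketch of that arithmetic, assuming a local stand-in for cmark's `PosInfo` (the hypothetical `annotationSpan` drops the `Maybe` plumbing of the real function):

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Local stand-in for cmark's PosInfo, so the sketch has no dependencies.
data PosInfo = PosInfo
  { startLine   :: Int
  , startColumn :: Int
  , endLine     :: Int
  , endColumn   :: Int
  } deriving (Show, Eq)

-- Strip leading and trailing whitespace, as T.strip does for Text.
strip :: String -> String
strip = dropWhileEnd isSpace . dropWhile isSpace

-- Given the raw HTML text of an annotation node and the node's position,
-- compute a one-line span covering only the stripped annotation text.
-- Columns are 1-based and inclusive, hence the "- 1".
annotationSpan :: String -> PosInfo -> PosInfo
annotationSpan html (PosInfo sl sc _ _) =
  PosInfo sl sc sl (sc + length (strip html) - 1)
```

For example, a 31-character `<!-- xrefcheck: ignore link -->` starting at line 3, column 1 yields `PosInfo 3 1 3 31`, regardless of where cmark reports the node's end.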

aeqz · 2022-12-14T19:11:36Z

src/Xrefcheck/Scanners/Markdown.hs

-    isIgnoreFile :: Node -> Bool
-    isIgnoreFile = (ValidMode IMAll ==) . getIgnoreMode
+    isIgnoreFile :: C.Node -> Bool
+    isIgnoreFile = (Just (IgnoreAnnotation IMAll) ==) . getAnnotation


Maybe we can have a function like

isGlobalAnnotation :: GetAnnotation -> Bool

and use it here. Also, if this function has no wildcard patterns, the compiler will ask us (via an incomplete-pattern warning) whether each new annotation we add in the future should be global or not.

Currently there can be only one type of annotation, which is IgnoreFile. A similar function, called isHeaderNode, is introduced here.
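A minimal sketch of the suggested `isGlobalAnnotation`, assuming simplified mirrors of the scanner's types (the `Annotation` name follows the later rename of `GetAnnotation`; the `IMLink` and `IMParagraph` constructors are hypothetical):

```haskell
-- Hypothetical, simplified mirrors of the scanner's types; the real
-- definitions in Xrefcheck.Scanners.Markdown may differ.
data IgnoreMode = IMLink | IMParagraph | IMAll
  deriving (Show, Eq)

-- Kept as `data` rather than `newtype` so new annotation kinds can be added.
data Annotation = IgnoreAnnotation IgnoreMode
  deriving (Show, Eq)

-- Whether the annotation affects the whole file rather than the next node.
-- No wildcard patterns: with -Wincomplete-patterns (enabled by -Wall),
-- adding a constructor to Annotation or IgnoreMode makes GHC warn here
-- until someone decides whether the new annotation is global.
isGlobalAnnotation :: Annotation -> Bool
isGlobalAnnotation (IgnoreAnnotation IMAll)       = True
isGlobalAnnotation (IgnoreAnnotation IMLink)      = False
isGlobalAnnotation (IgnoreAnnotation IMParagraph) = False
```

At the call site, `isIgnoreFile` could then become something like `maybe False isGlobalAnnotation . getAnnotation`, which stays equivalent only while `ignore all` is the sole global annotation.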

aeqz · 2022-12-14T19:32:02Z

src/Xrefcheck/Scanners/Markdown.hs

+      NodeType ->
+      [ScannerM C.Node] ->
+      ScannerM C.Node
+    handleLink ign pos ty subs = do


Now all the ign arguments can be renamed to something like ann, right?

Sorry, they always correspond to Ignore.



aeqz · 2022-12-14T19:36:54Z

src/Xrefcheck/Scanners/Markdown.hs

+      ScannerM C.Node
+    traverseNodeWithLinkExpected ignoreLinkState modePos pos ty subs = do
+      when (ignoreLinkState == ExpectingLinkInSubnodes) $
+        ssIgnore . _Just . ignoreMode .=  IMSLink ParentExpectsLink


Extra space after .=?

aeqz · 2022-12-14T19:37:24Z

src/Xrefcheck/Scanners/Markdown.hs

+      -> GetAnnotation
+      -> ScannerM C.Node
+    handleAnnotation pos nodeType = \case
+      IgnoreAnnotation mode  -> do


Extra space after mode?

Martoon-00 · 2022-12-15T18:48:05Z

src/Xrefcheck/Scanners/Markdown.hs

+          Nothing -> case e of
+            Nothing                    -> Node pos ty <$> sequence subs
+            Just (Ignore mode modePos) ->
+                case (mode, ty) of


Extra indentation

🤔 Where is it? It seems it's not present in the final version

Martoon-00 · 2022-12-15T18:59:40Z

tests/golden/check-scan-errors/expected.gold

@@ -18,7 +18,7 @@
  ➥  In file check-scan-errors.md
     scan error at src:21:1-50:

-     Unrecognised option "unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all">
+     Unrecognised option "ignore unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all">


I think this is a good change.

Mm, however, in the commit description you explicitly state that you are refactoring, which implies no changes in the application logic.

Please leave a note in the commit description that your refactoring led to this minor change.

OK, got it, I'll update the description later when tidying up the commit history.

Martoon-00 · 2022-12-15T19:27:35Z

src/Xrefcheck/Scanners/Markdown.hs

+    handleLink ign pos ty subs = do
+      let traverseChildren = C.Node pos ty <$> sequence subs
+      -- It can be checked that it's correct for all the cases
+      ssIgnore .= Nothing


This is where I'd agree with @aeqz on the maintainability issue.

Currently, the comment on the line above is correct, but if someday someone adds an IMS123Paragraphs constructor, this statement will become false and the compiler won't help in noticing that. And good luck to that guy figuring out where the state resets.

Also, AFAIU in the case of ign = IMSParagraph you rely on the behaviour of handleParagraph to strip the entire Paragraph subtree. This is not directly an issue, but it is a dependency between code components that, with this PR, become coupled slightly less tightly (they are extracted into separate subfunctions). This increases the probability of bugs, and I'm a bit worried.
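One way to let the compiler help here is to replace the unconditional `ssIgnore .= Nothing` with a function that matches every ignore state explicitly. A minimal sketch over hypothetical simplified types: `IMSLink`, `IMSParagraph`, `ExpectingLinkInSubnodes` and `ParentExpectsLink` appear in the diff, while `ExpectingLinkInParagraph` is an assumed placeholder, and wiring the function into `ScannerState` is left out:

```haskell
-- Hypothetical mirrors of the scanner's ignore state; the real types
-- (and the Ignore record that wraps them) may differ.
data IgnoreLinkState
  = ExpectingLinkInParagraph
  | ExpectingLinkInSubnodes
  | ParentExpectsLink
  deriving (Show, Eq)

data IgnoreModeState
  = IMSLink IgnoreLinkState
  | IMSParagraph
  deriving (Show, Eq)

-- Ignore state to keep after a LINK or IMAGE node has been handled.
-- Every constructor is spelled out (no `_ ->`), so adding a new state
-- triggers an incomplete-pattern warning exactly where the reset happens.
ignoreStateAfterLink :: Maybe IgnoreModeState -> Maybe IgnoreModeState
ignoreStateAfterLink st = case st of
  Nothing -> Nothing
  -- A pending "ignore link" is consumed by this link in every sub-state.
  Just (IMSLink ExpectingLinkInParagraph) -> Nothing
  Just (IMSLink ExpectingLinkInSubnodes)  -> Nothing
  Just (IMSLink ParentExpectsLink)        -> Nothing
  -- Assumed unreachable today, since the paragraph handler drops the whole
  -- subtree; resetting matches the current unconditional reset.
  Just IMSParagraph -> Nothing
```

Behaviour is identical to the current reset; the gain is that a future `IMS123Paragraphs` cannot slip past this point silently.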

> Also, AFAIU in the case of ign = IMSParagraph you rely on the behaviour of handleParagraph to strip the entire Paragraph subtree.

It seems that this code is made up entirely of such implicit dependencies, mainly because of how we use annotations. We put an annotation somewhere in the text, and then further behaviour depends on the following nodes.

In the case of IMSParagraph, at least, we could try to look at the next node (not very easy to do) and act immediately: if the next node is a paragraph, ignore it, and if it is something else, emit an error. But with link annotations we have to handle all this implicit stuff (and that's sad).
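The look-ahead idea for `ignore paragraph` can be sketched on a flat list of sibling nodes. This assumes a heavily simplified, hypothetical `Node` type and error strings; real cmark nodes carry positions and node types, and annotations are HTML comment nodes:

```haskell
-- Hypothetical, heavily simplified sibling list.
data Node
  = IgnoreParagraphAnn  -- stands for <!-- xrefcheck: ignore paragraph -->
  | Paragraph String
  | Other String
  deriving (Show, Eq)

-- Resolve each "ignore paragraph" annotation by looking at the next sibling
-- right away: a paragraph is dropped, anything else produces an error.
-- Returns the collected errors and the nodes that are kept.
resolveParagraphIgnores :: [Node] -> ([String], [Node])
resolveParagraphIgnores [] = ([], [])
resolveParagraphIgnores (IgnoreParagraphAnn : Paragraph _ : rest) =
  resolveParagraphIgnores rest
resolveParagraphIgnores (IgnoreParagraphAnn : rest) =
  let (errs, kept) = resolveParagraphIgnores rest
      found = case rest of
        []      -> "end of file"
        (n : _) -> describe n
  in (("expected a paragraph after 'ignore paragraph', found " ++ found) : errs, kept)
resolveParagraphIgnores (n : rest) =
  let (errs, kept) = resolveParagraphIgnores rest
  in (errs, n : kept)

-- Human-readable node kind for error messages.
describe :: Node -> String
describe IgnoreParagraphAnn = "another annotation"
describe (Paragraph _)      = "paragraph"
describe (Other kind)       = kind
```

No state survives past the annotation's neighbour, which is what removes the implicit coupling between handlers. Link annotations do not fit this shape as easily, since the link may be nested inside the following node.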

Martoon-00 · 2022-12-15T19:31:30Z

src/Xrefcheck/Scanners/Markdown.hs

+    isSimpleComment node = do
+      let isComment = isJust $ getCommentContent node
+          isNotXrefcheckAnnotation = isNothing $ getXrefcheckContent node
+      isComment && isNotXrefcheckAnnotation


Good renaming and use of span 👍

I'm not sure though why providing mere isComment was bad 🤔

Here we want to split header nodes from the other contents of the file. So if, for example, ignore paragraph comes right after the header nodes, we stop immediately. This annotation is also a comment, because isJust . getCommentContent is true for it. This way we exclude local annotations from consideration.

Hm, but this function will also stop at an invalid annotation. Maybe it's better to allow it too, and just skip it while reporting an error.

Hm, no, I think it's better to stop at incorrect annotations 🤔
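The header split discussed here can be sketched with `span`. Node shapes, the comment parsing, and the exact `xrefcheck:` prefix handling below are hypothetical simplifications of the scanner's helpers:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf, isSuffixOf, stripPrefix)
import Data.Maybe (isJust, isNothing)

-- Hypothetical top-level nodes: only raw HTML blocks matter here.
data Node = HtmlBlock String | Paragraph String
  deriving (Show, Eq)

strip :: String -> String
strip = dropWhileEnd isSpace . dropWhile isSpace

-- Body of an HTML comment, if the node is one.
getCommentContent :: Node -> Maybe String
getCommentContent (HtmlBlock raw)
  | "<!--" `isPrefixOf` t && "-->" `isSuffixOf` t =
      Just (strip (take (length t - 7) (drop 4 t)))
  | otherwise = Nothing
  where
    t = strip raw
getCommentContent (Paragraph _) = Nothing

-- Annotation text after the "xrefcheck:" prefix, if any.
getXrefcheckContent :: Node -> Maybe String
getXrefcheckContent node =
  strip <$> (getCommentContent node >>= stripPrefix "xrefcheck:")

-- A plain comment is header material; any xrefcheck annotation, valid or
-- not, is not.
isSimpleComment :: Node -> Bool
isSimpleComment node =
  isJust (getCommentContent node) && isNothing (getXrefcheckContent node)

-- Split leading plain comments from the rest of the file.
splitHeader :: [Node] -> ([Node], [Node])
splitHeader = span isSimpleComment
```

Because `isSimpleComment` rejects anything carrying the `xrefcheck:` prefix, both a local annotation such as `ignore paragraph` and an unrecognised one end the header, which matches the decision to stop at incorrect annotations.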

Martoon-00 · 2022-12-15T19:59:43Z

src/Xrefcheck/Scanners/Markdown.hs

+              PARAGRAPH -> handleParagraph ign pos ty subs
+              LINK {}   -> handleLink      ign pos ty subs
+              IMAGE {}  -> handleLink      ign pos ty subs
+              _         -> handleOther     ign pos ty subs


Hmhm, looking at the result of this change, I feel quite skeptical in fact.

When imagining the implementation, it seems simpler to keep an annotation in mind and think how it affects markdown nodes, rather than mentally sorting through all types of nodes (hard) and for each think how it is affected by each of our annotations (keeping all annotations in mind can become hard in the future). And this code goes against this model of thinking.

A related thing is code locality: what happens in handleParagraph seems to affect what happens in handleLink, which makes reasoning about the code's correctness harder.

Perhaps you applied this refactoring to make the code shorter? We did save several lines (and in one such place I left a comment because I think it is arguable), but overall the code seems to take more space now, even if we don't count the function signatures.

I mostly agree with other minor changes in this commit, but the core part of this commit I find suspicious.

Could you tell which benefits, in your opinion, this rewrite gives?

For me, it's easier to reason about code behaviour when the node type is known, because our types for ignore modes are, IMHO, rather confusing and hard to understand. The node type, in contrast, is very clear and easy to work with, so I preferred it.

Things only get worse once there are different types of annotations (the primary reason for this whole refactor). We'll have to handle many different cases for all types of annotations and nodes.

Martoon-00 · 2022-12-21T17:27:19Z

src/Xrefcheck/Scanners/Markdown.hs

+                Ignore IMSParagraph prevPos ->
+                  lift . tell . makeError prevPos fp . ParagraphErr $ prettyType nodeType
+                Ignore (IMSLink _) prevPos ->
+                  lift $ tell $ makeError prevPos fp LinkErr


Here too, the commit says that you apply a refactoring, but here in fact you add a behaviour that was not present before.

I more than agree that this check is good to add, but let's extract it to a separate commit.

In fact, the behaviour hasn't changed here: in the original version some of these checks were present in other cases at the top level of the remove function, and now all of them are here.

YuriRomanowski · 2022-12-21T18:29:34Z

@Martoon-00 There are a lot of problems in the Scanners/Markdown.hs module inherited from the past. I didn't fix them fully, just refactored slightly so that adding new annotations is no longer practically impossible. But we can certainly try to improve it further to some extent.

Martoon-00 · 2023-01-31T23:10:07Z

Related ticket: #276.

YuriRomanowski added 2 commits December 14, 2022 16:10

[#64] Refactor the markdown scanner

9c6d97c


[#64] Refactor the markdown scanner

e2ef5db


YuriRomanowski changed the title ~~Yuri romanowski/#64 refactor markdown scanner~~ [#64] Refactor markdown scanner Dec 14, 2022

Martoon-00 requested review from aeqz and Martoon-00 December 14, 2022 18:30

aeqz suggested changes Dec 14, 2022


Review: rename GetAnnotation to Annotation, remove extra spaces

82e7292

aeqz self-requested a review December 15, 2022 09:10

aeqz approved these changes Dec 15, 2022


This was referenced Dec 15, 2022:

- [#197] Canonicalize filepaths #230 (Merged)
- [#64] implement copy paste protection #240 (Closed)

aeqz approved these changes Dec 20, 2022


Martoon-00 requested changes Dec 21, 2022
