-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#64] Refactor markdown scanner #238
base: master
Are you sure you want to change the base?
[#64] Refactor markdown scanner #238
Conversation
Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; improve handling annotations, remove IMSAll state as it's not required, rename functions.
Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations. Solution: Refactor; isolate processing annotations for different types of nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this refactor. It will allow us to create new annotations more easily. I have added just a few comments.
Also, maybe there will be the need for another small refactor if at some point we increase the ScannerState
with stateful annotations that do not correspond to ignore, but it can be done at that moment.
src/Xrefcheck/Scanners/Markdown.hs
Outdated
getPosition :: Node -> Maybe PosInfo | ||
getPosition node@(Node pos _ _) = do | ||
getPosition :: C.Node -> Maybe PosInfo | ||
getPosition node@(C.Node pos _ _) = do | ||
annLength <- length . T.strip <$> getHTMLText node | ||
PosInfo sl sc _ _ <- pos | ||
pure $ PosInfo sl sc sl (sc + annLength - 1) | ||
|
||
-- | Extract `IgnoreMode` if current node is xrefcheck annotation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this comment should be updated:
IgnoreMode
->
GetAnnotation
isIgnoreFile :: Node -> Bool | ||
isIgnoreFile = (ValidMode IMAll ==) . getIgnoreMode | ||
isIgnoreFile :: C.Node -> Bool | ||
isIgnoreFile = (Just (IgnoreAnnotation IMAll) ==) . getAnnotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can have a function like
isGlobalAnnotation :: GetAnnotation -> Bool
and use it here. Also, if this function has no wildcard pattern matchings, the compiler will ask us if we want any new annotation that we add in the future to be global or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently there can be only one type of annotation, which is IgnoreFile
. Here similar function is introduced, called isHeaderNode
.
NodeType -> | ||
[ScannerM C.Node] -> | ||
ScannerM C.Node | ||
handleLink ign pos ty subs = do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all the ign
arguments can renamed to something like ann
. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, they always correspond to Ignore
.
<!-- ignore my-previous-comment -->
src/Xrefcheck/Scanners/Markdown.hs
Outdated
ScannerM C.Node | ||
traverseNodeWithLinkExpected ignoreLinkState modePos pos ty subs = do | ||
when (ignoreLinkState == ExpectingLinkInSubnodes) $ | ||
ssIgnore . _Just . ignoreMode .= IMSLink ParentExpectsLink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space after .=
?
src/Xrefcheck/Scanners/Markdown.hs
Outdated
-> GetAnnotation | ||
-> ScannerM C.Node | ||
handleAnnotation pos nodeType = \case | ||
IgnoreAnnotation mode -> do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space after mode
?
Nothing -> case e of | ||
Nothing -> Node pos ty <$> sequence subs | ||
Just (Ignore mode modePos) -> | ||
case (mode, ty) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Where is it? It seems it's not present in the final version
@@ -18,7 +18,7 @@ | |||
➥ In file check-scan-errors.md | |||
scan error at src:21:1-50: | |||
|
|||
Unrecognised option "unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all"> | |||
Unrecognised option "ignore unrecognised-annotation" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change.
Mm, however, in the commit description you explicitly tell that you refactor things, which assumes no changes in the application logic.
Please leave a note in the commit descrption that your refactoring led to this minor change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it, I'll change the description later when prettying commit history.
handleLink ign pos ty subs = do | ||
let traverseChildren = C.Node pos ty <$> sequence subs | ||
-- It can be checked that it's correct for all the cases | ||
ssIgnore .= Nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I'd agree with @aeqz on the maintainability issue.
Currently, the comment at line above is correct, but if someday someone adds some IMS123Paragraphs
constructor, this statement will turn false but the compiler won't help in noticing that. And good luck to that guy figuring out where the state resets.
Also, AFAIU in the case of ign = IMSParagraph
you rely on the behaviour of handleParagraph
to strip the entire Paragraph
subtree. This is not directly an issue, but this is a dependency between the code components that with this PR become coupled slightly less tightly (after being extracted to separate subfunctions), this increases the probability of bugs, and I'm a bit worried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, AFAIU in the case of ign = IMSParagraph you rely on the behaviour of handleParagraph to strip the entire Paragraph subtree.
It seems that this code is fully made up of such implicit dependencies, because the main reason is how we use annotations. We put an annotation somewhere in the text, and then the further behavior depends on the next nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of IMSParagraph
, at least, we could try to look at the next node (not very easy to do) and perform some actions immediately, so, if the next is paragraph, ignore it, and if the next is something else, emit an error. But in case of link annotations we have to handle all this implicit stuff (and that's sad).
isSimpleComment node = do | ||
let isComment = isJust $ getCommentContent node | ||
isNotXrefcheckAnnotation = isNothing $ getXrefcheckContent node | ||
isComment && isNotXrefcheckAnnotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good renaming and use of span
👍
I'm not sure though why providing mere isComment
was bad 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we want to split header nodes and other contents of the file. So if for example ignore paragraph
goes right after header nodes, we will stop immediately. And this annotations is also a comment, because isJust . getCommentContent
is true for it. This way we exclude local annotations from the consideration.
Hm, but also this function will stop at an invalid annotation. Maybe it's better to allow it too, and just skip it reporting an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, no, I think it's better to stop at incorrect annotations 🤔
PARAGRAPH -> handleParagraph ign pos ty subs | ||
LINK {} -> handleLink ign pos ty subs | ||
IMAGE {} -> handleLink ign pos ty subs | ||
_ -> handleOther ign pos ty subs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmhm, looking at the result of this change, I feel quite skeptical in fact.
When imagining the implementation, it seems simpler to keep an annotation in mind and think how it affects markdown nodes, rather than mentally sorting through all types of nodes (hard) and for each think how it is affected by each of our annotations (keeping all annotations in mind can become hard in the future). And this code goes against this model of thinking.
A related thing: code locality issues, what happens in handleParagraph
seems to affect what happens in handleLink
, this makes reasoning about the code correctness harder.
Perhaps you applied this refactoring to make the code shorter? We really spared several lines (and on one such place I left a comment because I think it is arguable), but overall the code seems to take more space now even if we don't take the function signatures into account.
I mostly agree with other minor changes in this commit, but the core part of this commit I find suspicious.
Could you tell which benefits, in your opinion, this rewrite gives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it's easier to reason about code behavior when a node type is known, because our types for ignore modes are, IMHO, rather confusing and hard to understand. And, in contrast, node type is something very clear and easy to work with, so I preferred it.
Things only gets worse if there are some different types of annotation (the primary reason of the whole this refactor). We'll have to handle a lot of different cases for all types of annotations and nodes.
Ignore IMSParagraph prevPos -> | ||
lift . tell . makeError prevPos fp . ParagraphErr $ prettyType nodeType | ||
Ignore (IMSLink _) prevPos -> | ||
lift $ tell $ makeError prevPos fp LinkErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, the commit says that you apply a refactoring, but here in fact you add a behaviour that was not present before.
I more than agree that this check is good to add, but let's extract it to a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact the behavior hasn't changed here, because in the original version some of these checks were present in other cases on the top level of remove
function, and currently all of them are here.
@Martoon-00 There are a lot of troubles in the |
Related ticket: #276. |
Description
Problem: Current implementation of the markdown scanner is hard to extend, so we need to refactor it to add support for new annotations.
Solution: Refactor: improve handling annotations, remove IMSAll state, rename functions, isolate processing annotations for different types of Nodes, sharing common behaviour with factored out functions.
Related issue(s)
Needed to fix #64
✅ Checklist for your Pull Request
Ideally a PR has all of the checkmarks set.
If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).
Related changes (conditional)
Tests
silently reappearing again.
Documentation
Public contracts
of Public Contracts policy.
and
Stylistic guide (mandatory)
✓ Release Checklist
package.yaml
.under the "Unreleased" section to a new section for this release version.
vX.Y.Z
.xrefcheck-action
.