Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid Arc parsing for Flags. #53

Closed
tatarize opened this issue Nov 25, 2019 · 7 comments
Closed

Invalid Arc parsing for Flags. #53

tatarize opened this issue Nov 25, 2019 · 7 comments

Comments

@tatarize
Copy link
Contributor

This may not be worth fixing.

I checked the SVG 1.1 XBNF ( https://www.w3.org/TR/SVG11/paths.html#PathDataBNF ) to be sure, and it's completely true. It turns out:

M200,120 h-25 a25,25 0 1125,25 z

... is a valid path. Since the flag token is either 1 or 0 and does not require any space to say it's valid. This is a test in the W3C_SVG_11_TestSuite. Namely paths-data-20-f.svg.

 <!-- no commawsp between arc flags and no commawsp after arc flags -->
    <path d="M200,120 h-25 a25,25 0 1,1 25,25 z" fill="red" stroke="lime"/>
    <path d="M200,120 h-25 a25,25 0 1125,25 z" fill="lime"/>
from svg.path import *
parser.parse_path("M200,120 h-25 a25,25 0 1125,25 z")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "...parser.py", line 181, in parse_path
    end = float(elements.pop()) + float(elements.pop()) * 1j
ValueError: could not convert string to float: 'z'

Flags are only used in the BNF for arcs:

elliptical-arc-argument:
nonnegative-number comma-wsp? nonnegative-number comma-wsp?
number comma-wsp flag comma-wsp? flag comma-wsp? coordinate-pair
flag:
"0" | "1"

Since the comma-wsp? has ? it's optional so 1125 means 1 1 25.

@regebro
Copy link
Owner

regebro commented Nov 25, 2019

Yeah, I saw that when originally implementing this and filed it under "that makes no sense, look at it later" and then forgot about it...

@tatarize
Copy link
Contributor Author

I've only seen such a thing in this test and had to check the spec to see if it could plausibly be correct. And would require some pretty radical changes to fix, and it's one special weird quirk with arc grammar.

@tatarize
Copy link
Contributor Author

Fixing this in svgelements basically required recoding the entire parser from scratch to better fit the XBNF. There's other issues like if you have # in the path command it's supposed to stop parsing right there. So it would fail some of the other tests and basically had the inline-z command really hard to implement (though there's some debate whether they are getting into SVG-2.0 proper for lack of implementations).

https://github.com/meerk40t/svgelements/blob/3e82366f31d923e47cf24122de7350c68945dc9e/svgelements/svgelements.py#L164

PATTERN_WS = r'[\s\t\n]*'
PATTERN_COMMA = r'(?:\s*,\s*|\s+|(?=-))'
PATTERN_COMMAWSP = r'[ ,\t\n\x09\x0A\x0C\x0D]+'
PATTERN_FLOAT = '[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?'

svg_parse = [
    ('COMMAND', r'[MmZzLlHhVvCcSsQqTtAa]'),
    ('SKIP', PATTERN_COMMAWSP)
]
svg_re = re.compile('|'.join('(?P<%s>%s)' % pair for pair in svg_parse))
num_parse = [
    ('FLOAT', PATTERN_FLOAT),
    ('CLOSE', r'[Zz]'),
    ('SKIP', PATTERN_COMMAWSP)
]
num_re = re.compile('|'.join('(?P<%s>%s)' % pair for pair in num_parse))
flag_parse = [
    ('FLAG', r'[01]'),
    ('SKIP', PATTERN_COMMAWSP)
]
flag_re = re.compile('|'.join('(?P<%s>%s)' % pair for pair in flag_parse))


class SVGLexicalParser:
    def __init__(self):
        self.parser = None
        self.pathd = None
        self.pos = 0
        self.limit = 0
        self.inline_close = None

    def _command(self):
        while self.pos < self.limit:
            match = svg_re.match(self.pathd, self.pos)
            if match is None:
                return None  # Did not match at command sequence.
            self.pos = match.end()
            kind = match.lastgroup
            if kind == 'SKIP':
                continue
            return match.group()
        return None

    def _more(self):
        while self.pos < self.limit:
            match = num_re.match(self.pathd, self.pos)
            if match is None:
                return False
            kind = match.lastgroup
            if kind == 'CLOSE':
                self.inline_close = match.group()
                return False
            if kind == 'SKIP':
                # move skipped elements forward.
                self.pos = match.end()
                continue
            return True
        return None

    def _number(self):
        while self.pos < self.limit:
            match = num_re.match(self.pathd, self.pos)
            if match is None:
                break  # No more matches.
            kind = match.lastgroup
            if kind == 'CLOSE':
                # Inline Close
                self.inline_close = match.group()
                return None
            self.pos = match.end()
            if kind == 'SKIP':
                continue
            return float(match.group())
        return None

    def _flag(self):
        while self.pos < self.limit:
            match = flag_re.match(self.pathd, self.pos)
            if match is None:
                break  # No more matches.
            self.pos = match.end()
            kind = match.lastgroup
            if kind == 'SKIP':
                continue
            return bool(int(match.group()))
        return None

    def _coord(self):
        x = self._number()
        if x is None:
            return None
        y = self._number()
        if y is None:
            raise ValueError
        return x, y

    def _rcoord(self):
        position = self._coord()
        if position is None:
            return None
        current_pos = self.parser.current_point
        if current_pos is None:
            return position
        return position[0] + current_pos[0], position[1] + current_pos[1]

    def parse(self, parser, pathd):
        self.parser = parser
        self.parser.start()
        self.pathd = pathd
        self.pos = 0
        self.limit = len(pathd)
        while True:
            cmd = self._command()
            if cmd is None:
                return
            elif cmd == 'z' or cmd == 'Z':
                if self._more():
                    raise ValueError
                self.parser.closed(relative=cmd.islower())
                self.inline_close = None
                continue
            elif cmd == 'm':
                if not self._more():
                    raise ValueError
                coord = self._rcoord()
                self.parser.move(coord, relative=True)
                while self._more():
                    coord = self._rcoord()
                    self.parser.line(coord, relative=True)
            elif cmd == 'M':
                if not self._more():
                    raise ValueError
                coord = self._coord()
                self.parser.move(coord, relative=False)
                while self._more():
                    coord = self._coord()
                    self.parser.line(coord, relative=False)
            elif cmd == 'l':
                while True:
                    coord = self._rcoord()
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.line(coord, relative=True)
                    if not self._more():
                        break
            elif cmd == 'L':
                while True:
                    coord = self._coord()
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.line(coord, relative=False)
                    if not self._more():
                        break
            elif cmd == 't':
                while True:
                    coord = self._rcoord()
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.smooth_quad(coord, relative=True)
                    if not self._more():
                        break
            elif cmd == 'T':
                while True:
                    coord = self._coord()
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.smooth_quad(coord, relative=False)
                    if not self._more():
                        break
            elif cmd == 'h':
                while True:
                    value = self._number()
                    self.parser.horizontal(value, relative=True)
                    if not self._more():
                        break
            elif cmd == 'H':
                while True:
                    value = self._number()
                    self.parser.horizontal(value, relative=False)
                    if not self._more():
                        break
            elif cmd == 'v':
                while True:
                    value = self._number()
                    self.parser.vertical(value, relative=True)
                    if not self._more():
                        break
            elif cmd == 'V':
                while self._more():
                    value = self._number()
                    self.parser.vertical(value, relative=False)
            elif cmd == 'c':
                while True:
                    coord1, coord2, coord3 = self._rcoord(), self._rcoord(), self._rcoord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    if coord3 is None:
                        coord3 = self.inline_close
                        if coord3 is None:
                            raise ValueError
                    self.parser.cubic(coord1, coord2, coord3, relative=True)
                    if not self._more():
                        break
            elif cmd == 'C':
                while True:
                    coord1, coord2, coord3 = self._coord(), self._coord(), self._coord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    if coord3 is None:
                        coord3 = self.inline_close
                        if coord3 is None:
                            raise ValueError
                    self.parser.cubic(coord1, coord2, coord3, relative=False)
                    if not self._more():
                        break
            elif cmd == 'q':
                while True:
                    coord1, coord2 = self._rcoord(), self._rcoord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    self.parser.quad(coord1, coord2, relative=True)
                    if not self._more():
                        break
            elif cmd == 'Q':
                while True:
                    coord1, coord2 = self._coord(), self._coord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    self.parser.quad(coord1, coord2, relative=False)
                    if not self._more():
                        break
            elif cmd == 's':
                while True:
                    coord1, coord2 = self._rcoord(), self._rcoord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    self.parser.smooth_cubic(coord1, coord2, relative=True)
                    if not self._more():
                        break
            elif cmd == 'S':
                while True:
                    coord1, coord2 = self._coord(), self._coord()
                    if coord1 is None:
                        coord1 = self.inline_close
                        if coord1 is None:
                            raise ValueError
                    if coord2 is None:
                        coord2 = self.inline_close
                        if coord2 is None:
                            raise ValueError
                    self.parser.smooth_cubic(coord1, coord2, relative=False)
                    if not self._more():
                        break
            elif cmd == 'a':
                while self._more():
                    rx, ry, rotation, arc, sweep, coord = \
                        self._number(), self._number(), self._number(), self._flag(), self._flag(), self._rcoord()
                    if sweep is None:
                        raise ValueError
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.arc(rx, ry, rotation, arc, sweep, coord, relative=True)
            elif cmd == 'A':
                while self._more():
                    rx, ry, rotation, arc, sweep, coord = \
                        self._number(), self._number(), self._number(), self._flag(), self._flag(), self._coord()
                    if coord is None:
                        coord = self.inline_close
                        if coord is None:
                            raise ValueError
                    self.parser.arc(rx, ry, rotation, arc, sweep, coord, relative=False)
        self.parser.end()

@ctrlcctrlv
Copy link

It's bizarre that the GNOME developers would put one of these paths in a system file, but they did. See #69.

@regebro
Copy link
Owner

regebro commented Mar 27, 2021

I made a new parser that fixes this issue, and released a 5.0.0a1, with this.

It's an alpha, because besides this issue, I only fixed the easy bugs, and of course, it's a complete rewrite of the parser, so there may be new bugs. Also, I didn't do anything backwards incompatible, which I have a huge urge to do.

@regebro regebro closed this as completed Mar 21, 2022
@tatarize
Copy link
Contributor Author

I realized after completed almost all the svgelements work to parse most of the SVG spec correctly, that the better solution to parsing things from the start would actually be to parse XML into a proper DOM tree of bootstrapped nodes and then rendering the DOM tree into svg.path like objects during different process rather than parsing and rendering in a single operation. ( meerk40t/svgelements#87 ).

So basically perform a subset of how browsers parse things. And use the code to manipulate svg DOM-nodes like one would do for javascript. Browsers apparently have baked in solutions to a lot of problems within svg parsing ( http://taligarsiel.com/Projects/howbrowserswork1.htm ). And really you can do bootstrapping of nodes to replace the default parsed xml node with a more specific svg-type node (as browsers do) (you parse <rect> you get a RectNode, which is not by a rendered path but becomes that during the rendering process).

This stuff basically solves a question of how you could load an SVG file and have a node-tree object like you'd have for svgwrite. And then when you have a DOM-tree like svgwrite would create programatically but loaded as file. And then when you perform a render process you create a render tree of objects that basically look exactly svg.path objects, it fixes sooo many things.

You can apply CSS to the DOM Tree and then re-render the tree to get a different set of rendered paths, rather than make a bunch of makeshift changes to partially rendered svg-like objects. This solves harder problems like modifying a transform on a <group> node and having it correctly affect all child nodes.

But the entire thing is basically a rewrite that fixes things profoundly. It basically makes the broken family of good python svg projects: svg.path, svgelements, svgpathtools, svgwrite code into a simple, correct, highly-scriptable, css-operable, and unified project rather than a broken family of parts that do not cooperate with each other except for the parts that descend from your original library. But, since I mostly finished svgelements, even with my parsing and rendering single-step being a fundamental foundational flaw in the project. I kept svgelements untouched since correctly parsing most of the svg spec is a big accomplishment, and breaking backwards compatibility quickly becomes untenable once basically anybody is using your library.

132128583-9f0b1cf9-b1e0-4e4d-aeb3-c53cca19580b


It might be tempting to break backwards compatibility but it's probably better to do so in a different project.

@regebro
Copy link
Owner

regebro commented Mar 23, 2022

Yes, SVG is an application of XML, and in general, should be treated like and parsed like XML.

svg.path is entirely dedicated only to parsing the d attribute of <path> elements, originally with the aim of drawing/using those paths, but now also with the aim of modifying and calculating with them. Perhaps it should have been called svg.d but too late now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants