-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVG Groups are ignored. Especially transformations acting on the paths inside the group. #16
Comments
I'd like to add this functionality. I probably won't get to this in the near future though. Here's how to do this (in case some helpful awesome person wants to make a pull-request):
This should work (I didn't test it, but just wanted to give the idea) for everything but paths containing Arc objects. Arc objects shouldn't be too hard to deal with either (see Arc.rotated() and Arc.translated()). |
I took a look at the code by now. Yes. The transformation should be fairly easy to implement and I have started working on it. On Bézier curves, can I also just do a matrix transformation of the start, end and the two control nodes? The issue is more that currently there is only a list of d strings created and then it is parsed. This way all information about the XML tree is lost. Because groups can be nested, the file has to be recursively parsed and transformed. If I am happy with my solution to the problem and it doesn't affect performance too much, I can do a pull request myself, if you want to have a look at it. It is still work in progress, though. |
That's exactly what the above code is doing.
Keeping track of the group structure could be accomplished to some extent by simply adding an attribute to |
OK. I included the code now with a few changes to keep it closer to this documentation https://developer.mozilla.org/de/docs/Web/SVG/Attribute/transform My changes are ready for a pull request now. However I was not able to create a new branch. I get a |
@alphanoob1337 to make a pull request when you don't have commit permission you first need to create a fork (on your own repository), commit to that, then click the "create pull request" button on this (the svgpathtools) repository and you should see your own fork as a "compare" option. |
Groups are now taken into account during parsing and transform attributes of groups are applied to the paths before they are returned.
So let's give this a fresh start. I think we both agree by now, that it is not that simple. If you, @mathandy want to implement it in another way, that is totally fine. I just think it doesn't make sense for me to do it, if you already have very specific ideas about how you want it to be done. It will be more effort for you to tell me how you would like it to be implemented. Although I am by no means a professional (I just code out of necessity for hobby and professional projects), I do care about coding style. Not about PEP8, but about thinking before implementing. So please let me explain my thoughts about the solution to this issue: I was assuming that SVGs are typically composed the following way:
The file I had, had 200 paths in 86 groups nested up to four levels deep. So for the sake of this argument, let's assume it has 200 paths in 62 groups in three layers. Then, in the first layer it would be 50 groups with 4 paths in each them, the second layer would be 10 groups of 5 groups of the first layer and the third layer would be 2 groups of 5 groups of the second layer. I see two main ways of implementing the group transformations. Let me call them 'top-down approach' and 'bottom-up approach'. The top-down approach is what I preferred and implemented. It is recursively walking through the XML tree of the SVG and parsing the paths and groups along the way. I call this top-down because one goes from the root to the branch. In a hierarchy this is called 'top-down'. The bottom-up approach would be to walk through the XML tree once and identify all the path objects along the way. Then one loops over the parent nodes for each path to parse and apply the group transformations. This is parsing the groups starting from the leaves to the root. Hence 'bottom-up'. As always, both approaches have their benefits: The top-down works best if the structure is deeply nested and if there are multiple paths in one group. However, empty groups are bad for performance. The bottom-up approach doesn't parse empty groups. Also it works well if there is one group per path. Therefore this approach is the better choice for shallow structures. One can also implement hybrid approaches, where the groups are parsed top-down and the paths independently of that. Then the transformation matrices are applied afterwards. Apart from the thoughts about the best way to do it, there are also practical considerations, that play a role during the choice. The ones I see as most prominent are the following: In the current implementation the structure of the SVG file does not play a role so far. So it was perfectly valid, to extract the paths and work on them after they have been extracted from the XML tree. Since there were library functions available for the task, which can be assumed to be somewhat optimized, I agree that this was the best choice at that time. So to implement the top-down approach, one has to change existing code unnecessarily and the order in which the objects are returned by the function changes. The bottom-up had the clear disadvantage for me, that it was not the best approach for my example described above: in total, 600 group transformations would have had to be parsed and applied. In the top-down it is only 62 groups have to be parsed. Of course this is highly non-linear with the complexity of the file. In addition it allowed for all the other objects that are parsed into a path object, to be dealt with in the same walk through the tree. In principle there is only a small performance penalty for a hybrid approach. But there is a huge practical drawback: I was not able to come up with a good way to identify the individual path objects during the walk through the tree: paths usually have ID attributes, but you can not rely on this. The order in which minidom returns the objects in getElementsByTagName is not documented and therefore also unreliable (in the current version the elements are returned in order of appearance in the file). Of course one could say that it is unlikely that the order in which getElementsByTagName returns the nodes changes and rely on the current implementation. But still it would make the code more complex to read if two different approaches are mixed together. Since for the svgpathtools the order in which the paths and converted objects are returned was not documented either and also was subject to change with minidom anyway, I decided this is undesirable but not a huge issue. The change of existing code was also undesirable to me, because it requires additional work, but in my opinion it was worth the hassle because it is the more elegant choice and it avoids walking the XML tree unnecessarily often and allows to directly cross-parse other nodes into paths in the same walk, which reduces the number of walks even compared to the current implementation without any groups. Also other complex features, like parsing the other group attributes and overwriting parts of the path attributes where needed, can be implemented in the same XML tree walk, if that is desired in the future. Of course this is a question of style and therefore it comes down to personal opinions and preferences. Since you are maintaining the project it should be your choice. However it was not really clear to me from your comments, which approach you prefer. Is it the bottom-up or the hybrid one? I went ahead with the top-down approach, because it appeared to be the most logical one to me and my assumptions about SVG files. Also it complied best to my personal preferences, as I value simplicity, scalability and performance over keeping well-tested code. But as I said: it is fully up to you to decide which changes are unnecessary and which are not. I am just explaining here why I decided to implement it the way I did in my fork. |
Thanks for the explanation of your thinking @alphanoob1337. Does that make sense? If so, what would you suggest my next step be? P.S. The recursive structure in your code is aesthetically an excellent choice, but I'm worried about its performance and scalability compared to other options. What do you think? |
@alphanoob1337 would you mind me using some of the code from your suggested fix to add support for groups? I opened a branch for this issue. If you make a new pull-request to the issue16 branch, I'll merge it and use your code as a base for the new functionality. |
Sorry for the slow response, but I am fairly busy at the moment. Yes, please use as much of my code, as you want to. I have a five hour train ride today, where I plan to clean up the PEP8 compatibility and create a pull request. Yes. Python sometimes behaves funny in terms of performance. Appending the paths to a list in the loop as is my main concern at the moment since copying the list of paths around is very costly in terms of performance. However I was not really able to figure out a faster way to do this, but I am confident, that it still is fast enough for most every-day situations: I checked the code against a large SVG file from Wikimedia Commons: Geological map of East-Westfalia created by the author Hagar66 and it took my laptop 6.31 to 8.60 seconds to run the svg2paths command on it, although the file has 22,920 paths and also some groups with transformations in it. According to my experience often it is more efficient, to parse a file of unknown size twice: once to count the number of objects in it and then allocate the memory needed. Then, in a second run, the data is actually parsed and written to the allocated memory. This works well i.e. with large numerical data files. In this case however, the space needed in memory to store a path object is not clear a priori. So the pre-parsing and serialization will probably create more overhead than it speeds up the code. So I think there is different ways of continuing from here for the group implementation:
So those are my thoughts on this. What do you think about how to proceed? |
@alphanoob1337 as soon as I've fixed the bug I'm currently working on (hopefully today, but I don't know -- I've got a lot on my plate), I'm going to play around with an implementation of your (1) -- I'll try a bottom-up approach to make it as self-contained as possible. As far as (2) goes... I'm not sure how inheritance works in SVGs.
If these are the only two exceptions to the children-override-parents rule, then implementing (2) doesn't sound so bad... though that's a pretty big if. To anyone reading: Do you know of any documentation where inheritance in SVGs is well-explained? @alphanoob1337 like you say, it seems like (2) is opening pandora's box a bit, but it might be worthwhile to add an experimental implementation that comes with a warning in the documentation. On a somewhat related note, if your original top-down approach offers speed benefits, it might be worthwhile to offer it also as another "experimental feature". |
Just another quick note: |
Hi @mathandy, I was mostly concerned with the style attribute, because it requires full CSS parsing to resolve the inheritances. A group can have style properties, that the path overwrites but the others also apply to the path. Also there is style classes which can be defined for groups and path elements, that are also inherited according to rules. And last but not least, there is inline stylesheets in SVG that can address specific elements (groups or paths) by their respective ID and change their style. For SVG 2 it is required for the user agent to parse the CSS. In the SVG 1.1 documentation I found the hint that the CSS cascade must be obeyed if CSS is parsed (this was optional for SVG 1.1). The inheritance rules for CSS are given in chapter 6 of the the CSS documentation. Those can get really complicated and this fact alone was reason for me not to deal with inheritance at all. But maybe there is some Python code for CSS parsing around, you can just import as a dependency... And another thought: I would intuitively say that clip-paths are inherited similar to transformations so that all of them apply to the path in the respective order... Cheers |
Support for transformations will be very useful. Inkscape uses transformation matrices on many objects by default, resulting in completely misplaced paths. I hope the code-style issues get resolved quickly. |
Hi @sbliven, This is not only the case for Inkscape but also for Adobe Illustrator. If you don't care about the style properties of the paths but only their geometry, you can download my fork https://github.com/alphanoob1337/svgpathtools . I think it is not fully up to date, but mostly. Cheers |
@alphanoob1337 and @sbliven, I'm with you. I just pushed a partially implemented fix. The idea being to add a container for a DOM-style document. It's not fully implemented but I was pretty verbose in my plan in the dostrings (see here). This will also add support for modifying SVG-Path elements inside html files (or any DOM-style file). Typical usage will look something like:
There will also be (or so I'm planning). A "flatten_paths" function or method to remove all transform attributes from the document and apply them directly to the paths. To anyone reading: I've got very little time to spare on svgpathtools these days, but I'm hoping to finish this new feature in the next few weeks. Also: Anyone have a good idea how to implement automated testing (unittests) for this? |
It would also be useful for the |
@mathandy Many thanks for the library. @alphanoob1337 support for transformation will be much helpful for extracting geographical maps in PDF files as embedded as svg path segments. Is this feature is enabled in the current version? how to use that. As per this method to preserve relative position of path segments, how to collect the transformation matrix for each path segments, is this available as like path and attributes collection by svg2path, |
@nishadhka there is an intrinsic problem with this and this led to the proposed changes not being included in the library: It is not possible to map the group transformations onto attributes. A simple example would be the line width of a circle wich you stretch along the x axis by a factor of two: if the thickness attribute of the path is two, the resulting ellipse after the group transformation will have a line width varying from two to four along the path, because the object is first created and then transformed. This can't be mapped on a flat list of paths but would need some Python implementation of SVG groups which is a lot of work to do and will probably cause problems if you want to use the path objects with other libraries. If you are only interested in the path itself, one can use the fix I proposed. If you need the attributes such as line width to be transformed as well, this will not do the job for you. If you only need the path itself, please tell me, then I will provide you an updated fork of this library which you can use. |
@alphanoob1337, Your transformation support just works for the requirement, thank you very much. I used the svg2paths from your fork and it's parse returned transformed svg paths with its relative position, which was then converted into as Shapely coordinates and into polygon. The pdf page is as below |
I'm working on a python project that needs to consume SVG files, and ideally I would like to be able to traverse the original group structure of the SVG after the file is parsed, but still have convenience functions to flatten out the paths from the groups that they're nested inside of. There are other python SVG parsing libraries that seem to handle the group transforms, but their APIs for traversing the group structure as a user seems rather lacking. Based on what I've seen, I think the I think the one design change I would propose to
I'm happy to make these changes, implement the missing features, and write unit tests for all the currently untested features if you'd be on board with this idea. Since I have deadlines to meet, I'll probably move forward on this very soon (with or without your buy-in), but ideally I would like to submit any progress that I make upstream to you. Disclaimer: I'm rather new to python (my expertise is in C++), so when I get around to submitting a pull request, there may be some novice python-specific issues that need correcting. |
Hey @mxgrey , that sounds good! @alphanoob1337 already implemented some code to flatten groups which might be useful. Assuming you meet the following two requests, I'll merge in whatever pull-requests you make.
Thanks! What your suggesting would be the most significant improvement to If you want advice on anything, just ask! |
Looks like we have a solution to this now thanks the new Document class pushed recently by @mxgrey ! I'll leave this issue open for a bit to encourage input on any minor changes that should be implemented before posting this on the PyPI -- it'll be tagged The one significant change I'm likely to implement is to have Two minor changes so far:
|
That's totally reasonable. I used named tuples to avoid treading on the Paths class, but if it's okay to make those attributes of Path, then I think that offers a much cleaner API. |
Some of the earlier discussions about going backwards up the line made me suspect some flaws were going to crop into that code. Since the CTM is always the concatenated result of all previous groups, and the groups list of transforms would need to be processed backwards I'd have a great a-ha comment. But, nope. Even with my expertise having written svg parsers before and multiple readings of the SVG documentation. That's solid code. I looked and the only thing I could find is that, technically, according to the BNF: Tab, Linefeed, and Carriage Return are also white spaces. So:
Is technically correct but doesn't seem like it would be parsed (see parser.py 223, 224). But nobody does that. |
I suspect that would still work, unless I'm overlooking something. On this line we have One issue that I would be concerned about is that we might be too lenient about what kind of transform strings we accept. For example, |
For that you're right. From my take it slices at the ) of the parameters between the items. Then would hand it.
The slice with the ( would give it:
and
So yeah with
So maybe feed it something like rotate
Though that might fall into the same successful parsing anyway. Or it might try to cast some spaces and tabs into being floats. Even with malicious intent I can't see a clear way to fully malform it consistent with the SVG BNF but inconsistent with the parsing here. you could do something like do_not_scale(2) and the SVG says that becomes SVG_TRANSFORM_UNKNOWN but it would scale, but those are generally malformed anyway so you should expect inconsistent behavior. I suppose the 2.0 recommendations include allowing transform on the The document parsing in the Document.py will certainly work and be reasonably effective. There is however a slightly less flat but perhaps more effective way of doing that. The SVG inheritance is generally that all children of all groups inherit all properties. The transforms are concatenated and the same values are replaced. Programatically SVGs can better represented the same nested structure in the SVGs themselves and a dictionary of subproperties and a matrix, while this sort of looks like a DOM tree, you can kind of implement it more like a light-tree structure parsed with a SAX parser, with end nodes that generally have a path, a matrix, (if needed) a local dictionary, and an inherited dictionary. Where each parent node gives a copy of its dictionary and matrix to its children (when that tag for the child opens). Basically a minimum viable preservation of most svg documents. You basically pre-flatten the end nodes, not just for the transformations, but also for the dictionary of properties you might care about, and preserve the structure a bit. While there's some obvious advantages to keeping the entire DOM. Unless you want to save things again with entirely preserved structure, it's not really worth it. Consider something like a fill color. If you had to go back and find all the groups and then check for the most recent parent, or current object with a fill="lime" or style="opacity:1;fill:#000" to figure out the expected fill color for the SVG it might be kind of nightmarish. But, it wouldn't be too hard to find all nodes with paths, and query that node dictionary for its value for "fill". And you could use that data to rebuild the document. You could also just ignore all that structure, use a SAX parser and parse all the elements maintaining a stack of matrices and dictionaries as it goes through the tree, and add each leaf node to the list, as you go. Which you could still build an SVG from but you'll lose the structure. Which seems like it would do the same thing the flatten_all_groups is doing but will simply load the SVG document flat, rather than load the entire DOM and search it for that data, and make a point to maintain the properties that apply to that particular path. When I implemented this I had the parsing of the document generate paths prefixed with a fake svg-path commands I called "x" for matrix which literally just contained the six values for the matrix, and "f" for fill_color and "k" for stroke_color, followed by the RGB integer for those colors. In my usecase I didn't care about more than that, and that basically entirely defined all shapes I cared about. |
Figured it my comments might have been a little bit oblique with regard to SAX parsing for the flattened values. Basically if you want to parse the SVG document quickly, you need something like in the pull request #66 there. I obviously missed some spots I shouldn't have. And there might be better ways to do some of that, or more compatible ones. You should be able to tell from it's much faster. Like day and night. Less memory problematic. But, moreover since you're already skidding through the file, it's not actually that hard to track the matrix or the process the paths. We are just extracting the information we need rather than loading the entire DOM then collapsing those elements. It also allows you to properly keep track of the fill and stroke values which oddly enough people will care about. |
Sax Parsing: 15.340695629785454
Doesn't actually seem much faster, though seems like there's likely some ways to speed that up. |
Got it to about twice as fast. Switching over to iterparse and skipping applying the matrix if it is identity (which should be basically always). Sax Parsing: 6.087709155595086 But, it basically already reads flat, with only processing or caring about the leaf nodes and the properties inherited by the leaf. And taking that matrix equally in stride compounding in the stack as it iterates through. |
Are there any plans to include support for transformations? I see a lot of good discussion here and it seems @alphanoob1337 has produced some limited functionality but are there any plans to include this functionality in a future release? |
Do you mean as a proper release here and on PyPI? Both the Sax and Dom functionalities do transformations but it does look like the last release was done before that functionality was added. |
When building from scratch the image below still cannot be rebuilt without alteration using the following code taken from the README.
Note: The issue could be with something other than the inclusion of transforms I'm still learning about the SVG format. |
Looking at that code, my best guess is that the code here doesn't actually load the def tags. Specifically the way that code is working is that it defines a def tag and then calls that def tag later in the body of the SVG. While this is clearly a valid SVG, it's not, as far as I'm aware something that either the DOM or SAX parser deals with. |
I just pushed some improvements to the
My "heart.svg" is the following example (taken from here).
|
If I wanted to do that I think my code would be:
Also, you could add some dunder methods to documents. Assuming you're adding a path to that object. flattened_rotated_doc = Document()
for p in doc.paths():
flattened_rotated_doc += p.rotated(90)
flattened_rotated_doc.display() Also, if you want to store transforms like that on paths, I have some code for those various elements. Full fledged parsing is hard, and I had to read through so many different technical documents to get mostly right. But, I did build things really modularly so you could just use my To get all this stuff right I had to add TL;DR SVG Parsing is a deep deep hole. Though if you want, the patch for the use/def objects is posted in #90. You can just create a stealthy generator for the iterparse generator to get the use stuff to work. The code you have is fantastic with math, in fact several of your bbox() ended up in my I made one such function for Arc, if I had cubics I'd have a full set of functions with scipy that would instantly give exact lengths: def _exact_length(self):
"""scipy is not a dependency. However, if scipy exists this function will find the
exact arc length. By default .length() delegates to here and on failure uses the
fallback method."""
from scipy.special import ellipeinc
a = self.rx
b = self.ry
phi = self.get_start_t()
m = 1 - (a / b) ** 2
d1 = ellipeinc(phi, m)
phi = phi + self.sweep
m = 1 - (a / b) ** 2
d2 = ellipeinc(phi, m)
return b * abs(d2 - d1) |
Hi, I was playing around with your tool and notice that the transformations described in group tags do not affect the contained paths yet. Where is this project at? Just to know how I may help. |
I believe in this implementation which I have sort of come around to it's under Part of the issue here is a larger one with all the various svg implementations. They aren't really subsets of doms. Which is what the document class here is meant to address. I think meerk40t/svgelements#87 captures the problem fairly well. I think what needs to be done is take all the code from Flatten should work here if you modify the group and you need to propagate them down. In svgelements you can multiply a group containing things by a matrix and it'll respect that. But, like most things in these render-only libraries it's kinda destructive. |
Just to adding more context to tatarize's answer: It is important to understand that many transformations can inherently not be flattened out without loss of information. Do you just want the coordinates of the path to be updated according to the group transformations or do you want to render an SVG with transformations? The latter is harder to achieve: Imagine how scaling a complicated path with a certain line width along the x-axis but NOT along the y-axis using a group transformation will give you a line with varying thickness in most SVG rendering software. In many cases flattening the transformations will give you the expected result (i.e. in a filled shape). If you are just interested in the geometry of the path but not in rendering it you can completely ignore this comment because flattening is what you want. |
Having read the SVG docs a bunch of times. While the behavior alpha describes is how that often works. The SVG Spec is kind of silent on the issue. My own project scales the stroke widths by the square root of the determinant of the matrix in that case which would scale up the strokes but would not create non-uniform stroke scaling. Having the actual values in the object update based on the group modification would require re-rendering a properly dom based system (which I'm planning on writing and doing like that). You'd have to take the change in group and flag everything below it as dirty and then process the rerender if any function tried to get an update on that point. |
Yes, with "expected behaviour" I was referring to how Inkscape handles SVG group transformations. I see Inkscape as the de facto reference implementation of the SVG standard, but that is a personal opinion and in this case there is no conflict with the W3C standard. For some projects rendering is irrelevant (i.e. I was using svgpathtools for the optimization of SVG files for laser cutters). Even for most projects where rendering matters, tatarizes approach will match the expected result or will be very close to it. |
Example in an SVG created in Inkscape:
The group transformation is ignored when calling
svgpathtools.svg2paths()
.The text was updated successfully, but these errors were encountered: