Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header rows #44

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Header rows #44

wants to merge 13 commits into from

Conversation

pond
Copy link
Contributor

@pond pond commented Apr 28, 2020

This is something I built for our own use and it might not fall into the ethos or flexibility you prefer to keep in your own original fork - but just in case, here it is :-) - we wanted our Excel output to have bold, centred heading-like text in the first row.

Since there's no obvious way to specify per-cell options given the elegant worksheet << array-of-cell-data semantics, then rather than provide some kind of more complicated expansion of the gem-client-facing API, I decided for a top-level option that just makes the gem auto-format the first output row based on a set of 3 new styles that match the 3 existing styles, except with the "bold" attribute added. This means that it should obey other font styles chosen via other options (provided a bold variant of the font is available) and ought to work correctly with the various different value formatters for different data types, even though we'd normally only expect simple strings in a header row.

I also worked around the test-only issue with RubyZip / ZipTricks re. dos_equals / nil which allowed me to release the v1 constraint on RubyZip and get the newer gem for any bug fixes and security updates it might contain (if on Ruby >= 2.4, else Bundler will pick v1.3 as before).

@felixbuenemann
Copy link
Owner

Hey @pond, thanks for the effort you put into this PR, but I'm going to reject it for now.

I don't object to the idea of having a different format for header rows, but I'm not too happy with this implementation:

  • The format for header rows is a bit arbitrary: While using bold fonts for the headers is quite usual, the conventions differ when it comes to text alignment, since centered text doesn't work well with all data.
  • Since the header row is written along normal data, the implementation overhead is paid for every row, even if it's only needed for the first row.

I think a nicer approach would be to pass the headers as an option when creating a worksheet:

xlsx.write_worksheet headers: ['First Column', 'Second Column'] do |sheet|
  sheet << ['first', 'row']
  sheet << ['second', 'row', 'with', 'more colums']
end

@fsateler
Copy link

Hi,

I'm very interested in a feature like this. However, header rows don't necessarily live in the first row. On the stuff I generate, we do something like:

Report My BlaBla Report
Generation Date 2020-10-12 12:00:00
 
ID Column1 Column2
1 a b
2 a b

So, I would like to add bold to the start-of-table row:

Report My BlaBla Report
Generation Date 2020-10-12 12:00:00
 
ID Column1 Column2
1 a b
2 a b

So, how could one go about doing this? I can put some hours into this, but I'd like to first agree on how the API should look like. I believe there are the following options:

  1. Add a different method. add_header_rows, which applies the header styling.
  2. Add optional parameters to add_row
  3. Make add_row take an optional block, to which the Row is given (and then the row needs methods allow setting the header style).
  4. Other option?

What do you think?

@pond
Copy link
Contributor Author

pond commented Oct 12, 2020

Since we've got to be streaming, then perhaps something as @felixbuenemann describes, with a headers: option; but perhaps it takes either strings in the Array, or Hashes which provide attributes such as text alignment - might be too many features, but it'd only have impact when parsing the header row options which would only happen once for the entire sheet. Since you want the "header" row to appear at a particular line, and since we need streaming output, then it might not be too hard to use a row counter (there may be one internally already) and have e.g. an additional option somewhere which specified the output row that the headers: option describes. Index from 0 or from 1 would be an interesting question; programmers would go with 0, but I think most spreadsheets count from row 1, so that might make more sense.

We're using our fork of this gem still in our own code so it'd be nice to rework this into a form where it could be merged, and be able to use the original gem source again. Time is short, but if I get time, I'll do it!

@felixbuenemann
Copy link
Owner

Since xlsxstream doesn't allow jumping to specifc cells I'm not sure how much sense multiple header rows would make.

Simulating that by inserting rows with nil values isn't something I would suggest.

The main purpose of the gem is to dump data to an Excel sheet in a matter similar to CSV.

Having a header row with bold text is fine by me, but anything more complex than that is IMHO going in the wrong direction.

There are a lot of full featured XLSX gems and I'm really trying to keep this gem lean and maintainable.

@fsateler
Copy link

Having a header row with bold text is fine by me, but anything more complex than that is IMHO going in the wrong direction.

Agreed. That's why I didn't suggest setting arbitrary cells to this format 😄

I'm confused if you were replying to me or @pond (or both!). All I wanted is to select which row is the bold one. Would that be out of scope too?

@pond
Copy link
Contributor Author

pond commented Oct 13, 2020

@felixbuenemann My use of plural option key headers: was only in response to your post on Jun 28, when you use that key as a suggestion. I'm not suggesting multiple header rows. Just one. We tend to refer to those colloquially as "headers" I guess, which I think is why you and I both used headers: plural in our examples.

Accordingly, a simple header option with an array of the strings used in there, with automatic bold, not necessarily any alignment specifier, but perhaps an ability to give the row number when this is emitted, would not be too heavy?

I do totally agree it's a good idea to keep your gem lean and simple, without too many extra features.

@felixbuenemann
Copy link
Owner

@pond The naming "headers:" is fine, I was talking about the suggestion of fsateler, with add_header_rows that would have allowed multiple header rows.

I'm against controlling the row number for header rows, it doesn't make sense to me and you would have to check in the loop for rows constantly instead of writing headers once before the loop.

@pond
Copy link
Contributor Author

pond commented Nov 25, 2020

Hi @felixbuenemann and @fsateler.

Thanks for your feedback on the PR so far. Given the input, I ended up rewriting it in a far simpler form. Sorry it's taken so long to get around to this; 2020 - say no more! 😱

  • @felixbuenemann was concerned about the cost of checking is-header-row on every row's XML generation ("...the implementation overhead is paid for every row, even if it's only needed for the first row") and did not agree with the horizontal & vertical alignment I chose.
  • @fsateler was hoping to be able to output header rows in places other than the first row, but there was again concern about the cost and complexity of this.

I initially merged "upstream" and resolved conflicts. Then I set about implementing @felixbuenemann's suggested alternative xlsx.write_worksheet headers: ... approach. It was still difficult to avoid per-row cost - anything inside the Row class would always require some ability to distinguish between is-header or not, either when the Row was being constructed or when it was generating XML output. I felt that I was just shuffling code and options around, but not tackling any core concerns.

In a "why didn't I think of this sooner" moment, I remembered that Ruby is object orientated 😂 and just subclassed Row as HeaderRow. Suddenly, everything is simpler. The test file for the Row class is no longer polluted by a long list of extra "header" checks; that's now in its own test file, unit testing the HeaderRow class. There are no additional options being passed from layer to layer, so no pollution of concerns, no extra lines of code therein and no extra tests. All we have is Worksheet#add_header_row as a companion to Worksheet#add_row, which has the same implementation but just constructs a HeaderRow instead.

Better yet, since one can choose to (or choose not to) call #add_header_row at any time during streaming out data into the worksheet, you can add a header wherever you like - so @fsateler's unusual use case of "not first row" is catered for at no cost. There is no runtime overhead for non-header rows.

I've also adjusted the alignment for vertical only, with no horizontal alignment. I could drop the alignment specification entirely if you prefer, or potentially could have it as a HeaderRow constructor option but that'd add more code / feature bloat.

Does this look any better, or does it still feel like too much bloat on the code base overall, @felixbuenemann?

@felixbuenemann
Copy link
Owner

Thanks, that sounds like a good approach. I'll look through the code as soon as I find some time.

@felixbuenemann
Copy link
Owner

@pond This is looking great already, using inheritance makes it really clean.

I'm sorry for my horrible response times, but I've been switching jobs and there's just so much going on.

Could you remove the now redundant DATE_STYLE and TIME_STYLE constants as well as the vertical entering in the formats?

I'm not trying to be draconian here with the formatting, but I wan't to keep the possibility for limited custom formatting open and I don't want to break existing formats or strange defaults if I do that. – So for now that means keeping the formatting as limited as possible.

If you're busy let me know, then I'll take care of that myself.

I would also be interested in the dos_equals exception workaround you added: Is that needed for all RubyZip versions or only specific ranges?

@sandstrom
Copy link
Contributor

I understand if people disagree, but I think the pros of adding this (more capability) should be weighted against added complexity. The primary reason being to keep the complexity down, which makes the project easier to maintain over time.

Maybe it would make sense to add a 'discussions' tab to this repo (github feature) and gauge interested before certain capabilities are added?

@pond
Copy link
Contributor Author

pond commented Jun 27, 2021

@felixbuenemann Once again, sorry for slow responses. It's been that kind of year or two pretty much everywhere I think...!

I have removed the unnecessary constants as requested. I also dug deeper into the ZipTricks error. It turns out that this only affects version 5.0.0 and 5.1.0 specifically. Earlier or later versions, including v4.x or >= v5.1.1, seem to be fine. I have therefore removed the workaround as it seems unnecessary given that both the v4 and v5 series gems can be set at versions which work correctly. Happily, this reduces the PR to just the specific feature of interest without needing any additional stuff around the edges :-)

@pond
Copy link
Contributor Author

pond commented Jan 9, 2022

Happy New Year! 🎉

Just adding a bump to this given a ot of time has passed - I know we're all crazy busy. Just wondered how it looked re. merging. It'd be nice to drop my local branch and use the mainstream source if possible 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants