Truncate the length of variable names #332
Comments
There is not. I am open to adding this, either as a separate function or as part of Would you shorten it by truncating from the end, then forcing uniqueness? I sometimes have very long survey questions with identical beginnings and in those cases I take maybe the first and last 10 characters, separated by a |
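For illustration, the two approaches mentioned above (truncating from the end and then forcing uniqueness, or keeping the first and last 10 characters) might look roughly like this in base R. This is only a sketch: `truncate_names()` and `head_tail_names()` are hypothetical helpers, not janitor functions, and the `_` separator and suffix style are assumptions.

```r
# Hypothetical helper (not part of janitor): cut each name to `max_len`
# characters from the end, then force uniqueness with make.unique().
truncate_names <- function(x, max_len = 24) {
  short <- substr(x, 1, max_len)
  make.unique(short, sep = "_")
}

# Hypothetical variant: for long names, keep the first and last `n`
# characters joined by `sep` (the separator is an assumption).
head_tail_names <- function(x, n = 10, sep = "_") {
  long <- nchar(x) > 2 * n + nchar(sep)
  if (any(long)) {
    len <- nchar(x[long])
    x[long] <- paste0(
      substr(x[long], 1, n),
      sep,
      substr(x[long], len - n + 1, len)
    )
  }
  make.unique(x, sep = "_")
}

survey <- c(
  "please_indicate_your_level_of_agreement_with_the_course",
  "please_indicate_your_level_of_agreement_with_the_instructor"
)

truncate_names(survey, max_len = 24)
# "please_indicate_your_lev"   "please_indicate_your_lev_1"

head_tail_names(survey, n = 10)
# "please_ind_the_course" "please_ind_instructor"
```

With identical beginnings, plain truncation leaves only the uniqueness suffix to tell the columns apart, while the first-and-last variant keeps the distinguishing ending.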
Yes, that sounds most appealing. |
I second |
Is this the same as #201 ? That's a couple of votes then. I wasn't aware of I like the idea but this would be adding a bunch of new arguments to |
I'll try to pull something like this into the #340 rewrite. |
If you think it doesn't add too much for something that already exists in a
modular form in abbreviate.
Maybe just a max length argument and in the docs note that finer control of
truncation can be achieved through abbreviate?
On Sat, Mar 7, 2020, 8:00 AM Bill Denney wrote:
> I'll try to pull something like this into the #340 rewrite.
|
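As a rough comparison of the two options in this comment, base R's `abbreviate()` can be set beside a plain cut at a maximum length. The example names and the target length of 12 are made up for illustration; nothing here reflects an actual `clean_names()` argument.

```r
# Example column names (invented for this sketch).
long_names <- c(
  "mean_arterial_pressure_baseline",
  "mean_arterial_pressure_followup"
)

# abbreviate() from base R: with the default method it drops lower-case
# vowels first (starting from the right), then other lower-case letters,
# aiming for `minlength` characters. With strict = FALSE (the default)
# results may be longer than `minlength` so that they stay unique.
abbreviated <- abbreviate(long_names, minlength = 12, named = FALSE)
abbreviated

# A plain cut at a maximum length, for comparison. Names with a shared
# beginning collapse to the same string and would still need de-duplicating.
truncated <- substr(long_names, 1, 12)
truncated
```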
I realized that I missed this feature in the |
I'm not finding abbreviate useful for my cases where the variable is a long text string from a survey.
This is not useful:
I much prefer:
As a better use of 30 characters. I may be biased by this being my main use case for this problem. |
I agree that a truncated version is more useful than a long abbreviation. I played around with One thing that occurred to me: are there special things to watch for that are more critical pieces of info in a long colname that we want to try to keep? For instance numerals - "Please indicate your level in 2019", "Please indicate your level in 2020", or capitals, etc? If so, we could check for these markers and make sure they are always kept. Another thing would be to combine truncation with abbreviation for a string above some number of chars - truncate the first 15 or 20 keeping the special markers, then abbreviate the rest? |
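One possible reading of "keep the special markers" is to always preserve a trailing run of digits, such as a year, when truncating. The sketch below assumes that interpretation; `truncate_keep_digits()` is a hypothetical helper, and handling of capitals or markers in the middle of a name is left out.

```r
# Hypothetical helper: truncate to `max_len` characters, but if the name
# ends in digits (for example a year), keep those digits at the end.
truncate_keep_digits <- function(x, max_len = 20) {
  has_digits <- grepl("[0-9]+$", x)
  # Trailing digit run, or "" when the name does not end in digits.
  suffix <- ifelse(
    has_digits,
    sub("^.*?([0-9]+)$", "\\1", x, perl = TRUE),
    ""
  )
  stem <- substr(x, 1, nchar(x) - nchar(suffix))
  # Characters of the stem that still fit once the suffix is reserved.
  keep <- pmax(max_len - nchar(suffix), 0)
  ifelse(nchar(x) > max_len, paste0(substr(stem, 1, keep), suffix), x)
}

questions <- c(
  "please_indicate_your_level_in_2019",
  "please_indicate_your_level_in_2020"
)

truncate_keep_digits(questions, max_len = 20)
# "please_indicate_2019" "please_indicate_2020"
```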
@jzadra, I was thinking that if we went down a path with a lot of controls for how abbreviation happens, then we may want another pair of functions (e.g.

    data %>%
      abbrev_names([all the controls for abbreviation]) %>%
      clean_names([all the controls for cleaning])

In my mind, the part that fits in
One other item that I'd suggest is that abbreviated names would not be guaranteed to have the number of characters due to duplicated column names having
For my use cases, I find myself often doing the following:

    data %>%
      clean_names() %>%
      rename(
        # make the names what I actually want them to be, but
        # start from something known to be ok and unique
      ) |
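The point about length not being guaranteed can be seen in a small base R sketch: once duplicates get a suffix, a name can exceed the requested maximum. The names, the limit of 10, and the `_1` suffix style (from `make.unique()`) are assumptions for illustration and may differ from what janitor does.

```r
max_len <- 10
nms <- c("satisfaction_overall", "satisfaction_with_staff")

# Truncate first, then de-duplicate: the suffix pushes one name past max_len.
short <- make.unique(substr(nms, 1, max_len), sep = "_")
short
# "satisfacti"   "satisfacti_1"
nchar(short)
# 10 12

# One workaround: reserve room for the suffix before truncating.
# This covers single-digit suffixes only.
reserve <- 2
short_reserved <- make.unique(substr(nms, 1, max_len - reserve), sep = "_")
all(nchar(short_reserved) <= max_len)
# TRUE
```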
That makes sense to me. And I often do something similar to your use case - let clean_names() get them most of the way there and unique, and then modify what needs to be changed manually. |
Feature requests
Just to confirm: there's no option yet to truncate variable names?
E.g.
clean_names(x, max = 24)