Add more rows to data set; discuss reconciliation, GREL, extensions/packages #29
Comments
I understand this is an issue from the previous Curriculum Advisory Committee, when other people maintained this lesson. Regardless, I would still like to thank that CAC for bringing up these action items and see how we can address them. I have referred to these suggestions on multiple occasions, so they haven't been ignored all these years. TL;DR: regarding the action items, I would say:
As an instructor of this lesson, I agree that 131 rows of data is not a lot. Instead of using facets or clustering to find outliers, you could fix incorrect values in these rows manually. What would be a good number? 1000, which according to the minutes is half the dataset?
Reconciliation is very useful too, but we would need things to reconcile, preferably with example research questions that help learners understand why you would reconcile. Perhaps the easiest example would be to reconcile the names of the villages, districts, province and wards. There may be a possibility to use this to spot that in one row the village is said to be in the incorrect ward or district.
If I understand correctly, packages in the action items refers to extensions and distributions for OpenRefine. I think that could be a topic for the discussion page/section, because support for extensions across OR versions varies a lot and the distributions appear to be niche products. They can be powerful, but I feel they are less suited to people who first discover OR.
More GREL: yes! GREL is of course the way to transform the data. I wonder if GREL should be introduced with simpler examples than
Overall, I would like to check in with the current CAC for their views and suggestions. I suggested several potential improvements for this lesson in #102, #122, #108 that would require or benefit from CAC input. They influence how much time opens up for other learning objectives.
I posted a link to this issue in the lesson's Slack channel, with questions that are discussed here. @ostephens responded as follows (copied with permission):
How many rows should the dataset have?
What kind of GREL expressions should be added to the lesson?
Is reconciliation useful for this dataset?
Should we discuss extensions and alternative distributions of OpenRefine?
@datacarpentry/curriculum-advisors-social-science Your input would be very welcome.
One comment from teaching this recently, about the items list column: the lesson uses GREL to facet by subsets of the column but doesn't demonstrate how to change that column into something more usable (such as dummy variables for each category of item once they're cleaned). As a bonus, parsing it into columns also highlights for learners the difference between cell transforms, multi-valued cell splits, and column splits. All of that said, adding more GREL is also tricky when learners don't have programming experience, because chaining functions can rapidly become confusing to novice coders.
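For readers unfamiliar with the term, here is a minimal sketch of what "dummy variables for each category of item" means, written in Python rather than GREL or OpenRefine. It is only an illustration: the rows, the `;` separator, the item names, and the helper names `split_items` and `to_dummies` are all assumptions made up for this example, not the lesson's data or method.

```python
# Hypothetical rows: (household ID, multi-valued items cell).
# The ";" separator and the item names are assumptions for illustration.
rows = [
    (1, "bicycle;television;solar_panel"),
    (2, "radio"),
    (3, ""),
    (4, "television; radio"),
]


def split_items(cell, sep=";"):
    """Split one multi-valued cell into a list of trimmed, lowercased items."""
    return [part.strip().lower() for part in cell.split(sep) if part.strip()]


def to_dummies(rows, sep=";"):
    """Return (sorted item names, {row_id: {item: 0 or 1}})."""
    # Collect every distinct item that appears in any row.
    items = sorted({item for _, cell in rows for item in split_items(cell, sep)})
    table = {}
    for row_id, cell in rows:
        present = set(split_items(cell, sep))
        # One 0/1 flag per item: 1 if this row's cell lists the item.
        table[row_id] = {item: int(item in present) for item in items}
    return items, table


items, dummies = to_dummies(rows)
print(items)
for row_id, flags in dummies.items():
    print(row_id, flags)
```

Each distinct item becomes its own 0/1 column keyed by the row ID, which is the shape a statistics package usually expects. An empty cell simply produces a row of zeros.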
Thanks for your responses, @ndporter and @eirini-zormpa! I look forward to the results of your discussion. As to your comment, @ndporter: the idea of using OpenRefine to create dummy variables from the items column had not yet crossed my mind. I like it. After trying and going through the manual and Stack Overflow for a little bit, I think it is doable, but not in this workshop. It requires exporting the ID and items columns, doing the transformation in a new project and then importing the new columns (
As a Maintainer, I would like to be able to close this issue after five years. It has been open for so long because it is a collection of suggestions. Some suggestions can be worked on, but others are probably out of scope for the lesson. To allow for more targeted discussion and decisions, as well as progress on incorporating them, I updated #108 to also track the expansion of the data set with more rows, and I created separate issues for the other suggestions.
I will copy relevant comments to these other issues, so we can continue the discussions there and close this issue.
The Social Sciences CAC ([email protected]) met June 15th and 19th to discuss the full Social Sciences curriculum and provide recommendations to the Maintainers about work for these lessons between now and their publication (September 2018). Their specific action items for this lesson are as follows:
Please see the meeting minutes for more details.