Create dataset loader for Indonesian Poem Tweets #214

Open
SamuelCahyawijaya opened this issue Aug 16, 2022 · 8 comments

@SamuelCahyawijaya
Member

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_poem_tweets

Dataset: id_poem_tweets
Description: Indonesian Poem Tweets is a dataset crawled from Twitter. The purpose of this data is to build a text generation model for short texts and to make sure the generated texts are coherent and rhythmic.
License: CC-BY 4.0
@aliakbars
Contributor

#self-assign

@bryanwilie
Collaborator

Hi @aliakbars, are you still working on this? If there's no reply, I will assume inactivity and free up the assignment. Thanks!

@aliakbars
Contributor

Hi @bryanwilie, yes, I'm working on this. I'll create the PR ASAP. Sorry for the delay.

@bryanwilie
Collaborator

No worries @aliakbars, please take your time. Thank you for contributing!

@aliakbars
Contributor

I just did some exploratory data analysis. I found that the tweets come from only 6 users (some might be retweets). Also, the data is not filtered yet. Some of the tweets are replies, e.g.

"RT <screen_name>: Siap-siap"

or an image/video, e.g.

"RT <screen_name>: https://t.co/Z6Ls07s1bn"

Should we proceed with this?

It does have local languages, e.g. Sundanese, though.
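A minimal sketch of how such rows could be flagged before deciding, assuming the tweet text follows the `RT <screen_name>: ...` shape shown above. The helper names and regular expressions are hypothetical and are not part of the dataset or of any existing loader:

```python
import re

# Assumed pattern: a retweet starts with "RT", a screen name, then a colon.
RT_PREFIX = re.compile(r"^RT\s+\S+:\s*")
# Assumed pattern: any http(s) link, such as the t.co links in the examples.
URL = re.compile(r"https?://\S+")


def is_retweet(text: str) -> bool:
    """Return True if the text starts with an 'RT <name>:' prefix."""
    return RT_PREFIX.match(text) is not None


def is_link_only(text: str) -> bool:
    """Return True if nothing but links remains after dropping the RT prefix."""
    body = RT_PREFIX.sub("", text, count=1)
    return URL.sub("", body).strip() == ""


samples = [
    "RT <screen_name>: Siap-siap",
    "RT <screen_name>: https://t.co/Z6Ls07s1bn",
]

for tweet in samples:
    print(is_retweet(tweet), is_link_only(tweet), tweet)
```

Flags like these only mark candidates for review. Whether retweets should be dropped or kept is a separate decision for the dataset.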

@aliakbars
Contributor

@bryanwilie What do you think about this issue?

@SamuelCahyawijaya
Member Author

SamuelCahyawijaya commented Oct 27, 2022

Hi @aliakbars, thank you for the update, and I apologize for the late reply.
Later on, we plan to label the quality of all the datasets in NusaCatalogue, so we can go ahead and push this one through for now.
