
Add ELMo modules #299

Open · wants to merge 2 commits into master

Conversation

gpengzhi (Collaborator) commented:

Add a texar-styled ELMo encoder adapted from allennlp. The corresponding tokenizer will come in a separate PR.

Resolves some comments in #298.

I checked the implementation of ELMo in allennlp. It seems that they use a customized LSTM, so we cannot implement it directly with our LSTM module. The Highway module they use is also different from our HighwayWrapper. I feel it is better to use their implementations directly, since the correctness of the implementation is guaranteed by their unit tests. Please let me know your thoughts @huzecong
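For context, the highway transformation both libraries implement (in their own ways) follows the standard formulation from Srivastava et al.: a learned gate mixes a nonlinear transform of the input with the input itself. The sketch below is illustrative only, with NumPy and hypothetical names; it is not the allennlp or texar API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = t * H(x) + (1 - t) * x."""
    h = np.maximum(0.0, x @ W_h + b_h)  # nonlinear transform H(x) (ReLU here)
    t = sigmoid(x @ W_t + b_t)          # transform gate t in (0, 1)
    return t * h + (1.0 - t) * x        # carry gate is 1 - t

# A strongly negative gate bias makes the layer near-identity, a common
# initialization so information can initially flow through unchanged.
dim = 4
rng = np.random.default_rng(0)
x = rng.standard_normal(dim)
y = highway(x,
            rng.standard_normal((dim, dim)), np.zeros(dim),
            rng.standard_normal((dim, dim)), np.full(dim, -25.0))
print(np.allclose(y, x, atol=1e-4))  # gate ~ 0, so output ~ input
```

The practical difference between the two libraries is where this logic lives: allennlp's Highway is a self-contained module with its own projections, while texar's HighwayWrapper wraps an existing layer, so the two are not drop-in replacements for each other.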

@gpengzhi gpengzhi requested a review from huzecong February 24, 2020 21:07
codecov bot commented Feb 24, 2020

Codecov Report

Merging #299 into master will increase coverage by 0.29%.
The diff coverage is 86.64%.


@@            Coverage Diff             @@
##           master     #299      +/-   ##
==========================================
+ Coverage   82.63%   82.93%   +0.29%     
==========================================
  Files         207      215       +8     
  Lines       16006    17271    +1265     
==========================================
+ Hits        13226    14323    +1097     
- Misses       2780     2948     +168
Impacted Files Coverage Δ
texar/torch/utils/utils_test.py 99.19% <100%> (+0.25%) ⬆️
texar/torch/utils/test.py 93.75% <100%> (+0.41%) ⬆️
texar/torch/modules/pretrained/__init__.py 100% <100%> (ø) ⬆️
texar/torch/modules/encoders/__init__.py 100% <100%> (ø) ⬆️
texar/torch/modules/pretrained/elmo_test.py 47.05% <47.05%> (ø)
texar/torch/modules/pretrained/elmo.py 53.84% <53.84%> (ø)
texar/torch/modules/encoders/elmo_encoder_test.py 73.33% <73.33%> (ø)
texar/torch/modules/encoders/elmo_encoder.py 74.69% <74.69%> (ø)
texar/torch/utils/utils.py 80.1% <80.64%> (+0.04%) ⬆️
texar/torch/modules/pretrained/elmo_utils.py 84.77% <84.77%> (ø)
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 931ead9...1ad8c7d.

huzecong (Collaborator) commented:

I agree that their implementations are a bit different and it's more reliable to use their code as is, but I don't think I would approve of merging into master at this point.

Granted, if I had to choose between installing allennlp along with its plethora of dependencies and installing this version of texar, I would choose texar. However, ELMo is not the only module worth using in our package, and introducing 3k lines of code plus an additional dependency is a bit too much for a single feature. I still believe it is possible to rewrite the module to reuse more existing parts of our package, but I agree that would mean a lot of work and might not be worth it at this point.

Speaking of too much code (although not related to this PR), I think it's better to move the *_test.py files out of the texar folder into a separate tests/ folder. This way the end user doesn't have to install the test files when they do pip install texar-pytorch. @ZhitingHu What do you think?
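To illustrate the suggestion, a hedged sketch (not the project's actual packaging config) of how a setup.py could exclude a top-level tests/ directory from the distribution; the names and patterns here are hypothetical:

```python
# Hypothetical setup.py fragment: with tests moved to a top-level tests/
# directory, setuptools can exclude them so `pip install texar-pytorch`
# ships only library code.
from setuptools import find_packages

# Everything under texar/ is shipped; the tests package and its
# subpackages are left out of the wheel.
shipped = find_packages(exclude=["tests", "tests.*"])
# The result would then be passed to setup(..., packages=shipped).
```

An alternative that keeps the current in-package layout would be excluding by pattern (e.g. "*.*_test"), but a separate tests/ directory is the more conventional fix.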

Labels: none yet · Projects: none yet · 2 participants