
transformer model based on Tensorlayer #1027

Open
wants to merge 31 commits into base: master
Conversation

ArnoldLIULJ
Member

Checklist

  • I've tested that my changes are compatible with the latest version of TensorFlow.
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary.

Motivation and Context

Description

@ArnoldLIULJ
Member Author

Documentation hasn't been done yet.

import tensorlayer as tl


class MultiHeadAttentionLayer(tl.layers.Layer):
Member

MultiHeadAttention is better than MultiHeadAttentionLayer?
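For context, the layer under review computes attention weights between "query" and "key" and applies them to "value". A minimal NumPy sketch of that computation (standalone and illustrative only, not the PR's actual API):

```python
import numpy as np

def multi_head_attention(q, k, v, num_heads):
    """Toy multi-head attention: split heads, scaled dot-product, merge."""
    batch, seq_len, d_model = q.shape
    d_head = d_model // num_heads

    def split(x):  # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        return x.reshape(batch, -1, num_heads, d_head).transpose(0, 2, 1, 3)

    qh, kh, vh = split(q), split(k), split(v)
    # Scaled dot-product scores between queries and keys.
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    # Softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ vh  # weighted sum of values
    # Merge heads back into d_model.
    return out.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
```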

K = tf.keras.backend


class LazyAdam(tf.keras.optimizers.Adam):
Member

move this part to tl.optimizer ??

Member Author

OKAY.
Also, should I modify the copyright notice at the top? I referenced most of the code from the official TensorFlow implementation.
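For context, "lazy" Adam applies the Adam update only to the embedding rows actually touched by a sparse gradient, leaving the other rows and their moment estimates untouched. A NumPy sketch of one such update step (names and signature are hypothetical, not the PR's implementation):

```python
import numpy as np

def lazy_adam_step(params, m, v, grad_rows, indices,
                   lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, t=1):
    """Apply an Adam update only to the rows named in `indices`."""
    m[indices] = beta1 * m[indices] + (1 - beta1) * grad_rows
    v[indices] = beta2 * v[indices] + (1 - beta2) * grad_rows ** 2
    m_hat = m[indices] / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v[indices] / (1 - beta2 ** t)  # bias-corrected second moment
    params[indices] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```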

@JingqingZ
Member

The dependency on Keras should be removed gradually.

tf.keras.optimizers -> tf.optimizers
tf.keras.initializers -> tl.initializers

@zsdonghao
Member

ready to merge?

@ArnoldLIULJ
Member Author

Add attention visualisation util
Add attention visualisation test
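The visualisation util mentioned here typically renders the attention-weight matrix as a heatmap with input and output tokens on the axes. A minimal matplotlib sketch (function name and signature are hypothetical, not the PR's util):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so it runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(weights, in_tokens, out_tokens, path="attention.png"):
    """Render an attention-weight matrix of shape (out_len, in_len) as a heatmap."""
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(in_tokens)))
    ax.set_xticklabels(in_tokens, rotation=90)
    ax.set_yticks(range(len(out_tokens)))
    ax.set_yticklabels(out_tokens)
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```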

@ArnoldLIULJ
Member Author

ready to merge?

not yet

@ArnoldLIULJ
Member Author

Add documentation
Add attention-weights visualisation and pass unit-testing
READY TO MERGE

@zsdonghao
Member

Hi, could you provide example code in the examples folder and update changelog.md? Thanks.

"""The :class:`MultiHeadAttentionLayer` layer is for multi-head attention computation.
The weight computation is between "key" and "query", which will then matmul with "value" to generate information
that selectively focuses on the "query" messages.
Parameters
Member

missing space

Member Author

done

):
"""Search for sequence of subtoken ids with the largest probability.

Args:
Member

incorrect RST format, should be Parameters with underline
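For reference, the numpydoc/RST convention the reviewer is asking for uses a section heading underlined with dashes. A sketch with illustrative field names (not the PR's actual docstring):

```rst
Parameters
----------
query : Tensor
    The query sequence, shape ``(batch, length_q, d_model)``.
mask : Tensor, optional
    Padding mask applied to the attention logits.

Returns
-------
Tensor
    Attention output, shape ``(batch, length_q, d_model)``.
```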

Member Author

done
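The excerpt above is from the beam-search helper. The idea its docstring describes, keeping the highest-probability prefixes at each decoding step, can be sketched in plain Python (a toy version with illustrative names, not the PR's implementation):

```python
def beam_search(step_logprobs, beam_size, max_len, vocab_size, eos_id):
    """Tiny beam search: keep the beam_size highest-log-probability
    prefixes at each step; a hypothesis finishes when it emits EOS."""
    beams = [([], 0.0)]  # (token ids, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok in range(vocab_size):
                candidates.append((seq + [tok], score + step_logprobs(seq, tok)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos_id else beams).append((seq, score))
        if not beams:  # every surviving hypothesis has finished
            break
    return max(finished + beams, key=lambda c: c[1])
```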

@luomai
Member

luomai commented Sep 2, 2019

There are some naming issues with this PR. Please don't merge it for now.

K = tf.keras.backend


class LazyAdam(tf.optimizers.Adam):
Member

LazyAdamOptimizer

@@ -0,0 +1,147 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
Member

Rename the file to lazy_adam.py

Member Author

done

self.verbose = verbose
if init_steps is None:
init_steps = 0.0
self.steps = float(init_steps) # Total steps during training.
Member

Why is steps a float and not an int?

Member Author

I will get rid of this part. It has been updated in the tf.keras library. Thanks.

import tensorlayer as tl


class FeedForwardLayer(tl.layers.Layer):
Member

How is this FeedForwardLayer different from a conventional feedforward layer? If this is a special implementation, can it have a more specific name?

Member Author

FeedForwardLayer here is specifically designed for the Transformer model and includes two transformations; a conventional feedforward layer contains only one.
Any suggestions on the naming?

Member

TransformerFeedForwardLayer?
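The "two transformations" the author describes are the Transformer's position-wise feed-forward block: a linear expansion, a ReLU, and a linear projection back, applied independently at every sequence position. A NumPy sketch under those assumptions (names illustrative, not the PR's layer):

```python
import numpy as np

def transformer_ffn(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: two linear transformations
    with a ReLU in between, applied at every sequence position."""
    hidden = np.maximum(x @ w1 + b1, 0.0)  # expand to the filter size, ReLU
    return hidden @ w2 + b2                # project back to d_model
```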


import numpy as np
import tensorflow as tf
K = tf.keras.backend
Member

Does this have to be a global variable of tensorlayer?

import tensorflow as tf
import tensorlayer.models.transformer.beamsearchHelper.beam_search_v1 as v1

_StateKeys = v1._StateKeys # pylint: disable=protected-access
Member

We should avoid global variables in the library as much as possible.

Member Author

I will try to optimise this one.

@luomai
Member

luomai commented Sep 12, 2019

@ArnoldLIULJ any update?

@ArnoldLIULJ
Member Author

@ArnoldLIULJ any update?

Was on vacation; I will be working on a simplified tutorial today.

@ArnoldLIULJ
Member Author

Add examples in example/translation_task/tutorial_transformer

@ArnoldLIULJ
Member Author

Hi, could you provide example code in the examples folder and update changelog.md? Thanks.

done

@zsdonghao
Member

Hi, the RST format is not correct in many functions, please check~

Lingjun Liu added 2 commits September 14, 2019 11:18
@ArnoldLIULJ
Member Author

please check

@zsdonghao
Member

I think this one can be merged after the Travis build passes~
