Binary Operations
Elementwise binary operators.
ElementTimes (x, y)
x .* y
Minus (x, y)
x - y
Plus (x, y)
x + y
LogPlus (x, y)
Less (x, y)
Equal (x, y)
Greater (x, y)
GreaterEqual (x, y)
NotEqual (x, y)
LessEqual (x, y)
BS.Boolean.And (x, y)
BS.Boolean.Or (x, y)
BS.Boolean.Xor (x, y)
- x: left input
- y: right input
The dimensions of x and y must match (subject to broadcasting rules, see below).
For the three Boolean operations, both inputs are expected to be either 0 or 1; otherwise the behavior of the functions is unspecified and will in fact change in future versions.
Sparse values are currently not supported.
These functions return the result of the corresponding operations. The relation operators (Equal() etc.) and the three Boolean operations return values that are either 0 or 1.
The output tensor shape is identical to that of the inputs, subject to broadcasting (see below).
These are the common binary operators.
They are applied elementwise.
(Note that BrainScript's * operator is not elementwise, but stands for the matrix product. This is different, for example, from Python's numpy library.)
The dimensions of the inputs must be identical, with the exception of broadcasting.
Broadcasting, a concept that CNTK models after Python's numpy library, means that a dimension in one of the inputs can be 1 where the other input's is not. In that case, the input with the dimension of 1 is copied n times along that dimension, where n is the other input's corresponding dimension.
If the tensor ranks do not match, the missing dimensions of the input with fewer dimensions are assumed to be 1, which triggers broadcasting.
For example, adding a [13 x 1] tensor to a [1 x 42] tensor would yield a [13 x 42] tensor that contains the sums of all combinations.
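The broadcasting rule described above can be sketched in plain Python, outside of CNTK. The helper names broadcast_shape and broadcast_plus are hypothetical, tensors are modeled as nested lists, and padding the shorter shape with trailing 1s is an assumption about how a rank mismatch is resolved. This illustrates the semantics only and is not CNTK's implementation.

```python
def broadcast_shape(shape_x, shape_y):
    """Return the output shape of an elementwise binary op under broadcasting."""
    rank = max(len(shape_x), len(shape_y))
    # Assumption: the shape with fewer dimensions is padded with trailing 1s.
    padded_x = list(shape_x) + [1] * (rank - len(shape_x))
    padded_y = list(shape_y) + [1] * (rank - len(shape_y))
    result = []
    for dim_x, dim_y in zip(padded_x, padded_y):
        if dim_x == dim_y or dim_y == 1:
            result.append(dim_x)   # equal dims, or y is copied dim_x times
        elif dim_x == 1:
            result.append(dim_y)   # x is copied dim_y times
        else:
            raise ValueError(f"incompatible dimensions: {dim_x} vs {dim_y}")
    return tuple(result)


def broadcast_plus(column, row):
    """Add an [m x 1] tensor to a [1 x n] tensor, both given as nested lists."""
    return [[c[0] + r for r in row[0]] for c in column]


print(broadcast_shape((13, 1), (1, 42)))           # (13, 42)
print(broadcast_plus([[1], [2]], [[10, 20, 30]]))  # [[11, 21, 31], [12, 22, 32]]
```

The second call shows the "sums of all combinations" on a small [2 x 1] plus [1 x 3] case.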
The relation operators (Equal() etc.) are not differentiable; their gradient is always considered 0. They can be used for flags, e.g. as a condition argument in the If() operation.
The LogPlus() operation computes the sum of values represented in logarithmic form. I.e., it computes:
LogPlus (x, y) = Log (Exp (x) + Exp (y))
where x and y are logarithms of values.
This operation is useful when dealing with probabilities,
which are often so small that only a logarithmic representation
allows for appropriate numeric accuracy.
Note: Another common name for this operation is log-add-exp, e.g. the logaddexp function in Python's numpy library.
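To make the numeric motivation concrete, here is a minimal Python sketch of the LogPlus formula for scalars. The function name log_plus is hypothetical, and factoring out the larger input is one common way to keep the computation stable, not necessarily what CNTK does internally. Evaluating Log (Exp (x) + Exp (y)) directly would underflow for inputs such as -1000.

```python
import math


def log_plus(x, y):
    """Compute log(exp(x) + exp(y)) without leaving the log domain."""
    hi, lo = max(x, y), min(x, y)
    if math.isinf(hi):
        # Both inputs are -inf (probability 0), or the larger one is +inf.
        return hi
    # Factor out the larger term so that exp() only sees values <= 0.
    return hi + math.log1p(math.exp(lo - hi))


# Adding the probabilities 0.25 and 0.5 in log form gives log(0.75).
print(math.exp(log_plus(math.log(0.25), math.log(0.5))))

# Very small probabilities stay representable: the result is -1000 + log(2).
print(log_plus(-1000.0, -1000.0))
```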
This layer uses the elementwise binary + operator:
z = Sigmoid (W * x + b)
Note that * above is not elementwise, but stands for the matrix product.
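The difference between the matrix product and the elementwise + in this layer can be sketched in plain Python. The helpers mat_vec, plus, and sigmoid are hypothetical stand-ins for the BrainScript operations, and the values of W, x, and b are made up for illustration.

```python
import math


def mat_vec(W, x):
    """Matrix product of W (a list of rows) with the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def plus(a, b):
    """Elementwise sum of two vectors of equal length."""
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(v):
    """Elementwise logistic function."""
    return [1.0 / (1.0 + math.exp(-vi)) for vi in v]


W = [[1.0, 2.0], [0.0, -1.0]]   # illustrative weights
x = [1.0, 1.0]                  # illustrative input
b = [-3.0, 1.0]                 # illustrative bias

# W * x is a matrix product ([3.0, -1.0]); adding b is elementwise ([0.0, 0.0]).
z = sigmoid(plus(mat_vec(W, x), b))
print(z)  # [0.5, 0.5]
```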
The Softmax() activation function can be written using broadcasting Minus:
MySoftmax (z) = Exp (z - ReduceLogSum (z))
Here, ReduceLogSum() reduces the vector z to a scalar by computing its logarithmic sum. Through the broadcasting semantics of subtraction, this scalar is then subtracted from every input value.
This implements the division by the sum over all values in the Softmax function.
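The same construction can be checked numerically with a small Python sketch. The names reduce_log_sum and my_softmax are hypothetical counterparts of the BrainScript functions above, and vectors are plain lists. Subtracting the scalar from each element plays the role of the broadcasting Minus.

```python
import math


def reduce_log_sum(z):
    """Log of the sum of exponentials of all elements, returned as a scalar."""
    top = max(z)
    # Shift by the maximum so that exp() cannot overflow.
    return top + math.log(sum(math.exp(v - top) for v in z))


def my_softmax(z):
    """Softmax written as Exp (z - ReduceLogSum (z))."""
    log_sum = reduce_log_sum(z)
    # The scalar log_sum is subtracted from every element of z.
    return [math.exp(v - log_sum) for v in z]


probs = my_softmax([1.0, 2.0, 3.0])
print(probs)
print(sum(probs))  # approximately 1.0
```

Because subtraction in the log domain corresponds to division, the outputs match the usual definition of Softmax: each exponential divided by the sum of all exponentials.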
The elementwise maximum of two inputs can be computed as a combination of Greater() and If():
MyElementwiseMax (a, b) = If (Greater (a, b), a, b)
This also works with broadcasting. For example, the linear rectifier can be written with this using a scalar constant as the second input:
MyReLU (x) = MyElementwiseMax (x, Constant(0))
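The two definitions above can be mimicked in plain Python to see the 0/1 flags and the scalar broadcasting at work. All function names here are hypothetical stand-ins for the BrainScript operations, vectors are plain lists, and treating any non-zero condition value as true is an assumption about If().

```python
def expand(value, length):
    """Broadcast a scalar to a list of the given length; pass lists through."""
    return value if isinstance(value, list) else [value] * length


def greater(a, b):
    """Elementwise a > b, returning 1.0 or 0.0 for each element."""
    b = expand(b, len(a))
    return [1.0 if ai > bi else 0.0 for ai, bi in zip(a, b)]


def if_(cond, then_values, else_values):
    """Elementwise selection; assumes a non-zero condition picks then_values."""
    then_values = expand(then_values, len(cond))
    else_values = expand(else_values, len(cond))
    return [t if c != 0 else e
            for c, t, e in zip(cond, then_values, else_values)]


def my_elementwise_max(a, b):
    return if_(greater(a, b), a, b)


def my_relu(x):
    # The scalar 0.0 is broadcast against every element of x.
    return my_elementwise_max(x, 0.0)


print(my_elementwise_max([1.0, 5.0], [3.0, 2.0]))  # [3.0, 5.0]
print(my_relu([-1.5, 0.0, 2.5]))                   # [0.0, 0.0, 2.5]
```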