Binary Operations
Elementwise binary operators.
ElementTimes (x, y)      # infix form: x .* y
Minus (x, y)             # infix form: x - y
Plus (x, y)              # infix form: x + y
LogPlus (x, y)
BS.Boolean.And (x, y)
BS.Boolean.Or (x, y)
BS.Boolean.Xor (x, y)
x: left input
y: right input
The dimensions of x and y must match (subject to the broadcasting rules, see below).
For the three Boolean operations, both inputs are expected to be either 0 or 1; otherwise the behavior of these functions is unspecified and will change in future versions.
These functions return the result of the corresponding operation. The three Boolean operations return values that are either 0 or 1.
The output dimension or tensor shape is identical to that of the inputs, subject to broadcasting (see below).
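For instance, the Boolean operations can combine two 0/1-valued inputs elementwise. The following is a minimal sketch; the input names and the dimension are made up for illustration:

maskA = Input {10}                           # 0/1-valued vector with 10 elements
maskB = Input {10}
bothSet    = BS.Boolean.And (maskA, maskB)   # 1 where both inputs are 1
eitherSet  = BS.Boolean.Or  (maskA, maskB)   # 1 where at least one input is 1
exactlyOne = BS.Boolean.Xor (maskA, maskB)   # 1 where exactly one input is 1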
These are the common binary operators. They are applied elementwise.
(Note that BrainScript's * operator is not elementwise, but stands for the matrix product. This is different, for example, from Python's numpy library.)
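A short sketch of the distinction (names and dimensions are made up for illustration):

W = ParameterTensor {(4:3)}    # [4 x 3] weight matrix
v = Input {3}                  # input vector with 3 elements
p = W * v                      # matrix product: result has 4 elements
q = v .* v                     # elementwise product: result has 3 elements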
The dimensions of the inputs must be identical, except where broadcasting applies.
Broadcasting, a concept that CNTK models after Python's numpy library, means that a dimension in one of the inputs can be 1 where the other input's is not. In that case, the input with the 1-dimension is logically copied n times, where n is the corresponding dimension of the other input.
If the tensor ranks do not match, the missing dimensions of the input with fewer dimensions are assumed to be 1, which triggers broadcasting.
For example, adding a [13 x 1] tensor to a [1 x 42] tensor yields a [13 x 42] tensor that contains the sums of all combinations.
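A minimal sketch of this example, assuming two learnable parameters of the shapes above (the names are made up):

colVec  = ParameterTensor {(13:1)}   # tensor of shape [13 x 1]
rowVec  = ParameterTensor {(1:42)}   # tensor of shape [1 x 42]
allSums = colVec + rowVec            # both inputs broadcast to shape [13 x 42]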
The LogPlus() operation computes the sum of values represented in logarithmic form. That is, it computes:

LogPlus (x, y) = Log (Exp (x) + Exp (y))

where x and y are logarithms of values. This operation is useful when dealing with probabilities, which are often so small that only a logarithmic representation allows for adequate numeric accuracy.
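For example, two log-probabilities can be accumulated without ever leaving the logarithmic domain. A minimal sketch (the constant values are only illustrative):

logP1 = Constant {-1000}            # log of a probability far below floating-point range
logP2 = Constant {-1000}
logPSum = LogPlus (logP1, logP2)    # = Log (Exp (-1000) + Exp (-1000)) ≈ -999.3
                                    # evaluating Exp (-1000) directly would underflow to 0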
Elementwise operations currently cannot be applied to sparse vectors.
The standard sigmoid layer uses the elementwise binary +:

z = Sigmoid (W * x + b)

Note that * above is not elementwise, but stands for the matrix product.
Another example is that the Softmax() function can be written using a broadcasting Minus:

Softmax (z) = Exp (z - ReduceLogSum (z))

Here, ReduceLogSum() reduces the vector z to a scalar by computing its logarithmic sum. Through the broadcasting semantics of subtraction, this scalar is then subtracted from every input value. This implements the division by the sum over all values in the Softmax function.
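Written as a BrainScript function definition, the identity above becomes a one-liner (a sketch; MySoftmax is a name chosen here to avoid clashing with the built-in Softmax()):

MySoftmax (z) = Exp (z - ReduceLogSum (z))   # broadcasting Minus implements the division by the sum of exponentials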