Binary Operations
Elementwise binary operators.
ElementTimes (x, y)      # infix form: x .* y
Minus (x, y)             # infix form: x - y
Plus (x, y)              # infix form: x + y
LogPlus (x, y)
BS.Boolean.And (x, y)
BS.Boolean.Or (x, y)
BS.Boolean.Xor (x, y)
x: left input
y: right input
The dimensions of x and y must match (subject to the broadcasting rules, see below).
For the three Boolean operations, both inputs are expected to be either 0 or 1; otherwise the behavior of these functions is unspecified and will change in future versions.
These functions return the result of the corresponding operation. The three Boolean operations return values that are either 0 or 1.
The output dimension or tensor shape is identical to that of the inputs, subject to broadcasting (see below).
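For instance, the Boolean operations can combine two 0/1-valued inputs elementwise. The following is a minimal sketch; the input names and the dimension are made up for illustration:

maskA = Input {10}                           # 0/1-valued vector with 10 elements
maskB = Input {10}
bothSet    = BS.Boolean.And (maskA, maskB)   # 1 where both inputs are 1
eitherSet  = BS.Boolean.Or  (maskA, maskB)   # 1 where at least one input is 1
exactlyOne = BS.Boolean.Xor (maskA, maskB)   # 1 where exactly one input is 1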
These are the common binary operators. They are applied elementwise.
(Note that BrainScript's * operator is not elementwise, but stands for the matrix product. This is different, for example, from Python's numpy library.)
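A short sketch of the distinction (names and dimensions are made up for illustration):

W = ParameterTensor {(4:3)}    # [4 x 3] weight matrix
v = Input {3}                  # input vector with 3 elements
p = W * v                      # matrix product: result has 4 elements
q = v .* v                     # elementwise product: result has 3 elements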
The dimensions of the inputs must be identical, except where broadcasting applies.
Broadcasting, a concept that CNTK models after Python's numpy library, means that a dimension in one of the inputs can be 1 where the other input's is not. In that case, the input with the 1-dimension is logically copied n times, where n is the corresponding dimension of the other input.
If the tensor ranks do not match, the missing dimensions of the input with fewer dimensions are assumed to be 1, which triggers broadcasting.
For example, adding a [13 x 1] tensor to a [1 x 42] tensor yields a [13 x 42] tensor that contains the sums of all combinations.
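A minimal sketch of this example, assuming two learnable parameters of the shapes above (the names are made up):

colVec  = ParameterTensor {(13:1)}   # tensor of shape [13 x 1]
rowVec  = ParameterTensor {(1:42)}   # tensor of shape [1 x 42]
allSums = colVec + rowVec            # both inputs broadcast to shape [13 x 42]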
The LogPlus() operation computes the sum of values represented in logarithmic form. That is, it computes:

LogPlus (x, y) = Log (Exp (x) + Exp (y))

where x and y are logarithms of values. This operation is useful when dealing with probabilities, which are often so small that only a logarithmic representation allows for adequate numeric accuracy.
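For example, two log-probabilities can be accumulated without ever leaving the logarithmic domain. A minimal sketch (the constant values are only illustrative):

logP1 = Constant {-1000}            # log of a probability far below floating-point range
logP2 = Constant {-1000}
logPSum = LogPlus (logP1, logP2)    # = Log (Exp (-1000) + Exp (-1000)) ≈ -999.3
                                    # evaluating Exp (-1000) directly would underflow to 0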
Elementwise operations currently cannot be applied to sparse vectors.
The standard sigmoid layer uses the elementwise binary +:

z = Sigmoid (W * x + b)

Note that * above is not elementwise, but stands for the matrix product.
Another example is that the Softmax() function can be written using a broadcasting Minus:

Softmax (z) = Exp (z - ReduceLogSum (z))

Here, ReduceLogSum() reduces the vector z to a scalar by computing its logarithmic sum. Through the broadcasting semantics of subtraction, this scalar is then subtracted from every input value. This implements the division by the sum over all values in the Softmax function.
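Written as a BrainScript function definition, the identity above becomes a one-liner (a sketch; MySoftmax is a name chosen here to avoid clashing with the built-in Softmax()):

MySoftmax (z) = Exp (z - ReduceLogSum (z))   # broadcasting Minus implements the division by the sum of exponentials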