Entmax

A neural network module implementing Entmax15, the α-entmax activation function with α=1.5.

This activation function is based on the paper “Sparse Sequence-to-Sequence Models” (https://arxiv.org/abs/1905.05702). It maps its inputs to a sparse probability distribution, in which some entries can be exactly zero, making it suitable for attention mechanisms and other tasks that benefit from sparsity.

activations_plus.Entmax.__init__(self, dim: int = -1) → None

Entmax15 activation with α=1.5 from https://arxiv.org/abs/1905.05702.

Parameters: dim – The dimension along which the activation is applied. Defaults to -1, the last dimension.

activations_plus.Entmax.forward(self, x: Tensor) → Tensor

Apply the Entmax15 function along a specified dimension.

Entmax15 is a variant of softmax that can assign exactly zero probability to low-scoring entries. It is commonly used in machine learning tasks such as natural language processing, where sparse, non-negative distributions are desired.

Parameters: x – The input tensor to which the Entmax15 function is applied.
Returns: The tensor obtained by applying Entmax15 to the input. It has the same shape as the input; along the chosen dimension its entries are non-negative and sum to 1, and some may be exactly zero depending on the input values.
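To make the behaviour of forward concrete, here is a minimal sketch that computes Entmax15 directly in PyTorch with the sort-based threshold search described in the referenced paper. The function name entmax15_sketch is hypothetical, and this is not necessarily how activations_plus implements the module; it only assumes that torch is installed.

```python
import torch


def entmax15_sketch(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Illustrative Entmax15 (alpha = 1.5); not the library's own code."""
    # Shift by the maximum for numerical stability, then scale by (alpha - 1) = 0.5.
    x = (x - x.max(dim=dim, keepdim=True).values) / 2

    # Sort in descending order so that candidate supports are prefixes.
    x_sorted, _ = torch.sort(x, dim=dim, descending=True)

    # k = 1, 2, ..., d, shaped to broadcast along `dim`.
    d = x.shape[dim]
    shape = [1] * x.dim()
    shape[dim] = -1
    k = torch.arange(1, d + 1, dtype=x.dtype, device=x.device).view(shape)

    # Running mean and variance-like term of the k largest entries.
    mean = x_sorted.cumsum(dim) / k
    mean_sq = (x_sorted ** 2).cumsum(dim) / k
    ss = k * (mean_sq - mean ** 2)
    delta = (1 - ss) / k

    # Candidate threshold for each prefix size k.
    tau = mean - torch.sqrt(delta.clamp(min=0))

    # The support size is the number of prefixes whose threshold is feasible.
    support = (tau <= x_sorted).sum(dim=dim, keepdim=True)
    tau_star = tau.gather(dim, support - 1)

    # Entries at or below the threshold become exactly zero.
    return torch.clamp(x - tau_star, min=0) ** 2


x = torch.tensor([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
y = entmax15_sketch(x, dim=-1)
print(y)              # first row has an exact zero, second row is uniform
print(y.sum(dim=-1))  # each row sums to 1
```

For this input the first row comes out as approximately [0.0, 0.1693, 0.8307], while softmax on the same row would keep all three entries strictly positive. The second row has equal scores, so it maps to the uniform distribution.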

Reference Paper: Entmax Activation Function

Mathematical Explanation:

The Entmax activation function is defined as:

\[\text{Entmax}_\alpha(z) = \underset{p \in \Delta^{d-1}}{\operatorname{argmax}} \left( p \cdot z - \frac{1}{\alpha(\alpha-1)} \sum_{i=1}^d p_i^\alpha \right)\]

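where \(\Delta^{d-1}\) is the probability simplex over \(d\) entries and \(\alpha > 1\) controls the sparsity of the output: the mapping approaches softmax as \(\alpha \to 1\), \(\alpha = 2\) gives sparsemax, and Entmax15 uses \(\alpha = 1.5\).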
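The effect of \(\alpha\) can be checked numerically. The maximizer of the objective above has the form \(p_i = [(\alpha-1) z_i - \tau]_+^{1/(\alpha-1)}\), with the threshold \(\tau\) chosen so that the entries sum to 1. The sketch below finds \(\tau\) by bisection using only the Python standard library. The name entmax_bisect and the iteration count are illustrative assumptions, not part of activations_plus.

```python
def entmax_bisect(z, alpha=1.5, n_iter=60):
    """Illustrative alpha-entmax for a list of floats, valid for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("this sketch only handles alpha > 1")

    # Scale the scores by (alpha - 1), as in the closed-form solution.
    s = [(alpha - 1.0) * v for v in z]
    s_max = max(s)

    def probs(tau):
        # p_i = [s_i - tau]_+ ** (1 / (alpha - 1))
        return [max(v - tau, 0.0) ** (1.0 / (alpha - 1.0)) for v in s]

    # At `lo` the largest entry alone already has mass 1 (sum >= 1);
    # at `hi` every entry has mass at most 1/d (sum <= 1).
    lo = s_max - 1.0
    hi = s_max - (1.0 / len(z)) ** (alpha - 1.0)

    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) >= 1.0:
            lo = mid
        else:
            hi = mid

    # Renormalize to absorb the remaining bisection error.
    p = probs(0.5 * (lo + hi))
    total = sum(p)
    return [v / total for v in p]


scores = [1.0, 2.0, 3.0]
for a in (1.1, 1.5, 2.0):
    p = entmax_bisect(scores, alpha=a)
    n_zero = sum(1 for v in p if v == 0.0)
    print(f"alpha={a}: {[round(v, 4) for v in p]} ({n_zero} exact zeros)")
```

For these scores the output has no exact zeros at \(\alpha = 1.1\), one at \(\alpha = 1.5\), and two at \(\alpha = 2\), so a larger \(\alpha\) yields a sparser distribution.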

Example Usage:

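import torch
from activations_plus.entmax import Entmax

# Apply Entmax15 along the last dimension, so each row is normalized independently.
activation = Entmax(dim=-1)
x = torch.tensor([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
y = activation(x)  # same shape as x: (2, 3)
print("Entmax Output:", y)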
