Sparsemax

The Sparsemax class implements the sparsemax transformation introduced in the paper “From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification” (https://arxiv.org/pdf/1602.02068.pdf). Like softmax, it is used as an activation function that maps scores to a probability distribution, but its output can be sparse: some of the entries are exactly zero.

The class applies sparsemax along a specified dimension and can be used as a module in neural network architectures.

activations_plus.Sparsemax.__init__(self, dim: int = -1) → None

Initialize the Sparsemax activation function.

Parameters: dim (int, optional) – The dimension along which to apply the Sparsemax operation. Defaults to -1, indicating the last dimension.

activations_plus.Sparsemax.forward(self, x: Tensor) → Tensor

Apply the sparsemax function along the specified dimension.

Sparsemax is a neural network activation function that maps input logits to probabilities, similar to softmax. Unlike softmax, it can lead to sparse probability distributions where some probabilities are exactly zero.

Parameters: x (Tensor) – The input tensor to which the sparsemax function will be applied.
Returns: The tensor after applying the sparsemax operation along the specified dimension.

Reference Paper: Sparsemax Activation Function

Mathematical Explanation:

The Sparsemax activation function maps inputs to a probability distribution, similar to softmax. It is defined as the Euclidean projection of the input onto the probability simplex, which is what allows some entries of the output to be exactly zero.

\[\text{Sparsemax}(z) = \underset{p \in \Delta^{d-1}}{\operatorname{argmin}} \|p - z\|^2\]

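where \(z \in \mathbb{R}^d\) is the input vector and \(\Delta^{d-1}\) is the \((d-1)\)-dimensional probability simplex, i.e. the set of vectors in \(\mathbb{R}^d\) with non-negative entries that sum to 1.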
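The projection above can be computed in closed form by sorting the scores and subtracting a threshold, following the procedure described in the referenced paper. The block below is a minimal pure-Python sketch of that idea for a single 1-D list of scores. The helper names `sparsemax` and `softmax` are illustrative assumptions, not part of activations_plus, and the library's tensor implementation may differ in its details.

```python
import math


def sparsemax(z):
    """Project a 1-D list of scores onto the probability simplex.

    Illustrative helper (not the activations_plus API).
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The k-th largest score is in the support while
        # 1 + k * z_k exceeds the sum of the k largest scores.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break
    # Shift by the threshold and clip negatives to exactly zero.
    return [max(z_i - tau, 0.0) for z_i in z]


def softmax(z):
    """Standard softmax, included only for comparison."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 0.8, -1.0]
print(sparsemax(scores))  # approximately [0.6, 0.4, 0.0]
print(softmax(scores))    # every entry is strictly positive
```

For these scores the threshold is 0.4, so the lowest score is clipped to exactly zero while softmax still assigns it a small positive probability.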

Example Usage:

import torch
from activations_plus.sparsemax import SparsemaxFunction

# Two rows of logits; sparsemax is applied to each row independently.
input_tensor = torch.tensor([[0.5, 2.0, 1.0], [1.0, 0.0, 3.0]])
sparsemax = SparsemaxFunction.apply
output_tensor = sparsemax(input_tensor, 1)  # Pass the dimension as a positional argument
# Each row sums to 1; the projection gives [[0., 1., 0.], [0., 0., 1.]]
print(output_tensor)