What Is the Softmax Function in AI Training?
Softmax is an activation function, typically used as the final layer of a deep learning model. Its primary purpose is to convert a vector of raw scores, known as logits, into a probability distribution. Each number in this vector represents the model's raw prediction for one class in a classification task. Softmax rescales these numbers so that they are nonnegative and sum to one, thereby converting them into probabilities.
The Softmax Formula
For a given logit (or score) $L_i$ from a vector of logits $L$, the Softmax function is mathematically expressed as:
$$P_i = \frac{e^{L_i}}{\sum_{j}e^{L_j}}$$
Here’s what each component represents:

$P_i$: This is the probability of the $i$th class. After applying Softmax, $P_i$ indicates how likely it is that the input belongs to class $i$.

$e^{L_i}$: This represents the exponential of the $i$th logit. The exponential function (denoted as $e^x$) is used for transforming each logit into a positive number. The reason for using the exponential function is twofold:

Nonnegative Values: The exponential function ensures that all outputs are nonnegative. Since probabilities cannot be negative, this property is crucial.

Amplifying Differences: The exponential function exaggerates the differences between the logits. Larger logits result in much larger exponentials compared to smaller logits, which helps in making the probabilities more distinct.


$\sum_{j}e^{L_j}$: This is the sum of the exponentials of all logits in the vector. It acts as a normalizing factor, ensuring that the probabilities sum up to 1. By dividing the exponential of a given logit by this sum, Softmax converts the logit scores into probabilities.
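The formula can be sketched in a few lines of plain Python (a minimal illustration, not a production implementation):

```python
import math

def softmax(logits):
    # Subtract the largest logit before exponentiating. This is a common
    # numerical-stability trick: it prevents overflow for large logits and
    # leaves the result unchanged, because softmax is invariant to adding
    # a constant to every logit.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.5, 2.2, 0.3, 0.8]))
```

Since $e^{L_i - m} / \sum_{j} e^{L_j - m} = e^{L_i} / \sum_{j} e^{L_j}$, subtracting the maximum logit $m$ changes nothing mathematically but keeps the exponentials in a safe numeric range.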
Example: Fruit Classification with CNN and Softmax Calculation
Let's look at a simple example of the Softmax calculation. In this scenario, a Convolutional Neural Network (CNN) is tasked with classifying images into fruit categories: apple, orange, banana, and avocado. We will use the Softmax function to turn the CNN's output logits into probabilities.
Given Data
Consider the CNN outputs the following logits for an input image:
Fruit    Logit
Apple    1.5
Orange   2.2
Banana   0.3
Avocado  0.8
These logits represent the network's raw scores for each fruit category based on the input image.
Applying Softmax
The Softmax formula is:
$$P_i = \frac{e^{L_i}}{\sum_{j}e^{L_j}}$$
where $L_i$ is the logit for the $i$th fruit, and $\sum_{j}e^{L_j}$ is the sum of the exponentials of all logits.
Step-by-Step Calculation

Calculate the Exponential of Each Logit:
Fruit    Logit   Exponential
Apple    1.5     $e^{1.5} \approx 4.48$
Orange   2.2     $e^{2.2} \approx 9.03$
Banana   0.3     $e^{0.3} \approx 1.35$
Avocado  0.8     $e^{0.8} \approx 2.23$

Sum the Exponentials:
$\text{Sum} = 4.48 + 9.03 + 1.35 + 2.23 = 17.09$

Divide Each Exponential by the Sum to Get Probabilities:
$P_{apple} = \frac{4.48}{17.09} \approx 0.26$
$P_{orange} = \frac{9.03}{17.09} \approx 0.53$
$P_{banana} = \frac{1.35}{17.09} \approx 0.08$
$P_{avocado} = \frac{2.23}{17.09} \approx 0.13$
Interpretation of Results
After applying the Softmax function, we get the following probability distribution:
The probability that the image is an apple is approximately 26%.
The probability that it is an orange is about 53%.
The probability for a banana is around 8%.
The probability for an avocado is approximately 13%.
These probabilities indicate that the CNN is most confident the image is an orange, with a sizable probability for an apple and lower probabilities for banana and avocado.
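The arithmetic above can be reproduced in a few lines of Python:

```python
import math

logits = {"apple": 1.5, "orange": 2.2, "banana": 0.3, "avocado": 0.8}

# Step 1: exponentiate each logit.
exps = {fruit: math.exp(l) for fruit, l in logits.items()}

# Step 2: sum the exponentials (the normalizing factor).
total = sum(exps.values())

# Step 3: divide each exponential by the sum to get probabilities.
probs = {fruit: e / total for fruit, e in exps.items()}

for fruit, prob in probs.items():
    print(f"{fruit}: {prob:.2f}")
# prints apple: 0.26, orange: 0.53, banana: 0.08, avocado: 0.13
```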
Why Use Softmax?

Probability Distribution: In classification tasks, interpreting the model's predictions as probabilities is incredibly useful. It provides a clear understanding of the model's confidence across different classes.

Differentiable Function: Softmax is continuous and differentiable. This property is essential for backpropagation in neural networks, where gradients are used to update the model's weights.

Handling Multiple Classes: Softmax is particularly suited to multi-class classification problems, as it assigns a distinct probability to each class.
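The differentiability point can be made concrete. Softmax has the well-known closed-form derivative $\partial P_i / \partial L_j = P_i(\delta_{ij} - P_j)$, which a quick finite-difference check confirms (a sketch reusing the example logits from earlier):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.5, 2.2, 0.3, 0.8]
p = softmax(logits)

def analytic(i, j):
    # Closed-form softmax derivative: dP_i/dL_j = P_i * (delta_ij - P_j).
    return p[i] * ((1.0 if i == j else 0.0) - p[j])

# Finite-difference estimate of dP_0/dL_1: nudge logit 1 slightly
# and observe how probability 0 changes.
eps = 1e-6
bumped = list(logits)
bumped[1] += eps
numeric = (softmax(bumped)[0] - p[0]) / eps
```

These are exactly the gradients that backpropagation carries through the final layer during training.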
Application in Deep Learning Models
Softmax is predominantly used in the final layer of neural networks for classification tasks. It takes the logits, which are the outputs of the previous layers, and transforms them into probabilities. These logits are generally real numbers that can be positive, negative, or zero, and do not inherently sum to one.
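In code, this final-layer placement might look like the following toy sketch: a made-up linear layer with random, untrained weights (purely illustrative, not a real model) produces logits, and Softmax turns them into class probabilities.

```python
import math
import random

random.seed(0)  # reproducible placeholder weights

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A toy "final layer": 3 input features -> 4 class logits.
# The weights are random stand-ins, not trained values.
features = [0.5, -1.2, 3.0]
weights = [[random.uniform(-1, 1) for _ in features] for _ in range(4)]

# Logits are plain dot products: positive, negative, or zero,
# with no constraint that they sum to one.
logits = [sum(w * x for w, x in zip(row, features)) for row in weights]

# Softmax converts them into a valid probability distribution.
probs = softmax(logits)
```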