ML Talk: Yash Patel

20th January 2022 (Thursday) from 17:00

“Training Neural Networks on Non-Differentiable Losses”

Yash Patel (Visual Recognition Group, FEE CTU in Prague; Amazon Research Award)

Many important computer vision and natural language processing tasks have a non-differentiable objective, yet the standard training procedure for neural networks cannot be applied directly, since back-propagation requires the gradients of the loss with respect to the model’s output. Most deep learning methods side-step the problem, sub-optimally, by using simple proxy loss functions that may or may not align well with the original non-differentiable evaluation metric. For a novel task, an appropriate proxy has to be designed, which may not be feasible for a non-specialist. The presented research aims to optimize neural networks directly on the evaluation metric. For decomposable metrics, this is achieved via a learned surrogate, without requiring gradients of the metric itself. The learned surrogate is realized by a deep embedding in which the Euclidean distance between the model output and the ground truth corresponds to the value of the evaluation metric. For non-decomposable evaluation metrics that involve a step function, Yash et al. propose a non-parametric procedure to approximate the metric. The presented results show that training a neural network on a surrogate of the evaluation metric leads to better performance than training on a proxy loss function. Moreover, this optimization technique makes the training of neural networks more scalable, extending to new tasks in a nearly labour-free manner.
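To give a flavour of the decomposable case, here is a minimal sketch of the surrogate idea in NumPy. All specifics are illustrative assumptions, not the talk's actual method: the non-differentiable metric is taken to be Hamming distance on binary vectors, the embedding is a single linear map `W`, and the dimensions, learning rate, and step count are arbitrary toy choices. The point is only to show the mechanism: fit an embedding so that the (squared) Euclidean distance between embedded outputs matches the metric value, yielding a differentiable stand-in for the metric.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 8  # input and embedding dimensions (toy choices)

def metric(a, b):
    # non-differentiable evaluation metric: Hamming distance
    return float(np.sum(a != b))

def surrogate(a, b, W):
    # differentiable stand-in: squared Euclidean distance in embedding space
    e = W @ (a - b)
    return float(e @ e)

# learn W so that surrogate(a, b, W) ~ metric(a, b) on sampled pairs,
# using only metric *values* -- no gradients of the metric are needed
W = rng.normal(scale=0.1, size=(k, d))
lr = 0.005
for _ in range(5000):
    a = rng.integers(0, 2, d).astype(float)
    b = rng.integers(0, 2, d).astype(float)
    diff = a - b
    e = W @ diff
    err = e @ e - metric(a, b)          # surrogate error on this pair
    grad = 4.0 * err * np.outer(e, diff)  # gradient of err**2 w.r.t. W
    W -= lr * grad
```

Once fitted, `surrogate` can be back-propagated through in place of the raw metric; in the talk's setting the linear map is replaced by a deep embedding network and the metric by the task's actual evaluation measure.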

Location: This talk is organized remotely due to the Covid situation.

Link to join:

More Machine Learning Talks @ Rossum