Quick Survey of Recent Deep Learning Trends


Note: the following markdown text was generated automatically from the corresponding PowerPoint lecture file, so errors and misformatting do occur (a lot!).

Summary:

This lecture covers ten deep learning trends that go beyond classic machine learning.

Disclaimer: it is quite hard to fit the important topics of deep learning into a single session. We aim to make the content reasonably digestible at an introductory level. We take a modular view, introducing key variables that break deep learning into chunks covering data, model architecture, tasks, training workflows, and model characteristics. We believe this teaching style gives students context for those design choices and helps them build a much deeper understanding.

Briefing on Recent Trends in Deep Neural Networks

Executive Summary

This document synthesizes an overview of ten significant trends in Deep Neural Networks (DNNs), based on a lecture from the University of Virginia’s CS 4774 Machine Learning course. The analysis is structured around the “Deep Learning in a Nutshell” framework, which deconstructs the machine learning process into core components: Task, Representation (Topology), Score Function, Search/Optimization, Data, Models/Parameters, Hyperparameters/Metrics, and Hardware.

The key trends indicate that innovation in deep learning is occurring across this entire spectrum. Advancements are not limited to model architecture but also encompass the types of problems being solved, the methods used for training and optimization, and the practical challenges of deployment. The ten identified trends are:

  1. Expanding Input Types: Moving beyond traditional grid-like data (images, sequences) to handle complex structures like graphs, trees, and sets.
  2. Symbolic Reasoning: Employing architectures like Neural Turing Machines (NTMs) for tasks requiring program induction and manipulation of symbolic data.
  3. Deep Generative Models: Creating realistic data for applications like image super-resolution, text-to-image synthesis, and DeepFakes, while also aiding in semi-supervised learning and handling missing data.
  4. Deep Reinforcement Learning (RL): Enabling agents to learn complex behaviors from raw observations by maximizing reward signals, exemplified by AlphaGo’s success.
  5. Meta-Learning and AGI: Shifting focus from narrow, task-specific AI towards “learning to learn” and the pursuit of Artificial General Intelligence (AGI), defined by autonomy and the ability to solve novel problems.
  6. Self-Supervised Pre-training: Using autoencoders and reconstruction loss to pre-train network layers on unlabeled data, improving feature detection and mitigating training issues like vanishing gradients.
  7. Generative Adversarial Networks (GANs): A powerful training workflow for generative tasks, with notable advancements seen in models like CycleGAN, Progressive GAN, and StyleGAN.
  8. Automated Machine Learning (AutoML): Developing methods to automate the optimization process and the search for effective DNN architectures, reducing manual effort.
  9. Robustness and Trustworthiness: A growing focus on understanding, verifying, and securing DNNs against adversarial attacks, as well as developing methods for model interpretation and bias detection.
  10. Hardware Adaptation and Efficiency: Compressing large models for deployment on resource-constrained hardware like IoT and mobile devices, using techniques such as pruning, quantization, and filter decomposition.

The “Deep Learning in a Nutshell” Framework

The lecture organizes deep learning concepts and trends within a modular framework that breaks the field into its constituent parts. This structure provides a holistic view of where innovation is occurring.

| Component | Description | Examples from Source |
| --- | --- | --- |
| Task | The ultimate goal or problem to be solved. | Prediction, Generation, Reinforcement, Reasoning, Classification |
| Data (X) | The input fed into the system. | 2D/3D Grids, 1D Grids (sequences), Graphs, Sets, Tabular Data |
| Representation (f()) | The model's topology or architecture that transforms the data. | CNN, RNN, Transformer, GNN, NTM, Multilayer Network |
| Score Function (L()) | A function that measures the model's performance or error. | Cross-Entropy, Mean Squared Error (MSE), Reconstruction Error |
| Search/Optimization | The algorithm used to find the best model parameters by minimizing the score. | Stochastic Gradient Descent (SGD), Backpropagation, Asynchronous SGD |
| Models, Parameters | The learnable components of the model. | Weights, Biases |
| Hyperparameters, Metrics | Settings that configure the learning process, and metrics to evaluate success. | Network architecture choices, learning rate / Accuracy, F1 Score |
| Hardware | The physical infrastructure used for training and deployment. | GPU, TPU, Edge devices |
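
To make the framework concrete, here is a minimal sketch (our own illustration, not code from the lecture) that maps each component onto a toy linear regression: data X, a representation f(), a score function L(), and gradient-descent search over the parameters.

```python
import numpy as np

# Data (X): a toy regression dataset (all names here are our own).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Models/Parameters: the learnable weights of the representation.
w = np.zeros(3)

def f(X, w):
    """Representation f(): here, the simplest possible model, a linear map."""
    return X @ w

def L(pred, y):
    """Score function L(): mean squared error."""
    return np.mean((pred - y) ** 2)

# Search/Optimization: gradient descent on L.
lr = 0.1  # Hyperparameter: learning rate
for step in range(200):
    grad = 2 * X.T @ (f(X, w) - y) / len(y)
    w -= lr * grad

print("final MSE:", round(L(f(X, w), y), 4))  # Metric: training error
```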

Module I: Innovations in Representation and Architecture

This module focuses on trends related to the Representation component of the framework, specifically the types of data a DNN can process and the architectures designed for them.

Trend 1: DNNs for Graphs, Trees, and Sets

Deep learning is expanding beyond grid-structured data. New architectures are being developed to handle more complex and irregular data formats (a toy message-passing sketch follows the list):

- Graphs, processed by Graph Neural Networks (GNNs) that pass messages along edges
- Trees, which capture hierarchical structure
- Sets, which require models whose output does not depend on element order
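
As a concrete illustration of Trend 1, the sketch below implements one message-passing layer in the style of a graph convolutional network. It is a simplified toy; the names and the normalization scheme are our own choices, not from the lecture.

```python
import numpy as np

# One GCN-style message-passing layer: each node aggregates its
# neighbors' features (plus its own), then applies a shared linear
# map and a nonlinearity. Toy 4-node graph; our own illustration.
rng = np.random.default_rng(0)

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # adjacency matrix
A_hat = A + np.eye(4)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # degree normalization

H = rng.normal(size=(4, 8))    # node features: 4 nodes, 8 dims each
W = rng.normal(size=(8, 16))   # learnable layer weights

H_next = np.maximum(0, D_inv @ A_hat @ H @ W)  # ReLU(normalized aggregation)
print(H_next.shape)            # (4, 16): updated per-node embeddings
```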

Trend 2: Symbolic Reasoning and Program Induction

This trend addresses tasks that require reasoning over symbolic inputs and outputs, such as computer programs. Architectures like Neural Turing Machines (NTMs) couple a neural controller with an external memory that is read and written differentiably, enabling program induction and the manipulation of symbolic data.
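
The differentiable, content-based memory addressing at the heart of an NTM can be sketched in a few lines. The function below is a simplified illustration under our own assumptions (cosine similarity plus a softmax with a sharpening parameter beta); a real NTM adds write heads, location-based shifts, and a learned controller.

```python
import numpy as np

# Content-based addressing, the core NTM read operation (simplified):
# compare a query key against every memory row, turn similarities into
# a soft (differentiable) weighting over slots, and read a blend.
def content_weights(memory, key, beta=10.0):
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)    # beta sharpens the focus
    return w / w.sum()         # softmax over memory slots

rng = np.random.default_rng(0)
M = rng.normal(size=(10, 6))             # memory: 10 slots of width 6
key = M[3] + 0.05 * rng.normal(size=6)   # noisy query for slot 3
w = content_weights(M, key)
print(w.argmax(), np.round(w.max(), 2))  # slot 3 dominates the weighting
read = w @ M                             # differentiable read vector
```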

Foundational Architectures

The lecture also recaps foundational architectures that remain central to the field (a toy convolution sketch follows the list):

- Convolutional Neural Networks (CNNs) for 2D/3D grid data such as images
- Recurrent Neural Networks (RNNs) for 1D grids (sequences)
- Transformers, attention-based models for sequences and beyond
- Multilayer (fully connected) networks for tabular data
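
As a refresher on the CNN building block, here is a toy 2-D convolution (our own illustration): a small filter slides over an image grid and records a response at each position.

```python
import numpy as np

# Toy 2-D convolution, the core CNN operation: slide a 3x3 filter over
# the image and record the filter response at each position.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3] = 1.0   # a bright vertical line
sobel = np.array([[1, 0, -1],
                  [2, 0, -2],
                  [1, 0, -1]], dtype=float)  # vertical-edge filter
print(conv2d(image, sobel))  # strong +/- responses flanking the line
```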

Module II: Expanding the Scope of Solvable Tasks

This module explores trends related to the Task component, highlighting new types of problems that deep learning is being applied to.

Trend 3: Deep Generative Models

These models are designed to generate new, realistic data. Their applications and benefits include (a toy sampling sketch follows the list):

- Image super-resolution
- Text-to-image synthesis
- DeepFakes
- Aiding semi-supervised learning and handling missing data
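
The recipe shared by most deep generative models can be sketched as follows (our own illustration with made-up dimensions): sample a latent code from a simple prior, then decode it into data space. Training, which these few lines omit, is what makes a real model's outputs realistic.

```python
import numpy as np

# The shared generative recipe: sample a latent code z from a simple
# prior, then decode it into data space. The decoder here is an
# untrained toy MLP (our own made-up sizes), so its "images" are noise.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(16, 128)), np.zeros(128)
W2, b2 = 0.1 * rng.normal(size=(128, 784)), np.zeros(784)

def generate(n):
    z = rng.normal(size=(n, 16))               # z ~ N(0, I) prior
    h = np.tanh(z @ W1 + b1)                   # hidden layer
    return 1 / (1 + np.exp(-(h @ W2 + b2)))    # pixels in (0, 1)

samples = generate(5)
print(samples.shape)   # (5, 784): five generated 28x28 "images"
```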

Trend 4: Deep Reinforcement Learning (RL)

Deep RL combines deep neural networks with reinforcement learning principles: an agent learns complex behaviors from raw observations by maximizing a reward signal, as exemplified by AlphaGo's success.
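
The reward-maximization loop can be sketched with tabular Q-learning on a toy chain environment (our own illustration; a deep RL method such as DQN replaces the table with a neural network that reads raw observations).

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: the agent must move right to
# reach a reward in the final state.
n_states, n_actions = 5, 2   # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions)       # random exploration (off-policy)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:-1])   # learned policy: [1 1 1 1], always go right
```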

Trend 5: Meta-Learning and the Pursuit of Artificial General Intelligence (AGI)

This trend focuses on "learning to learn" and moving beyond task-specific "Narrow AI" toward Artificial General Intelligence (AGI), defined in the lecture by autonomy and the ability to solve novel problems.
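
One way to make "learning to learn" concrete is a Reptile-style meta-learning loop, sketched below under our own toy assumptions (one-parameter linear tasks): the outer loop learns an initialization from which a few inner gradient steps adapt well to any sampled task.

```python
import numpy as np

# Reptile-style meta-learning on toy tasks y = a * x: the outer loop
# learns an initialization w_init from which a few inner SGD steps fit
# any sampled slope a well. All numbers are our own toy choices.
rng = np.random.default_rng(0)
w_init = 0.0

def inner_adapt(w, a, steps=5, lr=0.1):
    """A few SGD steps on one task's squared-error loss."""
    x = rng.normal(size=20)
    y = a * x
    for _ in range(steps):
        w -= lr * 2 * np.mean((w * x - y) * x)
    return w

for _ in range(1000):
    a = rng.uniform(0.5, 2.0)               # sample a task (a slope)
    w_task = inner_adapt(w_init, a)         # adapt to the task
    w_init += 0.05 * (w_task - w_init)      # Reptile outer update

print(round(w_init, 2))   # near the center of the task family (~1.25)
```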

Module III: Evolving Training and Optimization Workflows

This module covers trends related to the Search/Optimization and Score Function components, focusing on new ways to train models effectively.

Trend 6: Self-Supervised Learning and Autoencoders

This approach involves training models on unlabeled data by creating a supervisory signal from the data itself. A canonical example is the autoencoder, trained with a reconstruction loss to pre-train network layers, improving feature detection and mitigating training issues such as vanishing gradients.
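
A minimal version of this idea (our own illustration): a linear autoencoder trained purely from unlabeled inputs, with the reconstruction error serving as the supervisory signal.

```python
import numpy as np

# Linear autoencoder on unlabeled, rank-3 data: the reconstruction
# error is the supervisory signal, so no labels are needed. Sizes and
# rates are our own toy choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))  # rank-3 inputs

W_enc = 0.1 * rng.normal(size=(8, 3))   # encoder: 8 dims -> 3-dim code
W_dec = 0.1 * rng.normal(size=(3, 8))   # decoder: code -> 8 dims
lr = 0.01

def recon_loss():
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

start = recon_loss()
for step in range(2000):
    Z = X @ W_enc                 # codes: the learned representation
    err = Z @ W_dec - X           # reconstruction error
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print(round(start, 3), "->", round(recon_loss(), 3))  # error drops
```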

Trend 7: Generative Adversarial Networks (GANs)

GANs represent a specific training framework for generative models, involving a competition between a generator and a discriminator (a toy version of this loop is sketched below). Notable advanced GAN architectures cited include:

- CycleGAN
- Progressive GAN
- StyleGAN
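
The adversarial workflow itself fits in a short sketch (our own toy 1-D illustration, not any of the cited architectures): a two-parameter generator and a logistic discriminator are updated with alternating gradient steps on opposing objectives. Adversarial training can oscillate, and matching higher moments would need a richer discriminator than this one.

```python
import numpy as np

# Toy 1-D GAN: generator g(z) = a*z + b mimics samples from N(3, 1);
# discriminator D(x) = sigmoid(w*x + c) tells real from fake.
rng = np.random.default_rng(0)
sig = lambda u: 1.0 / (1.0 + np.exp(-u))
a, b = 1.0, 0.0       # generator parameters
w, c = 0.1, 0.0       # discriminator parameters
lr, n = 0.05, 64

for step in range(3000):
    x_real = 3 + rng.normal(size=n)
    z = rng.normal(size=n)
    x_fake = a * z + b
    # Discriminator: ascend log D(real) + log(1 - D(fake))
    u_r, u_f = w * x_real + c, w * x_fake + c
    w += lr * np.mean((1 - sig(u_r)) * x_real - sig(u_f) * x_fake)
    c += lr * np.mean((1 - sig(u_r)) - sig(u_f))
    # Generator: ascend log D(fake) (the non-saturating GAN loss)
    u_f = w * x_fake + c
    a += lr * np.mean((1 - sig(u_f)) * w * z)
    b += lr * np.mean((1 - sig(u_f)) * w)

print(round(b, 1))  # the fake mean drifts toward the real mean, 3
```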

Trend 8: Automated Machine Learning (AutoML)

AutoML aims to automate aspects of the machine learning pipeline, particularly hyperparameter optimization and the search for effective DNN architectures, reducing manual effort.
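
The simplest AutoML baseline is random search over configurations, sketched below with a synthetic stand-in for the expensive train-and-validate step (everything here is our own toy construction; neural architecture search methods replace the random sampler with a smarter search strategy).

```python
import numpy as np

# Random search over a small configuration space. train_and_score is a
# synthetic stand-in for a real train-then-validate run (it peaks at
# 3 layers, width 64, lr 0.01 by construction).
rng = np.random.default_rng(0)

def train_and_score(n_layers, width, lr):
    return -((n_layers - 3) ** 2
             + (np.log2(width) - 6) ** 2
             + (np.log10(lr) + 2) ** 2)

best_score, best_cfg = -np.inf, None
for trial in range(50):
    cfg = (int(rng.integers(1, 8)),        # depth: 1-7 layers
           2 ** int(rng.integers(3, 10)),  # width: 8-512 units
           10.0 ** rng.uniform(-4, -1))    # learning rate: 1e-4 to 1e-1
    score = train_and_score(*cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg)   # the sampled configuration closest to the optimum
```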

Module IV: Addressing Deployment and Real-World Challenges

This final module addresses trends related to the practical deployment of DNNs, including robustness and hardware efficiency.

Trend 9: Trust, Robustness, and Interpretability

As DNNs are deployed in critical systems, ensuring their reliability and understanding their behavior has become paramount. Key directions include understanding, verifying, and securing DNNs against adversarial attacks, along with methods for model interpretation and bias detection.
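
Adversarial fragility is easy to demonstrate. The sketch below applies a fast-gradient-sign-style perturbation to a toy logistic classifier (our own construction, not an example from the lecture): a small, structured change to the input flips the prediction.

```python
import numpy as np

# FGSM-style attack on a toy logistic classifier: nudge the input in
# the direction of the loss gradient's sign and watch the prediction
# cross the decision boundary.
rng = np.random.default_rng(0)
w = rng.normal(size=10)          # a "trained" linear classifier

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))   # P(class 1)

x = -w / (w @ w)                 # a clean input: logit -1, so class 0
# For logistic loss with true label 0, sign(grad_x loss) = sign(w).
eps = 0.2
x_adv = x + eps * np.sign(w)     # small, structured perturbation

print(round(predict(x), 2), "->", round(predict(x_adv), 2))  # crosses 0.5
```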

Trend 10: Hardware-Aware DNNs and Model Compression

This trend is driven by the need to deploy powerful DNNs on resource-constrained hardware such as mobile and IoT devices (the lecture cites a projection of 20 billion connected IoT devices by 2020). Techniques cited include pruning, quantization, and filter decomposition.
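
Two of the named compression techniques can be sketched directly (our own simplified illustration): magnitude pruning, which zeroes the smallest weights, and uniform 8-bit quantization, which stores the survivors as int8.

```python
import numpy as np

# Magnitude pruning + uniform 8-bit quantization of a weight matrix,
# two of the compression techniques named above (simplified).
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))

# 1) Prune: zero out the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) Quantize: map surviving weights onto 8-bit integer levels.
scale = np.abs(W_pruned).max() / 127
W_q = np.round(W_pruned / scale).astype(np.int8)   # compact int8 storage
W_deq = W_q.astype(np.float64) * scale             # dequantize for use

print(f"sparsity: {np.mean(W_pruned == 0):.0%}, "
      f"max quantization error: {np.abs(W_deq - W_pruned).max():.4f}")
```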