Silicon Photonics for Machine Learning: Training and Inference

B. J. Shastri(1)(2), M. J. Filipovich(1), Z. Guo(1), P. R. Prucnal(2), C. Huang(2), A. N. Tait(1), S. Shekhar(3), and V. J. Sorger(4)

(1) Department of Physics, Engineering Physics & Astronomy, Queen’s University, Kingston, ON K7L 3N6, Canada, shastri@ieee.org
(2) Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA
(3) Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
(4) Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA

Abstract Photonic neural networks employ optical device physics for neuron models and optical interconnects for distributed, parallel, analog processing, targeting high-bandwidth, low-latency, and low-switching-energy applications in AI and neuromorphic computing. We discuss silicon photonics for machine learning acceleration, covering both inference and in situ training.

Advancements in machine learning (ML) and artificial intelligence (AI) technologies have enabled numerous applications, including sophisticated recommendation models, natural language processing, computer vision, and augmented reality [1], [2]. Training ML algorithms on large data sets has enabled the groundbreaking progress of these AI applications across fields. The interconnection of neurons in artificial neural networks (ANNs) can be described by a matrix, with the processed data represented as a vector. Training deep neural networks on large data sets therefore results in large-scale dense matrix-vector multiplications. The improvement in the performance (i.e., accuracy) of many ML applications comes at the cost of higher computational power requirements [3]. There has been significant progress in the development of digital electronic application-specific integrated circuits (ASICs), known as AI accelerators, that are dedicated to dense matrix computations [4], [5]. However, modern AI accelerators face two significant bottlenecks in energy efficiency: data transfer to and from memory, and large matrix-vector multiplications. Both impose strict physical limitations on the scalability and performance of digital electronic AI accelerators.

Integrated photonic processors enabled by silicon photonics have shown promising capabilities in accelerating tensor (i.e., multidimensional vector and matrix) operations [6]–[9] by exploiting the high bandwidth of photonic devices (modulators and photodetectors), low latency, and the minimal energy-delay product of passive optical waveguides [10]. Some of these processors [7]–[9] are scalable and use the parallel nature of light through wavelength-division multiplexing (WDM) to achieve large-scale interconnects and massively parallel data processing and transfer. Recent developments have shown that wavelength-multiplexed silicon photonic platforms operate with up to 7-bit precision [11] and, most recently, 9-bit precision [12] on each multiplication unit. However, recent studies of these photonic processors have also revealed an increasing demand for a rigorous photonic programming scheme to facilitate efficient communication between the photonic hardware and its control system [6], [7], [10], [13].
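To put the reported bit precisions in perspective, the following minimal sketch compares an exact matrix-vector product with one whose weights are rounded to 7 or 9 bits. It assumes weights normalized to [-1, 1] and an ideal uniform quantizer; both are illustrative assumptions rather than a model of the devices in [11], [12], and the helper names `quantize` and `matvec_error` are hypothetical.

```python
import numpy as np

def quantize(values, bits):
    """Round values in [-1, 1] to the nearest of 2**bits uniformly spaced levels."""
    steps = 2 ** bits - 1                      # number of intervals between -1 and +1
    clipped = np.clip(values, -1.0, 1.0)
    return np.round((clipped + 1.0) / 2.0 * steps) / steps * 2.0 - 1.0

def matvec_error(bits, n=64, seed=0):
    """Largest absolute output error of W @ x when W is quantized to `bits` bits."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n, n))    # assumed normalized weight matrix
    x = rng.uniform(-1.0, 1.0, size=n)         # assumed normalized input vector
    exact = W @ x
    approx = quantize(W, bits) @ x
    return float(np.max(np.abs(exact - approx)))

if __name__ == "__main__":
    for bits in (7, 9):
        print(f"{bits}-bit weights: max |error| = {matvec_error(bits):.5f}")
```

Under these assumptions, each weight is off by at most half a quantization step, 1/(2^bits - 1), so each added bit halves the worst-case weight error.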

Over the past ten years, several photonic neural network approaches [10]–[14] have been proposed. These can be divided into feedforward and recurrent networks (including random recurrent networks, i.e., reservoir computing [15]–[17]), coherent (single-wavelength) [6], [18] and multiwavelength [7], [9], [19]–[22] approaches, continuous-time and spiking networks, or integrated and free-space approaches. In this talk, we will briefly highlight some of these.

An area of machine learning that would benefit from the low power consumption and high information processing bandwidth enabled by photonics is the training of large neural networks. Several photonic architectures have been proposed for executing in-memory computation of neural network inference [6], [7], [19]. However, for a neural network to perform a practical task, the optimal network parameters (weights and biases) must first be determined using deep learning training algorithms. These algorithms have high computation and memory costs that challenge the current hardware platforms executing them [23]. The substantial energy required to train large neural networks using standard von Neumann architectures presents a high financial and environmental cost...

The recently proposed direct feedback alignment (DFA) supervised learning algorithm [25] has gained interest as a biologically plausible alternative to the popular backpropagation training algorithm [26]. DFA propagates the error through fixed random feedback connections directly from the output layer to the hidden layers during the backward pass [26]. Unlike backpropagation, DFA does not require the network layers to be updated sequentially during the backward pass, which makes it a suitable candidate for efficient parallelization using photonics. The algorithm has been used to train neural networks on the MNIST, CIFAR-10, and CIFAR-100 datasets and yields performance comparable to backpropagation [26]. DFA has also been shown to perform comparably to fine-tuned backpropagation in applications requiring state-of-the-art deep learning networks, including natural language processing and neural view synthesis [27]. A recent theory suggests that training shallow networks with DFA occurs in two steps: an alignment phase, in which the weights are modified to align the approximate gradient with the actual gradient of the loss function, followed by a memorization phase, in which the model focuses on fitting the data [28].
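The DFA update described above can be sketched in a few lines of NumPy. This is an illustrative software sketch under stated assumptions, not the photonic implementation: the two tanh hidden layers, sigmoid output, cross-entropy loss, XOR toy data, and all hyperparameters are chosen here for brevity, and the names `train_dfa` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dfa(X, Y, hidden=(16, 16), epochs=2000, lr=0.1, seed=0):
    """Train a small fully connected network with direct feedback alignment."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, Y.shape[1]]
    W = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    # One fixed random feedback matrix per hidden layer: drawn once, never trained.
    B = [rng.normal(0.0, 0.5, (sizes[-1], n)) for n in hidden]

    losses = []
    for _ in range(epochs):
        # Forward pass: keep the input and every hidden activation.
        acts = [X]
        for Wl, bl in zip(W[:-1], b[:-1]):
            acts.append(np.tanh(acts[-1] @ Wl + bl))
        y = sigmoid(acts[-1] @ W[-1] + b[-1])
        y_safe = np.clip(y, 1e-12, 1.0 - 1e-12)
        losses.append(float(-np.mean(Y * np.log(y_safe)
                                     + (1.0 - Y) * np.log(1.0 - y_safe))))

        # Output error: gradient of the cross-entropy w.r.t. the output pre-activation.
        e = y - Y
        # DFA backward pass: each hidden layer's gradient vector uses only the
        # output error and that layer's own feedback matrix and activation, so
        # the list below could be evaluated for all layers in parallel.
        deltas = [(e @ Bl) * (1.0 - h ** 2) for Bl, h in zip(B, acts[1:])]
        deltas.append(e)  # the output layer uses the true error

        for l, d in enumerate(deltas):
            W[l] -= lr * acts[l].T @ d / len(X)
            b[l] -= lr * d.mean(axis=0)
    return W, b, losses

def predict(W, b, X):
    h = X
    for Wl, bl in zip(W[:-1], b[:-1]):
        h = np.tanh(h @ Wl + bl)
    return sigmoid(h @ W[-1] + b[-1])

# Toy usage on XOR (assumed data, for illustration only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W, b, losses = train_dfa(X, Y)
```

Note that no hidden-layer gradient depends on the weights of the layers above it, in contrast to backpropagation, where each layer must wait for the error to be propagated through all later layers.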

This talk will summarize our recently proposed silicon photonic architecture, which uses an electro-optic circuit to calculate in situ the gradient vector of each neural network layer, the calculation of which is the most computationally expensive operation performed during the backward pass. The proposed architecture exploits the speed (tens of GHz in photonics versus only hundreds of MHz in electronics) and energy advantages of photonics to determine the gradient vector of each neural network layer in a single operational cycle.
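For reference, the gradient vector in the standard formulation of DFA can be written as follows. The notation is assumed here for illustration and is not taken from the text: e is the output error, B^(l) the fixed random feedback matrix of layer l, a^(l) the pre-activation, h^(l-1) the input to layer l, f the activation function, and η the learning rate.

```latex
\begin{align}
  \boldsymbol{\delta}^{(l)} &= \left( B^{(l)} \mathbf{e} \right) \odot f'\!\left( \mathbf{a}^{(l)} \right), \\
  \Delta W^{(l)} &= -\eta \, \boldsymbol{\delta}^{(l)} \left( \mathbf{h}^{(l-1)} \right)^{\mathsf{T}}, \qquad
  \Delta \mathbf{b}^{(l)} = -\eta \, \boldsymbol{\delta}^{(l)} .
\end{align}
```

In this form, each gradient vector is one matrix-vector product followed by an element-wise multiplication, and every layer's vector depends only on the shared error e, which is why the per-layer calculations are independent of one another.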

The renaissance of neuromorphic photonics is enabled by the confluence of three areas (Fig. 1): technological advances in integrated photonics due to silicon photonics, algorithmic advances in machine learning, and advances in analog photonic signal processing. In recent roadmap articles [10], [29], [30], we outlined some of the scientific and technological advances necessary to meet the challenges of realizing a practical neuromorphic processor.

References


Fig. 1: The advent of neuromorphic photonics is due to the convergence of recent advances in photonic integration technology, the resurgence of scalable computing models (e.g., spiking, deep neural networks), and a large-scale silicon industrial ecosystem.


