Quantum Convolutional Neural Network

Introduction

We will look at how Quantum Convolutional Neural Networks (QCNNs) can be used to build deep neural networks, potentially paving the way for groundbreaking advancements in machine learning image recognition. Quantum computing can improve the time complexity of large operations, such as the matrix multiplication underlying convolution, therefore allowing for an increased number or size of convolution kernels, as well as for larger or more complex input structures.

In what follows, we define and analyze each procedure of the QCNN, such as quantum forward propagation and quantum backpropagation, and see how quantum computing techniques can be applied to improve each operation. Let's begin!

Quantum Forward Propagation

Section Reference: Quantum Algorithms for Deep Convolutional Neural Networks, by Iordanis Kerenidis, Jonas Landman, and Anupam Prakash. CNRS, IRIF, Université Paris Diderot, Paris, France (dated November 5, 2019) \(^{[1]}\)

Quantum Convolutional Layer

The input to the quantum convolution layer is a \(3\)D tensor \(X^{\ell} \in \mathbb{R}^{H^{\ell} \times W^{\ell} \times D^{\ell}}\), and the weight layer (also called the filter or kernel layer) is a \(4\)D tensor \(K^{\ell} \in \mathbb{R}^{H \times W \times D^{\ell} \times D^{\ell+1}}\), where both the input and the kernel layer are stored in QRAM. Given precision parameters \(\epsilon, \Delta>0\), there exists a quantum algorithm that computes a quantum state \(\Delta\)-close to \(|f(\bar{X}^{\ell+1})\rangle\), where \(X^{\ell+1}=X^{\ell} * K^{\ell}\) and \(f: \mathbb{R} \mapsto[0, C]\) is a non-linear activation function. Note that this function could be a sigmoid, hyperbolic tangent, ReLU, softmax, or any other activation function; the choice depends mainly on the task at hand. Furthermore, a classical approximation of the computed quantum state exists where:

\(\left\|f\big(\bar{X}^{\ell+1}\big)-f\big(X^{\ell+1}\big)\right\|_{\infty} \leq \epsilon\)

The time complexity of the quantum algorithm that computes the output quantum state \(|f(\bar{X}^{\ell+1})\rangle\) is \(\widetilde{O}(M / \epsilon)\). Here, \(M\) denotes the maximum norm of the product between one of the kernels and one of the input tensor regions in \(X^{\ell}\), each of size \(HW D^{\ell}\). It is important to note that \(\widetilde{O}\) hides factors polylogarithmic in \(\Delta\) and in the sizes of \(X^{\ell}\) and \(K^{\ell}\).

Given that a convolution product is equivalent to a matrix-matrix multiplication, and the convolution product between \(X^{\ell}\) and \(K^{\ell}\) is:

\(X_{i^{\ell+1}, j^{\ell+1}, d^{\ell+1}}^{\ell+1}=\sum_{i=0}^{H-1} \sum_{j=0}^{W-1} \sum_{d=0}^{D^{\ell}-1} K_{i, j, d, d^{\ell+1}}^{\ell} X_{i^{\ell+1}+i,\, j^{\ell+1}+j,\, d}^{\ell}\)

it is possible to reformulate this convolution product as a matrix product. A diagram of this reshaping of the input and kernel is given by the authors and depicted below:

Quantum Convolution Matrix Reshaping

Image Source: \([1]\)

To express this reformulation mathematically, both the original \(3\)D input tensor \(X^{\ell} \in \mathbb{R}^{H^{\ell} \times W^{\ell} \times D^{\ell}}\) and the \(4\)D kernel tensor \(K^{\ell} \in \mathbb{R}^{H \times W \times D^{\ell} \times D^{\ell+1}}\) need to be reshaped into matrices. First, \(X^{\ell}\) is expanded into a matrix \(A^{\ell} \in \mathbb{R}^{\left(H^{\ell+1} W^{\ell+1}\right) \times\left(H W D^{\ell}\right)}\), such that each row of \(A^{\ell}\) is a vectorized version of a subregion of \(X^{\ell}\). Next, the original kernel tensor \(K^{\ell}\) is reshaped into a matrix \(F^{\ell} \in \mathbb{R}^{\left(H W D^{\ell}\right) \times D^{\ell+1}}\), such that each column of \(F^{\ell}\) is a vectorized version of one of the \(D^{\ell+1}\) kernels.

This matrix expression of the input and kernel tensors allows the convolution product \(X^{\ell} * K^{\ell}=X^{\ell+1}\) to be written as a matrix multiplication, such that each column of the output matrix \(Y^{\ell+1} \in \mathbb{R}^{\left(H^{\ell+1} W^{\ell+1}\right) \times D^{\ell+1}}\) is the vectorized form of one of the \(D^{\ell+1}\) channels of \(X^{\ell+1}\). Later, we will discuss how quantum computing, specifically quantum parallelism, can not only be applied to this step but also improve its time complexity.
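To make this reshaping concrete, here is a minimal classical sketch in NumPy of the im2col-style expansion of \(X^{\ell}\) into \(A^{\ell}\), the flattening of \(K^{\ell}\) into \(F^{\ell}\), and the resulting matrix product. The small dimensions, stride \(1\), and lack of padding are my own assumptions for illustration, not the paper's:

```python
import numpy as np

def im2col(X, H, W):
    """Expand a 3D input tensor X (H_l x W_l x D_l) into the matrix A,
    where each row is a vectorized H x W x D_l subregion of X."""
    H_l, W_l, D_l = X.shape
    H_out, W_out = H_l - H + 1, W_l - W + 1  # stride 1, no padding (assumed)
    A = np.empty((H_out * W_out, H * W * D_l))
    for i in range(H_out):
        for j in range(W_out):
            A[i * W_out + j] = X[i:i + H, j:j + W, :].ravel()
    return A

# Hypothetical small example: a 6x6x3 input and four 3x3x3 kernels.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6, 3))
K = rng.standard_normal((3, 3, 3, 4))  # H x W x D_l x D_{l+1}

A = im2col(X, 3, 3)   # (H^{l+1} W^{l+1}) x (H W D_l)
F = K.reshape(-1, 4)  # (H W D_l) x D_{l+1}, column q = vectorized kernel q
Y = A @ F             # (H^{l+1} W^{l+1}) x D_{l+1}: the convolution output
```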

Lastly, quantum states proportional to the rows of \(A^{\ell}\) and the columns of \(F^{\ell}\) are used, denoted \(|A_p^{\ell}\rangle\) and \(|F_q^{\ell}\rangle\) respectively. These quantum states are defined as:

\( \left|A_p^{\ell}\right\rangle=\) \(\frac{1}{\left\|A_p^{\ell}\right\|} \sum_{r=0}^{H W D^{\ell}-1} A_{p r}^{\ell}|r\rangle\)

\(\left|F_q^{\ell}\right\rangle=\) \(\frac{1}{\left\|F_q^{\ell}\right\|} \sum_{s=0}^{H W D^{\ell}-1} F_{s q}^{\ell}|s\rangle\)

and will continue to be used throughout this section.
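As a classical analogue of these amplitude-encoded states, the sketch below simply normalizes a vector into unit-norm amplitudes, the way \(|A_p^{\ell}\rangle\) and \(|F_q^{\ell}\rangle\) are proportional to a row of \(A^{\ell}\) or a column of \(F^{\ell}\). The function name is mine, for illustration only:

```python
import numpy as np

def amplitude_encode(v):
    """Return the amplitudes of a quantum state proportional to vector v,
    the classical analogue of |v> = (1/||v||) * sum_r v_r |r>."""
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return v / norm

# e.g. the amplitudes of |A_p> for row p of A from the previous sketch:
# amps = amplitude_encode(A[p]); np.sum(amps**2) is 1 up to floating point
```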

Quantum Convolution

The authors describe the quantum convolution procedure in the following four sequential steps: inner product estimation, non-linearity, quantum sampling, and quantum random access memory (QRAM) update and pooling.

Inner Product Estimation

First, we load the input row vector \(A_p^{\ell}\) and the kernel column vector \(F_q^{\ell}\) into quantum states by querying the QRAM in the following manner:

\(\left\{\begin{aligned} |p\rangle|0\rangle & \mapsto|p\rangle\left|A_p^{\ell}\right\rangle \\ |q\rangle|0\rangle & \mapsto|q\rangle\left|F_q^{\ell}\right\rangle \end{aligned}\right.\)

This is done so that the following mapping can be performed with the two vectors:

\(\frac{1}{K} \sum_{p, q}|p\rangle|q\rangle \mapsto \) \(\frac{1}{K} \sum_{p, q}|p\rangle|q\rangle|\bar{P}_{p q}\rangle\left|g_{p q}\right\rangle\)

Here, \(\bar{P}_{p q}\) is the inner product estimate for \(A_p^{\ell}\) and \(F_q^{\ell}\). The "true" value of the inner product is \(P_{p q}=\frac{1+\left\langle A_p^{\ell} \mid F_q^{\ell}\right\rangle}{2}\), and the estimate differs from it by at most a chosen precision \(\epsilon\). Here, the normalization factor is \(K=\sqrt{H^{\ell+1} W^{\ell+1} D^{\ell+1}}\) and \(|g_{pq}\rangle\) is some garbage state.
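The following toy emulation (classical, not the actual quantum circuit) illustrates this guarantee: it computes the "true" \(P_{pq}\) for unit vectors and models the \(\epsilon\)-precision as bounded additive noise:

```python
import numpy as np

def inner_product_estimate(a, f, eps, rng):
    """Classically emulate the quantum inner product estimation:
    the 'true' value P = (1 + <a|f>) / 2 for unit vectors a and f,
    returned with an additive error of at most eps (modeled as noise)."""
    a = a / np.linalg.norm(a)
    f = f / np.linalg.norm(f)
    P = (1.0 + a @ f) / 2.0
    return P + rng.uniform(-eps, eps)  # estimate within eps of P
```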

Non-Linearity

Our approximated convolution output \(\bar{P}_{p q}\), or equivalently \(\bar{Y}^{\ell+1}\), differs from the "true" convolution output \(Y_{p, q}^{\ell+1}=\left\langle A_p^{\ell}, F_q^{\ell}\right\rangle\) by at most \(\epsilon\). Now, we apply a non-linear function \(f\) as a boolean circuit, giving the quantum state:

\(\frac{1}{K} \sum_{p, q}|p\rangle|q\rangle|f(\bar{Y}_{p,q}^{\ell+1})\rangle\left|g_{p q}\right\rangle\)

Quantum Sampling

Conditional Rotation and Amplitude Amplification

To prepare the state below, the states are conditionally rotated and their amplitudes are amplified, such that we arrive at the state:

\(\frac{1}{K} \sum_{p, q} \alpha_{p q}^{\prime}|p\rangle|q\rangle|f(\bar{Y}_{p,q}^{\ell+1})\rangle\left|g_{p q}\right\rangle\)

Quantum tomography is then performed with precision \(\eta\) so that all values and positions \((p, q, f(\bar{Y}_{p q}^{\ell+1}))\) are obtained with high probability. Values above \(\eta\) are known exactly, while values less than or equal to \(\eta\) are set to \(0\).
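A classical sketch of this thresholding, where entries at most \(\eta\) are zeroed out (the function name is hypothetical):

```python
import numpy as np

def linf_tomography(values, eta):
    """Sketch of the tomography output: entries whose magnitude
    exceeds eta are kept; entries at most eta are set to 0."""
    values = np.asarray(values, dtype=float)
    return np.where(np.abs(values) > eta, values, 0.0)
```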

QRAM Update and Pooling

Next, the QRAM needs to be updated with the values for the next layer, namely \(A^{\ell+1}\), while sampling. Pooling needs to be implemented in this step as well, either through a specific update rule or by using a QRAM data structure. We will review quantum pooling in more depth later.

Quantum Pooling

At the end of layer \(\ell\), a pooling operation of size \(P\) is performed on the convolution layer output \(f(X^{\ell+1})\), yielding the post-pooling tensor \(\tilde{X}^{\ell+1}\). Below, the authors provide a figure showing \(2\times 2\) tensor pooling, with different pooling regions shown in separate colors:

QCNN Pooling Representation

Image Source: \([1]\)

Here, for some point at position \((i^{\ell+1}, j^{\ell+1}, d^{\ell+1})\) in \(f(X^{\ell+1})\), the corresponding pooling region is at position \((\tilde{i}^{\ell+1}, \tilde{j}^{\ell+1}, \tilde{d}^{\ell+1})\) in \(\tilde{X}^{\ell+1}\), such that:

\(\left\{\begin{array}{l} \tilde{d}^{\ell+1}=d^{\ell+1} \\ \tilde{j}^{\ell+1}=\left\lfloor\frac{j^{\ell+1}}{P}\right\rfloor \\ \tilde{i}^{\ell+1}=\left\lfloor\frac{i^{\ell+1}}{P}\right\rfloor \end{array}\right.\)

This pooling operation occurs at the end of the layer, during the QRAM update and pooling step, such that the sampled values are stored according to their pooling regions. Note that the authors state this kind of pooling can be efficiently applied to their QCNN structure, which we will now look at.

Let's start by denoting the output of layer \(\ell\) after quantum tomography by \(\mathcal{X}^{\ell+1}\). Next, quantum pooling is applied, yielding \(\tilde{\mathcal{X}}^{\ell+1}\), which has dimensions \(\frac{H^{\ell+1}}{P} \times \frac{W^{\ell+1}}{P} \times D^{\ell+1}\). \(\tilde{\mathcal{X}}^{\ell+1}\) will be used as the input for layer \(\ell + 1\), and its values are stored in QRAM. These values are used to create trees \(\tilde{T}_{p^{\prime}}^{\ell+1}\) that relate to the matrix expansion \(\tilde{A}^{\ell+1}\). It is important to note that \(\mathcal{X}^{\ell+1}\) cannot be known before quantum tomography is over. Thus, the QRAM update must be adapted to pool in an online fashion each time a sample from \(|f(\bar{X}^{\ell+1})\rangle\) is drawn.
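Below is a minimal sketch of such an online update, assuming max pooling as the pooling rule (one possible choice, not necessarily the paper's only scheme); it applies the floor-index mapping above as each sample arrives:

```python
import numpy as np

def pooled_index(i, j, d, P):
    """Map a position (i, j, d) in f(X^{l+1}) to its pooling region
    (i~, j~, d~) in X~^{l+1}, following the floor rule above."""
    return i // P, j // P, d

def online_max_pool(samples, out_shape, P):
    """Sketch of the online QRAM update: as each sample (i, j, d, value)
    is drawn from |f(X^{l+1})>, fold it into its pooling cell via max.
    Initializing at 0 is valid because f maps into [0, C]."""
    pooled = np.zeros(out_shape)  # (H^{l+1}/P, W^{l+1}/P, D^{l+1})
    for i, j, d, value in samples:
        ii, jj, dd = pooled_index(i, j, d, P)
        pooled[ii, jj, dd] = max(pooled[ii, jj, dd], value)
    return pooled
```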


Quantum Time Complexity

The quantum time complexity of one forward pass through the convolution layer \(\ell\), where \(\widetilde{O}\) hides the polylogarithmic factors, is:

\(\widetilde{O}\left(\frac{1}{\epsilon \eta^2} \cdot \frac{M \sqrt{C}}{\sqrt{\mathbb{E}\left(f\left(\bar{X}^{\ell+1}\right)\right)}}\right)\)

which can also be written as:

\(\widetilde{O}\left(\sigma H^{\ell+1} W^{\ell+1} D^{\ell+1} \cdot \frac{M \sqrt{C}}{\epsilon \sqrt{\mathbb{E}(f(\bar{X}^{\ell+1}))}}\right)\)

Here, \(\sigma\in[0,1]\) is the fraction of sampled elements out of the \(H^{\ell+1} W^{\ell+1} D^{\ell+1}\) total elements. Note that the running time of the classical CNN layer is:

\(\widetilde{O}\left(H^{\ell+1} W^{\ell+1} D^{\ell+1} \cdot H W D^{\ell}\right)\)
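Purely as illustration, the snippet below plugs assumed layer dimensions and parameters into the two dominant-term expressions above; it ignores the hidden polylogarithmic factors and constants, so it only conveys how the terms scale:

```python
# Illustrative arithmetic only, with hypothetical (assumed) values.
H_out, W_out, D_out = 32, 32, 64  # H^{l+1}, W^{l+1}, D^{l+1}
H, W, D_in = 3, 3, 32             # kernel dims and D^l
sigma, eps = 0.1, 0.01            # sampling fraction, precision
M, C, mean_f = 1.0, 1.0, 0.5      # norm bound, cap on f, E[f(X)]

quantum = sigma * H_out * W_out * D_out * M * C**0.5 / (eps * mean_f**0.5)
classical = H_out * W_out * D_out * H * W * D_in
print(f"quantum   ~ {quantum:.3e}")  # up to polylog factors
print(f"classical ~ {classical:.3e}")
```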


Quantum Back Propagation

To be continued.

Quantum Computing Preliminaries

You can read about quantum computing preliminaries on my website here.