High-Dimensional Quantum Feature Mapping


1 Quantum Principal Component Analysis

Quantum Principal Component Analysis (qPCA) identifies the large eigenvalues of an unknown density matrix, together with the corresponding eigenvectors, in time $O(\log d)$. Classical principal component analysis analyzes positive semi-definite Hermitian matrices by decomposing them into the eigenvectors associated with the largest eigenvalues, enabling dimensionality reduction. The improved computational complexity may enable new methods for state discrimination and cluster assignment in variational and kernel-based machine learning classifiers.

Classical Principal Component Analysis (PCA) is widely used in signal processing and machine learning for dimensionality reduction, with a time complexity of $O(N^3)$, where $N$ is the dimension of the data. The issue with this form of dimensionality reduction is that when the data is large, classical PCA becomes intractable.

A possible solution for dimensionality reduction of large data is to exploit quantum parallelism: the qPCA algorithm runs with a time complexity of $O(N\,\mathrm{poly}(\log N))$. The outputs of qPCA are quantum states containing the eigenvalues and eigenvectors of the principal components of the data, which can then be sampled.

Consider a dataset $\{x_i\}_{i=1}^{N}$ with $N$ data points, where each data point is a $D$-dimensional column vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})^T \in \mathbb{R}^D$, arranged as the $N \times D$ matrix $X = (x_1, \ldots, x_N)^T$. Through principal component analysis we can lower the dimension of this space while maximally preserving the data variance. The singular value decomposition of the matrix $X$ is:

$$X = \sum_{j=1}^{D} \sigma_j \, |u_j\rangle\langle v_j|$$

such that $\{\sigma_j \in \mathbb{R}_{\geq 0}\}_{j=1}^{D}$ are the singular values of the matrix in descending order, and $\{|u_j\rangle \in \mathbb{R}^N\}_{j=1}^{D}$ and $\{|v_j\rangle \in \mathbb{R}^D\}_{j=1}^{D}$ are, respectively, the left and right singular vectors, the latter being the principal components. For the rest of this section, we will look at why we need qPCA, how we can achieve it, and the procedure for the underlying algorithm.
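To ground this notation, here is a minimal classical sketch (NumPy on a toy random matrix, both my own choices for illustration) that computes the singular triplets, verifies the rank-one expansion above, and shows the variance retained by a top-$k$ truncation.

```python
# A minimal classical sketch of the notation above: decompose a data
# matrix X into singular values and left/right singular vectors with
# NumPy, then verify the rank-one expansion X = sum_j sigma_j u_j v_j^T.
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 8                      # N data points, D features (toy sizes)
X = rng.normal(size=(N, D))        # stand-in for the dataset {x_i}

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruct X from its singular triplets (sigma_j, u_j, v_j).
X_rebuilt = sum(sigma[j] * np.outer(U[:, j], Vt[j, :]) for j in range(D))
assert np.allclose(X, X_rebuilt)

# Keeping only the top-k triplets is exactly the PCA truncation.
k = 3
X_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
print("variance retained:", (sigma[:k]**2).sum() / (sigma**2).sum())
```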

1.1 High-Dimensional Quantum Data Reduction Algorithm

Here, we will analyze matrix dimensionality reduction using qPCA. Let $\xi$ be the precision parameter and $p$ denote the variance retained after dimensionality reduction. Given efficient quantum access to the matrix $A = U\Sigma V^T = \sum_i^r \sigma_i u_i v_i^T \in \mathbb{R}^{n \times m}$ and to an approximation $\bar{V}^{(k)} \in \mathbb{R}^{m \times k}$ of the top $k$ right singular vectors with $\|V^{(k)} - \bar{V}^{(k)}\| \leq \frac{\xi\sqrt{p}}{2}$, there exists a quantum algorithm that creates the state

$$|\bar{Y}\rangle = \frac{1}{\|Y\|_F} \sum_i^n \|y_i\| \, |i\rangle |y_i\rangle,$$

proportional to the projection of $A$ onto the PCA subspace, with probability at least $1 - 1/\mathrm{poly}(m)$, error $\| |Y\rangle - |\bar{Y}\rangle \| \leq \xi$, and time complexity $\tilde{O}(1/\sqrt{p})$. If, instead, we estimate the value of $\|Y\|_F$ to a relative error $\eta$, then the time complexity becomes $\tilde{O}\!\left(\frac{1}{\sqrt{p}\,\eta}\right)$.
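Classically, the objects this statement refers to are straightforward to compute. The sketch below (NumPy on toy data again, with names of my own choosing) builds the projection $Y = A V^{(k)}$, the retained variance $p$, and the amplitudes of the ideal state $|Y\rangle$.

```python
# Classical picture of the objects in the theorem: the projection
# Y = A V^(k) of the data onto the top-k PCA subspace, the retained
# variance p, and the amplitudes of the ideal state |Y>.
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 200, 16, 4
A = rng.normal(size=(n, m))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k, :].T                     # top-k right singular vectors
Y = A @ V_k                           # rows y_i = V^(k)^T a_i

p = (sigma[:k]**2).sum() / (sigma**2).sum()   # variance retained
Y_F = np.linalg.norm(Y)                       # Frobenius norm ||Y||_F

# Amplitudes of |Y> = (1/||Y||_F) sum_i ||y_i|| |i>|y_i>: since
# ||y_i|| |y_i> = y_i, the flattened rows of Y / ||Y||_F are exactly
# the amplitudes of the ideal state.
amplitudes = Y.flatten() / Y_F
assert np.isclose(np.linalg.norm(amplitudes), 1.0)
print(f"p = {p:.3f}, ||Y||_F = {Y_F:.3f}")
```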

It is important to distinguish which data is PCA-representable and which is qPCA-representable. First, we define PCA-representable data. Let a set of $n$ data points, each with $m$ coordinates, be represented by the matrix $A = \sum_i^r \sigma_i u_i v_i^T \in \mathbb{R}^{n \times m}$ given earlier. The matrix (or data) $A$ is PCA-representable if there exist $p \in [\frac{1}{2}, 1]$, $\varepsilon \in [0, \frac{1}{2}]$, $\beta \in [p - \varepsilon, p + \varepsilon]$, and $\alpha \in [0, 1]$ such that:

  • $k \in O(1)$, where the variance retained after dimensionality reduction is $p = \frac{\sum_i^k \sigma_i^2}{\sum_i^m \sigma_i^2}$.
  • For at least $\alpha n$ points $a_i$, it holds that $\frac{\|y_i\|}{\|a_i\|} \geq \sqrt{\beta}$, where $\|y_i\| = \sqrt{\sum_j^k |\langle a_i | v_j \rangle|^2}$.

Note that these conditions allow for the algorithmic lower-bound probability of $1 - 1/\mathrm{poly}(m)$ we saw before.

Now consider the qPCA-representable case. Given that $a_i$ is a row of $A \in \mathbb{R}^{n \times d}$, the ratio $\frac{\|a_i\|}{\|\bar{y}_i\|} \leq \frac{1}{\sqrt{\beta}} \in O(1)$ holds with probability greater than $\alpha$, which keeps the runtime overhead constant.
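As a numerical illustration of the two representability conditions, the following sketch (the helper `pca_representable` is hypothetical, not from any library) computes the retained variance $p$ and the fraction $\alpha$ of rows satisfying the $\sqrt{\beta}$ ratio bound for a toy low-rank-plus-noise matrix.

```python
# Check the PCA-representability conditions for a toy matrix:
#  (1) variance retained p by the top-k components,
#  (2) fraction alpha of rows a_i with ||y_i|| / ||a_i|| >= sqrt(beta).
import numpy as np

def pca_representable(A: np.ndarray, k: int, beta: float):
    """Return (p, alpha) for the top-k PCA subspace of A."""
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    p = (sigma[:k] ** 2).sum() / (sigma ** 2).sum()
    Y = A @ Vt[:k, :].T                      # projected rows y_i
    ratios = np.linalg.norm(Y, axis=1) / np.linalg.norm(A, axis=1)
    alpha = np.mean(ratios >= np.sqrt(beta)) # fraction of "good" rows
    return p, alpha

rng = np.random.default_rng(2)
# Low-rank-plus-noise data tends to satisfy the conditions.
A = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 20))
A += 0.05 * rng.normal(size=A.shape)

p, alpha = pca_representable(A, k=3, beta=0.9)
print(f"p = {p:.3f}, alpha = {alpha:.3f}")
```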

Now that we have defined the parameters, data assumptions, and goal of our qPCA algorithm, let's walk through the algorithm itself. The abstract idea is that the algorithm transforms the quantum state $|\psi_s\rangle$, which holds the original data in quantum parallel, into another quantum state $|\psi_e\rangle$ that stores the new low-dimensional data points:

$$|\psi_s\rangle := \sum_{i=1}^{N} \frac{|i\rangle \, x_i}{\|X\|_F}, \qquad |\psi_e\rangle := \sum_{i=1}^{N} \frac{|i\rangle \, y_i}{\|Y\|_F}$$

such that $\|X\|_F$ and $\|Y\|_F$ are the Frobenius norms of $X$ and $Y$. Let:

$$x_i = \sum_{j=1}^{D} x_{ij} |j\rangle = \|x_i\| \, |x_i\rangle$$

$$y_i = \sum_{j=1}^{d} y_{ij} |j\rangle = \|y_i\| \, |y_i\rangle$$

For the implementation of the quantum algorithm, first consider that the dataset $\{x_i\}_{i=1}^{N}$, i.e., the matrix $X$, is stored in a quantum random access memory (qRAM), which performs the mapping $|i\rangle|0\rangle|0\rangle \mapsto |i\rangle|x_i\rangle|\|x_i\|\rangle$. Next, using quantum access to the vectors and norms, we construct the state $\sum_i \|x_i\| \, |e_i\rangle |x_i\rangle$ (up to normalization), whose second register carries a reduced density matrix exactly proportional to $X^T X$. Now, we create $n = O(t^2 \epsilon^{-1})$ copies of this density matrix in order to implement $e^{-iXt}$ to accuracy $\epsilon$ in time $O(n \log d)$.

An important aspect to keep in mind about this algorithm is that density matrix exponentiation is most effective when some of the eigenvalues are large. This is because we are utilizing quantum parallelism to amplify deviations in the probability amplitudes in order to reveal the eigenvectors corresponding to the large eigenvalues of the unknown state. If all eigenvalues are of size $O(1/d)$, the required time grows to $t = \Omega(d)$ to generate a transformation that rotates the input state $\sigma$ to an orthogonal state.
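The repeated-swap construction behind density matrix exponentiation can be simulated classically on small matrices. The sketch below assumes the Lloyd-Mosca-Rebentrost scheme, where each short SWAP interaction with a fresh copy of $\rho$ advances the target state by approximately $e^{-i\rho\,\Delta t}$; the toy dimensions and the number of copies are my own choices for illustration.

```python
# Numerical illustration of density-matrix exponentiation:
# repeated short SWAP interactions with fresh copies of rho apply
# e^{-i rho t} to a target state sigma, with error O(t^2 / n).
import numpy as np
from scipy.linalg import expm

d = 4
rng = np.random.default_rng(3)

# Random density matrix rho (the "unknown" state holding the data).
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho).real

# Target state sigma to be rotated.
v = rng.normal(size=d) + 1j * rng.normal(size=d)
v /= np.linalg.norm(v)
sigma = np.outer(v, v.conj())

# SWAP operator on C^d (x) C^d.
S = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        S[i * d + j, j * d + i] = 1.0

def partial_trace_first(R, d):
    """Trace out the first register of a (d*d) x (d*d) matrix."""
    return R.reshape(d, d, d, d).trace(axis1=0, axis2=2)

t, n = 1.0, 400                        # total time, number of copies
U_step = expm(-1j * S * (t / n))
out = sigma
for _ in range(n):                     # one fresh copy of rho per step
    R = U_step @ np.kron(rho, out) @ U_step.conj().T
    out = partial_trace_first(R, d)

exact = expm(-1j * rho * t) @ sigma @ expm(1j * rho * t)
print("Frobenius error:", np.linalg.norm(out - exact))
```

Increasing $n$ tightens the approximation linearly, matching the $n = O(t^2 \epsilon^{-1})$ copy count quoted above.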

1.1.1 Extracting Quantum Principal Components

Let's obtain the first $d$ principal components in quantum state form, $|v_1\rangle, |v_2\rangle, \ldots, |v_d\rangle$, where the proportion of variance carried by each component is:

$$\lambda_j := \frac{\sigma_j^2}{\sum_{k=1}^{D} \sigma_k^2}, \qquad j = 1, 2, \ldots, D$$

From the singular value decomposition of the matrix $X$, we can rewrite $|\psi_s\rangle$ as:

$$|\psi_s\rangle = \sum_{j=1}^{D} \sqrt{\lambda_j} \, |u_j\rangle |v_j\rangle$$

The state of the second quantum register in qRAM is $\rho = \mathrm{Tr}_1(|\psi_s\rangle\langle\psi_s|) = \sum_{j=1}^{D} \lambda_j |v_j\rangle\langle v_j|$, equivalent to $X^T X / \mathrm{Tr}(X^T X)$.
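This identity is easy to cross-check classically: build the amplitudes of $|\psi_s\rangle$ from a toy $X$, trace out the index register, and compare against $X^T X / \mathrm{Tr}(X^T X)$ and the spectrum $\{\lambda_j\}$ (NumPy sketch, toy sizes of my choosing).

```python
# Cross-check of the reduced density matrix: build |psi_s> from a toy X,
# trace out the index register, and compare with X^T X / Tr(X^T X).
import numpy as np

rng = np.random.default_rng(4)
N, D = 50, 6
X = rng.normal(size=(N, D))

# Amplitudes of |psi_s> = (1/||X||_F) sum_i ||x_i|| |i>|x_i>.
psi = (X / np.linalg.norm(X)).flatten()                 # shape (N*D,)

full = np.outer(psi, psi)                               # |psi_s><psi_s|
rho = full.reshape(N, D, N, D).trace(axis1=0, axis2=2)  # Tr_1

target = X.T @ X / np.trace(X.T @ X)
assert np.allclose(rho, target)

# Eigenvalues of rho are the variance proportions lambda_j.
sigma = np.linalg.svd(X, compute_uv=False)
lam = sigma**2 / (sigma**2).sum()
assert np.allclose(np.sort(np.linalg.eigvalsh(rho)), np.sort(lam))
print("reduced state matches X^T X / Tr(X^T X)")
```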

To be continued.

2 Efficient Discrete Feature Encoding

To be continued.
