Quantum Embeddings for Machine Learning

Team 2, ENPH 353

Joshua Himmens and George Sleen

S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, "Quantum embeddings for machine learning," arXiv preprint arXiv:2001.03622, 2020.

Quantum computers have extremely limited circuit depths
Variational quantum classifiers struggle to train embeddings
Embeddings can be trained to naturally distinguish classes
There are competing distance metrics for training semantic embeddings

Background

Qubits

Quantum computers operate on qubits, the fundamental unit of quantum information

A qubit is the quantum analogue of a classical bit
Its state is a superposition of the basis states \(|0\rangle\) and \(|1\rangle\), which can be pictured as a point on the Bloch sphere
Quantum gates act on a qubit by rotating its state vector on this sphere
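A minimal sketch of the Bloch-sphere picture, assuming NumPy and the standard parametrization \(|\psi\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle\) (neither appears on the original slides; helper names are our own). The \(R_Y\) gate shows a gate acting as a rotation of the state vector:

```python
import numpy as np

def qubit_state(theta: float, phi: float) -> np.ndarray:
    """cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>, a point on the Bloch sphere."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)], dtype=complex)

def ry(angle: float) -> np.ndarray:
    """R_Y(angle) = exp(-i * angle * Y / 2): rotates the Bloch vector about the y-axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def bloch_vector(state: np.ndarray) -> np.ndarray:
    """Map a normalized qubit state to its (x, y, z) coordinates on the Bloch sphere."""
    a, b = state
    overlap = np.conj(a) * b
    return np.array([2 * overlap.real, 2 * overlap.imag, abs(a) ** 2 - abs(b) ** 2])

zero = qubit_state(0.0, 0.0)        # |0>, the north pole (0, 0, 1)
rotated = ry(np.pi / 2) @ zero      # quarter turn about y: (|0> + |1>)/sqrt(2), on the x-axis
print(bloch_vector(zero))
print(bloch_vector(rotated))
```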

Qubits and Quantum Computers

Qubits are difficult to work with: they decohere (lose their quantum state) very quickly
Qubits can interfere with each other, introducing errors

Classification and Hyperplanes

Data is classified by separating it with hyperplanes in a high-dimensional Hilbert space

Each piece of data is embedded as a point in a Hilbert space
\(n\) qubits form a \(2^n\)-dimensional Hilbert space
Hyperplanes are used to classify the embedded data
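A minimal sketch of why \(n\) qubits give a \(2^n\)-dimensional space: the joint state is the tensor (Kronecker) product of the individual qubit states. Here each feature is encoded as a rotation angle on its own qubit; this simple angle encoding and the name `angle_embedding` are hypothetical illustrations, not the trainable embedding from the paper:

```python
import numpy as np
from functools import reduce

def angle_embedding(features: list[float]) -> np.ndarray:
    """Hypothetical angle encoding: feature x_k becomes cos(x_k/2)|0> + sin(x_k/2)|1> on qubit k.

    The n single-qubit states are combined with Kronecker products, giving a
    state vector with 2**n complex amplitudes.
    """
    qubits = [np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex) for x in features]
    return reduce(np.kron, qubits)

x = [0.3, 1.2, 2.5]                 # three features -> three qubits
psi = angle_embedding(x)
print(len(psi))                     # 8 == 2**3
print(np.isclose(np.vdot(psi, psi).real, 1.0))  # True: the state stays normalized
```

Each extra qubit doubles the number of amplitudes, which is why even small registers give a large space in which to draw separating hyperplanes.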

Implementations

Approaches

Two complementary approaches to quantum machine learning

Quantum Metric Learning

Training optimizes the embedding itself
The classifier is analytically chosen after training
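As a sketch of what an analytically chosen classifier can look like, assuming NumPy and pure embedded states (the helper names and toy states are hypothetical, and this is one possible choice rather than necessarily the one used in the paper): a new point \(|x\rangle\) is assigned to the class whose ensemble \(\rho\) it overlaps with most, comparing \(\langle x|\rho_A|x\rangle\) with \(\langle x|\rho_B|x\rangle\). Nothing is trained here; all of the learning happened in the embedding.

```python
import numpy as np

def ensemble_overlap(x: np.ndarray, class_states: list[np.ndarray]) -> float:
    """<x|rho|x> for rho = (1/M) sum_i |a_i><a_i|, i.e. the mean of |<a_i|x>|^2."""
    return float(np.mean([abs(np.vdot(a, x)) ** 2 for a in class_states]))

def fidelity_classify(x, class_a, class_b) -> str:
    """Assign x to whichever class ensemble it overlaps with more (no trainable parameters)."""
    return "A" if ensemble_overlap(x, class_a) >= ensemble_overlap(x, class_b) else "B"

def state(theta):
    """Toy stand-in for an embedded data point: cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

class_a = [state(0.1), state(0.2), state(-0.15)]          # clustered near |0>
class_b = [state(np.pi - 0.1), state(np.pi + 0.2)]        # clustered near |1>
print(fidelity_classify(state(0.3), class_a, class_b))    # A
print(fidelity_classify(state(2.9), class_a, class_b))    # B
```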

Variational Quantum Classifiers (VQCs)

The classifier is a parametrized variational quantum circuit
Training resembles traditional machine learning
The embedding is only weakly trainable
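A toy single-qubit VQC, as a sketch under stated assumptions: the circuit is simulated with NumPy, a fixed \(R_Y(x)\) gate embeds the data, one trainable \(R_Y(\theta)\) gate acts as the classifier, and gradients come from the parameter-shift rule. The data, learning rate, and helper names are hypothetical. Only \(\theta\) is trained; the embedding gate has no parameters, which mirrors the point that the embedding is only weakly trainable.

```python
import numpy as np

def ry(angle):
    """Real 2x2 matrix of the R_Y rotation gate."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])

def expval_z(x, theta):
    """Toy circuit: fixed embedding RY(x), trainable layer RY(theta), then measure <Z>."""
    psi = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    return float(psi @ Z @ psi)

def loss(theta, xs, ys):
    """Mean squared error between <Z> and the +/-1 labels."""
    return np.mean([(expval_z(x, theta) - y) ** 2 for x, y in zip(xs, ys)])

def grad(theta, xs, ys):
    """Chain rule, using the parameter-shift rule for d<Z>/dtheta."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = expval_z(x, theta)
        df = (expval_z(x, theta + np.pi / 2) - expval_z(x, theta - np.pi / 2)) / 2
        total += 2 * (f - y) * df
    return total / len(xs)

xs = np.array([1.8, 2.0, 2.2, 4.8, 5.0, 5.2])   # hypothetical 1-D data
ys = np.array([1, 1, 1, -1, -1, -1])            # class labels encoded as +/-1

theta = 0.0
for _ in range(200):
    theta -= 0.5 * grad(theta, xs, ys)           # plain gradient descent

preds = [1 if expval_z(x, theta) > 0 else -1 for x in xs]
print(preds)                                     # [1, 1, 1, -1, -1, -1]
```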

NISQ Computing

Neural embedding networks are too large for NISQ computers

Noisy intermediate-scale quantum (NISQ) computers permit only limited circuit depth because of qubit decoherence
VQC circuits cannot be made large enough to be effective on their own
Qubit decoherence times are on the order of 1 ms
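The link between coherence time and circuit depth is simple arithmetic. A rough sketch with hypothetical numbers (the 50 ns gate time and the function name are assumptions for illustration, not figures from the slides):

```python
def depth_budget(coherence_ns: int, gate_ns: int) -> int:
    """Crude upper bound on sequential gate layers that fit inside one coherence time.

    Ignores gate errors, readout, and crosstalk, so the usable depth in practice is lower.
    Integer nanoseconds avoid floating-point rounding in the division.
    """
    return coherence_ns // gate_ns

print(depth_budget(1_000_000, 50))   # 1 ms coherence, 50 ns gates -> 20000
print(depth_budget(100_000, 50))     # 100 us coherence, 50 ns gates -> 2000
```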

Loss Functions for Metric Learning

Quantum metric learning uses a distance metric between embedded classes as its loss function

Metric learning aims to:
- Maximize distance between unrelated data (maximize trace distance)
- Minimize distance between related data (maximize fidelity)
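Both aims can be captured by one quantity. A minimal sketch, assuming each class is a uniform ensemble of pure embedded states stored as NumPy vectors (helper names are hypothetical), and using the Hilbert-Schmidt distance from the next slide in place of the trace distance: expanding \(\text{Tr}[(\rho - \sigma)^2] = \text{Tr}(\rho^2) + \text{Tr}(\sigma^2) - 2\,\text{Tr}(\rho\sigma)\), the first two terms reward tight clusters and the last penalizes overlap between classes.

```python
import numpy as np

def mean_overlap(states_1, states_2) -> float:
    """Mean of |<u|v>|^2 over all pairs; equals Tr(rho_1 rho_2) for uniform pure-state ensembles."""
    return float(np.mean([abs(np.vdot(u, v)) ** 2 for u in states_1 for v in states_2]))

def hilbert_schmidt_distance(class_a, class_b) -> float:
    """D_HS = Tr(rho^2) + Tr(sigma^2) - 2 Tr(rho sigma)."""
    return (mean_overlap(class_a, class_a) + mean_overlap(class_b, class_b)
            - 2 * mean_overlap(class_a, class_b))

def cost(class_a, class_b) -> float:
    """Loss to minimize: 1 - D_HS / 2, which lies in [0, 1] because 0 <= D_HS <= 2."""
    return 1 - 0.5 * hilbert_schmidt_distance(class_a, class_b)

def state(theta):
    """Toy stand-in for an embedded data point."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

separated = cost([state(0.0), state(0.1)], [state(np.pi), state(np.pi - 0.1)])
mixed = cost([state(0.0), state(np.pi)], [state(0.1), state(np.pi - 0.1)])
print(separated < mixed)   # True: well-separated, tight clusters give a lower cost
```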

Distance Metric Implementations

The Hilbert-Schmidt distance metric is easier to optimize than the trace distance and gives similar results

Trace distance \(D_{\text{tr}}\) is a fundamental measure of how distinguishable two quantum states are
Hilbert-Schmidt distance \(D_{\text{HS}}\) is closely related and much easier to compute
Because it is cheaper to compute, it is also easier to optimize

\[D_{\text{tr}}(\rho, \sigma) = \tfrac{1}{2} Tr|\rho - \sigma|, \quad |A| = \sqrt{A^\dagger A}\] \[D_{\text{HS}}(\rho, \sigma) = Tr[(\rho - \sigma)^2]\]

\[\tfrac{1}{2} D_{\text{HS}} \leq D_{\text{tr}}^2 \leq r \, D_{\text{HS}}\] \[r = \frac{\text{rank}(\rho)\,\text{rank}(\sigma)}{\text{rank}(\rho) + \text{rank}(\sigma)}\]
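A quick numerical sanity check of these bounds, written as a NumPy sketch (the random-state construction and helper names are our own, not from the paper). It also shows the practical difference between the two metrics: \(D_{\text{tr}}\) needs the absolute eigenvalues of \(\rho - \sigma\), while \(D_{\text{HS}}\) only needs traces of products.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(dim: int, rank: int) -> np.ndarray:
    """Random density matrix of the given rank: G G^dagger / Tr(G G^dagger), G of shape (dim, rank)."""
    g = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def distances(rho, sigma):
    """Return (D_tr, D_HS) from the eigenvalues of the Hermitian matrix rho - sigma."""
    eigs = np.linalg.eigvalsh(rho - sigma)
    d_tr = 0.5 * np.sum(np.abs(eigs))      # (1/2) Tr|rho - sigma|
    d_hs = np.sum(eigs ** 2)               # Tr[(rho - sigma)^2]
    return d_tr, d_hs

def check_bounds(dim=4, rank_rho=2, rank_sigma=3) -> bool:
    """Test (1/2) D_HS <= D_tr^2 <= r D_HS on one random pair, with a small float tolerance."""
    rho = random_density_matrix(dim, rank_rho)
    sigma = random_density_matrix(dim, rank_sigma)
    d_tr, d_hs = distances(rho, sigma)
    r = rank_rho * rank_sigma / (rank_rho + rank_sigma)
    return bool(0.5 * d_hs <= d_tr ** 2 + 1e-12 and d_tr ** 2 <= r * d_hs + 1e-12)

print(all(check_bounds() for _ in range(1000)))   # True
```

For two pure states (both ranks 1, so \(r = \tfrac{1}{2}\)) the bounds coincide and \(D_{\text{tr}}^2 = \tfrac{1}{2} D_{\text{HS}}\) exactly.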

SWAP procedure

The quality of a quantum embedding is assessed by using a SWAP test to estimate the fidelity between embedded states. The trace distance is measured directly.

Fidelity

How tightly data is clustered
\(F = |\langle\phi|\psi\rangle|^2\)

Trace distance

How far apart data clusters are
\(D_{\text{tr}}(\rho, \sigma) = \tfrac{1}{2} Tr|\rho - \sigma|\)
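For pure embedded states \(\rho = |\phi\rangle\langle\phi|\) and \(\sigma = |\psi\rangle\langle\psi|\), these two quantities are tied together by \(D_{\text{tr}} = \sqrt{1 - F}\). A small NumPy sketch (helper names are hypothetical) that computes both and checks the identity:

```python
import numpy as np

def fidelity(phi, psi) -> float:
    """F = |<phi|psi>|^2 for normalized pure states."""
    return abs(np.vdot(phi, psi)) ** 2

def trace_distance(phi, psi) -> float:
    """D_tr = (1/2) Tr|rho - sigma| with rho = |phi><phi| and sigma = |psi><psi|."""
    diff = np.outer(phi, phi.conj()) - np.outer(psi, psi.conj())
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(diff)))

phi = np.array([1, 0], dtype=complex)
psi = np.array([np.cos(0.4), np.sin(0.4)], dtype=complex)
print(np.isclose(trace_distance(phi, psi), np.sqrt(1 - fidelity(phi, psi))))  # True
```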

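The SWAP procedure is a simple quantum circuit: an ancilla qubit in superposition controls a SWAP of the two data states, so the swapped and unswapped states interfere. The ancilla is measured as 0 with probability \(P(0) = \tfrac{1}{2}(1 + |\langle\phi|\psi\rangle|^2)\), so repeating the circuit estimates the fidelity of the data.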
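A minimal statevector sketch of the SWAP test, assuming NumPy and ideal, noise-free gates (function names are our own). It applies a Hadamard to the ancilla, swaps the two registers when the ancilla is \(|1\rangle\), applies a second Hadamard, and reads off \(P(0)\); sampling finite shots then inverts \(P(0) = \tfrac{1}{2}(1 + F)\) to estimate the fidelity.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def swap_test_p0(phi: np.ndarray, psi: np.ndarray) -> float:
    """Statevector simulation of the SWAP test; returns P(ancilla = 0) = (1 + |<phi|psi>|^2) / 2."""
    # Joint state |0>_anc (x) |phi> (x) |psi>, stored with axes (ancilla, register 1, register 2).
    state = np.einsum("a,i,j->aij", np.array([1, 0], dtype=complex), phi, psi)
    state = np.einsum("ba,aij->bij", H, state)     # Hadamard on the ancilla
    state[1] = state[1].T.copy()                    # controlled-SWAP: swap registers when ancilla is |1>
    state = np.einsum("ba,aij->bij", H, state)     # second Hadamard on the ancilla
    return float(np.sum(np.abs(state[0]) ** 2))    # probability of measuring the ancilla as 0

def estimate_fidelity(phi, psi, shots=10_000, seed=0) -> float:
    """Sample ancilla outcomes and invert P(0) = (1 + F) / 2."""
    rng = np.random.default_rng(seed)
    p0 = min(max(swap_test_p0(phi, psi), 0.0), 1.0)  # clip tiny float overshoot
    zeros = rng.binomial(shots, p0)
    return 2 * zeros / shots - 1

phi = np.array([1, 0], dtype=complex)
psi = np.array([np.cos(0.4), np.sin(0.4)], dtype=complex)
print(swap_test_p0(phi, psi), (1 + abs(np.vdot(phi, psi)) ** 2) / 2)  # the two values agree
print(estimate_fidelity(phi, psi))  # close to cos(0.4)**2, up to shot noise
```

The same function works for multi-qubit data states, since the two registers can have any (equal) dimension.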

Questions!
