Deep Reinforcement Learning Based Control of Coherent Transport by Adiabatic Passage of Spin Qubits
ORAL
Abstract
Several tasks involving the temporal evolution of a system of qubits require stochastic methods to identify the best sequence of gates and the interaction times among qubits. The great success of deep reinforcement learning (DRL) methods in identifying the best strategy in problems involving a trade-off between short- and long-term rewards has suggested their application to quantum information (QI) as well.
To extend the application of DRL to the transfer of QI, we focus on Coherent Transport by Adiabatic Passage (CTAP) on a chain of three semiconductor quantum dots (QDs). This task is usually performed by the so-called counter-intuitive sequence of gate pulses, which can coherently transfer an electronic population from the first to the last site of an odd-length chain of QDs, leaving the central site depopulated.
We apply a technique to find a near-optimal gate pulse sequence without providing the DRL agent with any preliminary knowledge of the underlying physical system. Using the advantage actor-critic algorithm, with a small neural network as a function approximator, we train a DRL agent to select the best action during the evolution, achieving the same results previously found only by ansatz solutions. The method naturally extends to systems affected by dephasing and loss.
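The following is a minimal sketch, not the authors' implementation, of the kind of setup the abstract describes: an advantage actor-critic agent with a small network choosing discrete gate-pulse amplitudes for the two tunnel couplings of a three-dot CTAP chain. The pulse levels, time discretisation, observation features, network size, and reward shaping (final-dot population minus a penalty on the central dot) are all assumptions made for illustration.

```python
# Hedged sketch: A2C control of CTAP pulse amplitudes on a 3-dot chain.
# All numerical choices (amplitudes, dt, reward weights) are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

N_STEPS = 40                  # control intervals per episode (assumed discretisation)
DT = 0.25                     # duration of each interval (arbitrary units)
LEVELS = (0.0, 0.5, 1.0)      # discrete pulse amplitudes the agent may pick

def evolve(psi, omega12, omega23, dt):
    """One step of Schroedinger evolution under the 3-site CTAP Hamiltonian."""
    H = np.array([[0.0, omega12, 0.0],
                  [omega12, 0.0, omega23],
                  [0.0, omega23, 0.0]], dtype=complex)
    w, v = np.linalg.eigh(H)                      # H is real symmetric
    U = v @ np.diag(np.exp(-1j * w * dt)) @ v.conj().T
    return U @ psi

class ActorCritic(nn.Module):
    """Small shared-body network: policy over (omega12, omega23) pairs + state value."""
    def __init__(self, n_actions=len(LEVELS) ** 2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(7, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)        # logits over joint pulse choices
        self.v = nn.Linear(64, 1)                 # state-value estimate

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.v(h)

def observation(psi, step):
    """Assumed observation: populations, a few coherence features, normalised time."""
    pops = np.abs(psi) ** 2
    return torch.tensor([*pops,
                         psi[0].real * psi[2].real,
                         psi[0].imag * psi[2].imag,
                         psi[1].real,
                         step / N_STEPS], dtype=torch.float32)

net = ActorCritic()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for episode in range(2000):
    psi = np.array([1.0, 0.0, 0.0], dtype=complex)   # population starts on dot 1
    log_probs, values, rewards = [], [], []
    for step in range(N_STEPS):
        logits, value = net(observation(psi, step))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        o12 = LEVELS[action.item() // len(LEVELS)]
        o23 = LEVELS[action.item() % len(LEVELS)]
        psi = evolve(psi, o12, o23, DT)
        # assumed reward shaping: target-dot population minus central-dot penalty
        rewards.append(abs(psi[2]) ** 2 - 0.5 * abs(psi[1]) ** 2)
        log_probs.append(dist.log_prob(action))
        values.append(value.squeeze())

    # discounted returns and advantages (one-episode Monte Carlo variant of A2C)
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.insert(0, G)
    returns = torch.tensor(returns, dtype=torch.float32)
    values = torch.stack(values)
    advantages = returns - values.detach()
    loss = -(torch.stack(log_probs) * advantages).mean() + 0.5 * (returns - values).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

A successfully trained agent should converge to the counter-intuitive ordering, raising the second coupling before the first, so that the central dot stays nearly empty while the population reaches the last dot.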
Presenters
-
Riccardo Porotti
Dipartimento di Fisica, Università degli Studi di Milano
Authors
-
Riccardo Porotti
Dipartimento di Fisica, Università degli Studi di Milano
-
Dario Tamascelli
Dipartimento di Fisica, Università degli Studi di Milano
-
Marcello Restelli
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano
-
Enrico Prati
Istituto di Fotonica e Nanotecnologie, Consiglio Nazionale delle Ricerche