Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis

ORAL

Abstract

A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in the Noisy Intermediate-Scale Quantum (NISQ) era and in future fault-tolerant quantum computing. We use tabular Q-learning within a discretized quantum state space to manage the exponential growth of the state-space dimension. A hybrid reward is introduced, combining a static, domain-informed reward that guides the agent toward the target state with customizable dynamic penalties that discourage inefficient circuit structures. This reward is circuit-aware, in contrast to existing work on this topic, which is mostly fidelity-based. By leveraging sparse matrix representations and state-space discretization, the method can navigate high-dimensional environments while minimizing computational overhead. Benchmarking on graph-state preparation for up to seven qubits, we demonstrate that the algorithm consistently discovers minimal-depth circuits. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's adaptability. The results confirm that this RL approach, with our circuit-aware reward, explores the complex quantum state space in a resource-efficient way and synthesizes optimized quantum circuits.
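As a rough illustration of the ingredients named in the abstract, the following Python sketch combines a discretized state key, a hybrid reward, and a standard tabular Q-learning update. It is not the authors' implementation: the rounding-based discretization, the specific penalty terms (a per-gate cost and a repeated-gate penalty), the function names, and all numerical values are assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict


def discretize(amplitudes, decimals=2):
    """Map a state vector to a hashable key by rounding each amplitude.

    Assumption: rounding is a stand-in for whatever discretization the
    paper actually uses.
    """
    return tuple((round(z.real, decimals), round(z.imag, decimals))
                 for z in amplitudes)


def hybrid_reward(reached_target, action, previous_action,
                  static_reward=1.0, step_cost=0.01, repeat_penalty=0.5):
    """Static term for reaching the target plus dynamic circuit penalties.

    The two penalties below are hypothetical examples of "inefficient
    circuit structures": extra depth and an immediately repeated gate.
    """
    reward = static_reward if reached_target else 0.0
    reward -= step_cost                      # every gate adds to the depth
    if previous_action is not None and action == previous_action:
        reward -= repeat_penalty             # e.g. H followed by H on one qubit
    return reward


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, terminal=False):
    """One standard tabular Q-learning update; returns the new Q-value."""
    if terminal:
        best_next = 0.0
    else:
        best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Toy usage: gate labels are illustrative, not a real gate-set encoding.
actions = [("H", 0), ("CZ", (0, 1))]
q_table = defaultdict(float)
s0 = discretize([1 + 0j, 0j])
s1 = discretize([0.70710678 + 0j, 0.70710678 + 0j])
r = hybrid_reward(True, actions[0], None)
new_q = q_update(q_table, s0, actions[0], r, s1, actions, terminal=True)
```

Keying the Q-table on rounded amplitudes keeps the table finite, and the penalty terms are where circuit-level preferences enter, independently of the fidelity-style static term.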

*Support from Spanish MICIN grant PID2021-122547NB-I00 and the "MADQuantum-CM" project funded by Comunidad de Madrid and by the Ministry for Digital Transformation and of Civil Service of the Spanish Government through the QUANTUM ENIA project call – Quantum Spain project, and by the European Union through the Recovery, Transformation and Resilience Plan Next Generation EU within the framework of the Digital Spain 2026 Agenda, and the CAM Programa TEC-2024/COM-84 QUITEMAD-CM. M.A. M.-D. has been partially supported by the U.S. Army Research Office, Grant No. W911NF-14-1-0103.

Publication: https://arxiv.org/abs/2507.16641

Presenters

  • Sara Giordano

    • Universidad Complutense de Madrid (UCM)

Authors

  • Sara Giordano

    • Universidad Complutense de Madrid (UCM)
  • Kornikar Sen

    • Universidad Complutense de Madrid
  • Miguel A. Martin-Delgado

    • Universidad Complutense de Madrid