Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis

Sara Giordano; Kornikar Sen; Miguel A. Martin-Delgado

Oral-In-person

Abstract

We introduce a reinforcement learning (RL) framework for efficiently synthesizing quantum circuits that prepare specified target quantum states from a fixed initial state, addressing a central challenge for both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The framework uses tabular Q-learning in a discretized quantum state space to manage the exponential growth of the state-space dimension. The agent is trained with a hybrid reward that combines a static, domain-informed term, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures. The reward is therefore circuit-aware, in contrast to existing work on this topic, which mostly relies on fidelity-based rewards. By leveraging sparse matrix representations and state-space discretization, the method navigates high-dimensional environments while keeping computational overhead low. In benchmarks on graph-state preparation for up to seven qubits, the algorithm consistently discovers minimal-depth circuits. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's adaptability. The results confirm that this RL approach, with its circuit-aware reward, explores the complex quantum state space in a resource-efficient way and synthesizes optimized quantum circuits.
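
To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch of tabular Q-learning with a hybrid reward on a two-qubit toy problem. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the gate set (H and CZ), the rounding-based discretization, the terminal bonus used as the static term, the per-gate cost and repeated-gate penalty used as the dynamic terms, and all names and hyperparameters (`hybrid_reward`, `STEP_COST`, `REPEAT_PENALTY`, `train`, `greedy_circuit`). Gates are applied directly to the nonzero amplitudes rather than through dense matrices, as a simple stand-in for a sparse representation.

```python
import math
import random

S = 1 / math.sqrt(2)
INITIAL = (1.0, 0.0, 0.0, 0.0)    # |00>, real amplitudes, qubit q is bit q of the index
TARGET = (0.5, 0.5, 0.5, -0.5)    # two-qubit graph state CZ (H x H)|00>
ACTIONS = (("H", 0), ("H", 1), ("CZ", 0, 1))
STEP_COST = 0.1         # assumed dynamic penalty charged for every gate
REPEAT_PENALTY = 0.5    # assumed dynamic penalty for repeating a self-inverse gate


def apply_gate(state, gate):
    """Apply H or CZ to a tuple of real amplitudes, skipping zero entries."""
    out = [0.0] * len(state)
    if gate[0] == "H":
        bit = 1 << gate[1]
        for i, amp in enumerate(state):
            if amp == 0.0:
                continue
            sign = -1.0 if i & bit else 1.0
            out[i & ~bit] += S * amp
            out[i | bit] += S * sign * amp
    else:  # CZ flips the sign where both qubits are 1
        mask = (1 << gate[1]) | (1 << gate[2])
        for i, amp in enumerate(state):
            out[i] = -amp if (i & mask) == mask else amp
    return tuple(out)


def discretize(state, decimals=3):
    """Round amplitudes so that states can serve as Q-table keys."""
    return tuple(round(a, decimals) + 0.0 for a in state)


def fidelity(a, b):
    """Squared overlap of two real state vectors."""
    return sum(x * y for x, y in zip(a, b)) ** 2


def hybrid_reward(next_state, action, last_action):
    """Static bonus on reaching the target minus circuit-dependent penalties."""
    done = fidelity(next_state, TARGET) > 0.99
    static = 1.0 if done else 0.0
    penalty = STEP_COST + (REPEAT_PENALTY if action == last_action else 0.0)
    return static - penalty, done


def train(episodes=3000, max_steps=6, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; keys are (discretized state, last action)."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    q = {}
    for _ in range(episodes):
        state, last = INITIAL, None
        for _ in range(max_steps):
            qs = q.setdefault((discretize(state), last), [0.0] * n)
            if rng.random() < eps:
                a = rng.randrange(n)
            else:
                a = max(range(n), key=qs.__getitem__)
            nxt = apply_gate(state, ACTIONS[a])
            r, done = hybrid_reward(nxt, a, last)
            next_qs = q.setdefault((discretize(nxt), a), [0.0] * n)
            bootstrap = 0.0 if done else gamma * max(next_qs)
            qs[a] += alpha * (r + bootstrap - qs[a])
            state, last = nxt, a
            if done:
                break
    return q


def greedy_circuit(q, max_steps=6):
    """Read out a circuit by following the greedy policy from the initial state."""
    state, last, circuit = INITIAL, None, []
    for _ in range(max_steps):
        if fidelity(state, TARGET) > 0.99:
            break
        qs = q.get((discretize(state), last), [0.0] * len(ACTIONS))
        a = max(range(len(ACTIONS)), key=qs.__getitem__)
        state, last = apply_gate(state, ACTIONS[a]), a
        circuit.append(ACTIONS[a])
    return circuit, state


if __name__ == "__main__":
    circuit, final_state = greedy_circuit(train())
    print(circuit, fidelity(final_state, TARGET))
```

In this toy setting the penalties are what make the reward circuit-aware: a reward based only on final fidelity would score a three-gate circuit and a longer circuit that reaches the same state equally, whereas the per-gate cost and the repeated-gate penalty favor the shorter one.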

March 18, 2026, 5:42 PM – March 18, 2026, 5:54 PM

Publication: https://arxiv.org/abs/2507.16641

Presenters

Sara Giordano
- Universidad Complutense de Madrid (UCM)

Authors

Sara Giordano
- Universidad Complutense de Madrid (UCM)
Kornikar Sen
Miguel A. Martin-Delgado