Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification
ORAL
Abstract
Decision trees are widely used due to their simplicity and interpretability. However, as data sizes grow, state-of-the-art methods for construction and retraining of decision trees become increasingly slow, scaling polynomially with the number of training examples (N). We introduce a novel quantum algorithm, named Des-q, to construct and retrain decision trees for regression and binary classification. Assuming small and periodic increments of training data, Des-q significantly reduces the time required for tree retraining, achieving a poly-logarithmic time complexity in N, even accounting for the time needed to load the new training examples into quantum-accessible memory.
Des-q divides the feature space into k distinct regions whose anchor points are determined by a novel, efficient quantum-supervised clustering method that leverages the unsupervised q-means algorithm of Kerenidis et al. Des-q first efficiently estimates the Pearson correlation between each feature vector and the labels using a novel quantum technique; these correlations are then used as weights in a distance estimation that clusters the training examples into k disjoint regions via nearest-neighbor matching, and the same procedure is applied recursively to expand the tree. We benchmark a simulated version of Des-q for regression and binary classification on four data sets with numerical features. Des-q exhibits performance similar to state-of-the-art decision trees while significantly speeding up periodic tree retraining.
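As a rough illustration of the tree-construction procedure described in the abstract, the sketch below emulates it classically: the quantum Pearson-correlation estimation and the supervised q-means clustering are replaced by classical counterparts, and all function names, parameters, and defaults are illustrative assumptions rather than taken from the paper.

```python
import numpy as np

def pearson_weights(X, y):
    """Absolute Pearson correlation of each feature with the label,
    used as feature weights (classical stand-in for the quantum estimate)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    return np.abs(num / denom)

def weighted_kmeans(X, w, k, iters=20, seed=0):
    """k-means under a feature-weighted squared Euclidean distance
    (classical stand-in for the supervised q-means step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def grow_tree(X, y, k=2, depth=0, max_depth=3, min_size=10):
    """Recursively split the examples into k regions and expand the tree.
    Leaves store the mean label (regression) / label fraction (binary)."""
    if depth >= max_depth or len(X) < min_size:
        return {"leaf": True, "value": float(np.mean(y))}
    w = pearson_weights(X, y)
    labels, centroids = weighted_kmeans(X, w, k)
    children = []
    for j in range(k):
        mask = labels == j
        if mask.sum() == 0:
            children.append({"leaf": True, "value": float(np.mean(y))})
        else:
            children.append(grow_tree(X[mask], y[mask], k, depth + 1,
                                      max_depth, min_size))
    return {"leaf": False, "centroids": centroids, "weights": w,
            "children": children}

def predict_one(node, x):
    """Route a sample to the nearest anchor point at each internal node."""
    while not node["leaf"]:
        d = ((node["centroids"] - x) ** 2 * node["weights"]).sum(axis=1)
        node = node["children"][int(d.argmin())]
    return node["value"]
```

In this classical sketch every step costs time polynomial in the number of examples; the point of Des-q is that the correlation-estimation and clustering subroutines run on quantum-accessible memory in time poly-logarithmic in N, which is what makes frequent retraining on small data increments cheap.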
Publication: https://arxiv.org/abs/2309.09976
Presenters
-
Romina Yalovetzky
JPMorgan Chase
Authors
-
Romina Yalovetzky
JPMorgan Chase
-
Niraj Kumar
JPMorgan Chase
-
Changhao Li
JPMorgan Chase
-
Pierre Minssen
JPMorgan Chase
-
Marco Pistoia
JPMorgan Chase