This repository contains the source code for investigating Quantum Process Tomography (QPT) using Pauli Strings (and SIC Strings). The goal of this project is to develop and analyze efficient strategies for reconstructing quantum processes (unitary maps) by optimizing the selection of preparation and measurement bases (either Pauli Strings or SIC Strings).
- Core Research & Simulations: Developed by Gabriel Batista.
- Pauli Rank Analysis: The notebooks in the
HeuristicStringsandPauliRank/directories were developed by Carlos Couto and are included here for completeness in finding heuristic strings and analyzing the rank properties of the Pauli bases. These notebooks were used to find heuristic string sets.
- Problem Statement
- Project Structure
- Installation
- Usage
- Simulations (one by one)
- Data Flow
- Main Results
- License
Quantum Process Tomography (QPT) is the procedure for characterizing the dynamics of a quantum system. However, QPT scales exponentially with the number of qubits, making it impractical for larger systems.
This project attempts to find more efficient tomography strategies by:
- Leveraging Pauli Strings and SIC Strings: Using Pauli bases or Symmetric Informationally Complete (SIC) bases to probe the system.
- Optimizing Measurement Strategies: Comparing different heuristics to each other and to random baselines.
- Noise Models: Running both ideal (no-noise) simulations and realistic shot-noise simulations.
- Machine Learning Reconstruction: Using gradient-based optimization to reconstruct the unitary matrix from measurement probabilities.
The repository is organized to separate simulation logic, utilities, and data artifacts.
.
βββ data/ # Simulation data (Git-ignored because of large files)
β βββ pickle/ # Raw data dictionaries (PyTorch tensors moved to CPU)
β βββ plots/ # Generated figures from src/plotting/
βββ HeuristicStrings/ # Tools to generate sequences of Pauli/SICs according to the correlation heuristic (Credit: Carlos Couto)
β βββ corrmat_paulis_sics.zip # WinRAR ZIP archive (4,496 KB)
β βββ HeuristicStrings.ipynb # Jupyter notebook
β βββ PauliStrings.nb # Wolfram Notebook (Not documented but included for completeness sake)
β βββ PauliStrings_SIC.nb # Wolfram Notebook (Not documented but included for completeness sake)
βββ PauliRank/ # Rank analysis tools (Credit: Carlos Couto)
β βββ PauliStrings_Rank.ipynb
βββ src/ # Source code
β βββ simulations/ # Executable scripts for running experiments
β β βββ num_ps_heuristic.py
β β βββ num_ps_random.py
β β βββ ...
β βββ plotting/ # Scripts to visualize data/pickle files
β βββ converters/ # Utilities for data format conversion
β βββ utils/ # Core reusable libraries
β βββ quantum_utils.py # Quantum simulator & Pauli string logic
β βββ ml_utils.py # Optimization loop & loss functions
β βββ data_utils.py # File I/O and tensor management
βββ requirements.txt # Python dependencies
- Python 3.10 or higher.
- A working Python environment (Virtualenv or Conda recommended).
- CUDA support (optional) if you wish to run simulations on GPU.
-
Clone the Repository
git clone https://github.com/GabrielCostaBatista/efficient-qpt.git cd efficient-qpt -
Install Dependencies It is recommended to create a virtual environment first.
python -m venv venv source venv/bin/activate # On Windows use: venv\Scripts\activate
Install the required packages:
pip install -r requirements.txt
The simulation scripts are located in src/simulations/. Each script corresponds to a different strategy for selecting Pauli strings (e.g., Random, Heuristic).
Standard Run:
python src/simulations/num_ps_heuristic.pyConfigured Run (CLI arguments):
python src/simulations/num_ps_heuristic.py \
--heuristic_file inputs/heuristic_strings/2_18_ps \
--num_qubits 2 \
--num_runs 30 \
--num_shots 1000 \
--min_strings 1 \
--step_size 1Note: Data is automatically saved to
data/pickle/. The code will automatically create these directories if they do not exist.
Most main simulation scripts support command-line arguments via argparse.
The DEFAULT_* globals at the top of each script are fallback defaults used when a CLI argument is not provided.
To inspect available options for a given script, use:
python src/simulations/num_ps_heuristic.py --helpCommon parameters exposed by CLI flags (with defaults taken from each script):
| Variable | Description | Default |
|---|---|---|
NUM_QUBITS |
Number of qubits in the system (e.g., 2, 3) | 2 |
NUM_RUNS |
Number of independent simulation runs to average | 30 |
NOISE |
Boolean (True/False) to enable shot noise simulation |
True |
NUM_SHOTS |
Number of shots per measurement if noise is enabled | 1000 |
MIN_STATE_STRINGS |
Minimum number of Pauli/SIC strings to use for reconstruction | 1 |
MAX_STATE_STRINGS |
Max number of Pauli/SIC strings to use for reconstruction | - |
SS_STEP_SIZE |
State String Step Size (by how much to increase the number of state strings when iterating from MIN_STATE_STRINGS to MAX_STATE_STRINGS) |
1 |
SHUFFLE_STRINGS |
Boolean (True/False) to enable shuffling of the state strings before each run |
True |
USE_PREVIOUS_GUESS |
Boolean (True/False) to use previous optimization result as start for next step |
True |
CHECKPOINTING_FREQUENCY |
Frequency of saving checkpoints (in number of runs) | 1 |
Once data is generated in data/pickle/, you can generate visualizations:
python src/plotting/num_ps_heuristic.pyThis will save figures to data/plots/.
This section documents each simulation script in src/simulations/ and briefly explains its purpose. Use the global configuration variables at the top of each script (e.g., NUM_QUBITS, NUM_RUNS, NOISE, NUM_SHOTS) to control behavior. All scripts save outputs to data/pickle/ and human-readable summaries to data/readable/.
-
num_ps_heuristic.py- Script for evaluating a Pauli String heuristic
Runs simulation for Pauli Strings selected using an heuristic. Sweeps the number of state strings fromMIN_STATE_STRINGStoMAX_STATE_STRINGS, performsNUM_RUNSindependent runs, and records reconstruction fidelity and other relevant information. -
num_ps_random.py- Script for evaluating Random Pauli Strings
Runs simulation for Pauli Strings selected uniformly at random. Sweeps the number of state strings fromMIN_STATE_STRINGStoMAX_STATE_STRINGS, performsNUM_RUNSindependent runs with different random Pauli Strings, and records reconstruction fidelity and other relevant information. -
num_ps_symmetric.py- Symmetry-Avoidance Heuristic
A variation of the heuristic strategy that prioritizes selecting "non-symmetric" Pauli strings first. It aims to maximize information gain by avoiding redundant symmetric measurements.Note: If you plan to use this script for a large number of qubits consider taking a look at the string generation method as there are more efficient ways of generating them.
-
num_sic_heuristic.py- Script for evaluating a SIC String heuristic
Runs simulation for SIC Strings (Symmetric Informationally Complete) selected using an heuristic. Sweeps the number of state strings fromMIN_STATE_STRINGStoMAX_STATE_STRINGS, performsNUM_RUNSindependent runs, and records reconstruction fidelity and other relevant information. -
num_sic_random.py- Script for evaluating Random SIC Strings
Runs simulation for SIC Strings selected uniformly at random (from the complete set). Sweeps the number of state strings fromMIN_STATE_STRINGStoMAX_STATE_STRINGS, performsNUM_RUNSindependent runs, and records reconstruction fidelity and other relevant information.
Legacy simulations are old simulations that were used to test ideas. These scripts can be used to understand which ideas were tested. These scripts rely on an old (but still supported) way of saving data (in .txt files) and may not support all of the most recent features available in the main simulations.
-
correlations.py- Correlation Matrix Analysis
Computes and prints the correlation matrix for a set of random unitaries. Used to analyze the statistical independence of different Pauli string measurements. -
rank_heuristic.py- Rank-based Analysis
Loads pre-computed sets of Pauli strings from JSON files (generated byPauliRank/notebooks) which have specific algebraic rank properties. Benchmarks these specific sets to correlate algebraic rank with reconstruction fidelity. -
variation_fidelity_distance.py/variation_fidelity_distance_2.py- Marginal Improvement Analysis
Investigates the "marginal utility" of adding a single Pauli string to a fixed base set. It correlates the improvement in fidelity with the geometric distance of the new string from the existing set. Version 2 uses a "minimum distance" metric. -
variation_fidelity_symmetric.py/variation_fidelity_symmetric_2.py- Symmetry Impact Analysis
Investigates the impact of adding symmetric vs. non-symmetric Pauli strings to a base set. Version 2 further distinguishes between "Total Symmetric" (global) and "Qubit-Wise Symmetric" (local) relationships to isolate their specific contributions to fidelity.
Notes:
- If you add a new simulation, follow existing I/O and config conventions so results are automatically saved and reproducible.
- To generate plots from a run, ensure its
.pklfiles are present indata/pickle/before invoking the plotting scripts.
- Simulator (
src/simulations/): Runs the simulations to generate data on how good is the recovery of a unitary depending on the set of strings that is used. - Checkpointing: Data is saved periodically to
data/pickle/as.pklfiles.- Note: The
data_utils.pymodule automatically moves PyTorch tensors to CPU before pickling to ensure you can open the data on any machine (even one without a GPU).
- Note: The
Disclaimer: These results indicate general trends, however, in some cases they might lack the statistical robustness that one would expect in a paper.
There are a total of
Using a random selection of Pauli Strings (noiseless), a fidelity β₯ 0.99 is first reached on average at:
| Qubits | Strings needed | Total strings ( |
Fraction |
|---|---|---|---|
| 1 | ~9 | 18 | ~50% |
| 2 | ~13 | 324 | ~4.0% |
| 3 | ~26 | 5832 | ~0.45% |
Note: with 1000 shots per string (noisy), convergence is slower and may not reach the same plateau within the same number of strings.
With noise (1000 shots per string):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Without noise (infinite shots):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
This heuristic builds a set of Pauli strings that are as mutually uncorrelated as possible, using two layers of logic:
- Greedy construction: Starting from an initial string or set, the algorithm grows the set one string at a time. At each step it scans all remaining candidates and picks whichever one is least correlated with the strings already in the set. Here, "least correlated" means the sum of its correlations with every current member is as small as possible. This is greedy in the sense that it never reconsiders past choices, it just always takes the locally best option available.
- Random restarts: Because a greedy approach is sensitive to where you start the algorithm repeats the construction many times from different random starting points and keeps whichever run produced the most uncorrelated final set. More iterations means a better chance of stumbling onto a good starting point, at the cost of more computation.
The key limitation worth noting is that greedily minimising correlation one string at a time does not guarantee the globally optimal set. It's possible that accepting a slightly more correlated string early on would open the door to much better choices later. The random restarts partially compensate for this, but don't fully solve it.
We define the correlation between two different strings as measuring how similar their measurement outcome distributions are, averaged over all possible quantum circuits (unitaries drawn from the Haar measure). This computation was done analytically, leading to an expression which solely depends on the preparation and measurement gates.
Inside the HeuristicStrings directory you can find the code to generate sets of strings following this heuristic, as well as undocumented Mathematica notebooks to generate the correlation matrices and string indices.
The following plots compares different string-selection strategies against their respective random baselines. Each panel shows average fidelity (Β± standard error) as a function of the number of strings used, for 1, 2, and 3 qubits.
Compares the two random baselines: Pauli Strings drawn uniformly at random vs SIC Strings drawn uniformly at random.
With noise (1000 shots):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Without noise:
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Compares the Pauli String random baseline against the rank-based heuristic set (strings chosen to minimise the correlation).
With noise (1000 shots):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Without noise:
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Compares the Pauli String random baseline against the symmetry-avoidance heuristic, where non-symmetric strings (those that differ in more than just preparation sign) are prioritized first.
With noise (1000 shots):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Without noise:
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Compares the SIC String random baseline against the SIC heuristic set (strings chosen to minimise correlation within the SIC basis).
With noise (1000 shots):
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Without noise:
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
The rank of the Pauli measurement matrix can be used as a proxy for information gain: higher-rank configurations tend to yield better reconstructions for the same number of strings.
The trend is clearest at intermediate string counts (e.g. 8 strings), where rank differences are large enough to matter but the reconstruction has not yet converged. The rank heuristic outperforms a random baseline:
| 1 Qubit | 2 Qubits | 3 Qubits |
|---|---|---|
![]() |
![]() |
![]() |
Note: these rank-heuristic plots use the legacy text-based data and noiseless simulations. Beyond that the simulations were performed with less runs.
Although the heuristics (and SIC strings) seem to outperform random selections of strings for a single qubit, this advantage disappears when the number of qubits increases. One possible explanation for this phenomena is that what it matters is avoiding doing terrible choices of strings and not making good ones. When the number of qubits increases the ammount of total strings increases exponentially and possibily the % of bad choices reduces exponentially, which makes the choice of a bad set exponentially unlikely.
An interesting future direction to test this hypothesis is to optimise for the worst possible set of strings and check if it performs significantly worst than random.
The source code in this repository is available under the MIT License.
What this means:
- β You can use, copy, modify, and distribute this software for any purpose.
- β You can use it in commercial or private projects.
- π You simply need to include the original copyright notice and license text in any copy of the software/source.
See the LICENSE file for details.

































