HALRA

HALRA (High-performance ALRA) is a Python implementation of the ALRA algorithm for imputing missing values in single-cell RNA-seq data [1]. It is designed to operate efficiently on sparse matrices and scale to large datasets by preserving sparsity throughout the pipeline wherever possible.

HALRA performs:

Low-rank matrix reconstruction via randomized SVD
Gene-wise thresholding of reconstructed values
Per-gene rescaling to match observed statistics
Restoration of observed (nonzero) values

The goal is to denoise and impute dropout values while preserving biological signal.
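To make the four steps concrete, here is a minimal dense sketch in NumPy. It is not HALRA's implementation: it swaps in an exact truncated SVD for the randomized SVD, works on a small dense array rather than a sparse matrix, and the function name `alra_sketch`, the rank `k`, and the `quantile` used for thresholding are illustrative assumptions. Step 4 is read here as restoring only those observed entries that ended up at zero.

```python
import numpy as np

def alra_sketch(X, k=10, quantile=0.001):
    """Illustrative dense version of the four steps on a cells x genes matrix."""
    X = np.asarray(X, dtype=float)

    # 1. Rank-k reconstruction (exact truncated SVD, used here for simplicity).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    X_lr = (U[:, :k] * s[:k]) @ Vt[:k]

    # 2. Gene-wise thresholding: per gene, treat anything at or below the
    #    magnitude of a low quantile of the reconstructed values as zero.
    thresh = np.abs(np.quantile(X_lr, quantile, axis=0))
    X_thr = np.where(X_lr > thresh, X_lr, 0.0)

    # 3. Per-gene rescaling: match the mean and standard deviation of the
    #    surviving values to those of the observed nonzero values.
    X_out = X_thr.copy()
    for j in range(X.shape[1]):
        obs = X[:, j][X[:, j] > 0]
        imp_mask = X_thr[:, j] > 0
        imp = X_thr[imp_mask, j]
        if obs.size < 2 or imp.size < 2 or imp.std() == 0:
            continue  # too little information to rescale this gene
        scaled = (imp - imp.mean()) / imp.std() * obs.std() + obs.mean()
        X_out[imp_mask, j] = np.maximum(scaled, 0.0)

    # 4. Restore observed nonzero values that were zeroed along the way.
    lost = (X > 0) & (X_out <= 0)
    X_out[lost] = X[lost]
    return X_out

# Toy example: 50 cells x 20 genes of log-transformed Poisson counts.
rng = np.random.default_rng(0)
X_demo = np.log1p(rng.poisson(1.0, size=(50, 20)).astype(float))
X_imputed = alra_sketch(X_demo, k=5)
```

In this sketch, every entry that was nonzero in the input stays nonzero in the output, while entries that were zero are either filled in or left at zero depending on the per-gene threshold.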

Expected Input

HALRA supports two input types:

1. AnnData

.X should contain a dense NumPy array or a SciPy sparse matrix (CSR/CSC)
.obs_names and .var_names are used as cell and gene labels

2. Raw matrix + labels

matrix: NumPy ndarray or SciPy sparse matrix (cell x gene)
cells: list/array of cell names (length = n_rows)
genes: list/array of gene names (length = n_cols)

HALRA expects log-normalized count data. Either log-normalize your data beforehand or pass normalize=True to the halra function when imputing.
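If you prefer to normalize beforehand, the following is a minimal sketch of one common convention: scale each cell to a fixed total and apply log1p. The function name `log_normalize` and the target of 10,000 counts per cell are assumptions for illustration; this README does not state the exact procedure applied by normalize=True, so the two may differ.

```python
import numpy as np
from scipy import sparse

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    counts = sparse.csr_matrix(counts, dtype=np.float64)
    totals = np.asarray(counts.sum(axis=1)).ravel()
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    # Left-multiplying by a diagonal matrix rescales rows without densifying.
    scaled = (sparse.diags(target_sum / totals) @ counts).tocsr()
    # log1p(0) == 0, so transforming only the stored values keeps sparsity.
    scaled.data = np.log1p(scaled.data)
    return scaled

# Toy cells x genes count matrix; the second cell has no counts.
raw_counts = np.array([[1, 0, 3],
                       [0, 0, 0],
                       [2, 2, 0]])
normalized = log_normalize(raw_counts)
```

Because only the stored nonzero entries are transformed, the result has the same sparsity pattern as the input.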

Installation

HALRA can be installed with pip and requires Python >= 3.10. For example, in a fresh conda environment:

conda create -n halra_env python=3.10
conda activate halra_env
pip install halra

Usage Example (AnnData)

import anndata as ad
from halra import halra

# Load your AnnData object
adata = ad.read_h5ad("anndata.h5ad")

# Run HALRA (this assumes .X is not already normalized)
adata_imputed = halra(adata, normalize=True)

# Result:
# adata_imputed.X now contains imputed values
# All metadata (.obs, .var, etc.) is preserved, subset to the remaining cells/genes if any were filtered

Usage Example (10x)

import os
import pandas as pd
from scipy.io import mmread
from halra import halra

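# Load 10x files (matrix.mtx is stored as genes x cells, so transpose to cells x genes)
mtx_dir = "/path/to/dir"
matrix = mmread(os.path.join(mtx_dir, "matrix.mtx")).T.tocsr()
# Read the first column of each label file as a 1-D array of names
features = pd.read_csv(os.path.join(mtx_dir, "features.tsv"), sep="\t", header=None, usecols=[0])[0].to_numpy()
barcodes = pd.read_csv(os.path.join(mtx_dir, "barcodes.tsv"), sep="\t", header=None)[0].to_numpy()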

# Run HALRA
imputed_matrix, cells, genes = halra(matrix, barcodes, features, normalize=True)

# Result:
# imputed_matrix contains imputed values
# cells and genes contain the filtered cell/gene labels

Dependency Notes

HALRA depends on:

numpy
scipy
scikit-learn (for randomized SVD)
anndata (>=0.10)

Current Limitations and Experimental Features

The reconstruction step is dense (SVD-based), which may limit scalability for extremely large datasets (>1M cells).
Distributed and HPC-oriented implementations of HALRA are under active development and can be found in the experimental/ directory. These are not yet part of the stable package API.
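As a rough illustration of why a dense reconstruction becomes the bottleneck, the sketch below compares the memory needed for a dense cells x genes matrix with that of a CSR matrix of the same shape. The gene count (20,000), the 5% density, and the 8-byte values with 4-byte indices are assumptions chosen for the example, not measurements of HALRA.

```python
def dense_gib(n_cells, n_genes, bytes_per_value=8):
    """Memory for a fully dense matrix, in GiB."""
    return n_cells * n_genes * bytes_per_value / 2**30

def csr_gib(n_cells, n_genes, density, bytes_per_value=8, bytes_per_index=4):
    """Approximate memory for a CSR matrix (data + column indices + row pointers), in GiB."""
    nnz = int(n_cells * n_genes * density)
    total = nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_index
    return total / 2**30

n_cells, n_genes = 1_000_000, 20_000  # assumed dataset size
dense_size = dense_gib(n_cells, n_genes)
sparse_size = csr_gib(n_cells, n_genes, density=0.05)
print(f"dense: {dense_size:.1f} GiB, CSR at 5% density: {sparse_size:.1f} GiB")
```

Under these assumptions the dense matrix is more than ten times larger than its sparse counterpart, which is why the dense step dominates memory use at this scale.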

References

[1] Linderman, G. C. et al. Zero-preserving imputation of single-cell RNA-seq data. Nat. Commun. 13 (2022).

License

This project is licensed under the MIT License - see the LICENSE file for details.
