This repository contains the code and data needed to reproduce the results of the paper "Effects of segmentation errors on downstream-analysis in highly-multiplexed tissue imaging". Ground truth datasets are stored in the data folder. Scripts to run the experiments and create the figures can be found in src. Certain scripts require a GPU and a CUDA installation, so make sure both are available on your machine. Some scripts are also meant to be run on a cluster managed by Slurm.
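To check the GPU requirement mentioned above before starting an experiment, a small helper along the following lines may be useful. It is only a sketch and not part of the repository: the function name gpu_visible is hypothetical, and the check assumes an NVIDIA GPU whose driver ships the nvidia-smi tool. It confirms that the driver can see a GPU, not that the CUDA toolkit itself is installed correctly.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    Hypothetical helper; assumes an NVIDIA driver that provides nvidia-smi.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

On a cluster, run the check on a compute node rather than on the login node, since login nodes often have no GPU.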
You can create the required conda environment with:
conda env create -f environment.yml
Some of the scripts explicitly expect a Slurm environment, specifically Slurm job arrays. Here is an example script you can use to run these experiments:
#!/bin/bash
#SBATCH --partition=NAME # Request a specific partition
#SBATCH --ntasks=1 # Number of tasks (see below)
#SBATCH --cpus-per-task=16 # Number of CPU cores per task
#SBATCH --time=1-00:00 # Runtime in D-HH:MM
#SBATCH --mem=32G # Memory per node (32 GB assumed; without a unit Slurm reads megabytes)
#SBATCH --array=1-8 # Choose the array size depending on the
# resources you can/want to spend.
# Larger arrays will enable better parallelization.
# The log_msg and err_msg directories must exist before the job is submitted.
#SBATCH --output=log_msg/cpu_exp_hostname_%j.out
#SBATCH --error=err_msg/cpu_exp_hostname_%j.err
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=YOUR_EMAIL
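# Check that the Python file argument is provided
if [ -z "$1" ]; then
    echo "Error: No Python file specified." >&2
    exit 1
fi
# Run the script in the perturbation conda environment and pass it
# the ID of this array element and the highest ID in the array.
conda run -n perturbation python "$1" --current "$SLURM_ARRAY_TASK_ID" --max "$SLURM_ARRAY_TASK_MAX"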
To run it on your cluster, save the script (here as cpu_experiment.sh) and submit it, e.g. with:
sbatch cpu_experiment.sh PYTHON_SCRIPT.py
The given Python script is executed in parallel on all elements of the Slurm array. The workload is distributed using the IDs of the array elements, which the script receives through the --current and --max arguments.
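To illustrate the idea, here is a minimal sketch of how a Python script could use these two arguments to pick its share of the work. It is not taken from the scripts in src: the names select_work and main and the list of configurations are hypothetical. The round-robin split is one possible scheme and assumes that the array IDs run from 1 to the maximum, as in the example job script (--array=1-8).

```python
import argparse


def select_work(items, current, max_id):
    """Return the items handled by array element `current`.

    Assumes array IDs run from 1 to max_id. Items are assigned
    round-robin, so every item is processed by exactly one element.
    """
    if not 1 <= current <= max_id:
        raise ValueError(f"current must be in 1..{max_id}, got {current}")
    return items[current - 1::max_id]


def main(argv=None):
    parser = argparse.ArgumentParser(description="One element of a Slurm array")
    parser.add_argument("--current", type=int, required=True,
                        help="value of SLURM_ARRAY_TASK_ID")
    parser.add_argument("--max", type=int, required=True,
                        help="value of SLURM_ARRAY_TASK_MAX")
    args = parser.parse_args(argv)

    # Hypothetical list of experiment configurations to distribute.
    configs = [f"config_{i:02d}" for i in range(20)]
    for config in select_work(configs, args.current, args.max):
        print(f"array element {args.current}/{args.max} runs {config}")


# Example call with explicit arguments; a real script would call main()
# so that the arguments passed by the job script are parsed.
main(["--current", "3", "--max", "8"])
```

With this scheme a larger array gives each element fewer items, which matches the note in the job script that larger arrays allow better parallelization.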