Skip to content

bionetslab/bosperrus-package

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BOSPERRUS: BOundary effects in SPatial graphs: ERRor modeling and Untangling Strategies

Tutorial — Getting Started with BosporusFlow

BOSPORUS (BOundary effects in Spatial graphs: error modeling and Untangling Strategies) detects and corrects boundary effects in spatial graphs. Nodes close to the border of a tissue sample or point cloud tend to have systematically lower centrality scores than nodes in the center — not because they are biologically different, but because they have fewer neighbours. BOSPORUS models this distance-dependent bias and optionally corrects for it.


Table of Contents

  1. Installation
  2. Key concepts
  3. Quick-start: one-liner with run_all
  4. Step-by-step walkthrough
  5. Reading the results
  6. Choosing a graph type
  7. Choosing a distance function
  8. Visualising the correction (optional)
  9. API reference summary

Installation

Within a conda environment, run

pip install git+https://github.com/bionetslab/bosporus-package.git

Pip dependencies (scikit-learn, scipy, numpy, pandas) are installed automatically. Within the environment, you also need to

conda install graph-tool -c conda-forge

Key concepts

Entry points / input types: Entry points

Like its namesake, the Flow class in the Bosporus package enables multiple entry points and flexible branching:
🟣 Given coordinates, a distance function, a graph operator (Delaunay, k-nearest neighbor, or radius nearest neighbor, and functions for node observations (degree, closeness, clustering coefficients, …).
🩷 Given coordinates, a distance function, an edge list, and functions for node observations (degree, closeness, clustering coefficients, …).
🟡 Given coordinates, a distance function, and observations.
☘️ Given distances and observations.

Distance functions:
Distance functions


Quick-start: one-liner with run_all

import numpy as np
from bosporus import BosporusFlow

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(500, 2))   # 500 random 2-D points

flow = BosporusFlow(coords)

# run_all builds the graph, computes centralities, fits models, and corrects
best_fits = flow.run_all(
    graph_type="delaunay",
    params=None,
    distance_function=flow.compute_distance_to_convex_hull,
)

# Inspect results
print(flow.fit_quality[["measure", "best_fit_type", "observed_effect_strength"]])
print(flow.df.head())   # contains raw + corrected centrality columns

run_all returns a dictionary of the best Fit object for each centrality measure.


Step-by-step walkthrough

Step 1 – Create a toy dataset

We simulate a disc-shaped point cloud to produce an obvious boundary effect: nodes at the perimeter will always have fewer neighbours than nodes at the centre.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from bosporus import BosporusFlow

rng = np.random.default_rng(0)

# Sample uniformly inside a disc of radius 50
n = 600
angles = rng.uniform(0, 2 * np.pi, n)
radii  = 50 * np.sqrt(rng.uniform(0, 1, n))   # sqrt for uniform area sampling
coords = np.stack([radii * np.cos(angles),
                   radii * np.sin(angles)], axis=1)

Step 2 – Initialise BosporusFlow

flow = BosporusFlow(coords)

BosporusFlow takes a single argument: an (N, 2) or (N, 3) NumPy array of spatial coordinates. Internally it creates an empty DataFrame flow.df indexed by node ID.

Step 3 – Build the spatial graph

Choose one of three graph construction methods:

# Option A – Delaunay triangulation (recommended default, no hyperparameter)
flow.construct_graph("delaunay", params=None)

# Option B – k-nearest-neighbour graph (directed)
# flow.construct_graph("knn", params={"k": 6})

# Option C – radius-nearest-neighbour graph (undirected)
# flow.construct_graph("rnn", params={"r": 15.0})

The edge list is stored in flow.edge_list.

Step 4 – Compute centrality measures

measures = ["degree", "closeness", "betweenness", "harmonic", "clustering", "pagerank"]
flow.compute_centralities(measures=measures)

print(flow.df.head())

All six measures are now columns in flow.df.

Step 5 – Compute a boundary distance

BOSPORUS needs to know how far each node is from the boundary. The method returns the column key that was added to flow.df and that will be used for fitting.

# Distance to the convex hull of the point cloud
distance_key = flow.compute_distance_to_convex_hull()

# Alternatives (use at most one):
# distance_key = flow.compute_distance_to_rectangular_border()
# distance_key = flow.compute_distance_to_pointset(pointset=boundary_coords)
# distance_key = flow.compute_distance_to_mask(mask=binary_2d_array)

print(f"Distance column: '{distance_key}'")
print(flow.df[[distance_key]].describe())

Step 6 – Fit models and correct

from bosporus.fit import ConstantFit, PiecewiseLinearFit, ExponentialSaturationFit, MichaelisMentenFit

flow.fit_models(
    measures=measures,
    distance_key=distance_key,
    fits=[ConstantFit, PiecewiseLinearFit, ExponentialSaturationFit, MichaelisMentenFit],
    calculate_rel_ll_to_baseline=ConstantFit,   # ConstantFit = "no boundary effect" baseline
)

After this step:

  • flow.fit_quality — a DataFrame with one row per measure summarising the best model, AIC-based statistics, effect strength, and half-life.
  • flow.df — now also contains columns named "BOSPORUS corrected <measure>" for each centrality.

Reading the results

# Overview table
print(flow.fit_quality[[
    "measure",
    "best_fit_type",
    "observed_effect_strength",   # signed relative drop from border to centre
    "observed_half_life",         # distance at which ~50% of the effect has decayed
    "affected samples",           # fraction of nodes still in the boundary-affected zone
]].to_string(index=False))

observed_effect_strength is defined as (c_border − c_centre) / (c_border + c_centre). A value close to 0 means negligible boundary bias; a value near ±1 means strong bias.

best_fit_type tells you which model won the AIC comparison:

Model Interpretation
Constant Fit No detectable boundary effect
Piecewise Linear Fit Linear drop up to a breakpoint, then flat plateau
Exponential Saturation Fit Smooth exponential approach to a plateau
Michaelis-Menten Fit Hyperbolic saturation (analogous to enzyme kinetics)

To access the corrected values:

# All corrected columns at once
corrected_cols = [c for c in flow.df.columns if c.startswith("BOSPORUS corrected")]
print(flow.df[corrected_cols].head())

# Single measure
flow.df[["degree", "BOSPORUS corrected degree"]].head()

To access fit parameters directly:

degree_fit = flow.best_fits["degree"]
print(degree_fit.name)      # e.g. "Exponential Saturation Fit"
print(degree_fit.params)    # fitted parameter dict
print(degree_fit.AIC)

Choosing a graph type

Graph type Key parameter When to use
"delaunay" none Default choice; produces a natural triangulation without isolated nodes
"knn" k (int) When you want a fixed connectivity per node
"rnn" r (float, same units as coords) When biological interaction radius is known

Choosing a distance function

Method Use case
compute_distance_to_convex_hull() General default; works for any convex or near-convex tissue shape
compute_distance_to_rectangular_border() Rectangular imaging window / biopsy
compute_distance_to_pointset(pointset) Border defined by a set of explicit landmark coordinates
compute_distance_to_mask(mask) Binary image mask defining the tissue boundary

All four methods accept an optional distance_key string argument to rename the distance column in flow.df.


Visualising the correction (optional)

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

sc = axes[0].scatter(coords[:, 0], coords[:, 1],
                     c=flow.df["degree"], cmap="viridis", s=8)
axes[0].set_title("Raw degree centrality")
plt.colorbar(sc, ax=axes[0])

sc2 = axes[1].scatter(coords[:, 0], coords[:, 1],
                      c=flow.df["BOSPORUS corrected degree"], cmap="viridis", s=8)
axes[1].set_title("BOSPORUS-corrected degree centrality")
plt.colorbar(sc2, ax=axes[1])

plt.tight_layout()
plt.savefig("bosporus_correction.png", dpi=150)
plt.show()

Nodes near the border should appear more uniform after correction.


API reference summary

BosporusFlow(coordinates)

Method Signature Description
run_all (graph_type, params, distance_function, ...) Single-call convenience wrapper
construct_graph (graph_type, params) Build edge list
compute_centralities (measures) Populate flow.df with centrality columns
compute_distance_to_convex_hull (distance_key=None) Add convex-hull distance column
compute_distance_to_rectangular_border (distance_key=None) Add rectangular-border distance column
compute_distance_to_pointset (pointset, distance_key=None) Add distance-to-pointset column
compute_distance_to_mask (mask, distance_key=None) Add distance-to-mask column
fit_models (measures, distance_key, fits, ...) Fit all models, select best by AIC, correct centralities

Key attributes after a full run

Attribute Type Content
flow.df pd.DataFrame All node-level data: coordinates index, centralities, distances, corrected values
flow.fit_quality pd.DataFrame Per-measure model selection summary
flow.best_fits dict[str, Fit] Best Fit object per measure
flow.edge_list set Set of edges (frozensets or tuples)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages