ws-radseq

Population genomic analysis of white and common threespine sticklebacks from Nova Scotia, combining a labelled set of whole-genome sequenced (WGS) individuals with a broader random sample sequenced by RADseq. The analysis characterises population structure, identifies putative admixed individuals at sympatric sites, and provides genomic evidence discussed in the manuscript:

"Evidence of parental care as a novel reproductive isolating mechanism"
Behrens et al. (2026, full citation forthcoming.)

Overview

Two forms — referred to here as white (wht) and common (cmn) — co-occur at sympatric sites in Nova Scotia. The WGS data from Canal Lake (CL) and Cherry Burton Road (CB) (from Sumarli et al., full citation forthcoming) contain individuals with confirmed white/common identities, whereas the RAD-seq data are random samples of individuals of unknown identity at Canal Lake and Salmon River (two sites where whites/commons are sympatric). Allopatric common sites on Cape Breton (BC, MJ) are included in the prefiltered VCF but excluded from the mainland sNMF and the DAPC.

A joint SNP dataset was built from the merged RAD+WGS call set and used for:

Principal components analysis (PCA) — primary evidence of form-level structure
Ancestry estimation (sNMF) — mainland-only supplemental analysis identifying individuals with mixed ancestry
Discriminant analysis of principal components (DAPC) — posterior assignment probabilities used to quantify the degree of admixture in candidate hybrid individuals

Repository structure

ws-radseq/
├── data/
│   ├── vcf/
│   │   ├── raw/                          # Original merged RAD+WGS joint VCF
│   │   ├── ns_rad_prefiltered_no_M.vcf.gz  # Post-script 01: biallelic SNPs, autosomes + chrY
│   │   ├── ns_rad_filtered.vcf.gz          # Post-script 02: SNPfiltR-filtered, used for PCA
│   │   ├── ns_rad_filtered_v2.vcf.gz       # Post-script 04: split-depth filtered, used for DAPC
│   │   └── ns_rad_filtered_mainland.vcf.gz # Post-script 06: mainland-only, used for sNMF
│   ├── admixture/
│   │   ├── mainland_samples.txt            # Sample list for mainland subsetting
│   │   ├── ns_rad_pruned.{bed,bim,fam,raw} # LD-pruned full dataset (DAPC input)
│   │   ├── ns_rad_pruned_mainland.{bed,bim,fam,ped,geno}  # mainland dataset (sNMF input)
│   │   └── ns_rad_pruned_mainland.snmfProject + .snmf/    # sNMF results (K=1-6, 100 reps)
│   └── gds/                              # GDS format files for SNPRelate PCA
├── meta/
│   ├── popmap.txt                # Sample metadata: sample_id, population, species
│   ├── meta_df.txt               # Extended metadata
│   └── sex_info.txt / .csv       # Sex assignment data
├── envs/
│   └── admixture_env.yaml        # Conda environment (plink, bcftools, tabix)
├── figures/                      # All output figures (PDF)
└── *.R / *.sh                    # Analysis scripts (see below)

Scripts

Scripts are numbered in approximate execution order. Shell scripts (.sh) require WSL with the admixture_env conda environment; R scripts run natively on Windows.

Script	Purpose
`01_prefilter_vcfs.sh`	Initial joint VCF filtering: biallelic SNPs, max-missing 0.7, MAC ≥ 2; removes INFO fields for downstream compatibility
`02_filter_vcfs_snpfiltR.R`	SNPfiltR quality filters on the full joint dataset (depth, GQ, allele balance, missingness); produces VCF for PCA
`03_pca.R`	PCA via SNPRelate on the filtered VCF; produces the main-text PCA figure
`04_filter_rad_wgs.R`	SNPfiltR filtering with split depth thresholds (RAD cap 40×, WGS cap 80×); produces the joint-filtered VCF used by the DAPC pipeline
`05_plink_prep_rad_wgs.sh`	Converts filtered VCF to PLINK format, applies MAF ≥ 0.05 and LD pruning (r² < 0.2); outputs `ns_rad_pruned.*` used by DAPC
`06_filter_mainland.R`	As `04`, but restricted to mainland samples (Cape Breton allopatric commons dropped); used for the supplemental sNMF analysis
`07_snmf_prep_mainland.sh`	As `05`, but for mainland-only VCF with relaxed MAF (≥ 0.03) and no LD pruning to maximise power
`08_snmf_mainland.R`	sNMF ancestry estimation (K = 1–6, 100 replicates) on the mainland subset; produces supplemental bar plots and cross-entropy figure
`09_dapc_rad_wgs.R`	DAPC on the RAD+WGS intersection (allopatric BC/MJ excluded). Trained on the 40 WGS samples (20 CB/cmn + 20 CL/wht) with morphology labels; RAD samples assigned via `predict.dapc()` to avoid label circularity. SNPs restricted to ≤ 5% missingness in the RAD subset. Outputs scores, assignment, scatter, and combined figures
`10_extract_paper_values.R`	Utility: extracts mainland sNMF cross-entropy (ΔCE for K=1 vs K=2), K=2 Q scores, and DAPC posterior P(white) for the six hybrid candidates (CL30, CL34, FCL113_S14_L002, sal_riv27, sal_riv41, sal_riv42) for in-text reporting

Dependencies

For running the shell scripts used in the analysis, install the conda environment described in envs/admixture_env.yaml:

Conda (WSL)

Install via conda env create -f envs/admixture_env.yaml:

plink 1.90b6.21
bcftools
tabix

R packages

Bioconductor: LEA, SNPRelate, gdsfmt
CRAN: SNPfiltR, vcfR, adegenet, tidyverse, viridis, plotly

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ws-radseq

Overview

Repository structure

Scripts

Dependencies

Conda (WSL)

R packages

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
envs		envs
figures		figures
meta		meta
.gitattributes		.gitattributes
.gitignore		.gitignore
01_prefilter_vcfs.sh		01_prefilter_vcfs.sh
02_filter_vcfs_snpfiltR.R		02_filter_vcfs_snpfiltR.R
03_pca.R		03_pca.R
04_filter_rad_wgs.R		04_filter_rad_wgs.R
05_plink_prep_rad_wgs.sh		05_plink_prep_rad_wgs.sh
06_filter_mainland.R		06_filter_mainland.R
07_snmf_prep_mainland.sh		07_snmf_prep_mainland.sh
08_snmf_mainland.R		08_snmf_mainland.R
09_dapc_rad_wgs.R		09_dapc_rad_wgs.R
10_extract_paper_values.R		10_extract_paper_values.R
LICENSE		LICENSE
README.md		README.md
ns_radseq.Rproj		ns_radseq.Rproj

Folders and files

Latest commit

History

Repository files navigation

ws-radseq

Overview

Repository structure

Scripts

Dependencies

Conda (WSL)

R packages

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages