chinlinCHEN/sdtm_adam_python_project

SDTM-to-ADaM (ADSL) Conversion in Python

This project demonstrates how to convert SDTM (Study Data Tabulation Model) datasets into an ADaM ADSL (Subject-Level Analysis Dataset) using Python. The clinical trial data comes from the CDISC Pilot Project, a study evaluating Xanomeline at three dose levels (placebo, 54 mg low dose, 81 mg high dose) for the treatment of Alzheimer's disease. Derivations follow the ADSL specification in the ADaM define.pdf, and coded values are mapped according to the Controlled Terminology listed in the SDTM define.pdf.

Project Structure

.
├── code/
│   ├── adsl.py               # Main ADSL derivation program (ADSLProcessor class)
│   └── utils.py              # Helper functions for date conversion, flag derivation, etc.
├── output/
│   ├── adsl.csv              # Final ADSL output (CSV)
│   └── adsl.xpt              # Final ADSL output (XPT / SAS transport)
├── updated-pilot-submission-package/   # CDISC Pilot SDTM source data
└── reference-ranges/                   # Lab reference range data

Data Source

The input SDTM datasets are located under updated-pilot-submission-package/.../sdtm/ and loaded as .xpt (SAS transport) files via pyreadstat. The following SDTM domains are used:

Domain  Description              Role in ADSL
------  -----------------------  ------------------------------------------------------------------------------------
DM      Demographics             Base table: subject ID, age, sex, race, ethnicity, arm, reference dates
EX      Exposure                 Treatment end date (last dose), cumulative dose calculation
DS      Disposition              Discontinuation reason, completion status, end-of-study visit
SC      Subject Characteristics  Education level (EDUCLVL)
QS      Questionnaires           MMSE total score, efficacy flag (ADAS-Cog, CIBIC+)
MH      Medical History          Disease onset date (primary Alzheimer's diagnosis)
SV      Subject Visits           Treatment start date (Visit 3), visit dates for dose intervals and completion flags
VS      Vital Signs              Baseline weight, height, and derived BMI

ADSL Processing Workflow

The core logic lives in code/adsl.py as the ADSLProcessor class. The process() method executes a sequential pipeline where each step merges additional derived variables onto the ADSL:

_initialize_adsl()         DM base columns + age groups + race code + education
        │
        ▼
_derive_treatment()        Treatment arms, dates, duration, cumulative/average dose
        │
        ▼
_derive_flags()            Population flags (ITT, Safety, Efficacy) + completion/discontinuation
        │
        ▼
_derive_baseline_vitals()  Baseline weight, height, BMI
        │
        ▼
_derive_disease_history()  Disease onset, duration, end-of-treatment visit
        │
        ▼
_derive_questionnaire()    MMSE total score
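
The pipeline pattern above can be sketched in a few lines. This is a minimal illustration, not the code in adsl.py: it uses plain lists of dicts instead of pandas DataFrames, and the names left_merge, run_pipeline, add_treatment, and add_itt_flag, as well as the subject IDs, are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def left_merge(adsl: List[Row], derived: List[Row], key: str = "USUBJID") -> List[Row]:
    """Left-join derived columns onto ADSL, keeping one record per subject."""
    lookup = {row[key]: row for row in derived}
    merged = []
    for row in adsl:
        extra = lookup.get(row[key], {})
        # Subjects without a derived record simply get no new columns.
        merged.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return merged


def run_pipeline(adsl: List[Row], steps: List[Callable[[List[Row]], List[Row]]]) -> List[Row]:
    """Apply each derivation step in order; every step returns the widened ADSL."""
    for step in steps:
        adsl = step(adsl)
    return adsl


# Hypothetical steps standing in for the _derive_* methods.
def add_treatment(adsl: List[Row]) -> List[Row]:
    derived = [{"USUBJID": r["USUBJID"], "TRT01P": r["ARM"]} for r in adsl]
    return left_merge(adsl, derived)


def add_itt_flag(adsl: List[Row]) -> List[Row]:
    return [{**r, "ITTFL": "N" if r["ARM"] == "Screen Failure" else "Y"} for r in adsl]


# Made-up DM rows for illustration only.
dm = [
    {"USUBJID": "SUBJ-001", "ARM": "Placebo"},
    {"USUBJID": "SUBJ-002", "ARM": "Screen Failure"},
]
adsl = run_pipeline(dm, [add_treatment, add_itt_flag])
```

Because every step is a left join keyed on USUBJID, the row count stays at one record per subject no matter how many steps run.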

Step 1: Initialize ADSL (_initialize_adsl)

The ADSL is seeded from the DM domain. The selected columns form a one-record-per-subject skeleton:

  • Identity: STUDYID, USUBJID, SUBJID, SITEID
  • Demographics: AGE, AGEU, SEX, RACE, ETHNIC
  • Arm: ARM (planned treatment arm)
  • Reference dates: RFSTDTC, RFENDTC

Derived in this step:

  • RFENDT -- reference end date converted to SAS numeric date (days since 1960-01-01)
  • AGEGR1 / AGEGR1N -- age grouped into <65, 65-80, >80 with numeric codes
  • RACEN -- numeric race code (WHITE=1, BLACK=2, AMERICAN INDIAN=3, ASIAN=4)
  • EDUCLVL -- education level merged from SC where SCTESTCD='EDLEVEL'
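
As a rough sketch of the date conversion and age grouping described above (the helper names are hypothetical and need not match utils.py; the numeric codes 1/2/3 for AGEGR1N and the handling of partial dates are assumptions):

```python
from datetime import date
from typing import Optional, Tuple

SAS_EPOCH = date(1960, 1, 1)  # SAS numeric dates count days from this day


def iso_to_sas_date(dtc: Optional[str]) -> Optional[int]:
    """Convert an ISO 8601 --DTC string to days since 1960-01-01."""
    if not dtc or len(dtc) < 10:
        return None  # assumption: missing or partial dates stay missing
    # Keep only the YYYY-MM-DD part so datetime values also work.
    return (date.fromisoformat(dtc[:10]) - SAS_EPOCH).days


def age_group(age: float) -> Tuple[str, int]:
    """Return (AGEGR1, AGEGR1N) for the <65, 65-80, >80 grouping."""
    if age < 65:
        return "<65", 1
    if age <= 80:
        return "65-80", 2
    return ">80", 3
```

For example, iso_to_sas_date("1960-01-01") returns 0, and a subject aged exactly 80 falls in the 65-80 group.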

Step 2: Derive Treatment (_derive_treatment)

Maps treatment arm information and computes exposure-related variables:

  • TRT01P / TRT01A -- planned and actual treatment (copied from ARM)
  • TRT01PN / TRT01AN -- numeric dose codes (Placebo=0, Low Dose=54, High Dose=81)
  • TRTSDT -- date of first exposure, taken from SV at Visit 3 (first dosing visit), converted to SAS date
  • TRTEDT -- date of last exposure from the last EX record; falls back to the DS disposition date if the EX date is missing and the subject discontinued after Visit 3
  • TRTDUR -- treatment duration in days (TRTEDT - TRTSDT + 1)
  • CUMDOSE -- cumulative dose with special logic for the high-dose arm:
    • Placebo and Low Dose: TRT01PN * TRTDUR
    • High Dose: interval-based calculation (54 mg for interval 1, 81 mg for interval 2, 54 mg for interval 3) using visit dates from SV
  • AVGDD -- average daily dose (CUMDOSE / TRTDUR)

Screen Failure subjects are excluded after this step.
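
The dose arithmetic in this step can be illustrated as follows. This is a sketch under stated assumptions, not the project's implementation: dates are SAS numeric dates, boundary1 and boundary2 stand for the two SV visit dates that separate the three intervals (the README does not say which visits these are), and each boundary day is counted in the earlier interval.

```python
from typing import Optional


def treatment_duration(trtsdt: int, trtedt: int) -> int:
    """TRTDUR in days: both the first and the last dosing day count."""
    return trtedt - trtsdt + 1


def cumulative_dose(trt01pn: int, trtsdt: int, trtedt: int,
                    boundary1: Optional[int] = None,
                    boundary2: Optional[int] = None) -> int:
    """CUMDOSE: flat dose for Placebo/Low Dose, three intervals for High Dose."""
    if trt01pn != 81:
        return trt01pn * treatment_duration(trtsdt, trtedt)
    # High dose: 54 mg, then 81 mg, then 54 mg.
    # Assumption: trtsdt <= boundary1 <= boundary2, boundary days inclusive.
    end1 = min(trtedt, boundary1)
    end2 = min(trtedt, boundary2)
    days1 = max(0, end1 - trtsdt + 1)
    days2 = max(0, end2 - end1)
    days3 = max(0, trtedt - end2)
    return 54 * days1 + 81 * days2 + 54 * days3


def average_daily_dose(cumdose: float, trtdur: int) -> float:
    """AVGDD = CUMDOSE / TRTDUR."""
    return cumdose / trtdur
```

With a 30-day high-dose course split into three 10-day intervals, the cumulative dose is 54*10 + 81*10 + 54*10 = 1890 mg, giving an average daily dose of 63 mg. A subject who stops before the first boundary only accumulates 54 mg days.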

Step 3: Derive Population Flags (_derive_flags)

Assigns Y/N flags that define analysis populations and study completion:

  • ITTFL (Intent-to-Treat) -- Y if the subject is randomized to a valid arm (not Screen Failure or Not Assigned)
  • SAFFL (Safety) -- Y if ITTFL='Y' and TRTSDT is not missing (subject received treatment)
  • EFFFL (Efficacy) -- Y if SAFFL='Y' and the subject has at least one ADAS-Cog record and at least one CIBIC+ record in QS with VISITNUM > 3 (post-baseline)
  • COMP8FL / COMP16FL / COMP24FL (Week 8/16/24 completers) -- Y if the subject has the corresponding visit in SV and TRTEDT >= visit date
  • DISCONFL (Discontinued) -- Y if DCREASCD is not "Completed"
  • DSRAEFL (Discontinued due to AE) -- Y if DCREASCD is "Adverse Event"
  • DTHFL (Death) -- carried over from DM

Discontinuation reason (DCREASCD, DCDECOD) is derived from DS where DSCAT='DISPOSITION EVENT'.
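
The flag rules above translate into simple conditionals. The sketch below is illustrative only: the function names are hypothetical, QS records are reduced to (QSCAT, VISITNUM) pairs, and the category labels passed as adas_cat and cibic_cat are placeholders that would need to match the actual QSCAT values in the source data.

```python
from typing import Dict, Iterable, Optional, Tuple

NOT_RANDOMIZED = {"Screen Failure", "Not Assigned"}


def derive_flags(arm: str, trtsdt: Optional[int], dcreascd: str,
                 qs_records: Iterable[Tuple[str, float]],
                 adas_cat: str = "ADAS-COG",   # placeholder label
                 cibic_cat: str = "CIBIC+"     # placeholder label
                 ) -> Dict[str, str]:
    """Derive population and discontinuation flags for one subject."""
    ittfl = "N" if arm in NOT_RANDOMIZED else "Y"
    saffl = "Y" if ittfl == "Y" and trtsdt is not None else "N"
    # Categories with at least one post-baseline (VISITNUM > 3) record.
    post_baseline = {qscat for qscat, visitnum in qs_records if visitnum > 3}
    has_both = adas_cat in post_baseline and cibic_cat in post_baseline
    return {
        "ITTFL": ittfl,
        "SAFFL": saffl,
        "EFFFL": "Y" if saffl == "Y" and has_both else "N",
        "DISCONFL": "Y" if dcreascd != "Completed" else "N",
        "DSRAEFL": "Y" if dcreascd == "Adverse Event" else "N",
    }


def completer_flag(visit_date: Optional[int], trtedt: Optional[int]) -> str:
    """COMPxxFL: Y if the visit exists and treatment lasted at least until it."""
    if visit_date is None or trtedt is None:
        return "N"
    return "Y" if trtedt >= visit_date else "N"
```

Note how the flags nest: a subject cannot be in the efficacy population without being in the safety population, which in turn requires ITTFL='Y'.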

Step 4: Derive Baseline Vitals (_derive_baseline_vitals)

Extracts baseline measurements from the VS domain:

  • WEIGHTBL -- weight at baseline (Visit 3)
  • HEIGHTBL -- height at screening (Visit 1)
  • BMIBL -- body mass index at baseline, calculated as WEIGHTBL / (HEIGHTBL_m)^2, where HEIGHTBL_m is HEIGHTBL expressed in metres
  • BMIBLGR1 -- BMI grouped into <25, 25-<30, >=30
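
A minimal sketch of the BMI derivation and grouping, assuming weight is recorded in kg and height in cm (the function names are hypothetical, and any rounding the project applies is not shown):

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """BMIBL = weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_group(bmi_value: float) -> str:
    """BMIBLGR1: <25, 25-<30, or >=30."""
    if bmi_value < 25:
        return "<25"
    if bmi_value < 30:
        return "25-<30"
    return ">=30"
```

For instance, a subject weighing 70 kg at 175 cm has a BMI of about 22.9 and falls in the <25 group.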

Step 5: Derive Disease History (_derive_disease_history)

Computes disease-related dates and durations:

  • DISONSDT -- date of disease onset from MH where MHCAT='PRIMARY DIAGNOSIS', converted to SAS date
  • VISIT1DT -- date of Visit 1 from SV, converted to SAS date
  • DURDIS -- disease duration in months ((VISIT1DT - DISONSDT + 1) / 30.44)
  • DURDSGR1 -- disease duration grouped as <12 or >=12 months
  • VISNUMEN -- end-of-treatment visit number from DS; set to 12 when the protocol-completed record carries VISITNUM=13, otherwise taken as-is
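
The duration formula and the visit cap can be written out as below. This is an illustrative sketch with hypothetical function names; dates are SAS numeric dates, and the protocol_completed argument stands in for however the project identifies the protocol-completed DS record.

```python
from typing import Optional


def disease_duration_months(visit1dt: Optional[int],
                            disonsdt: Optional[int]) -> Optional[float]:
    """DURDIS = (VISIT1DT - DISONSDT + 1) / 30.44, in months."""
    if visit1dt is None or disonsdt is None:
        return None
    return (visit1dt - disonsdt + 1) / 30.44


def duration_group(months: float) -> str:
    """DURDSGR1: <12 or >=12 months."""
    return "<12" if months < 12 else ">=12"


def end_visit(visitnum: float, protocol_completed: bool) -> float:
    """VISNUMEN: Visit 13 on a protocol-completed record is reported as 12."""
    if protocol_completed and visitnum == 13:
        return 12
    return visitnum
```

The divisor 30.44 is the average month length in days (365.25 / 12), so an onset 365 days before Visit 1 gives 366 / 30.44, roughly 12.02 months, and lands in the >=12 group.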

Step 6: Derive Questionnaire (_derive_questionnaire)

  • MMSETOT -- MMSE (Mini-Mental State Examination) total score, computed as the sum of QSORRES from QS where QSCAT='MINI-MENTAL STATE'
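
A sketch of the MMSETOT summation, using plain dicts in place of the QS DataFrame. The function name and the sample rows are made up; skipping non-numeric QSORRES values is an assumption, since QSORRES is a character variable in SDTM.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def mmse_total(qs_rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Sum QSORRES per subject over QS rows with QSCAT='MINI-MENTAL STATE'."""
    totals: Dict[str, float] = defaultdict(float)
    for row in qs_rows:
        if row["QSCAT"] != "MINI-MENTAL STATE":
            continue
        try:
            score = float(row["QSORRES"])  # character result -> number
        except (TypeError, ValueError):
            continue  # assumption: non-numeric results are ignored
        totals[row["USUBJID"]] += score
    return dict(totals)


# Made-up rows for illustration only.
qs = [
    {"USUBJID": "SUBJ-001", "QSCAT": "MINI-MENTAL STATE", "QSORRES": "1"},
    {"USUBJID": "SUBJ-001", "QSCAT": "MINI-MENTAL STATE", "QSORRES": "3"},
    {"USUBJID": "SUBJ-001", "QSCAT": "OTHER SCALE", "QSORRES": "9"},
    {"USUBJID": "SUBJ-002", "QSCAT": "MINI-MENTAL STATE", "QSORRES": "2"},
]
totals = mmse_total(qs)
```

The resulting per-subject totals would then be merged onto the ADSL by USUBJID like the other derived variables.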

How to Run

cd code
python adsl.py

This reads the SDTM .xpt files, runs the full derivation pipeline, and writes adsl.csv and adsl.xpt to the output/ directory.

Dependencies

  • Python 3.x
  • pandas
  • numpy
  • pyreadstat
