This project demonstrates the conversion of SDTM (Study Data Tabulation Model) datasets into an ADaM ADSL (Subject-Level Analysis Dataset) using Python. The clinical trial data comes from the CDISC Pilot Project -- a study evaluating the drug Xanomeline at different dosages (placebo, 54 mg low dose, 81 mg high dose) for the treatment of Alzheimer's disease. The conversion follows the ADSL specification in the ADaM define.pdf, and coded values are mapped according to the Controlled Terminology listed in the SDTM define.pdf.
```
.
├── code/
│   ├── adsl.py          # Main ADSL derivation program (ADSLProcessor class)
│   └── utils.py         # Helper functions for date conversion, flag derivation, etc.
├── output/
│   ├── adsl.csv         # Final ADSL output (CSV)
│   └── adsl.xpt         # Final ADSL output (XPT / SAS transport)
├── updated-pilot-submission-package/   # CDISC Pilot SDTM source data
└── reference-ranges/                   # Lab reference range data
```
The input SDTM datasets are located under updated-pilot-submission-package/.../sdtm/ and loaded as .xpt (SAS transport) files via pyreadstat. The following SDTM domains are used:
| Domain | Description | Role in ADSL |
|---|---|---|
| DM | Demographics | Base table: subject ID, age, sex, race, ethnicity, arm, reference dates |
| EX | Exposure | Treatment end date (last dose), cumulative dose calculation |
| DS | Disposition | Discontinuation reason, completion status, end-of-study visit |
| SC | Subject Characteristics | Education level (EDUCLVL) |
| QS | Questionnaires | MMSE total score, efficacy flag (ADAS-Cog, CIBIC+) |
| MH | Medical History | Disease onset date (primary Alzheimer's diagnosis) |
| SV | Subject Visits | Treatment start date (Visit 3), visit dates for dose intervals and completion flags |
| VS | Vital Signs | Baseline weight, height, and derived BMI |
The core logic lives in code/adsl.py as the ADSLProcessor class. The process() method executes a sequential pipeline where each step merges additional derived variables onto the ADSL:
```
_initialize_adsl()          DM base columns + age groups + race code + education
        │
        ▼
_derive_treatment()         Treatment arms, dates, duration, cumulative/average dose
        │
        ▼
_derive_flags()             Population flags (ITT, Safety, Efficacy) + completion/discontinuation
        │
        ▼
_derive_baseline_vitals()   Baseline weight, height, BMI
        │
        ▼
_derive_disease_history()   Disease onset, duration, end-of-treatment visit
        │
        ▼
_derive_questionnaire()     MMSE total score
```
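The "each step merges additional derived variables onto the ADSL" pattern can be sketched as below. This is a minimal illustration in plain Python, with lists of per-subject dicts standing in for pandas DataFrames; `merge_onto_adsl` and `process` are hypothetical names, not the actual `ADSLProcessor` methods. In pandas, the join would correspond to `adsl.merge(derived, on="USUBJID", how="left")`.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def merge_onto_adsl(adsl: List[Record], derived: List[Record],
                    key: str = "USUBJID") -> List[Record]:
    """Left-join per-subject derived variables onto the ADSL records."""
    new_cols = sorted({col for row in derived for col in row if col != key})
    lookup = {row[key]: row for row in derived}
    merged = []
    for rec in adsl:
        extra = lookup.get(rec[key], {})
        out = dict(rec)  # copy so the input ADSL is not mutated
        for col in new_cols:
            out[col] = extra.get(col)  # None where the subject has no derived record
        merged.append(out)
    return merged


def process(adsl: List[Record], steps: List[Step]) -> List[Record]:
    """Run the derivation steps in order, feeding each one the previous output."""
    for step in steps:
        adsl = step(adsl)
    return adsl
```

A left join keeps exactly one record per subject from the DM-based skeleton, so a subject missing from a source domain still appears in ADSL with an empty value rather than being dropped.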
The ADSL is seeded from the DM domain. Selected DM columns form the one-record-per-subject skeleton:
- Identity: STUDYID, USUBJID, SUBJID, SITEID
- Demographics: AGE, AGEU, SEX, RACE, ETHNIC
- Arm: ARM (planned treatment arm)
- Reference dates: RFSTDTC, RFENDTC
Derived in this step:
- RFENDT -- reference end date converted to a SAS numeric date (days since 1960-01-01)
- AGEGR1/AGEGR1N -- age grouped into <65, 65-80, >80 with numeric codes
- RACEN -- numeric race code (WHITE=1, BLACK=2, AMERICAN INDIAN=3, ASIAN=4)
- EDUCLVL -- education level merged from SC where SCTESTCD='EDLEVEL'
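The date conversion and grouping rules above can be sketched as small helpers. This is a hedged illustration, not the project's `utils.py`: `to_sas_date` and `age_group` are hypothetical names, partial ISO 8601 dates are simply returned as missing (no imputation), the AGEGR1N codes 1/2/3 are an assumption since the README does not list them, and the RACEN keys assume the full CDISC Controlled Terminology spellings of the race values.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS numeric dates count days from this origin


def to_sas_date(dtc):
    """Convert the date part of an ISO 8601 --DTC string to a SAS date.

    Partial dates such as "2014-07" return None; no imputation is attempted.
    """
    if not dtc or len(dtc) < 10:
        return None
    return (date.fromisoformat(dtc[:10]) - SAS_EPOCH).days


def age_group(age):
    """Return (AGEGR1, AGEGR1N); the 1/2/3 numeric codes are assumed."""
    if age is None:
        return None, None
    if age < 65:
        return "<65", 1
    if age <= 80:
        return "65-80", 2
    return ">80", 3


# Numeric codes as listed in the README; keys assume full CT submission values.
RACEN_MAP = {
    "WHITE": 1,
    "BLACK OR AFRICAN AMERICAN": 2,
    "AMERICAN INDIAN OR ALASKA NATIVE": 3,
    "ASIAN": 4,
}
```

For example, `to_sas_date("1961-01-01")` gives 366 because 1960 was a leap year, and `RACEN_MAP.get(race)` yields None for any value not in the mapping.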
Maps treatment arm information and computes exposure-related variables:
- TRT01P/TRT01A -- planned and actual treatment (copied from ARM)
- TRT01PN/TRT01AN -- numeric dose codes (Placebo=0, Low Dose=54, High Dose=81)
- TRTSDT -- date of first exposure, taken from SV at Visit 3 (first dosing visit), converted to SAS date
- TRTEDT -- date of last exposure from the last EX record; falls back to the DS disposition date if the EX date is missing and the subject discontinued after Visit 3
- TRTDUR -- treatment duration in days (TRTEDT - TRTSDT + 1)
- CUMDOSE -- cumulative dose with special logic for the high-dose arm:
  - Placebo and Low Dose: TRT01PN * TRTDUR
  - High Dose: interval-based calculation (54 mg for interval 1, 81 mg for interval 2, 54 mg for interval 3) using visit dates from SV
- AVGDD -- average daily dose (CUMDOSE / TRTDUR)
Screen Failure subjects are excluded after this step.
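The duration and dose arithmetic can be sketched as follows, with all dates as SAS numeric day counts. The README does not say which SV visits bound the three high-dose intervals, so this sketch takes two hypothetical boundary dates as parameters: `up_dt` (first day at 81 mg) and `down_dt` (first day back at 54 mg). The function names are illustrative only.

```python
def interval_days(lo, hi, trtsdt, trtedt):
    """Number of treatment days falling in the inclusive day range [lo, hi]."""
    start = max(lo, trtsdt)
    end = min(hi, trtedt)
    return max(0, end - start + 1)


def high_dose_cumdose(trtsdt, trtedt, up_dt, down_dt, doses=(54, 81, 54)):
    """Interval-based cumulative dose for the high-dose arm.

    up_dt / down_dt are assumed boundary dates; None means that dose step
    was never reached before treatment ended.
    """
    up = up_dt if up_dt is not None else trtedt + 1
    # Keep the intervals ordered so that no day is counted twice.
    down = max(down_dt if down_dt is not None else trtedt + 1, up)
    intervals = [(trtsdt, up - 1), (up, down - 1), (down, trtedt)]
    return sum(dose * interval_days(lo, hi, trtsdt, trtedt)
               for dose, (lo, hi) in zip(doses, intervals))


def derive_dose(trt01pn, trtsdt, trtedt, up_dt=None, down_dt=None):
    """Return (TRTDUR, CUMDOSE, AVGDD) for one subject."""
    if trtsdt is None or trtedt is None:
        return None, None, None
    trtdur = trtedt - trtsdt + 1
    if trt01pn == 81:
        cumdose = high_dose_cumdose(trtsdt, trtedt, up_dt, down_dt)
    else:
        # Placebo (0) and Low Dose (54): constant daily dose.
        cumdose = trt01pn * trtdur
    return trtdur, cumdose, cumdose / trtdur
```

For a 100-day high-dose subject with boundaries on days 14 and 84, this gives 14 days at 54 mg, 70 days at 81 mg and 16 days at 54 mg: CUMDOSE 7290 and AVGDD 72.9. A subject who stops before the second boundary simply has an empty third interval.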
Assigns Y/N flags that define analysis populations and study completion:
- ITTFL (Intent-to-Treat) -- Y if the subject is randomized to a valid arm (not Screen Failure or Not Assigned)
- SAFFL (Safety) -- Y if ITTFL='Y' and TRTSDT is not missing (subject received treatment)
- EFFFL (Efficacy) -- Y if SAFFL='Y' and the subject has at least one post-baseline ADAS-Cog record and one CIBIC+ record in QS with VISITNUM > 3
- COMP8FL/COMP16FL/COMP24FL (Week 8/16/24 completers) -- Y if the subject has the corresponding visit in SV and TRTEDT >= visit date
- DISCONFL (Discontinued) -- Y if DCREASCD is not "Completed"
- DSRAEFL (Discontinued due to AE) -- Y if DCREASCD is "Adverse Event"
- DTHFL (Death) -- carried over from DM
Discontinuation reason (DCREASCD, DCDECOD) is derived from DS where DSCAT='DISPOSITION EVENT'.
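The flag rules above translate fairly directly into per-subject predicates. This is a minimal sketch, not the project's code: the function names are hypothetical, `ADAS_CAT` and `CIBIC_CAT` are placeholder labels that would need to be replaced by the actual QSCAT values in the pilot's QS domain, and treating a missing DCREASCD as "not discontinued" is an assumption the README does not state.

```python
EXCLUDED_ARMS = {"Screen Failure", "Not Assigned"}
# Placeholder labels (assumption): substitute the real QSCAT values from QS.
ADAS_CAT = "ADAS-COG"
CIBIC_CAT = "CIBIC+"


def yn(condition):
    return "Y" if condition else "N"


def population_flags(arm, trtsdt, qs_records):
    """ITTFL/SAFFL/EFFFL for one subject; qs_records is [(QSCAT, VISITNUM), ...]."""
    ittfl = yn(arm is not None and arm not in EXCLUDED_ARMS)
    saffl = yn(ittfl == "Y" and trtsdt is not None)
    # Categories observed after baseline (VISITNUM > 3).
    post_baseline = {cat for cat, visitnum in qs_records if visitnum > 3}
    efffl = yn(saffl == "Y" and {ADAS_CAT, CIBIC_CAT} <= post_baseline)
    return {"ITTFL": ittfl, "SAFFL": saffl, "EFFFL": efffl}


def completion_flag(visit_dt, trtedt):
    """COMPxxFL: the visit exists in SV and treatment lasted through it."""
    return yn(visit_dt is not None and trtedt is not None and trtedt >= visit_dt)


def discontinuation_flags(dcreascd):
    """DISCONFL/DSRAEFL from the DS disposition reason (None = no record, assumed)."""
    return {
        "DISCONFL": yn(dcreascd is not None and dcreascd != "Completed"),
        "DSRAEFL": yn(dcreascd == "Adverse Event"),
    }
```

Because each flag is built on the previous one (EFFFL requires SAFFL, which requires ITTFL), the populations are nested by construction.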
Extracts baseline measurements from the VS domain:
- WEIGHTBL -- weight at baseline (Visit 3)
- HEIGHTBL -- height at screening (Visit 1)
- BMIBL -- calculated as WEIGHTBL / (HEIGHTBL in metres)^2
- BMIBLGR1 -- BMI grouped into <25, 25-<30, >=30
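A minimal sketch of the BMI derivation and grouping, assuming (not stated in the README) that VS reports weight in kg and height in cm, hence the division by 100. `bmi` and `bmi_group` are hypothetical helper names.

```python
def bmi(weight_kg, height_cm):
    """BMIBL = WEIGHTBL / (HEIGHTBL in metres) ** 2; kg and cm units assumed."""
    if weight_kg is None or not height_cm:  # missing or zero height -> missing BMI
        return None
    return weight_kg / (height_cm / 100) ** 2


def bmi_group(bmibl):
    """BMIBLGR1 categories: <25, 25-<30, >=30."""
    if bmibl is None:
        return None
    if bmibl < 25:
        return "<25"
    if bmibl < 30:
        return "25-<30"
    return ">=30"
```

For example, 80 kg at 200 cm gives a BMI of 20.0, which falls in the "<25" group.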
Computes disease-related dates and durations:
- DISONSDT -- date of disease onset from MH where MHCAT='PRIMARY DIAGNOSIS', converted to SAS date
- VISIT1DT -- date of Visit 1 from SV, converted to SAS date
- DURDIS -- disease duration in months ((VISIT1DT - DISONSDT + 1) / 30.44)
- DURDSGR1 -- disease duration grouped as <12 or >=12 months
- VISNUMEN -- end-of-treatment visit number from DS (a VISITNUM of 13, recorded for protocol completion, is recoded to 12)
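The duration formula and the VISNUMEN recode can be sketched as below, operating on SAS numeric dates. The helper names are hypothetical, and no rounding is applied to DURDIS because the README does not specify any.

```python
DAYS_PER_MONTH = 30.44  # average month length used in the DURDIS formula


def disease_duration_months(visit1dt, disonsdt):
    """DURDIS = (VISIT1DT - DISONSDT + 1) / 30.44, on SAS numeric dates."""
    if visit1dt is None or disonsdt is None:
        return None
    return (visit1dt - disonsdt + 1) / DAYS_PER_MONTH


def duration_group(durdis):
    """DURDSGR1: '<12' or '>=12' months."""
    if durdis is None:
        return None
    return "<12" if durdis < 12 else ">=12"


def end_visit_number(visitnum):
    """VISNUMEN: a DS VISITNUM of 13 is recoded to 12; other values pass through."""
    if visitnum is None:
        return None
    return 12 if visitnum == 13 else visitnum
```

The "+ 1" counts both the onset day and the Visit 1 day, so a subject whose onset is 3,043 days before Visit 1 has 3,044 days, or about 100 months, of disease duration.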
Derives the MMSE total score from QS:
- MMSETOT -- MMSE (Mini-Mental State Examination) total score, computed as the sum of QSORRES from QS where QSCAT='MINI-MENTAL STATE'
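Because QSORRES is a character variable in SDTM, the sum needs an explicit numeric conversion. A minimal per-subject sketch, assuming the records passed in are already restricted to one subject (and to a single assessment, if more than one MMSE was collected); `mmse_total` is a hypothetical name, and skipping blank results is an assumption.

```python
MMSE_CAT = "MINI-MENTAL STATE"


def mmse_total(qs_records):
    """Sum numeric QSORRES over one subject's QS records in the MMSE category.

    Blank or missing QSORRES values are skipped; returns None when the
    subject has no usable MMSE records.
    """
    scores = [
        float(rec["QSORRES"])
        for rec in qs_records
        if rec.get("QSCAT") == MMSE_CAT and rec.get("QSORRES") not in (None, "")
    ]
    return sum(scores) if scores else None
```

Returning None rather than 0 for subjects without MMSE data keeps "not assessed" distinguishable from a genuine total of zero.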
```bash
cd code
python adsl.py
```
This reads the SDTM .xpt files, runs the full derivation pipeline, and writes adsl.csv and adsl.xpt to the output/ directory.
- Python 3.x
- pandas
- numpy
- pyreadstat
- CDISC sdtm-adam-pilot-project: https://github.com/cdisc-org/sdtm-adam-pilot-project