A Gleam implementation of a C compiler based on the book "Writing a C Compiler" by Nora Sandler. It closely follows the book's OCaml reference implementation and translates a subset of C to x86-64 assembly for Linux and macOS.
Features:
- Lexical Analysis: Tokenizes C source code into identifiers, constants, keywords, and punctuation
- Syntax Analysis: Builds Abstract Syntax Trees (AST) from token streams
- Code Generation: Converts AST to x86-64 assembly instructions
- Assembly Emission: Outputs platform-specific assembly code for Linux and macOS
- Multi-stage Compilation: Stop at any compilation stage for debugging
- Cross-platform: Supports both Linux and macOS targets
This implementation currently supports a minimal but complete subset of C:
- Function definitions with an int return type and a void parameter list: int main(void)
- Return statements with integer constants
- Integer literals (positive and negative)
- Basic C keywords: int, return, void
- Standard C punctuation: (, ), {, }, ;
Not yet supported:
- Variables and expressions
- Function parameters
- Control flow (if, while, for)
- Arithmetic operations
- Multiple functions
- Standard library functions
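A minimal sketch of how the supported token set could be lexed. The Token constructors and the lex function are hypothetical, not the definitions in tokens.gleam or lexer.gleam. The project's lexer is described as regex-based; this sketch swaps in hand-written scanning so it needs only gleam_stdlib, but it keeps the longest-match rule by consuming a whole run of word characters before deciding between keyword, identifier, and constant.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical tokens for the supported subset (not the project's tokens.gleam).
pub type Token {
  Identifier(String)
  Constant(Int)
  KwInt
  KwReturn
  KwVoid
  OpenParen
  CloseParen
  OpenBrace
  CloseBrace
  Semicolon
}

/// Splits source text into tokens, or returns an error message.
pub fn lex(source: String) -> Result(List(Token), String) {
  do_lex(string.to_graphemes(source), [])
}

fn do_lex(chars: List(String), acc: List(Token)) -> Result(List(Token), String) {
  case chars {
    [] -> Ok(list.reverse(acc))
    // Whitespace only separates tokens; "\r\n" is a single grapheme.
    [" ", ..rest] | ["\t", ..rest] | ["\n", ..rest] | ["\r\n", ..rest] | ["\r", ..rest] ->
      do_lex(rest, acc)
    ["(", ..rest] -> do_lex(rest, [OpenParen, ..acc])
    [")", ..rest] -> do_lex(rest, [CloseParen, ..acc])
    ["{", ..rest] -> do_lex(rest, [OpenBrace, ..acc])
    ["}", ..rest] -> do_lex(rest, [CloseBrace, ..acc])
    [";", ..rest] -> do_lex(rest, [Semicolon, ..acc])
    [c, ..] -> lex_word(c, chars, acc)
  }
}

fn lex_word(c: String, chars: List(String), acc: List(Token)) -> Result(List(Token), String) {
  // Longest match: take the whole run of word characters at once, so
  // "integer" is one identifier rather than the keyword int plus "eger".
  let #(word, rest) = list.split_while(chars, is_word_char)
  let text = string.concat(word)
  case word {
    [] -> Error("unexpected character: " <> c)
    [first, ..] ->
      case is_digit(first) {
        // A constant must be all digits, so "1foo" is rejected.
        True ->
          case int.parse(text) {
            Ok(n) -> do_lex(rest, [Constant(n), ..acc])
            Error(_) -> Error("invalid constant: " <> text)
          }
        False -> do_lex(rest, [keyword_or_identifier(text), ..acc])
      }
  }
}

fn keyword_or_identifier(text: String) -> Token {
  case text {
    "int" -> KwInt
    "return" -> KwReturn
    "void" -> KwVoid
    _ -> Identifier(text)
  }
}

fn is_digit(c: String) -> Bool {
  string.contains("0123456789", c)
}

fn is_word_char(c: String) -> Bool {
  is_digit(c)
  || c == "_"
  || string.contains("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", c)
}
```

For example, lex("int main(void) { return 42; }") yields ten tokens, while lex("return 1foo;") returns an Error because a constant may not run into letters.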
Example of a supported program:
int main(void) {
    return 42;
}
Project structure:
nqcc_gleam/
├── src/                      # Source code
│   ├── nqcc.gleam            # Main entry point
│   ├── cli.gleam             # Command-line interface
│   ├── compiler.gleam        # Main compilation pipeline
│   ├── lexer.gleam           # Lexical analysis (source → tokens)
│   ├── parser.gleam          # Syntax analysis (tokens → AST)
│   ├── codegen.gleam         # Code generation (AST → assembly)
│   ├── emitter.gleam         # Assembly emission (assembly → file)
│   ├── tokens.gleam          # Token type definitions
│   ├── ast.gleam             # Abstract Syntax Tree types
│   ├── assembly.gleam        # Assembly instruction types
│   ├── settings.gleam        # Configuration and platform types
│   └── utils.gleam           # Utility functions
├── test/                     # Comprehensive test suite
│   ├── lexer_test.gleam      # Lexer unit tests
│   ├── parser_test.gleam     # Parser unit tests
│   ├── codegen_test.gleam    # Code generator unit tests
│   ├── emitter_test.gleam    # Assembly emitter unit tests
│   ├── compiler_test.gleam   # Integration tests
│   ├── nqcc_test.gleam       # Test runner
│   └── README.md             # Test documentation
├── sample/                   # Example programs
│   └── test_program.c        # Simple test program
├── .github/                  # GitHub workflows
├── build/                    # Build artifacts
├── gleam.toml                # Project configuration
└── README.md                 # This file
Prerequisites:
- Gleam (latest version)
- Erlang/OTP (version 24+)
- GCC or Clang (for preprocessing, assembling, and linking)
Installation:
# Clone the repository
git clone <repository-url>
cd nqcc_gleam
# Install dependencies
gleam deps download
# Build the project
gleam build
# Run tests to verify installation
gleam test
Building a standalone executable:
# Create an escript executable using gleescript
gleam run -m gleescript
# Or export an Erlang shipment for distribution
gleam export erlang-shipment
# Run the standalone executable (after gleam run -m gleescript)
./nqcc --help
# Or run from the Erlang shipment
cd build/erlang-shipment
./entrypoint.sh run --help
Method 1: Escript (gleam run -m gleescript)
- Creates a single nqcc executable file in the project root
- Requires Erlang on the target system
- Run directly: ./nqcc hello.c
- Smaller footprint, easier to distribute
Method 2: Erlang Shipment (gleam export erlang-shipment)
- Creates a build/erlang-shipment directory containing the compiled project, its dependencies, and an entrypoint.sh launcher
- Requires Erlang on the target system
- Run via the entrypoint script: ./entrypoint.sh run hello.c
- Full package of compiled modules, better suited to complex deployments
Quick start:
# Compile and create an executable
gleam run hello.c
# This creates hello.s (assembly) and hello (executable)
Usage:
gleam run -- [OPTIONS] <source-file.c>
(or ./nqcc [OPTIONS] <source-file.c> with the escript build)
Flags go after -- so that gleam run does not try to interpret them itself; gleam run has its own --target option, for example.
Compilation stages:
- --lex: Run the lexer only (tokenization)
- --parse: Run the lexer and parser only (AST generation)
- --codegen: Run through code generation (stop before assembly emission)
- -s: Generate the assembly file only (don't create an executable)
- (no flag): Complete compilation to an executable
Target platforms:
- --target linux: Generate Linux-compatible assembly (default: osx)
- --target osx: Generate macOS-compatible assembly
Debugging and cleanup:
- -d: Debug mode (preserve intermediate files)
- --clean: Clean all intermediate files (.s, .o, executable)
- --clean-ast: Clean the AST output file (ast_output.txt)
- --clean-test-ast: Clean the test AST output file (ast_output_test.txt)
- --help: Show usage information
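To show how these flags could map onto compilation settings, here is a minimal sketch of an argument parser. The Stage, Platform, and Options types and parse_args are hypothetical stand-ins for whatever cli.gleam and settings.gleam actually define; --clean, --clean-ast, --clean-test-ast, and --help are left out to keep it short. Defaults follow the flag list: full compilation, macOS target, debug off.

```gleam
import gleam/string

// Hypothetical settings types; the real ones in settings.gleam may differ.
pub type Stage {
  Lex
  Parse
  Codegen
  AssemblyOnly
  Executable
}

pub type Platform {
  Linux
  OSX
}

pub type Options {
  Options(stage: Stage, platform: Platform, debug: Bool, source: String)
}

/// Parses command-line arguments (everything after `--`) into Options.
pub fn parse_args(args: List(String)) -> Result(Options, String) {
  let defaults = Options(stage: Executable, platform: OSX, debug: False, source: "")
  do_parse(args, defaults)
}

fn do_parse(args: List(String), opts: Options) -> Result(Options, String) {
  case args {
    // All arguments consumed: a source file is mandatory.
    [] ->
      case opts.source {
        "" -> Error("missing source file")
        _ -> Ok(opts)
      }
    ["--lex", ..rest] -> do_parse(rest, Options(..opts, stage: Lex))
    ["--parse", ..rest] -> do_parse(rest, Options(..opts, stage: Parse))
    ["--codegen", ..rest] -> do_parse(rest, Options(..opts, stage: Codegen))
    ["-s", ..rest] -> do_parse(rest, Options(..opts, stage: AssemblyOnly))
    ["-d", ..rest] -> do_parse(rest, Options(..opts, debug: True))
    ["--target", "linux", ..rest] -> do_parse(rest, Options(..opts, platform: Linux))
    ["--target", "osx", ..rest] -> do_parse(rest, Options(..opts, platform: OSX))
    ["--target", other, ..] -> Error("unknown target: " <> other)
    ["--target"] -> Error("--target needs a value (linux or osx)")
    // Anything else is either an unknown flag or the source file.
    [arg, ..rest] ->
      case string.starts_with(arg, "-") {
        True -> Error("unknown option: " <> arg)
        False -> do_parse(rest, Options(..opts, source: arg))
      }
  }
}
```

Recursing over the argument list keeps flag order free, so --target linux -s hello.c and -s --target linux hello.c produce the same Options.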
Examples:
# Create a simple C program
echo 'int main(void) { return 42; }' > hello.c
# Compile to an executable
gleam run hello.c
# Run the program
./hello
echo $? # Prints: 42
Stage-by-stage compilation:
# Lexical analysis only
gleam run -- --lex hello.c
# Parse to AST only
gleam run -- --parse hello.c
# Generate assembly only
gleam run -- -s hello.c
cat hello.s # View generated assembly
Cross-platform compilation:
# Generate Linux assembly on macOS
gleam run -- --target linux -s hello.c
# Generate macOS assembly
gleam run -- --target osx -s hello.c
Debug mode:
# Keep all intermediate files
gleam run -- -d hello.c
ls hello* # Shows: hello.c, hello.i, hello.s, hello
Generate AST files:
# Create ast_output.txt (AST from parsing a source file)
gleam run -- --parse sample/test_program.c
# Create ast_output_test.txt (AST from running tests)
gleam test
Clean AST files:
# Clean only ast_output.txt
gleam run -- --clean-ast sample/test_program.c
# Clean only ast_output_test.txt
gleam run -- --clean-test-ast sample/test_program.c
# Clean all intermediate files (.s, .o, executable); does NOT clean AST files
gleam run -- --clean sample/test_program.c
Check AST files:
# See which AST files exist
ls -la ast_output*.txt
Architecture:
The compiler follows a traditional multi-stage compilation pipeline:
Source Code (.c)
    ↓
Preprocessing (.i)        # GCC preprocessor
    ↓
Lexical Analysis          # Tokenization
    ↓
Syntax Analysis           # AST generation
    ↓
Code Generation           # Assembly generation
    ↓
Assembly Emission (.s)    # Platform-specific output
    ↓
Assembly & Linking        # GCC assembler/linker
    ↓
Executable
- Preprocessing: Uses GCC to handle #include directives and macros
- Lexical Analysis: Converts source text into tokens
- Syntax Analysis: Builds Abstract Syntax Tree from tokens
- Code Generation: Converts AST to abstract assembly
- Assembly Emission: Generates platform-specific assembly text
- Assembly & Linking: Uses GCC to create executable
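The two GCC stages at either end of the pipeline boil down to building command lines. This sketch assumes the invocations the book suggests for a compiler driver (gcc -E -P to preprocess, gcc file.s -o file to assemble and link); the Command type and function names are hypothetical, compiler.gleam may pass different flags, and actually spawning the process (which needs a process-running library) is deliberately left out.

```gleam
import gleam/string

/// A program name plus its argument list, ready to hand to a process runner.
pub type Command {
  Command(program: String, args: List(String))
}

/// Swaps a trailing ".c" for `ext`; pass "" to get the executable name.
pub fn with_extension(path: String, ext: String) -> String {
  case string.ends_with(path, ".c") {
    True -> string.slice(path, 0, string.length(path) - 2) <> ext
    False -> path <> ext
  }
}

/// Preprocess: gcc -E -P hello.c -o hello.i
/// (-E stops after preprocessing, -P omits linemarkers the lexer would not expect)
pub fn preprocess(source: String) -> Command {
  Command("gcc", ["-E", "-P", source, "-o", with_extension(source, ".i")])
}

/// Assemble and link: gcc hello.s -o hello
pub fn assemble_and_link(source: String) -> Command {
  Command("gcc", [with_extension(source, ".s"), "-o", with_extension(source, "")])
}
```

Keeping command construction pure like this makes the driver easy to unit-test without invoking GCC.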
Testing:
gleam test
The project includes a test suite of 128 test cases covering:
- Lexer Tests: Token generation, whitespace handling, error cases
- Parser Tests: AST construction, syntax error detection
- Codegen Tests: Assembly instruction generation
- Emitter Tests: Platform-specific assembly output
- Integration Tests: End-to-end compilation pipeline
See test/README.md for detailed test documentation.
AST printer test cases:
- Function returning another constant
- Builds an AST that represents a program with a function foo returning the constant value 100.
- Uses ast_printer to write that AST into a file named ast_output_constant.txt.
- Reads the file and compares its content with the expected string:
Function: foo
Return:
Constant: 100
- Empty function (no return)
- Builds an AST that represents a program with a function empty that has no return value (NoOp).
- Uses ast_printer to write that AST into a file named ast_output_empty.txt.
- Reads the file and compares its content with the expected string:
Function: empty
NoOp
- Function with multiple instructions
- Builds an AST that represents a program with a function multi containing a block of two return statements: one returning 1 and another returning 2.
- Uses ast_printer to write that AST into a file named ast_output_multi.txt.
- Reads the file and compares its content with the expected string:
Function: multi
Block:
Return:
Constant: 1
Return:
Constant: 2
- Function returning an expression
- Builds an AST that represents a program with a function expr returning the expression 5 + 7.
- Uses ast_printer to write that AST into a file named ast_output_expr.txt.
- Reads the file and compares its content with the expected string:
Function: expr
Block:
Return:
BinaryOp: +
Constant: 5
Constant: 7
- Program with multiple functions
- Builds an AST that represents a program with two functions: first returning 10 and second returning 20.
- Uses ast_printer to write that AST into a file named ast_output_funcs.txt.
- Reads the file and compares its content with the expected string:
Function: first
Block:
Return:
Constant: 10
Function: second
Return:
Constant: 20
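The expected strings in these tests describe a simple one-node-per-line printer. Here is a minimal sketch that reproduces them, assuming a hypothetical AST whose constructor names are taken from the printed labels (Constant, BinaryOp, Return, Block, NoOp). It returns a String instead of writing a file, and it emits no indentation because the expected strings are listed without any; the real ast_printer may indent nested nodes.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical AST shaped after the node labels in the expected strings.
pub type Exp {
  Constant(Int)
  BinaryOp(op: String, left: Exp, right: Exp)
}

pub type Statement {
  Return(Exp)
  Block(List(Statement))
  NoOp
}

pub type FunctionDef {
  Function(name: String, body: Statement)
}

pub type Program {
  Program(List(FunctionDef))
}

// An operator line is followed by its left operand, then its right operand.
fn exp_lines(exp: Exp) -> List(String) {
  case exp {
    Constant(n) -> ["Constant: " <> int.to_string(n)]
    BinaryOp(op, left, right) ->
      list.flatten([["BinaryOp: " <> op], exp_lines(left), exp_lines(right)])
  }
}

fn statement_lines(statement: Statement) -> List(String) {
  case statement {
    Return(exp) -> ["Return:", ..exp_lines(exp)]
    Block(statements) -> ["Block:", ..list.flat_map(statements, statement_lines)]
    NoOp -> ["NoOp"]
  }
}

/// Renders a program one node per line, functions in source order.
pub fn print_program(program: Program) -> String {
  let Program(functions) = program
  functions
  |> list.flat_map(fn(f: FunctionDef) { ["Function: " <> f.name, ..statement_lines(f.body)] })
  |> string.join("\n")
}
```

A file-writing test would then only need to compare the file contents against print_program's result.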
Core components:
- Lexer (lexer.gleam)
  - Converts source code strings to token lists
  - Handles whitespace, keywords, identifiers, and literals
  - Regex-based pattern matching with longest-match disambiguation
- Parser (parser.gleam)
  - Converts token streams to Abstract Syntax Trees
  - Recursive descent parser for the supported C grammar subset
  - Comprehensive error reporting for syntax issues
- Code Generator (codegen.gleam)
  - Transforms the AST into abstract assembly instructions
  - Platform-independent instruction generation
  - Simple instruction sequences (MOV + RET for returns)
- Emitter (emitter.gleam)
  - Converts abstract assembly to platform-specific text
  - Handles Linux vs macOS assembly syntax differences
  - Generates GNU assembler compatible output
- Compiler (compiler.gleam)
  - Orchestrates the entire compilation pipeline
  - Manages intermediate files and cleanup
  - Integrates with external tools (GCC for preprocessing and linking)
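To make "recursive descent" concrete for the supported grammar, here is a minimal sketch with one function per grammar rule, each returning the parsed node plus the remaining tokens. The Token and AST types are redefined here (hypothetically) so the block stands alone, and Program holds a single function as in the book's first chapter; parser.gleam's actual types and error messages may differ.

```gleam
import gleam/result
import gleam/string

// Hypothetical token and AST types, redefined so this block stands alone.
pub type Token {
  Identifier(String)
  Constant(Int)
  KwInt
  KwReturn
  KwVoid
  OpenParen
  CloseParen
  OpenBrace
  CloseBrace
  Semicolon
}

pub type Exp {
  ConstantExp(Int)
}

pub type Statement {
  Return(Exp)
}

pub type FunctionDef {
  Function(name: String, body: Statement)
}

pub type Program {
  Program(FunctionDef)
}

/// <program> ::= <function>, with no tokens left over.
pub fn parse_program(tokens: List(Token)) -> Result(Program, String) {
  use parsed <- result.try(parse_function(tokens))
  let #(function, rest) = parsed
  case rest {
    [] -> Ok(Program(function))
    [extra, ..] -> Error("unexpected token: " <> string.inspect(extra))
  }
}

/// <function> ::= "int" <identifier> "(" "void" ")" "{" <statement> "}"
fn parse_function(tokens: List(Token)) -> Result(#(FunctionDef, List(Token)), String) {
  case tokens {
    [KwInt, Identifier(name), OpenParen, KwVoid, CloseParen, OpenBrace, ..rest] -> {
      use parsed <- result.try(parse_statement(rest))
      let #(body, rest) = parsed
      use rest <- result.try(expect(CloseBrace, rest))
      Ok(#(Function(name, body), rest))
    }
    _ -> Error("expected a function definition")
  }
}

/// <statement> ::= "return" <exp> ";"
fn parse_statement(tokens: List(Token)) -> Result(#(Statement, List(Token)), String) {
  case tokens {
    [KwReturn, ..rest] -> {
      use parsed <- result.try(parse_exp(rest))
      let #(exp, rest) = parsed
      use rest <- result.try(expect(Semicolon, rest))
      Ok(#(Return(exp), rest))
    }
    _ -> Error("expected a return statement")
  }
}

/// <exp> ::= <int>
fn parse_exp(tokens: List(Token)) -> Result(#(Exp, List(Token)), String) {
  case tokens {
    [Constant(n), ..rest] -> Ok(#(ConstantExp(n), rest))
    _ -> Error("expected a constant")
  }
}

/// Consumes `expected` from the front of the stream or fails.
fn expect(expected: Token, tokens: List(Token)) -> Result(List(Token), String) {
  case tokens {
    [token, ..rest] if token == expected -> Ok(rest)
    _ -> Error("expected " <> string.inspect(expected))
  }
}
```

Extending the grammar (say, unary operators) would mean adding rules to parse_exp, while the other functions stay unchanged.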
The compiler uses Gleam's strong type system to ensure correctness:
- Token: Lexical units (keywords, identifiers, literals, punctuation)
- AST: Abstract syntax tree nodes (programs, functions, statements, expressions)
- Assembly: Abstract assembly instructions (MOV, RET with operands)
- Settings: Compilation configuration (stages, platforms)
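The Assembly types are described as MOV and RET instructions with operands. A minimal sketch of the corresponding code generation step, assuming book-style types where an operand is either an immediate or the return register; the names Imm, Register, AsmFunction, and gen_function are hypothetical and may not match assembly.gleam or codegen.gleam.

```gleam
// Hypothetical AST (input) types for the supported subset.
pub type Exp {
  Constant(Int)
}

pub type Statement {
  Return(Exp)
}

pub type FunctionDef {
  Function(name: String, body: Statement)
}

// Hypothetical abstract assembly (output): operands plus MOV and RET.
pub type Operand {
  Imm(Int)
  // The return-value register (EAX); its textual name is left to the emitter.
  Register
}

pub type Instruction {
  Mov(src: Operand, dst: Operand)
  Ret
}

pub type AsmFunction {
  AsmFunction(name: String, instructions: List(Instruction))
}

fn convert_exp(exp: Exp) -> Operand {
  let Constant(n) = exp
  Imm(n)
}

/// `return <exp>;` becomes: move the value into the return register, then ret.
fn convert_statement(statement: Statement) -> List(Instruction) {
  let Return(exp) = statement
  [Mov(convert_exp(exp), Register), Ret]
}

pub fn gen_function(function: FunctionDef) -> AsmFunction {
  AsmFunction(function.name, convert_statement(function.body))
}
```

Keeping the register abstract is what makes this step platform-independent: nothing here knows about %eax syntax or symbol prefixes.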
Extending the compiler:
- Language Features: Extend AST types and update the parser
- Instructions: Add new assembly instruction types
- Platforms: Extend emitter for new target platforms
- Optimizations: Modify code generator for better output
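A new target platform mostly means another branch in the emitter. This sketch shows the Linux/macOS split for the supported subset, assuming the two differences the book describes at this stage: macOS symbol names take a leading underscore, and Linux output ends with a .note.GNU-stack section directive. The types and the emit function are hypothetical and may not match emitter.gleam.

```gleam
import gleam/int
import gleam/list
import gleam/string

pub type Platform {
  Linux
  OSX
}

// Hypothetical abstract assembly types (same shape as a codegen output).
pub type Operand {
  Imm(Int)
  Register
}

pub type Instruction {
  Mov(src: Operand, dst: Operand)
  Ret
}

pub type AsmFunction {
  AsmFunction(name: String, instructions: List(Instruction))
}

/// macOS (Mach-O) symbols take a leading underscore; Linux (ELF) symbols do not.
fn show_label(name: String, platform: Platform) -> String {
  case platform {
    OSX -> "_" <> name
    Linux -> name
  }
}

fn show_operand(operand: Operand) -> String {
  case operand {
    Imm(n) -> "$" <> int.to_string(n)
    Register -> "%eax"
  }
}

fn show_instruction(instruction: Instruction) -> String {
  case instruction {
    Mov(src, dst) -> "\tmovl\t" <> show_operand(src) <> ", " <> show_operand(dst)
    Ret -> "\tret"
  }
}

/// Renders one function as AT&T-syntax text for the GNU assembler.
pub fn emit(function: AsmFunction, platform: Platform) -> String {
  let label = show_label(function.name, platform)
  let header = ["\t.globl " <> label, label <> ":"]
  let body = list.map(function.instructions, show_instruction)
  // Linux: mark the stack non-executable so the linker does not warn about it.
  let footer = case platform {
    Linux -> ["\t.section .note.GNU-stack,\"\",@progbits"]
    OSX -> []
  }
  string.join(list.flatten([header, body, footer]), "\n") <> "\n"
}
```

For return 42 on macOS this produces .globl _main, the _main: label, movl $42, %eax, and ret; the Linux version drops the underscores and appends the GNU-stack line.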
Code style:
- Follow Gleam conventions and formatting
- Use descriptive variable names and comprehensive documentation
- Include unit tests for all new functionality
- Maintain separation of concerns between compilation stages
Use the debug flag and stage flags for troubleshooting:
# Debug lexer issues
gleam run -- --lex -d problematic.c
# Debug parser issues
gleam run -- --parse -d problematic.c
# Inspect generated assembly
gleam run -- -s -d working.c
cat working.s
Resources:
- Book: "Writing a C Compiler" by Nora Sandler
- Language: Gleam Language Guide
- Platform: Erlang/OTP Documentation
Contributing:
- Fork the repository
- Create a feature branch
- Add comprehensive tests for new functionality
- Ensure all tests pass: gleam test
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- Nora Sandler for the excellent "Writing a C Compiler" book
- The Gleam community for the fantastic language and ecosystem
- Contributors to the Gleam standard library and tooling