MicheDFresa/final

NQCC - A C Compiler in Gleam

A Gleam implementation of a C compiler based on the book "Writing a C Compiler" by Nora Sandler. It aims to stay close to the book's reference OCaml implementation. The compiler translates a subset of C to x86-64 assembly for Linux and macOS.

🚀 Features

  • Lexical Analysis: Tokenizes C source code into identifiers, constants, keywords, and punctuation
  • Syntax Analysis: Builds Abstract Syntax Trees (AST) from token streams
  • Code Generation: Converts AST to x86-64 assembly instructions
  • Assembly Emission: Outputs platform-specific assembly code for Linux and macOS
  • Multi-stage Compilation: Stop at any compilation stage for debugging
  • Cross-platform: Supports both Linux and macOS targets

📋 Current Language Support

This implementation currently supports a minimal subset of C: a single int function that takes no parameters and returns an integer constant.

✅ Supported Features

  • Function definitions with int return type
  • void parameter lists: int main(void)
  • Return statements with integer constants
  • Integer literals (positive and negative)
  • Basic C keywords: int, return, void
  • Standard C punctuation: (), {}, ;
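Taken together, the supported features describe a four-rule grammar (spelled out in the comment at the top of the sketch). The block below is a minimal recursive-descent parser for that grammar, written in C rather than Gleam and assuming the source has already been tokenized. The names Token, FunctionDef and parse_program are hypothetical and do not mirror parser.gleam.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Grammar for the supported subset:
 *   <program>   ::= <function>
 *   <function>  ::= "int" <identifier> "(" "void" ")" "{" <statement> "}"
 *   <statement> ::= "return" <exp> ";"
 *   <exp>       ::= <int>
 */

/* Hypothetical token kinds; the input array must end with TOK_EOF. */
typedef enum {
    TOK_INT_KW, TOK_VOID_KW, TOK_RETURN_KW,
    TOK_IDENT, TOK_CONSTANT,
    TOK_LPAREN, TOK_RPAREN, TOK_LBRACE, TOK_RBRACE, TOK_SEMICOLON,
    TOK_EOF
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *text; /* identifier name, otherwise NULL */
    long value;       /* constant value, otherwise 0 */
} Token;

/* AST for: int <name>(void) { return <return_value>; } */
typedef struct {
    const char *name;
    long return_value;
} FunctionDef;

typedef struct {
    const Token *toks;
    size_t pos;
} Parser;

/* Consume one token of the expected kind; on mismatch leave pos unchanged. */
static bool expect(Parser *p, TokenKind kind) {
    if (p->toks[p->pos].kind != kind) return false;
    p->pos++;
    return true;
}

/* <statement> ::= "return" <exp> ";"   with   <exp> ::= <int> */
static bool parse_statement(Parser *p, long *value) {
    if (!expect(p, TOK_RETURN_KW)) return false;
    if (p->toks[p->pos].kind != TOK_CONSTANT) return false;
    *value = p->toks[p->pos].value;
    p->pos++;
    return expect(p, TOK_SEMICOLON);
}

/* <function> ::= "int" <identifier> "(" "void" ")" "{" <statement> "}" */
static bool parse_function(Parser *p, FunctionDef *fn) {
    if (!expect(p, TOK_INT_KW)) return false;
    if (p->toks[p->pos].kind != TOK_IDENT) return false;
    fn->name = p->toks[p->pos].text;
    p->pos++;
    return expect(p, TOK_LPAREN) && expect(p, TOK_VOID_KW) &&
           expect(p, TOK_RPAREN) && expect(p, TOK_LBRACE) &&
           parse_statement(p, &fn->return_value) &&
           expect(p, TOK_RBRACE);
}

/* <program> ::= <function>, and nothing may follow it. */
bool parse_program(const Token *toks, FunctionDef *fn) {
    assert(toks != NULL && fn != NULL);
    Parser p = { toks, 0 };
    return parse_function(&p, fn) && expect(&p, TOK_EOF);
}
```

Each grammar rule becomes one function, which is what makes the parser "recursive descent"; a failed expect() reports a syntax error instead of silently skipping input.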

❌ Not Yet Supported

  • Variables and expressions
  • Function parameters
  • Control flow (if, while, for)
  • Arithmetic operations
  • Multiple functions
  • Standard library functions

📝 Example Program

int main(void) {
    return 42;
}

📁 Project Structure

nqcc_gleam/
├── src/                    # Source code
│   ├── nqcc.gleam         # Main entry point
│   ├── cli.gleam          # Command-line interface
│   ├── compiler.gleam     # Main compilation pipeline
│   ├── lexer.gleam        # Lexical analysis (source → tokens)
│   ├── parser.gleam       # Syntax analysis (tokens → AST)
│   ├── codegen.gleam      # Code generation (AST → assembly)
│   ├── emitter.gleam      # Assembly emission (assembly → file)
│   ├── tokens.gleam       # Token type definitions
│   ├── ast.gleam          # Abstract Syntax Tree types
│   ├── assembly.gleam     # Assembly instruction types
│   ├── settings.gleam     # Configuration and platform types
│   └── utils.gleam        # Utility functions
├── test/                   # Comprehensive test suite
│   ├── lexer_test.gleam   # Lexer unit tests
│   ├── parser_test.gleam  # Parser unit tests
│   ├── codegen_test.gleam # Code generator unit tests
│   ├── emitter_test.gleam # Assembly emitter unit tests
│   ├── compiler_test.gleam # Integration tests
│   ├── nqcc_test.gleam    # Test runner
│   └── README.md          # Test documentation
├── sample/                 # Example programs
│   └── test_program.c     # Simple test program
├── .github/               # GitHub workflows
├── build/                 # Build artifacts
├── gleam.toml            # Project configuration
└── README.md             # This file

🛠️ Installation

Prerequisites

  • Gleam (latest version)
  • Erlang/OTP (version 24+)
  • GCC or Clang (for preprocessing, assembling, and linking)

Setup

# Clone the repository
git clone <repository-url>
cd nqcc_gleam

# Install dependencies
gleam deps download

# Build the project
gleam build

# Run tests to verify installation
gleam test

Creating Standalone Executable

# Create escript executable using gleescript
gleam run -m gleescript

# Or export as Erlang shipment for distribution
gleam export erlang-shipment

# Run the standalone executable (after gleam run -m gleescript)
./nqcc --help

# Or run from erlang-shipment
cd build/erlang-shipment
./entrypoint.sh run --help

Standalone Executable Methods

Method 1: Escript (gleam run -m gleescript)

  • Creates a single nqcc executable file in the project root
  • Requires Erlang on target system
  • Direct executable: ./nqcc hello.c
  • Smaller footprint, easier to distribute

Method 2: Erlang Shipment (gleam export erlang-shipment)

  • Creates a build/erlang-shipment directory with runtime
  • Requires Erlang on target system
  • Run via entrypoint: ./entrypoint.sh run hello.c
  • Full runtime package, better for complex deployments

🎯 Usage

Basic Compilation

# Compile and create executable
gleam run hello.c

# This creates hello.s (assembly) and hello (executable)

Command-Line Options

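gleam run -- [OPTIONS] <source-file.c>

The -- separator hands everything after it to the compiler rather than to gleam run itself. This matters most for --target, because gleam run has its own --target option for choosing the Erlang or JavaScript backend.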

Compilation Stage Flags

  • --lex - Run lexer only (tokenization)
  • --parse - Run lexer and parser only (AST generation)
  • --codegen - Run through code generation (stop before assembly emission)
  • -s - Generate assembly file only (don't create executable)
  • (no flag) - Complete compilation to executable
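One way to implement these flags is to treat the pipeline stages as an ordered list and record the last stage that should run. A minimal C sketch under that assumption follows; the Stage enum, stage_from_args and should_run are hypothetical names, not the project's cli.gleam code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pipeline stages, in execution order. The value chosen
   from the flags is the LAST stage the driver should run. */
typedef enum {
    STAGE_LEX,      /* --lex                          */
    STAGE_PARSE,    /* --parse                        */
    STAGE_CODEGEN,  /* --codegen                      */
    STAGE_EMIT,     /* -s: stop after writing the .s  */
    STAGE_LINK      /* no flag: build the executable  */
} Stage;

/* Scan the arguments (program name excluded) for a stage flag.
   Non-stage arguments such as the source file or -d are ignored here;
   if several stage flags appear, the last one wins. */
Stage stage_from_args(int count, const char *args[]) {
    assert(count == 0 || args != NULL);
    Stage stop = STAGE_LINK;
    for (int i = 0; i < count; i++) {
        if (strcmp(args[i], "--lex") == 0) stop = STAGE_LEX;
        else if (strcmp(args[i], "--parse") == 0) stop = STAGE_PARSE;
        else if (strcmp(args[i], "--codegen") == 0) stop = STAGE_CODEGEN;
        else if (strcmp(args[i], "-s") == 0) stop = STAGE_EMIT;
    }
    return stop;
}

/* Because the enum follows pipeline order, one comparison decides
   whether a given stage runs. */
int should_run(Stage stop, Stage stage) {
    return stage <= stop;
}
```

Letting the last flag win is a design choice of this sketch; a stricter CLI could reject conflicting stage flags instead.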

Platform Flags

  • --target linux - Generate Linux-compatible assembly (default: osx)
  • --target osx - Generate macOS-compatible assembly
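For the supported subset, the usual differences between the two targets are symbol naming and one Linux-only directive: Mach-O on macOS expects a leading underscore on C function symbols (_main), while on Linux an empty .note.GNU-stack section marks the stack as non-executable. A C sketch of emitting a return-a-constant function for either target, assuming AT&T syntax; Target and emit_return_function are hypothetical names, and emitter.gleam may lay the text out differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { TARGET_LINUX, TARGET_OSX } Target;

/* Mach-O (macOS) prefixes C symbols with an underscore; ELF (Linux) does not. */
static void symbol_name(char *out, size_t size, Target target, const char *name) {
    snprintf(out, size, "%s%s", target == TARGET_OSX ? "_" : "", name);
}

/* Write AT&T-syntax assembly for: int <name>(void) { return <value>; }
   Returns snprintf's result (the length the full text needs). */
int emit_return_function(char *buf, size_t size, Target target,
                         const char *name, long value) {
    assert(buf != NULL && name != NULL);
    char sym[128];
    symbol_name(sym, sizeof sym, target, name);
    /* On Linux, an empty .note.GNU-stack section marks the stack
       as non-executable; macOS needs no equivalent line. */
    const char *footer = target == TARGET_LINUX
        ? "\t.section .note.GNU-stack,\"\",@progbits\n"
        : "";
    return snprintf(buf, size,
                    "\t.globl %s\n"
                    "%s:\n"
                    "\tmovl $%ld, %%eax\n"
                    "\tret\n"
                    "%s",
                    sym, sym, value, footer);
}
```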

Debug Flags

  • -d - Debug mode (preserve intermediate files)

Cleanup Flags

  • --clean - Clean all intermediate files (.s, .o, executable)
  • --clean-ast - Clean AST output file (ast_output.txt)
  • --clean-test-ast - Clean test AST output file (ast_output_test.txt)

Help

  • --help - Show usage information

📚 Examples

1. Complete Compilation

# Create a simple C program
echo 'int main(void) { return 42; }' > hello.c

# Compile to executable
gleam run hello.c

# Run the program
./hello
echo $?  # Prints: 42

2. Stop at Different Stages

# Lexical analysis only
gleam run -- --lex hello.c

# Parse to AST only
gleam run -- --parse hello.c

# Generate assembly only
gleam run -- -s hello.c
cat hello.s  # View generated assembly

3. Cross-platform Compilation

# Generate Linux assembly on macOS
gleam run -- --target linux -s hello.c

# Generate macOS assembly
gleam run -- --target osx -s hello.c

4. Debug Mode

# Keep all intermediate files
gleam run -- -d hello.c
ls hello*  # Shows: hello, hello.c, hello.i, hello.s

5. AST Generation and Cleanup

Generate AST files:

# Create ast_output.txt (AST from parsing source file)
gleam run -- --parse sample/test_program.c

# Create ast_output_test.txt (AST from running tests)
gleam test

Clean AST files:

# Clean only ast_output.txt
gleam run -- --clean-ast sample/test_program.c

# Clean only ast_output_test.txt  
gleam run -- --clean-test-ast sample/test_program.c

# Clean all intermediate files (.s, .o, executable) - does NOT clean AST files
gleam run -- --clean sample/test_program.c

Check AST files:

# See what AST files exist
ls -la ast_output*.txt

🔄 Compilation Pipeline

The compiler follows a traditional multi-stage compilation pipeline:

Source Code (.c)
       ↓
   Preprocessing (.i)     # GCC preprocessor
       ↓
   Lexical Analysis       # Tokenization
       ↓
   Syntax Analysis        # AST generation
       ↓
   Code Generation        # Assembly generation
       ↓
   Assembly Emission (.s) # Platform-specific output
       ↓
   Assembly & Linking     # GCC assembler/linker
       ↓
   Executable

Stage Details

  1. Preprocessing: Uses GCC to handle #include and macros
  2. Lexical Analysis: Converts source text into tokens
  3. Syntax Analysis: Builds Abstract Syntax Tree from tokens
  4. Code Generation: Converts AST to abstract assembly
  5. Assembly Emission: Generates platform-specific assembly text
  6. Assembly & Linking: Uses GCC to create executable
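Stages 1 and 6 shell out to GCC. Assuming the commands suggested in Sandler's book (gcc -E -P to preprocess, plain gcc to assemble and link), the intermediate and output file names can be derived from the source path as in this C sketch. base_name, preprocess_command and link_command are hypothetical helpers, and compiler.gleam may pass different flags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "dir/hello.c" -> "dir/hello": drop a trailing ".c" if present. */
static void base_name(char *out, size_t size, const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    if (len >= 2 && strcmp(src + len - 2, ".c") == 0) len -= 2;
    snprintf(out, size, "%.*s", (int)len, src);
}

/* Stage 1: run only the preprocessor (-E) and omit linemarkers (-P),
   writing hello.i next to hello.c. */
void preprocess_command(char *out, size_t size, const char *src) {
    char base[256];
    base_name(base, sizeof base, src);
    snprintf(out, size, "gcc -E -P %s -o %s.i", src, base);
}

/* Stage 6: let gcc assemble hello.s and link it into ./hello. */
void link_command(char *out, size_t size, const char *src) {
    char base[256];
    base_name(base, sizeof base, src);
    snprintf(out, size, "gcc %s.s -o %s", base, base);
}
```

The resulting strings could be handed to system(3) or another process-spawning API. The sketch does no shell quoting, so it assumes paths without spaces.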

🧪 Testing

Run All Tests

gleam test

Test Coverage

The project includes comprehensive unit tests with 128 test cases covering:

  • Lexer Tests: Token generation, whitespace handling, error cases
  • Parser Tests: AST construction, syntax error detection
  • Codegen Tests: Assembly instruction generation
  • Emitter Tests: Platform-specific assembly output
  • Integration Tests: End-to-end compilation pipeline

Test Documentation

See test/README.md for detailed test documentation.

Tests added for the AST printer

  1. Function returning another constant
    • Builds an AST that represents a program with a function foo returning the constant value 100.
    • Uses ast_printer to write that AST into a file named ast_output_constant.txt.
    • Reads the file and compares its content with the expected string:
   Function: foo
    Return:
      Constant: 100
  2. Empty function (no return)
    • Builds an AST that represents a program with a function empty that has no return value (NoOp).
    • Uses ast_printer to write that AST into a file named ast_output_empty.txt.
    • Reads the file and compares its content with the expected string:
   Function: empty
     NoOp
  3. Function with multiple instructions
    • Builds an AST that represents a program with a function multi containing a block of two return statements: one returning 1 and another returning 2.
    • Uses ast_printer to write that AST into a file named ast_output_multi.txt.
    • Reads the file and compares its content with the expected string:
   Function: multi
    Block:
     Return:
      Constant: 1
     Return:
      Constant: 2
  4. Function returning an expression
    • Builds an AST that represents a program with a function expr returning the expression 5 + 7.
    • Uses ast_printer to write that AST into a file named ast_output_expr.txt.
    • Reads the file and compares its content with the expected string:
   Function: expr
    Block:
   Return:
     BinaryOp: +
       Constant: 5
       Constant: 7
  5. Program with multiple functions
    • Builds an AST that represents a program with two functions: first returning 10 and second returning 20.
    • Uses ast_printer to write that AST into a file named ast_output_funcs.txt.
    • Reads the file and compares its content with the expected string:
   Function: first
    Block:
    Return:
     Constant: 10
   Function: second
    Return:
     Constant: 20
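All five tests exercise the same idea: a pre-order walk that prints each node's label on its own line, indented by depth. A C sketch of such a printer follows. The node layout and names are hypothetical, and the two-space indent per level is an assumption, since ast_printer's exact indentation may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds, one per label in the expected outputs above. */
typedef enum { N_FUNCTION, N_BLOCK, N_RETURN, N_CONSTANT, N_BINARY, N_NOOP } Kind;

typedef struct Node {
    Kind kind;
    const char *label;          /* function name or operator, e.g. "foo", "+" */
    long value;                 /* only meaningful for N_CONSTANT */
    const struct Node *kids[4]; /* children in order; unused slots are NULL */
} Node;

/* Append one line to buf, indented two spaces per depth level (assumed). */
static void append_line(char *buf, size_t size, int depth, const char *text) {
    size_t used = strlen(buf);
    snprintf(buf + used, size - used, "%*s%s\n", depth * 2, "", text);
}

/* Pre-order walk: print the node itself, then its children one level deeper.
   buf must hold a NUL-terminated string (e.g. "") before the first call. */
void print_ast(char *buf, size_t size, const Node *n, int depth) {
    assert(buf != NULL && n != NULL);
    char text[64] = "";
    switch (n->kind) {
    case N_FUNCTION: snprintf(text, sizeof text, "Function: %s", n->label); break;
    case N_BLOCK:    snprintf(text, sizeof text, "Block:"); break;
    case N_RETURN:   snprintf(text, sizeof text, "Return:"); break;
    case N_CONSTANT: snprintf(text, sizeof text, "Constant: %ld", n->value); break;
    case N_BINARY:   snprintf(text, sizeof text, "BinaryOp: %s", n->label); break;
    case N_NOOP:     snprintf(text, sizeof text, "NoOp"); break;
    }
    append_line(buf, size, depth, text);
    for (int i = 0; i < 4 && n->kids[i] != NULL; i++)
        print_ast(buf, size, n->kids[i], depth + 1);
}
```

Writing the result to a file and comparing it with an expected string, as the tests above do, then reduces to a single string comparison.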

πŸ—οΈ Architecture

Core Components

  1. Lexer (lexer.gleam)

    • Converts source code strings to token lists
    • Handles whitespace, keywords, identifiers, and literals
    • Regex-based pattern matching with longest-match disambiguation
  2. Parser (parser.gleam)

    • Converts token streams to Abstract Syntax Trees
    • Recursive descent parser for C grammar subset
    • Comprehensive error reporting for syntax issues
  3. Code Generator (codegen.gleam)

    • Transforms AST to abstract assembly instructions
    • Platform-independent instruction generation
    • Simple instruction sequences (MOV + RET for returns)
  4. Emitter (emitter.gleam)

    • Converts abstract assembly to platform-specific text
    • Handles Linux vs macOS assembly syntax differences
    • Generates GNU assembler compatible output
  5. Compiler (compiler.gleam)

    • Orchestrates the entire compilation pipeline
    • Manages intermediate files and cleanup
    • Integrates with external tools (GCC for preprocessing/linking)

Type System

The compiler relies on Gleam's static type system to keep the data passed between stages explicit and well-typed:

  • Token: Lexical units (keywords, identifiers, literals, punctuation)
  • AST: Abstract syntax tree nodes (programs, functions, statements, expressions)
  • Assembly: Abstract assembly instructions (MOV, RET with operands)
  • Settings: Compilation configuration (stages, platforms)
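To illustrate how these types connect, here is the AST for the supported subset and the abstract-assembly types written as plain C structs and enums, together with the code-generation step that turns return <constant> into a MOV into the return register followed by RET. Gleam would express these as custom types with variants; every name here (Program, Instruction, codegen and so on) is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* AST: a program is one function whose body is "return <constant>;". */
typedef struct { long value; } Exp;                         /* Constant(int) */
typedef struct { Exp exp; } Statement;                      /* Return(exp)   */
typedef struct { const char *name; Statement body; } Function;
typedef struct { Function function; } Program;

/* Abstract assembly. In this subset the only register is the return
   register (EAX), so OP_REGISTER carries no register name. */
typedef enum { OP_IMM, OP_REGISTER } OperandKind;
typedef struct { OperandKind kind; long imm; } Operand;     /* imm used by OP_IMM */

typedef enum { INSTR_MOV, INSTR_RET } InstrKind;
typedef struct { InstrKind kind; Operand src, dst; } Instruction; /* RET ignores operands */

typedef struct {
    const char *name;
    Instruction instrs[2];
    int count;
} AsmFunction;

/* return <c>   ==>   Mov(Imm(c), Register); Ret */
AsmFunction codegen(const Program *prog) {
    assert(prog != NULL);
    const Function *fn = &prog->function;
    AsmFunction out = { .name = fn->name, .count = 0 };
    Operand imm = { OP_IMM, fn->body.exp.value };
    Operand reg = { OP_REGISTER, 0 };
    out.instrs[out.count++] = (Instruction){ .kind = INSTR_MOV, .src = imm, .dst = reg };
    out.instrs[out.count++] = (Instruction){ .kind = INSTR_RET };
    return out;
}
```

Keeping the assembly abstract at this point leaves platform details, such as the macOS symbol prefix, entirely to the emitter.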

🚧 Development

Adding New Features

  1. Language Features: Extend AST types and update parser
  2. Instructions: Add new assembly instruction types
  3. Platforms: Extend emitter for new target platforms
  4. Optimizations: Modify code generator for better output

Code Style

  • Follow Gleam conventions and formatting
  • Use descriptive variable names and comprehensive documentation
  • Include unit tests for all new functionality
  • Maintain separation of concerns between compilation stages

Debugging

Use the debug flag and stage flags for troubleshooting:

# Debug lexer issues
gleam run -- --lex -d problematic.c

# Debug parser issues
gleam run -- --parse -d problematic.c

# Inspect generated assembly
gleam run -- -s -d working.c
cat working.s

📖 References

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add comprehensive tests for new functionality
  4. Ensure all tests pass: gleam test
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Nora Sandler for the excellent "Writing a C Compiler" book
  • The Gleam community for the fantastic language and ecosystem
  • Contributors to the Gleam standard library and tooling
