amine-kherroubi/lexy

Lexy


Lexy is a lexical analyzer generator that compiles regular expression specifications (.lexy files) into efficient, table-driven C++ scanners.
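To make "table-driven" concrete, here is a minimal hand-written sketch of the general technique: a transition table indexed by state and character class, scanned with maximal munch (the longest match wins). It is not the code Lexy emits. The names `Kind`, `Tok`, `classify`, `next_token`, and `tokenize` are hypothetical, and the table covers only two rules in the spirit of IDENTIFIER and INTEGER.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds for this sketch (not Lexy's generated names).
enum class Kind { Identifier, Integer, Unknown };

struct Tok {
    Kind kind;
    std::string lexeme;
};

// Character classes keep the transition table small.
enum CharClass { kLetter = 0, kDigit = 1, kOther = 2 };

inline CharClass classify(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return kLetter;
    if (c >= '0' && c <= '9') return kDigit;
    return kOther;
}

constexpr int kDead = -1;

// Rows are states: 0 = start, 1 = inside identifier, 2 = inside integer.
// Columns are character classes: letter/underscore, digit, other.
constexpr int kNext[3][3] = {
    {1, 2, kDead},
    {1, 1, kDead},
    {kDead, 2, kDead},
};
constexpr bool kAccepting[3] = {false, true, true};
constexpr Kind kKindOf[3] = {Kind::Unknown, Kind::Identifier, Kind::Integer};

// Scan one token starting at `pos`, remembering the last accepting
// position so the longest match is returned (maximal munch).
Tok next_token(std::string_view input, std::size_t& pos) {
    int state = 0;
    std::size_t i = pos;
    std::size_t last_accept_end = pos;
    Kind last_kind = Kind::Unknown;
    while (i < input.size()) {
        const int next = kNext[state][classify(input[i])];
        if (next == kDead) break;
        state = next;
        ++i;
        if (kAccepting[state]) {
            last_accept_end = i;
            last_kind = kKindOf[state];
        }
    }
    if (last_accept_end == pos) {
        // No rule matched: emit a single character as Unknown and move on.
        Tok unknown{Kind::Unknown, std::string(input.substr(pos, 1))};
        ++pos;
        return unknown;
    }
    Tok tok{last_kind, std::string(input.substr(pos, last_accept_end - pos))};
    pos = last_accept_end;
    return tok;
}

// Tokenize a whole input by calling next_token until it is exhausted.
std::vector<Tok> tokenize(std::string_view input) {
    std::vector<Tok> out;
    std::size_t pos = 0;
    while (pos < input.size()) out.push_back(next_token(input, pos));
    return out;
}
```

In this sketch, `tokenize("9lives")` splits the input into an integer `9` followed by an identifier `lives`, because the integer state has no transition on a letter and the scanner falls back to the last accepting position.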

Features

  • Standardized Build System: Uses CMake for cross-platform portability.
  • Professional CLI: Flexible input/output configuration via command-line arguments.
  • Automata Visualization: Optional generation of visual representations (Graphviz) of NFA and DFA construction stages.

Supported Platforms

Lexy is only supported on Linux.

Prerequisites

  • C++20 compatible compiler (e.g., g++ or clang++)
  • cmake (>= 3.10)
  • Graphviz (dot command, optional for visualization)

Installation

From Source

cmake -B build
cmake --build build
sudo cmake --install build  # Installs the 'lexy' binary to /usr/local/bin

Alternatively, you can run the lexy executable directly from the build directory.

Usage

# Using installed binary
lexy <input_spec.lexy> [options]

# Using local build
./build/lexy <input_spec.lexy> [options]

Options:

  • <input_spec.lexy>: Input specification file (required). Note: characters and sequences that are special in Lexy patterns (e.g., +, {, }, (, ), !, ->) must be escaped with a backslash in the specification file (e.g., "\+", "\->").
  • -o <dir>: Output directory for scanner code and visualizations (default: ./output).
  • -g: Enable automata graph generation (disabled by default).
  • -h: Show help message.
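The escaping rule for the specification file can be sketched as a small helper that turns literal text into a pattern. This is an illustration only: `escape_literal` is a hypothetical name and is not part of Lexy, and it assumes the special set is exactly the characters the note above lists (+, {, }, (, ), !) plus the -> sequence. The set Lexy actually treats as special may be larger.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Backslash-escape the characters this README names as special.
// Assumption: the list is taken from the README's "e.g." examples and
// may be incomplete.
std::string escape_literal(std::string_view text) {
    constexpr std::string_view kSpecial = "+{}()!";
    std::string out;
    out.reserve(text.size() * 2);
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        // "->" is escaped as "\->", so only the '-' gets a backslash.
        const bool arrow = c == '-' && i + 1 < text.size() && text[i + 1] == '>';
        if (kSpecial.find(c) != std::string_view::npos || arrow) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}
```

Under these assumptions, `escape_literal("println!")` yields `println\!` and `escape_literal("->")` yields `\->`, which match the PRINT and ARROW rules in the sample specification.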

Example

You can test lexy using the provided sample files in the example folder.

Input Specification (example/sample_scanner.lexy):

FN         ::= "fn"
LET        ::= "let"
MUT        ::= "mut"
PRINT      ::= "println\!"
I32        ::= "i32"
RETURN     ::= "return"
ARROW      ::= "\->"
IDENTIFIER ::= "[a-zA-Z_][a-zA-Z0-9_]*"
INTEGER    ::= "[0-9]+"
ASSIGN     ::= "="
SEMICOLON  ::= ";"
LBRACE     ::= "\{"
RBRACE     ::= "\}"
LPAREN     ::= "\("
RPAREN     ::= "\)"
COLON      ::= ":"
PLUS       ::= "\+"
WHITESPACE ::= "[ \t\n]+"

Input Program (example/sample_program.rs):

fn add(mut x: i32) -> i32 {
    let y = 10;
    println!("Result");
    return x + y;
}

Example Usage (Integrating the Scanner)

After generating the scanner (output/scanners/sample_scanner.cpp), you can integrate it into your own C++ project.

Here is a simple example program (test.cpp) to test the scanner:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include "output/scanners/sample_scanner.cpp"

int main() {
    // Read the input program file
    std::ifstream file("example/sample_program.rs");
    if (!file.is_open()) {
        std::cerr << "Failed to open input file." << std::endl;
        return 1;
    }
    std::stringstream buffer;
    buffer << file.rdbuf();
    std::string content = buffer.str();

    // Initialize scanner
    Scanner scanner(content.c_str());

    // Scan and print tokens
    Token token;
    while ((token = scanner.getNextToken()).type != -1) {
        if (token.type == -2) {
            std::cout << "Unknown token: " << token.lexeme << std::endl;
        } else {
            std::cout << "Token: " << TOKEN_NAMES[token.type]
                      << " | Lexeme: '" << token.lexeme << "'" << std::endl;
        }
    }

    return 0;
}

Compiling and Running

test.cpp includes the generated scanner source directly, so compiling that single file also builds the scanner:

# Compile the test program
g++ -std=c++20 test.cpp -o scanner_tester

# Run the tester
./scanner_tester

References

  • Aho, Sethi, Ullman - Compilers: Principles, Techniques, and Tools (Dragon Book)
  • Cooper & Torczon - Engineering a Compiler
  • Hopcroft, Motwani, Ullman - Introduction to Automata Theory

About

A C++ table-driven lexical analyzer generator. Compiles regex specifications into scanners with maximal munch tokenization.
