Lexy is a lexical analyzer generator that compiles regular expression specifications (.lexy files) into efficient, table-driven C++ scanners.
- Standardized Build System: Uses CMake for cross-platform portability.
- Professional CLI: Flexible input/output configuration via command-line arguments.
- Automata Visualization: Optional generation of visual representations (Graphviz) of NFA and DFA construction stages.
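To give a feel for what the Graphviz visualization involves, here is a minimal sketch that renders a small DFA as DOT text. The `Dfa` struct, `integerDfa`, and `toDot` are hypothetical names for illustration only; the files Lexy actually writes with `-g` may be laid out differently.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical DFA representation, used only for this illustration.
struct Dfa {
    int start = 0;
    std::set<int> accepting;
    // (state, edge label) -> next state
    std::map<std::pair<int, std::string>, int> transitions;
};

// Hand-built DFA for the pattern [0-9]+ : state 0 is the start, state 1 accepts.
Dfa integerDfa() {
    Dfa dfa;
    dfa.start = 0;
    dfa.accepting = {1};
    dfa.transitions[{0, "0-9"}] = 1;
    dfa.transitions[{1, "0-9"}] = 1;
    return dfa;
}

// Render the DFA as Graphviz DOT text. Labels are written as-is,
// so this sketch does not handle labels that contain double quotes.
std::string toDot(const Dfa& dfa) {
    std::ostringstream out;
    out << "digraph dfa {\n";
    out << "  rankdir=LR;\n";
    out << "  start [shape=point];\n";
    for (int s : dfa.accepting) {
        out << "  " << s << " [shape=doublecircle];\n";
    }
    out << "  start -> " << dfa.start << ";\n";
    for (const auto& [key, next] : dfa.transitions) {
        out << "  " << key.first << " -> " << next
            << " [label=\"" << key.second << "\"];\n";
    }
    out << "}\n";
    return out.str();
}
```

Saving the result of `toDot(integerDfa())` to a file such as `dfa.dot` and running `dot -Tpng dfa.dot -o dfa.png` should produce an image of the automaton.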
Lexy is only supported on Linux.
- C++20 compatible compiler (e.g., g++ or clang++)
- cmake (>= 3.10)
- Graphviz (dot command, optional for visualization)
cmake -B build
cmake --build build
sudo cmake --install build # Installs the 'lexy' binary to /usr/local/bin
Alternatively, you can run the lexy executable directly from the build directory.
# Using installed binary
lexy <input_spec.lexy> [options]
# Using local build
./build/lexy <input_spec.lexy> [options]
Options:
- <input_spec.lexy>: Input specification file (required). Note: Special regex characters (e.g., +, {, }, (, ), !, ->) must be escaped with a backslash (e.g., "\+") in the specification file.
- -o <dir>: Output directory for scanner code and visualizations (default: ./output).
- -g: Enable automata graph generation (disabled by default).
- -h: Show help message.
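The escaping rule in the note above can be sketched as a small helper that prepares literal text for a specification file. The function name `escapeForLexy` and the character set are assumptions: the set contains only the characters the note lists as examples, and for `->` it escapes just the leading `-`, as the sample specification does. Lexy's real set of special characters may be larger.

```cpp
#include <string>
#include <string_view>

// Assumed set of special characters, taken from the examples in the note.
// '-' is included because the sample spec writes the arrow as "\->".
constexpr std::string_view kSpecialChars = "+{}()!-";

// Prefix every assumed-special character in a literal with a backslash.
// Intended for literal text only: patterns such as [a-z] use '-' as a
// range operator and should not be passed through this helper.
std::string escapeForLexy(std::string_view literal) {
    std::string out;
    out.reserve(literal.size() * 2);
    for (char c : literal) {
        if (kSpecialChars.find(c) != std::string_view::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}
```

With this sketch, `escapeForLexy("println!")` yields `println\!` and `escapeForLexy("->")` yields `\->`, matching the forms used in example/sample_scanner.lexy.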
You can test lexy using the provided sample files in the example folder.
Input Specification (example/sample_scanner.lexy):
FN ::= "fn"
LET ::= "let"
MUT ::= "mut"
PRINT ::= "println\!"
I32 ::= "i32"
RETURN ::= "return"
ARROW ::= "\->"
IDENTIFIER ::= "[a-zA-Z_][a-zA-Z0-9_]*"
INTEGER ::= "[0-9]+"
ASSIGN ::= "="
SEMICOLON ::= ";"
LBRACE ::= "\{"
RBRACE ::= "\}"
LPAREN ::= "\("
RPAREN ::= "\)"
COLON ::= ":"
PLUS ::= "\+"
WHITESPACE ::= "[ \t\n]+"
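To illustrate what "table-driven" means for rules like IDENTIFIER, INTEGER, and WHITESPACE above, here is a minimal hand-written sketch of a transition-table scanner that takes the longest match at each position. It is not the code Lexy generates: `Kind`, `Tok`, `classOf`, `kNext`, `kAccept`, and `scan` are hypothetical names, and the table covers only those three rules. It leaves out keywords such as "fn", which also match IDENTIFIER; how Lexy resolves that overlap is not covered here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds for three of the sample rules.
enum Kind { IDENTIFIER, INTEGER, WHITESPACE, UNKNOWN };

struct Tok {
    Kind kind;
    std::string lexeme;
};

// Character classes: 0 letter or '_', 1 digit, 2 space/tab/newline, 3 other.
int classOf(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return 0;
    if (c >= '0' && c <= '9') return 1;
    if (c == ' ' || c == '\t' || c == '\n') return 2;
    return 3;
}

// Rows are states (0 start, 1 identifier, 2 integer, 3 whitespace),
// columns are character classes, -1 means "no transition".
constexpr int kNext[4][4] = {
    { 1,  2,  3, -1},
    { 1,  1, -1, -1},
    {-1,  2, -1, -1},
    {-1, -1,  3, -1},
};

// Token kind accepted in each state; the start state accepts nothing.
constexpr Kind kAccept[4] = {UNKNOWN, IDENTIFIER, INTEGER, WHITESPACE};

std::vector<Tok> scan(std::string_view input) {
    std::vector<Tok> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        int state = 0;
        std::size_t cur = pos;
        std::size_t lastEnd = pos;
        Kind lastKind = UNKNOWN;
        // Follow table transitions, remembering the last accepting position.
        while (cur < input.size()) {
            int next = kNext[state][classOf(input[cur])];
            if (next < 0) break;
            state = next;
            ++cur;
            if (kAccept[state] != UNKNOWN) {
                lastEnd = cur;
                lastKind = kAccept[state];
            }
        }
        // No rule matched: emit a single character as UNKNOWN and move on.
        if (lastKind == UNKNOWN) lastEnd = pos + 1;
        tokens.push_back({lastKind, std::string(input.substr(pos, lastEnd - pos))});
        pos = lastEnd;
    }
    return tokens;
}
```

The design point is that the scanning loop stays the same no matter how many rules there are; only the tables change, which is the part a generator can produce from the specification.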
Input Program (example/sample_program.rs):
fn add(mut x: i32) -> i32 {
    let y = 10;
    println!("Result");
    return x + y;
}
After generating the scanner (output/scanners/sample_scanner.cpp), you can integrate it into your own C++ project.
Here is a simple example program (test.cpp) to test the scanner:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include "output/scanners/sample_scanner.cpp"
int main() {
    // Read the input program file
    std::ifstream file("example/sample_program.rs");
    if (!file.is_open()) {
        std::cerr << "Failed to open input file." << std::endl;
        return 1;
    }
    std::stringstream buffer;
    buffer << file.rdbuf();
    std::string content = buffer.str();
    // Initialize scanner
    Scanner scanner(content.c_str());
    // Scan and print tokens
    Token token;
    while ((token = scanner.getNextToken()).type != -1) {
        if (token.type == -2) {
            std::cout << "Unknown token: " << token.lexeme << std::endl;
        } else {
            std::cout << "Token: " << TOKEN_NAMES[token.type]
                      << " | Lexeme: '" << token.lexeme << "'" << std::endl;
        }
    }
    return 0;
}
To compile your test program along with the generated scanner:
# Compile the test program
g++ -std=c++20 test.cpp -o scanner_tester
# Run the tester
./scanner_tester
- Aho, Sethi, Ullman - Compilers: Principles, Techniques, and Tools (Dragon Book)
- Cooper & Torczon - Engineering a Compiler
- Hopcroft, Motwani, Ullman - Introduction to Automata Theory