@rcs-lang/parser
ANTLR4-based grammar and parser for RCL (Rich Communication Language) that converts source code into Abstract Syntax Trees for further processing.
Installation
    bun add @rcs-lang/parser

Overview
This package provides the primary parser for RCL, built on ANTLR4 with support for:
- Python-like indentation - Significant whitespace handling
- Context-aware parsing - Proper scope and section handling
- Error recovery - Graceful handling of syntax errors
- Position tracking - Source locations for diagnostics
- Symbol extraction - Identifier and reference extraction
Quick Start
    import { AntlrRclParser } from '@rcs-lang/parser';

    const parser = new AntlrRclParser();

    // Initialize parser (required for ANTLR)
    await parser.initialize();

    // Parse RCL source code
    const result = await parser.parse(`
      agent CoffeeShop
        displayName: "Coffee Shop Agent"

        flow OrderFlow
          start: Welcome

          on Welcome
            match @userInput
              "order coffee" -> ChooseSize
              "view menu" -> ShowMenu

        messages Messages
          text Welcome "Hello! How can I help you today?"
    `);

    if (result.success) {
      console.log('AST:', result.value.ast);
      console.log('Symbols:', result.value.symbols);
    } else {
      console.error('Parse error:', result.error);
    }

Grammar Structure
The RCL parser is built from two ANTLR4 grammar files:
Lexer Grammar (RclLexer.g4)
Defines tokenization rules for:
- Keywords - agent, flow, messages, match, etc.
- Literals - strings, numbers, booleans, atoms
- Identifiers - names and variables
- Operators - ->, :, ..., etc.
- Indentation - Python-like INDENT/DEDENT tokens
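The INDENT/DEDENT tokens in the list above are typically produced by tracking indentation widths on a stack. The following is a minimal sketch of that technique, not the package's actual lexer code: the `indentTokens` function and the `IndentToken` type are hypothetical names, each non-blank line is reduced to a single `LINE` token, and tabs are rejected outright rather than reported with position information.

```typescript
// Hypothetical sketch of stack-based indentation tracking.
// This is NOT the real RclLexerBase implementation.
type IndentToken = "INDENT" | "DEDENT" | "LINE";

function indentTokens(source: string): IndentToken[] {
  const tokens: IndentToken[] = [];
  const stack: number[] = [0]; // indentation widths of the currently open blocks

  for (const line of source.split("\n")) {
    if (line.trim() === "") continue; // blank lines do not affect nesting

    const width = line.search(/\S/); // column of the first non-whitespace character
    if (line.slice(0, width).includes("\t")) {
      throw new Error("tabs are not supported in this sketch");
    }

    if (width > stack[stack.length - 1]) {
      // Deeper than the current block: open a new one.
      stack.push(width);
      tokens.push("INDENT");
    } else {
      // Shallower: close blocks until the widths match again.
      while (width < stack[stack.length - 1]) {
        stack.pop();
        tokens.push("DEDENT");
      }
      if (width !== stack[stack.length - 1]) {
        throw new Error(`inconsistent dedent to column ${width}`);
      }
    }
    tokens.push("LINE");
  }

  // Close every block that is still open at end of input.
  while (stack.length > 1) {
    stack.pop();
    tokens.push("DEDENT");
  }
  return tokens;
}
```

A dedent that does not return to a previously seen width is treated as an error here, which is one plausible way to surface the indentation errors a lexer of this kind has to detect.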
Parser Grammar (RclParser.g4)
Defines syntax rules for:
- Document structure - imports, sections, attributes
- Agent definitions - configuration and flows
- Flow definitions - states and transitions
- Message definitions - text, rich cards, carousels
- Value expressions - literals, variables, collections
Python-like Indentation
The parser handles significant whitespace through a custom lexer base class:
    // RclLexerBase.ts provides indentation handling
    class RclLexerBase extends Lexer {
      // Generates INDENT/DEDENT tokens based on indentation levels
      // Maintains an indentation stack for proper nesting
      // Handles mixed tabs/spaces with error reporting
    }

Example indentation handling:
    agent MyAgent              # Level 0
      displayName: "..."       # Level 1 - INDENT

      flow Main                # Level 1
        start: Welcome         # Level 2 - INDENT

        on Welcome             # Level 2
          -> End               # Level 3 - INDENT
                               # Level 0 - DEDENT DEDENT DEDENT

Parser API
AntlrRclParser
Main parser class implementing the IParser interface:
    class AntlrRclParser implements IParser {
      // Initialize parser (required for ANTLR setup)
      async initialize(): Promise<void>;

      // Parse source code into AST
      async parse(source: string, fileName?: string): Promise<Result<ParseResult>>;

      // Check if parser is initialized
      isInitialized(): boolean;
    }

    interface ParseResult {
      ast: RclFile;          // Root AST node
      symbols: SymbolTable;  // Extracted symbols
      parseTree?: any;       // Raw ANTLR parse tree (debug)
    }

Symbol Extraction
The parser extracts symbols during parsing for language service features:
    interface SymbolTable {
      agents: AgentSymbol[];
      flows: FlowSymbol[];
      messages: MessageSymbol[];
      states: StateSymbol[];
      variables: VariableSymbol[];
    }

    interface AgentSymbol {
      name: string;
      displayName?: string;
      range: Range;
      flows: string[];
    }

Error Handling
The parser provides detailed error information:
    // Parse errors include location information
    const result = await parser.parse(invalidSource);

    if (!result.success) {
      const error = result.error;
      console.log(`Error at line ${error.line}, column ${error.column}`);
      console.log(`Message: ${error.message}`);
      console.log(`Context: ${error.context}`);
    }

Common error types:
- Syntax errors - Invalid tokens or grammar violations
- Indentation errors - Inconsistent or invalid indentation
- Context errors - Invalid section nesting or structure
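For editor or CLI output, the location fields shown above can be folded into a single diagnostic line. The sketch below is illustrative only: `ParseErrorInfo` and `formatParseError` are hypothetical names, and the interface assumes nothing beyond the `line`, `column`, `message`, and `context` fields logged in the example (with `context` treated as optional).

```typescript
// Assumed error shape, based only on the fields logged in the example above.
// The package's real error type may carry more information.
interface ParseErrorInfo {
  line: number;
  column: number;
  message: string;
  context?: string;
}

// Builds a "file:line:column: message" diagnostic, followed by the
// context on an indented second line when one is available.
function formatParseError(fileName: string, error: ParseErrorInfo): string {
  const header = `${fileName}:${error.line}:${error.column}: ${error.message}`;
  return error.context ? `${header}\n    ${error.context}` : header;
}
```

The `file:line:column` prefix is a common convention that many editors and terminals recognize as a clickable location.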
Integration with Compilation Pipeline
The parser is passed to the compiler as a constructor option:
    import { RCLCompiler } from '@rcs-lang/compiler';
    import { AntlrRclParser } from '@rcs-lang/parser';

    const compiler = new RCLCompiler({
      parser: new AntlrRclParser()
    });

    const result = await compiler.compile(source);

Build Requirements
Building the parser requires Java, which is used to generate TypeScript files from the ANTLR grammar:
Prerequisites
- Java 17 or later must be installed
- Run ./install-java.sh for installation instructions
Build Process
    # Install dependencies
    bun install

    # Generate parser and build TypeScript
    bun run build

This process:
- Generates TypeScript parser files from ANTLR grammar
- Fixes import paths in generated files
- Compiles TypeScript to JavaScript
Generated Files
The src/generated/ directory contains ANTLR-generated files:
- RclLexer.ts - Tokenizer implementation
- RclParser.ts - Parser implementation
- RclParserListener.ts - Parse tree listener interface
- RclParserVisitor.ts - Parse tree visitor interface
⚠️ Do not edit generated files manually - they will be overwritten on rebuild.
Development
Testing Grammar Changes

    # Test the parser
    bun test

    # Trace token generation
    node trace-tokens.js input.rcl

    # Trace grammar rules
    node trace-comment.js input.rcl

Grammar Debugging
Use ANTLR tools for grammar development:
    # Visualize parse tree (requires ANTLRWorks)
    antlr4-parse RclParser rclFile -gui input.rcl

    # Generate parse tree text
    antlr4-parse RclParser rclFile -tree input.rcl

Performance Considerations
- Initialization cost - Call initialize() once per parser instance
- Memory usage - Parser instances hold ANTLR runtime state
- Parse speed - ~1000 lines/second for typical RCL files
- Error recovery - Performance degrades with many syntax errors
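One way to pay the initialization cost only once is to share a single lazily initialized parser across callers. The helper below is a sketch under assumptions: `createParserProvider` and `InitializableParser` are hypothetical names that are not part of the package, and the only thing relied on is an `initialize(): Promise<void>` method like the one documented above.

```typescript
// Minimal structural type covering only the method this helper needs.
interface InitializableParser {
  initialize(): Promise<void>;
}

// Wraps a factory so the parser is created and initialized at most once,
// even when several callers request it before initialization has finished.
function createParserProvider<P extends InitializableParser>(
  create: () => P,
): () => Promise<P> {
  let ready: Promise<P> | undefined;
  return () => {
    if (!ready) {
      const parser = create();
      // Cache the promise itself so concurrent callers share one initialize() call.
      ready = parser.initialize().then(() => parser);
    }
    return ready;
  };
}
```

With the real package this might be used as `const getParser = createParserProvider(() => new AntlrRclParser());` followed by `const parser = await getParser();` at each call site. Note that a failed initialization stays cached in this sketch; a production version would likely reset the cached promise on rejection.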