path: root/lexer.c
Age | Commit message | Author
2021-07-09 | lexer: rename UT_ prefixed constants to UC_ | Jo-Philipp Wich
This is a cosmetic change to bring the code in line with the common prefix format of the other code in the tree.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-06-29 | lexer: transition into EOF state on unrecognized character | Jo-Philipp Wich
The compiler will keep fetching tokens until hitting EOF, so ensure that the lexer produces EOF after an unrecognized character error.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
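A minimal sketch of the behaviour described above, assuming a toy lexer: the type names, `lex_next()` and the two-state machine are hypothetical and far simpler than ucode's actual lexer.c. After reporting an unrecognized character once, the lexer parks itself in an EOF state, so a caller that loops until EOF cannot spin forever.

```c
#include <stddef.h>

/* Hypothetical token and state types; ucode's real definitions differ. */
typedef enum { TOK_LABEL, TOK_ERROR, TOK_EOF } tok_type_t;
typedef enum { ST_CODE, ST_EOF } lex_state_t;

typedef struct {
	const char *src;    /* NUL-terminated source text */
	size_t pos;         /* current read offset */
	lex_state_t state;
} lexer_t;

/* Return the next token type; once in ST_EOF, only TOK_EOF is produced. */
tok_type_t lex_next(lexer_t *lex)
{
	if (lex->state == ST_EOF)
		return TOK_EOF;

	/* skip blanks between tokens */
	while (lex->src[lex->pos] == ' ')
		lex->pos++;

	char c = lex->src[lex->pos];

	if (c == '\0') {
		lex->state = ST_EOF;
		return TOK_EOF;
	}

	if (c >= 'a' && c <= 'z') {
		while (lex->src[lex->pos] >= 'a' && lex->src[lex->pos] <= 'z')
			lex->pos++;

		return TOK_LABEL;
	}

	/* Unrecognized character: report the error once, then transition
	 * into the EOF state so a fetch-until-EOF loop terminates. */
	lex->state = ST_EOF;
	return TOK_ERROR;
}
```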
2021-05-25 | lexer: implement raw code mode | Jo-Philipp Wich
Enabling raw code mode allows writing ucode scripts without any template tag decorations (that is, without the need to provide an initial opening '{%' tag).
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25 | lexer: drop value union from keyword table | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25 | lexer, compiler: separate TK_BOOL token into TK_TRUE and TK_FALSE tokens | Jo-Philipp Wich
The token type split allows us to drop the token value union in the reserved word list with a subsequent commit.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
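To illustrate why the split removes the need for a value field, here is a hypothetical reserved-word table in which each entry maps a spelling directly to a token type. The enum, the table and `classify_label()` are illustrative only, not ucode's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token types: with distinct TRUE/FALSE tokens, the table
 * no longer needs a value member to tell `true` from `false`. */
typedef enum { TK_LABEL, TK_TRUE, TK_FALSE, TK_NULL, TK_IF } tk_type_t;

static const struct {
	const char *word;
	tk_type_t type;
} reserved_words[] = {
	{ "true",  TK_TRUE  },
	{ "false", TK_FALSE },
	{ "null",  TK_NULL  },
	{ "if",    TK_IF    },
};

/* Map a scanned label to its keyword token, or TK_LABEL if not reserved. */
tk_type_t classify_label(const char *label, size_t len)
{
	for (size_t i = 0; i < sizeof(reserved_words) / sizeof(reserved_words[0]); i++) {
		if (strlen(reserved_words[i].word) == len &&
		    memcmp(reserved_words[i].word, label, len) == 0)
			return reserved_words[i].type;
	}

	return TK_LABEL;
}
```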
2021-05-25 | syntax: drop Infinity and NaN keywords | Jo-Philipp Wich
Turn the Infinity and NaN keywords into global properties.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18 | syntax: introduce `const` support | Jo-Philipp Wich
Introduce support for declaring constant variables through the `const` keyword. Variables declared with `const` follow the same scoping rules as `let` declared ones. In contrast to normal variables, `const` ones may not be assigned to after their declaration. Any attempt to do so will result in a syntax error during compilation.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
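One way a compiler might enforce this rule, sketched under assumptions: `local_var_t` and `check_assignment()` are hypothetical, and lookup is a plain linear scan from the most recent declaration backwards, so an inner `const` shadows an outer `let` of the same name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical compile-time record of a declared local variable. */
typedef struct {
	const char *name;
	bool is_const;   /* declared with `const` rather than `let` */
} local_var_t;

/* Return 0 if an assignment to `name` may be compiled, -1 if it must be
 * rejected as a syntax error because the variable is const. */
int check_assignment(const local_var_t *locals, size_t n, const char *name)
{
	/* search the innermost (most recently declared) variable first */
	for (size_t i = n; i > 0; i--) {
		if (strcmp(locals[i - 1].name, name) == 0)
			return locals[i - 1].is_const ? -1 : 0;
	}

	return 0;   /* not a local; treated as a plain global assignment here */
}
```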
2021-05-18 | compiler, lexer: add NO_LEGACY define to disable legacy syntax features | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18 | syntax: implement `delete` as proper operator | Jo-Philipp Wich
Turn `delete` into a proper operator mimicking ECMAScript semantics. Also transparently turn deprecated `delete(obj, propname)` function calls into `delete obj.propname` expressions during compilation. When strict mode is active, legacy delete() calls throw a syntax error instead. Finally, drop the `delete()` function from the stdlib, as it is now shadowed by the delete operator syntax.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-14 | lexer: skip interpreter line in any source buffer | Jo-Philipp Wich
Skip interpreter lines in any source buffer and handle the skipping in the lexer itself, to avoid reporting wrongly shifted token offsets to the compiler, which would result in wrong error locations and source contexts.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
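A minimal sketch of interpreter-line skipping, assuming the lexer works on a flat byte buffer: `skip_interpreter_line()` is a hypothetical helper returning the offset at which scanning should start. Because that offset is still relative to the start of the original buffer, token positions reported to the compiler are not shifted.

```c
#include <stddef.h>

/* If buf starts with "#!", return the offset of the first byte after the
 * interpreter line (just past its '\n'); otherwise return 0.  Offsets
 * stay relative to the original buffer, so later token positions remain
 * correct. */
size_t skip_interpreter_line(const char *buf, size_t len)
{
	size_t off = 0;

	if (len < 2 || buf[0] != '#' || buf[1] != '!')
		return 0;

	while (off < len && buf[off] != '\n')
		off++;

	/* step over the newline if present; a lone "#!..." consumes everything */
	return off < len ? off + 1 : off;
}
```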
2021-04-29 | lexer: fix infinite loop on parsing unterminated comments | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29 | lexer: fix infinite loop on parsing unterminated expression blocks | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29 | lexer: fix infinite loop when parsing regexp literal at EOF | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29 | compiler, lexer: improve lexical state handling | Jo-Philipp Wich
- Instead of disambiguating the division operator vs. a regexp literal by looking at the preceding token, raise a "no regexp" flag within the appropriate parser states to tell the lexer how to treat a forward slash when parsing the next token
- Introduce another "no keyword" flag which disables parsing labels into keywords when reading the next token, and set it in the appropriate parser states. This allows using reserved names in object declarations and property access expressions
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
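The two parser-controlled flags can be sketched as follows. The struct and function names are hypothetical; the point is that the lexer no longer inspects the previous token but simply obeys whatever the parser requested for the next one.

```c
#include <stdbool.h>

/* Hypothetical token kinds for a forward slash. */
typedef enum { TK_DIV, TK_REGEXP_START } slash_tok_t;

/* Flags raised by the parser before it requests the next token. */
typedef struct {
	bool no_regexp;    /* an operand just ended: '/' means division */
	bool no_keyword;   /* e.g. after '.' or as an object key: keep labels */
} lex_flags_t;

/* Decide how to lex '/' purely from the parser-supplied flag. */
slash_tok_t classify_slash(const lex_flags_t *flags)
{
	return flags->no_regexp ? TK_DIV : TK_REGEXP_START;
}

/* A reserved word only becomes a keyword token when not suppressed. */
bool treat_as_keyword(const lex_flags_t *flags, bool is_reserved)
{
	return is_reserved && !flags->no_keyword;
}
```

With `no_keyword` raised after a `.`, an expression such as `obj.if` can lex `if` as a plain label.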
2021-04-27 | treewide: ISO C / pedantic compliance | Jo-Philipp Wich
- Shuffle typedefs to avoid need for non-compliant forward declarations
- Fix non-compliant empty struct initializers
- Remove use of braced expressions
- Remove use of anonymous unions
- Avoid `void *` pointer arithmetic
- Fix several warnings reported by gcc -pedantic mode and clang 11 compilation
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-25 | treewide: rework internal data type system | Jo-Philipp Wich
Instead of relying on json_object values internally, use custom types to represent the different ucode value types, which brings a number of advantages compared to the previous approach:
- Due to the use of tagged pointers, small integer, string and bool values can be stored directly in the pointer addresses, vastly reducing required heap memory
- Ability to create circular data structures such as `let o; o = { test: o };`
- Ability to register custom `tostring()` function through prototypes
- Initial mark/sweep GC implementation to tear down circular object graphs on VM deinit
The change also paves the way for possible future extensions such as constant variables and meta methods for custom resource types.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
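A minimal sketch of the tagged-pointer idea for small integers, assuming heap objects are at least 2-byte aligned so the lowest address bit is free to act as a tag. The single-bit scheme and all names are illustrative; ucode's actual encoding is not reproduced here. Shifting out one bit limits immediates to 63 bits on a 64-bit target, and decoding a negative value relies on the arithmetic right shift that mainstream compilers implement.

```c
#include <stdbool.h>
#include <stdint.h>

/* A value is either a pointer to a heap object (low bit 0) or an
 * immediate integer shifted left by one with the low bit set. */
typedef uintptr_t value_t;

#define TAG_INT ((uintptr_t)1)

/* Store n directly in the "pointer"; no heap allocation needed. */
value_t int_to_value(intptr_t n)
{
	return ((uintptr_t)n << 1) | TAG_INT;
}

bool value_is_int(value_t v)
{
	return (v & TAG_INT) != 0;
}

/* Recover the integer; assumes arithmetic right shift for negatives. */
intptr_t value_to_int(value_t v)
{
	return (intptr_t)v >> 1;
}
```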
2021-04-24 | treewide: fix issues reported by clang code analyzer | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-23 | lexer: fix incomplete struct initializers | Petr Štetiar
Fixes a bunch of warnings like the following:
lexer.c:68:37: warning: missing field 'parse' initializer [-Wmissing-field-initializers]
lexer.c:138:34: warning: missing field '' initializer [-Wmissing-field-initializers]
Signed-off-by: Petr Štetiar <ynezz@true.cz>
2021-03-11 | lexer: fix infinite loop in lineinfo encoding when consuming large chunks | Jo-Philipp Wich
A logic flaw in the lineinfo encoding function led to an infinite tight loop when a buffer chunk of 128 bytes or more got consumed, which may happen when parsing very long literals.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
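The lineinfo format itself is not described here, so the following is only a sketch under an assumed run-length scheme in which each output byte records at most 127 consumed source bytes. It shows the property the fix restores: a count of 128 or more is split across several bytes, and every iteration shrinks the remaining count, so the loop always terminates. `encode_run()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `count` consumed bytes into out[] as chunks of at most 127.
 * Returns the number of output bytes written (bounded by outlen). */
size_t encode_run(size_t count, uint8_t *out, size_t outlen)
{
	size_t n = 0;

	while (count > 0 && n < outlen) {
		uint8_t chunk = count > 127 ? 127 : (uint8_t)count;

		out[n++] = chunk;
		count -= chunk;   /* chunk >= 1, so the loop always progresses */
	}

	return n;
}
```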
2021-03-11 | lexer: properly handle string escape sequences at buffer boundary | Jo-Philipp Wich
While parsing string literals, actually consume the backslash introducing an escape sequence to prevent it from ending up in the produced string if the scanner is at the end of the buffer and the remaining buffer contents are flushed after the consumer loop.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
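A sketch of the idea, assuming a toy scanner fed in chunks with a fixed 64-byte output buffer; `str_scanner_t` and `scan_chunk()` are hypothetical. The backslash is consumed immediately and remembered as pending state, so when a chunk ends right after it, nothing stray is left behind to be flushed into the result.

```c
#include <stddef.h>

/* Hypothetical chunked string scanner state. */
typedef struct {
	int in_escape;   /* saw '\\', waiting for the escape letter */
	char out[64];    /* decoded string (NUL-terminated) */
	size_t len;
} str_scanner_t;

/* Feed one chunk; an escape may straddle two consecutive chunks. */
void scan_chunk(str_scanner_t *s, const char *chunk, size_t n)
{
	for (size_t i = 0; i < n && s->len < sizeof(s->out) - 1; i++) {
		char c = chunk[i];

		if (s->in_escape) {
			/* only \n and \t are decoded in this sketch */
			s->out[s->len++] = (c == 'n') ? '\n' : (c == 't') ? '\t' : c;
			s->in_escape = 0;
		}
		else if (c == '\\') {
			s->in_escape = 1;   /* consume the backslash right away */
		}
		else {
			s->out[s->len++] = c;
		}
	}

	s->out[s->len] = '\0';
}
```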
2021-02-26 | lexer: improvements | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17 | treewide: rewrite ucode interpreter | Jo-Philipp Wich
Replace the former AST walking interpreter implementation with a single pass bytecode compiler and a corresponding virtual machine. The rewrite lays the groundwork for a couple of improvements which will be subsequently implemented:
- Ability to precompile ucode sources into binary byte code
- Strippable debug information
- Reduced runtime memory usage
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-12-06 | treewide: prevent stale pointer access in opcode handlers | Jo-Philipp Wich
Instead of obtaining and caching direct opcode pointers, use relative references when dealing with opcodes, since direct or indirect calls to uc_execute_op() might lead to reallocations of the opcode array, shifting memory addresses and invalidating pointers taken before the invocation. Such stale pointer accesses could commonly be triggered when one part of the processed expression was a require() or include() call loading relatively large ucode sources.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
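The hazard and the fix can be sketched with a hypothetical growable opcode array: `oparray_push()` may move the storage through realloc(), so a caller keeps an index rather than a pointer and re-derives the address after any call that can append. Names are illustrative, not ucode's API; allocation failure simply aborts, matching the policy adopted in the 2020-10-02 commit.

```c
#include <stdlib.h>

/* Hypothetical opcode array that may grow while opcodes are processed. */
typedef struct { int op; } opcode_t;
typedef struct { opcode_t *ops; size_t len, cap; } oparray_t;

/* Append an opcode and return its index.  This may realloc() the array,
 * invalidating any opcode_t pointer taken earlier, but never an index. */
size_t oparray_push(oparray_t *a, int op)
{
	if (a->len == a->cap) {
		size_t ncap = a->cap ? a->cap * 2 : 4;
		opcode_t *p = realloc(a->ops, ncap * sizeof(*p));

		if (!p)
			abort();   /* out of memory: terminate */

		a->ops = p;
		a->cap = ncap;
	}

	a->ops[a->len].op = op;
	return a->len++;
}
```

A caller holds `size_t first = oparray_push(&a, 1);` across further pushes and reads `a.ops[first]` afterwards, instead of caching `&a.ops[first]`.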
2020-11-30 | syntax: fix quirks when parsing octal sequences | Jo-Philipp Wich
- Eliminate dead code left after regex literal parsing changes
- Properly handle short octal sequences at end of string
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
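A sketch of bounded octal parsing: `parse_octal_escape()` is hypothetical and reads at most three digits while never looking past the end pointer, so a short sequence such as a trailing `\7` is handled without overrunning the string. Truncating values above 0377 to 8 bits is an assumption of this sketch, not documented ucode behaviour.

```c
#include <stddef.h>

/* Parse the digits following a backslash as an octal escape of at most
 * three digits, never reading at or beyond `end`.  Stores the number of
 * digits consumed in *ndigits (0 if the first byte is not octal). */
unsigned char parse_octal_escape(const char *p, const char *end, size_t *ndigits)
{
	unsigned int val = 0;
	size_t n = 0;

	while (n < 3 && p + n < end && p[n] >= '0' && p[n] <= '7') {
		val = val * 8 + (unsigned int)(p[n] - '0');
		n++;
	}

	*ndigits = n;
	return (unsigned char)val;   /* values above 0377 are truncated here */
}
```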
2020-11-30 | syntax: recognize single-char escapes in regex literals again | Jo-Philipp Wich
Ensure that the single char escapes `\a`, `\b`, `\e`, `\f`, `\n`, `\r`, `\t` and `\v` keep working. Since they're not part of the POSIX extended regular expression spec, they're not handled by the RE engine, so we need to substitute them with their actual byte values while parsing the literal.
Fixes: ac5cb87 ("syntax: fix string and regex literal parsing quirks")
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
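The substitution step might look like this sketch; `regex_char_escape()` is a hypothetical helper. `\e` has no standard C escape, so ESC is written as its byte value 27. Any other character yields -1, so the caller can pass that backslash sequence through unchanged to the regex engine.

```c
/* Return the byte value for the single-char escapes listed above, or -1
 * if `c` is not one of them. */
int regex_char_escape(char c)
{
	switch (c) {
	case 'a': return '\a';
	case 'b': return '\b';
	case 'e': return 27;     /* ESC; "\e" is not standard C */
	case 'f': return '\f';
	case 'n': return '\n';
	case 'r': return '\r';
	case 't': return '\t';
	case 'v': return '\v';
	default:  return -1;     /* leave for the regex engine */
	}
}
```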
2020-11-30 | syntax: fix string and regex literal parsing quirks | Jo-Philipp Wich
- Do not interpret escape sequences in regexp literals
- Do not improperly substitute control escape sequences such as `\n` or `\a` after a backslash
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-19 | treewide: rebrand to ucode | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-15 | lexer: improve scanner performance | Jo-Philipp Wich
Optimize the strncmp() based token lookup with an integer comparison approach, which roughly cuts the time of the source code parsing phase in half.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
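A sketch of the integer-comparison technique, assuming operators of up to three characters: the bytes are packed into a `uint32_t` so each candidate is checked with a single comparison instead of a strncmp() call. The macros, the enum and `match_eq_op()` are hypothetical; longer operators are tried first so `===` is not mistaken for `==`.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two or three bytes into one integer for single-compare matching. */
#define PACK2(a, b)     ((uint32_t)(uint8_t)(a) << 8 | (uint8_t)(b))
#define PACK3(a, b, c)  (PACK2(a, b) << 8 | (uint8_t)(c))

typedef enum { OP_NONE, OP_EQ, OP_EQS, OP_NE, OP_NES } op_t;

/* Match the longest of "===", "!==", "==", "!=" at the start of s. */
op_t match_eq_op(const char *s, size_t len)
{
	if (len >= 3) {
		uint32_t w3 = PACK3(s[0], s[1], s[2]);

		if (w3 == PACK3('=', '=', '=')) return OP_EQS;
		if (w3 == PACK3('!', '=', '=')) return OP_NES;
	}

	if (len >= 2) {
		uint32_t w2 = PACK2(s[0], s[1]);

		if (w2 == PACK2('=', '=')) return OP_EQ;
		if (w2 == PACK2('!', '=')) return OP_NE;
	}

	return OP_NONE;
}
```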
2020-11-10 | lexer: accept "let" as synonym for "local" | Jo-Philipp Wich
This brings the utpl script syntax closer to ES5/ES6 and allows using existing syntax highlighting in IDEs and editors.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-05 | syntax: implement ES6-like arrow function syntax | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-03 | syntax: implement ES6-like rest parameters for variadic functions | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-02 | syntax: support `elif` clauses for alternative `if` syntax | Jo-Philipp Wich
In the alternative `if` syntax mode, support a specific `elif` keyword instead of requiring an `else` branch followed by a separate `if` statement. The advantage is that templates do not require error-prone redundant `endif` keywords in else-if ladders.
After this change, the following example:
    {% if (...): %}
        One condition
    {% else if (...): %}
        Another condition
    {% else if (...): %}
        A third condition
    {% else %}
        Final condition
    {% endif; endif; endif %}
... can be simplified into:
    {% if (...): %}
        One condition
    {% elif (...): %}
        Another condition
    {% elif (...): %}
        A third condition
    {% else %}
        Final condition
    {% endif %}
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14 | treewide: unify error handling | Jo-Philipp Wich
Get rid of the distinction between lexer/parser errors and runtime exceptions; use exceptions everywhere instead.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14 | treewide: rework source file and callstack handling | Jo-Philipp Wich
- Keep an open FILE* reference to processed source files in order to be able to rewind and extract error context later
- Build a proper call stack when invoking utpl functions
- Report call stack in exceptions
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14 | lexer: rewrite | Jo-Philipp Wich
Rewrite the lexer into a restartable state machine to support parsing from file streams without the need to read the entire source text into memory first. As a side effect, the length of labels and strings is now unlimited.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-05 | lexer: properly handle reserved `if` word | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-02 | treewide: rework handling of memory allocation failures | Jo-Philipp Wich
Instead of propagating failures to the caller, print a generic error message and terminate program execution through abort().
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
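A minimal sketch of that policy; the name `xalloc()` and the error text are illustrative assumptions. Callers never need to check for NULL, because the function either returns zeroed memory or terminates the process.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate zeroed memory or abort with a generic message on failure. */
void *xalloc(size_t size)
{
	void *p = calloc(1, size);

	if (!p) {
		fprintf(stderr, "Out of memory\n");
		abort();
	}

	return p;
}
```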
2020-09-24 | syntax: add regular expression support | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-22 | syntax: introduce case statement support | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-21 | syntax: introduce try/catch blocks | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-11 | syntax: introduce !== and === operators | Jo-Philipp Wich
Also treat "in" as relational operator.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-10 | treewide: implement default lstrip_blocks and trim_blocks behaviour | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-07 | ast, eval, lexer: keep track of overflows when parsing numbers | Jo-Philipp Wich
This allows number literals that exceed the range INT64_MIN..INT64_MAX to be truncated to the respective min and max values in a defined manner. It also makes it possible to have the expression `{{ -9223372036854775808 }}` actually result in `-9223372036854775808`. Since negation and number declaration are separate operations, the value would first be truncated to `9223372036854775807` and then negated, making it impossible to write a literal INT64_MIN value without tracking the overflow.
Also fix the number parsing logic to not truncate integers to 32 bits.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
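A sketch of overflow tracking under stated assumptions: `parse_dec()` saturates at INT64_MAX and sets a flag when the literal was larger, and `negate_literal()` (also hypothetical) maps a flagged literal to INT64_MIN. Any overflowing literal therefore negates to INT64_MIN, which is consistent with the truncation to the respective min and max values described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parse decimal digits, saturating at INT64_MAX; *overflow records
 * whether the literal exceeded INT64_MAX. */
int64_t parse_dec(const char *s, bool *overflow)
{
	uint64_t v = 0;

	*overflow = false;

	for (; *s >= '0' && *s <= '9'; s++) {
		unsigned d = (unsigned)(*s - '0');

		if (v > (UINT64_MAX - d) / 10) {   /* v * 10 + d would wrap */
			v = UINT64_MAX;
			continue;
		}

		v = v * 10 + d;
	}

	if (v > (uint64_t)INT64_MAX) {
		*overflow = true;
		return INT64_MAX;
	}

	return (int64_t)v;
}

/* Unary minus on a literal: an overflowed literal becomes INT64_MIN. */
int64_t negate_literal(int64_t v, bool overflow)
{
	return overflow ? INT64_MIN : -v;
}
```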
2020-09-07 | lexer: fix encoding of unicode surrogate pairs | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
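The commit gives no details, so the following is only a generic sketch of correct surrogate-pair handling: a `\uD83D\uDE00`-style pair is combined into one code point, which is then emitted as four UTF-8 bytes. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 high/low surrogate pair into a code point.
 * Returns 0 if the two units do not form a valid pair. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
	if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
		return 0;

	return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Encode a supplementary-plane code point as four UTF-8 bytes.
 * Returns the number of bytes written (0 if out of range). */
size_t utf8_encode4(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x10000 || cp > 0x10FFFF)
		return 0;

	out[0] = (unsigned char)(0xF0 | (cp >> 18));
	out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char)(0x80 | (cp & 0x3F));
	return 4;
}
```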
2020-09-06 | treewide: refactor internal AST structures | Jo-Philipp Wich
- unify operand and value tag structures
- use a contiguous array for storing opcodes
- use relative offsets for next and children ops
- defer function creation to runtime
- rework "this" context handling by storing context pointer in scope tags
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-02 | treewide: rename double and null value constructor functions | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25 | treewide: add proper null value handling | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25 | treewide: introduce this keyword | Jo-Philipp Wich
Support a new keyword `this` which allows functions to access the context they're called upon.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25 | lexer.c, eval.c: move T_EXCEPTION definition to lexer header | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-21 | Initial commit | Jo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>