summaryrefslogtreecommitdiffhomepage
path: root/compiler.c
AgeCommit message (Collapse)Author
2022-05-20compiler: fix segmentation fault on compiling unexpected unary expressionsJo-Philipp Wich
When compiling expressions followed by a unary operator, the compiler triggered a segmentation fault due to invoking an unset infix parser routine. Explicitly handle this case and raise a syntax error if such an invalid expression is encountered. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-04-13syntax: implement support for ES6 template literalsJo-Philipp Wich
Implement support for ECMAScript 6 template literals which allow simple interpolation of variable values into strings without resorting to `sprintf()` or manual string concatenation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-03-07syntax: support add new operatorsJo-Philipp Wich
- Support ES2016 exponentiation (**) and exponentiation assignment (**=) - Support ES2020 nullish coalescing (??) and logical nullish assignment (??=) - Support ES2021 logical and assignment (&&=) and logical or assignment (||=) Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-11compiler: fix patchlist corruption on switch statement syntax errorsJo-Philipp Wich
When compiling a switch statement with duplicate `default` cases or a switch statement with syntax errors before the body block, two error handling cases were hit in the code that prematurely returned from the function without resetting the compiler's patchlist pointer away from the on-stack patchlist that had been set up for the switch statement. Upon processing a subsequent break or continue control statement, a realloc was performed on the then invalid patchlist contents, triggering a segmentation fault or libc assert. Solve this issue by not returning from the function but breaking the switch body parsing loop. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-08compiler: fix incorrect loop break targetsJo-Philipp Wich
When patching jump targets for break statments while compiling for-loop statments, we need jump beyond the instructions popping intermediate loop variables off the stack but before the pop instructions removing local loop body variables to prevent a stack position mismatch between compiler and vm. Before that change, local loop body variables remained on the stack, breaking the expected stack layout. Fixes: b3d758b compiler: ("fix for/break miscompilation") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-07treewide: rework function memory modelJo-Philipp Wich
- Instead of treating individual program functions as managed ucode types, demote uc_function_t values to pointers into a uc_program_t entity - Promote uc_program_t to a managed type - Let uc_closure_t claim references to the owning program of the enclosed uc_function_t - Redefine public APIs uc_compile() and uc_vm_execute() APIs to return and expect an uc_program_t object respectively - Remove vallist indirection for function loading and let the compiler emit the function id directly when producing function construction code Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-29program: rename bytecode load/write functions, track path of executed fileJo-Philipp Wich
Extend source objects with a `runpath` field which contains the original path of the source being executed by the VM. When instantiating source objects from file paths, the `runpath` will be set to the `filename`. When instantiating source buffers using `uc_source_new_buffer()`, the runpath is initially unset. A new function `uc_source_runpath_set()` can be used to adjust the runtime path being associated with a source object. Extend bytecode loading logic to set the source buffer runtime path to the precompiled bytecode file path being loaded and executed. This is required for `sourcepath()` and relative paths in `include()` to function correctly when executing precompiled programs. Finally rename `uc_program_from_file()` and `uc_program_to_file()` to `uc_program_load()` and `uc_program_write()` respectively since the load part now operates on an `uc_source_t` input buffer instead of a plain `FILE *` handle. Adjust users of these API functions accordingly. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18syntax: drop legacy syntax supportJo-Philipp Wich
Drop support for the `local` keyword and `delete` function calls. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18build: support building without compile capabilitiesJo-Philipp Wich
Introduce a new default enable CMake option "COMPILE_SUPPORT" which allows to disable source code compilation in the ucode interpreter. Such an interpreter will only be able to load precompiled ucode files. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18program: implement support for precompiling source filesJo-Philipp Wich
- Introduce new command line flags `-o` and `-O` to write compiled program code into the specified output file - Add support for transparently executing precompiled files, the lexical analyzing and com,pilation phase is skipped in this case Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18source: refactor source file handlingJo-Philipp Wich
- Move source object pointer into program entity which is referenced by each function - Move lineinfo related routines into source.c and use them from lexer.c since lineinfo encoding does not belong into the lexical analyzer. - Implement initial infrastructure for detecting source file type, this is required later to differentiate between plaintext and precompiled bytecode files Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18compiler, vm: use a program wide constant listJo-Philipp Wich
Instead of storing constant values per function, maintain a global program wide list for all constant values within the current compilation unit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18types: add initial infrastructure for function serializationJo-Philipp Wich
- Introduce a new "program" entity which holds the list of functions created during compilation - Instead of storing pointers to the in-memory function representation in the constant list, store the index of the function within the program's function list - When loading functions from the constant list, retrieve the function by index from the program entity Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-04treewide: rework numeric value handlingJo-Philipp Wich
- Parse integer literals as unsigned numeric values in order to be able to represent the entire unsigned 64bit value range - Stop parsing minus-prefixed integer literals as negative numbers but treat them as separate minus operator followed by a positive integer instead - Only store unsigned numeric constants in bytecode - Rework numeric comparison logic to be able to handle full 64bit unsigned integers - If possible, yield unsigned 64 bit results for additions - Simplify numeric value conversion API - Compile code with -fwrapv for defined signed overflow semantics Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-12-01syntax: disallow keywords in object property shorthand notationJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-10-11syntax: introduce optional chaining operatorsJo-Philipp Wich
Introduce new operators `?.`, `?.[…]` and `?.(…)` to simplify looking up deeply nested property chain in a secure manner. The `?.` operator behaves like the `.` property access operator but yields `null` if the left hand side is `null` or not an object. Like `?.`, the `?.[…]` operator behaves like the `[…]` computed property access but yields `null` if the left hand side is `null` or neither an object or array. Finally the `?.(…)` operator behaves like the function call operator `(…)` but yields `null` if the left hand side is `null` or not a callable function. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-09-19compiler: properly handle jumps to offset 0Jo-Philipp Wich
When compiling certain expressions as first statement of an ucode program, e.g. a while loop in raw mode, a jump instruction to offset zero is emitted which was incorrectly treated as placeholder by the compiler. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: harmonize function namingJo-Philipp Wich
- Ensure that most functions follow the subject_verb naming schema - Move type related function from value.c to types.c - Rename value.c to vallist.c Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: move header files into dedicated directoryJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: consolidate typedef namingJo-Philipp Wich
Ensure that all custom typedef and vector declaration type names end with a "_t" suffix. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: eliminate dead code and unused functionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-05compiler: don't segfault on invalid declaration expressionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-06-08compiler: improve mapping of binary operator tokens to instructionsJo-Philipp Wich
Instead of relying on a switch/case mapping of token values to corresponding VM instructions, infer the instruction number arithmetically. This shrinks the compiled size on x86/64 by about 250 bytes. Also emit I_LE and I_GE instructions for `<=` and `>=` comparisons instead of transforming these into I_GT and I_LT negations. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25lexer, compiler: separate TK_BOOL token into TK_TRUE and TK_FALSE tokensJo-Philipp Wich
The token type split allows us to drop the token value union in the reserved word list with a subsequent commit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18syntax: introduce `const` supportJo-Philipp Wich
Introduce support for declaring constant variables through the `const` keyword. Variables declared with `const` follow the same scoping rules as `let` declared ones. In contrast to normal variables, `const` ones may not be assigned to after their declaration. Any attempt to do so will result in a syntax error during compilation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18compiler, lexer: add NO_LEGACY define to disable legacy syntax featuresJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18syntax: implement `delete` as proper operatorJo-Philipp Wich
Turn `delete` into a proper operator mimicking ECMAScript semantics. Also ensure to transparently turn deprecated `delete(obj, propname)` function calls into `delete obj.propname` expressions during compilation. When strict mode is active, legacy delete() calls throw a syntax error instead. Finally drop the `delete()` function from the stdlib as it is shadowed by the delete operator syntax now. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-11compiler: fix local for-loop initializer variable declarationsJo-Philipp Wich
In a loop statement like `for (let x = 1, y = 2; ...)` the initialization statement was incorrectly interpreted as `let x = 1; y = 2` instead of the correct `let ..., y = 2`, triggering reference error exceptions in strict mode. Solve the issue by continue parsing the rest of the comma expression seqence as declaration list expression when the initializer is compiled in local mode. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-11compiler: properly parse slashes in parenthesized division expressionsJo-Philipp Wich
Due to the special code path parsing the leading label portion of a parenthesized expression, slashes following a label were improperly treated as regular expression literal delimitters, emitting a syntax error when an otherwise valid expression such as `a / 1` was being parsed as first sub expression of a parenthesized expression. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-05compiler: properly handle break/continue in nested scopesJo-Philipp Wich
When emitting byte code for break or continue statements, ensure that local variables in all containing scopes up to the loop body scope are popped, not just those in the same scope the statement is located in. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04compiler: properly handle keyword in parenthesized property access expressionJo-Philipp Wich
Due to the special code path parsing the leading label portion of a parenthesized expression, keywords following a property access operator (TK_DOT, `.`) weren't properly handled, emitting a syntax error when an otherwise valid expression such as `value.default` was being parsed as first sub expression of a parenthesized expression. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04compiler: fix stack mismatch on compiling `use strict` statementsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04syntax: implement support for 'use strict' pragmaJo-Philipp Wich
Support per-file and per-function `"use strict";` statement to opt into strict variable handling from ucode source code. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29compiler: fix segfault on parsing invalid pre/post increment expressionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29compiler, lexer: improve lexical state handlingJo-Philipp Wich
- Instead of disambiguating division operator vs. regexp literal by looking at the preceeding token, raise a "no regexp" flag within the appropriate parser states to tell the lexer how to treat a forward slash when parsing the next token - Introduce another "no keyword" flag which disables parsing labels into keywords when reading the next token and set it in the appropriate parser states. This allows using reserved names in object declarations and property access expressions Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-27treewide: ISO C / pedantic complianceJo-Philipp Wich
- Shuffle typedefs to avoid need for non-compliant forward declarations - Fix non-compliant empty struct initializers - Remove use of braced expressions - Remove use of anonymous unions - Avoid `void *` pointer arithmetic - Fix several warnings reported by gcc -pedantic mode and clang 11 compilation Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-25treewide: rework internal data type systemJo-Philipp Wich
Instead of relying on json_object values internally, use custom types to represent the different ucode value types which brings a number of advantages compared to the previous approach: - Due to the use of tagged pointers, small integer, string and bool values can be stored directly in the pointer addresses, vastly reducing required heap memory - Ability to create circular data structures such as `let o; o = { test: o };` - Ability to register custom `tostring()` function through prototypes - Initial mark/sweep GC implementation to tear down circular object graphs on VM deinit The change also paves the way for possible future extensions such as constant variables and meta methods for custom ressource types. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-24treewide: fix issues reported by clang code analyzerJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-31compiler: actually expand block scope fix to for/while alt syntaxJo-Philipp Wich
Fixes: 97bf297 ("compiler: ensure that alternative if/for/while syntax has own block scope") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-31compiler: ensure that alternative if/for/while syntax has own block scopeJo-Philipp Wich
The `if ...: endif`, `for ...: endfor`, `while ...: endwhile` etc. syntax statements are supposed to have their own lexical scope, like curly brace blocks in normal statements. Without this, local variable declarations within such blocks would incorrectly shift stack offsets for the remainder of the program. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-29compiler: rework switch statement code generationJo-Philipp Wich
- Initialize stack slots belonging to skipped local variable declarations - Group switch case value tests after switch statement body Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-24compiler: fix for/break miscompilationJo-Philipp Wich
When patching jump targets for break statments while compiling for-loop statments, jump beyond the instructions popping intermediate loop variables off the stack to fix a stack position mismatch between compiler and vm. Before that change, local loop body variables got popped twice, breaking the expected stack layout. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-23compiler: fix another try/catch miscompilationJo-Philipp Wich
When skipping over the catch block of a try/catch statement, make sure to emit the jump after the try scope variables have been popped off the stack in order to prevent a stack position mismatch between compiler and vm. Fixes: 9ad9afb ("compiler: fix try/catch miscompilation") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-12compiler: fix parsing of arrow functions with single expression bodyJo-Philipp Wich
Ensure that an arrow function body expression is parsed with P_ASSIGN precedence to not greedily consume comma expressions. This ensures that an expression like () => 1, 2 is parsed as function [() => 1], integer [2] and not as function [() => 1, 2]. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-11compiler: fix switch case->default fallthroughJo-Philipp Wich
Simplify handling of default case in switch statements. Instead of jumping over the default block, simply record the start address of the block since the initial switch jump is patched into the first non-default case already. This also leads to slightly smaller bytecode. Previously, when a case branch fell through into a default block, it did hit the default skip jump which jumped back into the first case which then fell through into the default skip jump, leading to an endless loop. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-26compiler: fix try/catch miscompilationJo-Philipp Wich
When skipping catch blocks with exception variables, jump beyond the instruction popping the exception variable off the stack to fix a stack position mismatch between compiler and vm. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-26lexer: improvementsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17syntax: support ES2015 computed property namesJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17syntax: support ES2015 shorthand property namesJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17treewide: rewrite ucode interpreterJo-Philipp Wich
Replace the former AST walking interpreter implementation with a single pass bytecode compiler and a corresponding virtual machine. The rewrite lays the groundwork for a couple of improvements with will be subsequently implemented: - Ability to precompile ucode sources into binary byte code - Strippable debug information - Reduced runtime memory usage Signed-off-by: Jo-Philipp Wich <jo@mein.io>