summaryrefslogtreecommitdiffhomepage
path: root/compiler.c
AgeCommit message (Collapse)Author
2022-08-24lib: introduce helper function for indenting error messagesJo-Philipp Wich
Factor out the nested syntax error message indentation logic into a separate helper procedure for reuse in other places. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-06compiler: add import statement support for dynamic extensionsJo-Philipp Wich
Utilize the new I_DYNLINK vm opcode to support import statements referring to dynamic extension modules. During compilation, the compiler will try to infer the type of the imported module from the resolved file path; if it ends with `.so`, the module is assumed to by a dynamic extension and loading/binding of the module is deferred to runtime using I_DYNLINK opcodes. Additionally, the `-c` cli option gained support for a new compiler flag `dynlink=...` which allows forcing a particular module name expression to be treated as dynamic extension. This is useful to e.g. force resolving `import { x } from "foo"` to a dynamic extension `foo.so` loaded at runtime even if a plain `foo.uc` exists in the search path during compilation or if no such module is available at build time. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-06compiler: don't treat offset 0 special at syntax errorsJo-Philipp Wich
If a compile error is raised at offset 0, try to resolve line and character position anyway. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-05compiler: improve formatting of nested syntax error messagesJo-Philipp Wich
Indent inner messages and prepend them with a vertical bar to increase visual separation of messages. Also include file name in source context output when the compiled program contains more than one source file. Adjust affected testcase outputs accordingly. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-05compiler: rework export index allocationJo-Philipp Wich
The current implementation of the module export offset tracking was inadequate and failed to properly handle larger module dependency graphs. In order to properly support nested module imports/exports, the following changes have been introduced: - Gather export slots during module compilation and emit corresponding export opcodes as one contiguous block at the end of the module function body, right before the final return. This ensures that interleaved imports of other modules do not place foreign exports between our module exports. - Track the number of program wide allocated export slots in order to derive per-module-source offsets for the global VM export list. - Derive import opcode source index from the module source export offset and the index of the requested name within the module source export name list. - Improve error reporting for circular module imports. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-05compiler: fix deriving module path from source runpathJo-Philipp Wich
The current implementation of `uc_compiler_canonicalize_path()` used the entire runtime path of the source object as path prefix, not just the directory part of it. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-05compiler: enforce stricter module compilation rulesJo-Philipp Wich
- Disallow toplevel `return` statements in module functions - Disallow `export` statements in non-module functions Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-08-05compiler: add a flag denoting module functionsJo-Philipp Wich
Introduce a further uc_function_t structure member indicating whether the underlying function is a module constructor. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-30compiler: add support for import/export statementsJo-Philipp Wich
This commit introduces syntax level support for ES6 style module import and export statements. Imports are resolved at compile time and the corresponding module code is compiled into the main program. Also add testcases to cover import and export statement semantics. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-30compiler: resolve predeclared upvaluesJo-Philipp Wich
Do not require a parent function compiler reference to lookup an already declared (potentially unresolved) upvalue in the current scope. Instead, search the named upvalues in the current function scope in case there is no parent compiler reference. This is required for the upcoming module support which will use unresolved upvalues to realize import/export functionality. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-30compiler: require a name in function declarationsJo-Philipp Wich
So far we allowed anonymous toplevel function expressions which makes little sense since those can't be used for anything. Require toplevel function declarations to be named and turn a missing name into a compile time syntax error. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-30compiler: fix reported source position in inc/dec operator errorJo-Philipp Wich
Report the proper source location when raising an error due to an increment/decrement operation on a constant value. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-30program: add infrastructure to handle multiple sources per programJo-Philipp Wich
The upcoming module support requires maintaining multiple source objects within the same program, so add the necessary infrastructure for it. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-06-30compiler: fix stack mismatch on continue statements nested in switchesJo-Philipp Wich
When compiling continue statements nested in switches, the compiler only emitted pop statements for the local variables in the switch body scope, but not for the locals in the scope(s) leading up to the containing loop body. Extend the compilers internal patchlist structure to keep track of the type of scope tied to the patchlist and extend `continue` statement compilation logic to select the appropriate parent patch list in order to determine the amount of locals (stack slots) to clear before the emitted jump instruction. As a result, the `uc_compiler_backpatch()` implementation can be simplified somewhat since we do not need to propagate entries to parent lists anymore. Also add a further regression test case to cover this issue. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-06-27compiler: fix stack mismatch on nonmatching switch statements with localsJo-Philipp Wich
When a switch statement containing cases with local variable declarations and no default case is evalulated and none of the the cases matched, the local variable slots were never initialized but got popped off the stack when execution resumed after the switch scope, leading to a mismatch in stack layout between compiler and runtime, causing local variables to yield wrong values or a stack underflow triggering a segmentation fault. Solve this issue by patching the last conditional case match jump to hop beyond the local variable pop instructions when no default case is defined. Also extend the regression test case dealing with other switch related stack mismatch issues to cover this particular problem as well. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-05-20compiler: fix segmentation fault on compiling unexpected unary expressionsJo-Philipp Wich
When compiling expressions followed by a unary operator, the compiler triggered a segmentation fault due to invoking an unset infix parser routine. Explicitly handle this case and raise a syntax error if such an invalid expression is encountered. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-04-13syntax: implement support for ES6 template literalsJo-Philipp Wich
Implement support for ECMAScript 6 template literals which allow simple interpolation of variable values into strings without resorting to `sprintf()` or manual string concatenation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-03-07syntax: support add new operatorsJo-Philipp Wich
- Support ES2016 exponentiation (**) and exponentiation assignment (**=) - Support ES2020 nullish coalescing (??) and logical nullish assignment (??=) - Support ES2021 logical and assignment (&&=) and logical or assignment (||=) Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-11compiler: fix patchlist corruption on switch statement syntax errorsJo-Philipp Wich
When compiling a switch statement with duplicate `default` cases or a switch statement with syntax errors before the body block, two error handling cases were hit in the code that prematurely returned from the function without resetting the compiler's patchlist pointer away from the on-stack patchlist that had been set up for the switch statement. Upon processing a subsequent break or continue control statement, a realloc was performed on the then invalid patchlist contents, triggering a segmentation fault or libc assert. Solve this issue by not returning from the function but breaking the switch body parsing loop. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-08compiler: fix incorrect loop break targetsJo-Philipp Wich
When patching jump targets for break statments while compiling for-loop statments, we need jump beyond the instructions popping intermediate loop variables off the stack but before the pop instructions removing local loop body variables to prevent a stack position mismatch between compiler and vm. Before that change, local loop body variables remained on the stack, breaking the expected stack layout. Fixes: b3d758b compiler: ("fix for/break miscompilation") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-02-07treewide: rework function memory modelJo-Philipp Wich
- Instead of treating individual program functions as managed ucode types, demote uc_function_t values to pointers into a uc_program_t entity - Promote uc_program_t to a managed type - Let uc_closure_t claim references to the owning program of the enclosed uc_function_t - Redefine public APIs uc_compile() and uc_vm_execute() APIs to return and expect an uc_program_t object respectively - Remove vallist indirection for function loading and let the compiler emit the function id directly when producing function construction code Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-29program: rename bytecode load/write functions, track path of executed fileJo-Philipp Wich
Extend source objects with a `runpath` field which contains the original path of the source being executed by the VM. When instantiating source objects from file paths, the `runpath` will be set to the `filename`. When instantiating source buffers using `uc_source_new_buffer()`, the runpath is initially unset. A new function `uc_source_runpath_set()` can be used to adjust the runtime path being associated with a source object. Extend bytecode loading logic to set the source buffer runtime path to the precompiled bytecode file path being loaded and executed. This is required for `sourcepath()` and relative paths in `include()` to function correctly when executing precompiled programs. Finally rename `uc_program_from_file()` and `uc_program_to_file()` to `uc_program_load()` and `uc_program_write()` respectively since the load part now operates on an `uc_source_t` input buffer instead of a plain `FILE *` handle. Adjust users of these API functions accordingly. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18syntax: drop legacy syntax supportJo-Philipp Wich
Drop support for the `local` keyword and `delete` function calls. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18build: support building without compile capabilitiesJo-Philipp Wich
Introduce a new default enable CMake option "COMPILE_SUPPORT" which allows to disable source code compilation in the ucode interpreter. Such an interpreter will only be able to load precompiled ucode files. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18program: implement support for precompiling source filesJo-Philipp Wich
- Introduce new command line flags `-o` and `-O` to write compiled program code into the specified output file - Add support for transparently executing precompiled files, the lexical analyzing and com,pilation phase is skipped in this case Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18source: refactor source file handlingJo-Philipp Wich
- Move source object pointer into program entity which is referenced by each function - Move lineinfo related routines into source.c and use them from lexer.c since lineinfo encoding does not belong into the lexical analyzer. - Implement initial infrastructure for detecting source file type, this is required later to differentiate between plaintext and precompiled bytecode files Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18compiler, vm: use a program wide constant listJo-Philipp Wich
Instead of storing constant values per function, maintain a global program wide list for all constant values within the current compilation unit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18types: add initial infrastructure for function serializationJo-Philipp Wich
- Introduce a new "program" entity which holds the list of functions created during compilation - Instead of storing pointers to the in-memory function representation in the constant list, store the index of the function within the program's function list - When loading functions from the constant list, retrieve the function by index from the program entity Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-04treewide: rework numeric value handlingJo-Philipp Wich
- Parse integer literals as unsigned numeric values in order to be able to represent the entire unsigned 64bit value range - Stop parsing minus-prefixed integer literals as negative numbers but treat them as separate minus operator followed by a positive integer instead - Only store unsigned numeric constants in bytecode - Rework numeric comparison logic to be able to handle full 64bit unsigned integers - If possible, yield unsigned 64 bit results for additions - Simplify numeric value conversion API - Compile code with -fwrapv for defined signed overflow semantics Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-12-01syntax: disallow keywords in object property shorthand notationJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-10-11syntax: introduce optional chaining operatorsJo-Philipp Wich
Introduce new operators `?.`, `?.[…]` and `?.(…)` to simplify looking up deeply nested property chain in a secure manner. The `?.` operator behaves like the `.` property access operator but yields `null` if the left hand side is `null` or not an object. Like `?.`, the `?.[…]` operator behaves like the `[…]` computed property access but yields `null` if the left hand side is `null` or neither an object or array. Finally the `?.(…)` operator behaves like the function call operator `(…)` but yields `null` if the left hand side is `null` or not a callable function. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-09-19compiler: properly handle jumps to offset 0Jo-Philipp Wich
When compiling certain expressions as first statement of an ucode program, e.g. a while loop in raw mode, a jump instruction to offset zero is emitted which was incorrectly treated as placeholder by the compiler. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: harmonize function namingJo-Philipp Wich
- Ensure that most functions follow the subject_verb naming schema - Move type related function from value.c to types.c - Rename value.c to vallist.c Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: move header files into dedicated directoryJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: consolidate typedef namingJo-Philipp Wich
Ensure that all custom typedef and vector declaration type names end with a "_t" suffix. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11treewide: eliminate dead code and unused functionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-05compiler: don't segfault on invalid declaration expressionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-06-08compiler: improve mapping of binary operator tokens to instructionsJo-Philipp Wich
Instead of relying on a switch/case mapping of token values to corresponding VM instructions, infer the instruction number arithmetically. This shrinks the compiled size on x86/64 by about 250 bytes. Also emit I_LE and I_GE instructions for `<=` and `>=` comparisons instead of transforming these into I_GT and I_LT negations. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25lexer, compiler: separate TK_BOOL token into TK_TRUE and TK_FALSE tokensJo-Philipp Wich
The token type split allows us to drop the token value union in the reserved word list with a subsequent commit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18syntax: introduce `const` supportJo-Philipp Wich
Introduce support for declaring constant variables through the `const` keyword. Variables declared with `const` follow the same scoping rules as `let` declared ones. In contrast to normal variables, `const` ones may not be assigned to after their declaration. Any attempt to do so will result in a syntax error during compilation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18compiler, lexer: add NO_LEGACY define to disable legacy syntax featuresJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18syntax: implement `delete` as proper operatorJo-Philipp Wich
Turn `delete` into a proper operator mimicking ECMAScript semantics. Also ensure to transparently turn deprecated `delete(obj, propname)` function calls into `delete obj.propname` expressions during compilation. When strict mode is active, legacy delete() calls throw a syntax error instead. Finally drop the `delete()` function from the stdlib as it is shadowed by the delete operator syntax now. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-11compiler: fix local for-loop initializer variable declarationsJo-Philipp Wich
In a loop statement like `for (let x = 1, y = 2; ...)` the initialization statement was incorrectly interpreted as `let x = 1; y = 2` instead of the correct `let ..., y = 2`, triggering reference error exceptions in strict mode. Solve the issue by continue parsing the rest of the comma expression seqence as declaration list expression when the initializer is compiled in local mode. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-11compiler: properly parse slashes in parenthesized division expressionsJo-Philipp Wich
Due to the special code path parsing the leading label portion of a parenthesized expression, slashes following a label were improperly treated as regular expression literal delimitters, emitting a syntax error when an otherwise valid expression such as `a / 1` was being parsed as first sub expression of a parenthesized expression. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-05compiler: properly handle break/continue in nested scopesJo-Philipp Wich
When emitting byte code for break or continue statements, ensure that local variables in all containing scopes up to the loop body scope are popped, not just those in the same scope the statement is located in. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04compiler: properly handle keyword in parenthesized property access expressionJo-Philipp Wich
Due to the special code path parsing the leading label portion of a parenthesized expression, keywords following a property access operator (TK_DOT, `.`) weren't properly handled, emitting a syntax error when an otherwise valid expression such as `value.default` was being parsed as first sub expression of a parenthesized expression. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04compiler: fix stack mismatch on compiling `use strict` statementsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-04syntax: implement support for 'use strict' pragmaJo-Philipp Wich
Support per-file and per-function `"use strict";` statement to opt into strict variable handling from ucode source code. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29compiler: fix segfault on parsing invalid pre/post increment expressionsJo-Philipp Wich
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29compiler, lexer: improve lexical state handlingJo-Philipp Wich
- Instead of disambiguating division operator vs. regexp literal by looking at the preceeding token, raise a "no regexp" flag within the appropriate parser states to tell the lexer how to treat a forward slash when parsing the next token - Introduce another "no keyword" flag which disables parsing labels into keywords when reading the next token and set it in the appropriate parser states. This allows using reserved names in object declarations and property access expressions Signed-off-by: Jo-Philipp Wich <jo@mein.io>