Age | Commit message | Author |
|
The compiler emitted incorrect bytecode for logical assignment operations
on property expressions. The generated instructions left the stack in an
unclean state when the assignment condition was not fulfilled, causing a
stack layout mismatch between compiler and vm, leading to undefined
variable accesses and other non-deterministic behavior.
Solve this issue by rewriting the bytecode generation to yield an
instruction sequence that does not leave garbage on the stack.
The implementation is not optimal yet, as an expression in the form
`obj.prop ||= val` will load `obj.prop` twice. This is acceptable for
now as the load operation has no side effect, but should be solved in
a better way by introducing new instructions that allow for swapping
stack slots, allowing the vm to operate on a copy of the loaded value.
Also rewrite the corresponding test case to trigger a runtime error
on code versions before this fix.
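As a rough illustration only (not the actual ucode bytecode), the balanced instruction sequence can be modelled with a hypothetical mini stack machine in Python. All opcode names, the program layout and the use of Python truthiness are assumptions made for this sketch. It shows `obj.prop ||= val` loading `obj.prop` twice while leaving exactly one value on the stack on both paths:

```python
def run(code, env):
    """Execute (op, arg) pairs on a tiny stack machine and return the final stack."""
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "LOAD_VAR":
            stack.append(env[arg])
        elif op == "PUSH":
            stack.append(arg)
        elif op == "GET_PROP":          # pops key and object, pushes the value
            key, obj = stack.pop(), stack.pop()
            stack.append(obj.get(key))
        elif op == "SET_PROP":          # pops value, key, object; pushes the value
            val, key, obj = stack.pop(), stack.pop(), stack.pop()
            obj[key] = val
            stack.append(val)
        elif op == "JUMP_IF_TRUTHY":    # pops the condition
            if stack.pop():
                pc = arg
        elif op == "JUMP":
            pc = arg
        else:
            raise ValueError(op)
    return stack


# Hypothetical program for `obj.prop ||= "fallback"`
OR_ASSIGN = [
    ("LOAD_VAR", "obj"), ("PUSH", "prop"), ("GET_PROP", None),  # 0-2: first load
    ("JUMP_IF_TRUTHY", 9),                                      # 3: skip assignment
    ("LOAD_VAR", "obj"), ("PUSH", "prop"), ("PUSH", "fallback"),
    ("SET_PROP", None),                                         # 7: assign
    ("JUMP", 12),                                               # 8: done
    ("LOAD_VAR", "obj"), ("PUSH", "prop"), ("GET_PROP", None),  # 9-11: second load
]
```

Both branches end with a single result value on the stack, which is the property the fix is after.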
Fixes: fdc9b6a ("compiler: fix `??=`, `||=` and `&&=` logical assignment semantics")
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Follow ES6 semantics and ensure that arrow functions with a block body
don't implicitly return the value of the last executed statement.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When compiling logical assignment expressions, ensure that the right hand
side of the assignment is not evaluated when the assignment condition is
unfulfilled.
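The intended short-circuit behaviour can be sketched in Python as follows. This is a model, not compiler code: `logical_assign` is a hypothetical helper, the right hand side is passed as a callable so its evaluation can be observed, and Python truthiness only approximates ucode truthiness:

```python
def logical_assign(obj, key, op, rhs):
    """Model `obj[key] OP rhs()` where rhs is only called if the condition holds."""
    current = obj.get(key)
    if op == "??=":
        should_assign = current is None
    elif op == "||=":
        should_assign = not current
    elif op == "&&=":
        should_assign = bool(current)
    else:
        raise ValueError(f"unknown operator {op}")
    if should_assign:
        obj[key] = rhs()        # right hand side evaluated only here
        return obj[key]
    return current              # right hand side never evaluated
```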
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Track last emitted statement type in compiled code and only generate final
`return null` opcodes if there is no preceding `return` statement.
Also use this statement tracking to avoid emitting invalid return opcodes
for arrow function bodies with trailing empty statements.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Factor out the nested syntax error message indentation logic into a
separate helper procedure for reuse in other places.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Utilize the new I_DYNLINK vm opcode to support import statements referring
to dynamic extension modules.
During compilation, the compiler will try to infer the type of the imported
module from the resolved file path; if it ends with `.so`, the module is
assumed to be a dynamic extension and loading/binding of the module is
deferred to runtime using I_DYNLINK opcodes.
Additionally, the `-c` cli option gained support for a new compiler flag
`dynlink=...` which allows forcing a particular module name expression
to be treated as a dynamic extension. This is useful to e.g. force resolving
`import { x } from "foo"` to a dynamic extension `foo.so` loaded at runtime
even if a plain `foo.uc` exists in the search path during compilation or if
no such module is available at build time.
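The decision described above might look roughly like the following Python sketch. The function name, the return values and the example paths are hypothetical. Only the `.so` suffix rule and the forced `dynlink=...` override come from the text:

```python
def classify_import(name, resolved_path, forced_dynlink=()):
    """Decide whether an import is bound at compile time or deferred to runtime."""
    if name in forced_dynlink:
        # Forced via a `dynlink=...` compiler flag, even if a plain source
        # file exists or nothing could be resolved at build time.
        return "dynlink"
    if resolved_path is None:
        raise LookupError(f"module {name!r} not found")
    if resolved_path.endswith(".so"):
        return "dynlink"        # dynamic extension, loaded at runtime
    return "source"             # compiled into the program
```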
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
If a compile error is raised at offset 0, try to resolve line and
character position anyway.
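A minimal sketch of resolving a byte offset to a position, assuming 1-based line and column numbers (the helper name is hypothetical). It shows that offset 0 is a perfectly valid location:

```python
def resolve_position(source, offset):
    """Return (line, column), both 1-based, for a character offset."""
    line = source.count("\n", 0, offset) + 1
    last_newline = source.rfind("\n", 0, offset)   # -1 if on the first line
    column = offset - last_newline
    return line, column
```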
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Indent inner messages and prepend them with a vertical bar to increase
visual separation of messages. Also include file name in source context
output when the compiled program contains more than one source file.
Adjust affected testcase outputs accordingly.
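The indentation scheme could be sketched like this in Python. The exact prefix string is an assumption and the helper name is hypothetical:

```python
def indent_nested(message, prefix="  | "):
    """Prefix every line of an inner error message with a vertical bar."""
    return "\n".join(prefix + line for line in message.splitlines())
```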
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The current implementation of the module export offset tracking was
inadequate and failed to properly handle larger module dependency
graphs. In order to properly support nested module imports/exports,
the following changes have been introduced:
- Gather export slots during module compilation and emit corresponding
export opcodes as one contiguous block at the end of the module
function body, right before the final return. This ensures that
interleaved imports of other modules do not place foreign exports
between our module exports.
- Track the number of program wide allocated export slots in order
to derive per-module-source offsets for the global VM export list.
- Derive import opcode source index from the module source export
offset and the index of the requested name within the module source
export name list.
- Improve error reporting for circular module imports.
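The offset arithmetic from the list above can be sketched in Python. Function names and the data layout are hypothetical. Each module source gets a contiguous block in the global export list, and an import slot is the module offset plus the index of the name within that module:

```python
def assign_export_offsets(modules):
    """modules: list of (module_name, [export names]) in compilation order."""
    offsets, total = {}, 0
    for name, exports in modules:
        offsets[name] = total       # first global slot of this module
        total += len(exports)       # program wide allocated slot count
    return offsets, total


def import_slot(modules, offsets, module, symbol):
    """Global export list index for `import { symbol } from module`."""
    exports = dict(modules)[module]
    return offsets[module] + exports.index(symbol)
```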
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The current implementation of `uc_compiler_canonicalize_path()` used the
entire runtime path of the source object as path prefix, not just the
directory part of it.
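The difference can be illustrated with a small Python model. The function names and example paths are hypothetical and this is not the C implementation:

```python
import posixpath


def canonicalize_buggy(runpath, module_path):
    # Uses the whole runtime path, including the file name, as prefix.
    return posixpath.normpath(posixpath.join(runpath, module_path))


def canonicalize_fixed(runpath, module_path):
    # Uses only the directory part of the runtime path as prefix.
    base = posixpath.dirname(runpath)
    return posixpath.normpath(posixpath.join(base, module_path))
```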
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Disallow toplevel `return` statements in module functions
- Disallow `export` statements in non-module functions
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce a further uc_function_t structure member indicating whether the
underlying function is a module constructor.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
This commit introduces syntax level support for ES6 style module import
and export statements. Imports are resolved at compile time and the
corresponding module code is compiled into the main program.
Also add testcases to cover import and export statement semantics.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Do not require a parent function compiler reference to lookup an already
declared (potentially unresolved) upvalue in the current scope. Instead,
search the named upvalues in the current function scope in case there is
no parent compiler reference.
This is required for the upcoming module support which will use unresolved
upvalues to realize import/export functionality.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
So far we allowed anonymous toplevel function expressions which makes
little sense since those can't be used for anything.
Require toplevel function declarations to be named and turn a missing
name into a compile time syntax error.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Report the proper source location when raising an error due to an
increment/decrement operation on a constant value.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The upcoming module support requires maintaining multiple source objects
within the same program, so add the necessary infrastructure for it.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When compiling continue statements nested in switches, the compiler only
emitted pop statements for the local variables in the switch body scope,
but not for the locals in the scope(s) leading up to the containing loop
body.
Extend the compiler's internal patchlist structure to keep track of the
type of scope tied to the patchlist and extend `continue` statement
compilation logic to select the appropriate parent patch list in order
to determine the amount of locals (stack slots) to clear before the
emitted jump instruction.
As a result, the `uc_compiler_backpatch()` implementation can be simplified
somewhat since we do not need to propagate entries to parent lists anymore.
Also add a further regression test case to cover this issue.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When a switch statement containing cases with local variable declarations
and no default case is evaluated and none of the cases matched, the
local variable slots were never initialized but got popped off the stack
when execution resumed after the switch scope, leading to a mismatch in
stack layout between compiler and runtime, causing local variables to
yield wrong values or a stack underflow triggering a segmentation fault.
Solve this issue by patching the last conditional case match jump to hop
beyond the local variable pop instructions when no default case is defined.
Also extend the regression test case dealing with other switch related
stack mismatch issues to cover this particular problem as well.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When compiling expressions followed by a unary operator, the compiler
triggered a segmentation fault due to invoking an unset infix parser
routine.
Explicitly handle this case and raise a syntax error if such an
invalid expression is encountered.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Implement support for ECMAScript 6 template literals which allow simple
interpolation of variable values into strings without resorting to
`sprintf()` or manual string concatenation.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Support ES2016 exponentiation (**) and exponentiation assignment (**=)
- Support ES2020 nullish coalescing (??) and logical nullish assignment (??=)
- Support ES2021 logical and assignment (&&=) and logical or assignment (||=)
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When compiling a switch statement with duplicate `default` cases or a switch
statement with syntax errors before the body block, two error handling cases
were hit in the code that prematurely returned from the function without
resetting the compiler's patchlist pointer away from the on-stack patchlist
that had been set up for the switch statement.
Upon processing a subsequent break or continue control statement, a realloc
was performed on the then invalid patchlist contents, triggering a
segmentation fault or libc assert.
Solve this issue by not returning from the function but breaking the switch
body parsing loop.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When patching jump targets for break statements while compiling for-loop
statements, we need to jump beyond the instructions popping intermediate loop
variables off the stack but before the pop instructions removing local
loop body variables to prevent a stack position mismatch between compiler
and vm.
Before that change, local loop body variables remained on the stack,
breaking the expected stack layout.
Fixes: b3d758b ("compiler: fix for/break miscompilation")
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Instead of treating individual program functions as managed ucode types,
demote uc_function_t values to pointers into a uc_program_t entity
- Promote uc_program_t to a managed type
- Let uc_closure_t claim references to the owning program of the enclosed
uc_function_t
- Redefine the public APIs uc_compile() and uc_vm_execute() to return and
expect a uc_program_t object respectively
- Remove vallist indirection for function loading and let the compiler
emit the function id directly when producing function construction code
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Extend source objects with a `runpath` field which contains the original
path of the source being executed by the VM.
When instantiating source objects from file paths, the `runpath` will be
set to the `filename`. When instantiating source buffers using
`uc_source_new_buffer()`, the runpath is initially unset.
A new function `uc_source_runpath_set()` can be used to adjust the runtime
path being associated with a source object.
Extend bytecode loading logic to set the source buffer runtime path to the
precompiled bytecode file path being loaded and executed. This is required
for `sourcepath()` and relative paths in `include()` to function correctly
when executing precompiled programs.
Finally rename `uc_program_from_file()` and `uc_program_to_file()` to
`uc_program_load()` and `uc_program_write()` respectively since the load
part now operates on a `uc_source_t` input buffer instead of a plain
`FILE *` handle.
Adjust users of these API functions accordingly.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Drop support for the `local` keyword and `delete` function calls.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce a new CMake option "COMPILE_SUPPORT", enabled by default, which
allows disabling source code compilation in the ucode interpreter.
Such an interpreter will only be able to load precompiled ucode files.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Introduce new command line flags `-o` and `-O` to write compiled program
code into the specified output file
- Add support for transparently executing precompiled files; the
lexical analysis and compilation phases are skipped in this case
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Move source object pointer into program entity which is referenced by
each function
- Move lineinfo related routines into source.c and use them from lexer.c
since lineinfo encoding does not belong in the lexical analyzer.
- Implement initial infrastructure for detecting source file type,
this is required later to differentiate between plaintext and
precompiled bytecode files
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Instead of storing constant values per function, maintain a global program
wide list for all constant values within the current compilation unit.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Introduce a new "program" entity which holds the list of functions
created during compilation
- Instead of storing pointers to the in-memory function representation
in the constant list, store the index of the function within the
program's function list
- When loading functions from the constant list, retrieve the function
by index from the program entity
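A minimal Python sketch of the indexing idea, assuming a hypothetical `Program` class. The constant list stores only an index, and the function itself is looked up in the program's function list:

```python
class Program:
    """Holds every function created during compilation."""

    def __init__(self):
        self.functions = []

    def add_function(self, fn):
        self.functions.append(fn)
        return len(self.functions) - 1   # index stored in the constant list

    def load_function(self, index):
        return self.functions[index]
```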
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Parse integer literals as unsigned numeric values in order to be able
to represent the entire unsigned 64bit value range
- Stop parsing minus-prefixed integer literals as negative numbers but
treat them as separate minus operator followed by a positive integer
instead
- Only store unsigned numeric constants in bytecode
- Rework numeric comparison logic to be able to handle full 64bit
unsigned integers
- If possible, yield unsigned 64 bit results for additions
- Simplify numeric value conversion API
- Compile code with -fwrapv for defined signed overflow semantics
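The literal handling can be sketched with a toy tokenizer in Python. Token names are hypothetical and only digits and `-` are recognized. A minus sign becomes its own operator token, so the literal itself is always non-negative and may use the full unsigned 64 bit range:

```python
import re

U64_MAX = (1 << 64) - 1   # largest value representable as unsigned 64 bit


def tokenize(expr):
    """Split an expression into MINUS operator and unsigned NUMBER tokens."""
    tokens = []
    for text in re.findall(r"\d+|-", expr):
        if text == "-":
            tokens.append(("MINUS", None))
        else:
            tokens.append(("NUMBER", int(text)))
    return tokens
```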
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce new operators `?.`, `?.[…]` and `?.(…)` to simplify looking up
deeply nested property chains in a safe manner.
The `?.` operator behaves like the `.` property access operator but yields
`null` if the left hand side is `null` or not an object.
Like `?.`, the `?.[…]` operator behaves like the `[…]` computed property
access but yields `null` if the left hand side is `null` or neither an
object nor an array.
Finally the `?.(…)` operator behaves like the function call operator `(…)`
but yields `null` if the left hand side is `null` or not a callable
function.
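The described semantics can be modelled in Python, with ucode objects approximated by `dict`, arrays by `list` and `null` by `None`. The helper names are hypothetical, and returning `None` for out-of-range indexes is an assumption of this sketch:

```python
def opt_prop(lhs, key):
    """Model of `lhs?.key`."""
    return lhs.get(key) if isinstance(lhs, dict) else None


def opt_index(lhs, key):
    """Model of `lhs?.[key]`."""
    if isinstance(lhs, dict):
        return lhs.get(key)
    if isinstance(lhs, list) and isinstance(key, int) and 0 <= key < len(lhs):
        return lhs[key]
    return None


def opt_call(lhs, *args):
    """Model of `lhs?.(args)`."""
    return lhs(*args) if callable(lhs) else None
```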
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When compiling certain expressions as first statement of a ucode
program, e.g. a while loop in raw mode, a jump instruction to offset
zero is emitted which was incorrectly treated as placeholder by the
compiler.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Ensure that most functions follow the subject_verb naming scheme
- Move type related functions from value.c to types.c
- Rename value.c to vallist.c
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Ensure that all custom typedef and vector declaration type names end with
a "_t" suffix.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Instead of relying on a switch/case mapping of token values to corresponding
VM instructions, infer the instruction number arithmetically.
This shrinks the compiled size on x86/64 by about 250 bytes.
Also emit I_LE and I_GE instructions for `<=` and `>=` comparisons instead
of transforming these into I_GT and I_LT negations.
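The arithmetic mapping can be sketched in Python, assuming tokens and instructions are declared in the same relative order. The enum members and numeric values below are hypothetical and not taken from the ucode sources:

```python
from enum import IntEnum


class Token(IntEnum):
    ADD = 10
    SUB = 11
    MUL = 12
    DIV = 13


class Insn(IntEnum):
    ADD = 40
    SUB = 41
    MUL = 42
    DIV = 43


def insn_for(tok):
    """Derive the instruction from the token offset instead of a switch/case."""
    return Insn(Insn.ADD + (tok - Token.ADD))
```

The mapping only works as long as both enumerations keep the same ordering, which is the trade-off for dropping the lookup table.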
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The token type split allows us to drop the token value union in the
reserved word list with a subsequent commit.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce support for declaring constant variables through the `const`
keyword. Variables declared with `const` follow the same scoping rules
as `let` declared ones.
In contrast to normal variables, `const` ones may not be assigned to
after their declaration. Any attempt to do so will result in a syntax
error during compilation.
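A compile-time check of this kind could be sketched as follows in Python. The `Scope` class and the error text are hypothetical:

```python
class Scope:
    """Tracks which declared names are constants."""

    def __init__(self):
        self.is_constant = {}

    def declare(self, name, constant=False):
        self.is_constant[name] = constant

    def check_assignment(self, name):
        # Raised while compiling, before any code runs.
        if self.is_constant.get(name, False):
            raise SyntaxError(f"Invalid assignment to constant '{name}'")
```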
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Turn `delete` into a proper operator mimicking ECMAScript semantics.
Also transparently turn deprecated `delete(obj, propname)`
function calls into `delete obj.propname` expressions during compilation.
When strict mode is active, legacy delete() calls throw a syntax error
instead.
Finally drop the `delete()` function from the stdlib as it is shadowed
by the delete operator syntax now.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
In a loop statement like `for (let x = 1, y = 2; ...)` the initialization
statement was incorrectly interpreted as `let x = 1; y = 2` instead of the
correct `let ..., y = 2`, triggering reference error exceptions in strict
mode.
Solve the issue by continuing to parse the rest of the comma expression
sequence as declaration list expression when the initializer is compiled
in local mode.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Due to the special code path parsing the leading label portion of a
parenthesized expression, slashes following a label were improperly
treated as regular expression literal delimiters, emitting a syntax
error when an otherwise valid expression such as `a / 1` was being
parsed as first sub expression of a parenthesized expression.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When emitting byte code for break or continue statements, ensure that local
variables in all containing scopes up to the loop body scope are popped,
not just those in the same scope the statement is located in.
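The pop count can be modelled in Python as below. The scope representation is hypothetical, and the sketch assumes the loop body scope itself is included in the count:

```python
def locals_to_pop(scopes):
    """scopes: outermost-first list of (kind, number_of_locals).

    Returns how many stack slots a break/continue must clear, counting
    every scope from the innermost one up to the enclosing loop body.
    """
    count = 0
    for kind, nlocals in reversed(scopes):
        count += nlocals
        if kind == "loop":
            return count
    raise SyntaxError("break or continue outside of a loop")
```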
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Due to the special code path parsing the leading label portion of a
parenthesized expression, keywords following a property access operator
(TK_DOT, `.`) weren't properly handled, emitting a syntax error when an
otherwise valid expression such as `value.default` was being parsed as
first sub expression of a parenthesized expression.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|