Age | Commit message (Collapse) | Author |
|
Tweak the token stream reported by the lexer in order to make it more useful
for alternative, non-compilation downstream parse processes such as code
intelligence gathering within a language server implementation.
- Instead of silently discarding source code comments in the lexing phase,
emit TK_COMMENT tokens which is useful to e.g. parse type annotations and
other structured information.
- Do not silently discard TK_LSTM tokens but report them to downstream
parsers instead.
- Do not silently emit TK_RSTM tokens as TK_SCOL but report them as-is to
downstrem parsers.
- Adjust the byte code compiler to properly deal with the changed token
reporting by discarding incoming TK_COMMENT and TK_LSTM tokens and by
remapping read TK_RSTM tokens to the TK_SCOL type.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Report end position for emitted tokens. This is required to reliably
determine the token length, e.g. for downstream code intelligence
use cases
- Fix start offset of continued template literal string tokens.
Previously the start offset of a literal string following a `${...}`
placeholder expressions was shifted by one byte
- Report proper start offset of `TK_LEXP` tokens.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
ECMAScript allows using `as` and `from` as identifiers so follow suit
and don't treat them specially while parsing. Extend the compiler logic
instead to check for TK_LABEL tokens with the expected value to properly
parse import and export statements.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Get rid of most __APPLE__ guards by introducing a central platform.c unit
providing drop-in replacements for missing APIs.
Also move system signal definitions into the new platform file to be able
to share them with the upcoming debug library.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Avoid reporting a nonexisting final line by not counting the EOF character
as physical newline.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Ensure that regexp extension escapes are consistently handled;
substitute `\d`, `\D`, `\s`, `\S`, `\w` and `\W` with `[[:digit:]]`,
`[^[:digit:]]`, `[[:space:]]`, `[^[:space:]]`, `[[:alnum:]_]` and
`[^[:alnum:]_]` character classes respectively since not all POSIX
regexp implementations implement all of those extensions
- Preserve `\b`, `\B`, `\<` and `\>` boundary matches
Fixes: a45f2a3 ("lexer: improve regex literal handling")
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Do not treat slashes within bracket expressions as delimitters
- Do not escape slashes when stringifying regex sources
- Allow all escape sequence types in regex literals
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Add support for the `import`, `export`, `from` and `as` keywords used in
module import and export statements.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Use nested switches instead of lookup tables to detect tokens
- Simplify input buffer logic
- Reduce amount of intermediate states
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
When a template was parsed with global block left stripping disabled, then
any text preceding an expression or statement block start tag was incorrectly
prepended to the first token value of the block, leading to syntax errors in
the compiler.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Recognize new number literal prefixes `0o` and `0O` for octal as well
as `0b` and `0B` for binary number literals
- Treat number literals with leading zeros as octal while parsing but
as decimal ones on implicit number conversions, means `012` will yield
`10` while `+"012"` or `"012" + 0` will yield `12`
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Implement support for ECMAScript 6 template literals which allow simple
interpolation of variable values into strings without resorting to
`sprintf()` or manual string concatenation.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Support ES2016 exponentiation (**) and exponentiation assignment (**=)
- Support ES2020 nullish coalescing (??) and logical nullish assignment (??=)
- Support ES2021 logical and assignment (&&=) and logical or assignment (||=)
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Drop support for the `local` keyword and `delete` function calls.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce a new default enable CMake option "COMPILE_SUPPORT" which
allows to disable source code compilation in the ucode interpreter.
Such an interpreter will only be able to load precompiled ucode files.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Move source object pointer into program entity which is referenced by
each function
- Move lineinfo related routines into source.c and use them from lexer.c
since lineinfo encoding does not belong into the lexical analyzer.
- Implement initial infrastructure for detecting source file type,
this is required later to differentiate between plaintext and
precompiled bytecode files
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Parse integer literals as unsigned numeric values in order to be able
to represent the entire unsigned 64bit value range
- Stop parsing minus-prefixed integer literals as negative numbers but
treat them as separate minus operator followed by a positive integer
instead
- Only store unsigned numeric constants in bytecode
- Rework numeric comparison logic to be able to handle full 64bit
unsigned integers
- If possible, yield unsigned 64 bit results for additions
- Simplify numeric value conversion API
- Compile code with -fwrapv for defined signed overflow semantics
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce new operators `?.`, `?.[…]` and `?.(…)` to simplify looking up
deeply nested property chain in a secure manner.
The `?.` operator behaves like the `.` property access operator but yields
`null` if the left hand side is `null` or not an object.
Like `?.`, the `?.[…]` operator behaves like the `[…]` computed property
access but yields `null` if the left hand side is `null` or neither an
object or array.
Finally the `?.(…)` operator behaves like the function call operator `(…)`
but yields `null` if the left hand side is `null` or not a callable
function.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Ensure that most functions follow the subject_verb naming schema
- Move type related function from value.c to types.c
- Rename value.c to vallist.c
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Ensure that all custom typedef and vector declaration type names end with
a "_t" suffix.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
This is a cosmetic change to bring the code in line with the common prefix
format of the other code in the tree.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The compiler will keep fetching tokens until hitting EOF, so ensure that
the lexer produces EOF after an unrecognized character error.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Enabling raw code mode allows writing ucode scripts without any template
tag decorations (that is, without the need to provide an initial opening
'{%' tag).
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
The token type split allows us to drop the token value union in the
reserved word list with a subsequent commit.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Turn the Infinity and NaN keywords into global properties.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Introduce support for declaring constant variables through the `const`
keyword. Variables declared with `const` follow the same scoping rules
as `let` declared ones.
In contrast to normal variables, `const` ones may not be assigned to
after their declaration. Any attempt to do so will result in a syntax
error during compilation.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Turn `delete` into a proper operator mimicking ECMAScript semantics.
Also ensure to transparently turn deprecated `delete(obj, propname)`
function calls into `delete obj.propname` expressions during compilation.
When strict mode is active, legacy delete() calls throw a syntax error
instead.
Finally drop the `delete()` function from the stdlib as it is shadowed
by the delete operator syntax now.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Skip interpreter lines in any source buffer and handle the skipping in the
lexer itself, to avoid reporting wrongly shifted token offsets to the
compiler, resulting in wrong error locations and source contexts.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Instead of disambiguating division operator vs. regexp literal by looking
at the preceeding token, raise a "no regexp" flag within the appropriate
parser states to tell the lexer how to treat a forward slash when parsing
the next token
- Introduce another "no keyword" flag which disables parsing labels into
keywords when reading the next token and set it in the appropriate parser
states. This allows using reserved names in object declarations and
property access expressions
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Shuffle typedefs to avoid need for non-compliant forward declarations
- Fix non-compliant empty struct initializers
- Remove use of braced expressions
- Remove use of anonymous unions
- Avoid `void *` pointer arithmetic
- Fix several warnings reported by gcc -pedantic mode and clang 11 compilation
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Instead of relying on json_object values internally, use custom types to
represent the different ucode value types which brings a number of
advantages compared to the previous approach:
- Due to the use of tagged pointers, small integer, string and bool
values can be stored directly in the pointer addresses, vastly
reducing required heap memory
- Ability to create circular data structures such as
`let o; o = { test: o };`
- Ability to register custom `tostring()` function through prototypes
- Initial mark/sweep GC implementation to tear down circular object
graphs on VM deinit
The change also paves the way for possible future extensions such as
constant variables and meta methods for custom ressource types.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Fixes bunch of following warnings:
lexer.c:68:37: warning: missing field 'parse' initializer [-Wmissing-field-initializers]
lexer.c:138:34: warning: missing field '' initializer [-Wmissing-field-initializers]
Signed-off-by: Petr Štetiar <ynezz@true.cz>
|
|
A logic flaw in the lineinfo encoding function led to an infinite tight
loop when a buffer chunk with 128 byte or more got consumed, which may
happen when parsing very long literals.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
While parsing string literals, actually consume the backslash introducing an
escape sequence to prevent it from ending up in the produced string if the
scanner is at the end of the buffer and the remaining buffer contents are
flushed after the consumer loop.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Replace the former AST walking interpreter implementation with a single pass
bytecode compiler and a corresponding virtual machine.
The rewrite lays the groundwork for a couple of improvements with will be
subsequently implemented:
- Ability to precompile ucode sources into binary byte code
- Strippable debug information
- Reduced runtime memory usage
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Instead of obtaining and caching direct opcode pointers, use relative
references when dealing with opcodes since direct or indirect calls to
uc_execute_op() might lead to reallocations of the opcode array, shifting
memory addresses and invalidating pointers taken before the invocation.
Such stale pointer accesses could be commonly triggered when one part
of the processed expression was a require() or include() call loading
relatively large ucode sources.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Eliminate dead code left after regex literal parsing changes
- Properly handle short octal sequences at end of string
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Ensure that the single char escapes `\a`, `\b`, `\e`, `\f`, `\n`,
`\r`, `\t` and `\v` keep working. Since they're not part of the POSIX
extended regular expression spec, they're not handled by the RE engine
so we need to substitute them by their actual byte value while parsing
the literal.
Fixes: ac5cb87 ("syntax: fix string and regex literal parsing quirks")
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
- Do not interprete escape sequences in regexp literals
- Do not improperly substitute control escape sequences such as
`\n` or `\a` after a backslash
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|
|
Optimize the strncmp() based token lookup with an integer comparison
approach which roughly cuts the time of the source code parsing phase
in half.
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
|