ucode - The ucode Scripting Language

Age	Commit message	Author
2023-11-06	syntax: don't treat `as` and `from` as reserved keywords	Jo-Philipp Wich
	ECMAScript allows using `as` and `from` as identifiers so follow suit and don't treat them specially while parsing. Extend the compiler logic instead to check for TK_LABEL tokens with the expected value to properly parse import and export statements. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2023-08-09	treewide: consolidate platform specific code in platform.c	Jo-Philipp Wich
	Get rid of most __APPLE__ guards by introducing a central platform.c unit providing drop-in replacements for missing APIs. Also move system signal definitions into the new platform file to be able to share them with the upcoming debug library. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2023-07-12	lexer: don't count EOF token as newline	Jo-Philipp Wich
	Avoid reporting a nonexisting final line by not counting the EOF character as physical newline. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-10-05	lexer: fixes for regex literal parsing	Jo-Philipp Wich
	- Ensure that regexp extension escapes are consistently handled; substitute `\d`, `\D`, `\s`, `\S`, `\w` and `\W` with `[[:digit:]]`, `[^[:digit:]]`, `[[:space:]]`, `[^[:space:]]`, `[[:alnum:]_]` and `[^[:alnum:]_]` character classes respectively since not all POSIX regexp implementations implement all of those extensions - Preserve `\b`, `\B`, `\<` and `\>` boundary matches Fixes: a45f2a3 ("lexer: improve regex literal handling") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
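The substitution described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from lexer.c: the function names `regex_class_for_escape()` and `expand_regex_escapes()` are hypothetical, and the sketch ignores the case of an escape appearing inside an existing bracket expression.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map the character following a backslash to the POSIX
 * character class named in the commit message. Returns NULL for escapes that
 * are not substituted. */
static const char *
regex_class_for_escape(char c)
{
	switch (c) {
	case 'd': return "[[:digit:]]";
	case 'D': return "[^[:digit:]]";
	case 's': return "[[:space:]]";
	case 'S': return "[^[:space:]]";
	case 'w': return "[[:alnum:]_]";
	case 'W': return "[^[:alnum:]_]";
	default:  return NULL;
	}
}

/* Copy `src` into `dst`, expanding the six shorthand escapes. All other
 * escapes, including the boundary matches \b, \B, \< and \>, are copied
 * verbatim. Returns 0 on success or -1 if `dst` is too small. */
static int
expand_regex_escapes(const char *src, char *dst, size_t dstlen)
{
	size_t o = 0;

	while (*src) {
		const char *cls = NULL;

		if (src[0] == '\\' && src[1] != '\0')
			cls = regex_class_for_escape(src[1]);

		if (cls) {
			size_t n = strlen(cls);

			if (o + n >= dstlen)
				return -1;

			memcpy(dst + o, cls, n);
			o += n;
			src += 2;
		}
		else if (src[0] == '\\' && src[1] != '\0') {
			/* Preserved escape: copy backslash and escaped char */
			if (o + 2 >= dstlen)
				return -1;

			dst[o++] = *src++;
			dst[o++] = *src++;
		}
		else {
			if (o + 1 >= dstlen)
				return -1;

			dst[o++] = *src++;
		}
	}

	if (o >= dstlen)
		return -1;

	dst[o] = '\0';

	return 0;
}
```

Expanding the shorthand escapes up front means the resulting pattern relies only on bracket expressions and character classes, which is the portability concern the commit message names.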
2022-10-04	lexer: improve regex literal handling	Jo-Philipp Wich
	- Do not treat slashes within bracket expressions as delimiters - Do not escape slashes when stringifying regex sources - Allow all escape sequence types in regex literals Signed-off-by: Jo-Philipp Wich <jo@mein.io>
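As a rough illustration of the first point, a scanner can track whether it is inside a bracket expression while looking for the closing slash. The function name `regex_literal_end()` is hypothetical and the sketch is simplified: it does not handle the POSIX rule that a `]` directly after `[` or `[^` is a literal character.

```c
#include <stddef.h>

/* Hypothetical helper: `s` points just past the opening '/' of a regex
 * literal. Returns the offset of the closing '/', or -1 if the literal is
 * unterminated. Slashes inside [...] and escaped slashes do not terminate
 * the literal. */
static long
regex_literal_end(const char *s)
{
	int in_bracket = 0;
	size_t i;

	for (i = 0; s[i] != '\0'; i++) {
		if (s[i] == '\\' && s[i + 1] != '\0') {
			i++; /* skip the escaped character */
			continue;
		}

		if (s[i] == '[')
			in_bracket = 1;
		else if (s[i] == ']')
			in_bracket = 0;
		else if (s[i] == '/' && !in_bracket)
			return (long)i;
	}

	return -1;
}
```

Returning an explicit "unterminated" result instead of scanning on gives the caller a way to stop cleanly at the end of the input.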
2022-07-28	lexer: recognize module related keywords	Jo-Philipp Wich
	Add support for the `import`, `export`, `from` and `as` keywords used in module import and export statements. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-28	lexer: rewrite token scanner	Jo-Philipp Wich
	- Use nested switches instead of lookup tables to detect tokens - Simplify input buffer logic - Reduce amount of intermediate states Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-07-12	lexer: fix parsing with disabled block left stripping	Jo-Philipp Wich
	When a template was parsed with global block left stripping disabled, then any text preceding an expression or statement block start tag was incorrectly prepended to the first token value of the block, leading to syntax errors in the compiler. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-06-01	syntax: adjust number literal parsing and string to number conversion	Jo-Philipp Wich
	- Recognize new number literal prefixes `0o` and `0O` for octal as well as `0b` and `0B` for binary number literals - Treat number literals with leading zeros as octal while parsing but as decimal on implicit number conversions, meaning `012` will yield `10` while `+"012"` or `"012" + 0` will yield `12` Signed-off-by: Jo-Philipp Wich <jo@mein.io>
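The difference between the two code paths can be sketched as below. Both function names are hypothetical, the sketch covers only non-negative integers and only the prefixes named in this commit, and whether the real string conversion also accepts the `0o` and `0b` prefixes is not stated here, so it is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: value of an integer literal as seen by the parser.
 * A bare leading zero selects octal. */
static uint64_t
parse_number_literal(const char *s)
{
	if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
		return strtoull(s + 2, NULL, 8);

	if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
		return strtoull(s + 2, NULL, 2);

	if (s[0] == '0' && s[1] != '\0')
		return strtoull(s + 1, NULL, 8);

	return strtoull(s, NULL, 10);
}

/* Hypothetical: implicit string to number conversion. Leading zeros are
 * ignored and the digits are read as decimal. */
static uint64_t
convert_string_to_number(const char *s)
{
	return strtoull(s, NULL, 10);
}
```

With these two helpers, the literal `012` and the string `"012"` produce different values, matching the example in the commit message.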
2022-04-13	syntax: implement support for ES6 template literals	Jo-Philipp Wich
	Implement support for ECMAScript 6 template literals which allow simple interpolation of variable values into strings without resorting to `sprintf()` or manual string concatenation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-03-07	syntax: support add new operators	Jo-Philipp Wich
	- Support ES2016 exponentiation (**) and exponentiation assignment (**=) - Support ES2020 nullish coalescing (??) and logical nullish assignment (??=) - Support ES2021 logical and assignment (&&=) and logical or assignment (||=) Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18	syntax: drop legacy syntax support	Jo-Philipp Wich
	Drop support for the `local` keyword and `delete` function calls. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18	build: support building without compile capabilities	Jo-Philipp Wich
	Introduce a new CMake option "COMPILE_SUPPORT", enabled by default, which allows disabling source code compilation in the ucode interpreter. An interpreter built without it will only be able to load precompiled ucode files. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-18	source: refactor source file handling	Jo-Philipp Wich
	- Move source object pointer into program entity which is referenced by each function - Move lineinfo related routines into source.c and use them from lexer.c since lineinfo encoding does not belong in the lexical analyzer. - Implement initial infrastructure for detecting source file type, this is required later to differentiate between plaintext and precompiled bytecode files Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2022-01-04	treewide: rework numeric value handling	Jo-Philipp Wich
	- Parse integer literals as unsigned numeric values in order to be able to represent the entire unsigned 64bit value range - Stop parsing minus-prefixed integer literals as negative numbers but treat them as separate minus operator followed by a positive integer instead - Only store unsigned numeric constants in bytecode - Rework numeric comparison logic to be able to handle full 64bit unsigned integers - If possible, yield unsigned 64 bit results for additions - Simplify numeric value conversion API - Compile code with -fwrapv for defined signed overflow semantics Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-12-01	syntax: disallow keywords in object property shorthand notation	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-10-11	syntax: introduce optional chaining operators	Jo-Philipp Wich
	Introduce new operators `?.`, `?.[…]` and `?.(…)` to simplify looking up deeply nested property chains in a secure manner. The `?.` operator behaves like the `.` property access operator but yields `null` if the left hand side is `null` or not an object. Like `?.`, the `?.[…]` operator behaves like the `[…]` computed property access but yields `null` if the left hand side is `null` or neither an object nor an array. Finally the `?.(…)` operator behaves like the function call operator `(…)` but yields `null` if the left hand side is `null` or not a callable function. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11	treewide: harmonize function naming	Jo-Philipp Wich
	- Ensure that most functions follow the subject_verb naming schema - Move type related functions from value.c to types.c - Rename value.c to vallist.c Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11	treewide: move header files into dedicated directory	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-11	treewide: consolidate typedef naming	Jo-Philipp Wich
	Ensure that all custom typedef and vector declaration type names end with a "_t" suffix. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-09	lexer: rename UT_ prefixed constants to UC_	Jo-Philipp Wich
	This is a cosmetic change to bring the code in line with the common prefix format of the other code in the tree. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-06-29	lexer: transition into EOF state on unrecognized character	Jo-Philipp Wich
	The compiler will keep fetching tokens until hitting EOF, so ensure that the lexer produces EOF after an unrecognized character error. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer: implement raw code mode	Jo-Philipp Wich
	Enabling raw code mode allows writing ucode scripts without any template tag decorations (that is, without the need to provide an initial opening '{%' tag). Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer: drop value union from keyword table	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer, compiler: separate TK_BOOL token into TK_TRUE and TK_FALSE tokens	Jo-Philipp Wich
	The token type split allows us to drop the token value union in the reserved word list with a subsequent commit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	syntax: drop Infinity and NaN keywords	Jo-Philipp Wich
	Turn the Infinity and NaN keywords into global properties. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	syntax: introduce `const` support	Jo-Philipp Wich
	Introduce support for declaring constant variables through the `const` keyword. Variables declared with `const` follow the same scoping rules as `let` declared ones. In contrast to normal variables, `const` ones may not be assigned to after their declaration. Any attempt to do so will result in a syntax error during compilation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	compiler, lexer: add NO_LEGACY define to disable legacy syntax features	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	syntax: implement `delete` as proper operator	Jo-Philipp Wich
	Turn `delete` into a proper operator mimicking ECMAScript semantics. Also ensure to transparently turn deprecated `delete(obj, propname)` function calls into `delete obj.propname` expressions during compilation. When strict mode is active, legacy delete() calls throw a syntax error instead. Finally drop the `delete()` function from the stdlib as it is shadowed by the delete operator syntax now. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-14	lexer: skip interpreter line in any source buffer	Jo-Philipp Wich
	Skip interpreter lines in any source buffer and handle the skipping in the lexer itself, to avoid reporting wrongly shifted token offsets to the compiler, resulting in wrong error locations and source contexts. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop on parsing unterminated comments	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop on parsing unterminated expression blocks	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop when parsing regexp literal at EOF	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	compiler, lexer: improve lexical state handling	Jo-Philipp Wich
	- Instead of disambiguating division operator vs. regexp literal by looking at the preceding token, raise a "no regexp" flag within the appropriate parser states to tell the lexer how to treat a forward slash when parsing the next token - Introduce another "no keyword" flag which disables parsing labels into keywords when reading the next token and set it in the appropriate parser states. This allows using reserved names in object declarations and property access expressions Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-27	treewide: ISO C / pedantic compliance	Jo-Philipp Wich
	- Shuffle typedefs to avoid need for non-compliant forward declarations - Fix non-compliant empty struct initializers - Remove use of braced expressions - Remove use of anonymous unions - Avoid `void *` pointer arithmetic - Fix several warnings reported by gcc -pedantic mode and clang 11 compilation Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-25	treewide: rework internal data type system	Jo-Philipp Wich
	Instead of relying on json_object values internally, use custom types to represent the different ucode value types which brings a number of advantages compared to the previous approach: - Due to the use of tagged pointers, small integer, string and bool values can be stored directly in the pointer addresses, vastly reducing required heap memory - Ability to create circular data structures such as `let o; o = { test: o };` - Ability to register custom `tostring()` function through prototypes - Initial mark/sweep GC implementation to tear down circular object graphs on VM deinit The change also paves the way for possible future extensions such as constant variables and meta methods for custom resource types. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
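The tagged pointer idea can be sketched in a few lines. The layout below (two low tag bits, the names `value_t`, `TAG_UINT`, `TAG_BOOL` and all helper functions) is an assumption made for illustration and is not taken from the ucode sources; it relies on heap allocations being aligned to at least four bytes so that real pointers always have both low bits clear.

```c
#include <stdbool.h>
#include <stdint.h>

typedef void *value_t; /* hypothetical value handle */

#define TAG_MASK ((uintptr_t)0x3)
#define TAG_HEAP ((uintptr_t)0x0) /* real pointer to a heap object */
#define TAG_UINT ((uintptr_t)0x1) /* small unsigned integer payload */
#define TAG_BOOL ((uintptr_t)0x2) /* boolean payload */

/* True if `n` survives being shifted left by the two tag bits */
static bool
fits_small_uint(uintptr_t n)
{
	return n <= (UINTPTR_MAX >> 2);
}

/* Store a small integer in the pointer bits, no heap allocation needed */
static value_t
box_small_uint(uintptr_t n)
{
	return (value_t)((n << 2) | TAG_UINT);
}

static bool
is_small_uint(value_t v)
{
	return ((uintptr_t)v & TAG_MASK) == TAG_UINT;
}

static uintptr_t
unbox_small_uint(value_t v)
{
	return (uintptr_t)v >> 2;
}

static value_t
box_bool(bool b)
{
	return (value_t)(((uintptr_t)b << 2) | TAG_BOOL);
}

static bool
is_bool(value_t v)
{
	return ((uintptr_t)v & TAG_MASK) == TAG_BOOL;
}

static bool
unbox_bool(value_t v)
{
	return ((uintptr_t)v >> 2) != 0;
}
```

Values that do not fit the payload bits would fall back to a heap object referenced by an untagged pointer, which is where the memory saving for the common small cases comes from.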
2021-04-24	treewide: fix issues reported by clang code analyzer	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-23	lexer: fix incomplete struct initializers	Petr Štetiar
	Fixes a bunch of the following warnings: lexer.c:68:37: warning: missing field 'parse' initializer [-Wmissing-field-initializers] lexer.c:138:34: warning: missing field '' initializer [-Wmissing-field-initializers] Signed-off-by: Petr Štetiar <ynezz@true.cz>
2021-03-11	lexer: fix infinite loop in lineinfo encoding when consuming large chunks	Jo-Philipp Wich
	A logic flaw in the lineinfo encoding function led to an infinite tight loop when a buffer chunk of 128 bytes or more got consumed, which may happen when parsing very long literals. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-11	lexer: properly handle string escape sequences at buffer boundary	Jo-Philipp Wich
	While parsing string literals, actually consume the backslash introducing an escape sequence to prevent it from ending up in the produced string if the scanner is at the end of the buffer and the remaining buffer contents are flushed after the consumer loop. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-26	lexer: improvements	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17	treewide: rewrite ucode interpreter	Jo-Philipp Wich
	Replace the former AST walking interpreter implementation with a single pass bytecode compiler and a corresponding virtual machine. The rewrite lays the groundwork for a couple of improvements which will be subsequently implemented: - Ability to precompile ucode sources into binary byte code - Strippable debug information - Reduced runtime memory usage Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-12-06	treewide: prevent stale pointer access in opcode handlers	Jo-Philipp Wich
	Instead of obtaining and caching direct opcode pointers, use relative references when dealing with opcodes since direct or indirect calls to uc_execute_op() might lead to reallocations of the opcode array, shifting memory addresses and invalidating pointers taken before the invocation. Such stale pointer accesses could be commonly triggered when one part of the processed expression was a require() or include() call loading relatively large ucode sources. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
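The hazard and the fix can be illustrated with a generic growable array. The types `op_t` and `op_array_t` and both functions are hypothetical stand-ins, not the structures used by the interpreter; the point is only that an index stays valid across `realloc()` while a cached element pointer may not.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical opcode record and growable opcode array */
typedef struct {
	int type;
	int operand;
} op_t;

typedef struct {
	op_t *entries;
	size_t count;
} op_array_t;

/* Append an opcode. realloc() may move the whole array, so any op_t pointer
 * obtained before this call must be considered stale afterwards.
 * Returns the index of the new entry, or (size_t)-1 on allocation failure. */
static size_t
op_array_push(op_array_t *arr, int type, int operand)
{
	op_t *tmp = realloc(arr->entries, (arr->count + 1) * sizeof(*tmp));

	if (!tmp)
		return (size_t)-1;

	arr->entries = tmp;
	arr->entries[arr->count].type = type;
	arr->entries[arr->count].operand = operand;

	return arr->count++;
}

/* Resolve a relative reference (index) to a pointer at the time of use */
static op_t *
op_array_get(op_array_t *arr, size_t idx)
{
	return (idx < arr->count) ? &arr->entries[idx] : NULL;
}
```

A caller that keeps the index returned by `op_array_push()` and calls `op_array_get()` each time it needs the entry is unaffected by later growth of the array.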
2020-11-30	syntax: fix quirks when parsing octal sequences	Jo-Philipp Wich
	- Eliminate dead code left after regex literal parsing changes - Properly handle short octal sequences at end of string Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-30	syntax: recognize single-char escapes in regex literals again	Jo-Philipp Wich
	Ensure that the single char escapes `\a`, `\b`, `\e`, `\f`, `\n`, `\r`, `\t` and `\v` keep working. Since they're not part of the POSIX extended regular expression spec, they're not handled by the RE engine so we need to substitute them by their actual byte value while parsing the literal. Fixes: ac5cb87 ("syntax: fix string and regex literal parsing quirks") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-30	syntax: fix string and regex literal parsing quirks	Jo-Philipp Wich
	- Do not interpret escape sequences in regexp literals - Do not improperly substitute control escape sequences such as `\n` or `\a` after a backslash Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-19	treewide: rebrand to ucode	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-15	lexer: improve scanner performance	Jo-Philipp Wich
	Optimize the strncmp() based token lookup with an integer comparison approach which roughly cuts the time of the source code parsing phase in half. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
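One way to realize an integer comparison lookup is to pack a short word into a single 64-bit integer and compare that instead of calling `strncmp()`. The sketch below is an assumption about the general technique, not the code from this commit; the keyword list, `pack_word()` and `is_keyword()` are hypothetical, and a real implementation would likely precompute the packed keyword values rather than pack them on every lookup.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack up to eight bytes into one integer, zero-padded. Callers must ensure
 * len <= 8. */
static uint64_t
pack_word(const char *s, size_t len)
{
	uint64_t v = 0;

	memcpy(&v, s, len);

	return v;
}

/* Hypothetical keyword table, each entry at most eight bytes long */
static const struct {
	const char *word;
	size_t len;
} keywords[] = {
	{ "if", 2 }, { "let", 3 }, { "else", 4 },
	{ "return", 6 }, { "function", 8 }
};

/* Returns 1 if the `len` bytes at `s` spell one of the keywords */
static int
is_keyword(const char *s, size_t len)
{
	uint64_t needle;
	size_t i;

	if (len == 0 || len > sizeof(needle))
		return 0;

	needle = pack_word(s, len);

	for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
		if (keywords[i].len == len &&
		    pack_word(keywords[i].word, keywords[i].len) == needle)
			return 1;

	return 0;
}
```

Because both sides are packed with the same byte order and zero padding, equal length plus equal packed value is equivalent to byte-wise equality for words without embedded NUL bytes.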
2020-11-10	lexer: accept "let" as synonym for "local"	Jo-Philipp Wich
	This brings the utpl script syntax closer to ES5/ES6 and allows using existing syntax highlighting in IDEs and editors. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-05	syntax: implement ES6-like arrow function syntax	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>