ucode - The ucode Scripting Language

Age	Commit message	Author
2021-07-11	treewide: consolidate typedef naming	Jo-Philipp Wich
	Ensure that all custom typedef and vector declaration type names end with a "_t" suffix. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-07-09	lexer: rename UT_ prefixed constants to UC_	Jo-Philipp Wich
	This is a cosmetic change to bring the code in line with the common prefix format of the other code in the tree. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-06-29	lexer: transition into EOF state on unrecognized character	Jo-Philipp Wich
	The compiler will keep fetching tokens until hitting EOF, so ensure that the lexer produces EOF after an unrecognized character error. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer: implement raw code mode	Jo-Philipp Wich
	Enabling raw code mode allows writing ucode scripts without any template tag decorations (that is, without the need to provide an initial opening '{%' tag). Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer: drop value union from keyword table	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	lexer, compiler: separate TK_BOOL token into TK_TRUE and TK_FALSE tokens	Jo-Philipp Wich
	The token type split allows us to drop the token value union in the reserved word list with a subsequent commit. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-25	syntax: drop Infinity and NaN keywords	Jo-Philipp Wich
	Turn the Infinity and NaN keywords into global properties. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	syntax: introduce `const` support	Jo-Philipp Wich
	Introduce support for declaring constant variables through the `const` keyword. Variables declared with `const` follow the same scoping rules as `let` declared ones. In contrast to normal variables, `const` ones may not be assigned to after their declaration. Any attempt to do so will result in a syntax error during compilation. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	compiler, lexer: add NO_LEGACY define to disable legacy syntax features	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-18	syntax: implement `delete` as proper operator	Jo-Philipp Wich
	Turn `delete` into a proper operator mimicking ECMAScript semantics. Also ensure to transparently turn deprecated `delete(obj, propname)` function calls into `delete obj.propname` expressions during compilation. When strict mode is active, legacy delete() calls throw a syntax error instead. Finally drop the `delete()` function from the stdlib as it is shadowed by the delete operator syntax now. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-05-14	lexer: skip interpreter line in any source buffer	Jo-Philipp Wich
	Skip interpreter lines in any source buffer and handle the skipping in the lexer itself, to avoid reporting wrongly shifted token offsets to the compiler, resulting in wrong error locations and source contexts. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop on parsing unterminated comments	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop on parsing unterminated expression blocks	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	lexer: fix infinite loop when parsing regexp literal at EOF	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-29	compiler, lexer: improve lexical state handling	Jo-Philipp Wich
	- Instead of disambiguating division operator vs. regexp literal by looking at the preceding token, raise a "no regexp" flag within the appropriate parser states to tell the lexer how to treat a forward slash when parsing the next token
	- Introduce another "no keyword" flag which disables parsing labels into keywords when reading the next token and set it in the appropriate parser states. This allows using reserved names in object declarations and property access expressions
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
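The two lexer flags described in this commit can be pictured roughly as follows. This is only a minimal sketch: the names `lex_flags_t`, `lex_slash`, `lex_word`, the token enum and the short reserved word list are hypothetical and are not taken from the ucode sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, not ucode's actual identifiers. */
typedef enum { TOK_DIV, TOK_REGEXP, TOK_KEYWORD, TOK_LABEL } tok_t;

/* Hints raised by the parser before it asks for the next token. */
typedef struct {
	bool no_regexp;  /* a '/' must be read as the division operator */
	bool no_keyword; /* words must be read as plain labels */
} lex_flags_t;

/* Decide how a forward slash at the start of the next token is treated. */
static tok_t lex_slash(const lex_flags_t *f)
{
	return f->no_regexp ? TOK_DIV : TOK_REGEXP;
}

/* Decide whether a scanned word becomes a keyword or a label. */
static tok_t lex_word(const lex_flags_t *f, const char *word)
{
	/* Illustrative subset of reserved words. */
	static const char *reserved[] = { "if", "else", "while", "function" };
	size_t i;

	if (!f->no_keyword)
		for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
			if (!strcmp(word, reserved[i]))
				return TOK_KEYWORD;

	return TOK_LABEL;
}
```

With `no_keyword` set, for example right after a `.` in a property access, `lex_word()` returns `TOK_LABEL` even for `if`, which is what makes an expression like `obj.if` possible.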
2021-04-27	treewide: ISO C / pedantic compliance	Jo-Philipp Wich
	- Shuffle typedefs to avoid need for non-compliant forward declarations
	- Fix non-compliant empty struct initializers
	- Remove use of braced expressions
	- Remove use of anonymous unions
	- Avoid `void *` pointer arithmetic
	- Fix several warnings reported by gcc -pedantic mode and clang 11 compilation
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-25	treewide: rework internal data type system	Jo-Philipp Wich
	Instead of relying on json_object values internally, use custom types to represent the different ucode value types, which brings a number of advantages compared to the previous approach:
	- Due to the use of tagged pointers, small integer, string and bool values can be stored directly in the pointer addresses, vastly reducing required heap memory
	- Ability to create circular data structures such as `let o; o = { test: o };`
	- Ability to register custom `tostring()` function through prototypes
	- Initial mark/sweep GC implementation to tear down circular object graphs on VM deinit
	The change also paves the way for possible future extensions such as constant variables and meta methods for custom resource types.
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
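The tagged pointer idea mentioned above can be illustrated with a small sketch. The layout here (two low tag bits, tag values, and all function names) is an assumption made for illustration and does not reflect ucode's actual encoding. Only non-negative integers are handled, to keep every shift well defined in ISO C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value word: the low two bits carry a type tag. Heap
 * allocations are normally aligned to at least four bytes, so tag 0 can
 * mean "real pointer" while other tags mark immediate values. */
typedef uintptr_t val_t;

enum { TAG_PTR = 0, TAG_INT = 1, TAG_BOOL = 2, TAG_MASK = 3 };

/* An unsigned integer fits if its top two bits are free for the shift. */
static bool uint_fits(uintptr_t n)
{
	return n <= (UINTPTR_MAX >> 2);
}

/* Store a small unsigned integer directly in the value word. */
static val_t val_from_uint(uintptr_t n)
{
	return (n << 2) | TAG_INT;
}

static uintptr_t val_to_uint(val_t v)
{
	return v >> 2;
}

/* Store a boolean directly in the value word. */
static val_t val_from_bool(bool b)
{
	return ((uintptr_t)b << 2) | TAG_BOOL;
}

static bool val_to_bool(val_t v)
{
	return (v >> 2) != 0;
}

/* Extract the type tag to decide how the remaining bits are read. */
static unsigned val_tag(val_t v)
{
	return (unsigned)(v & TAG_MASK);
}
```

Values that do not pass `uint_fits()` would have to fall back to a heap allocated object referenced through a `TAG_PTR` word, which is where the memory saving for small values comes from.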
2021-04-24	treewide: fix issues reported by clang code analyzer	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-04-23	lexer: fix incomplete struct initializers	Petr Štetiar
	Fixes a bunch of the following warnings:
	lexer.c:68:37: warning: missing field 'parse' initializer [-Wmissing-field-initializers]
	lexer.c:138:34: warning: missing field '' initializer [-Wmissing-field-initializers]
	Signed-off-by: Petr Štetiar <ynezz@true.cz>
2021-03-11	lexer: fix infinite loop in lineinfo encoding when consuming large chunks	Jo-Philipp Wich
	A logic flaw in the lineinfo encoding function led to an infinite tight loop when a buffer chunk of 128 bytes or more got consumed, which may happen when parsing very long literals. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-03-11	lexer: properly handle string escape sequences at buffer boundary	Jo-Philipp Wich
	While parsing string literals, actually consume the backslash introducing an escape sequence to prevent it from ending up in the produced string if the scanner is at the end of the buffer and the remaining buffer contents are flushed after the consumer loop. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-26	lexer: improvements	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2021-02-17	treewide: rewrite ucode interpreter	Jo-Philipp Wich
	Replace the former AST walking interpreter implementation with a single pass bytecode compiler and a corresponding virtual machine. The rewrite lays the groundwork for a couple of improvements which will be subsequently implemented:
	- Ability to precompile ucode sources into binary byte code
	- Strippable debug information
	- Reduced runtime memory usage
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-12-06	treewide: prevent stale pointer access in opcode handlers	Jo-Philipp Wich
	Instead of obtaining and caching direct opcode pointers, use relative references when dealing with opcodes since direct or indirect calls to uc_execute_op() might lead to reallocations of the opcode array, shifting memory addresses and invalidating pointers taken before the invocation. Such stale pointer accesses could be commonly triggered when one part of the processed expression was a require() or include() call loading relatively large ucode sources. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
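The hazard described here is the usual one with growable arrays: `realloc()` may move the storage, so a pointer taken earlier can dangle while an index stays valid. A minimal sketch, assuming a hypothetical `op_t` record and `op_array_t` container rather than ucode's real opcode structures:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical opcode record and growable opcode array. */
typedef struct { int type; int value; } op_t;

typedef struct {
	op_t *ops;
	size_t count, capacity;
} op_array_t;

/* Append an opcode and return its index. realloc() may move the whole
 * array, so any op_t pointer obtained before this call can become stale. */
static size_t op_append(op_array_t *a, int type, int value)
{
	if (a->count == a->capacity) {
		size_t ncap = a->capacity ? a->capacity * 2 : 4;
		op_t *nops = realloc(a->ops, ncap * sizeof(*nops));

		if (!nops)
			abort();

		a->ops = nops;
		a->capacity = ncap;
	}

	a->ops[a->count].type = type;
	a->ops[a->count].value = value;

	return a->count++;
}

/* Resolve an index to a pointer only at the moment of use. */
static op_t *op_get(op_array_t *a, size_t index)
{
	return &a->ops[index];
}
```

A handler that keeps the `size_t` index returned by `op_append()` and calls `op_get()` again after any nested execution never touches freed memory, whereas a cached `op_t *` might.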
2020-11-30	syntax: fix quirks when parsing octal sequences	Jo-Philipp Wich
	- Eliminate dead code left after regex literal parsing changes
	- Properly handle short octal sequences at end of string
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-30	syntax: recognize single-char escapes in regex literals again	Jo-Philipp Wich
	Ensure that the single char escapes `\a`, `\b`, `\e`, `\f`, `\n`, `\r`, `\t` and `\v` keep working. Since they're not part of the POSIX extended regular expression spec, they're not handled by the RE engine so we need to substitute them by their actual byte value while parsing the literal. Fixes: ac5cb87 ("syntax: fix string and regex literal parsing quirks") Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-30	syntax: fix string and regex literal parsing quirks	Jo-Philipp Wich
	- Do not interpret escape sequences in regexp literals
	- Do not improperly substitute control escape sequences such as `\n` or `\a` after a backslash
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-19	treewide: rebrand to ucode	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-15	lexer: improve scanner performance	Jo-Philipp Wich
	Optimize the strncmp() based token lookup with an integer comparison approach which roughly cuts the time of the source code parsing phase in half. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
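The commit does not spell out the integer comparison scheme, so the following is only one plausible way to replace per-keyword `strncmp()` calls: pack the scanned word into a single 64-bit integer and compare it against precomputed keys. The macros, the `keyword_t` table and the token values are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build integer keys from character constants at compile time. */
#define K2(a, b)       (((uint64_t)(a) << 8) | (uint64_t)(b))
#define K3(a, b, c)    ((K2(a, b) << 8) | (uint64_t)(c))
#define K4(a, b, c, d) ((K3(a, b, c) << 8) | (uint64_t)(d))

enum { TOK_NONE = 0, TOK_IF, TOK_ELSE, TOK_LET };

typedef struct { uint64_t key; int token; } keyword_t;

/* Illustrative subset of a reserved word table. */
static const keyword_t keywords[] = {
	{ K2('i', 'f'),           TOK_IF   },
	{ K4('e', 'l', 's', 'e'), TOK_ELSE },
	{ K3('l', 'e', 't'),      TOK_LET  },
};

/* Pack up to eight bytes into one integer; 0 means "cannot be a keyword". */
static uint64_t pack_word(const char *s, size_t len)
{
	uint64_t key = 0;
	size_t i;

	if (len == 0 || len > 8)
		return 0;

	for (i = 0; i < len; i++)
		key = (key << 8) | (unsigned char)s[i];

	return key;
}

/* Look up a length-delimited word with one integer compare per entry. */
static int lookup_keyword(const char *s, size_t len)
{
	uint64_t key = pack_word(s, len);
	size_t i;

	if (key == 0)
		return TOK_NONE;

	for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
		if (keywords[i].key == key)
			return keywords[i].token;

	return TOK_NONE;
}
```

Because the word is packed once and each table entry costs a single integer comparison, the per-token work no longer grows with the number of characters compared against every keyword.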
2020-11-10	lexer: accept "let" as synonym for "local"	Jo-Philipp Wich
	This brings the utpl script syntax closer to ES5/ES6 and allows using existing syntax highlighting in IDEs and editors. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-05	syntax: implement ES6-like arrow function syntax	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-03	syntax: implement ES6-like rest parameters for variadic functions	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-11-02	syntax: support `elif` clauses for alternative `if` syntax	Jo-Philipp Wich
	In the alternative `if` syntax mode, support a specific `elif` keyword instead of requiring an `else` branch followed by a disjunct `if` statement. The advantage is that templates do not require error-prone redundant `endif` keywords in else-if ladders.
	After this change, the following example:
	  {% if (...): %}
	  One condition
	  {% else if (...): %}
	  Another condition
	  {% else if (...): %}
	  A third condition
	  {% else %}
	  Final condition
	  {% endif; endif; endif %}
	... can be simplified into:
	  {% if (...): %}
	  One condition
	  {% elif (...): %}
	  Another condition
	  {% elif (...): %}
	  A third condition
	  {% else %}
	  Final condition
	  {% endif %}
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14	treewide: unify error handling	Jo-Philipp Wich
	Get rid of the distinction between lexer/parser errors and runtime exceptions, use exceptions everywhere instead. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14	treewide: rework source file and callstack handling	Jo-Philipp Wich
	- Keep an open FILE* reference to processed source files in order to be able to rewind and extract error context later
	- Build a proper call stack when invoking utpl functions
	- Report call stack in exceptions
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-14	lexer: rewrite	Jo-Philipp Wich
	Rewrite the lexer into a restartable state machine to support parsing from file streams without the need to read the entire source text into memory first. As a side effect, the length of labels and strings is unlimited now. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-05	lexer: properly handle reserved `if` word	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-10-02	treewide: rework handling of memory allocation failures	Jo-Philipp Wich
	Instead of propagating failures to the caller, print a generic error message and terminate program execution through abort(). Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-24	syntax: add regular expression support	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-22	syntax: introduce case statement support	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-21	syntax: introduce try/catch blocks	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-11	syntax: introduce !== and === operators	Jo-Philipp Wich
	Also treat "in" as relational operator. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-10	treewide: implement default lstrip_blocks and trim_blocks behaviour	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-07	ast, eval, lexer: keep track of overflows when parsing numbers	Jo-Philipp Wich
	This allows number literals that exceed the range INT64_MIN..INT64_MAX to be truncated to the respective min and max values in a defined manner. It also makes it possible to have the expression `{{ -9223372036854775808 }}` actually result in `-9223372036854775808`. Since negation and number declaration are separate operations, the value would be first truncated to `9223372036854775807` and then negated, making it impossible to write a literal INT64_MIN value without tracking the overflow. Also fix the number parsing logic to not truncate integers to 32 bits. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
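The interaction between clamping and negation can be sketched as below. The `numlit_t` struct and both functions are hypothetical helpers written for illustration, not the code from this commit. The sketch assumes that the literal's digits are scanned first and that unary minus is applied afterwards, as the message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of scanning an unsigned decimal literal. */
typedef struct {
	int64_t value; /* clamped to INT64_MAX when the digits exceed it */
	bool overflow; /* true if clamping happened */
} numlit_t;

/* Scan leading decimal digits, clamping instead of wrapping around. */
static numlit_t parse_digits(const char *s)
{
	numlit_t n = { 0, false };

	for (; *s >= '0' && *s <= '9'; s++) {
		int d = *s - '0';

		/* value * 10 + d would exceed INT64_MAX */
		if (n.overflow || n.value > (INT64_MAX - d) / 10) {
			n.value = INT64_MAX;
			n.overflow = true;
		}
		else {
			n.value = n.value * 10 + d;
		}
	}

	return n;
}

/* Unary minus on a literal: an overflowed literal clamps to INT64_MIN
 * instead of becoming -INT64_MAX. */
static numlit_t negate(numlit_t n)
{
	if (n.overflow)
		n.value = INT64_MIN;
	else
		n.value = -n.value;

	return n;
}
```

Without the `overflow` flag, `negate(parse_digits("9223372036854775808"))` could only yield `-9223372036854775807`, since the clamped literal is indistinguishable from a genuine `INT64_MAX`.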
2020-09-07	lexer: fix encoding of unicode surrogate pairs	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-06	treewide: refactor internal AST structures	Jo-Philipp Wich
	- unify operand and value tag structures
	- use a contiguous array for storing opcodes
	- use relative offsets for next and children ops
	- defer function creation to runtime
	- rework "this" context handling by storing context pointer in scope tags
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-09-02	treewide: rename double and null value constructor functions	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25	treewide: add proper null value handling	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25	treewide: introduce this keyword	Jo-Philipp Wich
	Support a new keyword `this` which allows functions to access the context they're called upon. Signed-off-by: Jo-Philipp Wich <jo@mein.io>
2020-08-25	lexer.c, eval.c: move T_EXCEPTION definition to lexer header	Jo-Philipp Wich
	Signed-off-by: Jo-Philipp Wich <jo@mein.io>