summaryrefslogtreecommitdiffhomepage
path: root/docs/README.md
blob: 660f175ad5f7020458d972b7b234d23e1bc76a1f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
# The ucode Scripting Language

The ucode language is a tiny general purpose scripting language featuring a
syntax closely resembling ECMAScript. It can be used in a stand-alone manner
by using the ucode command line interpreter or embedded into host applications
by linking libucode and utilizing its C language API. Additionally, ucode can
be invoked in template mode where control flow and expression logic statements
are embedded in Jinja-like markup blocks.

Besides aiming for small size, the major design goals of ucode are the ability
to trivially read and write JSON data, good embeddability into C applications,
template capabilities for output formatting, extensiblity through loadable
native extension modules and a straightforward set of built-in functions
mimicking those found in the Perl 5 language.

## History and Motivation

In spring 2021 it has been decided to rewrite the OpenWrt firewall framework on
top of nftables with the goal to replace the then current C application with a
kind of preprocessor generating nftables rulesets using a set of templates
instead of relying on built-in hardcoded rules like its predecessor.

That decision spurred the development of *ucode*, initially meant to be a
simple template processor solely for the OpenWrt nftables firewall but quickly
evolving into a general purpose scripting language suitable for a wider range
of system scripting tasks.

Despite OpenWrt predominantly relying on POSIX shell and Lua as system
scripting languages already, a new solution was needed to accomodate the needs
of the new firewall implementation; mainly the ability to efficiently deal with
JSON data and complex data structures such as arrays and dictionaries and the
ability to closely interface with OpenWrt's *ubus* message bus system.

Throughout the design process of the new firewall and its template processor,
the following design goals were defined for the *ucode* scripting language:

 - Ability to embed code logic fragments such as control flow statements,
   function calls or arithmetic expressions into plain text templates, using
   a block syntax and functionality roughly inspired by Jinja templates
 - Built-in support for JSON data parsing and serialization, without the need
   for external libraries
 - Distinct array and object types (compared to Lua's single table datatype)
 - Distinct integer and float types and guaranteed 64bit integer range
 - Built-in support for bit operations
 - Built-in support for (POSIX) regular expressions
 - A comprehensive set of built-in standard functions, inspired by the core
   functions found in the Perl 5 interpreter
 - Staying as close to ECMAScript syntax as possible due to higher developer
   familiarity and to be able to reuse existing tooling such as editor syntax
   highlighting
 - Bindings for all relevant Linux and OpenWrt APIs, such as *ubus*, *uci*,
   *uloop*, *netlink* etc.
 - Procedural, synchronous programming flow
 - Very small executable size (the interpreter and runtime is currently around
   64KB on ARM Cortex A9)
 - Embeddability into C host applications

Summarized, *ucode* can be described as synchronous ECMAScript without the
object oriented standard library.


## Installation

### OpenWrt

In OpenWrt 22.03 and later, *ucode* should already be preinstalled. If not,
it can be installed via the package manager, using the `opkg install ucode`
command.

### MacOS

To build on MacOS, first install *cmake* and *json-c* via
[Homebrew](https://brew.sh/), then clone the ucode repository and execute
*cmake* followed by *make*:

    $ brew install cmake json-c
    $ git clone https://github.com/jow-/ucode.git
    $ cd ucode/
    $ cmake -DUBUS_SUPPORT=OFF -DUCI_SUPPORT=OFF -DULOOP_SUPPORT=OFF .
    $ make
    $ sudo make install

### Debian

The ucode repository contains build recipes for Debian packages, to build .deb
packages for local installation, first install required development packages,
then clone the repository and invoke *dpkg-buildpackage* to produce the binary
package files:

    $ sudo apt-get install build-essential devscripts debhelper libjson-c-dev cmake pkg-config
    $ git clone https://github.com/jow-/ucode.git
    $ cd ucode/
    $ dpkg-buildpackage -b -us -uc
    $ sudo dpkg -i ../ucode*.deb ../libucode*.deb

### Other Linux systems

To install ucode from source on other systems, ensure that the json-c library
and associated development headers are installed, then clone and compile the
ucode repository:

    $ git clone https://github.com/jow-/ucode.git
    $ cd ucode/
    $ cmake -DUBUS_SUPPORT=OFF -DUCI_SUPPORT=OFF -DULOOP_SUPPORT=OFF .
    $ make
    $ sudo make install


## Syntax

### Template mode

By default, *ucode* is executed in *raw mode*, means it expects a given source
file to only contain script code. By invoking the ucode interpreter with the
`-T` flag or by using the `utpl` alias, the *ucode* interpreter is switched
into *template mode* where the source file is expected to be a plaintext file
containing *template blocks* containing ucode script expressions or comments.

#### Block types

There are three kinds of blocks; *expression blocks*, *statement blocks* and
*comment blocks*. The former two embed code logic using ucode's JavaScript-like
syntax while the latter comment block type is simply discarded during
processing.


##### 1. Statement block

Statement blocks are enclosed in an opening `{%` and a closing `%}` tag and
may contain any number of script code statements, even entire programs.

It is allowed to omit the closing `%}` of a statement block to parse the
entire remaining source text after the opening tag as ucode script.

By default, statement blocks produce no output and the entire block is
reduced to an empty string during template evaluation but contained script
code might invoke functions such as `print()` to explicitly output contents.

For example the following template would result in `The epoch is odd` or
`The epoch is even`, depending on the current epoch value:

`The epoch is {% if (time() % 2): %}odd{% else %}even{% endif %}!`


##### 2. Expression block

Expression blocks are enclosed in an opening `{{` and a closing `}}` tag and
may only contain a single expression statement (multiple expressions may be
chained with comma). The implicit result of the rightmost evaluated expression
is used as output when processing the block.

For example the template `Hello world, {{ getenv("USER") }}!` would result in
the output "Hello world, user!" where `user` would correspond to the name of
the current user executing the ucode interpreter.


##### 3. Comment block

Comment blocks, which are denoted with an opening `{#` and a closing `#}` tag
may contain arbitrary text except the closing `#}` tag itself. Comments blocks
are completely stripped during processing and are replaced with an empty string.

The following example template would result in the output "Hello world":

`Hello {# mad #}word`


#### Whitespace handling

Each block start tag may be suffixed with a dash to strip any whitespace
before the block and likewise any block end tag may be prefixed with a dash
to strip any whitespace following the block.

Without using whitespace stripping, the following example:

```
This is a first line
{% for (x in [1, 2, 3]): %}
This is item {{ x }}.
{% endfor %}
This is the last line
```

Would result in the following output:

```
This is a first line

This is item 1.
This is item 2.
This is item 3.

This is the last line
```

By adding a trailing dash to apply whitespace stripping after the block, the
empty lines can be eliminated:

```
This is a first line
{% for (x in [1, 2, 3]): -%}
This is item {{ x }}.
{% endfor -%}
This is the last line
```

Output:

```
This is a first line
This is item 1.
This is item 2.
This is item 3.
This is the last line
```

By applying whitespace stripping before the block, all lines can be joined
into a single output line:

```
This is a first line
{%- for (x in [1, 2, 3]): -%}
This is item {{ x }}.
{%- endfor -%}
This is the last line
```

Output:

```
This is a first lineThis is item 1.This is item 2.This is item 3.This is the last line
```

### Script syntax

The ucode script language - used either within statement and expression blocks
or throughout the entire file in *raw mode*, uses untyped variables and employs
a simplified JavaScript like syntax.

The language implements function scoping and differentiates between local and
global variables. Each function has its own private scope while executing and
local variables declared inside a function are not accessible in the outer
calling scope.

#### 1. Data types

Ucode supports seven different basic types as well as two additional special
types; function values and ressource values. The supported types are:

 - Boolean values (`true` or `false`)
 - Integer values (`-9223372036854775808` to `+9223372036854775807`)
 - Double values (`-1.7e308` to `+1.7e308`)
 - String values (e.g. `'Hello world!'` or `"Sunshine \u2600!"`)
 - Array values (e.g. `[1, false, "foo"]`)
 - Object values (e.g. `{ foo: true, "bar": 123 }`)
 - Null value (`null`)

Ucode utilizes reference counting to manage memory used for variables and values
and frees data automatically as soon as values go out of scope.

Numeric values are either stored as signed 64bit integers or as IEEE 756 double
value. Conversion between integer and double values can happen implicitly, e.g.
through numeric operations, or explicitely, e.g. by invoking functions such as
`int()`.

#### 2. Variables

Variable names must start with a letter or an underscore and may only contain
the characters `A`..`Z`, `a`..`z`, `0`..`9` or `_`. By prefixing a variable
name with the keyword `let`, it is declared in the local block scope only
and not visible outside anymore.

Variables may also be declared using the `const` keyword. Such variables follow
the same scoping rules as `let` declared ones but they cannot be modified after
they have been declared. Any attempt to do so will result in a syntax error
during compilation.

```javascript
{%

  a = 1;  // global variable assignment

  function test() {
    let b = 2;  // declare `b` as local variable
    a = 2;        // overwrite global a
  }

  test();

  print(a, "\n");  // outputs "2"
  print(b, "\n");  // outputs nothing

  const c = 3;
  print(c, "\n");  // outputs "3"

  c = 4;           // raises syntax error
  c++;             // raises syntax error

  const d;         // raises syntax error, const variables must
                   // be initialized at declaration time

%}
```

#### 3. Control statements

Similar to JavaScript, ucode supports `if`, `for` and `while` statements to
control execution flow.

##### 3.1. Conditional statement

If/else blocks can be used to execute statements depending on a condition.

```javascript
{%

  user = getenv("USER");

  if (user == "alice") {
      print("Hello Alice!\n");
  }
  else if (user == "bob") {
      print("Hello Bob!\n");
  }
  else {
      print("Hello guest!\n");
  }

%}
```

If only a single statement is wrapped by an if or else branch, the enclosing
curly braces may be omitted:

```javascript
{%

  if (rand() == 3)
      print("This is quite unlikely\n");

%}
```

##### 3.2. Loop statements

Ucode script supports three different flavors of loop control statements; a
`while` loop that executes enclosed statements as long as the loop condition is
fulfilled, a `for in` loop that iterates keys of objects or items of arrays and
a counting `for` loop that is a variation of the `while` loop.

```javascript
{%

  i = 0;
  arr = [1, 2, 3];
  obj = { Alice: 32, Bob: 54 };

  // execute as long as condition is true
  while (i < length(arr)) {
      print(arr[i], "\n");
      i++;
  }

  // execute for each item in arr
  for (n in arr) {
      print(n, "\n");
  }

  // execute for each key in obj
  for (person in obj) {
      print(person, " is ", obj[person], " years old.\n");
  }

  // execute initialization statement (j = 0) once
  // execute as long as condition (j < length(arr)) is true
  // execute step statement (j++) after each iteration
  for (j = 0; j < length(arr); j++) {
      print(arr[j], "\n");
  }

%}
```

##### 3.3. Alternative syntax

Since conditional statements and loops are often used for template formatting
purposes, e.g. to repeat a specific markup for each item of a list, ucode
supports an alternative syntax that does not require curly braces to group
statements but that uses explicit end keywords to denote the end of the control
statement body for better readability instead.

The following two examples first illustrate the normal syntax, followed by the
alternative syntax that is more suitable for statement blocks:

```
Printing a list:
{% for (n in [1, 2, 3]) { -%}
  - Item #{{ n }}
{% } %}
```

The alternative syntax replaces the opening curly brace (`{`) with a colon
(`:`) and the closing curly brace (`}`) with an explicit `endfor` keyword:

```
Printing a list:
{% for (n in [1, 2, 3]): -%}
  - Item #{{ n }}
{% endfor %}
```

For each control statement type, a corresponding alternative end keyword is defined:

  - `if (...): ... endif`
  - `for (...): ... endfor`
  - `while (...): ... endwhile`


#### 4. Functions

Ucode scripts may define functions to group repeating operations into reusable
operations. Functions can be both declared with a name, in which case they're
automatically registered in the current scope, or anonymously which allows
assigning the resulting value to a variable, e.g. to build arrays or objects of
functions:

```javascript
{%

  function duplicate(n) {
       return n * 2;
  }

  let utilities = {
      concat: function(a, b) {
          return "" + a + b;
      },
      greeting: function() {
          return "Hello, " + getenv("USER") + "!";
      }
  };

-%}

The duplicate of 2 is {{ duplicate(2) }}.
The concatenation of 'abc' and 123 is {{ utilities.concat("abc", 123) }}.
Your personal greeting is: {{ utilities.greeting() }}.
```

##### 4.1. Alternative syntax

Function declarations support the same kind of alternative syntax as defined
for control statements (3.3.)

The alternative syntax replaces the opening curly brace (`{`) with a colon
(`:`) and the closing curly brace (`}`) with an explicit `endfunction`
keyword:

```
{% function printgreeting(name): -%}
  Hallo {{ name }}, nice to meet you.
{% endfunction -%}

<h1>{{ printgreeting("Alice") }}</h1>
```


#### 5. Operators

Similar to JavaScript and C, ucode scripts support a range of different
operators to manipulate values and variables.

##### 5.1. Arithmetic operations

The operators `+`, `-`, `*`, `/`, `%`, `++` and `--` allow to perform
additions, substractions, multiplications, divisions, modulo, increment or
decrement operations respectively where the result depends on the type of
involved values.

The `++` and `--` operators are unary, means that they only apply to one
operand. The `+` and `-` operators may be used in unary context to either
convert a given value to a numeric value or to negate a given value.

If either operand of the `+` operator is a string, the other one is converted
to a string value as well and a concatenated string is returned.

All other arithmetic operators coerce their operands into numeric values.
Fractional values are converted to doubles, other numeric values to integers.

If either operand is a double, the other one is converted to a double value as
well and a double result is returned.

Divisions by zero result in the special double value `Infinity`. If an operand
cannot be converted to a numeric value, the result of the operation is the
special double value `NaN`.

```javascript
{%
  a = 2;
  b = 5.2;
  s1 = "125";
  s2 = "Hello world";

  print(+s1);      // 125
  print(+s2);      // NaN
  print(-s1);      // -125
  print(-s2);      // NaN
  print(-a);       // -2

  print(a++);      // 2 (Return value of a, then increment by 1)
  print(++a);      // 4 (Increment by 1, then return value of a)

  print(b--);      // 5.2 (Return value of b, then decrement by 1)
  print(--b);      // 3.2 (Decrement by 1, then return value of b)

  print(4 + 8);    // 12
  print(7 - 4);    // 3
  print(3 * 3);    // 9

  print(10 / 4);   // 2 (Integer division)
  print(10 / 4.0); // 2.5 (Double division)
  print(10 / 0);   // Infinity

  print(10 % 7);   // 3
  print(10 % 7.0); // NaN (Modulo is undefined for non-integers)
%}
```

##### 5.2. Bitwise operations

The operators `&`, `|`, `^`, `<<`, `>>` and `~` allow to perform bitwise and,
or, xor, left shift, right shift and complement operations respectively.

The `~` operator is unary, means that is only applies to one operand.

```javascript
{%
  print(0 & 0, 0 & 1, 1 & 1);  // 001
  print(0 | 0, 0 | 1, 1 | 1);  // 011
  print(0 ^ 0, 0 ^ 1, 1 ^ 1);  // 010
  print(10 << 2);              // 40
  print(10 >> 2);              // 2
  print(~15);                  // -16 (0xFFFFFFFFFFFFFFF0)
%}
```

An important property of bitwise operators is that they're coercing their
operand values to whole integers:

```javascript
{%
  print(12.34 >> 0);   // 12
  print(~(~12.34));    // 12
%}
```

##### 5.3. Relational operations

The operators `==`, `!=`, `<`, `<=`, `>` and `>=` test whether their operands
are equal, inequal, lower than, lower than/equal to, higher than or higher
than/equal to each other respectively.

If both operands are strings, their respective byte values are compared, if
both are objects or arrays, their underlying memory addresses are compared.

In all other cases, both operands are coerced into numeric values and the
resulting values are compared with each other.

This means that comparing values of different types will coerce them both to
numbers.

The result of the relational operation is a boolean indicating truishness.

```javascript
{%
  print(123 == 123);     // true
  print(123 == "123");   // true!
  print(123 < 456);      // true
  print(123 > 456);      // false
  print(123 != 456);     // true
  print(123 != "123");   // false!
  print({} == {});       // false (two different anonymous objects)
  a = {}; print(a == a); // true (same object)
%}
```

##### 5.4. Logical operations

The operators `&&`, `||`, `??` and `!` test whether their operands are all true,
partially true, null or false respectively.

In the case of `&&` the rightmost value is returned while `||` results in the
first truish and `??` in the first non-null value.

The unary `!` operator will result in `true` if the operand is not trueish,
otherwise it will result in `false`.

Operands are evaluated from left to right while testing truishness, which means
that expressions with side effects, such as function calls, are only executed
if the preceeding condition was satisifed.

```javascript
{%
  print(1 && 2 && 3);                 // 3
  print(1 || 2 || 3);                 // 1
  print(2 > 1 && 3 < 4);              // true
  print(doesnotexist ?? null ?? 42);  // 42
  print(1 ?? 2 ?? 3);                 // 1
  print(!false);                      // true
  print(!true);                       // false

  res = test1() && test2();  // test2() is only called if test1() returns true
%}
```

##### 5.5. Assignment operations

In addition to the basic assignment operator `=`, most other operators have a
corresponding shortcut assignment operator which reads the specified variable,
applies the operation and operand to it, and writes it back.

The result of assignment expressions is the assigned value.

```javascript
{%
  a = 1;     // assign 1 to variable a
  a += 2;    // a = a + 2;
  a -= 3;    // a = a - 3;
  a *= 4;    // a = a * 4;
  a /= 5;    // a = a / 5;
  a %= 6;    // a = a % 6;
  a &= 7;    // a = a & 7;
  a |= 8;    // a = a | 8;
  a ^= 9;    // a = a ^ 9;
  a <<= 10;  // a = a << 10;
  a >>= 11;  // a = a >> 11;
  a &&= 12;  // a = a && 12;
  a ||= 13;  // a = a || 13;
  a ??= 14;  // a = a ?? 14;

  print(a = 2);  // 2
%}
```

##### 5.6. Miscellaneous operators

Besides the operators described so far, ucode script also supports a `delete`
operator which removes a property from an object value.

```javascript
{%
  a = { test: true };

  delete a.test;         // true
  delete a.notexisting;  // false

  print(a);              // { }
%}
```