Oforth Tutorial 1 : Basics

I - Oforth interpreter

Oforth interpreter is launched using --i option. If installation is ok (otherwise check OFORTH_PATH variable), you should see something like that, which means that oforth interpreter is now waiting for input :

/oforth>oforth --i
Oforth V1.2.0 , Copyright (c) 2014-2018 Franck Bensusan
Oforth comes with ABSOLUTELY NO WARRANTY; for details, see LICENSE.TXT file

Console detected (--i option), Loading package : D:\Home\Oforth\Oforth/packs/console.pkg
- Float support : yes
- Multi thread : yes
- TCP support : yes
- FFI support : yes

Memory used
- Code : 64 Ko ( 29388 bytes free)
- Dict+Heap : 64 Ko ( 14160 bytes free)
- Total : 128 Ko
ok
>

Interpreter is very simple : a word (separator is space) is read and performed before reading next word. For instance, if you write :

> 1 1.2 "aaaa"


Interpreter reads 1 and performs it, reads 1.2 and performs it, reads "aaaa" and performs it, and so on until there is no more word to read and perform.

Leaving the interpreter is done using #bye function.

> bye


II - Data stack

Unlike most languages, Oforth uses a stack, the data stack, to hold created objects, parameters and returned values. It is a Last In First Out stack.

To do so, Oforth uses a RPN notation : parameters are all pushed on the data stack before calling a function or method. The  function uses thoses parameters to do its job, then, if necessary, pushes its return value on the data stack. This can be strange at the beginning, but this allow to keep interpreter as simple as possible : no need for instructions separator, no need to read an entire instruction to decide what to do, the interpreter just reads a word at a time and performs it.

For example, in order to perform "1 + 2", you write :

>1 2 +

- "1" is read and performed, and the result is to push Integer 1 on the data stack

- "2" is read and performed, and the result is to push Integer 2 on the stack : now 2 integers are on the stack.

- "+" is read and performed,  and the result is to remove 2 objects from the stack, add them and push the result on the data stack.

After that, the data stack has only one object : integer 3. Function #.s can be used to print the data stack :

>.s
[1] (Integer) 3
ok

This means that the data stack has one element, that it is an integer and that its value is 3.

Remember that the data stack is a LIFO stack, so if you enter now :

>1.2 "aaa" "bbb" Integer 12 13 + .s
[1] (Integer) 25
[2] (Class) Integer
[3] (String) bbb
[4] (String) aaa
[5] (Float) 1.2
[6] (Integer) 3
ok

We see that :

1) Integer 3 (the result of 1+2) is now very far from the top of the stack : it is now at position 6

2) 12 and 13 have been added and 25 is on the top of the stack (position 1), without changing objects under them.

3) All objects on the stack have a class and a value.

4) Everything is an object. For instance, Integer is a class but it is also an object that can be pushed on the stack like any other object. Integer is an instance of class Class (at position 2 on this stack).

5) If we go a little deeper into the interpreter work : if a word is detected as an object, it is pushed on the stack and if the word is detected as a function or method, it is executed immediatly.

To finish with the stack, let's clear it, using #clr function

>clr .s
Empty
ok

Now the data stack is empty.

Using RPN is not natural and requires some practice. Let's calculate : sqrt(1 + ln(12.4))

>1 12.4 ln + sqrt .s
[1] (Float) 1.87555231135018
ok

1 has been pushed on the stack, then 12.4, then 12.4 has been replaced by ln(12.4) and 1 is still here, #+ added those two numbers, then #sqrt has replaced it with its square root, which, at the end, is the only object remaining on the stack. As you can see, with RPN, no parenthesis is needed : the stack holds all intermediary results.

If you want to print the stack after each command, you can use #.show function. Default is to not print the stack after each command.

You can also use #. to print AND consume the top of the stack.

III - Hello, World

Now that we are a little more familiar with the data stack, we can write our first word.

>"Hello, World" .
Hello, World ok

String "Hello, World" is pushed on the stack and #. removes it and send it to the console. So what remains is to write this code in a function.

A function is just a named piece of code that will be executed when the interpreter reads the function's name. A function declaration begins with : followed by function name, and ends with ; . Between, are the function instructions that will be executed when this function is called.

: hello
    "Hello, World" . ;
ok
>hello
Hello, World ok

Quite simple, but we must understand that #hello, like all functions or methods, has used the data stack : it pushed a string on it, then removed it.

The data stack is shared by all functions : you can imagine it as a pile of plates. When #hello runs and pushes a string on the stack, it adds a plate to the pile. And when #hello runs #., the string is removed from the stack and the pile is again as it was before #hello call.

To end with this first function, it is time to answer a question you might have in mind : from the beginning of this tutorial, why a '#' is used before functions or methods names ? We have to remember that, eveything is an object that can be pushed on the stack, used as parameter, ... It is also the case for functions and methods. Creating a function #hello is creating an instance of Function class, a function object with "hello" as its name. When "hello" string is sent to the interpreter, it detects a function name and execute the function. Now if we send #hello to the interpreter, it detects an object and pushes it on the stack.

>#hello .s
[1] (Function) #hello
ok

Object #hello is now on the stack. This object can be used as parameter for others functions or methods as any other object. For instance, #execute is a method that executes the object on the top of the stack :

>#hello execute
Hello, World
ok
>#hello #execute #execute #execute execute
Hello, World
ok

Various methods can be used on function or method objects. Let's finish this chapter with some examples :

>#hello name .
hello
ok
>10 #hello times
Hello, World Hello, World Hello, World Hello, World Hello, World Hello, World Hello, World Hello, World Hello, World Hello, World ok
>#hello bench .
Hello, World 236
ok

For the last example, #bench executes the top of the stack and returns the number of microseconds elapsed. Here, it took 236 microseconds to execute #hello.

V - Data stack operations and conventions

Lots of words allow to manipulate the data stack elements. The data stack only holds references to objects. Objects themselves (with the big exception of integers) are stored elsewhere (on the heap or into the dictionary). So, when you manipulate the data stack elements, you manipulate only references to objects. When you remove an element from the DS, you are just removing a reference, you don't remove an object. All objects, whatever their size is (strings, list, ...), consume only one slot, that holds the reference to this object.

Comments allow to describe the effect of a word on the data stack. Comments are initiated using #\ or #--. A stack effect is decribed by showing elements used by a word, then --, then elements returned by a word. The convention is that the top of the stack is always to the right part. For instance :

\ a b c -- d e

decribes a word that takes 3 elements from the stack (the top of the stack is c) and that, when returning, leaves 2 elements on the stack (the top of the stack is e).

Convention for elements type is :

n : an integer
u : an unsigned integer
b : a boolean (false or true)
f : a float
c : a character
s : a string
x,y,z : an object, whatever its type is
[x] : a collection of objects


Now, we can introduce words that manipulate the data stack :

dup   \ x -- x x : duplicates the top of the stack

Again, only the reference is duplicated, not the object itself. For instance :

>"abcd" dup .s
[1] (String) abcd
[2] (String) abcd

Here, the data stack, hold 2 times the same object.


drop   \ x -- : Removes the top of the stack

swap   \ x y -- y x : Swap the 2 elements on top of the stack

over   \ x y -- x y x : Copy the second item on top of the stack

tuck   \ x y -- y x y : Copy the top of stack under the second element

rot   \ x y z -- y z x : Rotate the position of 3 elements (third becomes top)

tor   \ x y z -- z x y : Rotate the position of 3 elements (top becomes third)

nip   \ x y -- y : Removes the second element (same as "swap drop")


Other words exist to manipulate the stack, but these words will be used during the tutorial. You can see "lang" references for a list of all words.

VI - Oforth words

Words are named objects defined into the dictionary :

  1. They are available during all the current session and are not removed by garbage collector even if not used.
  2. They are detected by the interpreter that will handle them according to their class.
  3. Two words can't have the same name : if a new word is created with the same name, an exception is thrown.

Words define Oforth object metamodel. They can be :

  • a class : it is an instance of Class class and is used to instanciate objects of this class. When the interpreter executes a class, it pushes it on the stack.

>Integer .s
[1] (Class) Integer
ok

  • a property : it is an instance of Property class and is used to set properties to classes. Like classes, when the interpreter executes a property, it pushes it on the stack

>Comparable .s
[1] (Property) Comparable
ok

  • a function : it is an instance of Function class and is used to name a piece of code. When the interpreter executes a function, it executes function's code.
  • a method : it is an instance of Method class. Each class can have an specific implementation of a method (polymorphism). When the interpreted executes a method, it execute a specific implementation according to the class of the object on the top of the stack.
  • a constant : it is an instance of Constant class. When the interpreter executes a constant, it pushes constant value on the stack.

>PI .s
[1] (Float) 3.14159265358979
ok

Other words type exist but we will discuss about them later.

VII - Functions

Functions are named piece of code. A general function declaration is :

: function_name
| var1 var2 ... varn |
    instructions ;

A function declaration begin by #: and ends with #; . Into this declaration :

  • function_name is the name of the function, which must not be the name of an already defined word.
  • var1, ... varn are function local variables
    If the function does not declare local variables, || are not required (see #hello)
    They are initialized to null value.
    They can be used to store intermediary results instead of keeping them on the data stack.
  • Using a variable name into instructions will push the variable value on the stack.
  • Using "->var" into instructions will remove the top of the stack and store it into variable var.
  • instructions are function code that will be executed when the function is called. A instruction can be :
    • Push a word on the stack (a constant, a class, ...)
    • Push a local variable on the stack : this is done by just using variable name
    • Remove the top of the stack and store it into a local variable : this is done using " ->var_name"
    • Call a function or a method : this is done by just using function or method name.
    • Use a built-in instruction : ifTrue:, ifFalse:, else:, while, doWhile, loop: , for:, try:, return, ...

Local variables define function environment. This environnement is not on the stack, it it linked to function call and is removed when the function returns.  So storing the top of stack  into a variable will remove it from the stack and store it into the function environnement. It then can be pushed on stack, used to call other functions, ...

Example 1 : square

: square   \ x -- y
   dup * ;

#dup duplicates the top of the stack. So, to calculate the square of a number, we #dup it and call #*. The result is already on the stack, so there is nothing more to do (this is just an example, this word already exists as #sq into lang package).


Example 2 : fact (imperative)

: fact   \ n1 -- n2
   | i | 1 swap loop: i [ i * ] ;

The comment says that #fact takes an integer from the stack and leaves an integer on the stack when it returns.

Into the function body :

  1. A local variable is declared (loops need a local variable).
  2. 1 is pushed on the stack.
  3. #swap exchange the two objects on top of the stack, so the number we want to calculate the factorial is now on top of the stack and 1 is behind.
  4. loop: i [ instructions ] is a built-in instruction : it removes an integer from the stack and will loop on instructions for each integer between 1 and this integer (included). Into thoses instructions, the local variable can be pushed on the stack and contains the current index.
  5. So here, i is pushed on the stack, the two integers on top of the stack are removed, multiplied and the result is pushed on the stack.

Do you agree that this function will replace the top of the stack with its factorial ? If you are not familiar with RPN and data stack and don't see how this factorial is calculated, try to follow the data stack while the loop is performed. I suggest you to not continue this tutorial before you fully understand this example and how the data stack is used.

So, let's try this function :

>10 fact .
3628800 ok

Integers have arbitrary precision, so you can also try :

>100 fact .
9332621544394415268169923885626670049071596826438162146859296389521759999322991560894146397615651828 6253697920827223758251185210916864000000000000000000000000 ok

This example also introduces a new convention : words that require something after to work (which is not the natural way with RPN) are postfixed by ':'. For instance, #loop: needs a local variable to work, so it is postfixed.


Example 3 : fact (imperative) with a parameter

Oforth allows to declare parameters on functions. Parameters are local variables, but are initialized with values on the stack (instead of null value) and those values are removed from the stack. Like local variables, parameters are stored into the function environnement. Parameters are never mandatory, but they can sometimes simplify stack operations. Let's rewrite #fact with a parameter :

: fact( n -- n! )
   | i | 1 n loop: i [ i * ] ;

A parameter (n) has been declared for #fact. So, when #fact is executed, it removes the top of the stack and store it as parameter value. Here, this parameter is just pushed on the stack after pushing 1, and a #swap is not needed. There is no really advantage to use a parameter here.

Note for Forthers : Oforth does not provides direct access to return stack with words like >R and R> so, sometimes, declaring locals (parameters or variables) will be required.


Example 4 : fact (recursive)

: fact( n -- n1 )
   n ifZero: [ 1 ] else: [ n dup 1 - fact * ] ;

Now fact is defined as a recursive function. This function introduces tests. Oforth provides various tests and they all work the same way : they remove the top of the stack and test it. If the test is OK, instructions within brackets that follow are executed. Otherwise, the function jumps after those instructions. A test can always be followed by a #else: block. If present, this block is only performed if the test is KO.

Tests provided are :

ifTrue: [ ... ] : instructions are performed only if the top of the stack is not "false" (ie not 0)
ifFalse: [ ... ] : instructions are performed only if the top of the stack is "false" (ie 0)
ifZero: [ ... ] : instructions are performed only if the top of the stack is 0
ifNull: [ ... ] : instructions are performed only if the top of the stack is null
ifNotNull: [ ... ] : instructions are performed only if the top of the stack is not null

Now the function : if n is 0, #fact just returns 1 on the stack. Otherwise, it pushes n, pushes n-1, call #fact recursively and multiply the result. Because n is a parameter of #fact, its value is stored into environnement call and is different for each call to #fact.


Example 5 : fibonacci sequence, introduction to blocks

Fibonacci sequence is u(n+1) = u(n)+u(n-1). It can be written by many ways.

The first one is to use a reccurence :

: fib( n -- n1 )
   n 1 <= ifTrue: [ n ] else: [ n 2 - fib n 1 - fib + ] ;

Not much to say about this implementation, it is the same logic than #fact.

The second one is to use the data stack to hold intermediary results. If fib(n-1) and fib(n) are on the stack, what are the instructions that transform them into fib(n) and fib(n+1) ?
A way to do this is : "tuck +" :
- fib(n-1) fib(n) --> tuck --> fib(n) fib(n-1) fib(n) --> + --> fib(n) fib(n+1)

So performing n times "tuck +" on 1 1 will calculate fib(n) and fib(n+1) and, after removing fib(n+1), fib(n) will remains on the data stack.

: fib( n -- n1 )
   | i | 1 1 n loop: i [ tuck + ] drop ;

But Oforth does not like loops : loops are prone to bugs, they need a local variable to be declared and they are not very "functional". Here, we just want to perform "tuck +" n times, without using the index. So we can use the #time function we used on #hello earlier.

times    \ n aWord -- ... : execute the word n times

For this, we could create a word that perform "tuck +" and use this word into #fib. But what would be the name of this word ? Names created must be representative and reusable. Here, it seems better to use an anonymous function. Forthers call this a quotation, Oforth calls this a block. A block is a word with no name and is declared like this :

#[ tuck + ]

When a block is created, it is pushed on the stack like any other object. Then it can be used as parameter and can be executed using #execute. So the new fib version could be

: fib( n -- m )
   1 1 n #[ tuck + ] times drop ;


Blocks are the reason why syntax for tests is different from classical Forth. When you write :

ifTrue: [ ... ]

this simulates conditional block execution according to the top of the stack and is comparable to block syntax.

Blocks are a powerful feature. Functions can return blocks and you will see later that function can even return a block that uses function's variables. In this case, blocks are closures that will react according to parameters of the function that create them (even if the function that created them have returned). For instance, you can write :

>: f( a -- aBlock )
   #[ a + ] ;
>12 13 f execute .
25 ok


Example 6 : Sum of digits of a number

This functions will introduce the #while loop. This loop allows to perform instructions while a test is OK :

while( instructions1 ) [ intructions2 ]

instructions1 are performed and must leave an object on the stack. If this object is not false, instruction2 are performed and the program flow jumps back to perform instruction1 again. This will continue until instruction1 returns false (or 0). In this case, the functions jumps after the instruction2 block

: sumDigits( n -- n1 )
   0 while( n ) [ n 10 /mod ->n + ] ;

n is pushed on the stack. If n is 0, the function jumps after the block (and finish). If n is not 0 the block is performed using /mod, which is a word that returns 2 objects : the remained and the quotient of two numbers :

/mod    \ n d -- remainder quotient

If #fact is still declared into your system, you can try :

1000 fact sumDigits .
10539 ok


Example 7 : Parameter's order

When more than one parameter are declared for a function, the last declared parameter value is retrieved from the top of the stack, and so on until all paramters are removerd from the stack. So, declaring this function :

: f( a b -- n ) a b - ;

will store (and remove) the top of the stack into parameter b, will store (and remove) the second element and store it into a. So this function will calculate the second element of the stack minus the top. Its stack effect will be : \ n m -- (n-m). This code is equivalent to :

: f | a b | ->b ->a a b - ;


IX - Basic types

Oforth defines some basic types that are detected by the interpreter. Interpreter loop will :

  1. Retrieve the next word (until a space is detected).
  2. Check if this word is defined into the dictionary.
  3. If the word is a function or a method, execute it
  4. Otherwise, if the word is a constant, push its value on the stack.
  5. Otherwise, if the word is an integer, push that integer on the stack.
  6. Otherwise, if the word is a float, push that float on the stack.
  7. Otherwise, throw an exception : "Can't evaluate".

Integers
Integer is the most basic type. They have arbitrary length. If an integer is a small integer (-2^30 - 2^30 on 32bits), it is pushed on the stack. Otherwise, an object is created and its reference is pushed on the stack.
Integers respond to lots of words (see the Oforth manual for a complete list). Among those words, we find :

+    \ n1 n2 -- n1+n2 : sum of two integers.
-    \ n1 n2 -- n1+n2 : diffence of two integers.
*    \ n1 n2 -- n1*n2 : multiplication of two integers.
+    \ n1 n2 -- n1/n2 : division of two integers.
mod    \ n1 n2 -- n3 : Returns the remainder of n1 by n2
/mod    \ n1 n2 -- n3 n4 : n3 is the remainder of n1 by n2 and n4 is the quotient.


Booleans
Oforth does not have a boolean type. Boolean are integers : 0 is the boolean false and 1 is the bollean true. Everything that is not false is considered as true.
Integers respond to logical words :

not    \ b1 b2 -- b
and    \ b1 b2 -- b
or    \ b1 b2 -- b
xor    \ b1 b2 -- b


Floats
Floats are also detected by the interpreter.
They respond to lots of words (see "lang" package reference for a complete list. Because Oforth uses polymorphism, the same words are used for integers and for floats. Among those words, we find :

+    \ f1 f2 -- f1+f2 : sum of two floats.
-    \ f1 f2 -- f1+f2 : diffence of two floats.
*    \ f1 f2 -- f1*f2 : multiplication of two floats.
+    \ f1 f2 -- f1/f2 : division of two floats.


Floats and integers are comparable and respond to all mathematical functions :

<=    \ n1 n2 -- b : Returns true if n1 <= n2, false otherwise
<    \ n1 n2 -- b : Returns true if n1 > n2, false otherwise
>    \ n1 n2 -- b : Returns true if n1 > n2, false otherwise.
>=    \ n1 n2 -- b : Returns true if n1 >= n2, false otherwise.
cos, sin, tan, acos, asin, atan, ...


Characters
Oforth does not have a character type. Characters are integers which value is character unicode value.

>'a' >upper .
65 ok


Strings
Strings are an array of UTF8 characters. They are created using "".

>"abcd" "efgh" + .
abcdefgh ok
>"abcdef" toUpper .
ABCDE ok


IX - Constants and immutability

Immutability, even is not mandatory (a mutable class can be created, see Classes chapter), is the default behaviour. Objects have to be immutables for some particular uses and this immutability is checked at runtime.

Among immutability contraints, we find :

  1. Set class immutables attributes (discussed into Classes chapter).
  2. Pass objects between tasks (discussed into Parallelism chapter).
  3. Set constant value.

A constant is a word that will be visible by all functions and methods. A strong constraint is to block mutable objects for value of constants.

A constant is created using the following syntax :

value const: constant_name

For instance (see Float.of file) :

-1.0 acos const: MyPi

Now Pi is defined as a word and each time the interpreter reads a word that is a constant, it pushes its value on the stack.

Classes does not have a syntax to declare internal constant. In order to link a constant to a class a naming convention is used : put the class name as the first part of the constant name. For instance (see TCPSocket.of) :

2 const: TCPSocket.IPV6


X - Methods and polymorphism

All the words we have created until now are functions. Functions are Forth words. In order to allow Object Oriented Programming, Oforth extends words to a new kind of words : classes and methods. Methods are words that allow to specify a particular implementation for each class.

A general method implementation declaration is :

aClass method: method_name(param1 param2 ... paramn)
| var1 var2 ... varn |
   instructions ;

This declaration defines an implementation of method_name for aClass.

Unlike functions, when a method is executed, it need a receiver to decide which implementation to use. And this receiver is always the top of the stack. When a method is called :

  1. The virtual machine checks the top of the stack object and retrieves its class.
  2. Then the correct method implementation is choosen according to this class. If no method implementation can be found for this class, an exception ExDoesNotUnderstand is raised.
  3. Then the top of the stack is removed from the stack and is stored into a special parameter : self, which is the method receiver. This parameter does not have to be declared : all methods have this parameter.
  4. If a method has declared parameters, they are handled like functions (and they should be "under" the receiver).
  5. Then the method implementation body begins, like functions.

The most important thing to understand with method implementations is point 3) : the top of the stack, like other parameters, will be removed from the stack by the virtual machine before method body is executed and stored into method environnement as "self" during body execution.


Example 1 : myDrop

>Object method: myDrop ;
ok
> 12 myDrop .s
Empty
ok

When myDrop is executed, the virtual machine check for the top of the stack. 12 is an Integer and Integer is a subclass of Object, so the virtual machine find an implementation of #myDrop compatible with Integer and execute it.

This example illustrate point 3) : the receiver (here 12) has been removed from the stack and, when the method ends, nothing is left on the stack.


Example 2 : myDup

>Object method: myDup    \ x -- x x
   self dup ;

Because the receiver has been removed from the stack, we have to push it, then dup it in order to implement a #dup method.


Example 3 : minusOne

>Integer method: minusOne    \ n1 -- n2
   self 1 - ;
ok
> 10 minusOne .s
[1] (Integer) 9
>"aaaaa" minusOne .s
[console:1] #Exception : String aaaaa does not understand method <#minusOne>
ok

#minusOne pushes the receiver one the stack and removes one to it. Because implementation has been declared for Integer, it works for integers but not for strings.


Most of built-in words are methods. This allows to have the same name for the same operation with different implementations. For instance #+ is a method and is declared for various classes: Integer, Float, String, ... You only have to remember one name : + and it will work for all objects.


Virtual methods implementation : 

When #method: is used to implement a method for a particular class, this implementation will be used if the receiver is an instance of this class or its subclasses. But subclasses can't redefine this implementation. In order to define an implementation that can be redefined into suclasses, #virtual: should be used instead of #method:

Example 4 : virtual implementation

Collection virtual: whoami
   "I am" . ;

Array method: whoami
   super whoami
   "an array ..." .
;

String method: whoami
   super whoami
   "a string ..." .
;

>[ 1, 2, 3 ] whoami
I am an array ... ok
>"abcd" whoami
I am a string ... ok

#super allows to call the method implementation declared at the parent level (and, if it does not exist, at the first level the method is declared into the hierarchy).

Some advices for virtual methods : 

  1. By default, use #method: and not #virtual. It is not a performance advise. If a subclass should not redefine a method, using #method: prevents it to do so.
  2. If subclasses have to redefine a method and #virtual is needed, we should think twice about it : is our class hierarchy truly good ? Why do subclasses really need to redefine a method if they really implement a "is a" relation.
  3. If subclasses really need to redefine the method, then declare it as virtual at the parent level.

XI - Another way to pass parameters to functions and methods

The classical way to pass parameters to function and methods is to push them on the stack before calling the function. For instance, with this function :

: fib( n -- n1 )
   n 1 <= ifTrue: [ n ] else: [ n 2 - fib n 1 - fib + ] ;
ok
>
10 fib . ;
55

Oforth interpreter allows a postfix syntax to pass parameters :

>fib( 10 ) .
55

This syntax is just "sugar" : in fact, interpreter will translate this call into "10 fib" before calling #fib. This "sugar" can also be used into function definitions. Here, we could write :

: fib( n -- n1 )
   n 1 <= ifTrue: [ n ] else: [ fib( n 2 - ) fib( n 1- ) + ] ;

Here again, this is just "sugar" : interpreter will translate this version into the previous one before compiling this function.

To be more general, if a, b and c are objects and f is a function :

a b c f
f( a, b, c )

are the same call to function f (and the second one is converted into the first call by the interpreter).

Just like functions, the interpreter also allows this "sugar" syntax for method calls. But unlike functions, the receiver must always be on the top of the stack, so, if aObject is the receiver and m a method:

x m( a, b, c )
is converted into :
a b c x m


XII - Collections and high order functions and methods

Even if Oforth is an imperative language, it can handle some functional programming features. But first, let's create some collections.

The main collection is the array (an instance of Array class).

An array can be created explicitly using syntax : [ elem1, elem2, elem3, ..., elemn ]

Elements do not have to be of the same class.

>[ 1.2, "aaaa", [1, 2, 3], 9 sqrt, null, Integer ] .s
[1] (Array) [1.2, aaaa, [1, 2, 3], 3, null, #Integer]
ok

And #seq create integer sequences :

>15 seq .s
[1] (Interval) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
ok

Collections are also created as the result of high order functions. These functions work on collections by applying functions, methods or blocks on each element of the collection. Some high order methods (see lang/Collection.of for more) :

  • #apply : this method applies a runnable on each element of a collection
  • #map : this method applies a runnable on each element of a collection and regroup results into a new list.
  • #filter : this method filters a collection according to a test and returns a new collection with elements that conform with this test.
  • ...

First let's see how #apply works, and the best way to undertand it, and its interactions with the data stack, is to see #apply source code (into lang/Collection.of) :

Object method: apply( m )   \ x -- ...
   | o | self forEach: o [ o m execute ] ;

#apply removes an object from the data stack (the collection) and enters into a loop : for each element of the receiver it pushes this element, then the runnable sent as parameter and calls #execute to execute it. If something is created on the stack, it is not removed and stay on the stack.

For instance :

>#1+ 5 seq apply .s
[1] (Integer) 6
[2] (Integer) 5
[3] (Integer) 4
[4] (Integer) 3
[5] (Integer) 2
ok

#1+ has been applied on each element of the collection and the results stay on the stack.

>0 #+ 100 seq apply .
5050 ok

Here, we push 0 on the stack, then call #apply on [1..100] with #+ as parameter : for each element, this element is pushed on the stack, then #+ is executed, which result is the sum of all integers from 1 to 100.

Here, it is interesting (but not mandatory at all) to use the "sugar" syntax presented before to clearly show that #+ is a parameter of #apply :

>0 100 seq apply( #+ )

Of course, blocks can also be sent as parameter to #apply :

>0 #[ sqrt + ] 1000 seq apply .
21097.4558874807 ok

Block sent as parameter is executed on each element of the collection : sqrt is calculated, then #+ accumulate the result, so this calculates sum of square roots of all integers from 1 to 1000

#map differs from #apply as all results are regrouped into a new collection. #map is not loaded at startup, so you have to load the mapping package :

>import: mapping
>#sqrt 5 seq map .s
[1] (Array) [1, 1.4142135623731, 1.73205080756888, 2, 2.23606797749979]

Here, #sqrt has been applied on each element and the top of the stack has been removed and added to a new array. This array is what remains on the stack when #map finishes.

Example : calculate sum of x^2+1 for all integers between 1 and 1000000 :

>0 #[ sq 1+ + ] 1000000 seq apply .s
[1] (Integer) 333333833334500000

Example : calculate sum of square roots of all even integers between 1 and 10000

Example : do you know this sequence ?

1
11
21
1211
111221
312211
13112221

each term is the result of "reading" the previous one. For instance, 1211 has 1 1, 1 2 and 2 1, so the next term is 111221.

Using #group and #mapFlat, we can calculate this sequence :

#group takes a collection and returns new collection with all identical adjacents elements regrouped int sublists:

>[ 1, 2, 1, 1 ] group .s
[1] (List) [[1], [2], [1, 1]]
ok

#mapFlat maps a collection using a collection of runnables and send the results into a new collection.

>[ #1+, #1- ] [ 1, 2, 1, 1 ] map .s
[1] (Array) [[2, 0], [3, 1], [2, 0], [2, 0]]

>[ #1+, #1- ] [ 1, 2, 1, 1 ] mapFlat .s
[1] (Array) [2, 0, 3, 1, 2, 0, 2, 0]
[2] (Array) [[2, 0], [3, 1], [2, 0], [2, 0]]
ok

So, if an element of this sequence is on top the the stack, group mapFlat( [ #size, #first ] ) will return next element :

>[ 1, 2, 1, 1 ] group mapFlat( [#size, #first] ) .s
[1] (Array) [1, 1, 1, 2, 2, 1]
ok

Finally, we just have to perform this n times to calculate the sequence :

import: mapping

: lookAndSay( n -- coll )
   [ 1 ] n #[ group mapFlat( [ #size, #first ] ) ] times ;
ok
>6 lookAndSay .s
[1] (Array) [1, 3, 1, 1, 2, 2, 2, 1]
ok


X - Blocks revisited

We have seen that blocks are unamed piece of code : #[ sqrt + ] creates an object on the stack and, when this object is executed, block code is executed.

A block body can also use variables and parameters (and receivers) of the function or method that creates it :

: dynblock( n -- bl )
   #[ n + ] ;

This functions creates blocks on the stack and block body uses function parameter n. When the function finishes, function environnement is deleted but the block will keep the value this parameter has when the block was created :

>12 2 dynblock .s
[1] (Block) aBlock
[2] (Integer) 12
ok
> execute .s
14
ok

Here, the created block will keep the value of the parameter until garbage collector deletes the block. The block can be executed long after the function call has been done, the parameter value is now included into block environnement. This behaviour is similar to a functionnal language returning a function.

Example :

: deriv( f -- bl)
   | x | #[ ->x x 0.0000001 + f execute x f execute - 0.0000001 / ] ;

This function returns a block that uses function parameter f. When this block will be executed, it will take the top of the stack (x) and calculate (f(x+e) - f(x)) / e.

Let's try with f(x) = x^2 + x

>2.0 #[ dup sq + ] deriv execute .s
[1] (Float) 5.00000009395762
ok

You cannot create locale variables dedicated to a block, but local variables of the function that created the block can be used into the block body. This is what has been done into #deriv : a local variable x has been created into function body and used into the block.

Oforth Documentation

Current available documentation :

I - Tutorials

  • Basics : here
  • Syntax : here
  • Writing programs : here
  • Oforth classes : here
  • Parallelism : here
  • Oforth node : Work in progress...

II - Reference

  • Lang package reference : here