Rulz and Rulz Language is Copyright © G.A.Jennings, 2015-2022.
Code Name: Tarzan
Though this code is part of the implementation of a (PHP) Rulz Interpreter, the code herein is not part of the Rulz Language Definition. (That just means that this code, though "Official" in the sense of both being by the same Author, ain't "Official" in the sense that any final release of the Rulz runtime executable will be based on it. This is "thinking" only.)
This presents a function to split an argument string into tokens. What a token is is defined by data in a unique format. An argument string is a PHP string with a format identical to a command line string in a shell.
The term Split, here, is similar to explode: split string into "array of strings, each of which is a substring of string". Given this string:
"word $var 123 'a b c'"
Result is this array of strings:
"word" "$var" "123" "'a b c'"
Since the argument string cannot be split on spaces or with a simple regular expression (there are many more argument definitions, or types), a simple algorithm has been devised.
(Spaces are not required to delimit arguments; the above argument string could be word$var123'a b c'.)
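To see why a plain split falls short, here is a minimal sketch using PHP's built-in explode on the example string. It is not part of the Rulz code; the variable names are only for illustration.

```php
<?php
// The example argument string (single-quoted so that $var stays literal).
$s = 'word $var 123 \'a b c\'';

// Splitting on spaces breaks the quoted string into three pieces.
$pieces = explode(' ', $s);
// $pieces has 6 elements: word, $var, 123, 'a, b, c'

// With the optional spaces left out, the first three arguments stay glued together.
$packed = explode(' ', 'word$var123\'a b c\'');
// $packed has 3 elements: word$var123'a, b, c'
```

Either way the quoted argument is cut apart, and in the second case the first three arguments are never separated at all.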
A global data array is used. There ain't nothing wrong with global data of itself (themselves) – depends on how it's used. Perhaps a function will be created for this data access, or perhaps a Macro, or whatever. For now, it's a global.
The language may be named Rulz, but the word "rules" is here because the Rulz language is built on a "list of statement rules". The global data is an associative array named $rules_def – the rule definition file. The data, as I repeat often, drives the code.
Without further "todo", here is the (working) code... without comments as it needs to be fully explained, and it is later.
function tarzan($s): array {
    global $rules_def;
    $def = $rules_def['defs'];
    $A = [];
    do {
        $D = GET($s,$def);
        if ($D == array()) {
            $s = substr($s,1);
            continue;
        }
        foreach ($D as $d => $na) {
            $t = EAT($s,$d);
            if ($t) {
                break;
            }
        }
        if (empty($t)) {
            break;
        }
        $A[][$t] = $d;
        $s = substr($s,strlen($t));
    } while ($s != '');
    return $A;
}
function GET($s, $def): array {
    $a = [];
    $c = $s[0];
    foreach ($def as $d => $v) {
        if (isset($v[0][$c])) {
            $a[$d] = '';
        }
    }
    if (empty($a)) {
        return $a;
    }
    if (count($a) == 1) {
        return $a;
    }
    if (strlen($s) == 1) {
        return $a;
    }
    $N = strlen($s);
    for ($i = 1; $i < $N; $i++) {
        $c = $s[$i];
        foreach ($a as $d => $v) {
            $n = count($def[$d]) - 1;
            $t = ($i > $n) ? $n : $i;
            $is = isset($def[$d][$t][$c]);
            if (!$is) {
                unset($a[$d]);
            }
            if (count($a) == 1) {
                break 2;
            }
        }
    }
    return $a;
}
function EAT($s, $d): string {
    $f = DFUNC($d);
    $i = $f($d,$s);
    $n = DMIN($d);
    if ($i < $n) {
        $t = '';
    } else {
        $t = substr($s,0,$i);
    }
    return $t;
}
/* END */
While the above code works, it won't run as shown (if you try it), because some data and the helper functions DFUNC and DMIN are missing. That's next though.
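Until the real pieces show up, here is a rough sketch of what could stand in for them so that EAT() can be tried on its own. Everything except EAT() itself is an assumption: the bodies of DFUNC() and DMIN(), the scanner functions scan_classes() and scan_quoted(), the minimum lengths, and the sample contents of $rules_def are all made up for illustration and are not the Rulz definitions.

```php
<?php
// Assumed sample data: each position maps allowed characters to ''.
// As in GET(), the last position is reused for all later characters.
$letters = array_fill_keys(range('a', 'z'), '');
$digits  = array_fill_keys(range(0, 9), '');
$GLOBALS['rules_def'] = ['defs' => [
    'word'     => [0 => $letters],
    'variable' => [0 => ['$' => ''], 1 => $letters],
    'number'   => [0 => $digits],
    'string'   => [0 => ['\'' => '']],
]];

// Assumed helper: name of the scanner function for a definition.
function DFUNC(string $d): string {
    return $d === 'string' ? 'scan_quoted' : 'scan_classes';
}

// Assumed helper: minimum token length for a definition.
function DMIN(string $d): int {
    return ['word' => 1, 'variable' => 2, 'number' => 1, 'string' => 2][$d];
}

// Hypothetical scanner: count leading characters that fit the definition.
function scan_classes(string $d, string $s): int {
    global $rules_def;
    $def = $rules_def['defs'][$d];
    $n = count($def) - 1;
    $N = strlen($s);
    for ($i = 0; $i < $N; $i++) {
        $t = ($i > $n) ? $n : $i;
        if (!isset($def[$t][$s[$i]])) {
            break;
        }
    }
    return $i;
}

// Hypothetical scanner: length of a quoted string including both quotes,
// or 0 when the closing quote is missing.
function scan_quoted(string $d, string $s): int {
    $end = strpos($s, '\'', 1);
    return $end === false ? 0 : $end + 1;
}

// EAT() as listed above.
function EAT($s, $d): string {
    $f = DFUNC($d);
    $i = $f($d,$s);
    $n = DMIN($d);
    if ($i < $n) {
        $t = '';
    } else {
        $t = substr($s,0,$i);
    }
    return $t;
}
```

With these stand-ins, EAT('$var 123', 'variable') gives '$var', and EAT('$ x', 'variable') gives an empty string because a lone '$' is shorter than the assumed minimum of 2.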
First, some thoughts on code documentation...
Take the following code, where $s is a string, and $def is an array.
$a = [];
$c = $s[0];
foreach ($def as $d => $v) {
    if (isset($v[0][$c])) {
        $a[$d] = '';
    }
}
What the hell does that do? Well, what it does, or rather, what its result is, depends on what $def is, right? Obviously it's gonna be weird if $def was like:
$def = [ 'abc' ];
Should it be like this?
$def = [ 'a' => 'abc', ];
Or should it be like this?
$def = [ 'a' => [ 'a', 'b', 'c' ], ];
My neighbor's dog is barking to tell me to get to the point! Okay, Johnny Lee! (He's from Alabama.)
The point is that – and I am kind of editorializing again – even the simplest of code needs to be documented. And documentation is not telling people what the code is doing (you can see that!) but why the code is being.
In this case that loop simply takes the first character of a string and creates a list of possible token definition names. PHP has several types, but so far this has been about handling (parsing or tokenizing) a string into pieces parts.
word$var123'a b c'
word $var 123 'a b c'
One string to four. As a final touch, the resulting array is actually to be:
$result = [
    "word" => "word",
    "\$var" => "variable",
    "123" => "number",
    "'a b c'" => "string",
];
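Note that tarzan() as listed stores each token with $A[][$t] = $d, which gives a list of one-pair arrays rather than the flat array above. Here is a minimal sketch of turning one into the other, assuming that list shape. The sample $A is written by hand, not produced by running the code.

```php
<?php
// Hand-written sample in the shape tarzan() builds with $A[][$t] = $d.
$A = [
    ['word' => 'word'],
    ['$var' => 'variable'],
    ['123' => 'number'],
    ["'a b c'" => 'string'],
];

// Flatten to token => type. The + operator keeps keys as they are,
// where array_merge would renumber the integer-like key "123".
$result = [];
foreach ($A as $pair) {
    $result += $pair;
}
```

A flat array can hold each token text only once, so an argument string with a repeated token would lose an entry. That may be one reason to keep the list form, though that is a guess.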
Now you'll be able to "see" what the definition data should be like.
$def = [
    "word" => [
        0 => [
            'a' => '', 'b' => '', 'c' => '', 'd' => '', 'e' => '', 'f' => '', 'g' => '', ..., 'z' => '',
        ],
    ],
    "variable" => [
        0 => [
            '$' => '',
        ],
        1 => [
            'a' => '', 'b' => '', 'c' => '', 'd' => '', 'e' => '', 'f' => '', 'g' => '', ..., 'z' => '',
        ],
    ],
    "number" => [
        0 => [
            '0' => '', '1' => '', '2' => '', '3' => '', '4' => '', '5' => '', '6' => '', ..., '9' => '',
        ],
    ],
    "string" => [
        0 => [
            '\'' => '',
        ],
    ],
];
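To try the first-character loop against data of this shape, the tables can be generated rather than typed out. This is only a sketch: first_char_candidates() is a hypothetical wrapper name around the loop shown earlier, and the tables cover lowercase letters and digits only, as in the listing.

```php
<?php
// Build the character tables instead of typing every key.
$letters = array_fill_keys(range('a', 'z'), '');
$digits  = array_fill_keys(range(0, 9), '');

$def = [
    'word'     => [0 => $letters],
    'variable' => [0 => ['$' => ''], 1 => $letters],
    'number'   => [0 => $digits],
    'string'   => [0 => ['\'' => '']],
];

// Hypothetical wrapper around the loop shown earlier: collect the names
// of all definitions whose first position accepts the first character.
function first_char_candidates(string $s, array $def): array {
    $a = [];
    $c = $s[0];
    foreach ($def as $d => $v) {
        if (isset($v[0][$c])) {
            $a[$d] = '';
        }
    }
    return $a;
}

$names = array_keys(first_char_candidates('$var', $def)); // ['variable']
```

A character that starts no definition, such as a space, gives an empty array, which is the case tarzan() handles by dropping one character and trying again.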
The method of that data's madness is the elimination of a search-gorithm. By that I mean having to call a library to search a string...
Say it was the "normal" (less mad):
$def = [
    "word" => [
        "abcdefghijklmnopqrstuvwxyz",
    ],
];
Somewhere then, a "search string" functionality is needed to see if "w" is the start of a "word".
$a = [];
$c = $s[0];
foreach ($def as $d => $v) {
    if (strpos($v[0],$c) !== false) { // or strstr or somethin
        $a[$d] = '';
    }
}
With the mad data it's just a test for existence.
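The two forms hold the same information, and one can be generated from the other. A small sketch of that, with $normal and $mad as made-up names for the two styles of data:

```php
<?php
// "Normal" data: one string of allowed first characters.
$normal = 'abcdefghijklmnopqrstuvwxyz';

// "Mad" data: the same characters turned into array keys.
$mad = array_fill_keys(str_split($normal), '');

$c = 'w';
$by_search = strpos($normal, $c) !== false; // library search through the string
$by_lookup = isset($mad[$c]);               // key existence test, no search call
```

Both tests agree for any single character, so the choice is about how the check is done, not what it answers.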
Of course, the madness doesn't end there...