The String Library

(v.6.1 27-10-16)
W O Riha

1. Introduction

1.1 Basic Definitions

In Shen, a unit string is represented by double quotes flanking a single character, e.g. "A", "$", "2". Shen supports the full keyboard set for unit strings and, on current platforms, the ASCII set, whose codes lie in the range 0-127. The notation "c#N;", where N is a decimal integer, gives access to non-keyboard characters; e.g. "c#64;" is read as "@". It is thus possible to access the full Unicode set if the platform permits it, but this is not guaranteed by the specification.

A string is the result of n (n ≥ 0) concatenations of unit strings to the empty string "" using the two-place primitive cn, e.g. (cn "h" (cn "e" (cn "l" (cn "l" "o")))) = "hello". The polyadic @s can also be used: (@s "h" "e" "l" "l" "o") = "hello". See section 3.1.

1.2 Important Note

A subtype ustring of type string was previously used to provide increased type security. However, it was decided to drop this type. Where a function requires a bona fide ustring, this is checked by the function using the predicate ustring?.

Earlier versions also made use of a type integer, a subtype of type number, again for reasons of type security. It turns out, however, that the string library works perfectly well without such a type, as most of the functions return meaningful results when invoked with non-integer values where whole numbers are normally expected. For examples, see the note on non-integer arguments in section 5.8.

1. Introduction

1.1 Basic Definitions
1.2 Important Note

2. Obsolete Data Type ustring

3. Data Type string

3.1 The Primitive String Functions
3.2 Other String Functions

4. Definitions

5. The Library Functions

5.1 Remarks and Conventions
5.2 Original Unit String Predicates
5.3 String Predicates
5.4 Extending Unit String Predicates to Strings
5.5 Extending Unit String Functions to Strings
5.6 String Comparison
5.7 Length Functions
5.8 Selection
5.9 Searching
5.10 Replacing, Inserting and Tokenising
5.11 List and String Conversion
5.12 Miscellaneous
5.13 String to Number Conversion
5.14 Radix Conversion

2. Obsolete Data Type ustring

Unit strings are the building blocks of strings; they may therefore be viewed as a generalisation of the ‘characters’ used in other languages. The set of characters in such languages constitutes a type, often called char, equipped with specific operations and/or functions, e.g. char-upcase, char-lowercase? etc.

Shen’s type system does not distinguish between strings and unit strings: a unit string is just another string. There is nothing wrong with such an abstraction; however, there are situations where it is useful to draw a distinction, as it is not always clear, for example, whether a function produces a string or a unit string, or whether it is defined only for unit strings: a signature string --> string is not very informative. For this reason, a subtype ustring was introduced in earlier versions of this library; it has since been dropped.

ustring? : string --> boolean
Input: A string S.
Output: true if S is a unit string, otherwise false.

(ustring? "12")
false : boolean

(ustring? "A")
true : boolean
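
As a rough illustration (not the library's actual definition, which is not shown in this document), a predicate with this behaviour can be sketched from the primitives tlstr and = alone. The name my-ustring? is hypothetical.

```shen
\\ Hypothetical sketch: a string is a unit string if it is non-empty
\\ and removing its first unit string leaves the null string.
(define my-ustring?
  {string --> boolean}
  S -> (and (not (= S "")) (= (tlstr S) "")))
```

Because and evaluates its arguments from left to right and stops at the first false one, tlstr is never applied to "". With this definition, (my-ustring? "A") gives true, while (my-ustring? "12") and (my-ustring? "") give false.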

3. Data Type string

string is one of the basic data types in Shen. It comes equipped with a number of functions and predicates.

3.1 The Primitive String Functions

The following seven primitive string functions are defined (see Document The Primitive Functions of KLambda).

string? : A --> boolean
Input: Any object X.
Output: true if X is a string, otherwise false.

(string? Willi)
false : boolean

(string? "Willi")
true : boolean


n->string : number --> string
Input: A number (integer) N.
Output: If N is recognised as a code point, the corresponding unit string, otherwise an error message.

(n->string 65)
"A" : string

Only the code points c#N; with 0 ≤ N < 128 and 160 ≤ N < 256 are recognised in SP, e.g.

(n->string 165)
"¥" : string

The current library only caters for codes < 128 (the ASCII codes).

The inverse of n->string is the function


string->n : string --> number
Input: A string S.
Output: The code of the leading unit string of S, or an error message if S is the null-string.

(string->n "A")
65 : number

(string->n "shen") \\ returns the code of the first unit string
115 : number

(string->n "")
The value NIL is not of type CHARACTER. \\ in SP

Note: On the Lisp platform, the following identity holds for any N with 0 ≤ N ≤ 1114111 (= 0x10ffff): (string->n (n->string N)) = N, i.e. string->n is a left-inverse of n->string. Try it for N > 255!

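cn : string --> string --> string
Input: Two strings S1 and S2.
Output: The concatenation of S1 and S2.

Note: (@s S1 S2) is equivalent to (cn S1 S2). @s, however, is more general, as it allows repeated concatenation.

(@s "Dr. " "Willi" " " "Riha")
"Dr. Willi Riha" : string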


str : A --> string
Input: Any atom of type A (boolean, symbol, string, number).
Output: The normal form of A as a string.

(str 3.141592653) \\ conversion of a number
"3.141592653" : string

(str (+ 2 2))
"4" : string \\ expressions are evaluated before conversion

(str Shen) \\ a symbol is converted
"Shen" : string

(str "Shen") \\ a string is converted to a string
""Shen"" : string

Note: Even though Shen declares that the output is of type string, it cannot be entered again as input because ""Shen"" is parsed as being two empty strings flanking the symbol Shen.

""Shen""
"" : string
Shen : symbol
"" : string


pos : string --> number --> string
Input: A string S and a (natural) number N.
Output: The Nth unit string in S, if N is an integer satisfying 0 ≤ N < (string.length S), otherwise a platform-dependent error message.

(pos "12345" 2)
"3" : string \\ indexing starts at 0!


tlstr : string --> string
Input: A string S.
Output: If S is the null-string, a platform-dependent error message, otherwise S without its first unit string.

(tlstr "Willi")
"illi" : string

3.2 Other String Functions

hdstr : string --> string
Input: A string S.
Output: The first unit string of S – or an error message if S is "".

(hdstr "Willi")
"W" : string

A very useful string constructor is


make-string (no type)
For Input/Output, see Strings, Bytes and Unicode in the Shen Document.

(define hrs-mins
{number --> number --> string}
H M -> (make-string "~A hrs ~A mins" H M))

(hrs-mins 12 45)
"12 hrs 45 mins" : string

4. Definitions

Let S be a string consisting of L ≥ 0 unit strings, S[0], S[1], …, S[L-1].
To emphasise this fact, we use the notation S[0..L-1].
If L = 0 then S is the null string, denoted "".
The length of S is L.
If M ≤ N, then the substring S[M..N] of S is the string composed of the unit strings S[M], …, S[N];
otherwise the substring is the null-string "".
S1 is a prefix of S iff S1 is "" or equal to S[0..M], for some M.
S1 is a suffix of S iff S1 is "" or equal to S[M..L-1], for some M.

5. The Library Functions

5.1 Remarks and Conventions

  1. The choice of functions (and their names) was inspired by the Scheme SRFI-13 string libraries. Identifiers usually include the prefix string., except when this is redundant and/or clumsy. For example, string.map (see below), but not string.substring.
  2. The notation employed in the code makes a distinction between unit strings and proper strings: unit string variables are denoted by S, S1, S2, …, whereas string variables are named Str, Str1, Str2,…, for example (@s S Str). This notation is not used in the present document, where the meaning of each parameter is explained in detail.
  3. An attempt was made to be consistent when passing arguments to a function: the string being operated upon always comes last. For example, string.take : number --> string --> string, which returns a prefix of specified length e.g. (string.take 4 "ABCDEF"), or string.replace-all : string --> string --> string --> string, where the third argument is the target string (see description).
  4. All functions are tail-recursive.
  5. Error messages are kept to a minimum. The functions are robust and will not cause a system crash, when supplied with illegal or silly arguments. For example,

(substring 2.9 5.1 "01234567") \* bounds should be integers *\
"3456"

  6. Where the names of string package functions overlap or potentially overlap with Shen ones the prefix string. has been used; e.g. string.reverse reverses a string whereas reverse reverses a list.

5.2 Original Unit String Predicates

Note: These predicates were previously defined for type ustring, resulting in a type error when a predicate is invoked with a string argument. They can now be invoked with any strings.

digit? : string --> boolean
Input: A string S.
Output: true if S is in ["0", "1", …,"9"], otherwise false.

(digit? "1")
true : boolean


uppercase? : string --> boolean
Input: A string S.
Output: true if S is an upper-case letter "A","B", …,"Z", otherwise false.

(uppercase? "4")
false : boolean


lowercase? : string --> boolean
Input: A string S.
Output: true if S is a lower-case letter "a","b",… "z", otherwise false.

(lowercase? "qq")
false : boolean


letter? : string --> boolean
Input: A string S.
Output: true if S is an upper- or lower-case letter, otherwise false.

(letter? "q")
true : boolean

(letter? "qq")
false : boolean

(letter? "@")
false : boolean


whitespace? : string --> boolean
Input: A string S.
Output: true if S is in ["c#9;" "c#10;" "c#11;" "c#12;" "c#13;" " "], otherwise false.

(whitespace? "q")
false : boolean

5.3 String Predicates

string.prefix? : string --> string --> boolean
Input: Two strings S1 and S2.
Output: true if S1 is a prefix of S2, otherwise false.

(string.prefix? "cat" "catapult")
true : boolean
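
The definition of a prefix in section 4 translates almost directly into pattern matching over @s. The following is a sketch under that assumption, with the hypothetical name my-prefix?; the library's own code may differ.

```shen
(define my-prefix?
  {string --> string --> boolean}
  \\ the null string is a prefix of every string
  "" _ -> true
  \\ same leading unit string U: compare the remainders
  (@s U S1) (@s U S2) -> (my-prefix? S1 S2)
  \\ different leading unit strings, or the second string ran out first
  _ _ -> false)
```

With this definition, (my-prefix? "cat" "catapult") gives true and (my-prefix? "cap" "catapult") gives false.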


string.suffix? : string --> string --> boolean
Input: Two strings S1 and S2.
Output: true if S1 is a suffix of S2, otherwise false.

(string.suffix? "ton" "Newton")
true : boolean


substring? : string --> string --> boolean
Input: Two strings S1 and S2.
Output: true if S1 is a substring of S2, otherwise false.

(substring? "tap" "catapult")
true : boolean

5.4 Extending Unit String Predicates to Strings

string.every? : (string --> boolean) --> string --> boolean
Input: A predicate P of unit strings, and a string S.
Output: true if P is true for all unit strings of S, otherwise false.

(checks if a string consists entirely of letters and spaces)
(string.every? (/. S (or (letter? S) (= S " "))) "String Library")
true : boolean


string.some? : (string --> boolean) --> string --> boolean
Input: A predicate P of unit strings, and a string S.
Output: true if P is true for at least one unit string of S, otherwise false.

(checks if a string contains a digit)
(string.some? (function digit?) "String Library")
false : boolean


Using the predicate string.every?, it is easy to define functions that test whether a given string consists entirely of upper-case letters, is alpha-numeric (whatever your definition), numeric, a digit-sequence, and many more. The library only includes:

digit-string? : string --> boolean
Input: A string S.
Output: true if S is a string consisting of digits (i.e. represents an unsigned integer), otherwise false.

(digit-string? "143")
true : boolean

5.5 Extending Unit String Functions to Strings

string.map : (string --> string) --> string --> string
Input: A function F : string --> string, and a string S.
Output: The string obtained from S by applying F to its constituent unit strings (for string.ncopy, see section 5.12).

(string.map (/. S (string.ncopy 3 S)) "1234")
"111222333444" : string

Using string.map one may define the following functions:


string.upcase : string --> string
Input: A string S.
Output: The string obtained from S by converting its lower-case letters to upper case.

(string.upcase "Shen 3.1")
"SHEN 3.1" : string


string.downcase : string --> string
Input: A string S.
Output: The string obtained from S by converting all its upper-case letters to lower case.

(string.downcase "Shen 3.1")
"shen 3.1" : string

5.6 String Comparison

This is the lexicographic extension of the unit string comparisons to proper strings (only the ASCII codes are considered).
The comparison functions for strings all have the signature
string --> string --> boolean

The following functions are available:

= equal (equality is defined for all native types, including strings).
<str less than
>str greater than
<=str less than or equal
>=str greater than or equal

(<=str "zebra" "zebu")
true : boolean

but

(<=str "zebra" "Zebu")
false : boolean \\ upper-case letters precede the lower-case ones

Note: Some libraries provide ‘case independent’ comparisons, in our notation <str-ci, =str-ci etc. These have not been included, because, as far as ASCII is concerned, they can be expressed by converting both arguments to the same case. For example, (if (<str (string.upcase S1) (string.upcase S2)) …)
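
For instance, a case-independent ‘less than’ could be sketched as follows. The name <str-ci is the hypothetical one used in the note above (it is not part of the library), and the String Library is assumed to be loaded so that <str and string.upcase are available.

```shen
\\ Compare two strings ignoring (ASCII) case by upcasing both first.
(define <str-ci
  {string --> string --> boolean}
  S1 S2 -> (<str (string.upcase S1) (string.upcase S2)))
```

The other case-independent comparisons would follow the same pattern, with <str replaced by the corresponding comparison function.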

5.7 Length Functions

string.length : string --> number
Input: A string S.
Output: The length of S, i.e. the number of unit strings in S.

(string.length "ABCDE")
5 : number


string.prefix-length : string --> string --> number
Input: Two strings S1 and S2.
Output: The length of the longest common prefix of S1 and S2.

(string.prefix-length "Mark Tarver" "Mark Anthony")
5 : number

(string.prefix-length "Mark Tarver" "Willi")
0 : number


string.suffix-length : string --> string --> number
Input: Two strings S1 and S2.
Output: The length of the longest common suffix of S1 and S2.

(string.suffix-length "preclude" "interlude")
4 : number

5.8 Selection

string.take : number --> string --> string
Input: An integer N and a string S.
Output: The prefix of length N of S, i.e. the substring S[0..N-1].
Note: If N is greater than (string.length S), S is returned; if N is less than 1, the null string is returned.

(string.take 3 "ABCDEFG")
"ABC" : string

(string.take -1 "ABCDEFG")
"" : string


string.drop : number --> string --> string
Input: An integer N and a string S.
Output: S without its prefix of length N, i.e. the substring S[N..L-1], where L = (string.length S).
Note: If N is greater than (string.length S), "" is returned; if N is less than 1, S is returned.

(string.drop 3 "ABCDEFG")
"DEFG" : string

(string.drop -1 "ABCD")
"ABCD" : string

In certain situations, both the ‘take’ and the corresponding ‘drop’ of a string are required. It would be wasteful to compute them separately as one can get the ‘drop’ for free, when working out the ‘take’. The following function takes this fact into account.


string.split : number --> string --> (string * string)
Input: An integer N and a string S.
Output: The pair consisting of the prefix of length N of S and the remaining suffix.

(string.split 5 "ABCDEFGH")
(@p "ABCDE" "FGH") : (string * string)

A related, and very useful, function splits a string S at a given unit string U. For example, if U is "+" and S is "123+15", then the result is the pair (@p "123" "+15"). This idea can be generalised to splitting a string at one of a set of given unit strings. Such a set can be represented either as a predicate P, i.e. a function of type string --> boolean (for example, whitespace?), or as a string of possible separators. We provide both versions, as it is often simpler to specify the separators as a string than to define a predicate function on the fly. We call the two functions string.split@ and string.split@'.


string.split@ : string --> string --> (string * string)
Input: Two strings Sep and S.
Output: Let U be the first unit string of S that occurs in Sep. Then the output is the pair consisting of the prefix of S which does not contain U, and the suffix starting with U. If no such U exists, the output is the pair (@p S "").

(string.split@ "+" "123+15")
(@p "123" "+15") : (string * string)

(string.split@ "+=-" "time-limit = 60mph")
(@p "time" "-limit = 60mph") : (string * string) \\ not (@p "time-limit" "= 60mph")

(string.split@ ".;:" "no punctuation here")
(@p "no punctuation here" "") : (string * string)


string.split@' : (string --> boolean) --> string --> (string * string)
Input: A predicate P on unit strings and a string S.
Output: If U is the first unit string of S such that (P U) is true, then the output is the pair consisting of the prefix of S which does not contain U, and the suffix starting with U. If no such U exists, the output is the pair (@p S "").

(string.split@' whitespace? "once and for all")
(@p "once" " and for all") : (string * string)

(string.split@' digit? "time-limit = 60mph")
(@p "time-limit = " "60mph") : (string * string)

Note: string.split@ is defined in terms of string.split@' with the predicate P
(/. X (element? X (string->list Sep)))
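
Written out in full, that relationship might look like the sketch below. The name my-split@ is hypothetical; string.split@' and string->list are the library functions described in this document, so the String Library is assumed to be loaded.

```shen
\\ Split S at its first unit string that is one of the separators in Sep.
\\ The separator string is turned into a list of unit strings, and
\\ membership in that list serves as the predicate.
(define my-split@
  {string --> string --> (string * string)}
  Sep S -> (string.split@' (/. X (element? X (string->list Sep))) S))
```

Note that the order of the separators in Sep plays no role here: the split happens at the first unit string of S that is a separator.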


string.take-right : number --> string --> string
Input: An integer N and a string S.
Output: The suffix of length N of S.
Note: If N is greater than (string.length S), S is returned; if N is less than 1, the null string is returned.

(string.take-right 3 "ABCDEFG")
"EFG" : string

(string.take-right 13 "ABCDEFG")
"ABCDEFG" : string


string.drop-right : number --> string --> string
Input: An integer N and a string S.
Output: S without its suffix of length N.
Note: If N is greater than (string.length S), "" is returned; if N is less than 1, S is returned.

(string.drop-right 3 "ABCDEFG")
"ABCD" : string

(string.drop-right 10 "ABCDEFG")
"" : string


Also related to string.split is the function

string.take-drop : number --> string --> (string * string)
Input: An integer N and a string S.
Output: The pair consisting of the prefix of length N-1 of S and the suffix of length M, where M+N = (string.length S); in other words, S is split into two portions with the N-th unit string missing.

(string.take-drop 3 "1234567")
(@p "12" "4567") : (string * string)


substring : number --> number --> string --> string
Input: Two integers M and N and a string S.
Output: If M > N, the null string ""; if M ≤ N, the substring S[m..n], with m = (max 0 M) and n = (min N (- (string.length S) 1)). Note: neither m nor n is actually computed!

(substring 1 3 "ABCDEFG")
"BCD" : string

(substring 3 1 "ABCDEFG")
"" : string

(substring 3 10 "ABCDEFG")
"DEFG" : string


In the functions above, if non-integer values for N (and/or M) are supplied as input, the output returned is as if ⌈N⌉ (and ⌈M⌉) had been supplied, i.e. non-integer values are rounded up (see Maths Library, section 3.3.3). No actual rounding takes place; the output is a result of the way the functions are coded.
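
To see how such behaviour can arise without any explicit rounding, consider the following sketch of a ‘take’ function. The name my-take is hypothetical, the sketch is not tail-recursive (unlike the library functions), and it is not the library's actual code. The recursion stops as soon as the counter is no longer positive, so a counter of 4.76 passes through 3.76, 2.76, 1.76 and 0.76 before stopping, and five unit strings are taken.

```shen
(define my-take
  {number --> string --> string}
  \\ stop when the counter is used up or the string is exhausted
  N S -> "" where (or (<= N 0) (= S ""))
  \\ otherwise keep the first unit string and count down
  N (@s U S) -> (cn U (my-take (- N 1) S)))
```

With this definition, (my-take 4.76 "ABCDEFG") yields "ABCDE", just as if 5 had been supplied.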

(string.take 4.76 "ABCDEFG") \* takes 5 *\
"ABCDE" : string

(string.drop 3.004 "ABCDEFG") \* drops 4 *\
"EFG" : string

(substring 1.2 3.02 "ABCDEFG") \* substring 2 4 ... *\
"CDE" : string


string.trim-left : (string --> boolean) --> (string --> string)
Input: A predicate P of unit strings and a string S.
Output: The string obtained by dropping the longest prefix of S whose unit strings all satisfy P.

(string.trim-left (/. S (element? S [" " "0"])) "0 0 12003400 0")
"12003400 0" : string


string.trim-right : (string --> boolean) --> (string --> string)
Input: A predicate P of unit strings and a string S.
Output: The string obtained by dropping the longest suffix of S whose unit strings all satisfy P.

(string.trim-right (/. S (element? S [" " "0"])) "0 0 12003400 0")
"0 0 120034" : string


string.trim : (string --> boolean) --> (string --> string)
Input: A predicate P of unit strings and a string S.
Output: The string obtained from S by trimming it at both ends.

(string.trim (/. S (element? S [" " "0"])) "0 0 12003400 0")
"120034" : string

Note: Choosing for P the predicate whitespace? will strip off all leading and/or trailing white space.

(string.trim whitespace? " 123 c#13;")
"123" : string


string.pad : string --> number --> string --> string
Input: A unit string U, a non-negative integer N and a string S.
Output: The string of length N obtained from S by padding it, on the left, to length N with copies of U.
For N = 0 the null string is returned, and if N < (string.length S), the suffix of length N of S is returned; in other words, S is truncated on the left. Note: This latter behaviour may or may not be what is required in a particular situation. If not, the length of S should be tested before applying string.pad.

(string.pad " " 10 "123456") \* pad with spaces *\
"    123456" : string

(string.pad " " 10.33 "123456") \* non-integer values are rounded down *\
"    123456" : string

(string.pad " " 4 "123456")
"3456" : string


string.pad-right : string --> number --> string --> string
Input: A unit string U, a non-negative integer N and a string S.
Output: The string of length N obtained from S by padding it, on the right, to length N with copies of U.
For N = 0 the null string is returned, and if N < (string.length S), the prefix of length N of S is returned; in other words, S is truncated on the right. See the note under string.pad.

(string.pad-right " " 10 "123456")
"123456    " : string

(string.pad-right " " 4 "123456")
"1234" : string

5.9 Searching

string.index : string --> string --> number
Input: Two strings S1 and S2.
Output: If S1 is a substring of S2, the starting position of the first occurrence of S1 in S2, otherwise -1.

(string.index "is" "Mississippi")
1 : number

(string.index "eros" "heroine")
-1 : number


string.index-last : string --> string --> number
Input: Two strings S1 and S2.
Output: If S1 is a substring of S2, the starting position of the last occurrence of S1 in S2, otherwise -1.

(string.index-last "is" "Mississippi")
4 : number


instring? : string --> string --> boolean
Input: Two strings S1 and S2.
Output: true iff S1 is a substring of S2.
Note: This is just a convenient short-hand for (> (string.index S1 S2) -1).

(instring? "in" "string")
true : boolean

(instring? "trig" "string")
false : boolean


string.count : string --> string --> number
Input: Two strings S1 and S2.
Output: The number of times S1 occurs as a substring in S2 (ignoring “overlapping” occurrences).

(string.count "11" "231145111")
2 : number \\ "11" occurs only once in "111"

5.10 Replacing, Inserting and Tokenising

string.replace-all : string --> string --> string --> string
Input: Three strings S1, S2, S3.
Output: The string obtained from S3 by replacing all occurrences of S2 with S1.

(string.replace-all "-" "/" "16/07/12")
"16-07-12" : string

(string.replace-all "XX" "000" "12000000340000789000")
"12XXXX34XX0789XX" : string


string.replace : string --> number --> string --> string --> string
Input: An integer N, and three strings S1, S2, S3.
Output: The string obtained from S3 by replacing the N-th occurrence of S2 with S1.

(string.replace "=" "+" 2 "100 + 3 + 103")
"100 + 3 = 103" : string


string.delete-all : string --> string --> string
Input: Two strings S1, S2.
Output: The string obtained from S2 by deleting all occurrences of S1.

(string.delete-all "00" "12000340005600")
"12034056" : string


delete-substring : number --> number --> string --> string
Input: Two integers M and N and a string S.
Output: S if M > N;
if M ≤ N, the string obtained from S by deleting the substring S[m..n],
where m = (max 0 M), n = (min N (- (string.length S) 1)).
Note: neither m nor n is actually evaluated!

(delete-substring 2 4 "01234567")
"01567" : string

(delete-substring 2.2 4.8 "01234567") \* non-integer values are rounded up *\
"01267" : string

(delete-substring -2 4 "01234567")
"567" : string

(delete-substring 2 40 "01234567")
"01" : string

(delete-substring 4 1 "01234567")
"01234567" : string


string.insert : number --> string --> string --> string
Input: An integer N and two strings S1 and S2.
Output: The string obtained by inserting S1 into S2 after the N-prefix of S2.
Note: If N ≤ 0 then S1 is prepended; if N is greater than the length of S2, S1 is appended.

(string.insert 4 "ty" "nine days")
"ninety days" : string

(string.insert -4 "ty" "nine days")
"tynine days" : string


string.tokenise : (string --> boolean) --> string --> (list string)
Input: A function F : string --> boolean (defining the separators) and a string S.
Output: The list of tokens.

(tokenise a date-and-time string "04-05-2012 20h 15m 32.5s" with separators "-" and " ")
(string.tokenise (/. S (element? S ["-" " "])) "04-05-2012 20h 15m 32.5s")
["04" "05" "2012" "20h" "15m" "32.5s"] : (list string)

A (kind of) ‘inverse’ of tokenise is the following function which produces a string from a list of strings by inserting a string between every two strings in the list.


string.join : string --> (list string) --> string
Input: A string S (to be inserted) and a list of strings StrL (the tokens).
Output: The string obtained by inserting S between every two strings in StrL.

(string.join " - " ["one" "two" "three"])
"one - two - three" : string

(string.join " " (string.tokenise (/. S (element? S ["-" " "])) "04-05-2012 20h 15m 32.5s"))
"04 05 2012 20h 15m 32.5s" : string


A related function is ‘interpose’ which inserts a string between every two unit strings of a string.

string.interpose : string --> string --> string
Input: Two strings S1 and S2.
Output: The string obtained by inserting S1 between every two unit strings of S2.

(string.interpose " + " "123456")
"1 + 2 + 3 + 4 + 5 + 6" : string

5.11 List and String Conversion

string->list : string --> (list string)
Input: A string S.
Output: The list of unit strings of S.
Note: string->list is defined in terms of the system function explode : A --> (list string) (which explodes any object into a list of unit strings).

(string->list "ABCD")
["A" "B" "C" "D"] : (list string)


list->string : (list string) --> string
Input: A list StrL of strings.
Note: The elements of StrL can be general strings!
Output: The string formed from the strings in StrL.

(list->string (string->list "ABCD")) \* list->string is the left-inverse of string->list *\
"ABCD" : string

(list->string ["AA" "BBB" "C" "DD"])
"AABBBCDD" : string

5.12 Miscellaneous

string.reverse : string --> string
Input: A string S.
Output: S reversed.

(string.reverse "abcd")
"dcba" : string


string.ncopy : number --> string --> string
Input: An integer N and a string S.
Output: A string of N copies of S (or an error message if N is negative).

(tlstr (string.ncopy 3 " hello")) \\ to get rid of the leading space
"hello hello hello" : string

(tlstr (string.ncopy 2.75 " hello")) \\ non-integer is rounded down
"hello hello hello" : string


string.filter : (string --> boolean) --> string --> string
Input: A predicate P of unit strings and a string S.
Output: The string of all unit strings of S for which P is true.

(string.filter lowercase? "Abc1Dd4")
"bcd" : string

(string.filter (/. S (= S "0")) "10011100101")
"00000" : string


string.reduce : (string --> A --> A) --> A --> string --> A
Input: A function F : string --> A --> A, an element I of type A and a string S.
Output: The (right-left) reduction of S with respect to F. (I is the value of the reduction of "")
Note: reduce is alternatively known as foldr (“fold-right”).

If F : string --> string --> string is the function
(/. S Str (if (= S "0") (@s "zero " Str) (@s "one " Str))) and I is "", then a binary string given as input is ‘reduced’ to a string as shown below:

(string.reduce (/. S Str (if (= S "0") (@s "zero " Str) (@s "one " Str))) "" "011001")
"zero one one zero zero one " : string


string.foldl : (string --> A --> A) --> A --> string --> A
Input: A function F : string --> A --> A, an element I of type A and a string S.
Output: The (left-right) reduction of S with respect to F. (I is the value of the reduction of "")
Note: For associative operations string.reduce and string.foldl yield the same result, but not for non-associative operations.

(string.foldl (/. S Str (if (= S "0") (@s "zero " Str) (@s "one " Str))) "" "011001")
"one zero zero one one zero " : string \* the reverse of the previous example *\

If F : string --> number --> number is the function (/. S N (+ N 1)) and I is 0, the reduction (either right or left) of a string is the length of the string. Thus, one could define

(define strlen
{string --> number}
Str -> (string.reduce (/. S N (+ N 1)) 0 Str))

More generally, if P : string --> boolean is any predicate of unit strings then the function F : string --> number --> number, with F equal to (/. S N (if (P S) (+ N 1) N)) used as an argument in string.reduce (string.foldl), will count all the unit strings of a string S satisfying predicate P.
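
Spelled out, this counting idea could be coded as in the sketch below. The name my-count-ustrings is hypothetical; string.reduce is the library function described above, so the String Library is assumed to be loaded.

```shen
\\ Count the unit strings of Str that satisfy P: the accumulator N
\\ starts at 0 and is incremented only when (P S) holds.
(define my-count-ustrings
  {(string --> boolean) --> string --> number}
  P Str -> (string.reduce (/. S N (if (P S) (+ N 1) N)) 0 Str))
```

Since addition does not depend on the order in which the unit strings are visited, string.foldl could be used in place of string.reduce here.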


string.count-ustrings : (string --> boolean) --> string --> number
Input: A predicate P of unit strings and a string S.
Output: The number of unit strings of S for which P is true.

(string.count-ustrings (function digit?) "a103b48k A*7")
6 : number

5.13 String to Number Conversion

string->number : string --> number
Input: A string S representing a Shen number
Output: The number corresponding to S, or an error message, if S does not represent a valid Shen number.

(string->number "--0023.78")
23.78 : number

(string->number "--+.367")
0.367 : number

(string->number "--+23.01e-1")
2.301 : number

(string->number "--+23.0p1e-1")
illegal character 'p' in number

(string->number "555.01e3")
555010.0 : number

(string->number "555.01e+3")
illegal character '+' in exponent

(string->number "666.")
fractional part missing!


string->unsigned : string --> number
Input: A string S representing an unsigned integer.
<unsigned-integer> := <digit> | <digit><unsigned-integer>
Output: The number corresponding to S, or an error message, if S does not represent an unsigned integer.

(string->unsigned "00123") \\ leading 0s are allowed
123 : number

(string->unsigned "+123")
illegal character '+' in unsigned integer


string->integer : string --> number
Input: A string S representing a mathematical (not a Shen) integer.
<integer> := <unsigned-integer> | +<unsigned-integer> | -<unsigned-integer>
Output: The number corresponding to S, or an error message, if S does not represent a valid integer.

(string->integer "000123") \* leading 0s are allowed *\
123 : number

(string->integer "123.0") \* this would be a Shen-integer *\
illegal character '.' in integer

(string->integer "12o3")
illegal character 'o' in integer

(string->integer "--123") \* this would be a Shen-integer *\
illegal character '-' in integer

(string->integer "-123")
-123 : number

5.14 Radix Conversion

The functions in this section are used to convert between different radix number systems. Only unsigned integers are considered. The attribute “decimal” used in function names indicates an unsigned decimal integer; numbers expressed in other number systems are always represented as strings.

The following function converts from decimal to radix-B. B must be greater than 1, and should be no greater than 36. The radix-B digits are taken from the sequence 0, 1, …, 9, a, b, c, …, z. (Capital letters are permitted).

decimal->radixB : number --> number --> string
Input: Two integers N and B.
Output: The decimal integer N converted to radix-B (represented as a string), or an error message, if N or B are not integers.

(decimal->radixB 65535 16) \* conversion to hex *\
"ffff" : string

(decimal->radixB 65535 16.7)
radix must be an integer!

(decimal->radixB 65535 8) \* conversion to octal *\
"177777" : string

(decimal->radixB 65535 2) \* conversion to binary *\
"1111111111111111" : string

(decimal->radixB 65535 24) \* conversion to radix 24 *\
"4hif" : string


The inverse of decimal->radixB is

radixB->decimal : string --> number --> number
Input: A string S and an integer B, where S represents an integer in the radix-B number system.
Note: An error is raised if B is not an integer or is less than 1.
Output: If all the unit strings of S are radix-B digits, S is converted to a decimal integer – error otherwise.

(radixB->decimal "111111111" 2) \* conversion from binary *\
511 : number

(radixB->decimal "abc" 16) \* conversion from hex *\
2748 : number

(radixB->decimal "abc" 12) \* the highest digit in radix-12 is ‘b’, decimal 11 *\
illegal digit 'c' in radix '12' number

(radixB->decimal "Mark" 28) \* upper-case is allowed *\
491560 : number

(radixB->decimal "Willi" 33) \* radix-33 number system is smallest with a digit ‘w’ *\
38619918 : number

It is easy to combine the preceding two functions to convert between any two number systems.
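
The combination can be sketched as a composition that goes through a decimal number. The name my-radixB->radixC is hypothetical, the String Library is assumed to be loaded, and the library's own definition (described next) may differ in its error handling.

```shen
\\ Convert S from radix B to decimal, then from decimal to radix C.
(define my-radixB->radixC
  {string --> number --> number --> string}
  S B C -> (decimal->radixB (radixB->decimal S B) C))
```

Any error raised by radixB->decimal, for example for an illegal digit, simply propagates to the caller.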

radixB->radixC : string --> number --> number --> string
Input: A string S and two integers B and C (both > 1).
Output: If all the unit strings of S are radix-B digits, S is converted to a string representing the number in the radix-C number system, otherwise an error is raised.

(radixB->radixC "345" 8 16) \* octal -> hex *\
"e5" : string

(radixB->radixC "345" 16 8) \* hex -> octal *\
"1505" : string

(radixB->radixC "121212" 3 2) \* ternary -> binary *\
"111000111" : string

(radixB->radixC "abcdef" 16 32) \* hex -> radix-32 *\
"anjff" : string


In many programming languages, including C/C++ and Javascript, there is a convention for denoting octal and hexadecimal numbers:

• any ‘digit’ sequence starting with ‘0’ denotes an octal integer
• any sequence preceded by ‘0x’ is a hex-integer
• any other digit sequence is taken as decimal.

Examples:

1234 is a decimal integer
01234 is an octal number (668 in decimal)
0x1234 is a hex number (4660 in decimal)

A function string->int with either one or two argument(s), which therefore has no type, is available. This function (vaguely inspired by the Javascript function parseInt) behaves as follows:

(1) when invoked with one argument S (of type string) assumes that S is the string representation of either a hex, octal or decimal integer, and attempts to convert it to a decimal integer.

(2) when invoked with two arguments, a string S and an integer B >1, converts S to decimal, assuming that S represents a radix-B integer.

Examples:

(string->int "001234") \* one argument – octal assumed *\
668 : number

(string->int "001234" 10) \* radix 10 specified – octal overridden *\
1234 : number

(string->int "0x1234") \* prefix ‘0x’ – hex assumed *\
4660 : number

(string->int "0x1234" 10) \* radix 10 specified *\
illegal digit 'x' in radix '10' number

(string->int "0x1234" 16) \* radix-16 (hex) *\
4660 : number
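
The one-argument behaviour described in (1) amounts to a dispatch on the prefix of the string. The sketch below shows one way this might be written with @s patterns. The name my-string->int is hypothetical, only the one-argument case is covered, the String Library is assumed to be loaded (for radixB->decimal), and the actual library code is not shown in this document.

```shen
(define my-string->int
  {string --> number}
  \\ prefix 0x: the remaining unit strings are read as hex digits
  (@s "0" "x" Digits) -> (radixB->decimal Digits 16)
  \\ a leading 0 followed by at least one more unit string: octal
  (@s "0" Digits) -> (radixB->decimal Digits 8) where (not (= Digits ""))
  \\ anything else is read as decimal
  S -> (radixB->decimal S 10))
```

The order of the clauses matters: the hex pattern must be tried before the octal one, since a string such as "0x1234" also starts with "0".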