Text String Analysis in KBL

Text String Manipulation and Analysis in KBL

 

One of KBL's most important features is its data handling ability, from manipulating whole datasets to working with individual items. It has an extensive vocabulary for processing and manipulating alphanumeric text, so whether you are dealing with numeric, alphabetic or mixed data, there are commands and examples that enable you to do logical gymnastics with tabular and parametric items of data.

Working with text in dataset columns in MAIN

When working with the COL command, all values in the dataset column will be updated. The key text handling commands for COL are:


COL c.1 = c.5 LEFT 3    // c.1 (col 1) set to the first 3 alphanumeric characters of c.5 (col 5)
COL c.1 = c.6 LEFT [N]    // c.1 set to the first N alphanumeric characters of c.6, where N is a parameter value
COL c.1 = c.5 RIGHT 4    // c.1 set to the last 4 alphanumeric characters of c.5
COL c.2 = c.6 Right [JJ]    // c.2 set to the last JJ alphanumeric characters of c.6, where JJ is a parameter value
COL c.4 = c.6 MID 2 6    // c.4 set to a substring, characters 2 to 6 of c.6
COL c.4 = c.6 MID [START] [END]    // c.4 set to a substring, from the position in START to the position in END
COL c.3 = "Temp = " + c.5 + " degrees C"  // Enrich text column c.3 (c.5 could be numeric or text)
COL c.5 = c.2 +  " and " + c.3    // concatenate columns and text,  c.5 must be a text column

Note that for the MID command, by convention and in common with many other programming languages, character positions in a string start at 0, not 1. So MID 2 4 for the string "Lioness" yields the substring "one" (not "ion"). Call this a quirk, but KBL sticks with the convention.
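
To make the zero-based convention concrete, here is a minimal Python sketch that models the behaviour described above. It is not KBL code: the helper names mid, left and right are hypothetical, and the sketch assumes that MID includes both the start and the end position, as the "Lioness" example implies.

```python
def mid(text: str, start: int, end: int) -> str:
    """Return the characters from position start to position end.

    Positions count from 0 and both ends are included (an assumption
    drawn from the "Lioness" example, not from KBL itself).
    """
    return text[start:end + 1]


def left(text: str, n: int) -> str:
    """Return the first n characters of text."""
    return text[:n]


def right(text: str, n: int) -> str:
    """Return the last n characters of text (empty string when n is 0)."""
    return text[-n:] if n > 0 else ""


print(mid("Lioness", 2, 4))   # one
print(left("Lioness", 3))     # Lio
print(right("Lioness", 4))    # ness
```

Counting from 1 instead would give "ion" for the same MID call, which is exactly the off-by-one slip the note warns about.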

As you can see, you can do a certain amount of text manipulation at whole dataset column level using the COL command, but the facilities available for parameters are more extensive.

Working with text strings in parameters

In KBL, a parameter used as a text string can be anything from a single character to, if needs be, millions of characters long.

For example, consider what we could do with the following text:

PAR TEXTSTRING = "Once upon a time there were 3 Bears, Mother Bear, Father Bear and Baby Bear, who all lived together in a small house in the woods"

Here are some ways to manipulate the string:

Work out the string length

var.par TEXTLEN = /TEXT LENGTH "[TEXTSTRING]"    // TEXTLEN is set to the number of characters in TEXTSTRING

Substitute characters or a phrase with another string

Util.text.new  /PAR  TEXTSTRING    // Move focus to TEXTSTRING
Util.text.replace /ANY 3 "three" // Replace the numeric 3 with text
Util.text.replace /ANY "  " " " // Replace two spaces with one space
Util.text.replace /LOWER "a small house"  ""  // Replace phrase with nothing (remove)

Note that when replacing a character, word, phrase or substring in a text string, you can specify whether the search text to be substituted is upper case, lower case or any case.

Find (the location of) a character or phrase in a text string

var.par FindPos = /TEXT findstring "MOTHEr" = "[TEXTSTRING]"
var.par FindPos = /TEXT findstring "MOTHEr" == "[TEXTSTRING]"
var.par FindPos = /TEXT findstring "Mother" == "[TEXTSTRING]"

The first case sets FindPos to the start position (character number) of the word "Mother" in TEXTSTRING; the single = denotes a case-insensitive match.

The second case sets FindPos to -1, because the == denotes an exact case match and MOTHEr does not exist in TEXTSTRING as an exact match.

The third case sets FindPos to the start position of Mother in TEXTSTRING, as there is an exact case match (as required by the ==).
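
The three cases can be modelled with a short Python sketch. This is an illustration of the matching rules only, not KBL code: the function name find_string and its exact flag are hypothetical, and the sketch assumes positions are counted from 0 (as for MID), which the description of findstring does not state explicitly.

```python
TEXTSTRING = ("Once upon a time there were 3 Bears, Mother Bear, Father Bear "
              "and Baby Bear, who all lived together in a small house in the woods")


def find_string(needle: str, haystack: str, exact: bool = False) -> int:
    """Return the start position of needle in haystack, or -1 if absent.

    exact=False models the single = (case ignored);
    exact=True models the == (exact case match).
    """
    if exact:
        return haystack.find(needle)
    # Lower-casing both sides makes the comparison case-insensitive.
    return haystack.lower().find(needle.lower())


first = find_string("MOTHEr", TEXTSTRING)               # single =
second = find_string("MOTHEr", TEXTSTRING, exact=True)  # double ==, wrong case
third = find_string("Mother", TEXTSTRING, exact=True)   # double ==, right case

print(first, second, third)
```

The first and third calls report the same position, while the second reports -1 because no substring matches "MOTHEr" character for character.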

Grab a specified snippet of a text string

PAR INTEXT = "[TEXTSTRING]"
var.par INTEXT = /TEXT MID 5 10    // Keep the substring from character position 5 to 10 of INTEXT
var.par INTEXT = /TEXT MID [STARTPOS] [ENDPOS]    // Keep the substring between the positions held in STARTPOS and ENDPOS
var.par INTEXT = /TEXT RIGHT 5    // Keep the last 5 characters of the string in INTEXT
var.par INTEXT = /TEXT RIGHT [RLEN]    // Keep the last [RLEN] characters of the string in INTEXT
var.par INTEXT = /TEXT LEFT [LLEN]    // Keep the first [LLEN] characters of the string in INTEXT

Concatenating strings

The simplest way is:

Par Xstring = "The cat sat"
Par Ystring = "on the mat"
Par Xstring = "[XSTRING] quietly [YSTRING]"

The resultant value of Xstring will be: "The cat sat quietly on the mat".

Extracting Numbers from strings

PAR CODESTR = "X221JB657N"
Var.par CODESTR /TEXT STRIPTEXT

The resultant value of CODESTR is 221657.
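
The effect on the example value can be reproduced with a small Python sketch. It is an approximation rather than KBL code: the function name strip_text is hypothetical, and the sketch assumes that only digit characters are kept. How STRIPTEXT treats decimal points or minus signs is not covered by the example above.

```python
def strip_text(code: str) -> str:
    """Keep only the digit characters of code, in their original order."""
    return "".join(ch for ch in code if ch.isdigit())


print(strip_text("X221JB657N"))   # 221657
```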

Case sensitivity?

When testing and comparing text in IF statements, you can control whether or not case sensitivity is important. Suppose you specified the parameter NAME = "Mike Smith":


IF "[NAME]" = "MIKE SMITH" is True (condition is met, the = ignores case)
IF "[NAME]" == "MIKE SMITH" is False (condition not met, the == requires an exact case match)
IF "[NAME]" == "Mike Smith" is True (condition is met, characters are an exact match)

Example of Text manipulation

With the above building blocks you can perform a great number of different tasks. The following link is for a KBL script to analyse word usage in a Shakespeare play. It works by putting plain text in a string, stripping out punctuation, then building a list of the individual words used for analysis. This may be a bit more than a script for beginners, but it shows how versatile KBL is when working with text. PDF of Text Analyser KBL Script.
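
For readers who want to see the shape of that approach before opening the script, here is a minimal Python sketch of the same three steps: hold the text in a string, strip out punctuation, then build a list of words for counting. It is not the linked KBL script and makes no claim about how that script is written. The function name word_usage is hypothetical, and converting words to lower case before counting is an assumption made for this sketch.

```python
import string
from collections import Counter

TEXTSTRING = ("Once upon a time there were 3 Bears, Mother Bear, Father Bear "
              "and Baby Bear, who all lived together in a small house in the woods")


def word_usage(text: str) -> Counter:
    """Count how often each word occurs in text, ignoring punctuation and case."""
    # Step 1: strip out punctuation characters.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: build a list of individual words (lower-cased so "Bear" == "bear").
    words = cleaned.lower().split()
    # Step 3: count the occurrences of each word.
    return Counter(words)


usage = word_usage(TEXTSTRING)
print(usage.most_common(3))
```

Note that "Bears" and "Bear" are counted as different words here; a fuller analysis would need to decide how to treat plurals.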

 
