Philip: A Conversational Chatter Bot
Preliminary Design Notes - April, 2003
By Gary J. Shannon
Overview
Philip is a conversational chatter bot program designed to mimic ordinary human conversation. Unlike some chatter bots, Philip is not aware that "he" is a computer program. In fact, Philip's simulated mental model of the world includes himself as a male human, and his responses are appropriate to that assumption. In other words, Philip thinks he's a person. For the sake of consistency, and to honor his existential assumption about himself, we shall refer to Philip as "him". The simulation of the world is done in the same manner as simulations in computer role-playing games that model a three-dimensional world for the player to navigate.
The method of Philip's operation revolves around three key concepts. First is that sentences are recognized by matching them against a library of sentence patterns. The second concept is that understanding is based on constructing a mental model of the state of the world revealed by the input sentences. Finally, output sentences are generated by substituting values into templates, where the values themselves may in turn be templates.
The knowledge base used by Philip is constructed of nodes which stand for things in the virtual reality world created by Philip's "thoughts". In some cases, where rote memory of static information is called for, standard tables, lists and databases are referenced by nodes. A units conversion table is an example, which would contain millimeters per inch, cups per gallon, justices per Supreme Court, and so on. Another example would be the geographic data base that locates cities, monuments, landmarks and such within their respective states, provinces, countries, planets (real and fictitious), etc.
The simulation begins as soon as a user logs into the program. At initialization Philip creates an instance of the class Person to represent the user and places that person in an assumed location (in some generic room, sitting at a computer) located at some unspecified place in the same world in which Philip lives. This actor object (referred to as the player character, or PC) represents the user in Philip's mental world, and can be manipulated and moved about just as any actor object in any virtual reality video game might be. Thus if the user types "Last week I was in Tinsel Town.", Philip (recognizing the slang term for "Hollywood, CA" from his tables of names) finds Hollywood in his geographical data base, determines that this location is an object of the class "city", and creates an instance of the city class (a process called, in this instance, "spawning a new city"), populating its member variables with the default values found in the data base for Hollywood. Having thus created, or spawned, a limited simulation of "Hollywood" and placed it in its proper context in the world, he moves the PC to that location.
He also knows, however, that this location applies in the past, and that the user's present location is his initial location when he logged into the program. So in that sense Philip "imagines", for the sake of discussion, that the user is in Hollywood. This enables him to properly answer questions like "Where am I?" and "Where was I last week?" as well as hypothetical questions like "If I were looking down from the Parthenon, what city would I see?"
The sentences that Philip generates have their origin in templates with slots that are filled by string values. Each node (a node can be an object, a concept, a relationship, an attribute, and so on) contains its own methods for generating strings. In other words, each node can say its name or its value, or recursively, say the name or value of any node linked to it. The strings generated by nodes can be simple phrases, compound phrases, or complete simple or complex sentences, depending entirely on the templates used and on which method is invoked at that node. Suppose, for example, that a particular node represents the size attribute of an object, say an apple. Now suppose that the value of that attribute is "significantly larger than average". When that node is called upon to say its value it will return a string something like "really huge", "gigantic", "exceptionally large", or any one of a dozen or more phrases that describe the value of the attribute. That phrase, which may be a canned phrase selected at random or a phrase generated by a template, becomes the value slotted into the template being used by the calling node. In turn the calling node might use that value in a template to generate a phrase like "the gigantic apples", which it then passes down to its own calling node. Finally, the ultimate calling node produces a complete sentence and outputs it to the user: "Yeah, I heard him bragging about the gigantic apples he grew in his orchard." In this manner, Philip can generate conversational sentences of arbitrary length and complexity.
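The call chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its `say()` method and the `{slot}` placeholder syntax are hypothetical and are not taken from Philip's actual source.

```python
import random


class Node:
    """A node that says itself by filling a template from its child nodes.

    Hypothetical structure, for illustration only.
    """

    def __init__(self, templates, children=None):
        self.templates = templates        # candidate templates or canned phrases
        self.children = children or {}    # slot name -> child Node

    def say(self, rng=random):
        template = rng.choice(self.templates)
        # Ask each child to say itself first, then slot the results in.
        values = {slot: child.say(rng) for slot, child in self.children.items()}
        return template.format(**values)


size = Node(["gigantic", "really huge", "exceptionally large"])
apples = Node(["the {size} apples"], {"size": size})
remark = Node(
    ["Yeah, I heard him bragging about {what} he grew in his orchard."],
    {"what": apples},
)

print(remark.say())
```

Each node only knows its own templates; the sentence grows because every slot value is itself produced by another node's `say()` call.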
The Details
With that general outline in mind we can take a closer look at the inner workings of Philip's mind. We'll begin with the knowledge base because that's the real key to how everything else functions. The other functions are input, vocabulary compression and parsing. Vocabulary compression is where all the many alternate words for a concept get boiled down to one standardized word. "Chat", "discuss", "shoot the breeze", "chew the fat", "discourse", "gab", "yak", and "yap" (among others) all get changed to "talk". This is necessary because Philip's parsing is simple pattern matching and every word that stands for "talk" must actually _be_ "talk" for the pattern matching to succeed. We'll return to the details of that process a bit later, including how Philip tells the difference between the verb "yak" and the animal "yak".
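As a rough illustration of vocabulary compression, here is a minimal Python sketch. The `SYNONYMS` table and the `compress()` function are assumptions made for this example; Philip's real tables are not shown in these notes. The ambiguous word "yak" is deliberately left out, and inflected forms such as "chatted" are not handled.

```python
import re

# Hypothetical synonym table: every phrase on the left is rewritten to the
# standardized word on the right before pattern matching takes place.
SYNONYMS = {
    "shoot the breeze": "talk",
    "chew the fat": "talk",
    "discourse": "talk",
    "discuss": "talk",
    "chat": "talk",
    "gab": "talk",
    "yap": "talk",
}


def compress(sentence):
    """Replace each known synonym with its standardized word."""
    result = sentence.lower()
    # Longest phrases first, so multi-word idioms win over single words.
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, SYNONYMS[phrase], result)
    return result


print(compress("We sat down to chew the fat"))  # we sat down to talk
```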
The Knowledge Base
The knowledge base is made up of nodes and methods. Methods are subroutines that are associated with a given node. Even though methods are stored separately from nodes in memory they can be thought of as belonging to a particular node or family of nodes. For that reason we will think of methods as being a part of the node that owns them. Keep in mind, however, that more than one node may own a particular method, but that only one copy of the method exists in memory. The method is, therefore, shared by all its co-owners.
A key concept here is that nodes of a given class inherit all the methods and attributes of that class. But they also inherit things that are themselves inherited from a more general class. For example, a node of the class "Toolbox" inherits all the methods and attributes of the "Toolbox" class. However, the "Toolbox" class is itself derived from the more general class "Container", and so the node also inherits all the methods and attributes of the "Container" class. This inheritance proceeds up the chain of parent classes to the base class "Thing" which causes it to have attributes like weight and size. A class may inherit from more than one parent class so that a toolbox with an openable lid would inherit from both "Container" and "OpenableObject".
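The same inheritance chain can be written down directly in Python, which also supports more than one parent class. This is a sketch only: the attribute names (`weight`, `size`, `contents`, `is_open`) and the methods `put()` and `open()` are assumptions chosen for the example, not members of Philip's real classes.

```python
class Thing:
    """Base class: every thing has a weight and a size."""

    def __init__(self):
        self.weight = None
        self.size = None


class Container(Thing):
    def __init__(self):
        super().__init__()
        self.contents = []

    def put(self, item):
        self.contents.append(item)


class OpenableObject(Thing):
    def __init__(self):
        super().__init__()
        self.is_open = False

    def open(self):
        self.is_open = True


class Toolbox(Container, OpenableObject):
    """Inherits from both parents and, through them, from Thing."""


box = Toolbox()
box.open()
box.put("hammer")
print([cls.__name__ for cls in Toolbox.__mro__])
# ['Toolbox', 'Container', 'OpenableObject', 'Thing', 'object']
```

The printed list is the order in which Python searches the parent classes, which corresponds to walking "up the chain" toward the base class.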
Nodes
A node is a collection of optional constant string values, pointers to other nodes, and pointers to methods. Each node represents some "thing" in the broadest sense of the word. A node might stand for a physical object such as an apple, or for something more intangible like "size", "heat", or "emotion". Nodes are written in a human-readable "source code" form that superficially resembles C++, and compiled into an internal representation by the knowledge base compiler, kbComp.
In source code form the pointers to other nodes are identified with symbolic names; however, these names disappear when the knowledge base is compiled. Therefore it is important not to assume that just because a particular pointer is named something like "myColor" the brain engine somehow knows that the value pointed to is a color. This is not the case. The brain engine knows nothing except how the nodes are connected. The fact that a node represents a color is encoded in the node that holds that color value, not as an explicit statement ("this is a color") but as an implicit consequence of the methods owned by that node. The brain engine does not know that this node holds a color value. Instead it knows that this node has methods that generate strings. The significance of a string that says something like "Red is my favorite color." is recognized only by the user who is talking to Philip. To the brain engine these are just meaningless strings generated to fill in equally meaningless templates.
I belabor that point because it is important to bear in mind that although Philip gives every indication of being intelligent and of understanding the conversation, in reality he understands nothing. He only simulates understanding in the same sense that a flight simulator program simulates flight without actually flying.
Methods
So far much of this may seem like some kind of magic. Words come in and words go out, and somehow Philip manages to maintain a reasonable train of conversation, asking and answering questions, making remarks and observations, and in general carrying on quite a pleasant conversation. But that feeling of magic will fade very quickly when we take a close look at methods. There is nothing subtle, sublime, or elegant about methods. They are ad hoc string generators of the most primitive kind.
In particular, clever programmers and academic artificial intelligence researchers are likely to react to these methods with disappointment or disdain. In spite of Philip's amazing abilities, there is nothing new here. There are no groundbreaking theoretical discoveries or general principles that will aid in explaining the nature of intelligence or understanding. The only thesis is that the fluency of the human language user originates not in the power of his biological inference engine, but in the sheer size of his knowledge base and the sheer diversity of his template collection.
With that in mind we will take a close look at some actual methods extracted from Philip's brain. To begin with, we need to know the two general rules about nodes:
What a node knows is encoded in its links (relationships) and attributes (variables).
What a node knows how to say is encoded in its methods.
Let's look first at a particular kind of node called a static node. A static node, the simplest kind of node, has no variables. Furthermore, only one static node of any given class can exist. In other words, it represents a class of nodes that is not explicitly instantiated. The purpose of a static node is only to provide methods for saying the values associated with one of a set of enumerated values. For example, any object that can be turned on and off, like a radio, a lamp, or a hot water faucet, will have an attribute that represents its current (or imagined, or hypothesized) state as one of three values "UNKNOWN", "ON" or "OFF". Since this variable can only take on three values these values are explicitly enumerated. However, any node that has such an attribute needs to be able to say what the value of its attribute is. Nodes, however, only know how to talk about themselves. They do not know how to talk about their attributes. Instead, they require that their attributes each be able to talk about themselves. Therefore a node must be created which knows how to say things like "on" and "off". This requirement is satisfied with a static node. It has no variables because it will only talk about values of variables that are passed to it.
As an aside that will be explained in more detail later, every value in Philip's brain consists of both a value and a certainty that is associated with that value. The certainty field is an enumerated field that can take on values like { ASSERT, ASSUME, UNCERTAIN, ... } to mention a few. If a sentence mentions an apple without specifying the values of any of its attributes, those attributes will be filled in with assumed default values like "red" and "average size". Assumed values do not hold the same force as asserted values because they are only assumed in the absence of information to the contrary. If Philip reads a sentence like "John went into the shower." he is not told that John went in there to take a shower, but he assumes that this is the case. Likewise he is not certain that John removed his clothes before taking a shower, but he assumes that he did. (The mechanism behind the making of these assumptions will be discussed a bit later.) If it is asserted that John did not remove his clothes before entering the shower then Philip will assume that John entered the shower for a different purpose such as cleaning the tile or fixing the plumbing. If it is asserted that John actually took a shower with his clothes on then Philip will recognize that as a humorous or incongruous situation and will respond appropriately.
I mention this now only because the example below explicitly refers to the ".value" field of the argument passed to it. This is necessary to distinguish between an argument's value field and its certainty field. These fields are not defined explicitly, but are implicitly a part of every variable.
Now consider the example of the relative size of an object. Object sizes in Philip's knowledge base are not normally stored as specific numerical values with units of measure, but as selections from an enumeration of abstract sizes that identify whether the object is something that will fit in the hand, is as big as a house, is bigger than a bread box, or is planet sized, and so on. Philip can guess the size of an apple, just as a human might, but he doesn't know anything like the mean size of apples in millimeters. Such knowledge is decidedly unhuman-like, and inappropriate for an entity that believes itself to be human. For each class of object there is a nominal size for members of that class and an actual size for that instance of the object. The actual size is a relative size which is one of five enumerated values: very small, small, average, large, and very large. "Very small", for example, is very small relative to the average size of such objects in general. Thus a very small bus will still be much larger than a very large avocado.
In cases like James and the Giant Peach or The Little Old Lady Who Lived in a Shoe, Philip recognizes from assertions made that the default nominal size value is not appropriate and instantiates, for example, a new object of class "Dwelling" (i.e., "spawns a Dwelling") and assigns to it shape attributes borrowed from the class "Shoe".
For any object that has a relative size that object will have one of the five enumerated values mentioned above and an associated pointer to a static node called "RelativeSize". If that object node is ever called upon to say its size it passes that request to the "RelativeSize" node, along with its own relative size value. RelativeSize knows how to say relative size values, and that's all it knows. But there are numerous different ways to say relative size depending on where the requested string falls in the template. Take for example:
It appears that all the method needs to return is a simple word like "large" or "small". But that is not the case. Consider these variations on the same sentences:
Here we see that the word "large" is replaced by phrases of different sorts:
That these mappings are not always interchangeable can be demonstrated by trying out a few of them.
That's a of gigantic proportions for an apple. ( Sentence 3, mapping 1. )
This apple is pretty good. ( Sentence 1, mapping 3. )
This apple is one helluva humongous. ( Sentence 1, mapping 5. )
Some of the others are tolerable, but when a much larger collection of alternative wordings is amassed we find that there are four functionally distinct ways in which the size may be spoken. I'm sure that grammarians have names for these ways, but Philip, like the average seven-year-old, is able to speak quite grammatically while still knowing nothing about formal grammar. Likewise, we don't have much use for theories of grammar in this implementation. Instead, we just plow ahead blindly and enumerate the classes into which phrases fall based entirely on whether or not they can be used interchangeably in a wide variety of templates. A set of phrases that can be used interchangeably in all templates is considered to constitute one such category.
From that ad hoc perspective the categories we find for naming relative attributes with phrases are as follows:
Here's an example of an actual static node (somewhat condensed and simplified for the sake of discussion) from Philip's brain. This is the node that knows relative sizes and how to talk about them. The syntax is pretty self-explanatory for anyone familiar with structured object oriented computer languages in general and C++ in particular. There are, however, some significant differences between this knowledge base language and C++ which we will go into detail about later.
StaticNode RelativeSize
{
    // Every physical object has a normal or nominal size
    // and a relative size. The normal size is the average
    // size for members of the class. The relative size is
    // the actual size of this instance relative to normal.
    // The class could be easily extended with the addition
    // of "ITTY_BITTY" and "ENORMOUS", but these will do for now.
    Enum
    {
        EXTRA_SMALL,
        SMALL,
        AVERAGE,
        LARGE,
        EXTRA_LARGE
    }

    List vxs1 = { "very small", "really small", "unusually small", "exceptionally small", "much smaller than average" };
    // words that don't fit with "sized"
    List vxs2 = { "tiny", "little tiny", "tiny little", "petite", "itsy bitsy", "diminutive" };
    List vs1 = { "small", "smaller than average", "relatively small" };
    List vs2 = { "less than average" };
    List vs3 = { "a bit smaller than average" };
    List va1 = { "average", "normal", "ordinary" };
    List vl1 = { "large", "larger than average", "relatively large" };
    List vl2 = { "big" };
    List vxl1 = { "huge", "enormous", "very large", "unusually large", "exceptionally large", "jumbo" };
    List vxl2 = { "gigantic", "monstrous" };
    List vxl3 = { "king size" };

    Method SayName()
    {
        Phrase( "size" );
    }

    Method SayTheName()
    {
        Phrase( "the size" );
    }

    Method SayAName()
    {
        Phrase( "a size" );
    }

    Method SayOfName( String insert )
    {
        // can say "of size", "of this size", "of that size",
        // "of such size",...
        Phrase( "of", insert, "size" );
    }

    Method SayNamed()
    {
        Phrase( "sized" );
    }

    Method SayValue( RelativeSize size )
    {
        // returns straight unqualified value "large",
        // "smaller than average", ...
        switch ( size.value )
        {
            case EXTRA_SMALL :
                Phrase( Select( vxs1 + vxs2 ));
            case SMALL :
                Phrase( Select( vs1 + vs2 + vs3 + "a bit on the small side" ));
            case AVERAGE :
                Phrase( Select( va1 + "the usual size" ));
            case LARGE :
                Phrase( Select( vl1 + vl2 ));
            case EXTRA_LARGE :
                Phrase( Select( vxl1 + vxl2 + vxl3 ));
        }
    }

    Method SayAdjValue( RelativeSize size )
    {
        // returns a form that can be used as a pure adjective.
        // May or may not be qualified with the name of
        // the attribute.
        // There are cases where this can almost be
        // Phrase( SayValue(), SayNamed()) to generate
        // "larger than average sized". But that doesn't work
        // for all combinations like "king size sized" or
        // "diminutive sized". Hence a separate method so
        // that no restrictions need be placed on which words
        // and phrases are usable.
        // This method is used less often in templates and is
        // present only to provide variety.
        // Uses only the lists whose entries combine with "sized".
        switch ( size.value )
        {
            case EXTRA_SMALL :
                Phrase( Select( vxs1 ), SayNamed() );
            case SMALL :
                Phrase( Select( vs1 ), SayNamed() );
            case AVERAGE :
                Phrase( Select( va1 ), SayNamed() );
            case LARGE :
                Phrase( Select( vl1 ), SayNamed() );
            case EXTRA_LARGE :
                Phrase( Select( vxl1 ), SayNamed() );
        }
    }

    Method SayIsValue( RelativeSize size )
    {
        // returns qualified value (i.e., has name of attribute
        // "size", or synonym ) that can be used after forms
        // of the verb "to be", "is", "are",... ("is" can be
        // implied as in "An apple [that was] of enormous size ..."
        // or it may be displaced as in "It is an apple of
        // diminutive proportions.")
        // typical returns: "of less than average size",
        // "of gigantic proportions"
        // It would seem that this method is not needed because
        // the caller could simply call
        //
        // SayOfName( SayValue( size ))
        //
        // and generate phrases like "of relatively small size".
        // Generally that is true, but there are idioms that we
        // need to implement like "of gigantic proportions"
        // which do not fit that mold, and need to be implemented
        // here, since this is the only node that knows how to
        // talk about relative size, and no other node does.
        switch ( size.value )
        {
            case EXTRA_SMALL :
                Phrase( Select( SayOfName( Select( vxs1 )),
                                "of diminutive proportions", SizeOf( vxs1 )));
            case SMALL :
                Phrase( SayOfName( Select( vs1 + vs2 )));
            case AVERAGE :
                Phrase( SayOfName( Select( va1 )));
            case LARGE :
                Phrase( SayOfName( Select( vl1 )));
            case EXTRA_LARGE :
                Phrase( Select( SayOfName( Select( vxl1 + vxl2 )),
                                { "of gigantic proportions", "of enormous proportions" },
                                SizeOf( vxl1 + vxl2 )));
        }
    }

    Method SayNounValue( RelativeSize size )
    {
        // returns noun phrase referring to the value as a thing
        // in its own right. Typical returns: "a small size",
        // "a less than average size" for statements like "this
        // apple has a less than average size"
    }
}
That seems like an awful lot of coding just to enable Philip to say "big". On the other hand, that coding enables Philip to talk about the relative sizes of any object, in any context, in any manner under any circumstances. And since that knowledge is encapsulated in the methods of this node we need never concern ourselves with how to talk about relative sizes again. We can have Philip discuss small planets and large atoms with equal ease.
Notice the word lists that define alternative expressions for each size. In a more generalized system these could come from a data base that represented the "personality" or characteristic vocabulary of the program. A small child might use different words than a Harvard professor, and so would have different lists.
Notice also that because of the way language knowledge is embedded in world knowledge Philip would have a very difficult time learning a different language. Of course if Philip had been taught another language while he was much younger it would have been much easier, but now it's almost too late. Second language acquisition will not be easy for Philip because he only knows how to think in English.
A few asides for programmers. Case statements do not require "break" since it is implied by the next case statement. The plus operator concatenates two lists. The Select function can take either one or three arguments. If one argument is passed it is assumed to be a list of strings, and one of those strings is selected with equal probability for all of them. If there are three arguments, the first must be a single string, the second may be either a single string or a list of strings, and the third is a count of the items from which the first string was selected. This enables the Select function to make two selections and yet produce the same statistical distribution as if both selections were performed at the same time. For example
Select( Select( list_of_3 ), list_of_5, 3 );
will first select one of the three items in the first list and then, with a probability of 5/8, replace it with an item selected from the second list; otherwise it keeps the first item. The net result is that each of the 8 items from the two lists is selected with equal probability.
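For readers who want to check the arithmetic, here is a minimal Python sketch of the two forms of Select. The function name `select`, its keyword parameters and the `rng` argument are assumptions made for this example; only the behavior described above is taken from the notes.

```python
import random


def select(first, second=None, first_count=None, rng=random):
    """One-argument form: pick uniformly from the list `first`.

    Three-argument form: `first` is a string already chosen from a list of
    `first_count` items; `second` is a string or a list of strings.
    """
    if second is None:
        return rng.choice(list(first))
    pool = [second] if isinstance(second, str) else list(second)
    total = first_count + len(pool)
    # Switch to the second pool with probability len(pool) / total;
    # otherwise keep the string that was chosen earlier.
    if rng.random() < len(pool) / total:
        return rng.choice(pool)
    return first


list_of_3 = ["a", "b", "c"]
list_of_5 = ["d", "e", "f", "g", "h"]
print(select(select(list_of_3), list_of_5, 3))
```

Each of the first three items survives with probability (1/3)(3/8) = 1/8, and each of the five items in the second list is chosen with probability (5/8)(1/5) = 1/8, so all eight are equally likely.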
The keyword "Method" is not a function return type but a flag identifying the presence of a new method definition. All methods have a return type of String. The functions "Phrase()", "Sentence()", "Question()", and "Exclaim()" set the return value and cause the method to exit. The functions Sentence, Question and Exclaim also ensure that the string they were passed is properly capitalized and punctuated.
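A rough Python equivalent of these four helpers might look like the following. This is a sketch under stated assumptions: the lowercase function names and the `_finish()` helper are invented for the example, and in Python an ordinary `return` takes the place of "set the return value and exit".

```python
def phrase(*parts):
    """Join the parts with single spaces, skipping empty ones."""
    return " ".join(p for p in parts if p)


def _finish(text, mark):
    # Capitalize the first letter and end with exactly one punctuation mark.
    text = text.rstrip(".?! ")
    return text[:1].upper() + text[1:] + mark


def sentence(*parts):
    return _finish(phrase(*parts), ".")


def question(*parts):
    return _finish(phrase(*parts), "?")


def exclaim(*parts):
    return _finish(phrase(*parts), "!")


print(sentence("I hope", "he", "removed", "his", "clothing", "first"))
# I hope he removed his clothing first.
```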
One simplification in the above example relates to the random selection process. In many instances only short descriptions are desired. We don't mind saying "the big red apple" but we would hesitate to say "the significantly larger than average sized bright rosy red apple" simply because when a list of attributes is named we prefer to keep each item in the list short and to the point. The mechanism for enforcing that preference was left out of the above example and will be discussed later.
Object Nodes
Now let's look at a different kind of node: an object node. In this case we will examine a much condensed and abbreviated version of the node that enables Philip to spawn a shower in his mental imagery. In this example we will be focusing on the type of methods that are unique to objects which are rooms, i.e. enterable containers that are locations. While the Room class is derived from the Container class and the EnterableObject class, the Shower class is derived from the Room class and inherits all of its methods. However, this particular room will override some of the methods that it inherited in order to perform some special operations unique to showers. For the sake of brevity, and to stay focused on the topic under discussion, we will omit showing the other objects that the Shower object spawns in its constructor method, such as a shower head and a hot and cold water valve. Instead we will make the simplifying assumption that the ON/OFF attribute belongs to the shower itself rather than to the water valves as it would be properly implemented.
What we wish to focus on in this example are the MoveInto() and MoveOutOf() methods. These methods are inherited by all objects of type "Room" and perform any operations that are necessary to ensure that the virtual reality simulation represents a realistic picture of what might be supposed to be happening in the real world under such circumstances.
Node Shower : Room : OnOffObject
{
    Node activator;
    Method Shower();
    Method MoveInto( Node * actor, tense );
    Method MoveOutOf( Node * actor, tense );
    Method Activate( Node * actor, Certainty cert );
    Method DeActivate( Node * actor, Certainty cert );
};

Method Shower::Shower()
{
    onOffState = { OFF, ASSUME };
}

Method Shower::MoveInto( Node * actor, tense )
{
    if ( actor->isWearing.value == NULL )
    {
        actor->isWearing = { "none", ASSUME };
    }
    else if ( actor->clothing.type == AGED )
    {
        // Template: "I hope <he> <removed> <his>
        // <clothing> first."
        Sentence( "I hope", actor->SayPron( NOM ),
                  SayVerb( remove, tense, actor->conj ),
                  actor->SayPron( POS ),
                  actor->isWearing.SaySetName(), "first" );
        actor->isWearing = { "none", ASSUME };
    }
    else if ( actor->clothing.type == ASSERT )
    {
        // it is asserted that he showered with clothing on
        Sentence( "LOL! That's funny." );
    }
    actor->SetLocation( this, ASSERT );
    if ( actor->activity.type != ASSERT )
    {
        // if we haven't been told otherwise, assume he
        // entered the shower to take a shower
        actor->SetActivity( ShowerScript, ASSUME );
    }
}

Method Shower::MoveOutOf( Node * actor, tense )
{
    // not sure where he went, but assume the room
    // that contains the shower
    actor->SetLocation( this.parent, ASSUME );
    onOffState = { OFF, ASSUME };    // assume it's off
    activator = { actor, ASSUME };   // assume actor turned it off
}

Method Shower::Activate( Node * actor, Certainty cert )
{
    onOffState = { ON, ASSERT };     // certain it's on
    activator = { actor, cert };
}

Method Shower::DeActivate( Node * actor, Certainty cert )
{
    onOffState = { OFF, ASSERT };    // certain it's off
    activator = { actor, cert };
}
Yet another programmer aside: Notice that in this example the methods are not defined within the body of the node definition as they were in the first example. Instead, they were declared in the definition but actually defined outside the node definition. When the definition is deferred (and it may be deferred to a completely different source code file) we need to associate the method with the proper node. This is done using the C++ convention of mentioning the node to which it belongs and appending "::" plus the name of the method. Also notice the line:
Node activator;
This is actually a pointer variable that points to an object of type Node. It is not declared in the usual C++ manner "Node * activator" for the simple reason that all variables in kbComp are pointers. There is no need, therefore, to explicitly state that fact.
In this example we see the use of the value and certainty fields of a value. For example, when the DeActivate() method is called it is asserted as known fact that the shower is off. However, the certainty we have about the actor who performed that action is something we cannot know first hand, and we depend on the caller to supply the value, which may be ASSUME if we don't know for sure, or ASSERT if the user told Philip that the actor in question actually turned off the shower.
Another point of interest is found in the statement:
actor->SetActivity( ShowerScript, ASSUME );
Here we are assuming that the actor entered the shower for the purpose of taking a shower and we set his current activity accordingly. But suppose the user had typed "If John is taking a shower, what is he wearing?" In this case it is asserted that John is taking a shower, and the MoveInto() method will have been called by a script object rather than by the simulation engine itself. In such a case the statement above will have no effect. In general, any attempt to replace an asserted value with an assumed value will simply fail. The ShowerScript script object has already been asserted and activated for the NPC John (NPC or Non-Player Character, meaning any third party mentioned by the PC) and there is no need to activate it again or to assume anything.
We'll get into more details on script activation later on, but for now just realize that it is the activation of the script that allows the simulation engine to keep track of likely activities and sub-activities that the PC or NPC might be engaged in until such time as the simulation is informed of the fact that the script has ended. In the case of the ShowerScript the activities begin with LocateNearest( Shower ), and MoveInto( Shower ) (which are skipped if the person in question is already located in the shower). In addition, if the character's location has not previously been specified the LocateNearest( Shower ) function will have made some assumptions about the character's location (generic "Dwelling") and spawned an instance of "Shower" in the appropriate place (the newly spawned bathroom) within that generic dwelling.
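The rule that an assumption can never displace an assertion is easy to sketch in Python. The `Slot` class and its `set()` method are hypothetical names invented for this example, and only two certainty levels are modeled; how the other levels rank against one another is not specified in these notes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Certainty(Enum):
    ASSUME = 1
    ASSERT = 2


@dataclass
class Slot:
    """A value paired with the certainty attached to it."""

    value: Any = None
    certainty: Optional[Certainty] = None

    def set(self, value, certainty):
        """Store the value unless it would weaken an asserted one."""
        if self.certainty is Certainty.ASSERT and certainty is Certainty.ASSUME:
            return False
        self.value = value
        self.certainty = certainty
        return True


activity = Slot()
# "If John is taking a shower, ..." asserts the activity.
activity.set("ShowerScript", Certainty.ASSERT)
# MoveInto() then tries to assume the same activity; the attempt fails.
changed = activity.set("ShowerScript", Certainty.ASSUME)
print(changed, activity.certainty.name)  # False ASSERT
```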
Virtual Nodes
Another useful type of node is the virtual node. The virtual node is a container which does not exist. That may sound odd but consider the following sequence of inputs from the user:
There is an apple and a book on the table.
If I picked something up from the table what would I be holding?
This is representative of a large number of cases where it is known (asserted) that something was picked up but the identity of the object is not known. The sequence of events required to model this situation in the virtual reality world of Philip's imagination is to remove both objects from the table (i.e., the table's "inventory of items contained") and place them into some inter-dimensional limbo. They are replaced with a virtual object. A companion virtual object is now placed into the character's hand (using MoveTo() for the hand object, which is a container object as well as a body part). Now two copies of the list of objects formerly on the table are made and one copy is placed into each of the two matching virtual containers.
In order to answer the question "What am I holding?" the simulation engine interrogates the character's hand with a NameContents() method (common to all container objects). If the hand contained an apple then the hand's NameContents() method would ask the apple to say its name, "an apple", and plug that string into the template "I'm holding <string>." However, the hand now contains a virtual object which has no name, and so when the hand asks this object to name itself, its SayName() method has a number of choices. It may enumerate all of the objects in its list and generate "either an apple or a book", or it may look to its companion object (to which it is linked by a pointer) and say "whatever is not on the table". Either response is correct, and which one is selected will depend on various criteria such as the relative lengths of the lists in the two virtual containers. Or, all else being equal, it might be a random choice of templates.
The virtual container knows how many objects its parent is supposed to be holding, so if there were three objects on the table (an apple, a book and a bottle of wine), the table's virtual box knows it's holding two of them and the hand's virtual box knows it's holding the third. Now suppose the user types "I'll give you a hint. It's not the apple." At this assertion the apple can be removed from the hand's virtual box and discarded, while the table's virtual box will be told to move the apple from itself to its parent (the table). We are now certain that the apple is on the table and not in the PC's hand. In this way Philip does not "logically reason" about what is on the table, but actually moves, in simulation, the named objects about and then "observes", in simulation, the state of affairs.
When it happens that the number of items on the list matches the number of items known to be held, then the virtual box will move all of its remaining items to its parent and discard itself. This happens when later information clarifies who (or what) is holding which objects. For example: "There was an apple, a book, a candle and a bottle of wine on the table." "I took something from the table and so did John." There are now three interconnected virtual containers in the set, each containing a matched list of possible contents. "John ate what he was holding." The apple is the only thing that fills the bill (it is derived from an EdibleObjects class) so it is removed from all three virtual boxes and moved to John's hand. John's virtual box destroys itself in the process because it is now holding zero objects. "I drank what I was holding." The bottle of wine works for that sentence (its contents are derived from the Beverage class) so it is removed from the remaining two virtual boxes and moved into the PC's hand, and that virtual box is discarded. Now the virtual box on the table knows it is holding two items from a list of two items, and therefore that it is actually holding everything in its virtual box. The items are moved to its parent and the virtual box is discarded.
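The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under assumed names: Container, VirtualBox, confirm(), rule_out() and say_name() are hypothetical stand-ins, not Philip's actual classes. It replays the apple, book and bottle of wine example, first with the hint and then with "I drank what I was holding."

```python
class Container:
    """A real container (a table, a hand) holding items known to be there."""
    def __init__(self, name):
        self.name = name
        self.items = set()


class VirtualBox:
    """Stands in for `count` unidentified items drawn from `candidates`."""
    def __init__(self, parent, candidates, count):
        self.parent = parent
        self.candidates = set(candidates)  # private copy of the shared list
        self.count = count
        self.alive = True

    def confirm(self, item):
        """We learn that `item` really is with this box's parent."""
        self.candidates.discard(item)
        self.parent.items.add(item)
        self.count -= 1
        self._settle()

    def rule_out(self, item):
        """We learn that `item` is not with this box's parent."""
        self.candidates.discard(item)
        self._settle()

    def _settle(self):
        if self.count == 0:
            # Nothing left to account for: the box discards itself.
            self.candidates.clear()
            self.alive = False
        elif len(self.candidates) == self.count:
            # Every remaining candidate must be here: hand them to the parent.
            self.parent.items |= self.candidates
            self.candidates.clear()
            self.count = 0
            self.alive = False

    def say_name(self):
        return "either " + " or ".join(sorted(self.candidates))


table, hand = Container("table"), Container("hand")
things = {"an apple", "a book", "a bottle of wine"}
table_box = VirtualBox(table, things, count=2)
hand_box = VirtualBox(hand, things, count=1)

# "I'll give you a hint. It's not the apple."
hand_box.rule_out("an apple")
table_box.confirm("an apple")
answer = hand_box.say_name()

# "I drank what I was holding."
hand_box.confirm("a bottle of wine")
table_box.rule_out("a bottle of wine")
```

After the hint, the hand's box can still only say "either a book or a bottle of wine". After the second statement both boxes have discarded themselves, leaving the wine in the hand and the apple and book on the table.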
Even complex logic problems of the sort where one is given a list of names, occupations and favorite beverages can be solved easily and effortlessly in this manner. An NPC is created for each named person and their occupation and favorite beverage attributes are filled with virtual boxes containing all of the alternatives. As the clues are revealed the various virtual boxes empty themselves of alternatives one by one, and finally discard themselves leaving the correct values in the correct slots. Philip can solve complex logic problems not by logical deduction or inference, but by "imagining" the effects of the given clues on the state of the simulation.
Intangible Nodes (with a side-trip into curiosity and emotion)
Another important class of nodes is intangibles. There's no need to go into the details of coding methods since we've seen a few examples of how they work. Instead we will look at these nodes from a functional perspective, with the details of coding left as an exercise, as they say, for the reader. Intangible nodes include things that are not objects yet can be detected by some sense. Odors, sounds, heat, and light are examples of nodes of this type. Consider this example where the user is telling Philip a story:
"It was a dark and stormy night."
Philip's internal response to that statement is to create a generic outdoor location and attach time of day and weather to it.
"John was glad he wasn't out in the rain."
Philip spawns generic NPC and assigns name "John" (ASSERT) and gender "male" (ASSUME). Sets actor's location to generic building, ASSUME "dwelling", and ASSUME placed in the generic location above. ASSERT John's emotional state = "happy".
"He built a cozy fire and snuggled up with a favorite book."
Updates generic room by populating it with a generic fireplace object. Spawns generic burnable object, ASSUME generic fireplace logs. Places logs into fireplace. Activates the fireplace. (Fireplace's Activate() method will activate a burning script for the logs. Activated fireplace spawns an intangible object which is a source object for the perceptual phenomenon "heat", which is emitted by that intangible object.) If asked, Philip knows that John will feel heat from the fireplace. Assigns adjective "cozy" (or standard vocabulary synonym) to fireplace. Spawns generic seating object, ASSUME generic easy chair. Places John in seating object. (Chairs are containers with MoveInto() methods.) Sets John's contentment state variable to "high" (ASSUME). Spawns generic book object. Gives book object "John's favorite" attribute. Moves book object to John's hands. Invokes ReadABook script with John as the actor and the book object as a participating object. (ReadABook script will call OpenObject() method for the book object and ASSERT that the book is open. Books derive from the classes ReadableObjects and OpenableObjects.)
Now if nothing else is said, John's NPC will sit in his chair, turning pages from time to time, and the fireplace will continue to burn and give off heat until the logs run down. Then it will go out. All this is now being simulated in the virtual world of Philip's imagination.
"Then he heard a strange scuffling noise just outside his window."
Room is updated with generic window object. Intangible noise source object is spawned and placed outside the window. Noise adjective list is set to { strange, scuffling } (or equivalent standard vocabulary adjectives). The identity of the noise source is pointed to a virtual container which is populated with some possible sources of nocturnal scuffling noises, perhaps a cat, a raccoon, and a deranged serial killer. The actor's HearNoise() method (inherited from Person class) is invoked. This method, noting time of day, weather, adjectives attached to noise and possible contents of the virtual container, and noting that John's "bravery" attribute is unspecified, might generate the response:
"Was he scared?"
Or, since Philip's attention is focused on ascertaining the values of unknown objects, he might just as easily speculate:
"It was probably just a raccoon."
Why would he make such a speculation? Because Philip's curiosity is driven by the need to establish values for high-priority unknown variables. If Philip asks the user "What is your name?" it is not because he was scripted to print this canned question, but because he is motivated by his curiosity to know this important piece of unknown information. This particular unknown, the identity of the noise source, is of particularly high priority since it is not only a source of noise, but a source of anxiety. In other words, the NPC's reaction to the noise (possibly anxiety) would trigger the spawning of an intangible anxiety generating object associated with the noise generating object. Since Philip is motivated to reduce his anxiety level, and by identification with the story's character, the NPC's anxiety level, he will ASSUME a benign value for the source of anxiety, such as a raccoon, and by that assumption, reduce his anxiety level.
Testing for Plausibility and Impossibility
Methods that perform actions on objects in the simulation should perform certain tests on the proposed action to see if it is possible or plausible. Statements like "The toaster opened the window." should be recognized as absurd by the Open() method of the window, which is inherited from the parent class of OpenableObjects. The problem is that when an action method is called it is presumed that the action is actually to be simulated in the world. In other words, we are committed to performing the action and it's too late to complain about it. For this reason action methods have matching test action methods which, rather than performing the action, return an indication of whether the action is possible. If the TestOpen() method for window returns a NULL then it has found nothing objectionable about the proposed action and the simulation engine is free to carry out the action. If, however, the return value is not NULL then it will be a string describing the objection raised by the method: "Toasters can't open windows. They don't have hands."
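A minimal sketch of the test-then-act pairing might look like the following. The names here are assumptions made for illustration: Actor, has_hands and attempt_open() are hypothetical, and Python's None plays the part of NULL.

```python
class Actor:
    """Anything that might try to act (hypothetical minimal version)."""
    def __init__(self, name, has_hands):
        self.name = name
        self.has_hands = has_hands


class OpenableObject:
    def __init__(self, name):
        self.name = name
        self.is_open = False

    def test_open(self, actor):
        """Return None if nothing is objectionable, else an objection string."""
        if not actor.has_hands:
            return (f"{actor.name.capitalize()}s can't open {self.name}s. "
                    "They don't have hands.")
        return None

    def open(self, actor):
        # By the time this is called we are committed to the action.
        self.is_open = True


class Window(OpenableObject):
    pass


def attempt_open(actor, target):
    """Run the test method first; only act when it raises no objection."""
    objection = target.test_open(actor)
    if objection is None:
        target.open(actor)
    return objection
```

Calling attempt_open(Actor("toaster", False), Window("window")) returns the objection string and leaves the window closed, while an actor with hands gets None back and the window is opened.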
Language Input
All this talk of simulating the actions described by the user presupposes that Philip is somehow able to parse the statements made by the user into commands to the simulation engine. Like the language generation techniques used by the object methods, the parsing of input English language statements is entirely ad hoc. Frankly, my knowledge of English grammar is pretty much confined to what I picked up watching children's television programs with my kids when they were young. If it weren't for Sesame Street and Conjunction Junction, I wouldn't have the slightest idea what a conjunction is. As it is I have only a vague notion of the concept. Philip, therefore, knows no more about grammar than I do. Fortunately, a lack of knowledge of grammar is no barrier to understanding and speaking intelligibly, as any five-year-old is able to demonstrate quite convincingly.
Philip deconstructs sentences in a series of steps. Each step is handled by a separate program function and makes reference to its own special purpose data base. Let's take a look at some of these functions.
Numeric Preprocessor
The numeric preprocessor only understands numbers and number words. It handles units of measure and special currency words like "nickel" and "dime", "two bits" and "a buck fifty two". It also recognizes certain unique values like "3.14" which it changes to "$pi". The output from the numeric preprocessor will be numeric values and arithmetic operator words in an internal form that uses keywords that begin with "$". Thus a statement like "a buck thirty five" becomes "$num.1 $dot $num.35 $dol", with any units of measure placed in a standard position at the end of the quantity to which they refer. You will notice that the numbers are integers and that the decimal point is passed on as "$dot". Like many humans, Philip is not too sharp with decimals and fractions but he does know how to do arithmetic to two decimal places, since this is the standard precision of currency values. The numeric preprocessor could be expanded to handle numbers of indefinite precision but for the purpose at hand two decimal places suffices.
Names Preprocessor
The names preprocessor scans the input for recognizable names. This includes people's names, place names and whatever else fits comfortably into that category. Slang names for geographical locations like "Sin City" and "Tinsel Town" are replaced with the standard geographical signifier for that place name. Signifiers consist of a type code and an identity code. Hollywood, for example, is a city and has the identity code "hwd". The complete signifier for Hollywood is "$cit.hwd", and that is the value that replaces the word "Hollywood" in the input sentence before it is passed along. Later on these codes will be used to index into a geographical database so that if Philip is called upon to spawn an instance of Hollywood in order to think about it, he will know where to place it in relation to the rest of the world. Knowing the context of $cit.hwd enables Philip to know that it is in California ("$sta.ca"), in the United States, on the North American continent in the western hemisphere on planet Earth in the solar system in the Milky Way galaxy, in the real Universe, and yes, Philip does know all of that.
The geographical database also includes appropriate contexts for fictional locations such as the planet Vulcan in the Star Trek universe, and the Emerald City in the land of Oz.
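The substitution of signifiers for names can be sketched as a simple table lookup. The two table entries below come from the Hollywood example in the text; the table layout, the function name preprocess_names() and the longest-name-first rule are assumptions made for this sketch.

```python
import re

# Toy alias table. Both entries map to the signifier given in the text;
# a real table would be far larger.
NAME_TABLE = {
    "tinsel town": "$cit.hwd",
    "hollywood": "$cit.hwd",
}


def preprocess_names(sentence):
    """Replace every known name in the sentence with its signifier."""
    # Try longer names first so multi-word names win over shorter ones.
    for name in sorted(NAME_TABLE, key=len, reverse=True):
        signifier = NAME_TABLE[name]
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        # A function replacement keeps the signifier text literal.
        sentence = pattern.sub(lambda match, s=signifier: s, sentence)
    return sentence
```

With this table, preprocess_names("Last week I was in Tinsel Town.") yields "Last week I was in $cit.hwd.", which is the form the later stages would use to index the geographical database.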
Parsing
The remainder of the parsing and interpretation of the sentence plays back and forth between the sentence, the data bases and the data base nodes discussed above. What follows is a step by step description of each part of the complete process from input sentence to output response.
The sentence is scored against the most common 1000 words in the English language. If it scores poorly then the words are scored against a set of foreign language dictionaries. The dictionary that generates the highest score is assumed to be the language of the sentence. If that language is not English then Philip generates a response of the type "I'm sorry I don't know any French." with the appropriate language named, or if the language is unidentified, a query as to which language it might be.
If it passes the English language test then words are identified by part of speech from the dictionary. If more than one is possible (such as verb: "I was running.", adjective: "The running water...", or noun: "He liked the running better than the standing around.") then copies of the sentence are made and one part of speech is assigned to each so that all possible interpretations are present. When another word has multiple interpretations then copies of all those copies are made for those alternate interpretations. At the end of this phase there might be a few dozen copies of the sentence, each representing a different alternative set of assignments of parts of speech.
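The way copies multiply can be illustrated with a Cartesian product over each word's possible parts of speech. The four-word LEXICON below and the function name tag_alternatives() are invented for this sketch; Philip's real dictionary format is not specified here.

```python
from itertools import product

# Hypothetical mini-dictionary: each word lists its possible parts of speech.
LEXICON = {
    "the": ["article"],
    "running": ["verb", "adjective", "noun"],
    "water": ["noun", "verb"],
    "stopped": ["verb"],
}


def tag_alternatives(sentence):
    """Return one tagged copy of the sentence per combination of tags."""
    words = sentence.lower().rstrip(".?!").split()
    # Words missing from the dictionary are carried along as "unknown".
    choices = [LEXICON.get(word, ["unknown"]) for word in words]
    return [list(zip(words, tags)) for tags in product(*choices)]
```

For "The running water stopped." this produces 1 x 3 x 2 x 1 = 6 tagged copies, one for each assignment, which is why a sentence with several ambiguous words can easily yield a few dozen.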
Unidentified words are resolved. These may be misspellings or phonetic or idiosyncratic spellings entered by the tricky user trying to trip up Philip. Sentences such as "The guvmint man wuz here lookin for Billy Bob." should be handled without missing a beat. If several alternatives are possible then several alternative sentences are generated for further testing. The process of resolving these words uses letter transposition tests, phonetic spellings, and head/tail compares. The head/tail compare finds words that phonetically match part way through, such as "dikunArE" which matches the first three sounds of "dikshunArE". Then the tails are compared, in reverse order from the ends of the word. We find that we have a head/tail match on "dikunArE" with very little missing in the middle and we conclude that the word, originally spelled "diconary", was meant to be "dictionary."
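The head/tail compare itself is easy to sketch, assuming both words have already been converted to phonetic strings like the ones above. The function name and the coverage score it returns are assumptions for illustration, not the scoring Philip actually uses.

```python
def head_tail_score(candidate, target):
    """Return (head, tail, coverage) for two phonetic spellings.

    head: symbols matching from the front; tail: symbols matching from
    the back without overlapping the head; coverage: the fraction of the
    longer word accounted for by head plus tail.
    """
    limit = min(len(candidate), len(target))
    head = 0
    while head < limit and candidate[head] == target[head]:
        head += 1
    tail = 0
    while tail < limit - head and candidate[-1 - tail] == target[-1 - tail]:
        tail += 1
    longest = max(len(candidate), len(target), 1)
    return head, tail, (head + tail) / longest
```

For "dikunArE" against "dikshunArE" the head is 3 symbols ("dik"), the tail is 5 ("unArE"), and together they cover 8 of the 10 symbols in the dictionary word, so very little is missing in the middle.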
There is always the risk that a wrong guess was made or that the unknowns are actual words missing from the dictionary, and so the unknowns are also passed through unaltered to be analyzed later in the process. These unknowns will be matched for function; thus the seeming nonsense words "kamunga garabed glubnik" would be recognized in the context: "When I visited Kamunga Garabed Glubnik was my guide." In this case the unknowns are simply marked as unknowns and passed that way to the pattern matcher. If a pattern can find a match by assuming that "kamunga" is a place name and "garabed glubnik" is a person name then it will return that match. If it can match with other assumptions ("He garabed his glubnik on the low branch.", by assuming that "garabed" is a past tense verb, "to garab, was garabing, etc.", and "glubnik" is a noun) it will count those as matches as well. This also ensures that quoting Lewis Carroll or Finnegans Wake won't completely throw it for a loop, and that when it asks for clarification it can do so in an intelligent manner.
Functional Conversion
The next step finds the function performed by the sentence. Each sentence is copied, discarding the actual words in the copied version and keeping only the part of speech of each word. These part of speech blocks represent the grammar of the sentence.
Each part of speech block is matched against templates, starting with the shortest templates, which match portions of the sentence and return the equivalent functional notation. Each alternative causes the creation of a new copy of the sentence pattern. The original pattern and the partially functionalized patterns are matched against longer and longer templates until all templates have been tried. At the end of this phase there could be more possibilities or fewer than at the beginning of the phase, but each surviving interpretation will be in purely functional form, i.e. a procedural list of method calls to named objects.
Before the statement can be answered or the simulation performed each object named in the statement is bound to some object in the simulation. This may require a data base search to find named objects like "Capt. Kirk" and instantiate them in the simulation, or it may simply attach the references to recently discussed objects that meet the criteria. ("Him" applies to the only non-self, non-PC male character mentioned recently, or to the most recently mentioned one if more than one is present in the simulation.)
Data base searches may generate clarifying questions if insufficient data is present. A statement like "I saw Jim yesterday." will not expect to find any matches in the "famous" people data base, and so will instantiate a generic person object and give it the name "Jim". A statement like "Jim and Spock were standing on the bridge." will generate a hit because one "Jim" was found that shares a group list with one "Spock" that was found. (See name data base format below for an explanation of group links.) The assignments are ASSUMED and a clarifying question is generated: "I assume you mean Capt. Kirk and the Vulcan science officer." The tag line "the Vulcan science officer" comes right out of the data base verbatim.
If the clarifying question is answered in the affirmative then the ASSUME values are changed to ASSERT, and the engine returns to the statement it was working on before it interrupted the flow with the clarifying question.
If the binding is uncertain ("Jim and Spock were standing on the bridge when he hit him.") then the bindings attach to a virtual box (see "Virtual Nodes" section above) containing the alternative bindings.
For each possible alternative present it will be clear if the statement is asking for factual data (including a "yes" or "no" response) from the data base, factual data about the state of the simulation ("What's on the table?"), or personal opinion (information from Philip's opinion data). In addition the statement could be a command to update the simulation in some manner, either factually, historically, or hypothetically. (The PC can have three person objects representing him: one present, one past and one hypothetical.)
Factual updates like "I'm getting bored with this game." are applied to the associated objects or present person object of the PC.
Hypothetical updates are applied to the hypothetical person object for the PC, or to hypothetical thing objects. ("If I was standing on top of Big Ben, what would I see?" would move the PC's hypothetical object to a hypothetical instantiation of Big Ben, or more correctly, an instantiation of a hypothetical Big Ben.)
Historical updates are applied like hypothetical ones but are applied to the historical PC object and given a time frame. ("Last week I was in Chicago." will move the PC to Chicago and mark the time frame as 7 days ago plus or minus some unknown number of days. It could also request clarifying information such as "How long were you there?") Historical updates to objects are made by instantiating a past copy of the object linked to the present copy and marked as past. Thus "John is in New York. Last week he was in Denver. Where is he now?" is answered correctly.
But at this point, unless there is only one possible interpretation, we still haven't figured out what the sentence means. Therefore each statement is tried out with the "Try" methods for the objects. The TryXXXX() methods, such as TryMoveTo(), do not perform the actions or retrieve the data, but only report back on the likelihood or impossibility of the request or assertion. Thus "Who is the president" will try to assign the title "president" to the rock group "The Who" and get back a highly unlikely score, while another interpretation will treat it as a question of fact, try asking for that factual information and get back a good answer. The interpretation that gets the best score will be kept and actually executed and the others will be discarded.
If two alternative parsings get a very close score, or if no parsing gets a good score then a clarifying question will be generated. "Do you mean to tell me that the rock group The Who has been made president?" The clarifying statement is parsed and the factual assertions tested against the alternative parsings. At that point only one parsing should remain and it will be the one carried out.
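The selection rule in the last two paragraphs can be sketched as follows. The numeric thresholds HIGH and CLOSE and the function name choose_parse() are assumptions; the text does not say what scale the Try methods score on, so a 0-to-1 scale is assumed here.

```python
HIGH = 0.7    # assumed threshold for a "good" score
CLOSE = 0.1   # assumed margin within which two scores count as a tie


def choose_parse(scored):
    """Pick what to do with a non-empty list of (parse, score) pairs.

    Returns ("apply", parse) when one parse clearly wins, or
    ("clarify", parse) when the best parse is weak or nearly tied, so
    that a clarifying question can be built around it.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best, best_score = ranked[0]
    if best_score < HIGH:
        return ("clarify", best)
    if len(ranked) > 1 and best_score - ranked[1][1] < CLOSE:
        return ("clarify", best)
    return ("apply", best)
```

With the "Who is the president" example, the reading that makes The Who president would come back with a very low score and the question-of-fact reading with a high one, so the latter is applied without any clarifying question.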
If any command cannot be carried out because of uncertain bindings (as marked above, for example, when "he hit him" caused the creation of virtual boxes in the binding slots for "he" and "him") then the engine will generate a clarifying question such as "I'm confused. Who hit whom?" If the answer to the clarifying question resolves the binding then the action is carried out in the simulation.
At this point if the statement was a simple declarative such as "Last week I was in Chicago." the performance of that action and the activation of the "TakeATrip" script generated no responses. They do their work silently. The engine therefore queries the most recently created object to see if there is any missing information it would like to plug into any of its member variables. The TakeATrip script might respond to its QueryInfo() method by generating "Business or pleasure?" in order to fill in that particular missing value. If the response is simply "Both." then this response, after being parsed (a simple job in this case) is passed to whichever object generated the query. Since the object knows what query it last generated it knows how to handle the information, replacing the ASSUMED virtual box that holds { "business", "pleasure" } with an actual value container object holding those same two values. The certainty flag for that variable is changed to ASSERT to indicate that we were specifically told that these values apply.
The following conversation that seems almost like magic is thus explained quite easily: (Human input preceded by ">")
>I was in Chicago last week.
Business or pleasure? [Created by TakeATrip script to fill holes.]
>Both.
I've heard Chicago weather can be brutal in January. What was it like? [Created by Chicago object SmallTalk() method based on current date.]
>Cold and windy.
Well, they don't call it "the windy city" for nothing I guess. [Created by Chicago object by matching response word "windy" to db tag line.]
>You got that right.
So how long were you there? [Created by TakeATrip script to fill holes.]
>Just a couple of days.
Did you see Mrs. O'Leary's cow? [Created by Chicago object SmallTalk() method.]
>Huh?
That Picasso sculpture that nobody knows what it is. I figure it must be the cow that kicked over the lantern and started the big fire back in the 1800s. [Created by PicassoSculpture object DescribeSelf() method.]
>LOL
So where do you hang out when you're not in Chicago? [TakeATrip has its fill of info so engine calls user's PC object QueryInfo().]
And so on.
The questions about Chicago-specific features and the opinion on the Picasso sculpture come from the geographic data base and are spouted verbatim, after instantiating any objects that they refer to. Once used, a question or statement will never be repeated in the same instantiation so that it will not become apparent that they are canned responses. Had the inquiries about the sculpture gone any further Philip would have been called upon to confess his ignorance in some convincing, human-like way, such as "To tell you the truth, I've never seen it myself." or "The only time I was ever in Chicago was a half hour layover at the airport.", both taken from the "library of lame excuses" and plugged with the appropriate variables. Of course the values are plugged into the appropriate variables in the Philip object so if the user later asks "When did you say you were in Chicago?", Philip will remember.
The Control Loop
The control loop running under Windows uses time allocated to it between messages in the message loop. The flow of control is governed by an event control loop that routes control for each time slice based on state variables. Those details are omitted in this discussion and what is presented is not exactly what the loop is, but rather, what it does. The logic flow is equivalent to the following pseudo code.
Loop forever
{
    Get user input.
    Parse input and return list of candidate parsings.
    For each possible parsing
    {
        Try the methods called for and score that parsing
        based on values returned by "Try" methods.
    }
    If all parsings get low scores
    {
        Confess inability to understand, or select
        highest scoring parse and let its
        "Try" method generate a string explaining
        why the action doesn't make sense.
        Ask for clarification.
    }
    If two or more parsings get high scores
    {
        Paraphrase highest scoring parse and ask for
        validation of the parse. ("Did you mean
        <paraphrase>?")
    }
    If one parse gets high score
    {
        If awaiting validation and input did validate
        {
            Current parse = best parse from last round.
        }
        Apply current parse to the simulation and possibly
        generate a response.
    }
    If called methods did not generate a response
    {
        Enumerate objects recently accessed.
        Query each object for a "curiosity" value indicating
        how badly it wants information to fill values.
        QueryInfo the object with the highest curiosity priority
        so it can generate a question to the user.
        Keep track of which object was queried so it can be
        properly bound to the response to its question.
    }
}
Notice where each object is queried for its curiosity level. The TakeATrip script object, being the newest object created, gets a high score for being recently spawned. In addition it has values only in the slots for "who" and "where", and because it has several empty slots to fill it raises its hand and clamors for attention with a high priority value. The Chicago object, being created just slightly before the TakeATrip object, also has a recent origin. But the questions it wants to ask are in the nature of generating Chicago-related small talk and are not quite as high a priority, so TakeATrip wins out and asks "Business or pleasure?"
In the next iteration the TakeATrip object has been temporarily satisfied and the Chicago object's request to make small talk rises to the top of the heap and generates a piece of random small talk. And so it goes, with curiosity driving the conversation forward, and each new declaration by the user generating more objects about which more curiosity can be generated.
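One way to picture the curiosity contest is to give every node a numeric curiosity value and let the highest one ask its question. The weights in the formula below are pure invention for the sake of the sketch, as are the names Node, small_talk and most_curious(); only the ingredients (recency, empty slots, lower-priority small talk) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    age: int                                   # turns since the node was spawned
    slots: dict = field(default_factory=dict)  # None marks an empty slot
    small_talk: int = 0                        # queued small-talk items

    def curiosity(self):
        empty = sum(1 for value in self.slots.values() if value is None)
        recency = 1.0 / (1 + self.age)
        # Hypothetical weighting: empty slots outrank small talk.
        return 2.0 * empty + recency + 0.5 * self.small_talk


def most_curious(nodes):
    """Return the node that should be allowed to ask the next question."""
    return max(nodes, key=lambda node: node.curiosity())


trip = Node("TakeATrip", age=0,
            slots={"who": "PC", "where": "Chicago", "why": None, "how_long": None})
chicago = Node("Chicago", age=1, small_talk=2)
first_asker = most_curious([trip, chicago]).name
```

Here TakeATrip wins the first round because of its two empty slots. Once those slots are filled its curiosity drops below that of the Chicago node, whose small talk then gets its turn.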
Knowledge and Data
By now it should be obvious that the engine that drives Philip is relatively straightforward, and as long as Philip has node definitions for the objects he is called upon to create he can carry on a convincing conversation. What should be equally obvious is that for Philip to have this ability over any reasonable range of conversation topics he will require a huge library of object definitions along with their associated methods, and a huge set of data bases against which to make queries and from which to instantiate the specific objects they describe.
For example, in order to create the Chicago object in the above example it must have data on file for Chicago in a form that can be directly plugged into a newly spawned generic "City" object. Other examples include information needed to instantiate an Abraham Lincoln or Captain Kirk object from the generic person class. Let's take a closer look at the person data base as an example of how such data bases are constructed.
During parsing the words "Capt. Kirk" would be transformed into
"kirk||||capt||.nspm".
where all the known pieces of data have been slotted into their appropriate fields in the string. This is what that name looks like in the sentence as it is passed along for further processing. When that string is sent to the names data base a record is found that looks like:
kirk|james|tiberius||capt|#I|captain of the starship Enterprise|. . .
The data base reports back that the name has one record that matches two fields and contradicts zero fields. If "kirk" had been spelled "kirck" then it would report back one exact match, one phonetic match and zero contradictions. Scores are assigned based on the number and type of matches. If more than one record scores high then a clarification is requested from the user, using the data base tag line. ("When you say "Captain Kirk" I assume you mean James T. Kirk, captain of the starship Enterprise.")
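The field-by-field comparison can be sketched like this. It assumes that only the first five pipe-delimited fields are name fields and that an empty field means "unknown"; both are guesses based on the two strings shown above. The phonetic matching mentioned in the text is left out, and compare_name() is a hypothetical name.

```python
def compare_name(query, record, n_fields=5):
    """Count matching and contradicting name fields.

    Both arguments are pipe-delimited strings. Empty fields are treated
    as unknown and are skipped rather than counted as contradictions.
    """
    query_fields = query.split("|")[:n_fields]
    record_fields = record.split("|")[:n_fields]
    matches = contradictions = 0
    for q, r in zip(query_fields, record_fields):
        if not q or not r:
            continue
        if q == r:
            matches += 1
        else:
            contradictions += 1
    return matches, contradictions


KIRK = "kirk|james|tiberius||capt|#I|captain of the starship Enterprise"
kirk_result = compare_name("kirk||||capt||.nspm", KIRK)
```

For the "Capt. Kirk" query this gives two matches and zero contradictions, in line with the report described above.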
Also part of each data base record is a pointer to a list of association groups. Each person record may point to more than one association group. Paul McCartney would point to the "Beatles" group as well as to the "Wings" group records. The group record itself contains data about the group as a single entity: its name, its members and so on. When matches are very doubtful because of too much missing information, shared group links can sometimes disambiguate them. For example, "John, Paul, George and Ringo" would easily be identified by finding the name with the fewest number of matches ("Ringo") and examining the association groups of each "Ringo" record for a group with other members named "John", "Paul" and "George". In the same way "Larry, Moe and Curly" would be identified as three specific people in the data base in spite of the lack of qualifying information like last names. Not only that, but Philip would then have the correct reference at hand, the group record, to discuss any of the other Three Stooges, such as Shemp Howard.
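A toy version of the group lookup is sketched below. The GROUPS table holds first names only and is far simpler than the person and group records described above; it and the function find_group() are assumptions made for illustration.

```python
# Toy group records keyed by group name, holding members' first names.
GROUPS = {
    "Beatles": {"john", "paul", "george", "ringo"},
    "Wings": {"paul", "linda", "denny"},
    "Three Stooges": {"larry", "moe", "curly", "shemp"},
}


def find_group(first_names):
    """Return the group containing all the given names, or None."""
    names = {name.lower() for name in first_names}

    def frequency(name):
        # How many groups have a member with this first name?
        return sum(1 for members in GROUPS.values() if name in members)

    # Start from the rarest name, since it has the fewest records to check.
    rarest = min(sorted(names), key=frequency)
    for group, members in GROUPS.items():
        if rarest in members and names <= members:
            return group
    return None
```

Given "John, Paul, George and Ringo" the lookup settles on the Beatles even though "Paul" also appears in Wings, and "Larry, Moe and Curly" resolves to the Three Stooges, whose record also lists Shemp.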
Thus it comes as no surprise that Philip can respond appropriately to questions like: "If Jim and Spock are on the bridge, where are they?" There's no way Philip will think they are two generic people on a river-crossing structure, and quoting the group record tag line in his response he might say "I presume they are on the starship Enterprise going where no man has gone before."
Questions like "Who is your favorite Star Wars character?" are answered in a similar manner. The phrase "Star Wars" is identified in one of the preprocessor passes as a movie title and the movie data base entry points to the group list which includes pointers to characters and to the actors that played them. Flags in the person records identify each person as real, fictional, historical, biblical, animated, and so on. Likewise, fictional people include pointers to the work or works in which they appeared. The work of fiction data base identifies it as a movie or book, and if a book, points to movie records based on the book. These records also point to the authors of those works. Mention of "Bilbo and Frodo" puts Philip in immediate contact with the records for Tolkien and Elijah Wood as well.
Geographic Data
Each entry in the geographic data base is a single object identified as a city, state, province, country, landmark, and so on. Included in the record is a pointer to its container object so that San Francisco is contained by California which is contained by the USA which is contained by North America and so on up to the entire factual Universe. Batman's Gotham City is likewise contained, not in the factual Universe but in a comic super-hero universe that is quite separate from our own.
Whenever a specific location is named, that location is instantiated as an instance of the appropriate class and placed in the world. (Earth and the solar system already exist since they were instantiated as the ASSUMED location of the PC.) Thus if Philip claims to live in San Jose and the user reports that he lives in San Francisco, Philip will recognize that they are in the same state and, by referencing the geographic data base, construct a connecting object named "El Camino Real" (the highway connecting those two cities). Thus he will be equipped to discuss the geography of the area and the relationship between their two reported locations. If pressed for details beyond his knowledge Philip can fall back on the "lame excuses library" and claim that he only just moved to the area and hasn't yet become that familiar with it.
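Recognizing that two places share a state amounts to walking the container pointers upward from each place until the chains meet. The PARENT table below is a tiny hypothetical stand-in for the geographic data base, and the function names are invented for this sketch.

```python
# Each place points to its container; a tiny stand-in for the real data base.
PARENT = {
    "San Francisco": "California",
    "San Jose": "California",
    "California": "USA",
    "USA": "North America",
    "North America": "Earth",
}


def containers(place):
    """List the containers of a place, innermost first."""
    chain = []
    while place in PARENT:
        place = PARENT[place]
        chain.append(place)
    return chain


def nearest_common_container(a, b):
    """Return the innermost container shared by both places, or None."""
    seen = set(containers(a))
    for place in containers(b):
        if place in seen:
            return place
    return None
```

With this table, nearest_common_container("San Jose", "San Francisco") gives "California", which is the fact Philip would need before looking for a connecting object between the two cities.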
Common Sense
Common sense is encoded in the class definitions for common objects. Philip knows what activities commonly occur in a shower (coded in the shower class), which side of the road to drive on (coded in the road class, keyed to the country under discussion), what occurs during a visit to a restaurant (coded in the dining out script class), and what keys are used for (coded in the key object class).
While Philip's engine is straightforward, his real intelligence, knowledge, common sense and ability to carry on convincing conversations depend entirely on the size and quality of his data bases and his class and script libraries.