This is an old webpage! For the spring 2014 course, visit: http://www.cse.chalmers.se/edu/course/TIN172/

Artificial Intelligence, LP4, VT2013

Project 2: "Shrdlite"

In this project you will implement a dialogue system for controlling a robot that lives in a virtual block world. The robot can move around blocks of different forms, colors and sizes, and it can answer questions about the world and ask for clarification whenever necessary.

The inspiration is Shrdlu, by Terry Winograd 1970. A short reading list about Shrdlu is at the end of this page. For Winograd it was a PhD project, but now there are tools available that make things easier: grammar formalisms, ontologies, planners, etc.

You will not have time to create a system that can do everything that Shrdlu can, so you only have to implement a subset of Shrdlu's capabilities.

Try it yourself

The Shrdlite system is already running here:

http://www.grammaticalframework.org/~peter/shrdlite/shrdlite.html

But this one has a pretty lousy planner, it's bad, slow and even incorrect! Your task is to make a better planner, and perhaps also extend the vocabulary and grammar.

You can test the program right here and now. Enter the following sentence in the input textbox and press 'enter':

  • Put the blue block that is to the left of a pyramid in a medium-sized box.

The system parses the sentence, and infers a sequence of simple actions that accomplishes the task.

Note: You need an up-to-date web browser! I have tested it with Firefox 19, Safari 6 and Chrome 25.

The world

Shrdlite uses a 2-dimensional blocks world, not 3d. The main reason for this is that it is much easier to visualize, but also because spatial reasoning becomes easier.

The world contains a floor, and a number of blocks. The blocks can stand on or in each other (if the form and size permits). The goal of the "game" is to move around the blocks in any way the user likes, a bit like Minecraft or Lego. This can only be done by talking to a robot arm which can pick up and put down blocks.

The world contains at least the following forms, colors and sizes:

  • Forms: Squares, rectangles, pyramids, balls and boxes.
  • Colors: Red, black, blue, green, yellow, white.
  • Sizes: Large, medium, small, tall, wide.

Internally, the world is stored as a list of lists of blocks, where each block has a unique identifier.

The basic commands

There are only two basic commands:

  • pick(n): Picks up the topmost block from column n
  • drop(n): Puts the block that is carried onto column n

These commands must obey some natural physical laws that constrain the movement of the blocks. Here are some examples:

  • The floor can support any number of blocks.
  • All blocks must be supported by something.
  • The arm can only hold one block at the time.
  • Smaller block cannot support larger.
  • Pyramids and balls cannot support anything.
  • Blocks are "in" boxes, but "on" other blocks.
  • The arm can only pick up free blocks.
  • Boxes must be supported by something larger.

The grammar

You are given a grammar written in GF (Grammatical Framework). This grammar is used for parsing, i.e., it can translate user input into block world semantics.

One of the possible additional tasks is to extend the grammar in differnt ways, e.g., to handle more abstract or complex descriptions, or to use it for generating system utterances.

The grammar can (among others) understand and produce the following user commands:

  • Pick up a big red block.
  • Grasp the pyramid.
  • Put it in the box.
  • Move the green ball on the floor inside the box.
  • Put the white square under a ball.
  • Put the ball that is to the left of a pyramid inside the box.
  • Take the ball and put it in the box.

The following utterances are syntactically ambiguous, which means that the grammar returns several different parse trees as possible semantics:

  • Take the ball under the pyramid inside the box.
  • Move the ball under the pyramid inside the box.
  • Put all pyramids inside the box on top of the red block.

Compare the last one with the following two, which are not ambiguous:

  • Put all pyramids that are inside the box on top of the red block.
  • Put all pyramids inside the box that is on top of the red block.

The following are grammatically correct, but semantically wrong:

  • Take the floor.
  • Take the tall square.
  • Put it in the pyramid.
  • Move the ball under the floor.

These are semantically correct, but logically impossible:

  • Take the ball that is left of all blocks.
  • Put it on top of all pyramids.

Your planner should be able to handle all these things, with appropriate responses.

Testing the grammar in GF

When you have installed GF, you can test the grammar like this:

> i ShrdliteEng.gf ShrdliteSem.gf
(...)
Languages: ShrdliteEng ShrdliteSem

Shrdlite> p -lang=ShrdliteEng "put all rectangles in a red box" | l -lang=ShrdliteSem
( move ( all ( block rectangle _ _ ) ) ( inside ( any ( block box _ red ) ) ) )

Shrdlite> p -lang=ShrdliteEng "take a tall block in the red box" | l -lang=ShrdliteSem
( take ( any ( thatis ( block _ tall _ ) ( inside ( the ( block box _ red ) ) ) ) ) )

Planning the moves

A user command such as the example on top of this page:

  • Put the blue block that is to the left of a pyramid in a medium-sized box.

is translated by the parser into something like this:

  • (move (the (thatis (block _ _ blue) (leftof (any (block pyramid _ _))))) (inside (any (block box medium _))))

From this the system has to infer which blocks the outmost command refer to; such as:

  • move a l

The the system has to come up with a plan: a sequence of simple actions that puts the rectangle a into the box l; e.g.:

  • pick 1; drop 0
    • a is under a ball in column 1, so first we have to move it out of the way (by putting it in column 0).
  • pick 9; drop 8
    • there is already a block in l (which is in column 9), so we have to move that block out out of the way too (by putting it in column 8).
  • pick 1; drop 9
    • the main movement: we move block a from column 1 to column 9.
  • pick 0; drop 1
    • we restore the ball that was on a, but moving it back to the original column.
    • we can't restore the ball that was originally in l, since a is too thin to support medium-sized ball.

As you hopefully have noticed, the example planner generates a much longer plan, with lots of unnecessary actions. Furthermore, the example planner does not care that some blocks cannot support other blocks.

Execute the plan

The system now executes the actions in the plan. When doing this, it informs the user of what is happening, at the same time showing what is happening visually. This is also decided by the planner, so the returned plan can be something like this:

I move the topmost block from column 2 to column 0.
pick 2
drop 0
I move block r9 to the floor.
pick 7
drop 5
I move rectangle r2 inside box b8.
pick 2
drop 7
I move ball c3 to column 2.
pick 0
drop 2

The plan should be returned as one action per line. Everything that is not a basic command is assumed to be a message to the user.

You can enter a plan directly into the interface, if you separate each action with ";" and only use basic actions. Try e.g. to enter this:

pick 1; drop 0; pick 9; drop 8; pick 1; drop 9; pick 0; drop 1

The code

Update: Now there's a page with installation instructions! I hope it's correct...

The HTML, Javascript and the GF grammars are packed together in a single zip file. Just pack it up and start working:

.../shrdlite.zip

The main files are shrdlite.html and shrdlite.js. They need the following things to work correctly:

  • world.js: A Javascript file describing the initial world. It defines two constants, world and blocks. It is deliberately kept simple, so that it will be easy to read the file by another language than Javascript. E.g., Python can read the file directly as it is.
  • Shrdlite{Lex}{Eng,Sem}.gf: A GF grammar that recognizes all sentences mentioned above. The main grammar files are ShrdliteEng.gf and ShrdliteSem.gf.
  • parser.cgi: A simple wrapper CGI script that calls GF with the user input and the grammars.
  • jquery-1.9.1.min.js: A Javascript library that you don't have to care about, just make sure it's in the correct place.

All files above are included included in the zip file. But you also need the following two things:

  • A working GF distribution.
  • planner.cgi: The planner program, which is not included in the zip file. There are however included stubs for Python, Java and Haskell, that return a really really stupid and incorrect plan.

The URLs of the parser and planner can be changed in the file shrdlite.js. Look for the constants ParserURL and PlannerURL at the top of the file.

In the parser CGI, parser.cgi, you might want to change the path to the GF binary or the main grammar. This is done in the constant PARSER at the top of the file.

To get the parser and planner working as CGI scripts, you need to have a local web server. We will NOT help you with this, but there are several descriptions on the web on how to do this. (Hint: it's really simple!)

Additional things to do

When you have gotten Shrdlite to work well, you might want to extend it with new functionality. Here are some suggestions.

  • Add support for user questions, such as:
    • Where is the big red ball?
    • How many small blocks are there?
    • Are you holding a ball?
    • What are you holding?
    • Is there any black block to the left of a big ball?
    • How many blocks are above red squares?
  • Add support for more complex commands, such as:
    • Stack up all red blocks.
    • Build a tower of all blocks that are not green.
    • Take the block between the blue rectangle and the green ball.
  • Add support for anaphoric references and/or ellipsis:
    • Put the red ball that is under a box into it.
    • Take the square under the green one.
  • Add support for clarification questions:
    • Are you referring to the large red pyramid or the small green?
  • Add support for better system responses:
    • I moved the red block into the large box.
    • I picked up the pyramid next to the large box.
  • Make the world more complex, such as:
    • More sizes and forms; multi-coloured and textured blocks.
    • Blocks can be rotated.
    • Multiple arms.
    • One wide block can support two small, and two small (of the same height) can support one large.
    • Note that for these ideas to work, you have to change the Javascript code too!

Or try something else of your choosing – but check with your supervisor first!

Suggested reading

Here are some notes on partial-order planning, that might be useful:

http://pages.cs.wisc.edu/~dyer/cs540/notes/pop.html

Jussi Rintanen at Aalto University has a short tutorial on classical planning algorithms, that can be read as a complement to the course book:

http://users.cecs.anu.edu.au/~jussi/aaai11tutorial/

The supplied grammar is written in Grammatical Framework, a grammar system developed here at Chalmers/GU:

http://www.grammaticalframework.org/

To be able to run the system on your own computer you need to install the system. If you want to extend the grammar, or add new languages, or change the semantics, you need to read the tutorial, up to and including "lesson 4". Probably you'll want to look around the website for more reading, and example grammars.

The original Shrdlu

More information about Shrdlu can be found here.