Tuesday, August 14, 2012

TOWARDS A SYSTEM THAT REALLY UNDERSTANDS NATURAL LANGUAGE

Recent years have seen substantial progress in natural language processing, including practical language translation based on statistical methods and phones that accept spoken natural-language input. But the weaknesses of these systems are also well known. A translation engine mostly matches combinations of words to their counterparts in the other language correctly, but sometimes produces embarrassing errors because it makes no attempt to understand the text. And spoken input on phones is mostly treated as a gateway to applications already present on the phone, where understanding is restricted to the needs of each particular application.

A number of well-known techniques are usually listed as the required "skills" of natural language processing, such as named entity recognition or coreference resolution. However, most of these techniques are regarded as imprecise tools that are inherently prone to error. By extension, the whole field of natural language processing is considered imprecise and incapable of reaching reliable conclusions or decisions on important issues.

But why? Why is it that we can make highly precise statements in law or science that humans can interpret without such fears, but cannot extend this level of precision to computer applications? My explanation is that each computer application takes only a slice of the language, whereas people use the complete language. We have elaborate and nearly complete computer dictionaries, some even showing relationships between words, and nearly complete grammars (probably not covering some colloquial abbreviations). But neither truly captures meaning, beyond some well-described semantic relationships. In the academic tradition, one strives to describe the language as completely as possible, but this cannot extend to real-world meanings, because then we would have to describe everything in the world the language is used for.

And this creates a barrier for the abilities of "rule-based" NLP. A well-developed parser can thoroughly analyze a complex sentence, but it produces a great number of possible interpretations that are unexpected and absurd to a human reader who understands the text; in addition, it spends a great deal of time and memory processing those spurious interpretations. And even given the whole list of possible parses, we cannot do much to automatically select the one originally intended.
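To make the explosion of interpretations concrete, here is a small illustration (not from the original post): a toy grammar with the classic prepositional-phrase attachment ambiguity, and a CKY chart that counts how many complete parses a seven-word sentence already has. The grammar, lexicon, and sentence are all invented purely for the demonstration.

```python
from collections import defaultdict

# Toy CNF grammar with PP-attachment ambiguity (illustrative only).
RULES = [  # (lhs, (rhs1, rhs2))
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("VP", "PP")),   # "saw [the man] [with the telescope]"
    ("NP", ("NP", "PP")),   # "saw [the man with the telescope]"
    ("NP", ("Det", "N")),
    ("PP", ("P", "NP")),
]
LEXICON = {"I": "NP", "saw": "V", "the": "Det",
           "man": "N", "telescope": "N", "with": "P"}

def count_parses(words):
    """CKY chart: counts[(i, j)][X] = number of ways X derives words[i:j]."""
    n = len(words)
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        counts[(i, i + 1)][LEXICON[w]] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (a, b) in RULES:
                    counts[(i, j)][lhs] += counts[(i, k)][a] * counts[(k, j)][b]
    return counts[(0, n)]["S"]

print(count_parses("I saw the man with the telescope".split()))  # -> 2
```

With a realistic grammar and a longer sentence, this count grows combinatorially, which is exactly the time and memory burden described above; and nothing in the chart itself says which of the readings was intended.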

Is there a way to develop an integrated approach to natural language that achieves adequate understanding of texts? We cannot expect (at least at this time) to build a full model of the world as known to humans, but we can start with smaller domains or applications. Would this be different from current phone apps? I think it must be different. If the application (probably developed before natural language interfaces existed) is too narrow-minded (e.g., it just expects arguments for a specific function call), there is no point in speaking to it in natural language; menus and forms with fields would serve better. The application itself must be rethought to warrant the use of natural language.

I recall a recent discussion with the CEO of a startup who planned to outperform the latest systems, such as Siri. He was absolutely convinced that what a language processor should do was, first, identify the application to be called (give him a method!) and then extract the arguments for the call. The only thing that troubled him was that the speaker could deviate from the expected response. To me, the problem was simply to understand what was said each time. I suggested a case, for a flight reservation system, where after some discussion the customer says "In case you don't have an earlier flight I will accept this one", upon which the system responds "We do have an earlier flight but it arrives at a different airport". I saw no problem in understanding the meaning of the customer's request by connecting its components to the appropriate database entities. But to the CEO it was absolutely outlandish, and we never talked again.
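One way to realize "connecting the components of the request to the appropriate database entities" for that dialog is sketched below. The flight schema, record values, and function names are all hypothetical; the point is only that the conditional in the customer's utterance maps directly to a database query plus a comparison of arrival airports.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical flight records; the schema is invented for illustration.
@dataclass
class Flight:
    number: str
    departs: time
    arrives_at: str  # arrival airport code

def respond_to_conditional_acceptance(flights, offered):
    """Resolve 'in case you don't have an earlier flight, I'll accept
    this one' against the database entities the phrase refers to."""
    earlier = [f for f in flights if f.departs < offered.departs]
    if not earlier:
        # The condition holds: accept the offered flight.
        return f"Booking {offered.number}: it is our earliest flight."
    alt = min(earlier, key=lambda f: f.departs)
    if alt.arrives_at != offered.arrives_at:
        # The condition fails, but the alternative differs in a way
        # worth reporting back to the customer.
        return (f"We do have an earlier flight ({alt.number}) "
                f"but it arrives at {alt.arrives_at}.")
    return f"We have an earlier flight: {alt.number}."

flights = [Flight("AB100", time(9, 30), "JFK"),
           Flight("AB102", time(7, 15), "EWR")]
print(respond_to_conditional_acceptance(flights, flights[0]))
# -> We do have an earlier flight (AB102) but it arrives at EWR.
```

Nothing here requires guessing which "method" to call in advance; the meaning of the utterance itself determines the query.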

What can we do for a system that really understands the language? We need to select an application domain broad enough to warrant the use of natural language but still representable in a computer. So we are still slicing natural language, but not by types of linguistic phenomena (vocabulary, syntax, etc.). Over time, smaller domains can coalesce.

Selecting a domain and providing sample dialogs might not be an easy task; indeed, it may take as much ingenuity as designing a new human interface. In my earlier research (see The Prague Bulletin of Mathematical Linguistics, 65-66, pp. 5-12 (1996), or http://www.math.spbu.ru/user/tseytin/mytalke.html) I used (like many other researchers) school problems about moving objects, and in trying to connect linguistic entities with mathematical models I gained a number of useful insights. (Authors of school problems are often particularly inventive in squeezing complex mathematical dependencies into concise descriptions.)

We will still need a parser, but not necessarily of the usual type. Once we identify a probable syntactic entity, we should immediately check whether it makes sense in our problem domain, and so avoid spurious variants. Moreover, with statistical techniques we might be able to start from units bigger than words, for which ready semantic counterparts might exist. Once the parse is complete, we will immediately have a meaningful object in the problem domain, and we can proceed with its domain-specific processing.
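A minimal sketch of that idea, assuming a toy flight-query domain: every time two constituents are combined, a domain check runs at once, so a combination that makes no sense in the domain (here, two origins for one flight set) never enters the chart at all. The lexicon, the meaning representation, and the greedy shift-reduce loop are all invented for illustration.

```python
# Tiny lexicon mapping words straight to domain meanings (illustrative).
LEXICON = {
    "flights": {"type": "flight-set"},
    "from": {"type": "role", "role": "origin"},
    "to": {"type": "role", "role": "destination"},
    "boston": {"type": "city", "name": "BOS"},
    "denver": {"type": "city", "name": "DEN"},
}

def combine(left, right):
    """Try to merge two partial meanings; return None when the result
    makes no sense in the flight domain (the 'immediate check')."""
    if left.get("type") == "role" and right.get("type") == "city":
        return {"type": "constraint", left["role"]: right["name"]}
    if left.get("type") == "flight-set" and right.get("type") == "constraint":
        merged = dict(left)
        for role in ("origin", "destination"):
            if role in right:
                if role in merged:   # e.g. two origins: reject outright
                    return None
                merged[role] = right[role]
        return merged
    return None

def parse(words):
    """Greedy left-to-right shift-reduce over word meanings; every
    reduction is vetted by the domain before it is kept."""
    stack = []
    for w in words:
        stack.append(LEXICON[w])
        while len(stack) >= 2:
            merged = combine(stack[-2], stack[-1])
            if merged is None:
                break
            stack[-2:] = [merged]
    return stack[0] if len(stack) == 1 else None

print(parse("flights from boston to denver".split()))
# -> {'type': 'flight-set', 'origin': 'BOS', 'destination': 'DEN'}
```

A contradictory query such as "flights from boston from denver" fails the domain check during parsing and yields no complete analysis, rather than producing a spurious reading to be filtered out later; the finished parse is already the domain object the application works with.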

So, is there anyone to pursue this course?





