RavenClaw
Yet another (please read “An improved”) dialog management architecture for task-oriented spoken dialog systems
Presented by: Dan Bohus (dbohus@cs.cmu.edu)
Work by: Dan Bohus, Alex Rudnicky
Carnegie Mellon University, 2002
New DM Architecture: Goals
Able to handle complex, goal-directed dialogs
Easy to develop and maintain systems
Go beyond information-access systems and the slot-filling paradigm
Developer focuses only on the dialog task
Automatically ensure a minimum set of task-independent conversational skills
Open to learning (hopefully at both the task and discourse levels)
Open to dynamic spoken dialog system (SDS) generation
More careful, more structured code, logs, etc., to provide a robust basis for future research.
11-04-01
Modeling the cost of misunderstanding …
A View from far, far away
[Figure: the architecture in layers (Backend, Dialog Task Specification, Conversational Skills, Core), with example utterances such as “SELECT * WHERE …”, “What’s your name?”, “Since that failed, I need you to push button B”, “Can you repeat that, please?”, “Suspend… Resume…”, and “What did you just say?”]
Let the developer focus only on the dialog task specification:
Don’t worry about misunderstandings, repeats, focus shifts, etc.; merely describe (program) the task, assuming perfect knowledge of the world
Automatically generate the conversational mechanisms
Outline
Goals
A view from far away
Main ideas
  Dialog Task Specification / Execution
  Conversational skills
In more detail
  Dialog Task Specification / Execution
  Conversational skills
Dialog Task Spec & Execution
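[Figure: example dialog task tree for the Communicator travel system: Communicator, with children Welcome, Login (AskRegistered, AskName, GreetUser, GetProfile), Travel (Leg1: DepartLocation, ArriveLocation), Locals, and Bye]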
Agencies and microagents (for input, request, execute, …)
Handle concepts
Execution with interleaved input passes:
Execute the agents by top-down “planning”
Do input passes when information is required
REMEMBER: this is just the dialog task
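A minimal Python sketch of top-down execution with interleaved input passes, assuming a toy tree and three microagent kinds (request, inform, execute). The `Agent` class, the `run` function, and the scripted list of parses that stands in for real input passes are all hypothetical illustrations, not the actual C++ classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Concepts = Dict[str, object]


@dataclass
class Agent:
    name: str
    children: List["Agent"] = field(default_factory=list)
    kind: str = "agency"  # hypothetical kinds: agency, request, inform, execute
    concept: Optional[str] = None  # concept filled by a request agent
    action: Optional[Callable[[Concepts], None]] = None  # body of an execute agent


def run(root: Agent, inputs: List[Concepts]) -> Tuple[List[str], Concepts]:
    """Execute the tree top-down, depth-first and left to right.

    An input pass happens only when a request agent needs a concept that is
    still unbound; each pass here just pops the next scripted parse."""
    concepts: Concepts = {}
    trace: List[str] = []
    pending = iter(inputs)

    def execute(agent: Agent) -> None:
        if agent.kind == "agency":
            for child in agent.children:
                execute(child)
        elif agent.kind == "request":
            if agent.concept in concepts:
                return  # already bound by an earlier, over-answering input
            trace.append(f"ask {agent.concept}")
            concepts.update(next(pending))  # interleaved input pass
        elif agent.kind == "inform":
            trace.append(f"say {agent.name}")
        elif agent.kind == "execute" and agent.action is not None:
            agent.action(concepts)
            trace.append(f"run {agent.name}")

    execute(root)
    return trace, concepts


# Simplified version of the Communicator tree above.
tree = Agent("Communicator", [
    Agent("Welcome", kind="inform"),
    Agent("Login", [
        Agent("AskName", kind="request", concept="name"),
        Agent("GetProfile", kind="execute",
              action=lambda c: c.update(profile=f"profile:{c['name']}")),
    ]),
    Agent("Travel", [Agent("Leg1", [
        Agent("DepartLocation", kind="request", concept="depart"),
        Agent("ArriveLocation", kind="request", concept="arrive"),
    ])]),
    Agent("Bye", kind="inform"),
])
trace, concepts = run(tree, [{"name": "Dan"},
                             {"depart": "Pittsburgh", "arrive": "Seattle"}])
```

Because the second input also supplies the arrival city, ArriveLocation finds its concept already bound and triggers no input pass: information is requested only when it is actually missing.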
Handling inputs
[Figure: the Communicator dialog task tree, annotated with an Input Pass]
Assemble an agenda of expectations (open concepts)
Bind values from the input to the concepts
Process non-understandings (if any); analyze the need for focus shifts
Continue execution
Conversational Skills / Mechanisms
Many problems in spoken dialog systems are caused by a lack of conversational skills. “It’s all in the little details!”
Dealing with misunderstandings
Generic channel/dialog mechanisms: repeats, focus shifts, context establishment, help, start over, etc.
Timing
Even when these mechanisms are in place, they lack uniformity and consistency.
Development and maintenance are time-consuming.
Conversational Skills / Mechanisms
More or less task-independent mechanisms:
Implicit/explicit confirmations, clarifications, disambiguation = the whole misunderstandings problem
Context reestablishment
Timeout and barge-in control
Back-channel absorption
Generic dialog mechanisms:
Repeat, Suspend… Resume, Help, Start over, Summarize, Undo, Querying the system’s beliefs
The core takes care of these by dynamically inserting into the task tree agencies that handle these mechanisms.
Dialog Task Specification
Goal: handle complex domains, beyond information-access, frame-based, slot-filling systems, e.g.:
Symphony, intelligent checklists, navigation, route planning
We need a formalism powerful enough to describe all these tasks:
C++ code?
Declarative would be nice… but is it powerful enough?
Templatized C++ code…?
Dialog Task Specification
A possible, more formalized approach
Tree of agents with:
Preconditions
Success criteria
Focus criteria (triggers)
Expressed mostly in terms of concepts, which carry:
Data, type (basic, struct, array)
Confidence, availability, ambiguousness, groundedness, System/User, TurnAcquired, TurnConveyed, etc.
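The concept attributes listed above could be held in a small record. The sketch below is a hypothetical Python rendering (field and method names are assumptions): a concept stores a list of (value, confidence) hypotheses, from which availability and ambiguity are derived, plus groundedness, origin, and turn indices. It also shows a precondition written over concept attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Concept:
    name: str
    ctype: str = "basic"  # basic, struct, or array
    # Hypotheses as (value, confidence) pairs; more than one means ambiguous.
    hyps: List[Tuple[object, float]] = field(default_factory=list)
    grounded: bool = False
    source: str = "user"  # "system" or "user"
    turn_acquired: Optional[int] = None
    turn_conveyed: Optional[int] = None

    @property
    def available(self) -> bool:
        return bool(self.hyps)

    @property
    def ambiguous(self) -> bool:
        return len(self.hyps) > 1

    def top(self) -> Tuple[object, float]:
        """Return the highest-confidence hypothesis."""
        return max(self.hyps, key=lambda h: h[1])

    def bind(self, value: object, confidence: float, turn: int) -> None:
        self.hyps.append((value, confidence))
        self.turn_acquired = turn


depart = Concept("depart_city")
depart.bind("Seattle", 0.71, turn=3)
depart.bind("San Francisco", 0.29, turn=3)


def ask_arrival_precond(c: Concept) -> bool:
    """Hypothetical precondition: move on only once the departure city is
    both available and grounded."""
    return c.available and c.grounded
```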
An example DTS
UserLogin: AGENCY
    concepts: registered(BOOL), name(STRING), id(STRING),
              profile(PROFILE), profile_found(BOOL)
    achieves_when: profile || InformProfileNotFound
    AskRegistered: REQUEST(registered)
        grammar: {[yes]->true, [no]->false, [guest]->false}
    AskName: REQUEST(name)
        precond: registered==false
        grammar: [user_name]
        max_attempts: 2
    InformGreetUser: INFORM
        precond: name
    AskID: REQUEST(id)
        precond: registered==true
        mapping: [user_id]
    DoProfileRetrieval: EXECUTE
        precond: name || id
        call: ABEProfile.Call >name, >id, <profile, <profile_found
    InformProfileNotFound: INFORM
        precond: !profile_found
Given that the baseline implementation is 259 lines of C++ code, this is pretty good.
Can a formalism cut it?
People have repeatedly tried formalizing dialog… and failed
We’re focusing only on the task (as in robotics/execution)
Actually, these agents are all C++ classes, so we can back off to code; the hope is that most behaviors can be expressed easily, as above.
Other Ideas for DTS
Four microagents: Inform, Request, Expect, Execute
Provide a library of “common task” and “common discourse” agencies:
Frame agency
List browse agency
Choose agency
Disambiguate agency, Ground agency, etc.
DTS execution
Agency.Execute() decides what is executed next
Various simple policies can be implemented:
Left-to-right (open/closed), choice, etc.
But an agency is free to do more sophisticated things (MDPs, etc.), i.e., learning at the task level
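A hypothetical Python sketch of a pluggable `Agency.Execute()` policy, using the UserLogin agency from the example DTS with its preconditions written as predicates over a concept dictionary. Only a plain left-to-right policy and a simple choice policy are shown. The open/closed variants and `achieves_when` are not modeled, and the `effects` table is a made-up stand-in for real user input and backend calls.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Policy = Callable[[List["Agent"], State], Optional["Agent"]]


class Agent:
    def __init__(self, name: str,
                 precond: Callable[[State], bool] = lambda s: True) -> None:
        self.name = name
        self.precond = precond
        self.completed = False


def left_to_right(children: List[Agent], state: State) -> Optional[Agent]:
    """First child that is not completed and whose precondition holds."""
    for child in children:
        if not child.completed and child.precond(state):
            return child
    return None


def choice(children: List[Agent], state: State) -> Optional[Agent]:
    """Child named by the (hypothetical) 'choice' concept, if its precondition holds."""
    for child in children:
        if child.name == state.get("choice") and child.precond(state):
            return child
    return None


class Agency(Agent):
    def __init__(self, name: str, children: List[Agent],
                 policy: Policy = left_to_right) -> None:
        super().__init__(name)
        self.children = children
        self.policy = policy

    def execute(self, state: State) -> Optional[Agent]:
        """Decide which child is executed next; the policy is pluggable."""
        return self.policy(self.children, state)


login = Agency("UserLogin", [
    Agent("AskRegistered"),
    Agent("AskName", lambda s: s.get("registered") is False),
    Agent("InformGreetUser", lambda s: "name" in s),
    Agent("AskID", lambda s: s.get("registered") is True),
    Agent("DoProfileRetrieval", lambda s: "name" in s or "id" in s),
    Agent("InformProfileNotFound", lambda s: s.get("profile_found") is False),
])

# Simulated effect of each agent on the concepts.
effects = {"AskRegistered": {"registered": False},
           "AskName": {"name": "Dan"},
           "DoProfileRetrieval": {"profile_found": False}}
state: State = {}
order: List[str] = []
while (nxt := login.execute(state)) is not None:
    order.append(nxt.name)
    nxt.completed = True
    state.update(effects.get(nxt.name, {}))
```

For an unregistered user, AskID is never chosen because its precondition stays false. Swapping in a different policy, such as `choice` or a learned one, changes the control flow without touching the agents themselves.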
Input Pass
1. Construct an agenda of expectations
A (partially?) ordered list of concepts expected by the system
2. Bind values/confidences to concepts
The system-initiative (SI) to mixed-initiative (MI) spectrum can be expressed, independently of the task, in terms of how the agenda is constructed and the binding policies
3. Process non-understandings (if any): try to detect the source and inform the user:
Channel (SNR, clipping)
Decoding (confidence score, prosody)
Parsing ([garble])
Dialog level (POK, but no expectation)
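Steps 1 and 2 can be illustrated with a small, hypothetical Python sketch. The agenda is a list of levels: the focused agent's expectations first, then the open concepts of agents further up the tree. Under system initiative only the focused level is kept, while under mixed initiative all levels are kept. Binding assigns each parsed grammar slot to the first agenda concept that expects it. The slot names, the DepartDate agent, and the confidences are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# (agent name, concept name, grammar slot that can fill the concept)
Expectation = Tuple[str, str, str]
Parse = Dict[str, Tuple[str, float]]  # grammar slot -> (value, confidence)


def build_agenda(levels: List[List[Expectation]],
                 mixed_initiative: bool) -> List[Expectation]:
    """levels[0] holds the focused agent's expectations; later levels hold
    open concepts further up the tree. System initiative keeps only the
    focused level; mixed initiative keeps every level, in order."""
    kept = levels if mixed_initiative else levels[:1]
    return [e for level in kept for e in level]


def bind(agenda: List[Expectation], parse: Parse) -> Dict[str, Tuple[str, float]]:
    """Bind each parsed slot to the first agenda concept that expects it."""
    bound: Dict[str, Tuple[str, float]] = {}
    used: Set[str] = set()
    for _agent, concept, slot in agenda:
        if slot in parse and slot not in used and concept not in bound:
            bound[concept] = parse[slot]
            used.add(slot)
    return bound


levels = [
    [("DepartLocation", "depart", "[depart_city]")],  # focused request
    [("ArriveLocation", "arrive", "[arrive_city]"),   # other open concepts
     ("DepartDate", "date", "[date]")],
]
# "From Pittsburgh to Seattle tomorrow", as a made-up parse with confidences.
parse: Parse = {"[depart_city]": ("Pittsburgh", 0.8),
                "[arrive_city]": ("Seattle", 0.7),
                "[date]": ("tomorrow", 0.9)}

system_initiative = bind(build_agenda(levels, mixed_initiative=False), parse)
mixed = bind(build_agenda(levels, mixed_initiative=True), parse)
```

The task tree and the parse are the same in both cases. Only the agenda construction differs, which is the sense in which the SI to MI spectrum is task-independent.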
Input Pass
4. Focus shifts
Focus shifts seem to be task-dependent.
The decision to shift focus is taken by the task (DTS)
But focus shifts also have a task-independent (TI) side (sub-dialog size, context reestablishment). Context reestablishment is handled automatically in the core (see later)
Task-Independent Conversational Mechanisms
Should be handled transparently by the core, with little or no effort from the developer
However, developers should be able to write their own customized mechanisms if needed
Handled by inserting extra “discourse” agents on the fly into the dialog task specification
Conversational Skills
Universal dialog mechanisms:
Repeat, Suspend… Resume, Help, Start over, Summarize, Undo, Querying the system’s beliefs
The grounding / misunderstanding problems
Timing and barge-in control
Focus shifts, context establishment
Back-channel absorption
Q: To what extent can we abstract these away from the dialog task?
Repeat
Repeat (simple)
Repeat (with referents)
The dialog task tree (DTT) is automatically adorned with a “Repeat” agency at start-up,
which calls upon the OutputManager
Not all outputs are “repeatable” (e.g., implicit confirms, GUI, …); which ones exactly?
Only 3%; they are mostly [summarize]
User-defined custom repeat agency
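A minimal sketch of how a start-up Repeat agency might use the output history, assuming a hypothetical Python stand-in for the OutputManager. The real component's interface is not shown here, so every method name below is an assumption. Outputs carry a `repeatable` flag, and a repeat request re-issues the most recent repeatable one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    text: str
    repeatable: bool = True  # e.g. GUI-only outputs might be marked False


class OutputManager:
    """Hypothetical stand-in that records every output the system produces."""

    def __init__(self) -> None:
        self.history: List[Output] = []

    def say(self, text: str, repeatable: bool = True) -> None:
        self.history.append(Output(text, repeatable))

    def last_repeatable(self) -> Optional[Output]:
        for out in reversed(self.history):
            if out.repeatable:
                return out
        return None


class RepeatAgency:
    """Attached to the dialog task tree at start-up; fires on a repeat request
    and returns the text to re-issue."""

    def __init__(self, om: OutputManager) -> None:
        self.om = om

    def execute(self) -> str:
        out = self.om.last_repeatable()
        if out is None:
            return "Sorry, there is nothing to repeat."
        return out.text


om = OutputManager()
om.say("Where are you leaving from?")
om.say("[map of departure cities]", repeatable=False)
repeat = RepeatAgency(om)
```

A developer-defined repeat agency could subclass `RepeatAgency` and override `execute`, for instance to resolve referents as in “repeat the second flight”.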
Help
DTT adorned at start-up with a help agency
Can capture and issue:
Local help (obtained from the focused agent)
ExplainMore help (obtained from the focused agent)
What can I say?
Contextual help (obtained from the main topic)
Generic help (give_me_tips)
Obtains help prompts from the focused agent and the main topic (defaults provided)
The default help agency can be overridden by the user
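The prompt lookup described above can be sketched as a fallback chain. In this hypothetical Python version, local, explain-more, and “what can I say” help come from the focused agent (placing “what can I say” there is an assumption), contextual help comes from the main topic, and generic help comes only from the defaults. Any missing prompt falls back to a default. All prompt texts and names are made up.

```python
from typing import Dict, Optional

DEFAULT_HELP: Dict[str, str] = {
    "local": "Please answer the question I just asked.",
    "explain_more": "I need this information to continue with your request.",
    "what_can_i_say": "You can answer the question, or say help, repeat, or start over.",
    "contextual": "We are in the middle of a task; say start over to begin again.",
    "generic": "Tip: speak naturally, and use short answers when you can.",
}

FROM_FOCUSED = {"local", "explain_more", "what_can_i_say"}


class Agent:
    def __init__(self, name: str, help_prompts: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.help_prompts = help_prompts or {}


class HelpAgency:
    """Default help agency; a developer could subclass it and override prompt()."""

    def prompt(self, kind: str, focused: Agent, main_topic: Agent) -> str:
        source: Optional[Agent]
        if kind in FROM_FOCUSED:
            source = focused
        elif kind == "contextual":
            source = main_topic
        else:  # generic help (e.g. give_me_tips) comes only from the defaults
            source = None
        if source is not None and kind in source.help_prompts:
            return source.help_prompts[kind]
        return DEFAULT_HELP[kind]


depart = Agent("DepartLocation", {"local": "Tell me the city you are leaving from."})
travel = Agent("Travel", {"contextual": "We are planning your trip."})
helper = HelpAgency()
```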
Suspend … Resume
DTT adorned with a SuspendResume agency.
Forces a context reestablishment on the current main topic upon resume.
Context reestablishment also happens when focusing back after a sub-dialog
We can perhaps construct a model for that (given the size of the sub-dialog, timing issues, etc.)
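A hypothetical Python sketch of both behaviors. A SuspendResume agency always re-establishes the main topic on resume. After an ordinary sub-dialog, a crude rule decides whether re-establishment is needed, standing in for the model the slide speculates about. The thresholds (3 turns, 30 seconds) and all prompt texts are invented for illustration.

```python
from typing import Optional


def needs_reestablishment(subdialog_turns: int, elapsed_s: float,
                          max_turns: int = 3, max_elapsed_s: float = 30.0) -> bool:
    """Hypothetical model: re-establish context after a long or slow sub-dialog."""
    return subdialog_turns > max_turns or elapsed_s > max_elapsed_s


class SuspendResumeAgency:
    def __init__(self) -> None:
        self.suspended_topic: Optional[str] = None

    def suspend(self, main_topic: str) -> str:
        self.suspended_topic = main_topic
        return "OK, I'll wait. Say resume when you're ready."

    def resume(self) -> str:
        if self.suspended_topic is None:
            return "There is nothing to resume."
        topic, self.suspended_topic = self.suspended_topic, None
        # Resuming always forces a context reestablishment on the main topic.
        return f"OK, let's get back to {topic}."


sr = SuspendResumeAgency()
sr.suspend("planning your trip")
```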
Start over, Summarize, Querying
Start over:
DTT adorned with a Start-Over agency
Summarize:
DTT adorned with a Summarize agency; the prompt is generated automatically, so the problem shifts to NLG: can we do something corpus-based … work on automated summarization?
Querying the system’s beliefs:
Still thinking… there is a problem with the grammars: can meaningful Phoenix grammars for “what is [slot]” be generated automatically?
Timing & barge-in control
Knowledge of barge-in location
Information on what got conveyed is fed back to the DM, through the concepts, to the task level
Special agencies can take special action based on that (e.g., list browsing)
Can we determine which utterances should not be barge-in-able in a task-independent manner?
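Feeding back “what got conveyed” can be illustrated for list browsing. This sketch assumes, hypothetically, that speech synthesis reports the end time of each list item in the prompt. An item counts as conveyed only if its audio finished before the barge-in. A list-browsing agency could then resolve “that one” to the last item fully heard. The timings and flights are made up.

```python
from typing import List, Optional, Tuple


def conveyed_items(segments: List[Tuple[str, float]], barge_in_s: float) -> List[str]:
    """segments: (item, end time in seconds from prompt start), in playback order.
    An item counts as conveyed only if its audio finished before the barge-in."""
    return [item for item, end in segments if end <= barge_in_s]


prompt = [("the 9:15 flight", 2.0),
          ("the 11:40 flight", 4.0),
          ("the 2:05 flight", 6.0)]
heard = conveyed_items(prompt, barge_in_s=4.2)
# A list-browsing agency might resolve "that one" to the last item fully heard.
that_one: Optional[str] = heard[-1] if heard else None
```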
Confirmation, Clarification, Disambiguation, Misunderstandings, Grounding…
Largely unsolved in my head: this is next!
Two components:
Confidence scores on concepts:
  Obtaining them
  Updating them
Taking the “right” decision based on those scores:
  Insert appropriate agencies on the fly into the dialog task tree: an opportunity for learning
  What is the set of decisions / agencies?
  How does one decide?
Confidence scores
Obtaining confidence scores: from the annotator
Updating them, from different sources:
(Un)attacked implicit/explicit confirms
Correction detection
Elapsed time?
Domain knowledge
Priors?
But how do you integrate all these in a principled way?
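The slide leaves the integration question open. One standard, principled option, shown here only as an illustration and not as the approach proposed in the talk, is Bayes’ rule: treat the user’s answer to an explicit confirmation as evidence whose likelihoods model recognition errors on the yes/no answer. The likelihood values 0.9 and 0.1 are pure assumptions.

```python
from typing import Dict

Belief = Dict[str, float]


def normalize(b: Belief) -> Belief:
    total = sum(b.values())
    return {v: p / total for v, p in b.items()}


def update_after_explicit_confirm(belief: Belief, confirmed_value: str,
                                  user_said_yes: bool,
                                  p_yes_if_correct: float = 0.9,
                                  p_yes_if_wrong: float = 0.1) -> Belief:
    """Posterior is proportional to prior times P(observed answer | hypothesis)."""
    post = {}
    for value, prior in belief.items():
        p_yes = p_yes_if_correct if value == confirmed_value else p_yes_if_wrong
        likelihood = p_yes if user_said_yes else 1.0 - p_yes
        post[value] = prior * likelihood
    return normalize(post)


prior = {"Seattle": 0.71, "San Francisco": 0.29}
post_yes = update_after_explicit_confirm(prior, "Seattle", user_said_yes=True)
post_no = update_after_explicit_confirm(prior, "Seattle", user_said_yes=False)
```

With these assumed likelihoods, a “yes” raises Seattle to about 0.96, and a “no” shifts most of the mass to San Francisco (about 0.79). Other sources on the slide, such as correction detection or domain knowledge, could in principle enter as further likelihood terms.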
Mechanisms
DepartureCity = <Seattle, 0.71> <SF, 0.29>
Implicit / explicit confirmations
Clarifications
Disambiguation
Example prompts:
  Did you say you were leaving from Seattle?
  When do you leave from Seattle?
  So you’re leaving from Seattle… When?
  I’m sorry, was that Seattle or San Francisco?
How do you decide which? Learning?
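To make the “how do you decide” question concrete, here is a hand-tuned baseline that maps a concept’s belief to a grounding action. The thresholds and action names are assumptions for illustration only, and the slide suggests they might instead be learned. Two close competing hypotheses lead to disambiguation, a single strong one to acceptance or implicit confirmation, and a weak one to explicit confirmation.

```python
from typing import Dict, Tuple


def choose_action(belief: Dict[str, float],
                  accept: float = 0.9, implicit: float = 0.7,
                  ambiguity_gap: float = 0.5) -> Tuple[str, str]:
    """Map a non-empty belief {value: confidence} to (action, argument)
    using hypothetical, hand-set thresholds."""
    ranked = sorted(belief.items(), key=lambda kv: kv[1], reverse=True)
    top, p_top = ranked[0]
    if p_top >= accept:
        return "accept", top
    if len(ranked) > 1 and p_top - ranked[1][1] < ambiguity_gap:
        return "disambiguate", f"{top} or {ranked[1][0]}"
    if p_top >= implicit:
        return "implicit_confirm", top
    return "explicit_confirm", top


departure_city = {"Seattle": 0.71, "San Francisco": 0.29}
action = choose_action(departure_city)
```

With these particular thresholds, the DepartureCity belief from the slide leads to disambiguation (“Seattle or San Francisco?”). Slightly different numbers would produce an implicit confirmation instead, which is why choosing them well is a natural target for learning.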
Software Engineering
Provide a robust basis for future research.
Modularity
Separability between task and discourse
Separability of concepts and confidence computations
Portability
Multiple servers
Aggressive, structured, timed logging