Abstract
Acknowledgement
Table of Contents
Chapter 1
Introduction
1.1 Motivation
The invention of the mobile telephone around 34 years ago was a big step forward in mobility. It gives users a
great degree of freedom to move around while talking on the phone; they can communicate with other
users at any time and from anywhere. Mobile phones alone, however, were not enough. The same degree of
mobility was needed for Internet users, so that they would not have to stay at their stationary computers
whenever they needed to be connected to the Internet. The Wireless Local Area Network (W-LAN) made it
possible to achieve a reasonable degree of mobility within a building. This is achieved by
installing a number of Access Points (AP) on an existing wired network and a wireless
network card in a mobile device such as a laptop computer or a PDA.
There already exists a W-LAN at the ITU, and the facility of positioning or locating devices is
provided by the Ekahau Positioning Engine. These technologies have made it possible to build various
interesting location-based applications at the ITU. Good examples of such applications are the Messaging
Service, Find Friends and others developed by students at the ITU. A new concept which is attracting
more attention is the combination of speech with the existing location-based applications running on
wireless devices. A number of applications in this area have already been implemented, and these projects
have shown good results both with the IBM ViaVoice speech engine (a commercial product) and with a
speech recognizer developed in the Java programming language by students at the ITU.
These projects build the foundation for, and attract attention to, further research in this area. These
applications have proven the usability and importance of human-computer interaction both for
system control and for speech dialogs.
Having all these basic technologies available, and given my interest in this field, I found it a good opportunity to
extend the existing applications to provide more freedom and mobility to the users. The Location
Based Interactive Speech System (LBISS) is planned to be fully controlled by speech.
1.2 Problem Definition
The fundamental concept of this project is to combine speech technology with positioning
technology and to build an application which is fully controlled by speech.
The application will be divided into a client part and a server part. The client part will run on a mobile
device such as a laptop computer, or in the future on a Personal Digital Assistant (PDA); for now a laptop
is used because PDAs do not support speech input at the moment. The speech recognizer engine together
with the speech synthesizer engine will be placed on the client side of the application. The synthesizer part
of the speech engine will be responsible for speaking the synthesized speech to the client, providing the
user with the information intended to be delivered. This covers most of the dialog between the client and the
application. In order to increase the speed and functionality of the application and to prevent errors,
the system will also, through the synthesized speech, provide the key words (a sort
of instructions and choices) relevant to the information the system can provide to the client. The
recognizer will then listen for the user to say these words and recognize them. The words will be
limited to the words in the grammar file used in the system, but will be sufficient to make the
dialog meaningful and complete. The number of words in the grammar file solely depends on the
choices and the selection of services that the system provides to the client. These recognized
words range from browsing commands and control commands to service request commands.
The server side, on the other hand, will run on a stationary server. When the user connects to the
system, the server side of the application will first determine the location of the user. This is done
through a connection between the application and the positioning engine. As soon as the user signs in by
providing a user name and password to the system, he will be authenticated against the registry of registered
users. A separate XML file is provided for registering users. At this moment LBISS is defined
to register only staff and students of ITU, but as a real-life application the categories of registered users
could be expanded. According to the location and the identity of the user, the application will then start
the relevant dialog with the user. This means that the content of the dialog will be filtered according to the
user's location and identity. In this research and development phase the application will have only two
services, which could, with almost no modification, be expanded to a variety of services.
The first type of service delivers information, via a speech dialog, about the services provided
in a specific location defined in LBISS (i.e. an office such as the reception, the students' administration, a
computer lab and so on). At this moment, for test purposes, only the Reception and the Exam Office are
defined in LBISS, so any user connected to and using LBISS will get relevant, pre-defined
information about these locations. This information will also be filtered by the user's identity. This
means that a student registered in LBISS will get different information than a guest who is not
registered in LBISS.
The second type of service consists of Reminders and Event News, which do not solely depend on
location. Any employee registered in LBISS is able to set their own reminders based on time.
LBISS will check the time of the reminder continuously and will speak the contents of the
reminder to the user as synthesized speech. Event News is institution-specific information, in the scope of
this project ITU-specific information, that is intended for all users excluding guests. This could
replace newsletters and group mails. For example, news about the annual party to which all
the students and staff of ITU are invited could be delivered by this service of LBISS. Just after the log-in
phase LBISS will check whether there is any event information; if there is, LBISS will
communicate it. As mentioned before, this service is location independent and will be
communicated to the user as soon as he connects to the system, independent of his location. All
location-dependent services or information fall in the first category of services.
In this development and research phase, LBISS is defined for only two locations at ITU, namely the
Reception and the Exam Office, and only a couple of students and staff members are registered. It could,
however, with very little or almost no modification, be used in a variety of areas such as offices, museums,
supermarkets and schools to provide users with localized audio information based on their current
location. The greatest advantage of this application is that the user does not have to monitor the screen
of the mobile device or take any other action to extract the information. All the user has to do is
communicate with the system by speech through the microphone, and he has the freedom to move around
hands-free. The content of the information can very easily be changed and updated according to the area
where LBISS is used.
1.3. Contribution
Since the installation of the W-LAN and the Ekahau positioning engine, students at the ITU have developed a good
number of location-based applications. These applications range from sending SMS to finding friends,
audio streaming, multiplayer games, finding the shortest path and so on. Each application targets a
specific group and explores a different aspect and use of the W-LAN and the positioning engine.
A course project, "Tracking the Position of Mobile Clients by Mobile Phones", developed by myself and
Hilmi Olgun, explores the use of the W-LAN and the positioning engine together with Java 2 Micro
Edition (J2ME) to view the location and movements of a targeted connected mobile user on a map of
the area plotted on the screen of a mobile phone. Based on the scenario of the project, this application is
of particular importance in a large target area where one would like to locate a friend or a co-worker.
The application offers a great degree of mobility and a good degree of accuracy.
The multiplayer game, audio streaming and the other applications mentioned above are examples of good
applications built on the W-LAN and the positioning engine, but I will leave those for the reader to explore.1
Finding the shortest path is another good example of such applications, where the mobile device user
provides the application with a destination name and the system calculates and finds the shortest
path to that location.
1 Descriptions of the location based applications developed by students at ITU can be found at:
http://positionserver.itu.dk:8000
In addition to the above applications there have also been a number of projects where students have
explored, experimented with and developed applications combining the W-LAN, the positioning engine
and speech technology. As examples I can mention "User Authentication and Information
Exchange by Speech and Location Information in Mobile Systems" (UA) (find the revised title), a
Master's thesis by Emad El-Dean El-Akehal, and "Position Dependent Communication System" (PDCS),
a Master's thesis by Thomas Lynge.
In UA, Emad El-Dean El-Akehal has built a recognizer in the Java programming language which
identifies and authenticates users based on their voice. This voice biometric is an excellent means of
authentication that eliminates the chance of an opponent trying to steal someone's identity. Furthermore,
UA uses its speech recognizer for browsing commands and service requests. The biometric
identification is again used to filter the services offered to users. This means that, as a result of the
identification, users of different categories (i.e. registered and unregistered, or authorized and
unauthorized) will get different services according to their identity and location. UA provides two
services at this stage: Messaging, where a user can leave or get messages in a specific location at
a specific time, and Find Friend, where a user can track and find the location of a registered friend.
PDCS, on the other hand, uses the IBM ViaVoice speech engine developed by IBM. This engine
has both a recognizer and a synthesizer, so the user can both listen to and speak with the system. PDCS provides
a shortest-path service. In this application the system gets the current location of the user from the
positioning server, and the user is required to say (by speech) the name of the destination. The system
will calculate and find the shortest path to the destination, guide the user (by synthesized speech)
along the path and notify the user when he arrives at the destination.
LBISS, on the other hand, uses the same technologies to build a fully speech-controlled system. All
browsing commands, control commands and service requests are accomplished by user speech. The
only area where the user needs to use a system input device (the keyboard) is the log-in process, where the
user has to enter a user name and password. LBISS provides a location-based interactive information
system where, when the user arrives at a predefined location, LBISS starts a speech dialog with the
user. An example dialog could be the following:
LBISS: Welcome. This is reception. We provide information about Meetings and admission. What do you require?
User: Meeting
LBISS: Who do you have a meeting with?
User: John
In this example dialog the system will compare the user name retrieved from the log-in text box
against the list of meetings for John and check the date and time of the meeting. LBISS will reply to the
user whether he has a meeting or not. In the same manner LBISS will provide localized information to
the user, in a fully spoken dialog, about any location that is defined in LBISS. The volume of dialog
between the user and LBISS depends on the amount of information provided to the system which
should be conveyed to the user. In the case of reminders, LBISS authenticates the user in the log-on
phase. If the user is a registered user and has a reminder allocated, LBISS will first check the
reminder time against the current time and notify the user at the exact time registered in the reminder.
To the best of my knowledge at the time of writing, this is the first fully speech-controlled
application experimented with and developed at the ITU.
In its experimental and development phase LBISS uses no security protocols to protect the user name
and, in particular, the password. In future work, the voice biometrics of UA could be combined with
LBISS, or an encryption method could be used to ensure a secure exchange of user name and password.
On the other hand, if we recall the dialog above, one could guess that it would be very useful if the user
were then guided to the room of the person he has a meeting with. In this particular situation and in similar
situations, PDCS could be combined with LBISS as a next step. As a result, LBISS would be
authenticating the user, listening to the speaker's requests and finally, if the user needs to find the
required area (a particular room), the system would use the shortest-path method of PDCS. It is a
great advantage that both LBISS and PDCS use the same basic technologies and could be integrated
with very little modification.
As a closing remark for this section I would like to say that the combination of all three
projects, LBISS, UA and PDCS, would result in an ideal application for any type of location-based speech
communication system. It is easy to see that, by doing so, all the aspects of a useful and realistic
location-based speech application could be covered.
1.4. Report Guide
This report mainly consists of two parts: Part 1 and Part 2.
Part 1
This part covers the introduction to this report and a short but general discussion of the basic
technologies used in LBISS.
Part 2
This part is LBISS specific and covers all aspects of LBISS, including its programming, the specific
technologies used, tests and results and finally the conclusion.
The whole report is divided into 6 chapters.
Chapter 1: I assume that by now you have almost finished Chapter 1 of this report. The main focus of
this chapter is the motivation, the problem definition and finally the contribution of LBISS
to the existing research and work in this area.
Chapter 2: This chapter walks you through the basics and fundamentals of the technologies used in
LBISS. You will find general information about these technologies; the topics are not directly
related to the specific points of the technologies used in LBISS.
Chapter 3: This is where LBISS begins in specific terms. It goes through the design steps of LBISS
and the specific points about the technologies used in LBISS, and to some extent argues why
they have been selected for LBISS.
Chapter 4: This chapter specifically goes through the steps taken in the programming part and
explains how each component of LBISS was designed and implemented in the programming language.
Programmers, in particular those who would like to further develop LBISS, will find this section very
interesting and useful.
Chapter 5: This chapter includes the different tests performed on LBISS, including source code tests,
usability tests and so on. It also describes the results of the tests together with the areas of LBISS
that need improvement.
Chapter 6: Finally, this chapter covers the conclusion about the work done in LBISS. It is
followed by the reference list and the appendices.
Chapter 2
Basic Technologies
The main technologies for this project, as mentioned in the problem definition section, are positioning
technologies and speech technologies. This chapter goes through the basics of these technologies
together with XML, which is used here to store the data required for LBISS. But prior to that I would like
to briefly explain which specific speech and positioning technologies are used in LBISS.
LBISS is a fully speech-controlled system; therefore it needs both a speech recognizer and a speech
synthesizer. It is possible, but a very demanding task, to build both a speech synthesizer and a speech
recognizer with a large vocabulary that is also able to work efficiently in a noisy environment. Besides,
that is not within the scope of this project. Instead, LBISS uses an existing speech engine, IBM ViaVoice,
that has both a speech synthesizer and a speech recognizer. The recognizer part of the engine
has a large vocabulary in both American and British English along with a couple of other languages. LBISS
is configured to use American English, which fulfils all the demands of this project. To use the IBM
ViaVoice engine, the Java Speech API (JSAPI) is required. The Java Speech API is a software interface
developed by Sun Microsystems and supported by IBM ViaVoice, which makes it possible to use
both the speech recognizer and the speech synthesizer of IBM ViaVoice. The speech synthesizer in
LBISS is also configured to work with American English. To make the speech output of the synthesizer
more natural and human-like, LBISS uses the Java Speech Markup Language (JSML) in its default
messages.
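As a brief illustration, the following sketch shows how a JSAPI synthesizer might be created and given a JSML-annotated message. It is a minimal example based on the standard JSAPI 1.0 interfaces; the JSML text is simplified, the exact element names depend on the JSML version supported by the engine, and error handling is omitted for readability.

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class JsmlDemo {
    public static void main(String[] args) throws Exception {
        // Create a synthesizer for American English and allocate its resources.
        Synthesizer synth = Central.createSynthesizer(
                new SynthesizerModeDesc(Locale.US));
        synth.allocate();
        synth.resume();

        // A JSML-annotated message: emphasis and a short pause make the
        // output sound more natural than plain text.
        String jsml = "<jsml>Welcome. This is <emp>reception</emp>."
                    + "<break size=\"medium\"/> What do you require?</jsml>";
        synth.speak(jsml, null);

        // Wait until the output queue is empty, then release the engine.
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synth.deallocate();
    }
}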
The second technology used in LBISS is positioning. LBISS takes advantage of the existing Ekahau
Positioning Engine that is installed at the IT University, where LBISS will be tested. The Ekahau
positioning engine is accessed through a Java interface, which makes it possible to track and obtain the
position of a mobile device connected to the system.
LBISS uses XML for storing the data relevant to the system. SAX and DOM parsers are used in order
to access and read the data from the XML files. DOM (the Document Object Model) is defined by the
World Wide Web Consortium (W3C), while SAX (the Simple API for XML) originated as a de facto
standard from the XML developer community; both are described in more detail in section 2.2.
2.1 Positioning Technologies
Using wireless technology simply means greater mobility and freedom. Of course, the degree of
mobility and freedom to move around depends on the application and the technology being used.
The use of wireless technologies in turn gives rise to new concepts and new applications,
particularly when positioning is involved. Once the position of a mobile
device is determined, a variety of location-based applications can be built on top of the wireless
technologies. In this case the mobile user will be getting position-dependent information. Thus,
positioning plays an important role in today's and future wireless applications, as it is an excellent filter
for providing localized information. This increases the value and relevance of information and services,
and will therefore increase usage. This is of particular importance when the mobile device user (with a
location-based application) is moving in information-rich domains, for instance a supermarket. The
position-dependent application will then filter and select the flow of information according to the interests
of the mobile user.
Now let us ask ourselves a simple question: what is positioning? Positioning (location detection) is a
functionality which detects the geographical location of a physical object; in this context we are
interested in a mobile device, which could be anything from a cellular telephone to a PDA, a laptop computer or
even a moving car. Among these, cellular technology, due to its wide acceptance and coverage, has
attracted significant interest in the tracking industry. It could be said that for applications with
infrequent position reporting rates that also require voice communication, a cellular system is ideal. But
there is still the possibility of building attractive applications with the other devices named earlier.
According to their functionality, positioning technologies can generally be classified as outdoor and
indoor positioning [1]:
• Outdoor technologies: Wide-area positioning systems which may use an earthbound mobile
network or a satellite navigation system. GPS is one of the biggest and most widely used and
recognized systems for outdoor positioning.
• Indoor technologies: These systems mostly use the same principle, with a number of sensors inside the
building that pick up the signal of the wireless device and, by performing some algebraic calculations,
determine the position of the wireless device.
There are different systems performing both indoor and outdoor positioning that can generally be
classified into the following groups:
• Cellular network-based methods
• Space-based radio navigation systems
• WLAN and short-range connectivity systems
2.1.1. Cellular Network-Based Methods
Cellular network-based positioning methods have gained high interest in today's technologies. They have been
implemented and are running in a number of countries, for example Japan and the USA. The applications provide a
variety of services, as listed in the previous table. Within this positioning approach there is a subset of methods
such as:
• Cell ID
• CI + Timing Advance
• Time of Arrival
• etc.
Each method provides a different degree of positioning accuracy, but it can be said that in this
class of positioning the accuracy generally depends on the size of the cell to which the
mobile station belongs at the time of the position estimation. As an example, the Cell ID method of positioning
is briefly explained here.
Cell ID
Cell ID positioning, a pure network-based method, is the oldest and simplest way to locate a mobile
station. In this technology all operators know where their cells are. Each cell has an ID, and when a
mobile device is connected to a particular cell, the cell ID represents the location of the mobile
station. The estimated location of the mobile station is calculated as the mass center of the geographical
area covered by the cell. The position estimate obtained with this method is not accurate enough for many
location-based services, since the size of a cell can exceed many kilometers. In current GSM networks the cell size
typically varies from 200 m to 35 km, which is the main factor in the positioning accuracy. To provide
more accurate position estimates, other cellular positioning methods are used, though in the future CI may
provide an accurate way to locate the MS in environments with very small cell sizes. [1]
The following figure shows an example of CI + Timing Advance, which provides a much better position
estimate than the pure CI method. Any individual cell in this figure can be seen as a pure CI
method, from which it can clearly be seen that even if the mobile device is at the extreme edge of the
cell, the CI method will estimate its position as the mass center of the cell. [1]
2.1.2. Space-Based Radio Navigation Systems
Another widely used positioning technology is the space-based radio navigation system. This
technology has global coverage, and positioning is generally done by the use of satellites.
As it involves satellites, it is considered very expensive to start and keep running. Therefore,
not many systems of this kind currently exist. So far, there are only three space-based radio navigation
systems, namely [1]:
• Global Positioning System (GPS): Developed and maintained by the US Department of
Defense and fully operational since April 1995. Commercial receivers are also widely
available on the market.
• GALILEO: A European system that is still under construction and is planned to be
operational in 2008.
• Global Navigation Satellite System (GLONASS): A Russian system that has been operational
since January 1996, but is not fully available today because of the economic problems in Russia.
We will explain the Global Positioning System in detail and leave the other two systems for interested
readers.
Global Positioning System (GPS)
Global Positioning System (GPS) navigation, an all-weather, worldwide precise positioning system, has
become an integrated part of wireless data communication on land, in the air and at sea. Starting out as an
exclusively US military navigation system in 1978, it has been fully operational since 1995, even for civil
use. Small, inexpensive GPS navigators have been available for years now.
Satellite navigation has two major advantages over its earthbound alternative: the positioning accuracy
is very high, and it is publicly available to anyone since no telecom operator owns it. Instead it is
controlled by the Pentagon, which could possibly prove to be an even worse alternative if they decide to shut
it down. To meet this possible risk, the EU is working on setting up a parallel global satellite navigation
system, GALILEO.
The fundamental concept in GPS applications is that one can determine fairly accurately the location of any
device that has a GPS receiver mounted inside it and a clear view of the sky. This
determination of location is facilitated by a series of satellites.
GPS basically consists of three segments:
• The space segment: This part consists of the GPS satellites, launched into specific orbits
with orbital periods of 12 hours. There are currently around 24 satellites orbiting the Earth.
• The control segment: This part consists of the master control station at Colorado Springs
along with a few other stations positioned around the globe to monitor and maintain the condition
and orbits of the GPS satellites.
• The user segment: This consists of the military and commercial users who use GPS receivers
to track the signals from the GPS satellites. These devices use a triangulation technique to find
their position on the surface of the Earth. A standalone GPS receiver is capable of giving
an accuracy of about 100 m.
The satellites are synchronized to transmit encoded navigational information in bit streams
which contain the satellite ID, GPS time and some other parameters. Any device equipped with a GPS
receiver will intercept these transmissions and use a simple mathematical formula derived from
triangulation. (Triangulation in the case of GPS means collecting signals from three or more satellites in
carefully monitored orbits, from which the receiver computes its own spatial relationship to each satellite
in order to determine its position.) The fundamental task is to calculate the distance between the mobile station
and each satellite. In mathematical terms this can be expressed as follows [4]:

Pj = ( (xj − xu)² + (yj − yu)² + (zj − zu)² )^(1/2) + c·tu

where
Pj is the measured range (pseudorange) to satellite j,
tu is the user device time offset from the satellite clock,
c is the speed of light,
(xj, yj, zj) are the coordinates of the satellite, and
(xu, yu, zu) are the coordinates of the user device (mobile station).

The receiver should detect signals from at least four satellites to determine a 3-D position estimate,
though signals from three satellites are enough to obtain a 2-D estimate.
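To make the pseudorange formula concrete, the short sketch below computes Pj for one satellite from illustrative (made-up) coordinates and a receiver clock offset; the variable names simply mirror the symbols used above.

public class Pseudorange {
    // Speed of light in metres per second.
    static final double C = 299792458.0;

    /** Pseudorange: geometric distance to the satellite plus the range error
     *  caused by the receiver clock offset tu (in seconds). */
    static double pseudorange(double xj, double yj, double zj,
                              double xu, double yu, double zu, double tu) {
        double dx = xj - xu, dy = yj - yu, dz = zj - zu;
        return Math.sqrt(dx * dx + dy * dy + dz * dz) + C * tu;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a satellite ~20,200 km from a user at the origin,
        // with a 1 microsecond receiver clock offset (~300 m of range error).
        double pj = pseudorange(20200000.0, 0, 0, 0, 0, 0, 1e-6);
        System.out.println("Measured range Pj = " + pj + " m");
    }
}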
As mentioned before, the mobile station should have a clear view of the sky. This means that GPS does
not provide good coverage in urban areas and indoors, because the GPS signal strength is too
low to penetrate a building [6]. In order to use GPS efficiently for indoor and urban positioning, some
indoor GPS solutions are applied, and mobile phones with GPS receivers are available that improve indoor
coverage. Among these solutions is the use of Assisted GPS (A-GPS). In this case the
positioning system consists of a target wireless device with a partial GPS receiver, an A-GPS server
that has a reference GPS receiver which can receive signals from the same satellites as the mobile
device, and a wireless network infrastructure that connects the mobile device and
the A-GPS server. The GPS receiver of the server has a clear view of the sky and hence can get direct
signals from the satellites. The A-GPS server uses its reference GPS receiver to continuously gather
assistance information relevant to the mobile device and then transmits that assistance information over the
wireless network to the mobile device, in order for the mobile device to find its position more
accurately. [5]
Moreover, future GPS satellites will broadcast three civil signals rather than just one. For the most
part, civil use of GPS today is based on only one frequency, fL1 = 1575.42 MHz. In the future, two new
signals at fL2 = 1227.60 MHz and fL5 = 1175 MHz will be available. These new signals will begin to become
available starting in 2003 or so. They will provide so-called frequency diversity, which will help to
mitigate the effect of multipath. In other words, multipath may interfere with one signal, but is much
less likely to degrade all three signals simultaneously. In addition, the new signal at fL5 has been
designed to be more powerful and to give better performance in multipath environments [7].
Finally, GALILEO will contribute to indoor positioning. The signals from Galileo are being
designed with use in cities and indoors as prime objectives. Galileo and GPS will be two components
of a worldwide Global Navigation Satellite System (GNSS). Taken together, they will virtually ensure
that several satellite signals are available even to users in tough environments [7].
2.1.3. WLAN and Short Range Connectivity Systems
The Internet has become mainstream, with millions of households wired; still, for many people accessing
the Web means staying close to a PC. But times are changing, and in the very near future Internet access
will look much different, transformed by the phenomenon of the wireless Web. Just like cell phones, the
wireless Web promises to make Internet access available whenever and wherever we need it. In
fact, the mobile phone phenomenon has set the stage for widespread deployment of wireless Web
services. Today's installed base of cell phones can become the most widely available platform for
mobile Internet services, because enabling a mobile phone for Internet access is a low-cost software
upgrade. For indoor applications, Wireless Local Area Networks (WLAN), Short Range Connectivity
Systems (e.g. Bluetooth) and other Radio Frequency Location Systems (RFLS) offer interesting
solutions. WLAN access points can be set up and identified very easily in any office, mall, airport or
other point of interest. This technology is not very expensive, because positioning can be offered as an
additional service built on existing networks with practically no extra cost. The only remaining item
needed to complete the purpose of positioning is the installation of a positioning server integrated with the
W-LAN.
The task of a positioning engine in this case is to collect and provide the location information,
through the access points, to any application that provides location-based services. As mentioned before,
many technologies, both indoor and outdoor, exist to collect such data. GeoMode, for
instance, is used outdoors to collect the location information of mobile devices.
GeoMode, recently developed and successfully tried by Digital Earth System researchers, demonstrates
an accuracy of less than 20 meters and can be installed on a Serving Mobile Location Center, a
Gateway Mobile Location Server or even a GeoMode ASP server. It retrieves the location data
which is sent from the mobile station to the base station on a regular basis and turns it into a useful format
for any location-based service application (see http://www.geomode.net).
The Ekahau positioning engine, on the other hand, developed by the Complex Systems Computation
Group, is used for indoor positioning. It is based on calibration: each sampled point contains
received signal strength (RSSI) values and a related map coordinate stored in an area-specific positioning
model, and the engine returns location coordinates (x, y, floor). The greatest advantage of the Ekahau positioning
engine is that it is a software-only positioning server that requires no hardware beyond a standard 802.11
network. It requires at least a Pentium II, 128 MB RAM and 200 MB of hard disk space, and works with
Windows 2000/XP as well as other platforms such as Linux. The new TCP-based Yax protocol lets Ekahau
support any platform and programming language, and hence makes Ekahau available and easy to
integrate into any application (see http://www.ekahau.com).
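Since the engine is reached over a plain TCP connection, a client can in principle be written in any language. The sketch below is purely illustrative: it opens a socket to a hypothetical positioning server and reads back a line containing the coordinates. The host name, port and line-based request/response format are assumptions made for the example and do not reflect the actual Yax protocol messages.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class PositionClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical host, port and message format, for illustration only.
        try (Socket socket = new Socket("positionserver.example.org", 8548);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {

            // Ask the server for the position of one tracked device.
            out.println("GET_POSITION 00:11:22:33:44:55");

            // Assume the reply is a single line such as "x=12.4 y=33.1 floor=2".
            String reply = in.readLine();
            System.out.println("Position reply: " + reply);
        }
    }
}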
The only limitation that W-LAN based positioning systems currently face is the shortage of IP
addresses. This problem will remain unsolved until the IP version 6 protocol is
deployed. In the meantime it is also planned that, in the future, subscribers may have the option of
using multiple devices with a single IP address. For example, a subscriber may use four different devices
in her home office with one base station (one "address" that goes out of the building). All the devices
talk to the base station, the base station transmits to the rest of the world, and she receives one monthly
bill.
2.1.4. Location Application Services
The main goal is to deliver specific information to the mobile terminal according to the geographical
location of the mobile device. This is normally based on knowing the position of a user with their
mobile terminal. These services can be delivered through a wide range of devices, including wireless
phones, PDAs, laptop computers and other devices attached to movable items such as people,
packages and in-vehicle tracking devices, as well as other types of mobile terminals. There is a wide range of
information that could be delivered to the end-user application. The following bullet points give a short
summary of the kinds of information that can be delivered according to the geographical location, needs and
requirements of the mobile user (this summary is based on [1]):
• Positions: Fixed locations, expressed in terms of coordinates, positions on a map, named places, and so on.
• Events: Time-dependent incidents at one or more locations.
• Distributions: The densities, frequencies, patterns and trends of people, objects or events within a given area or areas.
• Service points: Points of service delivery. These may also differ according to the interests of the user.
• Routes: Navigational information expressed in terms of coordinates, directions, named streets and distances, landmarks, and/or other navigation aids.
• Context/overview: Maps, charts, three-dimensional scenes or other means of representing the context and relationships between people, objects and events for a given area or areas.
• Transactions: Transactions for the exchange of goods, services, securities, etc.; trading services, financial services.
• Sites: Characteristics of a given site.
2.1.5. Accuracy
Depending on the application and the area of use, high attention should be paid to the accuracy of the
location in order to offer a good and reliable service. For most applications to perform well according to
the needs of the users, a lot of research has been done, and more is under way, to find efficient ways of
determining a more accurate and precise location of the mobile user. For instance, for an indoor positioning
application in a museum a very high accuracy is required, since the content of the information can change
within a few centimeters. On the other hand, for an outdoor application where the target is the location of a
passenger bus on its way from Paris to Copenhagen, an accuracy of up to hundreds of meters should be acceptable.
2.1.6. Privacy
Location-dependent services rely on awareness of the end user's location in the mobile network to
provide location-relevant content and services. For example, a map can be displayed on a multimedia-enabled
mobile terminal, showing not only the actual location of the user but also that of other mobile
users. This is of great advantage in many cases where privacy is not an issue. Examples are:
A university campus, where the positioning system shows the location of classes, meeting rooms, the
library, administration offices and so on.
A firm where mobile workers need to be aware of the location of other mobile workers or
equipment in order to monitor all the processes.
A museum, a very good application area but one requiring a very high accuracy, in the range of centimeters,
where visitors carry mobile units and the application provides information according to their location
(for example, when they stand in front of a painting the application provides information about
that painting based on the location). The same kind of application could be used to give tourist information in
a city; note that an accuracy of tens of meters could be acceptable for this outdoor
application, in which case the use of GPS could be a good option.
On the other hand, if the network shows both the location of the user and that of other mobile users, this
might not be fully acceptable to the local population. The reason is that mobile users would never
have any privacy, because their location could always be seen by other users. Moreover, a
location has a context: it not only gives coordinates in the geographical world but also
conveys some context information. As explained before, someone being in a particular location at a particular time
could mean performing a particular task or doing something obvious. A good application could therefore be one
where the subscribers can decide for themselves whether they want to use location-dependent services or
let others know their whereabouts. Fortunately, this feature has been considered by providers in a
number of countries. AT&T Wireless in the United States, for instance, has implemented location-based
applications in mobile phones. Andre Dahan, president of mobile multimedia services at
AT&T Wireless, says: "We have a comprehensive privacy policy and our customers' information, including
their geographical location, is theirs to share with whom they want." Perry said users of the
service could prevent being located by turning their phones off or using an "invisible" setting.
Customers can also change their lists or "revoke a friend".
2.2 Extensible Markup Language (XML)
XML is just plain text, so it can obviously be moved between different platforms, but more
importantly, XML must conform to a specification defined by the World Wide Web Consortium
(W3C); the specification can be found at http://www.w3.org. This in fact means that XML is a standard [2].
One of the most important layers of any XML-aware application is the XML parser. An XML parser
handles the important task of taking a raw XML document as input and making sense of the
document. The parser will make sure that the document is well-formed and valid according to the
DTD or schema, if one is defined for the XML document. The parsed result will typically be a data structure
that can be manipulated and handled by other XML tools or Java APIs. There are no set rules about which
parser to use, but the speed of an XML parser usually becomes more important as the size and
complexity of the XML documents grow. It is also important that the parser conforms to the XML
specification.
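As a small illustration of the parser's role, the sketch below uses the standard JAXP interfaces to parse a document into a DOM tree; the file name ReceptionServices.xml and the element name service are examples only, chosen to match the kind of data LBISS stores, not the actual LBISS file format.

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ParseDemo {
    public static void main(String[] args) throws Exception {
        // Obtain a DOM parser through the JAXP factory.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Parsing fails with an exception if the document is not well-formed.
        Document doc = builder.parse(new File("ReceptionServices.xml"));

        // The parsed result is a tree that other parts of the application can walk.
        NodeList services = doc.getElementsByTagName("service");
        System.out.println("Number of service elements: " + services.getLength());
    }
}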
There are many low-cost Word to XML converters available on the market, and a number of them support a
two-stage conversion process: first from Word into XML, and secondly from XML into HTML, using the XSLT
stylesheet language. Because XML information is structured, it is easily manipulated, so converting XML into
accessible HTML is achievable. You do have to write the XSLT stylesheet yourself, but this is a once-off task,
and it allows you to clean up the initial XML mark-up when converting to HTML. A number of stand-alone tools
support this two-stage process by default.
From: Campbell, Eoin, "Maintaining accessible websites with Microsoft Word and XML", XML Europe
Conference, England, May 2003.
XML Documents (Well-Formed XML Documents)
Every computer language has its own format which must be followed for the system
to interpret and read it and to get the desired data from it. An HTML page has its own tags; when the tags are not in the
right order, with some exceptions, the application will not be able to display the contents in the
intended format. This applies to XML documents too. Though XML gives programmers a great
degree of freedom in choosing self-explanatory tags, the document should still be well-formed,
meaning that it should match the predefined XML document structure. The structure of an XML
document can also be defined by an internal source such as a DTD (Document Type Definition). It is
worth mentioning that the use of a DTD is not mandatory; instead it is used for the sake of good
practice. So, as a rule of thumb, it could be said that an XML document is valid if it matches all
internal structure descriptions along with the predefined XML document structure.
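As an illustration of a well-formed document with an internal DTD, the following hypothetical fragment sketches how location data of the kind LBISS stores might look; the element and attribute names are examples, not the actual LBISS file format.

<?xml version="1.0"?>
<!DOCTYPE location [
  <!ELEMENT location (service+)>
  <!ATTLIST location name CDATA #REQUIRED>
  <!ELEMENT service (#PCDATA)>
]>
<location name="Reception">
  <service>Meetings</service>
  <service>Admission</service>
</location>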
(XML as a Database)
A database can be thought of as a container for data, in the sense that it is programmatically easy to
access and retrieve data from it. In practice, all the data in a database is placed in named tables, and each
entry has an ID. These references make it easy to access the data through database query methods.
As a matter of fact, all files contain data, and so does an XML
document. An XML file is quite similar to a database in the sense that it contains data that is placed in
nodes and attributes. Each node and attribute has a programmer-defined, self-explanatory name. As an
added advantage, the data in an XML document can also be viewed in a tree or graph format. On the
other hand, due to parsing and text conversion, access to data in an XML document is slower
compared to a database.
From: XML and Databases, Ronald Bourret, 1999-2003, last updated July 2003,
http://www.rpbourret.com/xml/XMLAndDatabases.htm#isxmladatabase
To access data in a database we first need to make a connection to the database through the driver the
user is using.

try {
    // Open a JDBC connection to the database.
    con = DriverManager.getConnection("database address");
}
...

Once this connection is made, queries can be sent to the database to request specific data. An example
database query could be:

...
String query = "SELECT Meetings FROM Reception " +
               "WHERE Meeting_with = 'Abdul Azim Saleh'";
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(query);
...

This query will look up the table named Reception in the database and retrieve the desired data.
An XML document, on the other hand, is treated just like a simple file by Java. To read or access an
XML file we only need to open a stream to the specified XML document on the right path.

...
File xmlFile = new File("ReceptionServices.xml");
...

Once the file is accessible, other methods are called to access and retrieve the desired data from the
XML document. Another alternative for accessing different parts (nodes and attributes) of the document
is the use of XPath.
From: IBM developerWorks, XML zone, "Building XML Applications, Step 2: Generating XML from a Data Store".
**********************
XPath, as can be seen from its name, is used to access and address nodes and attributes of an XML
document in a hierarchical order, and it provides some facilities to manipulate strings, numbers and
booleans. In addition, XPath provides facilities to check whether or not a specific node or attribute
matches some predefined pattern. Like the DOM (Document Object Model) tree, XPath
models an XML document as a tree structure of nodes. XPath fully distinguishes between different
types of nodes (element nodes, attribute nodes and text nodes) and reflects this information in the
XPath tree model. XPath also supports XML Namespaces and thus models both the namespace and the
local part of a node name.
From: http://www.w3.org/TR/xpath (see this page for more information).
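For completeness, the sketch below shows how the standard javax.xml.xpath API can evaluate an XPath expression against a parsed document; the file name and the expression /location/service refer to the hypothetical XML fragment shown earlier, not to the actual LBISS data files.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("ReceptionServices.xml"));

        // Select all <service> nodes under the <location> root element.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList services = (NodeList) xpath.evaluate(
                "/location/service", doc, XPathConstants.NODESET);

        for (int i = 0; i < services.getLength(); i++) {
            System.out.println("Service: " + services.item(i).getTextContent());
        }
    }
}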
SAX
SAX, or the SAX parser, is a free standard which was originally defined by a Java implementation. It is among
the most used programming interfaces in XML-related applications. SAX, basically, is a set of
interfaces that, through its well-known methods, make it possible to process a document (a file or a
hierarchical tree) and act according to the methods predefined in the application when it meets or reaches
the application-specific start tags, attributes and definitions. In other words, a programmer can get to
and retrieve data from any specific part of a document and then pass this data to any other part of the
application, or write it to a file stream for further use. The freedom in defining the tags and attributes
of an XML file is very useful in this context: a programmer can give nodes of the same structure
different names, which makes it possible to access and retrieve the data of each node separately.
From: White Paper, Birdstep Technology, Norway, September 13, 2002,
http://www.birdstep.com/collaterals/rdmxwhitepaper.pdf
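A minimal SAX handler, written against the standard org.xml.sax interfaces, could look like the sketch below; it simply prints the text content of every service element in the hypothetical ReceptionServices.xml file, and the element name is again only an assumed example.

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo extends DefaultHandler {
    private boolean inService = false;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // React only to the application-specific start tag we care about.
        inService = "service".equals(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inService) {
            System.out.println("Service: " + new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        inService = false;
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("ReceptionServices.xml"), new SaxDemo());
    }
}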
XML is a meta-language, defined by the World Wide Web Consortium (W3C), that can be used to describe a broad
range of hierarchical markup languages. It is a set of rules, guidelines and conventions for describing structured data in
a plain-text, editable file. Using a text format instead of a binary format allows the programmer, or even an end user, to
look at or utilize the data without relying on the program that produced it. However, the primary producer and consumer
of XML data is the computer program and not the end user.
Like HTML, XML makes use of tags and attributes. Tags are words bracketed by the '<' and '>' characters, and
attributes are strings of the form name="value" that appear inside tags. While HTML specifies what each tag and
attribute means, as well as their presentation attributes in a browser, XML uses tags only to delimit pieces of data and
leaves the interpretation of the data to the application that uses it. In other words, XML defines only the structure of the
document and does not define any of the presentation semantics of that document.
Development of XML started in 1996, leading to a W3C Recommendation in February 1998. However, the technology
is not entirely new. It is based on SGML (Standard Generalized Markup Language), which was developed in the
early 1980s and became an ISO standard in 1986. SGML has been widely used for large documentation projects, and
there is a large community that has experience working with SGML. The designers of XML took the best parts of
SGML, used their experience as a guide and produced a technology that is just as powerful as SGML, but much simpler
and easier to use.
XML-based documents can be used in a wide variety of applications including vertical markets, e-commerce,
business-to-business communication and enterprise application messaging.
From: Java API for XML Processing, Rajiv Mordani, James Duncan Davidson, Scott Boag (Lotus);
Sun Microsystems, Inc., Palo Alto, CA; November 18, 2000.
In order to present your data (an Excel file, an Access database, etc.) to the user in an attractive way in a browser,
on a mobile phone or in PDF format, the original data must first be converted to the necessary XML format. XML
documents and data can have almost any structure whatsoever; everyone can define and describe his or her own XML
document structure. However, a standard database table (an Excel spreadsheet, an Oracle table/view) is a flat,
two-dimensional representation of the data content. When you convert such a data source table to XML format, you
get a flat, column-row shaped XML output.
2.3 Speech Technologies
Keyboards remain the most popular input device for desktop computers. However, performing input
efficiently on a small mobile device is more challenging, and this need continues to motivate innovators.
Speech interaction on mobile devices has gained currency over recent years, to the point where a
significant proportion of mobile devices now include some form of speech recognition. The value
proposition for speech interaction is clear: it is the most natural human modality, can be performed
while mobile, and is hands-free.
Grammar file
A grammar defines or lists the set of words, phrases or sentences that the recognizer of a speech
application expects the user to say. Moreover, the grammar defines the patterns in which these
words, phrases and sentences may be said. The grammar is defined in a special format called the Java
Speech Grammar Format. This file is provided to the application, which in turn reads and activates it
through the speech engine. Each set of words, phrases or sentences is put into a rule. A
single rule contains the words that the user is expected to say. The recognizer part of the application will
listen only for the words defined in the rules; all other words and background noise will be ignored.
A rule can be declared as public, in which case it can also be accessed and used by other
grammars and applications. The names of the rules should be unique within a grammar file, meaning that
no rule name may repeat. The number of words and phrases in a rule, and the number of rules in a grammar
file, is limited only by the size of the dictionary in the engine, which can hold more than 60,000 words. As a
rule of thumb, one could say that the larger the grammar, the slower and more error-prone the recognizer.
This follows from the fact that, for a large grammar file, the speech application has to compare
the audio input with all the words in the grammar until it finds a match and recognizes the word. So it
is strongly recommended to keep the grammar short and thin.
A grammar file, like an XML document, should be well formed and valid, though the system
will not check the validity of the grammar file in the way it is done for an XML document. The format and
structure of a grammar file resemble those of an XML document. This is understandable once we know
that there is a built-in XML processor in the speech recognizer engine which processes the input speech
and the grammar, so a well-structured format is a must for a grammar file to ease the work of the recognizer.
A grammar file has a header which declares the version of the grammar. The header is followed by
import statements, in case external grammars are used, and then comes the body of the grammar,
where the rules are defined. The following is an example of a grammar file with a single public rule:

#JSGF V1.0;
grammar javax.speech.demo;
public <sentence> = hello world | good morning | hello mighty computer;
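A sketch of how such a grammar might be loaded and used through the Java Speech API is shown below. It follows the pattern of the standard JSAPI 1.0 recognizer interfaces; the grammar file name is assumed, and a real application would add error handling and engine-selection logic.

import java.io.FileReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineModeDesc;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;
import javax.speech.recognition.RuleGrammar;

public class RecognizerDemo {
    public static void main(String[] args) throws Exception {
        // Create a recognizer for English and allocate its resources.
        Recognizer rec = Central.createRecognizer(new EngineModeDesc(Locale.ENGLISH));
        rec.allocate();

        // Load and enable the JSGF grammar shown above (file name assumed).
        RuleGrammar grammar = rec.loadJSGF(new FileReader("demo.gram"));
        grammar.setEnabled(true);

        // Print each utterance the engine accepts against the grammar rules.
        rec.addResultListener(new ResultAdapter() {
            @Override
            public void resultAccepted(ResultEvent e) {
                Result result = (Result) e.getSource();
                StringBuilder text = new StringBuilder();
                for (ResultToken token : result.getBestTokens()) {
                    text.append(token.getSpokenText()).append(' ');
                }
                System.out.println("Recognized: " + text.toString().trim());
            }
        });

        // Commit the grammar changes and start listening.
        rec.commitChanges();
        rec.requestFocus();
        rec.resume();
    }
}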
Feedback and error correction in a speech recognizer
In everyday human interaction a lot of information is conveyed by body language, facial expressions or
pauses. Computers, on the other hand, are blindfolded in this respect, and in fact pauses can be a cause of
errors and misunderstanding. If the client says something and the computer is slow in the recognition
process, the client may repeat the message, resulting in recognition errors.
The errors can to a great extent be handled by some mechanism for reporting error messages. For
instance, if the recognizer fails to recognize a word or a sentence, the system should not repeat the
same error message. In other words, the system can provide progressive assistance. For example, for
the first recognition error the system could say "What?"; if the error is repeated, the error message
could be "Please rephrase your words" or something else that makes the user repeat his message in different
words. A much simpler mechanism is to use "yes and no" prompts and keep the grammar file short.
Other factors which contribute to error handling and elimination are a natural dialog study
and a usability study. The natural dialog study should be carried out prior to designing the
recognizer application. Doing so enables the programmer to get a good grasp of the type of
conversation expected from the user and to produce a grammar containing words and phrases that the user
will naturally say. The recognizer will be much more error-prone if the grammar contains unusual or irrelevant
words. The usability test should, on the other hand, be carried out once part of the application has been produced,
to check whether the words are easy for the speech engine to recognize. This also includes avoiding
vocabulary that is too close in pronunciation; a good example of such vocabulary is "to", "too" or "2".
Vocabulary that is too difficult to pronounce should also be avoided, as it will cause the user to pause or
mispronounce the words.
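The progressive-assistance idea can be implemented with very little machinery: keep a counter of consecutive recognition failures and escalate the prompt. The sketch below is a hypothetical helper, independent of any particular speech engine, that a result-rejection handler could call before re-prompting the user.

public class ProgressiveAssistance {
    private int consecutiveFailures = 0;

    /** Called whenever the recognizer rejects an utterance;
     *  returns the next, progressively more helpful, prompt. */
    public String nextErrorPrompt() {
        consecutiveFailures++;
        switch (consecutiveFailures) {
            case 1:  return "What?";
            case 2:  return "Please rephrase your words.";
            default: return "Please answer with yes or no.";
        }
    }

    /** Called when an utterance is recognized successfully. */
    public void reset() {
        consecutiveFailures = 0;
    }
}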
IBM Speech Engine
Application users and application builders have long been fascinated by the idea of machines that speak and
understand human speech. This has now become a reality with the advances in speech technology.
Applications now use speech to enhance the user's experience and to ease the use of applications. The
main parts of such a system are the speech recognizer and the speech synthesizer.
In the Java Speech API, the term "speech engine" refers to a system which deals with speech input and
speech output. In fact, both the speech recognizer and the speech synthesizer are instances of a speech
engine, and the same can be said about speaker verification and speaker identification systems. The Java
Speech API defines all the classes and interfaces used to access the speech engine functionality. An
engine may be implemented completely in software or as a combination of software and hardware.
Part 2
Chapter 3
Application Design
Chapter 4
Implementation
Chapter 5
Tests and Results
Chapter 6
Conclusion
References
[1] Samuli Niiranen; "Location Services for Mobile Terminals"; Tampere University of Technology, Department of Information Technology, Finland.
[2] Brett McLaughlin; "Java & XML", Second Edition; August 2001.
[3] Michael Beigl, Tobias Zimmer and Christian Decker; "A location model for communicating and processing of context"; TecO, University of Karlsruhe, Vincenz-Priessnitz-Str., 76131 Karlsruhe, Germany.
[4] Ng Ping Chung; "Positioning of Mobile Devices"; Oresund Summer University, August 2003.
[5] Djuknic, G. M. and Richton, R. E.; "Geolocation and Assisted GPS"; IEEE Computer, 34, 2, pp. 123-125; 2001.
[6] Chen, G. and Kotz, D.; "A Survey of Context-Aware Mobile Computing Research"; Dartmouth Computer Science Technical Report TR2000-381; 2000.
[7] Andrew Chou, Wallace Mann, Anant Sahai, Jesse Stone, Ben Van Roy, Per Enge (Stanford University), Rod Fan and Anil Tiwari (@Road Inc., Enuvis Inc.); "Improving GPS Coverage and Continuity: Indoors and Downtown".
Appendices
The idea of machines that speak and understand human speech has long been a fascination of
application users and application builders. With advances in speech technology, this concept has now
become a reality. Research projects have evolved and refined speech technology, making it feasible to
develop applications that use speech technology to enhance the user's experience. There are two main
speech technology concepts -- speech synthesis and speech recognition.
The Java Speech API makes only one assumption about the implementation of a JSAPI engine: that it
provides a true implementation of the Java classes and interfaces defined by the API. In supporting
those classes and interfaces, an engine may be completely software-based or may be a combination of
software and hardware. The engine may be local to the client computer or remotely operating on a
server. The engine may be written entirely as Java software or may be a combination of Java software
and native code.
The basic processes for using a speech engine in an application are as follows:
1. Identify the application's functional requirements for an engine (e.g., language or dictation capability).
2. Locate and create an engine that meets those functional requirements.
3. Allocate the resources for the engine.
4. Set up the engine.
5. Begin operation of the engine (technically, resume it).
6. Use the engine.
7. Deallocate the resources of the engine.
4.2 Properties of a Speech Engine
Applications are responsible for determining their functional requirements for a speech synthesizer and/or speech
recognizer. For example, an application might determine that it needs a dictation recognizer for the local language
or a speech synthesizer for Korean with a female voice. Applications are also responsible for determining
behavior when there is no speech engine available with the required features. Based on specific functional
requirements, a speech engine can be selected, created, and started. This section explains how the features of a
speech engine are used in engine selection, and how those features are handled in Java software.
Functional requirements are handled in applications as engine selection properties. Each installed speech
synthesizer and speech recognizer is defined by a set of properties. An installed engine may have one or many
modes of operation, each defined by a unique set of properties, and encapsulated in a mode descriptor object.
The basic engine properties are defined in the EngineModeDesc class. Additional specific properties for speech
recognizers and synthesizers are defined by the RecognizerModeDesc and SynthesizerModeDesc classes
that are contained in the javax.speech.recognition and javax.speech.synthesis packages
respectively.
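As a concrete illustration of how these descriptor classes can drive engine selection, the sketch below, which is my own addition and not part of the Programmer's Guide text, asks for a US English recognizer that does not require dictation support and handles the case where no matching engine is installed.

import java.util.Locale;
import javax.speech.Central;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerModeDesc;

public class RecognizerSelectionSketch {
    public static void main(String[] args) throws Exception {
        // Express the functional requirements as engine selection properties:
        // US English locale, dictation grammar support not required
        RecognizerModeDesc required = new RecognizerModeDesc(Locale.US, Boolean.FALSE);

        // Ask the central manager for an installed engine that matches the properties
        Recognizer recognizer = Central.createRecognizer(required);
        if (recognizer == null) {
            // No installed engine matches; the application must decide how to degrade
            System.err.println("No suitable speech recognizer is installed.");
            return;
        }

        recognizer.allocate();
        // ... load grammars, attach result listeners and use the recognizer here ...
        recognizer.deallocate();
    }
}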
From the Java Speech API Programmer's Guide.
Human-centered computing
• Computer-human interaction is currently focused on the computer (computer-centric). Currently computers know little about their environment: Where are we? Who is using me? Is the user still there?
• Evolving environment awareness: give computers senses via sensors for the environment and for user identity and presence.
• You wear your own personal user interface; the interface can be consistent across all appliances, not because each appliance supports the interface, but because the user's own interface provides consistency.
• Make the human the focus of the computer's interaction (human-centric).
From: Ronald Bourret, "XML and Databases", copyright 1999-2003, last updated July 2003.
http://www.rpbourret.com/xml/XMLAndDatabases.htm#isxmladatabase
3.8 For More Information (References)
The following sources provide additional information on speech user interface design.
• Fraser, N. M. and G. N. Gilbert, "Simulating Speech Systems," Computer Speech and Language, Vol. 5, Academic Press Limited, 1991.
• Raman, T. V. Auditory User Interfaces: Towards the Speaking Computer. Kluwer Academic Publishers, Boston, MA, 1997.
• Roe, D. B. and N. M. Wilpon, editors. Voice Communication Between Humans and Machines. National Academy Press, Washington, D.C., 1994.
• Schmandt, C. Voice Communication with Computers: Conversational Systems. Van Nostrand Reinhold, New York, 1994.
• Yankelovich, N., G. A. Levow, and M. Marx, "Designing SpeechActs: Issues in Speech User Interfaces," CHI '95 Conference on Human Factors in Computing Systems, Denver, CO, May 7-11, 1995.
Using Speech Recognition with Microsoft English Query
Ed Hess
Speech recognition is a rapidly maturing technology. It's a natural complement to English Query, a package that lets you query a SQL Server database using natural language.
My job is to help developers design GUIs from the point of view of the people who will use the
software. I'm currently doing research on how speech recognition can enhance the job
performance of users in a health care setting.
Speech recognition offers certain users the best way to interact with a computer and promises
to be the dominant form of human-computer interaction in the near future. The Gartner Group
predicts that by 2002, speech recognition and visual browsing capabilities will be integrated into
mainstream operating systems. According to a recent survey of more than a thousand chief
executives in health care organizations by Deloitte & Touche Consulting, 40% planned to use
speech recognition within two years. Recent advances in software speech recognition engines and
hardware performance are accelerating the development and acceptance of the technology.
Microsoft® invested $45 million in Lernout and Hauspie (http://www.lhs.com) in 1997 to
accelerate the growth of speech recognition in Microsoft products. Both IBM/Lotus and Corel are
delivering to the market application suites that feature speech recognition.
Most people are familiar with speech recognition applications based on dictation grammars, also
known as continuous speech recognition. These applications require a large commitment from
the user, who has to spend time training the computer and learning to speak in a consistent
manner to assure a high degree of accuracy. This is too much of a commitment for the average
user, who just wants to sit down and start using a product. Users of this technology tend to be
those who must use it or are highly motivated to get it working for some other reason, like
people with various physical disabilities. However, there are other forms of speech recognition
based on different grammars. These grammars represent short-run solutions that can be used by
more general audiences.
Grammars
A grammar defines the words or phrases that an application can recognize. Speech recognition
is based on grammars. An application can perform speech recognition by using three different
types of grammars: context-free, dictation, and limited-domain. Each type of grammar uses a
different strategy for narrowing the set of sentences it will recognize. Context-free grammar uses
rules that predict the next words that might possibly follow the word just spoken, reducing the
number of candidates to evaluate in order to make recognition easier. Dictation grammar defines
a context for the speaker by identifying the subject of the dictation, the expected language style,
and the dictation that's already been performed. Limited-domain grammar does not provide strict
syntax structures, but does provide a set of words to recognize. Limited-domain grammar is a
hybrid between a context-free grammar and a full dictation grammar.
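In the Java-based setting of this project, a context-free grammar of this kind corresponds to a rule grammar, which the Java Speech API can load from JSGF text. The sketch below is my own illustration rather than code from this article; it assumes a JSAPI recognizer is installed, and the class name and command phrases are only examples of how a small context-free grammar limits recognition to a fixed set of phrases.

import java.io.StringReader;
import javax.speech.Central;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.RecognizerModeDesc;
import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;
import javax.speech.recognition.RuleGrammar;

public class ContextFreeGrammarSketch {
    public static void main(String[] args) throws Exception {
        Recognizer recognizer = Central.createRecognizer(new RecognizerModeDesc());
        recognizer.allocate();

        // A tiny context-free (rule) grammar in JSGF: only these phrases can be recognized
        String jsgf = "#JSGF V1.0;\n"
                    + "grammar commands;\n"
                    + "public <command> = show orders | submit | close | exit;\n";
        RuleGrammar grammar = recognizer.loadJSGF(new StringReader(jsgf));
        grammar.setEnabled(true);
        recognizer.commitChanges();

        // Print each phrase the engine accepts against the grammar
        recognizer.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                ResultToken[] tokens = ((Result) e.getSource()).getBestTokens();
                StringBuffer text = new StringBuffer();
                for (int i = 0; i < tokens.length; i++) {
                    text.append(tokens[i].getSpokenText()).append(' ');
                }
                System.out.println("Recognized: " + text.toString().trim());
            }
        });

        recognizer.requestFocus();
        recognizer.resume();
    }
}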
Each grammar has its advantages and disadvantages. Context-free grammars offer a high
degree of accuracy with little or no training required and mainstream PC requirements. Their
drawback is that they cannot be used for data entry, except from a list of predefined phrases.
They do offer a way to begin offering speech capabilities in products without making large
demands on users before they understand the benefits of speech recognition. They represent an
ideal entry point to begin rolling this technology out to a general audience. You can achieve up to
97% recognition accuracy by implementing commands and very small grammars.
Dictation grammars require a much larger investment in time and money for most people to be
able to use in any practical way. They deliver speech recognition solutions to the marketplace for
those who need them now. Lernout and Hauspie's Clinical Reporter lets physicians use speech
recognition to enter clinical notes into a database, then calculates their level of federal
compliance. Speech recognition is an excellent fit for clinicians, who are accustomed to dictating
patient information and then having transcriptionists type that data into a computer. The
feedback from early adopter audiences is helping to accelerate the development of usable speech
recognition interfaces.
None of the current speech recognition vendors are achieving greater than 95% accuracy with
general English dictation grammars. That translates to one mistake for every 20 words, which is
probably not acceptable to most people. The problem is further magnified when a user verbally
corrects something and their correction is not recognized. Most users will not tolerate this and
will give up on the technology. If a more limited dictation grammar is used, levels of accuracy
over 95% can be achieved with a motivated user willing to put in months of effort.
Limited-domain grammars represent a way to increase speech recognition accuracy and
flexibility in certain situations without placing large demands on users. An application might use a
limited-domain grammar for the following purposes:
• Command and control that uses natural language processing to interpret the meaning of the commands
• Forms data entry in which the scope of the vocabulary is known ahead of time
• Text entry in which the scope of the vocabulary is known ahead of time
This type of grammar could be an interim step between context-free and dictation grammar-based applications.
English Query
I had been working with Microsoft Agent (http://www.microsoft.com/msagent) for a couple of
months before I saw Adam Blum's presentation on Microsoft English Query at Web TechEd. His
session inspired me to try hooking speech recognition up to English Query to find information in a
SQL Server™ database. I'd been showing around a speech-based Microsoft Agent demo, and
many people asked if I could somehow keep the speech recognition, but make the animated
character interface optional. Because I wanted to research that while still being able to use
different types of speech recognition grammar, I started looking into the Microsoft Speech API
(SAPI) SDK version 4.0, which is available at http://www.research.microsoft.com/research/srg/.
English Query has two components: the domain editor and the engine. The English Query
domain editor (mseqdev.exe) creates an English Query application. An English Query application
is a program that lets you retrieve information from a SQL Server database using plain English
rather than a formal query language like SQL. For example, you can ask, "How many cars were
sold in Pennsylvania last year?" instead of using the following SQL statements:
SELECT SUM(Orders.Quantity) FROM Orders, Parts
WHERE Orders.State = 'PA'
  AND DATEPART(year, Orders.Purchase_Date) = 1998
  AND Parts.PartName = 'cars'
  AND Orders.Part_ID = Parts.Part_ID
An English Query application accepts English commands, statements, and questions as input and
determines their meaning. It then writes and executes a database query in SQL and formats the
answer.
You create an English Query application by defining domain knowledge and compiling it into a
file that can be deployed to the user. More information about how to build English Query
applications can be found in the article "Add Natural Language Search Capabilities to Your Site
with English Query," by Adam Blum (MIND, April 1998). English Query was delivered with SQL
Server Version 6.5 Enterprise Edition, and is also part of SQL Server Version 7.0.
The English Query engine uses the application to translate English queries into SQL. The
Microsoft English Query engine is a COM automation object with no user interface. However, four
samples included with English Query provide a convenient UI for Internet, client-based, middle-tier, or server-based applications.
You must install the domain editor to build an English Query application. However, to use an
existing English Query application with a client user interface, you need only install the engine.
The English Query engine generates SQL for Microsoft SQL Server 6.5 or later; these queries may
generate errors on other databases, such as Microsoft Access.
With a patient orders database as a starting point, I went through the typical steps of creating
an English Query application (see Figure 1). I'll skip the details of setting up the English Query
application for my database. Since it's only a prototype, I just set up a couple of entities
(patients and orders) and minimal relationships between entities ("patients have orders").
Figure 1: Query Steps
I started with the sample Visual Basic-based query application that comes with English Query
and modified it to point to my application:
Global Const strDefaultConnectionFile = _
"D:\Program Files\Microsoft English Query\patient.eqc"
I then modified the connection string in the InitializeDB function to point to my SQL Server
database:
Set g_objrdocn = objEnv.OpenConnection("patientdata", , , "uid=sa;pwd=;dsn=Patient")
My English Query application then looked like what you see in Figure 2.
Figure 2: An English Query App
The next step was to add the Direct Speech Recognition ActiveX® control (xlisten.dll) to my
Visual Basic Toolbox. The control comes with the SAPI 4.0 SDK, so you will need to download and
install that first. After I added the control to my form, I set its Visible property to False and
added the following code to Form_Load:
On Error GoTo ErrorMessage
engine = DirectSR1.Find("MfgName=Microsoft;Grammars=1")
DirectSR1.Select engine
DirectSR1.GrammarFromFile App.Path + "\patient.txt"
DirectSR1.Activate
GoTo NoError
ErrorMessage:
MsgBox "Unable to initialize speech
recognition engine. Make sure an engine that
supports speech recognition is installed."
End
NoError:
The patient.txt file referenced in the DirectSR1.GrammarFromFile method contains my
grammar, or list of recognized voice commands. I wanted to make my demo as bulletproof as
possible, and I have found context-free grammars to be the most reliable. Because a context-free grammar allows a speech recognition engine to reduce the number of recognized words to a
predefined list, high levels of recognition can be achieved in a speaker-independent environment.
Context-free grammars work great with no voice training, cheap microphones, and average
CPUs. (This demo should work fine on a Pentium 150 MMX notebook with a built-in microphone.)
The demo could be made even more powerful by using dictation grammars, voice training, more
powerful CPUs, and better microphones, but I wanted to make as few demands as possible on
the user and remain speaker-independent.
My grammar file (patient.txt) looks like this:
[Grammar]
langid=1033
type=cfg
[<start>]
<start>=... orders for patient <Digits>
<start>=submit "submit"
<start>=show sequel "show SQL"
<start>=close "close"
<start>=exit "exit"
<start>=... patient has the most orders "most"
langid=1033 means the application's language is English; type=cfg means this uses a context-free grammar. The <start> tags define each of the recognized voice commands. The first
command translates to any words (...) followed by the phrase "orders for patient" followed by a
list of digits. <Digits> is a built-in item in the direct speech recognition control (DirectSR1),
which recognizes a series of single digits. If I can command my app to "Show the orders for
patient 1051762," they appear like magic (see Figure 3). In the commands after the orders
command, the words before the quotes are the values for the phrase object and the words in
quotes are values for the parsed object.
The SAPI SDK comes with a tool called the Speech Recognition Grammar Compiler for compiling
and testing your grammars with different speech recognition engines. The compiler lives under
the Tools menu after you install the SDK.
After you speak a phrase and a defined time period has passed, the event shown in Figure 4 is
fired. All of the Case options are based on the value of the parsed object that's captured after
each voice command and corresponds to buttons on the form. The ctrlQuestion object is my rich
text field. Normally, the user types their query here, but in this case it can be entered by
voice. The ctrlSubmit_Click submits the ctrlQuestion.SelText to the English Query application and
the results are immediately displayed by a DBGrid object.
Programming for the Future
I recently downloaded the Speech Control Panel from the Microsoft Agent Downloads Web site
at http://msdn.microsoft.com/msagent/agentdl.asp. The Speech Control Panel enables you to list
the compatible speech recognition and text-to-speech engines installed on your system, and to
view and customize their settings. When you install the file, it adds a speech icon to your Control
Panel. Note that this application will only install on Windows® 95, Windows 98, and Windows NT®
4.0-based systems.
I've suggested to Microsoft that the program allow the user to pick a default speech recognition
engine and TTS engine through this panel. If you could then programmatically pull a user's
choice out of their registry with SAPI, you could code it once and never change it. This would give
users more flexibility in their use of speech-enabled software. For example, a user might already
be using a product from, say, Dragon for their speech recognition engine. If they wanted to
continue using that engine and their training profiles, SAPI could allow that if it were defined as
the default speech recognition engine in the registry.
Summary
The combination of speech recognition and English Query represents a powerful way for a user
to access information in a SQL Server database very quickly. For users who work in an
environment where speed and ease of access are critical, it holds enormous promise for future
applications. As hardware continues to become more powerful and cheaper, speech recognition
should continue to become more accurate and useful to increasingly wider audiences.
See the sidebar "English Query Semantic Modeling Format".
From the June 1999 issue of Microsoft Internet Developer. © 1999 Microsoft Corporation. All rights reserved.
From: http://www.microsoft.com/mind/0699/equery/equery.asp
Digital Earth Systems researchers, working with statistical modeling experts, have successfully completed field trials of the
GeoMode™ Positioning Engine with demonstrated accuracy of less than 20 meters (FCC 67% threshold specification). This
technology is the most commercially feasible approach to accurate wireless location. GeoMode can be installed on a Serving Mobile Location Center (SMLC), a Gateway Mobile Location Center (GMLC), or a GeoMode ASP Server external to the wireless network. The GeoMode GMLC and the ASP Server both require a SIM STK application.
GeoMode Positioning Engine And How It Works
In GSM and similar digital systems, the network data between every Mobile Station
(MS) and all available Base Stations (BTS) is measured at sub-second intervals
(determined by the operator) and reported to the Base Station Controller (BSC) to
facilitate handover. The GeoMode Positioning Engine retrieves this existing network
data (i.e. LAC, BSIC, BCCH, RxLEV, RxLEV_Nbor [1-6] etc.) from the BSC Network
Management Report (NMR). Depending on the wireless operator network configuration,
this data is accessed via the GeoMode SMLC in the network, or via the SIM STK
application. Additionally, this data can be collected directly from the 'A' interfaces
using analyzing probes and collection software. There is no need for any costly network
transmitters or infrastructure upgrades and therefore GeoMode is less costly and less
complex to implement and manage.
The GeoMode Positioning Engine uses a unique location positioning process based on
advanced statistical modeling techniques and patented algorithms applied to the
subscriber MS data and network propagation data models. These data models are created
from existing network data parameters and data output from network planning tools. The
result is a consistent and accurate record of MS positions across the complete coverage
area.
Implementation Options
GeoMode can be implemented either within the wireless operator's network or external to the network
with an independent data center or application service provider (ASP).
The network measurement results (NMR) data is the only data required by GeoMode. The NMR for all
MS subscribers is available (up-link) at the BSC from where it can be accessed by an SMLC, or this
same data is available (down-link) from the BTS directly via a SIM STK application on the
handset, where it can be sent to a GMLC or ASP. Therefore, the configuration and location of the
GeoMode server can vary depending on the network provider and the operator preferences.
Wireless Network APIs Available
• GSM MAP • UMTS MAP • ANSI 41
• CAMEL • WIN • JSTD 36
• BSSAP+ • BSSAP-LE • BSSMAP
• BSSAP • IS-634 • CDGIOS
These API products can be used to provide several interface options for wireless operators.
From: http://www.geomode.net/pages/3/