Voice Recognition for Mouse Control Using HCI

One of the most important research areas in the field of Human-Computer Interaction (HCI) is gesture recognition, as it provides a natural and intuitive way for people and machines to communicate. Voice-based HCI applications range from desktop software to virtual/augmented reality and are now being explored in other fields. This work proposes the implementation of an absolute virtual mouse based on the interpretation of voice recognition commands. The system controls the mouse pointer: moving it up/down/left/right, opening a file, and dragging a file. This virtual device is designed specifically as an alternative non-contact pointer for people with mobility impairments in the upper extremities. The voice-controlled virtual mouse simplifies HCI for disabled persons, especially those without the use of their hands and arms, and serves as an alternative cursor-positioning system for laptops.


INTRODUCTION
Since its invention in 1964, the mouse has become the most popular input device used with computers, providing full access to both the computer and the power and capabilities of the internet. In this work we present a non-contact absolute virtual mouse alternative specifically designed for people with mobility impairments [22]. This proposal allows absolute screen positioning of the cursor following the user's head movements, and the generation of click events by detecting specific face gestures. Currently, there are some virtual alternatives to the physical mouse. Most of the non-contact alternatives use a camera to capture images of the user and computer vision algorithms to convert head movements and face gestures into mouse actions: pointer movement and the different clicks [14].
The differences between two consecutive images of the user's face were used to detect eye blinks and eyebrow raises and generate clicks. The skin colour distribution was used to track the movement of the user's face and move the pointer on the computer screen [10]. The user's face was selected manually in a first image and used as a template to detect head movements in live images and move the pointer accordingly. The proposal was later improved with automatic detection of the user's face.
The optical flow of the image was converted into cursor displacement [15]. Other non-contact alternatives were based on infrared illumination for accurate detection and localization of the user, relying on the reflection properties of the cornea or of an attached infrared reflector sticker. This proposal allows robust and fast initial detection, fast and accurate head tracking, precise pointer control, and the generation of voluntary click actions; a combination of features not available in state-of-the-art virtual head-mouse devices [17].

Eye Gaze Tracking for Human Computer Interaction
This thesis researches interaction methods based on eye-tracking technology. After discussing the limitations of the eyes regarding accuracy and speed, including a general discussion of Fitts' law, the thesis follows three different approaches to utilizing eye tracking for computer interaction [8]. It also describes algorithms for reading detection. All approaches present results based on user studies conducted with prototypes developed for the purpose [4].

Developing a Voice Control System For Zigbee-Based Home Automation Networks
This paper presents the design and implementation of a voice control system for ZigBee-based home automation networks. In this system, one or more voice-recognition modules are added to the ZigBee-based network [9]. The recognized control messages sent by these modules are then routed to the target device and finally carried out by the controlling circuit. To improve the accuracy of speech-recognition control, a button trigger mode, a voice-password trigger mode, and a circle recognition mode are provided, so users can choose different modes under different conditions [3].

EXISTING MODEL
We currently use optical mice and touch pads to navigate personal computers and laptops. But if the optical mouse or touch pad suffers a technical breakdown, there is no other way to navigate the system [5]. The keyboard may prove inefficient in this case, as we are highly dependent on these pointing tools. We could operate a system whose keyboard is malfunctioning with a mouse, but the other way around is not easy. It is also not easy for elderly and disabled persons to use the mouse [16].

Retina Based Mouse Control (RBMC)
The paper presents a novel idea for controlling computer mouse cursor movement with the human eyes. It describes a working product and how it helps people with special needs share their knowledge with the world. Systems exist for cursor control that make use of image processing, in which light is the primary source [11]. Electro-oculography (EOG) is a newer technology that senses eye signals with which the mouse cursor can be controlled. The signals captured by sensors are first amplified, then denoised and digitized, before being transferred to a PC for software interfacing [12].

Eye Tracking in HCI
An eye tracker is a device that uses projection patterns and optical sensors to gather data about gaze direction or eye movements with very high accuracy. Most eye trackers are based on the fundamental principle of corneal-reflection tracking [6]. Eye gaze provides a very efficient way of pointing; we do it all the time in interaction with other humans. Eye tracking technology enables us to use our gaze in interaction with computers and machines. It's fast, intuitive and natural. Eye gaze is an input mode with the potential to be an efficient computer interface [5].

Fig 1: Eye Tracking in HCI
Eye movement has been the focus of research in this area. Non-intrusive eye-gaze tracking that allows slight head movement is addressed in this paper [6]. A small 2D mark is employed as a reference to compensate for this movement.
The iris centre has been chosen for measuring eye movement [7]. The gaze point is estimated after acquiring the eye-movement data. Preliminary experimental results are given through a screen-pointing application [4].

Head Tracking Driven Virtual Computer Mouse
A novel head-tracking-driven camera mouse system, called "hMouse", is developed for manipulating hands-free perceptual user interfaces [13]. The system consists of a robust real-time head tracker, a head pose/motion estimator, and a virtual mouse control module. For the hMouse tracker, the authors propose a 2D detection/tracking complementary switching strategy with an interactive loop. Based on the reliable tracking results, hMouse calculates the user's head roll, tilt, and yaw, as well as scaling and horizontal and vertical motion, for further mouse control. The cursor position is navigated and fine-tuned by calculating the relative position of the tracking window in image space and the user's head tilt or yaw rotation [18]. Experimental results demonstrate that hMouse succeeds under conditions of user jumping, extreme movement, large-degree rotation, turning around, hand/object occlusion, part of the face leaving the camera's field of view, and multi-user occlusion. It provides an alternative solution for convenient device control, which encourages applications in interactive computer games, machine guidance, robot control, and machine access for the disabled and the elderly [20]. Implemented on a typical PC, hMouse provides a virtual human interface for hands-free mouse control. Like a general camera mouse, hMouse is composed of a visual tracking module and a mouse control module [19]. Using a common consumer camera without a calibrated lens, the system processes each frame of the captured video in real time.
The user's face/head is first automatically detected and tracked by a robust and reliable head tracker. The head pose and motion parameters are further estimated by analyzing visual cues [21]. With basic synchronization control and temporal smoothing, the visual tracking module navigates the cursor and controls the virtual mouse buttons using the received motion parameters. The operating system finally responds to all mouse events generated by the PUI.

PROPOSED SYSTEM
The proposed system controls the mouse pointer through voice. Using this project, we can control the mouse pointer with our voice signal. This project is very useful for handicapped persons, as it allows them to operate the mouse easily through voice signals.

Fig 2: Block diagram
This project is designed with a microcontroller, a microphone, a signal conditioning unit, and a level logic converter. Different voice commands represent different operations. The microphone is a type of transducer which converts the voice signal into an electrical signal. These electrical signals are very small millivolt-level signals, so they are fed to a signal conditioning unit built around an operational amplifier. In this circuit the operational amplifier acts as a comparator and generates square pulses that are given to the microcontroller. The microcontroller is a flash-type reprogrammable device. It is pre-programmed, so it receives the pulse signal from the signal conditioning unit and sends it to the PC with the help of the MAX232. On the PC, we control the pointer depending on the received value.
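The PC-side step above can be sketched as follows. This is a minimal illustration only: the command byte values, the set of actions, and the per-command step size are hypothetical, since the text does not specify the codes the microcontroller actually sends.

```python
# Hypothetical mapping from a received serial byte to a mouse action.
# The real codes depend on the microcontroller firmware, which is not given.
COMMAND_ACTIONS = {
    0x01: "up",
    0x02: "down",
    0x03: "left",
    0x04: "right",
    0x05: "open",   # would trigger a double-click rather than movement
    0x06: "drag",   # would press/hold the button rather than move
}

STEP = 10  # pixels moved per recognized voice command (assumed)

def apply_command(pos, code):
    """Return the new (x, y) pointer position after one command byte."""
    x, y = pos
    action = COMMAND_ACTIONS.get(code)
    if action == "up":
        y -= STEP
    elif action == "down":
        y += STEP
    elif action == "left":
        x -= STEP
    elif action == "right":
        x += STEP
    # "open" and "drag" generate click events, so the position is unchanged
    return (x, y)
```

In the actual system, the bytes would arrive over the RS-232 link (e.g. via a serial port library) and the resulting positions would be handed to the operating system's cursor API; here only the decoding logic is shown.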

Keypad
A numeric keypad, or num pad for short, is the small, palm-sized, seventeen-key section of a computer keyboard, usually on the far right. The numeric keypad features the digits 0 to 9, addition (+), subtraction (-), multiplication (*) and division (/) symbols, a decimal point (.), and Num Lock and Enter keys. Laptop keyboards often do not have a num pad, but may provide num-pad input by holding a modifier key (typically labelled "Fn") and operating keys on the standard keyboard [23].

LCD Display
Liquid crystal displays (LCDs) have materials which combine the properties of both liquids and crystals. Rather than having a melting point, they have a temperature range within which the molecules are almost as mobile as they would be in a liquid, but are grouped together in an ordered form similar to a crystal.

Power Supply
The power supply should be +5 V, with maximum allowable transients of 10 mV. To achieve a suitable contrast for the display, the voltage (VL) at pin 3 should be adjusted properly. A module should not be inserted into or removed from a live circuit. The ground terminal of the power supply must be isolated properly so that no voltage is induced in it. The module should be isolated from other circuits so that stray voltages are not induced, which could cause a flickering display [24].

Registers
The controller IC has two 8-bit registers, an instruction register (IR) and a data register (DR). The IR stores instruction codes and address information for the display data RAM (DD RAM) and the character generator RAM (CG RAM). The IR can be written, but not read, by the MPU. The DR temporarily stores data to be written to or read from the DD RAM or CG RAM.
Data written to the DR by the MPU is automatically written to the DD RAM or CG RAM as an internal operation.

PIC 16F877
Various microcontrollers offer different kinds of memory: EEPROM, EPROM, flash, etc., of which flash is the most recently developed. The PIC16F877 uses flash technology, so data is retained even when the power is switched off. Easy programming and erasing are other features of the PIC16F877.

RS-232C Serial Data Standard
RS-232C specifies 25 signal pins, and it specifies that the DTE connector should be male and the DCE connector should be female. A specific connector is not given, but the most commonly used connectors are the DB-25P male and the DB-25S female shown in Figure 13-7. When wiring up these connectors, it is important to note the order in which the pins are numbered. The voltage levels for all RS-232C signals are as follows: a logic high, or mark, is a voltage between -3 V and -15 V under load (-25 V no load); a logic low, or space, is a voltage between +3 V and +15 V under load (+25 V no load). Voltages such as ±12 V are commonly used. As for RS-422A, RS-423A, and RS-449: a newer standard, RS-422A, specifies that each signal is sent differentially over two adjacent wires in a ribbon cable or a twisted pair of wires, as shown in Figure 13-11a. Differential signals are produced by differential line drivers such as the MC3487 and translated back to TTL levels by differential line receivers such as the MC3486. The data rate for this standard is 10 MBd over a distance of 50 ft (about 15 m).
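As a quick illustration of the RS-232C voltage thresholds above, a small helper can classify a measured line voltage under load. The function name and return labels are illustrative, not part of any standard API.

```python
def rs232_level(volts):
    """Classify an RS-232C line voltage (under load) as mark, space, or undefined.

    Per the levels quoted above: mark (logic 1) is -3 V to -15 V,
    space (logic 0) is +3 V to +15 V; anything else (including the
    -3 V..+3 V transition region) is undefined.
    """
    if -15.0 <= volts <= -3.0:
        return "mark"       # logic 1
    if 3.0 <= volts <= 15.0:
        return "space"      # logic 0
    return "undefined"      # transition region or out of spec
```

Note the inverted sense relative to TTL: a common ±12 V driver outputs -12 V for a logic 1 and +12 V for a logic 0, which is why a level converter such as the MAX232 is needed between the microcontroller and the PC.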

Speech Recognition System based on HM2007
The speech recognition system is a completely assembled and easy-to-use programmable speech recognition circuit. This board allows you to experiment with many facets of speech recognition technology. It has an 8-bit data output which can be interfaced with any microcontroller for further development. Some interfacing applications which can be built are controlling home appliances, robotic movements, speech-assisted technologies, speech-to-text translation, and many more.

Speech Recognition
Speech recognition will become the method of choice for controlling appliances, toys, tools and computers. At its most basic level, speech-controlled appliances and tools allow the user to perform parallel tasks (i.e. while the hands and eyes are busy elsewhere) while working with the tool or appliance. The heart of the circuit is the HM2007 speech recognition IC. The IC can recognize 20 words, each up to 1.92 seconds in length.

Using the System
The keypad and digital display are used to communicate with and program the HM2007 chip. The keypad is made up of 12 normally open momentary contact switches. When the circuit is turned on, "00" is on the digital display, the red LED (READY) is lit and the circuit waits for a command.
Press "1" on the keypad (the display will show "01" and the LED will turn off), then press the TRAIN key (the LED will turn on) to place the circuit in training mode for word one. Say the target word clearly into the onboard microphone (near the LED). The circuit signals acceptance of the voice input by blinking the LED off and then on. The word (or utterance) is now identified as word "01". If the LED did not flash, start over by pressing "1" and then the TRAIN key. You may continue training new words into the circuit: press "2" then TRAIN to train the second word, and so on. The circuit will accept and recognize up to 20 words. It is not necessary to train all word spaces; if you only require 10 target words, that's all you need to train.

Testing Recognition
Repeat a trained word into the microphone. The number of the word should be displayed on the digital display. For instance, if the word "directory" was trained as word number 20, saying the word "directory" into the microphone will cause the number 20 to be displayed.
The chip provides the following error codes: 55 = word too long, 66 = word too short, 77 = no match.
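If the display value is read back into software (for instance over the microcontroller's 8-bit interface), the result byte can be decoded with a small helper. The function and names below are hypothetical; only the three error codes and the 20-word limit come from the text.

```python
# Error codes reported by the HM2007, as listed above.
ERROR_CODES = {55: "word too long", 66: "word too short", 77: "no match"}

def decode_result(value):
    """Decode the HM2007 result value into a recognized word number or an error.

    Returns ("word", n) for a recognized trained word, or ("error", reason).
    """
    if value in ERROR_CODES:
        return ("error", ERROR_CODES[value])
    if 1 <= value <= 20:          # the chip recognizes up to 20 trained words
        return ("word", value)
    return ("error", "unexpected code")
```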

Clearing Memory
To erase all words in memory press "99" and then "CLR". The numbers will quickly scroll by on the digital display as the memory is erased.

Changing & Erasing Words
Trained words can easily be changed by overwriting the original word. For instance, suppose word six was the word "Capital" and you want to change it to the word "State". Simply retrain the word space by pressing "6", then the TRAIN key, and saying the word "State" into the microphone. If one wishes to erase a word without replacing it with another, press the word number and then press the CLR key. Word six is now erased.
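The training, retraining and erasing flow described above amounts to simple bookkeeping over the 20 word spaces. The following minimal simulation captures that bookkeeping only; it does not model the chip's actual speech analysis, and the class and method names are hypothetical.

```python
class HM2007Trainer:
    """Simulation of the keypad training flow: select a word space,
    train (or overwrite) it, or clear it."""

    MAX_WORDS = 20  # the HM2007 accepts up to 20 trained words

    def __init__(self):
        self.words = {}       # word-space number -> trained utterance label
        self.selected = None  # word space last chosen on the keypad

    def press_number(self, n):
        """Select a word space, as when keying its number on the keypad."""
        if not 1 <= n <= self.MAX_WORDS:
            raise ValueError("word space out of range")
        self.selected = n

    def press_train(self, utterance):
        """Store the spoken utterance in the selected space (overwrite retrains)."""
        if self.selected is None:
            raise RuntimeError("select a word space first")
        self.words[self.selected] = utterance
        self.selected = None

    def press_clr(self):
        """Erase the selected word; with no selection, erase all (the 99+CLR case)."""
        if self.selected is None:
            self.words.clear()
        else:
            self.words.pop(self.selected, None)
            self.selected = None
```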

Simulated Independent Recognition
The speech recognition system is speaker dependent, meaning that the voice that trained the system has the highest recognition accuracy. But you can simulate speaker-independent recognition by using more than one word space for each target word. Here we use four word spaces per target word, so we obtain four different enunciations of each target word. Word spaces 01, 02, 03 and 04 are allocated to the first target word. We continue this for the remaining word spaces; for instance, the second target word uses word spaces 05, 06, 07 and 08. We continue in this manner until all the words are programmed. If you are experimenting with speaker independence, use different people when training a target word. This will enable the system to recognize different voices, inflections and enunciations of the target word. The more system resources that are allocated for independent recognition, the more robust the circuit will become. If instead you are designing the most robust and accurate single-speaker system possible, train the target words using one voice with different inflections and enunciations of the target word.
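The four-spaces-per-target allocation above can be expressed as a small mapping from a recognized word-space number back to its target word; whichever of the four enunciations matches, the same command results. The function name and the range check are illustrative.

```python
SPACES_PER_TARGET = 4   # four enunciations per target word, as described above
TOTAL_SPACES = 20       # word spaces available on the HM2007

def target_word(word_space):
    """Map a recognized word-space number (1-20) to its target-word index (1-5).

    Spaces 1-4 -> target 1, spaces 5-8 -> target 2, and so on.
    """
    if not 1 <= word_space <= TOTAL_SPACES:
        raise ValueError("word space out of range")
    return (word_space - 1) // SPACES_PER_TARGET + 1
```

With this scheme the 20-word dictionary yields only 5 distinct commands, which is the trade-off the text describes: word spaces are spent to gain robustness across voices.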

Homonyms
Homonyms are words that sound alike. For instance, the words cat, bat, sat and fat sound alike. Because of their similar sound, the recognition circuit can easily confuse such words, so homonyms should be avoided when choosing target words.

Voice Security System
This circuit isn't designed for a voice security system in a commercial application, but that should not prevent anyone from experimenting with it for that purpose. A common approach is to use three or four keywords that must be spoken and recognized in sequence in order to open a lock or allow entry.

Aural Interfaces
It's been found that mixing visual and aural information is not effective. Products that require visual confirmation of an aural command grossly reduce efficiency. To be effective, AUI products need to understand commands given in an unstructured and efficient manner: the way in which people typically communicate verbally.

Speaker Dependent / Speaker Independent
Speech recognition is divided into two broad processing categories: speaker dependent and speaker independent. Speaker-dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers.

Recognition Style
In addition to the speaker dependent/independent classification, speech recognition also contends with the style of speech it can recognize. There are three styles of speech: isolated, connected and continuous. Isolated: the user must pause between each word or command spoken; this is the most common speech recognition system available today. Connected: a halfway point between isolated-word and continuous speech recognition, it permits users to speak multiple words. The HM2007 can be set up to identify words or phrases 1.92 seconds in length; this reduces the word-recognition dictionary to 20 entries.

More On the HM2007 Chip
The HM2007 is a CMOS voice recognition LSI circuit. The chip contains the analog front end, voice analysis, recognition, and system control functions. The chip may be used stand-alone or connected to a CPU.

Application
The system can be used by handicapped persons. In future, a voice-to-text converter could be added to communicate with the keyboard for commercial applications such as online exams.

CONCLUSION
The developed system provides highly efficient mouse control by voice in hands-free mode. The targeted use case is an environment like a living room or office. The developed subsystem is integrated with a PC/laptop, providing hands-free interaction with advanced features such as all mouse click operations. This speech recognition control system uses human-computer interaction to realize the virtual mouse function. The experiments validate easy and flexible control of computer appliances, especially for the elderly and people with disabilities, such as quadriplegia patients. In future, combining this voice-recognition mouse control module with a voice-to-text converter application will be very useful to quadriplegia patients, because the user can operate both the virtual mouse and voice-to-text in different applications, such as browsing and online exams, without using their hands. In addition, the current system uses wired technology, i.e., an RS-232 cable, with the hardware module attached to the PC/laptop; in future this wired link can be replaced by a wireless technology such as ZigBee, making the system even more useful to disabled people.