Introduction

In a previous research study conducted by the authors, qualitative interviews revealed that visually impaired users often use the cameras on their smartphones. One notable instance of camera use was an application called Envision AI, which provides multiple functionalities designed specifically for visually impaired users: reading any text (even handwritten) with the camera, speech-based descriptions of what is in the camera frame, and searching the surroundings for specific objects (e.g. a chair). On further investigation, we found that even though Envision AI provides all these functionalities, a fundamental aspect of using the camera, framing, remains hard for visually impaired users. Sighted users often take framing for granted, mainly because of the real-time visual feedback loop available to them. Applications such as CamScanner, Evernote, and Adobe Scan offer document scanning and OCR; using these, a visually impaired user could in principle "listen" to any document they desire. The technology exists and has been implemented in numerous products. The primary roadblock that prevents visually impaired users from using it to its full extent is that they cannot properly frame the document of interest. The current workaround is to ask people nearby to assist them. The aim of this work is to give users the independence to perform this activity themselves.
Interaction Design

The first point of interaction is initialization of frame tracking. For the system to work on horizontal, vertical, or angled surfaces, we use an adaptive approach to recognize the plane on which the document of interest rests. This is done in a novel way: the user holds the phone flat against the surface of the document and then taps anywhere on the screen to initialize tracking. At this point the orientation values of the phone on the document surface are recorded, and all subsequent values are compared against them. At the same time, the user receives vibration feedback confirming that input has been received and tracking has been enabled.

Once the system is initialized, the user lifts (or moves) the phone away from the document surface. While doing so, they receive continuous auditory feedback about the orientation of the phone relative to the plane of the document. Non-speech auditory tones were chosen because research has shown that tonal modalities are faster, and in some instances more accurate, than speech modalities. Another important factor in this decision was the requirement of real-time feedback: quick corrections are only possible within a real-time closed feedback loop. If we give the user the ability to understand the state of the system and the steps they need to take, we can also allow more freedom and make the system less constrained. Non-speech audio can represent information in multiple ways, including pitch, loudness, timbre, and simultaneous audio streams. Pitch was chosen because prior research has shown that pitch differences are one of the most effective methods of differentiating between values [1].
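To make this flow concrete, the following is a minimal sketch of the initialization and comparison steps, assuming the platform delivers device orientation as (yaw, pitch, roll) angles in radians; FrameTracker, vibrate, and wrap_angle are illustrative names, not the app's actual code.

    import math

    class FrameTracker:
        def __init__(self):
            # Orientation captured while the phone lies flat on the document.
            self.reference = None

        def on_tap(self, orientation):
            """Called when the user taps with the phone flat on the document:
            record the document plane's orientation and start tracking."""
            self.reference = orientation
            vibrate()  # haptic confirmation that tracking is enabled

        def deviation(self, orientation):
            """Collapse the three angular differences (yaw, pitch, roll) into
            a single scalar distance from the reference plane."""
            if self.reference is None:
                return None
            diffs = [wrap_angle(a - b)
                     for a, b in zip(orientation, self.reference)]
            return math.sqrt(sum(d * d for d in diffs))

    def wrap_angle(a):
        """Map an angle difference into (-pi, pi] so 359 deg vs 1 deg reads as 2 deg."""
        return math.atan2(math.sin(a), math.cos(a))

    def vibrate():
        pass  # placeholder for the platform's haptic API

The scalar returned by deviation is what drives the tone described next.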
Pitch (or frequency) is mapped proportionally to how far the user is from the required position. By giving the user feedback about distance from the required position, we essentially transform three variables (yaw, pitch, and roll) into a single feedback variable (tone). While many tonal variations could be used, research has shown that listeners are most receptive to tones in the range MIDI 20 to MIDI 100, and it is exceptionally hard to comprehend more than two tones at once. This novel data-conversion step therefore keeps the feedback concise and more intuitive.
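As an illustration, the mapping below converts the scalar deviation into a frequency, assuming a linear map into the MIDI 20 to 100 range noted above and the standard MIDI-to-hertz formula; the maximum-deviation constant and the low-deviation-to-low-note direction are assumptions, not the app's documented behavior.

    import math

    MIDI_LOW, MIDI_HIGH = 20, 100
    MAX_DEVIATION = math.pi / 2  # assumed: 90 degrees or more maps to the top note

    def deviation_to_frequency(deviation):
        """Proportionally map deviation (radians) to a MIDI note, then to Hz."""
        t = min(deviation / MAX_DEVIATION, 1.0)        # normalize to [0, 1]
        midi = MIDI_LOW + t * (MIDI_HIGH - MIDI_LOW)   # linear map into MIDI range
        return 440.0 * 2 ** ((midi - 69) / 12)         # standard MIDI-to-Hz formula

    # Example: perfectly aligned -> low tone; far off -> high tone
    print(deviation_to_frequency(0.0))          # ~25.96 Hz (MIDI 20)
    print(deviation_to_frequency(math.pi / 2))  # ~2637 Hz  (MIDI 100)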
In initial tests, we found that the continuously playing tones were sometimes irritating to users. To overcome this, we added the ability to pause the auditory feedback: once tracking has been initialized, tapping and holding on the screen pauses the audio, which remains paused for as long as the user holds the tap. If for some reason the user would like to restart the process by which frame tracking is enabled, a button at the bottom of the screen can be pressed to reset the process. Vibration feedback indicates that an app reset has occurred.
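A small state sketch of these pause and reset interactions follows, assuming hypothetical touch callbacks (on_touch_down, on_touch_up, on_reset_button); the app's real event handlers may differ.

    from enum import Enum, auto

    class State(Enum):
        IDLE = auto()      # waiting for the phone-on-document tap
        TRACKING = auto()  # tones playing
        PAUSED = auto()    # user is holding a tap, tones silenced

    state = State.IDLE

    def on_touch_down():
        global state
        if state == State.IDLE:
            state = State.TRACKING   # first tap: capture reference, start tones
        elif state == State.TRACKING:
            state = State.PAUSED     # holding the screen silences feedback

    def on_touch_up():
        global state
        if state == State.PAUSED:
            state = State.TRACKING   # releasing the hold resumes tones

    def on_reset_button():
        global state
        state = State.IDLE           # reset button returns to initialization
        # vibrate() here would signal that the reset occurred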
References

[1] Peres, S. C., and Lane, D. M. "Auditory Graphs: The Effects of Redundant Dimensions and Divided Attention."
Team

Prabodh Sakhardande
Santiago Arconada Alvarez