Audio hardware in mobile devices is improving and now supports higher-definition sound. In particular, recording and digitization hardware has improved greatly, and microphone arrays are now used for stereo recording and noise cancellation. These advances could affect the feasibility of ranging and localization on mobile hardware, in particular for eavesdropping and keystroke detection. To detect keystrokes, an adversary could add malware to an app with microphone access and wait until the phone is placed near a keyboard, or leave a phone near the target's keyboard. Past research found limited potential for keystroke snooping using mobile device microphones; however, the increasing use of multiple microphones on these devices creates new potential.
Liu et al. explored this question and showed that keystroke snooping is feasible with a technique that requires neither training the device nor establishing context in advance. Prior techniques that trained the device to recognize the sound of typing required context-specific data, which is difficult to obtain, so removing that requirement is significant. They achieved this by focusing on acoustic ranging based on time-of-flight measurements, that is, calculating the position of a sound source from the difference in the times at which the sound is detected at two fixed points.
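As a minimal sketch of that idea, and not the authors' implementation, the Python below predicts the arrival-time difference (named `tdoa` here) that each key would produce at two microphones and picks the key whose prediction is closest to a measured value. The microphone positions, key coordinates, and speed of sound are assumptions chosen for illustration; none are taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C

# Hypothetical geometry in metres: two microphones 15 cm apart and three
# keys on a row 10 cm in front of them. These values are illustrative only.
MIC_A = (0.00, 0.00)
MIC_B = (0.15, 0.00)
KEYS = {"A": (-0.05, 0.10), "G": (0.075, 0.10), "L": (0.20, 0.10)}


def tdoa(source, mic_a=MIC_A, mic_b=MIC_B, c=SPEED_OF_SOUND):
    """Arrival time at mic_a minus arrival time at mic_b, in seconds."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c


def closest_key(measured_tdoa, keys=KEYS):
    """Return the key whose predicted time difference best matches the measurement."""
    return min(keys, key=lambda name: abs(tdoa(keys[name]) - measured_tdoa))


if __name__ == "__main__":
    for name, pos in KEYS.items():
        print(f"{name}: {tdoa(pos) * 1e6:+.1f} microseconds")
    # A sound that reaches both microphones at the same time lies on their
    # perpendicular bisector, which passes through the hypothetical "G" key.
    print(closest_key(0.0))
```

A single time difference constrains the source only to a curve (a hyperbola), so with two microphones geometry alone can leave several keys equally plausible.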
They tested their method with a Samsung Galaxy Note 3 mobile device and a laptop with a pair of microphones, used to simulate a potential future phone with improved audio sampling capability. Audio sampling capability, or the number of times per second that a device can collect and process sounds, is of particular importance because it directly affects the accuracy of keystroke detection. The method was tested with a Microsoft Surface keyboard, an Apple Wireless Keyboard, and a Razer mechanical keyboard to assess its effectiveness across keyboards with differing form factors and key noise levels.
A current-model phone can discover passwords without training by exploiting mm-level acoustic ranging and fine-grained acoustic features. The researchers developed a method that exploits geometry-based information and the unique acoustic signatures of keystrokes to pinpoint their positions on a keyboard. The accuracy and precision of this system depend on several key factors, notably the sampling rate, the distance between the microphones, and the placement of the mobile device.
At the 48 kHz sampling rate commonly found in current mobile devices, a keystroke can be identified with over 85% accuracy, or narrowed to a set of 3 candidates with 97% accuracy. Accuracy rises to as much as 94% at a sampling rate of 192 kHz, which may be possible in future devices. The distance between the microphones also matters: greater separation allows greater accuracy, so larger portable devices make the technique more feasible. The placement of the device relative to the keyboard is significant as well, with only a relatively small range of locations being useful to an attacker.
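To make the dependence on sampling rate and microphone spacing concrete, the short Python sketch below computes how far sound travels during one sample period and how many whole-sample time offsets a given microphone spacing can produce. The speed of sound (343 m/s) and the 15 cm and 30 cm spacings are assumed values, and the calculation ignores any sub-sample refinement a real system might apply.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def path_resolution_mm(sample_rate_hz, c=SPEED_OF_SOUND):
    """Distance sound travels during one sample period, in millimetres."""
    return c / sample_rate_hz * 1000.0


def distinct_lags(mic_spacing_m, sample_rate_hz, c=SPEED_OF_SOUND):
    """Number of whole-sample arrival-time offsets the two microphones can observe.

    The largest possible offset occurs when the source lies on the line
    through both microphones, giving a delay of mic_spacing_m / c seconds.
    """
    max_lag = int(mic_spacing_m * sample_rate_hz / c)
    return 2 * max_lag + 1  # offsets from -max_lag to +max_lag, including 0


if __name__ == "__main__":
    for rate in (48_000, 192_000):
        print(f"{rate} Hz: {path_resolution_mm(rate):.2f} mm per sample")
    for spacing in (0.15, 0.30):  # hypothetical microphone spacings in metres
        print(f"{spacing} m at 48 kHz: {distinct_lags(spacing, 48_000)} offsets")
```

Under these assumptions one sample corresponds to roughly 7 mm of path difference at 48 kHz and under 2 mm at 192 kHz, and doubling the microphone spacing roughly doubles the number of distinguishable offsets.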
The attack is currently possible only on the few mobile devices that expose stereo recording and have a large microphone separation. Even at future, higher audio sampling rates there is only a moderate chance of capturing a long password correctly on the first attempt. Even so, the attack does yield a small set of password candidates that can then be brute-forced. The research highlights the difficulty of security in a sensor-dense environment and suggests that limiting access to multiple microphones and higher sampling rates should be prioritized.
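A rough back-of-the-envelope sketch in Python shows why a whole password is harder to capture than a single key. It assumes each keystroke is recognized independently with the per-key accuracies quoted above, and it uses a hypothetical 8-character password; neither assumption comes from the paper.

```python
def whole_password_probability(per_key_accuracy, length):
    """Chance that every keystroke is right, assuming independent errors."""
    return per_key_accuracy ** length


def candidate_passwords(candidates_per_key, length):
    """Number of passwords to try when each key is narrowed to a few candidates."""
    return candidates_per_key ** length


if __name__ == "__main__":
    length = 8  # hypothetical password length
    for accuracy in (0.85, 0.94):
        p = whole_password_probability(accuracy, length)
        print(f"per-key accuracy {accuracy:.2f}: first-try success {p:.2f}")
    # Chance that the true key is among the 3 candidates for every keystroke.
    print(f"covered by top-3 sets: {whole_password_probability(0.97, length):.2f}")
    print(f"passwords to try: {candidate_passwords(3, length)}")
```

Under these assumptions the first-attempt success rate is roughly 0.27 at 85% per-key accuracy and roughly 0.61 at 94%, while a 3-candidate set per key leaves 6561 passwords to try, a small space for a brute-force search.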
Improving mobile hardware makes acoustic password detection possible, but not practical. Sensor-dense environments require extra awareness to ensure security against snooping.