By recording keystrokes and training a deep learning model, three researchers claim to have achieved upwards of 90 percent accuracy in interpreting remote keystrokes, based on the sound profiles of individual keys.
In their paper “A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards” (full PDF), UK researchers Joshua Harrison, Ehsan Toreini, and Maryam Mehrnezhad claim that the trio of ubiquitous machine learning, microphones, and video calls “present a greater threat to keyboards than ever.” Laptops, in particular, are more susceptible to having their keyboard recorded in quieter public areas, like coffee shops, libraries, or offices, the paper notes. And most laptops have uniform, non-modular keyboards, with similar acoustic profiles across models.
Previous attempts at keylogging VoIP calls, without physical access to the subject, achieved 91.7 percent top-5 accuracy over Skype in 2017 and 74.3 percent accuracy in VoIP calls in 2018. Combining the output of the keystroke interpretations with a “hidden Markov model” (HMM), which guesses at more-likely next-letter outcomes and could correct “hrllo” to “hello,” saw one prior side channel study’s accuracy jump from 72 to 95 percent, though that was an attack on dot-matrix printers. The researchers believe their paper is the first to make use of the recent sea change in neural network technology, including self-attention layers, to carry out an audio side channel attack.
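To make the “hrllo” to “hello” correction concrete, here is a minimal toy sketch of HMM-style decoding in Python. It is not the method from the paper or from the printer study. The five-letter alphabet, the tiny word list, the smoothing constant, and the assumed 60 percent per-key classifier accuracy are all made up for illustration. The idea is simply that a next-letter model can outvote a single misheard key.

```python
import math
from collections import Counter

# Everything below is a hypothetical toy setup, not data from the paper.
ALPHABET = "ehlor"                                   # tiny alphabet for illustration
CORPUS = ["hello", "hell", "hole", "role", "hero"]   # made-up "language" sample
ALPHA = 0.1                                          # additive smoothing constant (assumed)
P_CORRECT = 0.6                                      # assumed chance a key is heard correctly


def smoothed_log_probs(counts):
    """Turn raw counts over ALPHABET into smoothed log-probabilities."""
    total = sum(counts.values()) + ALPHA * len(ALPHABET)
    return {c: math.log((counts[c] + ALPHA) / total) for c in ALPHABET}


# Start and next-letter (transition) models estimated from the toy corpus.
start_lp = smoothed_log_probs(Counter(w[0] for w in CORPUS))
bigrams = {a: Counter() for a in ALPHABET}
for word in CORPUS:
    for a, b in zip(word, word[1:]):
        bigrams[a][b] += 1
trans_lp = {a: smoothed_log_probs(bigrams[a]) for a in ALPHABET}


def emit_lp(true_key, heard_key):
    """Log-probability that heard_key is reported when true_key was pressed."""
    if true_key == heard_key:
        return math.log(P_CORRECT)
    # Spread the remaining probability evenly over the other keys.
    return math.log((1 - P_CORRECT) / (len(ALPHABET) - 1))


def viterbi(heard):
    """Return the most likely true key sequence for a non-empty heard sequence."""
    # best[c] = (log-probability, path) of the best path ending in key c
    best = {c: (start_lp[c] + emit_lp(c, heard[0]), c) for c in ALPHABET}
    for h in heard[1:]:
        step = {}
        for c in ALPHABET:
            # Pick the best previous key p to arrive at c from.
            score, path = max(
                (best[p][0] + trans_lp[p][c], best[p][1]) for p in ALPHABET
            )
            step[c] = (score + emit_lp(c, h), path + c)
        best = step
    return max(best.values())[1]


print(viterbi("hrllo"))  # hello
```

In this toy model, “h” is almost never followed by “r,” so the decoder prefers to treat the second keystroke as a misheard “e.” A real attack would need a far larger language model and emission probabilities taken from the classifier's actual confidence scores.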