The Snowden documents may be considered the tipping point at which apprehensions about omnipresent government surveillance turned into confirmation. Systems in which public cameras scan the faces of people walking the streets, or webcams secretly record every activity of their owner, were once thought to exist only in movies; they were confirmed to be operating in the real world. Why else do you think laptops and webcams these days come with a physical shutter?
The problem, however, goes deeper. Though such systems exist, they aren’t as efficient as they are portrayed in movies. After all, people generally don’t give cameras a perfect facial shot while walking down the street, so facial recognition rarely gets to work its wonders. But maybe it doesn’t have to. Maybe government agencies have found a new means to their end: smart home speakers like Echo and Google Home.
Voice recognition works much better than facial recognition for surveillance purposes. After all, people are generally aware of cameras, and even putting on a cap can potentially throw off the system. These speakers, however, sit quietly in a room, hardly noticed, and how long can a person go without speaking? The point is that if government agencies had tools to harness voice recognition instead of facial recognition, data collection and profiling would be much easier and would set off far fewer alarms.
What’s more, it has been found that the NSA does have such voice-recognition systems and has been using them since as early as 2004, even before these smart speakers came into existence. No, we aren’t saying that the NSA listens to everything you say to Alexa or Google Assistant; we are just pointing out that it can if it wants to.
Voice RT, as the system is called, can potentially intercept the vast amount of audio data these devices send to the cloud and profile individuals based on it. But didn’t we say the NSA has been doing such interceptions for a very long time? So why the fuss now? Because the target used to be keywords like “terrorism” or “child pornography”, but it has now shifted to the voice of each individual.
Let’s face it: government agencies can already access your data by serving a court order on these companies, and you wouldn’t even know, because the companies are bound not to disclose it. That might seem perfectly fine, because all the agencies would get are the actual commands you gave to the speaker, since the devices start recording only after hearing the wake word (“Okay, Google” or “Hey, Alexa”). Perhaps the only good news here is that security researchers have so far found no way to trigger those speakers without the wake word, at least not remotely. But then, maybe those agencies don’t need to.
If they can get their hands on all your voice commands, they will have enough samples for profiling. What makes matters worse is that both Google and Amazon keep all user commands tied to identities, unlike Apple, which creates an anonymous identifier to protect against such surveillance.
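To make the difference concrete, here is a minimal sketch of the two storage approaches. It is purely illustrative: the `CommandLog` class, its method names and the example account are hypothetical, and this is not how Apple, Google or Amazon actually implement their services. It only assumes that an "anonymous identifier" means a random token that stands in for the account name.

```python
import secrets


class CommandLog:
    """Toy model of a voice-command store (hypothetical, for illustration only)."""

    def __init__(self):
        # Mapping from account to random identifier. In this sketch it is
        # assumed to stay on the user's device and never reach the server.
        self._alias = {}
        # What the imagined server ends up holding: (identifier, command) pairs.
        self.records = []

    def _identifier_for(self, account, anonymous):
        if not anonymous:
            # Identity-tied storage: the record carries the account itself.
            return account
        if account not in self._alias:
            # Random 128-bit token, not derived from the account name.
            self._alias[account] = secrets.token_hex(16)
        return self._alias[account]

    def store(self, account, command, anonymous):
        self.records.append((self._identifier_for(account, anonymous), command))


log = CommandLog()
log.store("alice@example.com", "play some jazz", anonymous=False)
log.store("alice@example.com", "what's the weather", anonymous=True)
for identifier, command in log.records:
    print(identifier, "->", command)
```

In the first record, anyone who obtains the stored data sees the account directly. In the second, they see only a random token, so linking the command to a person requires the separate mapping as well. Note that this protects the label on the record, not the voice itself: a raw audio clip could still be matched to a speaker by its sound.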
Plus, these companies have no reason to store raw audio clips other than to improve their own algorithms. They could simply use speech-to-text conversion and store only transcripts, or even perform the conversion on-device to keep themselves out of the loop. But then, they were just trying to create useful assistants that learned along the way, and never imagined that these would turn out to be such fantastic surveillance tools that they would need protecting. So, what next?
We don’t know for sure. Nobody does.