The Silent Record That Speaks The Loudest

By Marvin Hansen

When people think about the security of voice dictation, they usually think about securing the content: the words they spoke. However, there is a second category of data that is often far more dangerous when exposed: metadata.

The content security conversation focuses on three questions: Is the audio encrypted? Is it stored? Who has access? These are the right questions to ask when protecting content. However, even if the content is perfectly protected, the metadata generated during transmission tells its own story, one that is often more valuable than the content itself.

The Origin of Metadata Mining

Since the 2013 NSA disclosures by whistleblower Edward Snowden, we know that intelligence agencies rely predominantly on metadata analysis. In most cases, they only need to know who spoke to whom, when, and where to make an assessment, without ever looking at the content of the conversation. The reason is simple. Once law enforcement or intelligence agents deem someone a target, they can organize wiretapping and data exfiltration at a later stage. When searching for a particular individual or trying to infiltrate an organized crime network, knowing who spoke to whom, when, and where is more than enough to decide how to proceed.

The NSA’s Section 215 bulk telephony metadata program, among the first of the Snowden revelations, collected the call records of millions of Americans in bulk. Former NSA General Counsel Stewart Baker summarized the operational logic plainly: “metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.” Former NSA Director Michael Hayden confirmed this view, adding: “We kill people based on metadata.” [New York Review of Books, 2014]

The pioneering work of intelligence agencies has since been commercialized to serve repressive governments seeking to suppress journalists and human rights defenders. A global investigation by Forbidden Stories and Amnesty International identified at least 180 journalists across 20 countries who were selected as potential surveillance targets between 2016 and 2021. [Amnesty International, 2021] The operational logic is identical to the NSA model: metadata establishes who the journalist communicates with, which identifies both the journalist and their sources as persons of interest. Device infection with tools like Pegasus follows. Citizen Lab documents this chain explicitly, finding that such spyware gives government clients the means to monitor journalists, their sources, and the specific stories under development. [Citizen Lab, 2020] The assassination of Jamal Khashoggi is the best-documented consequence: his associate Omar Abdulaziz was under Pegasus surveillance in the months prior, which exposed the network around Khashoggi before the operation was executed.

Metadata as Intelligence

AI-based cloud dictation services unavoidably generate metadata that is valuable regardless of whether anyone has access to the content of the dictation. Timestamps reveal exactly when a dictation session occurred and how long it lasted. Multiple timestamps reveal the frequency and pattern of usage. Device information is often shared implicitly through the client that connects to the cloud service, because an iOS client, for example, only runs on Apple devices. The IP address is visible to the server on every request and typically reveals at least the country. Language and vocabulary settings indicate which language is used. Location information is not supposed to be shared, but by combining other metadata or accidentally recorded background noise, narrowing down a location is not impossible.

At this point, a fairly complete user profile can be derived without even knowing the content of the voice transcription. Frequent use of voice dictation on a certain day and time may indicate that somebody spoke to a source who was passing on important information, or that they are working on an unpublished article. A sudden spike of dictation outside regular usage may indicate that new information is being processed.
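To make the inference described above concrete, here is a minimal Python sketch of how little is needed to profile a user from session logs alone. Everything in it is an assumption for illustration: the `SessionRecord` fields, the synthetic `demo_log` data, and the z-score threshold are hypothetical and do not describe the logging of any real dictation service. No audio or transcript appears anywhere in the data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical server-side log entry. Contains no audio and no transcript."""
    started_at: datetime
    duration_s: int
    client: str   # e.g. "ios", implied by the client app that connected
    country: str  # e.g. derived from the IP address seen by the server


def sessions_per_day(records):
    """Count how many sessions started on each calendar day."""
    return Counter(r.started_at.date() for r in records)


def usual_hours(records, top=2):
    """Return the hours of the day in which sessions most often start."""
    counts = Counter(r.started_at.hour for r in records)
    return [hour for hour, _ in counts.most_common(top)]


def spike_days(records, threshold=2.0):
    """Flag days whose session count lies more than `threshold`
    standard deviations above the mean daily count."""
    daily = sessions_per_day(records)
    counts = list(daily.values())
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # perfectly regular usage, nothing stands out
    return sorted(day for day, n in daily.items() if (n - mu) / sigma > threshold)


def demo_log():
    """Synthetic log: two morning sessions a day for nine days,
    followed by a burst of twelve late-night sessions on day ten."""
    records = [
        SessionRecord(datetime(2024, 3, day, 9, minute), 300, "ios", "DE")
        for day in range(1, 10)
        for minute in (0, 30)
    ]
    records += [
        SessionRecord(datetime(2024, 3, 10, 22, minute), 600, "ios", "DE")
        for minute in range(0, 60, 5)
    ]
    return records


if __name__ == "__main__":
    log = demo_log()
    print("usual start hours:", usual_hours(log, top=1))
    print("days with unusual activity:", spike_days(log))
```

In this synthetic log, the regular morning routine and the single late-night burst both fall out of a few lines of counting. An analyst with access to real session logs would need no more sophistication than this to decide which days deserve a closer look.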

In Carpenter v. United States (2018), Chief Justice Roberts wrote for the Supreme Court that time-stamped cell-site location records provide “an intimate window into a person’s life, revealing not only his particular movements, but through them his ‘familial, political, professional, religious, and sexual associations.’” The Court held that accessing such records is a search under the Fourth Amendment, and therefore the Government generally needs a warrant. [Carpenter v. United States, 2018] It is doubtful that the NSA’s Section 215 program was ever lawful [Just Security, 2015], and following the Snowden leak, the mass surveillance program was eventually shut down in late 2015 [Lawfare, 2015].

However, for journalists, lawyers, and human rights defenders in high-risk jurisdictions, the reality is vastly different: law enforcement may already be on their heels, and their government neither needs a warrant nor respects privacy. Here, the metadata trail becomes a tangible risk with real consequences.

The Absence of Metadata

A voice dictation tool that processes all data locally and offline sends nothing over the network, so it generates no metadata on anyone's servers and leaves no traces and no trail. What does not exist cannot be found. No warrant can be issued for something that does not exist. No vendor can be subpoenaed for data that does not exist. As a result, no usage pattern can be reconstructed, no location can be triangulated, and no user profile can be derived.

The implications extend to people in safe jurisdictions. A patent lawyer working with sensitive IP prior to legal protection needs data that never leaves the device or the building, so it cannot be intercepted or stolen. A journalist writing an exclusive article presumably wants to ensure that nobody gets any hint of the story until it breaks. Medical professionals who take patient privacy seriously want to ensure that no patient information ever leaves the hospital. Researchers in private and public institutions want to ensure that their IP is protected and that their unpublished research stays secure until publication day.

AirGap Voice takes information security one step further: it is guaranteed to work offline and, by design, generates no metadata. The fundamental design decision was to not store any data at all in AirGap Voice, so no metadata can ever exist. The AirGap Voice security architecture, and how to verify every single security claim, are detailed in the Enterprise Security Guide.

This article was dictated with AirGap Voice in 32 minutes using the precision model in batch mode and edited for clarity in 16 minutes.

Sources:

New York Review of Books, “We Kill People Based on Metadata,” May 10, 2014. https://www.nybooks.com/online/2014/05/10/we-kill-people-based-metadata/

Amnesty International, The Pegasus Project, July 2021. https://www.amnesty.org/en/latest/press-release/2021/07/the-pegasus-project/

National Constitution Center, Supreme Court Case Library, “Carpenter v. United States (2018).” https://constitutioncenter.org/the-constitution/supreme-court-case-library/carpenter-v-united-states

Bill Marczak et al., Stopping the Press: New York Times Journalist Targeted by Saudi-linked Pegasus Spyware Operator, Citizen Lab Research Report No. 124, University of Toronto, January 2020. https://citizenlab.ca/research/stopping-the-press-new-york-times-journalist-targeted-by-saudi-linked-pegasus-spyware-operator/

Michael Price, Just Security, “The Legal Legacy of the NSA’s Section 215 Bulk Collection Program”, November 16, 2015. https://www.justsecurity.org/27685/legal-legacy-215-bulk-collection-program/

Cody M. Poplin, Lawfare Media, “NSA Ends Bulk Collection of Telephony Metadata under Section 215”, November 30, 2015. https://www.lawfaremedia.org/article/nsa-ends-bulk-collection-telephony-metadata-under-section-215
