Here comes a quick project idea. It’s meant to provide deep-fake + phishing resistance without introducing “big bang changes” to how you operate, which means you can retrofit it onto an existing legacy setup.

The setting is this:

  • you run security-sensitive meetings with a bunch of participants, say from a laptop over Google Meet / Zoom / whatever.
  • you are remote/distributed
  • you are concerned about a participant getting pwned and getting replaced by a deep fake
  • you can’t completely change how you operate; you are stuck with Google Meet / Zoom / whatever

The problem:

  • you are concerned that a participant gets MITM’d or phished and is replaced by a deep fake

Solution in a nutshell:

  • each participant has an iPhone, separate from your corporate laptop
  • the iPhones live-transcribe what you say + what others say through Google Meet into text
  • they encrypt + authenticate this text and send it to a “group chat”
  • at the same time, each iPhone monitors this group chat and checks whether what the other iPhones post there actually matches its own transcription.
  • if they are consistent, the iPhone’s screen stays green. If they are not, a deep fake is suspected: the screen should turn red and complain loudly
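A minimal sketch of the “encrypt + authenticate” step, written in Python for readability rather than the Swift an iPhone app would actually use. It assumes a hypothetical `GROUP_KEY` shared by all phones out of band before the meeting, and it covers only authentication (HMAC-SHA256) plus replay protection via a per-sender sequence number. Encryption is left out because the Python standard library has no authenticated cipher; a real app would use an AEAD such as AES-GCM. `TranscriptChunk`, `seal` and `open_sealed` are illustrative names, not an existing API.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import asdict, dataclass

# HYPOTHETICAL shared group key. Stand-in only: in practice every phone would
# load the same 32-byte key, distributed out of band before the meeting.
GROUP_KEY = secrets.token_bytes(32)


@dataclass
class TranscriptChunk:
    sender: str   # which phone produced this transcript
    seq: int      # increases per sender; used to reject replays
    start: float  # start of the transcribed audio window (seconds)
    end: float    # end of the transcribed audio window (seconds)
    text: str     # what this phone's local model transcribed in that window


def _canonical(fields: dict) -> bytes:
    # Deterministic serialization so sender and receiver MAC identical bytes.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def seal(chunk: TranscriptChunk, key: bytes = GROUP_KEY) -> dict:
    """Attach an HMAC-SHA256 tag so other phones can authenticate the chunk."""
    fields = asdict(chunk)
    tag = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {"chunk": fields, "tag": tag}


def open_sealed(msg: dict, last_seq: dict, key: bytes = GROUP_KEY) -> TranscriptChunk:
    """Verify the tag, then reject replays; raise ValueError on any failure."""
    fields = msg["chunk"]
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["tag"]):
        raise ValueError("bad tag: chunk forged or tampered with")
    chunk = TranscriptChunk(**fields)
    if chunk.seq <= last_seq.get(chunk.sender, -1):
        raise ValueError("replayed or out-of-order chunk")
    last_seq[chunk.sender] = chunk.seq
    return chunk


if __name__ == "__main__":
    seen: dict = {}
    msg = seal(TranscriptChunk("alice-phone", 0, 0.0, 5.0, "approve the transfer"))
    print(open_sealed(msg, seen).text)  # authenticates fine
    msg["chunk"]["text"] = "reject the transfer"  # tamper with the text
    try:
        open_sealed(msg, seen)
    except ValueError as err:
        print("rejected:", err)
```

With a single shared group key, any phone in the group could forge another phone’s chunks; per-device signing keys (for example, held in the Secure Enclave) would let each chunk be attributed to one specific phone.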

Product notes:

  • This is invisible when everything goes right
  • This is loud when security issues are detected

Implementation details:

  • preferably, this is a personal iPhone separate from corporate infrastructure
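  • all transcription + diarization can happen locally, e.g. with OpenAI’s Whisper models for transcription (Whisper itself does not do speaker diarization, so that needs a separate local model)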
  • iPhones can use the Secure Enclave for storing keys
  • iPhones can communicate out of band, i.e. not through the meeting software or the corporate laptop
  • the transcription match won’t be perfect, so we need some sort of “approximate match”
  • of course, this assumes a remote-only attacker who can’t get hold of these iPhones
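One possible “approximate match”, sketched in Python under stated assumptions: each phone compares its own transcript of a time window with the transcript a peer posted for the same window (aligning those windows is assumed to be handled elsewhere), using word-level `difflib.SequenceMatcher` similarity after stripping case and punctuation. The `0.8` threshold and the names `normalize`, `similarity` and `status` are hypothetical; a real system would need to tune the threshold on actual meeting audio and might prefer a proper word-error-rate metric.

```python
import difflib
import re

# HYPOTHETICAL threshold: word-level similarity below this means two phones
# disagree about what was said. Would need tuning on real meetings.
MATCH_THRESHOLD = 0.8


def normalize(text: str) -> list[str]:
    """Lowercase and keep only word tokens, so punctuation/casing differences don't count."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: 2 * matched words / total words."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()


def status(own: str, peers: dict[str, str], threshold: float = MATCH_THRESHOLD) -> tuple[str, list[str]]:
    """Return ("green", []) if every peer's transcript matches ours, else ("red", mismatching peers)."""
    mismatched = [peer for peer, text in peers.items() if similarity(own, text) < threshold]
    return ("red", mismatched) if mismatched else ("green", [])


if __name__ == "__main__":
    mine = "Let's approve the wire transfer to the new vendor today."
    peers = {
        "bob-phone": "lets approve the wire transfer to the new vendor today",
        "carol-phone": "let's reject the wire transfer and call the vendor tomorrow",
    }
    print(status(mine, peers))  # → ('red', ['carol-phone'])
```

Here the “lets” vs “let’s” difference on Bob’s phone is the kind of transcription noise the threshold should tolerate (similarity 0.9), while Carol’s diverging sentence scores 0.6 and turns the screen red.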