Speech Samples

Five models were evaluated on YouTube videos from the VoxCeleb2 dataset [1]:

  • AV-ConvTasNet [2]
  • VisualVoice [3]
  • AVLIT [4]
  • CTCNet [5]
  • RTFSNet

Below are some examples of interactive multimodal speech separation. Hover the mouse over a
speaker's lips to hear that speaker's separated audio.

Demo 1

Demo 2

Demo 3



