Date: 2023-01-10 12:14 pm (UTC)
From: [personal profile] simont
It's always struck me as amazing how hard speech synthesis is – or perhaps I should say, how high our standards are for what we'll accept.

The obvious demonstration is cartoons. If you want to show a video of a person doing something, nobody has any difficulty accepting even a simplistic animated line drawing in place of a photorealistic video of a live human. You might have an aesthetic preference about which you like more, but there's basically nobody who just can't watch cartoons because the people don't look realistic enough.

And yet, those cartoon characters still have to be voiced by real live human actors, because in the audio domain, we'll accept no substitutes! If you made a cartoon in which the voices were computer-synthesised, I think everyone would hate it.

Date: 2023-01-10 01:27 pm (UTC)
From: [personal profile] simont
It's tempting to say that the difficult part is conveying the right emphasis and emotions, rather than just pronouncing all the words intelligibly. A human actor voicing a cartoon character has to function as an actor, after all, not just someone reading out a script any old way.

But even there, the video side is much easier than the audio side, because cartoon artists have no difficulty producing line-art facial expressions from which we can interpret emotions.

Date: 2023-01-10 03:46 pm (UTC)
From: [personal profile] bens_dad
There is a point with animation where more accurate images are less acceptable; this is known as the uncanny valley effect.

Maybe with speech synthesis, comprehensible speech already starts in the uncanny valley?
What is the speech equivalent of a line drawing?
Edited (html typo) Date: 2023-01-10 03:47 pm (UTC)

Date: 2023-01-10 10:20 pm (UTC)
From: [personal profile] foms
My understanding is that this goes back to some very early speech synthesis, too, even when the voice was intended to be inhuman, as with the HAL 9000 computer.

Another part of this subject that has been on my mind is the relative value of creators (e.g. writers) and presenters (e.g. actors) in producing memorable content. I've had some very interesting conversations (and witnessed others) about how different people perceive this. Some hold that a great presenter can make a mediocre text great; others hold that a mediocre presenter cannot ruin a great text; but often neither group accepts the converse of its own view. I have not found any universal way of looking at this.

More recently, I've been thinking about this in the context of the particular cadences of YouTube video presenters (and wondering how many of them are computer-generated voices), of slam poets, and of my own preferences for interpretation when reading aloud nonfiction versus prose fiction versus poetry.

This was probably an excuse to name-drop. I knew Lou Gerstman. His wife introduced my parents to each other and our families remained close. https://en.wikipedia.org/wiki/Louis_Gerstman.

Date: 2023-01-15 02:48 pm (UTC)
From: [personal profile] doubtingmichael
Have you read Scott McCloud's Understanding Comics? He has a theory about comics that applies to animation as well: a simplified representation of the human form can represent our own self-image, based on our internal senses (proprioception, etc.), rather than what we look like to other people. (He then explains that this makes it easier to identify with an abstract character, which is why some manga will use a simpler representation of the protagonist and a more realistic representation of the antagonist. But I digress.)

I think that's one reason why simple animations work. But given that speech is a much more recent evolutionary development, and is concerned only with communicating with others, it makes sense that we wouldn't have similar perceptual shortcuts available for it.

Page generated Jun. 11th, 2025 06:48 am
Powered by Dreamwidth Studios