Date: 2023-01-10 12:14 pm (UTC)
From: [personal profile] simont
It's always struck me as amazing how hard speech synthesis is – or perhaps I should say, how high our standards are for what we'll accept.

The obvious demonstration is cartoons. If you want to show a video of a person doing something, nobody has any difficulty accepting even a simplistic animated line drawing in place of a photorealistic video of a live human. You might have an aesthetic preference about which you like more, but there's basically nobody who just can't watch cartoons because the people don't look realistic enough.

And yet, those cartoon characters still have to be voiced by real live human actors, because in the audio domain, we'll accept no substitutes! If you made a cartoon in which the voices were computer-synthesised, I think everyone would hate it.

Date: 2023-01-10 01:27 pm (UTC)
From: [personal profile] simont
It's tempting to say that the difficult part is conveying the right emphasis and emotions, rather than just pronouncing all the words intelligibly. A human actor voicing a cartoon character has to function as an actor, after all, not just someone reading out a script any old way.

But even there, the video side is much easier than the audio side, because cartoon artists have no difficulty producing line-art facial expressions from which we can interpret emotions.

Date: 2023-01-10 03:46 pm (UTC)
From: [personal profile] bens_dad
There is a point with animation where more accurate images are less acceptable; this is known as the uncanny valley effect.

Maybe, with speech synthesis, comprehensible speech starts in the uncanny valley?
What is the speech equivalent of a line drawing?
Edited (html typo) Date: 2023-01-10 03:47 pm (UTC)

Date: 2023-01-10 10:20 pm (UTC)
From: [personal profile] foms
My understanding is that this goes back to some very early speech synthesis, too, even when the voice was intended to be inhuman, as with the HAL 9000 computer.

Another part of this subject that has been in my thoughts is the relative value of creators (e.g. writers) and presenters (e.g. actors) in producing memorable content. I've had some very interesting conversations (and witnessed others) about how different people perceive this. Some hold that a mediocre text can be made great by a great presenter, and others that a mediocre presenter cannot ruin a great text but, often, neither vice versa. I have not found any universal way of looking at this.

More recently, I've been thinking about this in the context of some of the particular cadences of YouTube video presenters (and wondering how many of them are computer-generated voices), slam poets, and my own preferences for interpretation when reading aloud nonfiction versus prose fiction versus poetry.

This was probably an excuse to name-drop. I knew Lou Gerstman. His wife introduced my parents to each other and our families remained close. https://en.wikipedia.org/wiki/Louis_Gerstman.

Date: 2023-01-15 02:48 pm (UTC)
From: [personal profile] doubtingmichael
Have you read Scott McCloud's Understanding Comics? He has a theory about comics that applies to animation as well: a simplified representation of the human form can represent our own self-image, based on our internal senses (proprioception etc), rather than what we look like to other people. (He then explains that this makes it easier to identify with an abstract character, which is why some manga will use a simpler representation of the protagonist, and a more realistic representation of the antagonist. But I digress.)

I think that's one reason why simple animations work. But given speech is a much more evolutionarily recent development, and only concerned with communicating with others, it makes sense that we wouldn't have similar perceptual shortcuts available for it.

Date: 2023-01-10 12:47 pm (UTC)
From: [personal profile] naath
If I pay 4 figures for a meal the restaurant ought to be able to afford to PAY THE STAFF. I would expect the issue to be with budget food.

Date: 2023-01-10 01:35 pm (UTC)
From: [personal profile] channelpenguin
I would have thought that there is no upper limit on the price that a restaurant could charge for a unique ultra-luxury experience. IF the clientele is truly global. But then I suppose they won't fill the place every night. Hence the plan of pop-up events actually makes sense.

Date: 2023-01-10 08:42 pm (UTC)
From: [personal profile] naath
I'm cheating because that was for 3, or on a different occasion in kroner, and about half. I didn't anticipate the staff needs. I know you can pay more in some places.

Date: 2023-01-10 10:51 pm (UTC)
From: [personal profile] foms
The octopus item (#8) took me immediately to one of my go-to examples for the alien points of view not corresponding to our own in one of the Hoka stories, from Poul Anderson and Gordon R. Dickson. The phrase "slimeless conformation of boned flesh" has stuck in my mind. I note that it finally comes up with a web search. It's nice when one's pet obscure examples become available for easy reference. https://tvtropes.org/pmwiki/pmwiki.php/Main/FantasticSlurs
