Interesting Links for 06-09-2019
Sep. 6th, 2019 01:13 pm
- Sinn Fein open to Westminster electoral pact with other pro-Remain parties to challenge the DUP
- (tags:Ireland politics NorthernIreland UK Europe wtf )
- A rambling post on the meaning of words, including "homophobia". With footnotes and everything.
- (tags:LGBT homophobia libdem language )
- Human speech may have a universal transmission rate: 39 bits per second
- (tags:language information viaSwampers )
- Why "You Got This!" Feels Like A Threat
- (tags:failure life support )
- Conversion therapy leader for 2 decades, McKrae Game disavows movement he helped fuel (and comes out as gay)
- (tags:lgbt )
- Brexit: 200,000 people register to vote in 72 hours - most of them under 35
- (tags:uk europe voting elections )
- Houses are assets not goods: What the difference between bulbs and flowers tells us about the housing market
- (tags:housing rental economics )
- Public underestimates threat of climate crisis and plastic pollution
- (tags:misunderstanding science globalwarming pollution plastic )
no subject
Date: 2019-09-07 12:16 am (UTC)
no subject
Date: 2019-09-07 03:11 pm (UTC)
and then they multiplied that by the syllables per second (from reading the language aloud).
no subject
Date: 2019-09-08 11:01 pm (UTC)
no subject
Date: 2019-09-09 09:30 am (UTC)
(This is from my understanding of some basic Information Theory) https://en.wikipedia.org/wiki/Information_theory
Human speech isn't binary - but lots of things can be encoded in binary.
If you allow 32 options, for instance (26 letters plus .,;:'"), then that can be encoded in 5 bits. 64 options (upper and lower case, plus another 6 punctuation marks) would be 6 bits. For the full 128 options of ASCII you need 7 bits: http://www.asciitable.com/
So the question is how many "options" you have. From that you can work out how many bits you need to encode them.
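A minimal sketch of the options-to-bits step, assuming a simple fixed-length code where every option gets the same number of bits. The function name `bits_needed` is made up for illustration.

```python
def bits_needed(options: int) -> int:
    """Smallest number of bits b with 2**b >= options, i.e. a fixed-length code."""
    if options < 1:
        raise ValueError("need at least one option")
    # (options - 1).bit_length() is an exact integer version of ceil(log2(options)).
    return (options - 1).bit_length()


print(bits_needed(26 + 6))      # 26 letters plus .,;:'"  -> 5
print(bits_needed(52 + 6 + 6))  # both cases plus 12 punctuation marks -> 6
print(bits_needed(128))         # full ASCII -> 7
print(bits_needed(6949))        # 6,949 syllables -> 13
```

Using integer `bit_length()` rather than `math.log2` avoids any floating-point rounding at exact powers of two.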
Now, from the article, English has 6,949 syllables. So you'd think you'd need nearly 13 bits to cover them all (2^13 is 8,192). But there's a lot of redundancy in language, and lots of rules that limit the options you have. (Using English for examples, as that's the only language I speak.) There aren't any words with 8 vowels in a row, for instance. And if your word already consists of the letters "UOIAUA", then the only letter that can go on the end is "I", so once you hit 6 letters you already know the word, and (for the sake of a computer) you don't need the seventh letter. So you can compress the data and use fewer bits to store that word.
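The "you don't need the seventh letter" point can be sketched as a shortest-unique-prefix search over a word list: once a prefix matches only one entry, the remaining letters add nothing for identification. The vocabulary below is a hypothetical toy list, not real English data, and `shortest_unique_prefix` is an illustrative name.

```python
def shortest_unique_prefix(word, vocabulary):
    """Length of the shortest prefix of `word` that no other vocabulary entry
    starts with, or None if every prefix is shared (the word is itself the
    start of a longer entry)."""
    others = [w for w in vocabulary if w != word]
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        if not any(other.startswith(prefix) for other in others):
            return k  # letters after position k are redundant for identification
    return None


# Hypothetical toy vocabulary, for illustration only.
vocab = ["quest", "question", "quick", "quiet", "quilt"]
for w in vocab:
    print(w, shortest_unique_prefix(w, vocab))
```

Here "quest" gets None because it is itself a prefix of "question", so a real encoder would also need an end-of-word marker; the other words are pinned down after 4 to 6 letters.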
Similarly, if I type "Enjoy" the only words I can possibly mean are enjoy, enjoyable, enjoyableness, enjoyably, enjoyed, enjoyer, enjoyers, enjoying, enjoyingly, enjoyment, enjoyments, or enjoys. That's 12 options, which is about 3.6 bits of information (4 bits with a simple fixed-length code), despite the words having a wide variety of letters and lengths.
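The arithmetic for the "Enjoy" example, using only the twelve words listed: `log2` gives the information content if each word is equally likely (an assumption), and a fixed-length code rounds that up to whole bits.

```python
import math

# The twelve continuations of "Enjoy" listed in the comment.
candidates = [
    "enjoy", "enjoyable", "enjoyableness", "enjoyably", "enjoyed", "enjoyer",
    "enjoyers", "enjoying", "enjoyingly", "enjoyment", "enjoyments", "enjoys",
]
n = len(candidates)
info_bits = math.log2(n)  # information content if all 12 are equally likely
print(n, round(info_bits, 2))  # 12 3.58
print((n - 1).bit_length())    # 4 bits with a fixed-length code
```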
So by looking at the actual variation that exists in the language, and the various combinations that are allowed, you can work out how many combinations you can get out of each of its syllables. And that turns out to be about 32 combinations for Japanese and about 128 for English (I suspect that because English is a mishmash of languages, there are more choices in how you can construct words). That's about 5 bits and 7 bits respectively.
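One standard way to turn "the actual variation that exists in the language" into bits is Shannon entropy. The sketch below uses a hypothetical syllable sample and plain single-syllable frequencies; it ignores context between syllables, so it is not necessarily how the study computed its figures. `2 ** h` gives the number of equally likely options that would carry the same information, which is how "about 32 combinations" corresponds to "about 5 bits".

```python
import math
from collections import Counter


def entropy_bits(tokens):
    """Shannon entropy H = -sum(p * log2(p)) of the empirical token distribution, in bits."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical syllable sample, for illustration only.
sample = ["ka", "ki", "ka", "ku", "ka", "ko", "ki", "ka"]
h = entropy_bits(sample)
print(round(h, 3), "bits per syllable")
print(round(2 ** h, 2), "equally likely options would carry the same information")
```

Skewed frequencies pull the entropy below `log2` of the inventory size (here 4 distinct syllables, so at most 2 bits); that is the redundancy the comment describes.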
That make sense? If not I'll try and clarify.
no subject
Date: 2019-09-09 02:59 pm (UTC)
I don't know how many distinct syllables exist in the spoken Chinese languages, but apparently Chinese languages are monosyllabic.
What's missing is how "they calculated the information density of each language" for their 17 translations of the texts. It's not from the number of syllables uttered per second, and it's not from how long it took to read the texts.
Example: several Germans read a text in German (5.5 syllables per second), and it takes them an average of 30 seconds to express the text clearly. Information content of the text = (information transmission rate per syllable) × (number of syllables spoken) = [(universal information transmission rate of 39 units per second) ÷ (5.5 syllables/second)] × [(30 seconds) × (5.5 syllables/second)] = 1170 units.
That can't be right: it implies that all translations of the text will take exactly the same amount of time to read. After all, they all contain exactly the same amount of information, and information is universally transmitted at "39 bits per second".
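The objection can be checked numerically. This follows the German example's own arithmetic, not the study's method; the second syllable rate (7.0 per second) is a hypothetical value chosen for comparison, and `text_information` is an illustrative name.

```python
UNIVERSAL_RATE = 39.0  # bits per second, the headline figure


def text_information(syllables_per_second, seconds, rate=UNIVERSAL_RATE):
    """Information in one reading, computed as in the German example:
    (rate / syllable rate) bits per syllable x (seconds x syllable rate) syllables."""
    bits_per_syllable = rate / syllables_per_second
    syllables_spoken = seconds * syllables_per_second
    return bits_per_syllable * syllables_spoken


german = text_information(5.5, 30)
faster = text_information(7.0, 30)  # hypothetical faster-syllable language
print(round(german, 1), round(faster, 1))  # syllable rate cancels: both 1170.0

# If every translation holds the same information and the rate is universal,
# the reading time is fixed: information / rate.
print(round(german / UNIVERSAL_RATE, 1))  # 30.0 seconds, whatever the language
```

The syllable rate cancels out, so the formula reduces to rate × seconds and can only reproduce the 39 bits per second it started from.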
To summarize: Italians chirp for a long time while Germans just let out a long grunt, but in the end they've transmitted the same amount of information, at about the rate the brain can handle. Eureka! Let's publish an article! 9_6
Giving the result in bits per second just annoys me. They made a measurement based on something they can't measure (information density), camouflaged it in a glib bitrate reference, and that had me believing they'd actually measured something inherent in human speech, something other than syllables per second.
That article would have been more interesting and informative if it had stated that the vehicle of speech is the uttered syllable, that the syllabaries of some languages are more information-rich than others, and that speech rates are adjusted to convey information at roughly the same rate regardless of the language spoken, and then trotted out the methods without any of this hipster bitrate nonsense.
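The claim that speech rates adjust to hold the information rate steady can be turned around: divide the 39 bits per second by each language's bits per syllable to get the syllable rate it implies. The densities below are the rounded figures from earlier in the thread (about 5 bits for Japanese, about 7 for English), so the output is back-of-the-envelope arithmetic, not measured speech rates.

```python
UNIVERSAL_RATE = 39.0  # bits per second, from the headline

# Rounded bits-per-syllable figures quoted earlier in the thread.
bits_per_syllable = {"Japanese": 5.0, "English": 7.0}

# information rate = bits per syllable x syllables per second,
# so the syllable rate needed to reach the universal rate is rate / density.
implied = {lang: UNIVERSAL_RATE / density for lang, density in bits_per_syllable.items()}

for lang, syl_rate in implied.items():
    print(f"{lang}: about {syl_rate:.1f} syllables per second")
```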