> I would like to apologize.
> It’s Not Wrong that “🤦🏼‍♂️”.length == 7 But It’s Better that “🤦🏼‍♂️”.len() == 17 and Rather Useless that len(“🤦🏼‍♂️”) == 5
> The string that contains one graphical unit consists of 5 Unicode scalar values. First, there’s a base character that means a person face palming. By default, the person would have a cartoonish yellow color. The next character is an emoji skin tone modifier that changes the color of the person’s skin (and, in practice, also the color of the person’s hair). By default, the gender of the person is undefined, and e.g. Apple defaults to what they consider a male appearance and e.g. Google defaults to what they consider a female appearance. The next two scalar values pick a male-typical appearance specifically regardless of font and vendor. Instead of being an emoji-specific modifier like the skin tone, the gender specification uses an emoji-predating gender symbol (MALE SIGN) explicitly ligated using the ZERO WIDTH JOINER with the (skin-toned) face-palming person. (Whether it is a good or a bad idea that the skin tone and gender specifications use different mechanisms is out of the scope of this post.) Finally, VARIATION SELECTOR-16 makes it explicit that we want a multicolor emoji rendering instead of a monochrome dingbat rendering.
And then we move on from there, in quite some depth.
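To make the three numbers in the title concrete, here is a small Python sketch. Python is used only for illustration: the JavaScript and Rust figures are reproduced by counting UTF-16 code units and UTF-8 bytes, not by running those languages. The string is built from explicit code points so the invisible ZERO WIDTH JOINER and VARIATION SELECTOR-16 cannot get lost in copy and paste; the helper names `describe` and `lengths` are hypothetical.

```python
import unicodedata

# The face-palming man from the title, spelled out scalar value by scalar value:
# FACE PALM, an emoji skin tone modifier (U+1F3FC), ZERO WIDTH JOINER,
# MALE SIGN, VARIATION SELECTOR-16.
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every Unicode scalar value in s (hypothetical helper)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


def lengths(s: str) -> dict[str, int]:
    """Count the same string in the three units the title compares (hypothetical helper)."""
    return {
        # Python's len() counts code points (scalar values, for well-formed text).
        "scalar values, like Python len()": len(s),
        # JavaScript's .length counts UTF-16 code units; astral characters take two.
        "UTF-16 code units, like JavaScript .length": len(s.encode("utf-16-le")) // 2,
        # Rust's str::len() counts UTF-8 bytes.
        "UTF-8 bytes, like Rust .len()": len(s.encode("utf-8")),
    }


if __name__ == "__main__":
    for line in describe(FACEPALM):
        print(line)
    for label, n in lengths(FACEPALM).items():
        print(f"{label}: {n}")
```

The two astral-plane characters cost two UTF-16 units and four UTF-8 bytes each, while the three BMP characters cost one unit and three bytes each, which is where 7 and 17 come from. None of the three counts is the single grapheme cluster a reader sees; counting that needs extended grapheme cluster segmentation, which the Python standard library does not provide.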
Women's Romanization for Hong Kong
> This is not to say that this type of ad hoc, spontaneous Romanization of Cantonese has not already existed for some time. Indeed, young people have been using it extensively for texting, on social media, etc. for years. What’s new is that it is now consciously being employed to out fake protesters who do not know Hong Kong Cantonese and its informal writing system.
> Probably because of something my ancestors did.
FUCT in the brain
> Scientists have found that swearing most likely originates in the right hemisphere of the brain, and within that half, in the “primitive” part of the brain, the limbic system. The right half of the brain is responsible for nonpropositional or automatic speech, which includes greetings, conventional expressions such as ‘not at all,’ counting, song lyrics, and swearwords. Propositional speech, words strung together in syntactically correct forms to create an original meaning, occurs in the left hemisphere.
> But the evidence for this conclusion is weak, in my opinion.
This map shows the most commonly spoken language in every US state, excluding English and Spanish
> English is, unsurprisingly, the most commonly spoken language across the US, and Spanish is second most common in 46 states and the District of Columbia. So we excluded those two languages in the above map.
Alphabetical order in Korean
> Alphabetical order in Korean has an interesting twist I haven’t seen in any other language.
> In Korean, alphabetization is also done at the syllable level.
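One way to see what syllable-level ordering means in practice: modern Hangul syllables are encoded in Unicode (U+AC00 to U+D7A3) at positions computed from the initial consonant, the vowel, and an optional final consonant, so each syllable can be decomposed arithmetically into those three indices. The Python sketch below sorts words syllable by syllable on those indices. It assumes dictionary order matches Unicode's arrangement of modern syllables, which the excerpt does not state, and `decompose` and `syllable_key` are illustrative names.

```python
# Modern Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
#   code point = 0xAC00 + (initial * 21 + vowel) * 28 + final
# with 19 initials, 21 vowels, and 28 finals (final index 0 = no final consonant).
S_BASE = 0xAC00
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per initial consonant
S_COUNT = 19 * N_COUNT       # 11172 syllables in total


def decompose(syllable: str) -> tuple[int, int, int]:
    """Split one precomposed syllable into (initial, vowel, final) indices."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return offset // N_COUNT, (offset % N_COUNT) // T_COUNT, offset % T_COUNT


def syllable_key(word: str) -> list[tuple[int, int, int]]:
    """Compare words syllable by syllable rather than letter by letter."""
    return [decompose(ch) for ch in word]


words = ["나무", "각", "가방", "가"]
print(sorted(words, key=syllable_key))
```

Here 가방 sorts before 각: the first syllable 가 (no final consonant) precedes 각 (final ㄱ), even though a purely letter-by-letter comparison of ㄱㅏㅂ… against ㄱㅏㄱ would put 각 first. Because the code points follow the same arrangement, plain `sorted(words)` gives the same order for precomposed syllables.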
> So “-bachi” is now an English suffix for any food prepared live by Asians on a metal plate.
Emily Wilson on Translations and Language
> In a recent Twitter thread, Emily Wilson listed some of the difficulties of translating Homer into English. Among them: “There aren’t enough onomatopoeic words for very loud chaotic noises” (#2 on the list), “It’s very hard to come up with enough ways to describe intense desire to act that don’t connote modern psychology” (#5), and “There is no common English word of four syllables or fewer connoting ‘person particularly favored by Zeus due to high social status, and by the way this is a very normal ordinary word which is not drawing any special attention to itself whatsoever, beyond generic heroizing.’” (#7).
> Using Twitter this way is part of her effort to explain literary translation. What do translators do all day? Why can the same sentence turn out so differently depending on the translator? Why did she get stuck translating the Iliad immediately after producing a beloved translation of the Odyssey?
> She and Tyler discuss these questions and more, including why Silicon Valley loves Stoicism, whether Plato made Socrates sound smarter than he was, the future of classics education, the effect of AI on translation, how to make academia more friendly to women, whether she’d choose to ‘overlive’, and the importance of having a big Ikea desk and a huge orange cat.
> “Whaumau” is a well-formed but non-existent Māori word, which would be pronounced /faʉmaʉ/ — that is, basically the same as the English pronunciation of the internet acronym FOMO, Fear Of Missing Out. And that’s what it means.
Size Venn Diagram
The large dipper and great potatoes.
German for Programmers
> After 2 years of learning German I’ve noticed that, for the most part, you can go a long way by mapping foreign concepts to ones that you already know. In particular, I’ve had success mapping aspects of German grammar to programming concepts I use every day. After all, programmers deal with weird grammars all the time, so why not take advantage of that skill?
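As a flavor of that kind of mapping (this illustration is not taken from the linked post): the German definite article behaves like a lookup table indexed by case and by gender or plural. A minimal Python sketch, where `Case`, `Gender`, and `definite_article` are hypothetical names:

```python
from enum import Enum


class Gender(Enum):
    # Plural is not a gender, but it gets its own column in the article table.
    MASCULINE = "m"
    FEMININE = "f"
    NEUTER = "n"
    PLURAL = "pl"


class Case(Enum):
    NOMINATIVE = "nom"
    ACCUSATIVE = "acc"
    DATIVE = "dat"
    GENITIVE = "gen"


# Each row lists the article for masculine, feminine, neuter, plural (in that order).
_ROWS = {
    Case.NOMINATIVE: ("der", "die", "das", "die"),
    Case.ACCUSATIVE: ("den", "die", "das", "die"),
    Case.DATIVE: ("dem", "der", "dem", "den"),
    Case.GENITIVE: ("des", "der", "des", "der"),
}

# Flatten into a (case, gender) -> article map; iterating Gender yields members in definition order.
DEFINITE_ARTICLE = {
    (case, gender): article
    for case, row in _ROWS.items()
    for gender, article in zip(Gender, row)
}


def definite_article(gender: Gender, case: Case) -> str:
    """Look up 'the' for a noun of the given gender (or plural) in the given case."""
    return DEFINITE_ARTICLE[(case, gender)]


# "Ich gebe dem Mann das Buch": Mann is masculine and the indirect object (dative).
print(definite_article(Gender.MASCULINE, Case.DATIVE))
```

The 16 cells hold only six distinct values (der, die, das, den, dem, des), so the function is far from injective, which is one reason the article alone often cannot tell you the case.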
Emoji Law 2018 Year-in-Review
> As I’ve mentioned before, I track every U.S. court opinion in Westlaw and Lexis that references “emoji” or “emoticon.” This is not a comprehensive census for several reasons, including my inability to set up alerts when a court displays the symbol without calling it an emoji or emoticon (which, in many emoji cases, aren’t even displayed in Westlaw or Lexis) and the other known skews and limits of Westlaw’s and Lexis’ case collections. Still, FWIW, I’ve posted the updated roster of cases.
Zero-shot transfer across 93 languages: Open-sourcing enhanced LASER library
> To accelerate the transfer of natural language processing (NLP) applications to many more languages, we have significantly expanded and enhanced our LASER (Language-Agnostic SEntence Representations) toolkit. We are now open-sourcing our work, making LASER the first successful exploration of massively multilingual sentence representations to be shared publicly with the NLP community. The toolkit now works with more than 90 languages, written in 28 different alphabets. LASER achieves these results by embedding all languages jointly in a single shared space (rather than having a separate model for each). We are now making the multilingual encoder and PyTorch code freely available, along with a multilingual test set for more than 100 languages.
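The payoff of a single shared space is that sentences can be compared across languages with plain vector arithmetic, for example by finding the nearest neighbor under cosine similarity. The sketch below does not call LASER: the three-dimensional vectors are hand-made toy values standing in for the much higher-dimensional embeddings a real multilingual encoder would produce, and `cosine` and `nearest` are hypothetical helper names.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: Vector, candidates: dict[str, Vector]) -> str:
    """Return the candidate sentence whose embedding is closest to the query."""
    return max(candidates, key=lambda sentence: cosine(query, candidates[sentence]))


# Toy stand-ins for encoder output (NOT real LASER embeddings).
english = {
    "The cat is sleeping.": [0.90, 0.10, 0.00],
    "Stock prices fell today.": [0.00, 0.20, 0.95],
}
french_query = [0.85, 0.15, 0.05]  # pretend embedding of "Le chat dort."

print(nearest(french_query, english))
```

Zero-shot transfer follows the same idea: a classifier trained only on English sentence embeddings can be applied to embeddings of other languages, because all of them live in the same space.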
Common Grammar Mistakes to Avoid
> 1. LESS/FEWER. This one is really embarrassing. You may think pointing out the difference between these two at every opportunity makes you CLEVER and INTERESTING. In fact, it makes you TEDIOUS.
Caduceus as a symbol of medicine
> The caduceus (☤) is the traditional symbol of Hermes and features two snakes winding around an often winged staff. It is often mistakenly used as a symbol of medicine instead of the Rod of Asclepius, especially in the United States. The two-snake caduceus design has ancient and consistent associations with trade, eloquence, negotiation, alchemy, wisdom, and controversially, thievery, lying, and the passage into the underworld.
> The modern use of the caduceus as a symbol of medicine became established in the United States in the late 19th and early 20th century as a result of documented mistakes, misunderstandings and confusion.
The good news is that both have their own Unicode symbols: ⚕ (U+2695 STAFF OF AESCULAPIUS) for the Rod of Asclepius and ☤ (U+2624 CADUCEUS) for the caduceus.
How Google’s Autotype Contradicts Orwell’s Advice
> In a Lingua Franca post headed “Elimination of the Fittest” five years ago I poured scorn on Orwell’s insistence that you should “never use a metaphor, simile, or other figure of speech which you are used to seeing in print.” Silly, I said. There must always be some phrases that are currently the most popular. Banning them ipso facto would pointlessly whittle away the language, phrase by phrase, forever.
> I didn’t propose going to the opposite extreme and championing clichés, of course. Yet as Gmail filled in that phrase for me, I realized that it was automating exactly what Orwell recommended against. The program lies in wait for the beginning of a letter sequence that it is used to seeing in Gmail messages, and fills in the rest for your approval, constantly tempting you toward familiar phrases.
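The mechanism described here, waiting for the start of a familiar letter sequence and offering the rest, can be caricatured in a few lines. This is a toy frequency-based prefix completer, not a description of how Gmail's feature works internally; `PhraseCompleter` and the sample phrases are invented for illustration.

```python
from collections import Counter
from typing import Optional


class PhraseCompleter:
    """Toy completer: suggest the rest of the most frequently seen phrase with a given prefix."""

    def __init__(self, phrases: list[str], min_prefix: int = 3) -> None:
        self.counts = Counter(phrases)  # how often each whole phrase was seen
        self.min_prefix = min_prefix    # wait for a few characters before suggesting

    def suggest(self, typed: str) -> Optional[str]:
        if len(typed) < self.min_prefix:
            return None
        matches = [p for p in self.counts if p.startswith(typed) and p != typed]
        if not matches:
            return None
        # The most familiar phrase wins; ties go to the alphabetically first phrase.
        best = min(matches, key=lambda p: (-self.counts[p], p))
        return best[len(typed):]  # only the part the user has not typed yet


seen = ["I hope this email finds you well"] * 3 + ["I hope this helps"] * 2
completer = PhraseCompleter(seen)
print(completer.suggest("I hope this "))
```

Even this crude version shows the pull the post describes: whatever phrase has been seen most often is the one offered first.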
Taking shit from the chancellor
> “It generated quite a shitstorm,” she said, using the English term — because Germans, it turns out, do not have one of their own.
Everything You Wanted to Know About Emojis and the Law
> For the past couple of years, I have invested significantly in all things emojis. This post rounds up everything I’ve done during that period.
A better way to calculate pitch range
> Today’s topic is a simple solution to a complicated problem. The complicated problem is how to estimate “pitch range” in recordings of human speakers. As for the simple solution — wait and see.
> You might think that the many differences between the perceptual variable of pitch and the physical variable of fundamental frequency (“f0”) arise because perception is complicated and physics is simple. But if so, you’d be mostly wrong. The biggest problem is that physical f0 is a complex and often fundamentally incoherent concept. And even in the areas where f0 is well defined, f0 estimation (usually called “pitch tracking”) is prone to errors.
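The post's own “simple solution” is not part of the excerpt, so what follows is not it. It only illustrates why pitch-tracking errors matter for any range measure: a single octave-jump error inflates a naive max-minus-min range, while a range between, say, the 5th and 95th percentiles of voiced f0 values, expressed in semitones (12 × log2 of the frequency ratio), barely notices it. The percentile choice, the convention that unvoiced frames are reported as 0 Hz, and the synthetic track are all assumptions of this sketch.

```python
import math
import statistics


def semitones(f_high: float, f_low: float) -> float:
    """Distance between two frequencies in semitones: 12 * log2(ratio)."""
    return 12 * math.log2(f_high / f_low)


def naive_range(f0_hz: list[float]) -> float:
    """Max-minus-min range of voiced frames, in semitones (sensitive to tracker errors)."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: unvoiced frames are reported as 0 Hz
    return semitones(max(voiced), min(voiced))


def percentile_range(f0_hz: list[float], low_pct: int = 5, high_pct: int = 95) -> float:
    """Range between two percentiles of the voiced f0 values, in semitones."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: unvoiced frames are reported as 0 Hz
    cuts = statistics.quantiles(voiced, n=100)  # cuts[k - 1] is the k-th percentile
    return semitones(cuts[high_pct - 1], cuts[low_pct - 1])


# Synthetic track: a speaker moving between 100 and 200 Hz, some unvoiced frames,
# and one octave-doubling error reported at 400 Hz.
track = [100.0] * 50 + [200.0] * 50 + [0.0] * 10 + [400.0]
print(round(naive_range(track), 2), round(percentile_range(track), 2))
```

On this made-up track the single bad frame doubles the naive range from one octave to two, while the percentile range stays at 12 semitones.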