3-1

Why are there three times as many male bloggers as female ones? Dave Munger runs the numbers:

In the aggregate, it seems clear that women are, whether actively or tacitly, discouraged from blogging about science. Aside from a few superstars like Skloot (who is in such demand that she's been on a non-stop international book tour for the better part of a year), I've seen little evidence to convince me otherwise. Despite the fact that women are getting science PhDs in nearly the same numbers as men, they are blogging much less. I even looked at the average number of posts about peer-reviewed research they had done, and again, men outpaced women by nearly 50 percent, which means men may have written as many as 80 percent of the posts on ResearchBlogging.org. Even more strikingly, women may be discouraged from pursuing academic careers at all: from 1999 to 2003, 32 percent of chemistry PhDs were women, but only 18 percent of applications to tenure-track positions came from women.

(HT Razib Khan)

More On Jelinek...

The NYT's obit on speech recognition and computational linguistics pioneer Frederick Jelinek includes some surprising facts about the man: he was childhood buddies with Milos Forman. How cool is that? It also includes the intriguing morsel that JFK helped get Jelinek's blacklisted screenwriter wife out of Communist Czechoslovakia (if only indirectly).

(HT Mr. Verb)

pullum bait

Jeremy Porter decided to adapt Strunk and White's infamous Elements of Style for tweeting here. C'mon Geoffrey, you know you wanna respond... I dare ya...

can language affect blood flow?

Do languages affect blood flow in the brain differently? Apparently, yes!

In a recent fMRI study, researchers showed that, in bilinguals, Cantonese verbs and nouns are processed in (slightly) different parts of the brain than English nouns and verbs.

The researchers used a lexical decision task to contrast the processing of English and Cantonese verbs and nouns in the brains of bilingual speakers. Chinese nouns and verbs showed a largely overlapping pattern of cortical activity. In contrast, English verbs activated more brain regions compared to English nouns. Specifically, the processing of English verbs evoked stronger activities of left putamen, left fusiform gyrus, cerebellum, right cuneus, right middle occipital areas, and supplementary motor area. The cognition of English nouns did not evoke stronger activities in any cortical regions.

This is truly language affecting thought, no?

The point of general interest to linguists is that bilingual speakers seem to process words in their two languages differently. Cantonese words are processed using diffuse brain regions and English words are processed using localized regions (this is a simplified explanation, of course).

Now, I have to admit that this is not my specialty, so I am not familiar with the background literature. However, as interesting as this is, I must say I have some serious questions about their methodology and underlying assumptions.

First, they use orthography as their basis for determining the "similarity" and "complexity" of languages. That is, if two languages use an alphabet, they are considered similar. While they make some passing references to other linguistic measures, ultimately it is orthography that they use to compare the "complexity" of stimuli (their word, not mine).
So, they compared the mean number of strokes in a Chinese character with the number of letters in an English word to determine which was "more complex." I found this weird.

Then they assumed that Cantonese words are more ambiguous with respect to parts of speech. I do not know if this is true, but English certainly has plenty of POS ambiguity (just ask Eric Brill), so it's not obvious to me that this is a fair assumption. Furthermore, they provide no evidence for it. Unfortunately, they do not publish their actual sets of stimuli, so it's not possible (at least not this morning, while googling around) to see which words they actually used, but I suspect there's plenty of ambiguity to be found in the English words.

Based on earlier work, they conjecture that morphological simplicity leads the brain to process words in more widely distributed regions:

...a recent fMRI study examining monolingual Chinese adults in our own laboratory indicated that Chinese nouns and verbs activate a wide range of overlapping brain areas (without a significantly different network) than those reported in the English studies cited above (Li et al., 2004). Relatively fewer distinctive grammatical features of nouns and verbs at the lexical level are likely to be responsible for this finding, but the question may be addressed more directly by employing bilingual individuals.

And the corollary should be true: the fact that English has tense and number markings means English verbs and nouns are processed in more isolated parts of the brain. This is my wording of their conjecture. I may be oversimplifying just a bit, but I'm trying to wrap my head around the underlying claim. It's not clear to me why this would be true.

Next (and this may be a bit nit-picky), they judged the level of bilingual proficiency using a self-assessment questionnaire. Call me a cynic, but I just don't trust people's perceptions of their own language skills.

Then, the researchers used frequency data from really dated sources, including Francis and Kučera (1982). I love F&K as much as the next guy, but in the age of the BNC, Davies's freely available 400-million-word COCA, and the redonkulous Web 1T corpus of 1 trillion words (yes, 1 trillion!), I see no reason to use resources so old.

Their basic conclusions are a tad confusing, too. They never clearly explained the connection between bilingualism and morphological complexity, imho. The interplay is complicated and requires thorough discussion, which they simply did not provide.

When I used to teach writing to college freshmen, I always told them that their job when writing a paper was to make my job as a reader easy. Explain things clearly so I don't have to work too hard to figure out what you mean. These authors failed to make my job easy. I had to figure out too much for myself.

Ultimately, they found something interesting; I'm just not sure what it means, and without more thorough linguistic vetting of their underlying assumptions, their results remain a head-scratcher.

Chan, A., Luke, K.K., Li, G., Li, P., Weekes, B., Yip, V., & Tan, L.H. (2008). Neural correlates of nouns and verbs in early bilinguals. Annals of the New York Academy of Sciences, 1145, 30–40.

Chan, A., Luke, K., Li, P., Yip, V., Li, G., Weekes, B., & Tan, L. (2008). Neural correlates of nouns and verbs in early bilinguals. Annals of the New York Academy of Sciences, 1145 (1), 30–40. DOI: 10.1196/annals.1416.000

in praise of William Yardley

To follow Pullum's lead, I hereby praise William Yardley, NYT writer extraordinaire, who actually performed a little linguistic fact-checking. I think he deserves to be linked to as much as possible. Give him a little google-juice. Pullum's coverage here, Yardley's story here.

Rankings?

The newest National Research Council PhD program rankings are out. I'm not quite willing to shell out the $125 for my own personal copy. Anybody care to check whether they ranked linguistics departments?

bastardizing a snowclone!

Andrew Sullivan recently used this as the title of a post: Palinites, Latinos, Tea Partiers, Women, Oh My! I love the "X and Y and Z, oh my!" snowclone as much as the next guy, but the construction has to be respected. You can't just add a fourth member to the list all willy-nilly! There are rules!!

what recession?

Apparently there is no recession in the machine translation market. Systran just posted no fewer than 8 openings for computational linguists!! See here for 7, and here for 1-2 senior positions. Too bad Systran (and every other MT company) gave up hiring real linguists after the dot-com bubble burst. Still, lots of NLPers should be happy. Work! Work! Work!

do boys need more language help than girls?

No.

UPDATE: Many thanks to Dorothy Bishop, Professor of Developmental Neuropsychology, Department of Experimental Psychology, University of Oxford, for emailing me a copy of the original paper. I am reading it now and hope to post a more substantive review of the actual article later. For now, I've added just a few points below, each marked UPDATE.

But the anonymous journalist/stenographer at Science Daily who wrote the recent story "Building Language Skills More Critical for Boys Than Girls, Research Suggests" concludes otherwise. The author states: "Developing language skills appears to be more important for boys than girls in helping them to develop self-control and, ultimately, succeed in school."

Unfortunately, I cannot find the original article (citation below) freely available, so all I have to go on is the brief description from the Science Daily piece:

The researchers examined data on children as they aged from 1 to 3 and their mothers who participated in the National Early Head Start Research and Evaluation study. As with previous research, Vallotton and Ayoub found that language skills -- specifically the building of vocabulary -- help children regulate their emotions and behavior and that boys lag behind girls in both language skills and self-regulation. What was surprising, Vallotton said, was that language skills seemed so much more important to the regulation of boys' behavior. While girls overall seemed to have a more natural ability to control themselves and focus, boys with a strong vocabulary showed a dramatic increase in this ability to self-regulate -- even doing as well in this regard as girls with a strong vocabulary (emphasis added).

I cannot speak directly to the methodology without access to the original article.
My guess is that there was some attempt to quantitatively correlate scores on vocabulary tests with either records of bad behavior or observed behavior. I could be wrong.

UPDATE: They measured two linguistic features, talkativeness and vocabulary, in 120 kids aged 14 months, 24 months, and 36 months: "Mother–child dyads were videotaped at home for 10 min in a semi-structured play task ... Every vocalization by mothers and children was transcribed ... a trained observer used the Bayley Behavior Rating Scale (BBRS; Bayley, 1993) to rate the child’s ability to self-regulate. Children were rated on each of seven items which included behaviors such as their ability to maintain attention on the tasks, their degree of negativity, and their adaptation to changes in testing materials."

But I'm skeptical about the claims in Science Daily because this strikes me as the sort of question that would take years of study and dozens of researchers to reach any definite conclusions about (UPDATE: I remain skeptical about the Science Daily claims, but those are distinct from the claims in the original article). Yet we have just this one study. The story also draws a causal connection between a language skill (vocabulary) and a non-language behavior (emotion and "self-regulation"). That is extremely difficult to do, even under the best circumstances. And even when it is done, teams of neuroscientists using fMRI and the like are typically involved.

I mean no disrespect to the authors of the study. They are both accomplished professors of psychology, a very important and challenging field. But they are not, as far as I can tell, either neuroscientists or psycholinguists.
The second author, Catherine Ayoub, appears to specialize in "Legal mental health issues with children" (see PDF here). UPDATE: According to the original article, there are well-established empirical methods for judging a child's "expressive language."

This seems to be a case of over-interpretation with the intent of building actionable policy directives. I understand and sympathize with the impulse to translate scientific research into something directly useful that a teacher can implement today. Look, all you have to do is help boys build their vocabulary and they will behave themselves better! Unfortunately, it is rarely wise to make that leap so quickly. I suspect there is no there there. UPDATE: There certainly is something here. I'll need more time to digest the methods and results to comment further.

Vallotton, C., & Ayoub, C. (2010). Use your words: The role of language in the development of toddlers’ self-regulation. Early Childhood Research Quarterly. DOI: 10.1016/j.ecresq.2010.09.002

language and thought video

Lera Boroditsky on Bloggingheads. It's a 45-minute video and my hotel connection is not playing nice, so I haven't watched it.

the pennebaker effect

Currently reading Larcker & Zakolyukina, "Detecting Deceptive Discussions in Conference Calls." Heard about it on NPR's Morning Edition. They used a Naive Bayesian classifier to classify the conference call contributions of corporate CEOs and CFOs. Interestingly, they used Pennebaker's LIWC word/phrase classifier as a domain-specific dictionary builder. I'm only 6 pages in, so I don't know anything beyond that, but my interest is piqued.
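Since the post only names the technique, a minimal sketch may help show what a Naive Bayes classifier over dictionary categories looks like. Everything specific here is an assumption for illustration. The tiny word-to-category dictionary merely stands in for a LIWC-style lexicon and is not the real LIWC dictionary. The four training snippets and their labels are invented. Nothing here reproduces Larcker and Zakolyukina's features, data, or results.

```python
import math
from collections import Counter

# Hypothetical toy dictionary mapping words to categories. It only stands in
# for a LIWC-style lexicon and does not reproduce real LIWC categories.
CATEGORIES = {
    "i": "self_ref", "me": "self_ref", "my": "self_ref",
    "we": "group_ref", "our": "group_ref", "us": "group_ref",
    "fantastic": "extreme_pos", "tremendous": "extreme_pos",
    "not": "negation", "never": "negation",
}


def to_categories(text):
    """Map each known token to its category; unknown words are dropped."""
    return [CATEGORIES[t] for t in text.lower().split() if t in CATEGORIES]


class CategoryNaiveBayes:
    """Multinomial Naive Bayes over category counts, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()  # number of training docs per label
        self.feature_counts = {}  # label -> Counter of category tokens
        self.vocab = set(CATEGORIES.values())

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(to_categories(doc))
        return self

    def log_score(self, doc, label):
        """Log prior plus the summed log likelihoods of the doc's categories."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for cat in to_categories(doc):
            score += math.log((counts[cat] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda label: self.log_score(doc, label))


# Invented training snippets with invented labels, for illustration only.
train_docs = [
    "we delivered tremendous fantastic results",
    "our team did fantastic work and we are proud",
    "i think my numbers are not final",
    "i never said my forecast was certain",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

model = CategoryNaiveBayes().fit(train_docs, train_labels)
print(model.predict("we had a tremendous quarter"))
print(model.predict("i am not sure"))
```

Working in log space avoids underflow on long documents. Add-one smoothing keeps a category that never occurred with a label from zeroing out that label's score. Tokenization is a bare whitespace split, so punctuation attached to a word will keep it from matching the dictionary.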

a linguist on broadway

The NYT reviews The Language Archive, a new play centering on a historical linguist. Money quote:

“The Language Archive” does contain some bewitchingly fine speeches on the manner in which words can sometimes fail to convey the overwhelming nature of feeling and its capacity for flux. In one of the best, Mary addresses the audience on the subject of the odd proximity between states of extreme emotion. “Sometimes you can feel so sad, it begins to feel like happiness,” she muses. “And you can be so happy that it starts to feel like grief.”

anatomy of internet plagiarism

I saw this headline on Huffington Post this morning:
  • Tina Fey Dusts Off Her Sarah Palin Impression On Letterman (VIDEO): Posted: 11- 4-10 09:47 AM.

This afternoon, I see this headline on The Daily Dish:
  • Tina Fey Dusts Off Her Palin Impression: Posted 04 NOV 2010 01:47 PM

My curiosity piqued, I followed the link to the CBS site, which has this as a caption for the video:
  • What's new with Sarah Palin? Tina Fey dusts off her impression for a midterm election update: no post time listed

Is this minor? Sure. Is this trivial? Sure. It's the innerwebz, whaddaya expect!

dolphin gibberish

In truly one of the weirdest and most awesomest studies in a long time, Laura May-Collado* discovered that dolphins speak gibberish just to fuck with each other! I'll give fair warning that my entire understanding of this study comes from a BBC article, so gawd knows what the facts really are, but this version is too awesome not to pass along. The "facts", as I understand them, are:
  • There are two species of dolphins that often swim together (big Bottlenose and small Guiana).
  • When swimming within Bottlenose-only groups, the big Bottlenose dolphins emit long, low-frequency whistles to each other.
  • When swimming within Guiana-only groups, the small Guiana dolphins emit high-frequency whistles to each other.
  • Sometimes the dolphins swim in mixed-species groups.
  • The big Bottlenose dolphins often harass the small Guiana dolphins (assholes).
  • When swimming in mixed groups, the dolphins emit intermediate-frequency whistles.

It's that last point that is the crux of the study. Why do they change their whistles when swimming in mixed groups? Unfortunately, May-Collado's equipment was not designed to tease apart exactly which dolphins were emitting the intermediate whistles, so it's pure speculation what's going on here. But one hypothesis is this: It could even be that the Guiana dolphins are attempting "to emit threatening sounds in the language of the intruder", in a bid to make the bottlenose dolphins desist, Dr May-Collado says.

The kids over at Language Log have discussed the phenomenon of speaking gibberish in other languages before (see here), and now it appears dolphins do the same. I love it!

*Associate Researcher & Adjunct Professor, Universidad de Puerto Rico, Facultad de Ciencias Naturales, Departamento de Biología, University of Puerto Rico, Río Piedras (that's a hell of a title!).

The Perils of Pretty Pictures

As is too often the case, bad NLP starts with bad linguistics. Journalist and data visualization advocate David McCandless recently gave a TED talk, "The beauty of data visualization," which included a reference to a chart showing when people in relationships break up, based on scraping "10,000 Facebook status updates for the phrases 'breakup' and 'broken up'" (see here).

(image from The Daily Dish)

He did not go into detail about his actual scraping technique, so it’s not clear what he actually scraped for*, but let’s assume he literally only extracted occurrences of those two constructions. What’s wrong with this? Well, it just seems unnatural for people to use those particular phrases to talk about breaking up. Under what conditions would someone use these constructions?
  • breakup = a bare NP, single token
  • broken up = past participle, particle verb

I’m sure we can construct some examples, but they would be low probability, right? My intuition is that the following are more likely ways of talking about a breakup:
  • I broke up with my boyfriend last night.
  • I dumped that asshole last night.
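To make the point about literal matching concrete, here is a small sketch that compares two patterns. The first matches only the two phrases McCandless named. The second is a broader, hypothetical pattern that also covers other forms of the particle verb "break up" and the synonym "dumped." The first two sample statuses are the intuition examples from this post, and the last two are invented. This is not McCandless's actual scraping method, which he did not describe.

```python
import re

# Only the two phrases named in the talk, matched literally.
LITERAL = re.compile(r"\b(?:breakup|broken up)\b", re.IGNORECASE)

# A hypothetical broader pattern: other forms of the particle verb
# "break up" plus one common synonym. Still far from exhaustive.
BROADER = re.compile(
    r"\b(?:break(?:s|ing)?\s+up|broke\s+up|broken\s+up|breakup|dump(?:s|ed|ing)?)\b",
    re.IGNORECASE,
)

statuses = [
    "I broke up with my boyfriend last night.",  # intuition example from the post
    "I dumped that asshole last night.",  # intuition example from the post
    "We have broken up.",  # invented
    "Worst breakup ever.",  # invented
]

literal_hits = [s for s in statuses if LITERAL.search(s)]
broader_hits = [s for s in statuses if BROADER.search(s)]

print(f"literal: {len(literal_hits)}/{len(statuses)}")  # literal: 2/4
print(f"broader: {len(broader_hits)}/{len(statuses)}")  # broader: 4/4
```

The literal pattern misses both of the more natural phrasings. The broader pattern buys recall at the cost of false positives, since "dump" also matches things like "dump the trash." A real study would need to decide how to handle that trade-off.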

McCandless seems distracted by the visualizations, as if they were the data. They are not. A visualization is only as good as the data underlying it, and I fear McCandless’ pretty charts are masking fundamentally vacuous data (like the nearly worthless Facebook data). But in the TED forum, a journalist like McCandless can sell a little snake oil and convince his audience that it’s perfume. I respect his point about relativizing data, and I definitely think visualization is important, but it is not THE point of data.

This reminds me of the difference between the meaning of the term “model” in the social sciences and in the hard sciences. In many cases, a social science model is little more than a visualization of concepts that masks a lack of supporting data, whereas a model in the hard sciences is almost always a computational algorithm that takes in data and spits out predictions.

*The chart image itself says the searches were for "we broke up because," but McCandless says in the talk that he scraped for the phrases "breakup" and "broken up."

the death of philosophy

It's because of statements like this that philosophy as a profession is dead: "philosophy is not a quest for knowledge about the world, but rather a quest for understanding the conceptual scheme in terms of which we conceive of the knowledge we achieve about the world. One of the rewards of doing philosophy is a clearer understanding of the way we think about ourselves and about the world we live in, not fresh facts about reality." This is from an interview with Oxford philosopher Peter Hacker (see full interview here). I don't really understand what he means (and there's nothing in the article that clears it up).

In any case, couldn't I make this same claim about the rewards of doing psychology, or artificial intelligence, or linguistics, or mathematics, or virtually any discipline that requires careful reasoning?

Hacker becomes downright confusing when discussing his distaste for neuroscience:

“Merely replacing Cartesian ethereal stuff with glutinous grey matter and leaving everything else the same will not solve any problems. On the current neuroscientist’s view, it’s the brain that thinks and reasons and calculates and believes and fears and hopes. In fact, it’s human beings who do all these things, not their brains and not their minds. I don’t think it makes any sense to talk about the brain engaging in psychological or mental operations” (emphasis added).

Hacker makes a three-way distinction between human beings, brains, and minds, with nothing more than fluff to support it. I happily admit that I'm pretty strongly on the meat-puppet end of the spectrum, so I see no reason to posit that there exists a thing HUMAN_BEING that is somehow magically not a function of the physical stuff that makes up the human body. But more to the point, Hacker seems incapable of discussing this in a way that is easy to follow. Exactly what is Hacker's HUMAN_BEING? I wish I had a clearer understanding of what he means. How do I objectively distinguish this from new-age hippie gibberish? It sounds remarkably similar to this passage: "It doesn't require a three-dimensional descriptive identification as the totality of it's unseen dynamics can be seen everywhere, in everything. Without the spirit, the physical and the mental would have no reason to exist as neither would be whole." This quote is from the wise sage Shirley MacLaine.

I'm a reasonable adult with a graduate-level education, and yet I cannot follow what should be a simple interview about what this man does for a living without encountering vague claims and incoherent distinctions. Am I supposed to sit through suffocatingly boring and pretentious philosophy seminars in grad school before I can understand what Hacker means? If that is true, then philosophy is dead, truly.

The ironic part is that Andrew Sullivan referenced this interview under the pompous heading "The Hubris Of Neuroscience." The only hubris I found in the interview was Hacker's.

buffalo syntax

Ugh! That frikkin sentence made the HuffPo! (see here)

what's the -o in neato?

Just wondering out loud: how would one analyze the morphological role of the -o in neato? It's a word I used near constantly when I was ten.

Wiktionary actually has a page on this (duh, there's a wiki page for EVERYTHING!), and it lists a group of words using an -o morpheme, but they don't really form a natural class: bucko, cheapo, daddy-o, kiddo, lesbo, neato, preggo, righto, sicko, wacko, whammo, wino, weirdo, yobbo. I've never heard of some of these words (yobbo?), and even the ones I do recognize don't seem to fall into the neato class.

The Online Etymology Dictionary claims neato's earliest recorded usage was 1968, but gives no citation. My dad used to say el cheapo, and I can buy the Wiktionary claim that it's a pseudo-Spanish homage (I don't know what else to call that kind of construction), but did neato form that way? I have a hard time believing that daddy-o formed that way. Again, the Online Etymology Dictionary claims daddy-o goes back to 1949 (from "bop talk," I love that phrase).

El cheapo is an interesting construction, too. Are there other examples where we take a foreign morpheme* and adopt it as a signifier in this way?

*Let's ignore the question of whether or not there really is an -o morpheme in Spanish. Somewhere along the line, American English speakers came to believe there was one and adopted it.

Pronouncify.com and the fictional Princeton Linguistics department

I spent Thursday night on a plane, so I missed 30 Rock and the most linguistics-oriented sitcom episode since ... since ... um ... okay, the most linguistics-oriented sitcom episode EVER! But thanks to the innerwebz, I have caught up on my TV addiction.

The setup has Jack Donaghy as the voice of Pronouncify.com (I BEG you to sign up, PLEASE!!!!), a website that demonstrates the correct pronunciation of all English words. Apparently, when Jack was a poor undergrad at Princeton, he was hired by the "Linguistics Department" to pronounce every word in an English dictionary to preserve the correct pronunciation for generations to come. But they sold his readings, and hence his voice is now the voice of Pronouncify.com (as well as the first perfect microwave...). Here is as faithful a transcript of the critical dialogue as I can muster:

Jack: Those bastards!
Liz: Who bastards?
Jack: Part of my Princeton scholarship included work for the Linguistics department. They wanted me to record every word in the dictionary to preserve the perfect American accent in case of nuclear war. Well, the cold war ended, and Princeton began selling the recordings.
Liz: So people can just buy your voice?
Jack: Ohhhh, the things it's been dragged into. Thomas the Tank Engine; Wu-Tang songs...

Those must have been the glory days before the hippies took over and started "protecting" undergrads from "exploitation." Whatever...

In any case, it's understandable that this trivial tidbit of academic minutiae blew right by most people, but it is a fact of the world we live in that Princeton University does not have a linguistics department per se. They do offer an Undergraduate Program in Linguistics in which students can "pursue a Certificate in Linguistics," but this is not an official department as far as I understand it. Jack, if he is the same age as the actor Alec Baldwin, would have been at Princeton in the late 1970s. Maybe they had a full-fledged department back then; I honestly don't know.

You can watch the episode, "College," at NBC or wherever else you prefer. BTW, there's an awesome ode to color perception conundrums at the end as well. It's all kinda linguisticee/cog sciencee (I never know how to add the -ee morpheme?).

Random after-point: Near the end of Thursday's episode of Community, Dean Pelton actually used the Shakespearean subjunctive construction "Would that X were Y..." He says "Would that this hoodie were a time hoodie" around the 19:20 mark (see Hamlet: "would it were not so, you are my mother"). Just thought that was kinda awesome.

And not for nuthin', but if you haven't seen Tina Fey's Mark Twain Prize speech, it's a gem: HERE.

unaxseptable

kottke rants against Google's biased calculator (because it provides an answer rather than search results), then finishes with this:

Google! This. Is. Un. Acce. Ptable!

Huh? Does kottke have dyslexia*?

I get the staccato pronunciation he's representing, but that's not at all how I would say it. I would say (and write): Un. Ac. Cept. Able! Even if you want to propose your own variation, can we agree that kottke's is simply not a viable analysis of the syllable structure of the word? The double "c" spelling makes it look a bit odd, but it actually helps in the analysis: each "c" represents a different sound, and each sound should go in a different syllable. And how can he start the final syllable with "pt"? That is unaxseptable!

*Dyslexics can have difficulty counting syllables.

94,000 language deaths!

History's only Emmy-nominated linguist*, K. David Harrison, answers questions about language death, his favorite topic, over at the Johnson blog. He repeats what he's been saying for the last few years about language death, and he generally makes good points; however, he says two things worth responding to:
  • "The human knowledge base is eroding as we lose languages"
  • "...bilingualism strengthens the brain"

The first one is a vague and complicated claim often promoted by language-deathers**, and the second is a goofy metaphor (at best). Let's walk through the reasons why these statements should not be part of a serious discussion of language death.

The human knowledge base is eroding as we lose languages

My primary critique of this claim is that it's just not clear what it really means. In what way does a language uniquely encode information? Harrison provides a few simple examples, mostly lexical items that show which features a particular language foregrounded for encoding, and the argument is that this tells us something about what that culture perceived as important. This is probably true to some extent, but honestly, we still do not understand language well enough to know what lexical features tell us about a culture. This is hyperbole at best. But even if it were true, it is NOT an argument against language death per se; it's just a fact. So what if we lose some facts about a culture's perceptions of the world?

Let's assume there are 6,000 languages alive today. How many have already died? We don't know. For a rough estimate, let's draw an analogy and ask: how many humans have ever lived? A few years ago, the Population Reference Bureau did a "semi-scientific" guesstimate and determined that fewer than 6% of all people who had ever lived were still alive in 2002. If we assume that languages come and go at a pace that correlates with populations, then we can assume that the current 6,000 living languages are about 6% of the total number of languages that ever existed. That means the total number of languages that have ever existed is around 100,000***.
[225004560220] |This means we've already lost 94,000 languages, most of them never documented. [225004560230] |94,000 language deaths. [225004560240] |94,000 lost knowledge bases. [225004560250] |Oh, the horror, the horror! [225004560260] |Exactly how bad off should we be if Harrison is correct about the ill effects of language death, given that, by this estimate, we've already lost 94,000 languages? [225004560270] |Are we really that bad off? [225004560280] |Clearly the answer is no, we're not that bad off. [225004560290] |If losing 94,000 languages has not put humanity in grave danger, why would losing another 3,000? [225004560300] |Yes, I agree that all languages have unique linguistic properties that are worth studying in themselves. [225004560310] |But just because we find interesting data in every language does NOT mean we should stop language death per se. [225004560320] |We need a broader understanding of the system of language interaction and language evolution; otherwise, stopping language death may be as irresponsible as causing it. [225004560330] |Genetics blogger Razib Khan has made a compelling argument that "high linguistic diversity is not conducive to economic growth, social cooperation, and amity." [225004560340] |This is just one speculative claim, but at least it's a voice on the other side of this issue. [225004560350] |"Bilingualism strengthens the brain." This is just goofy phrasing. [225004560360] |He's referencing important neurolinguistic research, so why trivialize it by using such patently absurd language? [225004560370] |*I actually don't know this to be true, definitively. **Ooooh, I'm being a little caustic there, hehe. ***This estimate is remarkably similar to the ones David Crystal discusses in his book Language Death. [225004560380] |In that book, he says anywhere from 64,000 to 140,000 is a reasonable guesstimate. [225004560390] |My 100,000 splits that damn near down the middle.
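The arithmetic behind the 100,000 estimate is easy to check. A minimal Python sketch; the 6% share is the post's assumption carried over from the population analogy, not a measured rate of language turnover.

```python
# Back-of-the-envelope check of the post's language-death arithmetic.
# Assumption (from the post): living languages are the same ~6% share of all
# languages ever spoken that living people are of all people ever born.

LIVING_LANGUAGES = 6_000   # rough count of languages alive today
LIVING_SHARE = 0.06        # share borrowed from the population analogy

def estimate_total(living: int, share: float) -> int:
    """Estimate how many languages have ever existed: living / share."""
    return round(living / share)

total = estimate_total(LIVING_LANGUAGES, LIVING_SHARE)
lost = total - LIVING_LANGUAGES

# David Crystal's range, as cited in the footnote
crystal_low, crystal_high = 64_000, 140_000
crystal_mid = (crystal_low + crystal_high) // 2

print(f"estimated ever existed: {total:,}")
print(f"estimated already lost: {lost:,}")
print(f"midpoint of Crystal's range: {crystal_mid:,}")
```

Because the total is living / share, the estimate is very sensitive to the assumed share: halving the share to 3% doubles the total to 200,000.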
[225004600020] |A new drink from the SPECULATIVE GRAMMARIAN: [225004600030] |The Psycholinguist [225004600040] |wine (any kind: color is not a dependent variable in this study), several glasses, 1 stopwatch [225004600050] |Pour the wine into a glass while whining about how no one has properly modeled the process of wine pouring. [225004600060] |Observe the wine under controlled conditions for an hour. [225004600070] |Present a wordy but content-less paper to an international conference on what wine might look like in infants. [225004600080] |Rerun the analysis in a different glass in case the receptor affects the nature of the process. [225004600090] |Wait another hour. [225004600100] |Drink the wine. [225004600110] |Drink more wine. [225004600120] |Fall onto the floor drunk, bumping your head on a pipe on the way down. [225004600130] |Write an even less coherent paper on the effects of head bumping on linguistic processing. [225004600140] |Gain professorship. [225004610010] |pimp grammar [225004610020] |There's a pimp's handwritten business plan floating around the interwebz. [225004610030] |While the soundness of its basic logic cannot be denied ("Treat This Pimpin Like it's a Business" indeed), the former writing teacher in me could not help but pull out the old red pen and make a few suggestions. [225004610040] |But here's the thing: it's a fact of contemporary college education that most writing teachers are loath to outright criticize or correct their students (they're paying tuition, after all). [225004610050] |You see, outside of the Ivy League, most college writing teachers are faced with whole classrooms filled with pimps like Keep It Pimpin', and they're our bread and butter (we can't all be blessed with students like the Winkelvi, can we?). [225004610060] |As a result, we are careful to word our feedback delicately, so as not to offend the sensibilities of the ones who pad our admittedly thin paychecks*.
[225004610070] |*Absolute truth: I taught college-level research writing courses for the whopping total price of $1250/semester. [225004610080] |The MOST I ever got paid for teaching a college-level course was $2800. [225004610090] |In the (modified) words of my literary hero DJay: [225004610100] |You know it's hard out here for a [rhetoric & writing instructor]. [225004610110] |When he tryin to get this money for the rent. [225004610120] |For the Cadillacs and gas money spent Because a whole lotta [students] talkin [nonsense]. [225004610130] |HT kottke. [225004620010] |a debate! [225004620020] |From The Economist: This house believes that the language we speak shapes how we think. [225004620030] |Discuss... [225004630010] |so you want to study linguistics? [225004630020] |Recently, a reader asked me for advice about studying linguistics. [225004630030] |She is an undergraduate in the USA at a college that does not offer a BA in linguistics, and she likes math and language, particularly historical linguistics. [225004630040] |I've posted advice to students before here, but this new request was a particularly interesting variation. [225004630050] |What do you do if you're a smart 20-year-old at a school that does not quite offer what you want? [225004630060] |What follows is an edited version of the email I sent back: [225004630070] |I must begin with a warning: academic linguistics is a small field, and there is precious little room for mediocrity. [225004630080] |There are two kinds of academic linguists: the top 15% and the unemployed. [225004630090] |With that said, if your school doesn't offer linguistics as a degree, then I suggest psychology (the experimental, lab-based kind) or computer science. [225004630100] |Get hands-on experience in lab settings where you are collecting and analyzing data. [225004630110] |Learn the basic scientific method. [225004630120] |Both psychology and computer science can offer that.
[225004630130] |Computational linguistics is a hot field with lots of opportunities in all sub-fields of linguistics. [225004630140] |Plus, they can get jobs, hehe. [225004630150] |High-paying jobs! [225004630160] |Computational linguists are among the few who can get jobs outside of academia, but the truth is most industry CL jobs are really programming jobs where your programming skills are the real reason you get hired; your Natural Language Processing (NLP) skills are little more than icing on the cake. [225004630170] |The industry is really looking for engineers with some NLP experience, not linguists with some programming skills. [225004630180] |There's nothing wrong with majoring in math (I definitely think all 21st-century linguists should study math), though I think knowing stats is preferable, and that's really a separate field. [225004630190] |There is some controversy regarding whether linear algebra or calculus is better for linguistics (see here, especially the comments), but I really do think stats is key. [225004630200] |Studying biology or genetics is a possibility (neurolinguistics is a hot field). [225004630210] |Liberman posted about genetics and linguistics here. [225004630220] |Probably the single best thing you can do for yourself right now is work your way through the NLTK book. [225004630230] |This will teach you basic concepts plus basic tools, and it's completely free! [225004630240] |You could also start learning R, a great stats-based language that many linguists are using these days. [225004630250] |You could also work your way through Tarski's World because basic logic is a sound foundation for all disciplines. [225004630260] |If you want a serious challenge, get your hands on the late Partha Niyogi's The Computational Nature of Language Learning and Evolution. [225004630270] |He passed away recently, far too young for a rising star.
[225004630280] |He was a pioneer in using mathematical models to study language. [225004630290] |If you're interested in cognitive science and linguistics, I suggest regularly reading the Child's Play blog, written by two Stanford cognitive science grad students. [225004630300] |My general advice to any undergrad is simple: don't sweat your undergrad too much; it's the least important part of your education. [225004630310] |Just get it done, regardless of which major you choose, and move on to the good stuff in grad school. [225004640010] |harvard jumps the linguistic shark [225004640020] |Harvard Business Review editor Julia Kirby adds to the mountain of pseudo-scientific bullshit filling the innerwebz by taking the modest results of a small study (showing that mimicking accents helps sentence comprehension) and jumping to the wild and unfounded conclusion that salespeople should start faking accents. [225004640030] |It would make a great Monty Python skit, but it's a sad blog post from an editor of a prestigious business magazine. [225004640040] |Money quote: [225004640050] |But this study suggests another possibility. [225004640060] |Perhaps part of why mirroring and matching works is not because of how it operates on the prospect in a sales conversation, but how it operates on the salesperson. [225004640070] |When we switch into another person's mode, however superficially, perhaps our brains are triggered to do so on a deeper level, and we become more able to receive the information that person is trying to convey. [225004640080] |We all know the key to empathy is to walk a mile in another's shoes. [225004640090] |That can never literally be done, especially in brief sales encounters. [225004640100] |But at least we can put on their brogues. Sigh... [225004650020] |From Stephen Fry's Twitter feed: [225004650030] |Just had an fMRI scan at UCL (part of BBC doc on language I'm making). [225004650040] |Had to play Just A Minute while being scanned.
[225004650050] |Fun. [225004650060] |A BBC documentary about language? [225004650070] |Ugh...I don't think even the talents of Stephen Fry can save that one. [225004660010] |google has a huge tool [225004660020] |NPR ran a story today called Google Book Tool Tracks Cultural Change With Words. [225004660030] |It's about "the biggest collection of words ever assembled*": Google's 500-billion-word corpus, drawn from the books they've scanned. But here's the catch: many of those books are copyrighted, so Google pulled a trick that goes back to the very beginnings of computational linguistics and presented the words as an unordered set, or bag o' words: [225004660040] |Many of these books are covered by copyright, and publishers aren't letting people read them online. [225004660050] |But the new database gets around that problem: It's just a collection of words and phrases, stripped of all context except the date in which they appeared. [225004660060] |I first learned about this technique back in 1999 in an intro to computational linguistics course (bit of trivia: we were using an incomplete pre-print of Martin and Jurafsky; as I recall, the discourse chapter was composed entirely of one page that read "21 Computational Discourse write something here...") and I remember being appalled at its crass simplicity. [225004660070] |I mean, how dare those idiot engineers reduce language to simple lists of words. [225004660080] |How dare they try to use simple word lists to discover important facts about language and devise important linguistic tools. [225004660090] |It took less than a week for me to change my tune. [225004660100] |The fact is, the bag o' words technique is remarkably powerful and useful. [225004660110] |No, it doesn't solve all problems in one swoop, but it solves a hell of a lot more than I could possibly have predicted as a naive second-year linguistics grad student.
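The bag o' words idea fits in a few lines. A minimal Python sketch, assuming a deliberately naive tokenizer (lowercase, split on whitespace, trim punctuation); it shows what the technique keeps (counts) and what it throws away (order). Counting short n-grams instead of single words, as Google's released n-gram data does, gives back a little local order.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase, split on whitespace, strip edge punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def bag_of_words(text: str) -> Counter:
    """Count each word, discarding all information about word order."""
    return Counter(tokenize(text))

def ngram_counts(text: str, n: int) -> Counter:
    """Count contiguous n-word sequences; order survives only inside each n-gram."""
    toks = tokenize(text)
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

s1 = "The dog bit the man."
s2 = "The man bit the dog."

print(bag_of_words(s1) == bag_of_words(s2))        # True: identical bags
print(ngram_counts(s1, 2) == ngram_counts(s2, 2))  # False: bigrams differ
```

The two sentences mean opposite things yet produce the same bag, which is the crassness; the counts alone still support a surprising amount of useful analysis, which is the power.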
[225004660120] |For example: [225004660130] |Irregular verbs are used as a model of grammatical evolution. [225004660140] |For each verb, researchers plotted the usage frequency of its irregular form in red ("throve/thriven"), and the usage frequency of its regular past-tense form in blue ("thrived"). [225004660150] |Virtually all irregular verbs are found from time to time used in a regular form, but those used more often tend to be used in a regular way more rarely. [225004660160] |Google Labs lets you play with its tool here (hehe). [225004660170] |*Not sure where this claim originated, but Google has already released a 1-trillion-word corpus via LDC, the Web 1T 5-gram Version 1. [225004670010] |magical machine translation [225004670020] |This is un-fucking-believable: [225004670030] |The future has arrived. [225004670040] |HT kottke [225004680010] |ngram or n-gram? [225004680020] |The hottest story of the day is clearly Google's Ngram Viewer. [225004680030] |It's all over blogs, Twitter and even the MSM. [225004680040] |But why did Google call it the Ngram Viewer and not the N-gram Viewer? [225004680050] |The hyphenated form is more common in the NLP industry and in general search results (by a 10-1 margin at that). [225004680060] |Nunberg's LL post and Languagehat's post both prefer n-gram when speaking about the tokens themselves and only use Ngram when referencing Google's named product. [225004680070] |Even Google's own people used n-gram in a blog post here. [225004680080] |You gotta wonder what kind of branding process Google went through to decide on Ngram (they are notoriously conscious about that kind of thing). [225004680090] |The popularity of this story also demonstrates how much more media-savvy Google is than Microsoft, which has almost exactly the same tool that no one knows about. [225004680100] |See here.
[225004680110] |The difference is that Microsoft didn't tie its tool to the study of culture and history or give us a nifty online interface to play with, which makes it sound duller than it perhaps otherwise would. [225004680120] |Also, note Microsoft uses N-gram ... frikkin Microsoft. [225004690010] |how NOT to interpret ngrams [225004690020] |Andrew Sullivan has predictably misunderstood the value of Google's Ngram Viewer. [225004690030] |He spent all day yesterday posting trite and simplistic misinterpretations of the data. [225004690040] |For example, [225004690050] |
  • the concept of ideology is a relatively recent one because the word ideology has become more frequent recently (this is almost certainly false).
  • [225004690060] |Jesus "wins" (his word, not mine) against the Beatles because the word Jesus is more frequent.
[225004690070] |I like the Ngram Viewer, but simply plotting the frequency of words against each other to determine something about culture or concepts is a very weak technique that leads to massive misinterpretations, as we've seen recently with things like counting the number of times President Obama uses pronouns in his speeches. [225004690080] |I discussed the failings of simple word counts as a technique here. [225004690090] |To sum up: [225004690100] |
  • We don't know what causes word frequencies.
  • [225004690110] |We don't know what the effects of word frequencies are.
  • [225004690120] |There are good alternatives.
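Even before the deeper questions in that list, raw counts have a mundane confound: the amount of text differs from year to year. The Ngram Viewer already plots relative frequencies, but anyone working from raw counts needs a normalization step like this one. A minimal Python sketch with made-up numbers; normalizing removes only the corpus-size confound and answers none of the questions about what causes word frequencies or what they mean.

```python
# Hypothetical toy numbers, NOT real Google Books data: yearly occurrences of
# one word, and the total number of words in the corpus for that year.
word_counts = {1900: 50, 1950: 120, 2000: 400}
total_words = {1900: 1_000_000, 1950: 3_000_000, 2000: 10_000_000}

def per_million(counts: dict[int, int], totals: dict[int, int]) -> dict[int, float]:
    """Normalize raw counts to occurrences per million words of that year's text."""
    return {year: counts[year] * 1_000_000 / totals[year] for year in counts}

rates = per_million(word_counts, total_words)
for year in sorted(rates):
    # Raw counts rise eightfold; the normalized rate actually falls.
    print(year, word_counts[year], rates[year])
```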
[225004700010] |the linguistics of the simpsons [225004700020] |The magnificent and admirable snowclone X is the Y of Z made a surprise and instructive appearance on The Simpsons tonight*: [225004700030] |Marge -- Don't worry Lisa, you could still go to McGill, it's the Harvard of Canada. [225004700040] |Lisa -- Anything that is the something of the something isn't the anything of anything... [225004700050] |Too true, Lisa, too true. [225004700060] |It's never good to be the shadow of something else. [225004700070] |*This appears to have been a repeat of the 10-10-2010 episode MoneyBART (a nice allusion to Moneyball, btw). [225004710010] |ngram roundup [225004710020] |It's not difficult to find glee and excitement surrounding Google's new Ngram Viewer. [225004710030] |Hyperbolic praise is whirling around the innerwebz like mad. [225004710040] |As an antidote and a nod to the role skepticism should play in our contemporary society, I present a brief roundup of criticisms: [225004710050] |Geoffrey Nunberg: ...there are still a fair number of misdated works, and there's no way to restrict a query by genre or topic. [225004710060] |But in the end, the most important consequence of the Science paper, and of allowing public access to the data, is that it puts "culturomics" into conversational play. [225004710070] |Mark Davies: Google Books can't use wildcards to search for parts of words. [225004710080] |For example, try searching for freak* out (all forms: freak_, freaked, freaking, etc.) or even a simple search like teenager* ... if Google Books doesn't know about part of speech tags or variant forms of a word, then how can it look at change in grammar? ... [225004710090] |To use collocates with Google Books, you would have to manually download thousands or millions of hits to your hard drive, and then use another program to look for and categorize the collocates.
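Davies's two complaints describe operations that are short once you have running text on your own drive. A minimal Python sketch: shell-style wildcards via the standard `fnmatch` module (an assumption; Davies's own corpus interfaces use their own query syntax) and collocates defined simply as words within a fixed ±N-token window of a match. The sample sentence is made up for illustration.

```python
import fnmatch
import re
from collections import Counter

def wildcard_matches(pattern: str, vocabulary: set[str]) -> list[str]:
    """Return vocabulary words matching a shell-style wildcard such as 'freak*'."""
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))

def collocates(tokens: list[str], pattern: str, window: int = 2) -> Counter:
    """Count words within `window` tokens either side of each match of `pattern`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if fnmatch.fnmatchcase(tok, pattern):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

text = "They freaked out when he was freaking out about the freak show."
tokens = re.findall(r"[a-z']+", text.lower())

print(wildcard_matches("freak*", set(tokens)))
print(collocates(tokens, "freak*", window=1).most_common(2))
```

Even on this tiny sample, "out" surfaces as the top collocate of freak*, which is exactly the kind of pattern Davies says Google Books cannot show directly.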
[225004710100] |Mark Liberman: The Science paper says that "Culturomics is the application of high-throughput data collection and analysis to the study of human culture". [225004710110] |But as long as the historical text corpus itself remains behind a veil at Google Books, then "culturomics" will be restricted to a very small corner of that definition, unless and until the scholarly community can reproduce an open version of the underlying collection of historical texts. [225004710120] |David Crystal: ...this is just a collection of books - no newspapers, magazines, advertisements, or other orthographic places where culture resides. [225004710130] |No websites, blogs, social networking sites. [225004710140] |No spoken language, of course, so over 90 percent of the daily linguistic usage of the world isn't here...The approach, in other words, shows trends but can't interpret or explain them. [225004710150] |It can't handle ambiguity or idiomaticity. [225004710160] |The Binder Blog: The value of the Ngrams Viewer rests on a bold conceit: that the number of times a word is used at certain periods of time has some kind of relationship to the culture of the time. [225004710170] |For example, the fact that the word “slavery” peaks around 1860 suggests that people in 1860 had a lot to say about slavery. [225004710180] |Another spike around the 1970s meshes nicely with the Civil Rights Movement. [225004710190] |Well, that’s sort of interesting. [225004710200] |However, I didn’t need ngrams to tell me that a lot of people were writing about slavery in 1860. [225004710210] |These data are broad but not deep, which makes them relatively useless to most humanities majors interested in intensive study. [225004710220] |The one positive comment that I think bears repeating is about the role this fun little tool might play in sparking the imagination of young students interested in the role technology can play in the humanities.
[225004710230] |Geoffrey Nunberg: Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with. [225004710240] |And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research. [225004720010] |digg's c**ktail [225004720020] |[UPDATE below] [225004720030] |I couldn't help but notice a story on Digg: Images of alcoholic drinks under the microscope from vodka c**ktails to pina colada. [225004720040] |I checked the original Daily Mail story and saw that the word cocktail was not censored. [225004720050] |I looked for other instances of cocktail on Digg's site and found that all instances look censored, except when the string c-o-c-k occurs in a user name, as the image below demonstrates: [225004720060] |This appears to be a candidate for unnecessary censorship. [225004720070] |I sent an email to Digg asking them if this is intentional censorship or an inside joke within the site. [225004720080] |I'll report any response (don't hold your breath). [225004720090] |[UPDATE: 3:01 Eastern] [225004720100] |Digg support did in fact reply, noting that it was a function of a profanity filter that can be turned off: [225004720110] |Hello, You see that because you have the profanity filter enabled. [225004720120] |To disable it just log in and go to: http://digg.com/settings/preferences --Digg Support [225004730010] |language and thought votes [225004730020] |On the eve of the conclusion to Mark Liberman and Lera Boroditsky's debate at The Economist, there are two vote totals that are interesting to compare. [225004730030] |The obvious one is the lopsided result so far on the main question: Do you agree with the motion? [225004730040] |Here, Boroditsky has a 77%-23% advantage. [225004730050] |However, if you mouse over each day's vote, it tells you how many yes's have switched to no and vice versa.
[225004730060] |The totals there are the near exact opposite: by a 5-1 margin, yes's have switched to no. [225004730070] |You are free to interpret this as you wish. [225004730080] |Unfortunately I don't see any raw totals for the number of people voting, so it's anyone's guess what proportion of votes the 6 changes represent (likely a very small percentage). [225004740010] |half a million language deaths? [225004740020] |Lera Boroditsky's recent concluding statement in The Economist's debate about how language shapes thought states, "At the moment we have good linguistic descriptions of only about 10% of the world's existing languages (and we know even less about the half a million or so languages that have existed in the past)" (emphasis added). [225004740030] |In my previous post on language death here, I used the number 100,000 to estimate how many languages have previously existed and related it favorably to David Crystal's 64,000 to 140,000 reasonable guesstimate. [225004740040] |I'm just curious to know where Boroditsky came up with the half-million number. [225004740050] |I've managed to find a few references to this 500,000 number, but they describe it as a "radical estimate" (e.g., see here). [225004740060] |My hunch is that this is yet another example of Boroditsky's profound-problem. [225004740070] |She has a tendency to call modest results profound when they are not. [225004740080] |She is, I suspect, a tad prone to hyperbole. [225004750010] |i know your email address...so what? [225004750020] |Cory Doctorow over at Boing Boing makes the bold claim that there's no compelling evidence that obscuring your email address online using techniques like john DOT smith at host DOT com actually reduces the amount of spam you receive. [225004750030] |As long as his spam filters are catching the spam effectively, he doesn't mind sharing his email address with the world. [225004750040] |Are you willing to follow his lead?
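Part of what makes Doctorow's skepticism plausible is that spelled-out addresses follow a few predictable patterns, and undoing them takes only a couple of regular expressions. A minimal Python sketch; the patterns handled are illustrative assumptions, and it says nothing about whether real address harvesters bother to do this, which is exactly the empirical question the post raises.

```python
import re

# Spelled-out separators, optionally wrapped in [] or (): "DOT", "[dot]", "(at)".
# Naive by design: ANY standalone word "at" or "dot" is treated as a separator,
# so apply this to a candidate address, not to whole sentences.
DOT = re.compile(r"\s*[\[(]?\s*\bdot\b\s*[\])]?\s*", re.IGNORECASE)
AT = re.compile(r"\s*[\[(]?\s*\bat\b\s*[\])]?\s*", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def deobfuscate(text: str) -> str:
    """Turn 'john DOT smith at host DOT com' back into 'john.smith@host.com'."""
    return AT.sub("@", DOT.sub(".", text))

def looks_like_email(text: str) -> bool:
    """True if the whole de-obfuscated string has the shape of an address."""
    return EMAIL.fullmatch(deobfuscate(text)) is not None

print(deobfuscate("john DOT smith at host DOT com"))
print(deobfuscate("jane [dot] doe (at) example [dot] org"))
```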
[225004760010] |my bad, global edition [225004760020] |Manute Bol is often credited with coining the phrase my bad (see here and here, or here for alternate hypotheses). [225004760030] |It has apparently made the jump, in some way, to international usage; it's just not clear to me how. [225004760040] |While watching The Girl Who Played with Fire again last night, I noticed Lisbeth says something that is translated as my bad, but what she actually says is in Swedish, of course. [225004760050] |(screenshot from Netflix) To my non-Swedish-speaking ears, it sounds like she says mitt viel, which would mean something closer to my very, if Google Translate is any help. [225004760060] |Google translates my bad into Swedish as mitt dåliga (dåliga appears to be a literal translation of bad). [225004760070] |I'm pretty sure that's not what she said, but I'd have to re-listen to be sure. [225004760080] |So, the linguistic questions are these: [225004760090] |
  • What does she say in Swedish?
  • [225004760100] |What is the history of the Swedish phrase?
  • [225004760110] |Is my bad the best English translation (given its history in slang and in pop culture)?
[225004770010] |bustin' a cap [225004770020] |Watching the original True Grit on teevee and what do I hear? [225004770030] |Ned Pepper (Robert Duvall) says something to the effect of "I ain't never busted a cap in no girl before." [225004770040] |I thought only contemporary gangsta movies and rap lyrics used that phrase (and yes, I did find some examples of bust(-ed) a cap using the Ngram Viewer). [225004790010] |true grit [225004790020] |I posted recently about the phrase "bust a cap" occurring in the original 1969 John Wayne movie True Grit. [225004790030] |I got a chance to see the new Coen Bros version and my reactions are worth airing...or not, you decide... [225004790040] |First, it turns out the phrase true grit has a storied history in English letters: [225004790050] |But this review is destined to be of the non-linguistic kind... [225004790060] |I also had the chance to re-watch the original John Wayne version just a couple days before watching the new one. [225004790070] |This may be a bit unfair, because it means the recent version is asked to live up to the original in some ways; nonetheless, it is instructive (insofar as it does NOT). [225004790080] |I hereby forgive the Coen Bros for not watching the original again in preparation for their version. [225004790090] |Surely this would have scuttled their project. [225004790100] |Let me make it clear that the individual performances in the Coen Bros movie alone make it worth watching. [225004790110] |Each actor is given a great opportunity to breathe life into their character and I respect the Coen Bros for allowing that. [225004790120] |They are truly dedicated to the fine craft of acting and I enjoyed watching their version of True Grit. [225004790130] |Frankly, I could watch Jeff Bridges eat oatmeal and be amazed at how weirdly and wonderfully he did it.
[225004790140] |Nonetheless, my primary complaint is devastating: the new Coen Bros version lacks the basic narrative structure and emotional depth that made the original so fundamentally enjoyable and satisfying. [225004790150] |For the record, I have never read the novel, so I have no clue what it says, and the Coen Bros based their new version entirely on it. [225004790160] |However, I can say that one of the most deeply satisfying elements of the John Wayne movie is the development of the relationships that evolve between the child Mattie Ross, the drunken but courageous Rooster Cogburn, and the goofy but basically decent La Boeuf. [225004790170] |Throughout the original movie, those three characters find a way to forge a sort of dysfunctional yet basically good and meaningful family unit between them. [225004790180] |This family unit is completely absent from the new version. [225004790190] |And I missed it. [225004790200] |One of the most touching and important moments of the original movie involves Rooster finally opening up to Mattie about his past and his wife and son while the two sit and wait for Ned Pepper's gang to arrive. [225004790210] |This scene reveals Rooster's humanity and deeply emotional character. [225004790220] |It is this scene that helps forge a familial bond, almost like an uncle/niece relationship, between Rooster and Mattie. [225004790230] |And this deep relationship is played out for the rest of the movie. [225004790240] |Developing this scene during a crucial moment of patience and waiting is pure narrative brilliance. [225004790250] |Yet the Coen Bros took this and turned it into camp and parody. [225004790260] |The lines about his wife and son are basically thrown away in a drunken mumbling as his horse barely manages to carry his heavy frame while they plod along meaninglessly. [225004790270] |What should be a deeply emotional connection forged in a tense moment of expectation becomes slapstick and meaningless.
[225004790280] |Why throw this away? [225004790290] |I would need a copy of the new film to point out all of the moments lacking narrative continuity, but a few will suffice: [225004790300] |Late in both movies, Mattie stumbles upon her nemesis Tom Chaney while gathering water from a river. [225004790310] |In the original film, the proximity of Ned Pepper's gang is made clear and ominous. [225004790320] |The likelihood that she would find trouble while going for water is made plain. [225004790330] |But in the new version, it plays out like some wildly random coincidence. [225004790340] |The ending of both movies requires these events to take place, but the original movie at least gives us some reasons behind the events, not just chaos and random nothingness. [225004790350] |Ned Pepper is a critical character in the story. [225004790360] |In the original movie, the truly great actor Robert Duvall is given the chance to lend the man some decency and honor. [225004790370] |He is a killer, yes, but he also saves Mattie's life, despite claiming to be willing to end it. [225004790380] |In fact, it is Ned Pepper, more than anyone else (in the original), who keeps Mattie alive (until the snake-hole scene at least). [225004790390] |Robert Duvall was given the opportunity to create a Ned Pepper who is full and complex. [225004790400] |In the Coen Bros version, the actor Barry Pepper (seriously, no joke, that's his name, weird right?) plays little more than a grubby and dirty (really seriously dirty, nasty dirty, disgustingly dirty...) killer. [225004790410] |The pathos of Ned Pepper is gone. [225004790420] |By far, the most iconic moment of the original movie is the scene where Rooster takes the reins of his horse in his mouth and single-handedly draws down against four armed opponents. [225004790430] |This is one of the greatest moments of American Western lore, involving the single greatest actor of American Western mythology.
[225004790440] |It is truly a moment of cinematic greatness. [225004790450] |Leading up to this, Rooster describes (earlier in both films) a previous moment in his storied life much like this one, and it forms a crucial part of his legend and character. [225004790460] |When the ultimate moment arrives in the original version, it is a moment of destiny, built up by the dialogue and scenes that have come before it. [225004790470] |But in the Coen Bros version, the whole raison d'être has been obscured by mumbling and misdirection. [225004790480] |It's almost as if this were every bit as random as everything else that came before it. [225004790490] |You may well argue that randomness and chaos are in fact the Coen Bros' raison d'être, and I can't argue against that. [225004790500] |Fair enough. [225004790510] |But then, why bother making a movie about a story for which destiny and courage are so crucial? [225004790520] |Without the great inevitable showdown of Rooster's grit against the despots' manpower, well, why make this movie at all? [225004790530] |If you believe in pure chaos, fine, make No Country For Old Men over and over, got it. [225004790540] |That makes sense. [225004790550] |That's coherent. [225004790560] |But why take this novel and make a movie? [225004790570] |If your primary goal as movie makers is to take well-loved material and trash it for your own philosophical gain, that's just pure douchebaggery, so screw you Joel and Ethan. [225004800010] |another lingo toy... [225004800020] |I love free online lingo toys like BYU's Corpora and Google's Ngram Viewer, and now there's a new one: The Human Speechome Project from MIT, which "provides a look into the most complete record of a single child’s speech development ever created. [225004800030] |The data has been organized to show the age of the child when he spoke each of his first 400 words." [225004800040] |It's profiled in Forbes here.
[225004800050] |And they provide a nifty interactive graph to sort the data: [225004820010] |the linguistics of brand names [225004820020] |The Neurocritic reviews evidence for the whopping increase in drug brand names beginning with the letters z and x starting in 1986 and quotes the conclusion of the study's authors: [225004820030] |Reflecting their infrequent occurrence in English words, x and z count for 8 and 10 points in Scrabble, the highest values (along with j and q) in the game. [225004820040] |So names that contain them are likely to seem special and be memorable. [225004820050] |“If you meet them in running text, they stand out,” is the way one industry insider explained. [225004820060] |Generally, they are also easy to pronounce. [225004820070] |The last point about being easy to pronounce is basically nonsense, so forgive them that, but their basic point that infrequent letters are more memorable is loosely connected to Zipf's work on frequency and may have some truth to it. [225004820080] |I can tell you this: there are entire companies that charge high fees to help manufacturers develop brand names (see here for a discussion of what brand name developers do). [225004820090] |I worked at one of them ever so briefly and found a mix of legitimate linguistics and voodoo linguistics in the "research" they prepared for their customers. [225004820100] |I also found a resistance to serious linguistics for two reasons: 1) the customers didn't like science (I'm not joking; this was a serious obstacle) and 2) serious linguistics took too long and didn't come to firm conclusions. [225004820110] |Typically, we were asked to initiate, perform, and complete linguistic research on brand names in a matter of weeks. [225004820120] |Ultimately, though, I concluded that a product's name simply was not that crucial to its success, which hinged on the manufacturer's overall marketing strategy more than on the name.
[225004820130] |Think about Google vs. Microsoft. [225004820140] |So, the rise in z- and x-named drug products is a fad based more in the boardroom than in the marketplace. [225004830010] |non-linguistic CAPTCHA [225004830020] |David Bradley, writing at sciencetech, reports on a new face-based CAPTCHA process, quoting the team that created it: "Unlike a text-based CAPTCHA, a major benefit of the proposed image-based face detection CAPTCHA is that it does not have any language barriers..." [225004830030] |I guess it never really occurred to me that there would be language barriers in CAPTCHAs, because so many of the strings are in fact nonsense words, but I suppose language-specific phonotactics are helpful (often the identity of a single letter is quite ambiguous). [225004840010] |not any or not one?? [225004840020] |The NYT's recent The Number of None grammar blog post brings up an interesting question: is none semantically closer to not any or not one? [225004840030] |And what should its morphosyntactic agreement be, singular or plural? [225004840040] |The Times takes the not any, plural position, but I am inclined to disagree based on my intuition about substitution. [225004840050] |Below are the two sentences the Times uses to illustrate: [225004840060] |
  • None of the interim employers or temporary agencies have contributed to a 401(k)
  • [225004840070] |
  • None of the works have gained a foothold in the seasonal repertory.
  • [225004840080] |Now, here they are with the substitutions and my personal acceptability ratings (where * means mildly unacceptable/not sure and ** means completely unacceptable): [225004840090] |
  • Not one of the interim employers or temporary agencies has contributed to a 401(k)
  • [225004840100] |
  • Not one of the works has gained a foothold in the seasonal repertory.
  • [225004840110] |
  • **Not any of the interim employers or temporary agencies have contributed to a 401(k)
  • [225004840120] |
  • **Not any of the works have gained a foothold in the seasonal repertory.
  • [225004840130] |
  • Not one of the interim employers or temporary agencies have contributed to a 401(k)
  • [225004840140] |
  • Not one of the works have gained a foothold in the seasonal repertory.
  • [225004840150] |
  • **Not any of the interim employers or temporary agencies has contributed to a 401(k)
  • [225004840160] |
  • **Not any of the works has gained a foothold in the seasonal repertory.
  • [225004840170] |The above ratings suggest that I make no distinction in acceptability between none has and none have. [225004840180] |But wait, there's more. [225004840190] |Let's remove the lengthy PP and see how this pans out: [225004840200] |
  • *Not one of them has contributed to a 401(k)
  • [225004840210] |
  • *Not one of them has gained a foothold in the seasonal repertory.
  • [225004840220] |
  • **Not any of them have contributed to a 401(k)
  • [225004840230] |
  • **Not any of them have gained a foothold in the seasonal repertory.
  • [225004840240] |
  • Not one of them have contributed to a 401(k)
  • [225004840250] |
  • Not one of them have gained a foothold in the seasonal repertory.
  • [225004840260] |
  • **Not any of them has contributed to a 401(k)
  • [225004840270] |
  • **Not any of them has gained a foothold in the seasonal repertory.
  • [225004840280] |I seem to slightly prefer the plural reading when the word none is close to the verb but the PP contains a plural pronoun. [225004840290] |But this is not true if we delete the PP altogether: [225004840300] |
  • Not one has contributed to a 401(k)
  • [225004840310] |
  • Not one has gained a foothold in the seasonal repertory.
  • [225004840320] |
  • *Not one have contributed to a 401(k)
  • [225004840330] |
  • *Not one have gained a foothold in the seasonal repertory.
  • [225004840340] |It would appear I have an incoherent grammar (surely this is true, as I believe all grammars are, in some way, incoherent. [225004840350] |As Sapir said, all grammars leak.) [225004840360] |But there's at least one other factor muddying the linguistic waters: [225004840370] |the fact that one also acts as a pronoun, as in one does one's duty. [225004840380] |When acting as a pronoun, it takes 3rd person singular agreement, as in one has to do one's duty (think he has to do his duty), not *one have to do one's duty. [225004840390] |It may be that this pronoun agreement is interfering with my reading when one occurs right next to the verb. [225004840400] |Also, I did this pretty fast, so I wouldn't be surprised if I change my mind by COB... [225004840410] |Of course, how could I resist: [225004840420] |I believe I got the full paradigm: [225004840430] |
  • not one of them has
  • [225004840440] |
  • not one of them have
  • [225004840450] |
  • not any of them has
  • [225004840460] |
  • not any of them have
  • [225004840470] |
  • none of them has
  • [225004840480] |
  • none of them have
  • [225004840490] |
  • none has
  • [225004840500] |
  • none have
  • [225004840510] |It appears as though none have had a hell of a start in the 19th century but got killed off along with the buffalo. [225004850010] |refudiate, the word that won't die [225004850020] |Thanks in no small measure to Oxford University Press naming refudiate its Word of the Year, plus The Daily Dish rekindling its favorite topic, we have a new round of he-said-she-said to deal with. [225004850030] |Made famous by Sarah Palin this past summer (see Liberman's original post here, and others here), it is yet again the object of speculation as to why Palin used the form to begin with. [225004850040] |Palin herself poured fuel on this fire two days ago by tweeting that it was a typo. [225004850050] |Liberman thinks that explanation didn't hold water the first time around because she first said it aloud on teevee: the original example [on teevee] wasn't a slip of the tongue, but a symptom of the fact that Ms. Palin had a blend of repudiate and refute as a well-established entry in her mental lexicon [note added]. [225004850060] |Why the fuss? [225004850070] |There's nothing particularly interesting or telling about the linguistic blending of repudiate and refute. [225004850080] |Everyone does this kind of thing now and again, and sometimes it sticks. [225004850090] |Some people like to beat up on public figures any time they can, so something like this is a target. [225004850100] |But the more serious speculation is that the Palin camp's public responses expose something important about Sarah Palin's inner circle and consultation. [225004850110] |I'll leave it to the political pundits to fight that one out. [225004850120] |For now, [225004860010] |etymologists, unite! [225004860020] |A buddy wrote me an interesting question (to which I did not have an answer): [225004860030] |It's been driving me crazy, is there a term of art for when the etymological root of a word is the opposite of the word's modern meaning?
[225004860040] |For example, asbestos means "an unquenchable fire"; philander means "a lover of men," etc. [225004860050] |Cheers, A., [225004860060] |Anyone know this? [225004870010] |dialects map [225004870020] |Extremely detailed North American English Dialects, Based on Pronunciation Patterns. [225004870030] |The site could use a bit of a web redesign ... it looks circa 1999. [225004870040] |Anyone care to offer free web design help to clean up this otherwise useful resource a little? [225004880010] |does asbestos really mean 'unquenchable'? [225004880020] |Yes, at least etymologically. [225004880030] |The Online Etymology Dictionary explains its etymology this way: ...from O.Fr. abeste, from L. asbestos "quicklime" (which "burns" when cold water is poured on it), from Gk. asbestos, lit. "inextinguishable," from a- "not" + sbestos, verbal adj. from sbennynai "to quench," from PIE base *(s)gwes- "to quench, extinguish" (cf. Lith. gestu "to go out," O.C.S. gaso, Hittite kishtari "is being put out") (emphasis added). [225004880040] |Like people, every word has lived its own peculiar and unique life. [225004880050] |Riffing on my post below regarding words that have the opposite meaning of their etymology, my friend Andy (who did graduate work in Classics and hence actually reads Greek) challenged me to help him understand why the word asbestos, whose etymology literally means 'unquenchable', is used today to mean a substance that cannot burn. [225004880060] |With some Googling, I found this (PDF): "First mention of asbestos appeared in the Greek text On Stones, written by Theophrastus, one of Aristotle’s students. [225004880070] |Theophrastus referred to a substance that resembled rotten wood and burned without being harmed when doused with oil." [225004880080] |So, Ol' Theophrastus kept pouring oil onto this stuff, but it never burnt up, so he kept pouring, but the stuff was never quenched by oil/fire. [225004880090] |Hence, it was unquenchable.
[225004880100] |That's my story and I'm sticking to it (for now). [225004880110] |Andy did some follow-up of his own and provides the following: Yes, that's one of the more likely explanations. [225004880120] |In my research I came across the use of asbestos as permanent wicks in lamps, but never noted the bit about being unquenchable with oil. [225004880130] |That Theophrastos citation really belongs in the dictionary entry below, as it's the only cite that explains the meaning under A. [225004880140] |The lexicon below is massively comprehensive (if you couldn't tell) so it's odd they missed Theo. [225004880150] |The other possible explanation is II. or "unslaked lime", as quick lime burns underwater. [225004880160] |This was a key component in later "Greek fire", but so far I haven't been able to find any ancient source that cites an unquenchable substance (Greek Fire dates to 500 AD, white phosphorus, which also burns underwater, dates to 1600 AD, and sodium, which explodes on contact with water, dates to 1800 AD). [225004880170] |If I had the time and language skill I used to have I would search my CD of all Greek text up to 600 AD for cites of asbestos and then comb thru them, but that would be a day's worth of work. I'm pleased that we got close to the meaning in online research and I'm not sure that looking up every instance of asbestos would change anything. [225004880180] |Andy also provided the following reference: [225004880190] |ἄσβεστος, ον, also η, ον Il.16.123:— A. unquenchable, inextinguishable, “φλόξ” Il. l. c.; not quenched, “πῦρ ἄ.” [225004880200] |D.H.3.67, Plu.Num.9; “κλέος” Od.4.584; “γέλως” Il.1.599; “βοή” 11.50; “ἐργμάτων ἀκτὶς καλῶν ἄ. αἰεί” Pi.I.4(3).42; ἄ. πόρος ὠκεανοῦ ocean's ceaseless flow, A.Pr.532 (lyr.); πῦρ, of hell, Ev.Marc.9.43. [225004880210] |II. as Subst., ἄσβεστος (sc. τίτανος), ἡ, unslaked lime, Dsc.5.115, Plu.Sert.17, Eum.16; “ἄ. κονία” Lyc. ap. [225004880220] |Orib.8.25.16. [225004880230] |2. a mineral or gem, Plin.HN37.146. 
ἀσβεστώδης: tofus, Gloss. [225004880240] |Henry George Liddell, [225004880250] |Robert Scott. [225004880260] |A Greek-English Lexicon, revised and augmented throughout by [225004880270] |Sir Henry Stuart Jones, with the assistance of Roderick McKenzie. [225004880280] |Oxford: [225004880290] |Clarendon Press, 1940. [225004890010] |plagiarism and n-grams [225004890020] |Big-media plagiarism is once again in the news: ESPN has suspended an on-air host for plagiarizing three sentences from a newspaper columnist. [225004890030] |The on-air host has admitted the plagiarism*, issued an apology, and asked for forgiveness. [225004890040] |The multiple and confusing ethical standards for plagiarism have been the subject of several LL posts (recently here), and this led me to wonder what counts as plagiarism in the first place. [225004890050] |Clearly a three-sentence, 45-word passage, almost word-for-word identical with another, in the same semantic domain with the same referents, is a case of plagiarism. [225004890060] |But what about a 20-word passage? [225004890070] |10 words? [225004890080] |4 words**? [225004890090] |Many short phrases are highly frequent, right? [225004890100] |You couldn't felicitously accuse me of plagiarism for using the phrase "I am going...", could you? [225004890110] |Even though there can be no doubt that someone else used it before me. [225004890120] |Yes, I know you can find guidelines for plagiarism in college student handbooks and such. [225004890130] |I dealt with those for years when I taught college writing courses (and I recall flunking at least three students for plagiarism, but those were whole papers, really stupid stuff). [225004890140] |But I wonder, now that we have a 500-million-word corpus available to us, couldn't we simply compare all n-grams to discover how likely it is that any given 5-gram is repeated?
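Here is a toy version of that n-gram comparison, as a rough Python sketch under loudly stated assumptions: the corpus is a small in-memory list of documents, tokenization is naive lowercase whitespace splitting, and "repeated" is taken to mean that the same n-gram shows up in more than one document. The function names `ngrams` and `repeat_rate` are made up for this illustration. A real 500-million-word corpus would need streaming or hashed counts rather than a dictionary in memory.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Yield every contiguous n-gram in a token list as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def repeat_rate(documents, n):
    """Fraction of distinct n-grams that occur in more than one document.

    Assumption for this sketch: naive lowercase whitespace tokenization,
    and a "repeat" means the n-gram is shared across documents.
    """
    docs_per_ngram = defaultdict(set)  # n-gram -> ids of documents containing it
    for doc_id, text in enumerate(documents):
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            docs_per_ngram[gram].add(doc_id)
    if not docs_per_ngram:
        return 0.0
    shared = sum(1 for ids in docs_per_ngram.values() if len(ids) > 1)
    return shared / len(docs_per_ngram)


if __name__ == "__main__":
    corpus = [
        "i am going to the store today",
        "i am going to call my mother",
        "the store was closed today",
    ]
    for n in (2, 3, 5):
        print(n, round(repeat_rate(corpus, n), 3))
```

On this three-sentence toy corpus the rate falls from 0.333 for bigrams to 0.182 for trigrams and 0.0 for 5-grams, which is the intuition behind the idea: the longer the shared string, the less likely the match is chance.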
[225004890150] |I'd prefer to do this for n-grams up to 20 words or so, but wouldn't we predict that there comes a point at which the likelihood that a particular phrase was plagiarized (given that we had found two alike) would be based solely on the general likelihood that n-grams of that size are repeated? [225004890160] |The situation would be this: you discover that a particular 11-word passage has an identical twin from 2 years ago. [225004890170] |Without bothering to look into whether or not the author had access to the previous work, you simply look up the likelihood that any 11-gram passage is repeated and discover that there is a 0.0002% chance that a phrase that long will be repeated. [225004890180] |With some effort, you could then derive predictions for near-identical passages (using WordNet and similar resources).... [225004890190] |..just thinking out loud... [225004890200] |*I am ignorant of the role ESPN's producers play in the writing of on-air speeches, but the quote seems clearly to have been read from a teleprompter at the time of speaking, which means someone else was involved, even if unwittingly. [225004890210] |Nonetheless, the host is taking the fall willingly. [225004890220] |**Excluding obviously famous phrases like Ich bin ein Berliner. [225004920010] |how we hear ourselves speak [225004920020] |Science Daily has a nice article on new neurolinguistic research out of Cal linking auditory and speech processes: [225004920030] |"We used to think that the human auditory system is mostly suppressed during speech, but we found closely knit patches of cortex with very different sensitivities to our own speech that paint a more complicated picture," said Adeen Flinker, a doctoral student in neuroscience at UC Berkeley and lead author of the study.
[225004920040] |"We found evidence of millions of neurons firing together every time you hear a sound right next to millions of neurons ignoring external sounds but firing together every time you speak," Flinker added. [225004920050] |"Such a mosaic of responses could play an important role in how we are able to distinguish our own speech from that of others." [225004920060] |HT Linguistic News Feeds [225004930010] |the germans fear my language too, muahahaha [225004930020] |It's a mighty era to be a native speaker of English. [225004930030] |It seems the world fears my language and is instituting fruitless policies to protect their languages against my own. [225004930040] |First the Chinese banned English words and phrases. [225004930050] |Now, the Germans are getting on the banning bandwagon: [225004930060] |Germany's Transport Minister claimed to have struck an important blow for the preservation of the German language yesterday after enforcing a strict ban on the use of all English words and phrases within his ministry. [225004930070] |Peter Ramsauer stopped his staff from using more than 150 English words and expressions that have crept into everyday German shortly after being appointed in late 2009. [225004930080] |His aim, which was backed by Chancellor Angela Merkel, was to defend his language against the spread of "Denglish" –the corruption of German with words such as "handy" for mobile phone and other expressions including "babysitten" and "downloaden". [225004930090] |As a result, words such as "laptop", "ticket" and "meeting" are verboten in Mr Ramsauer's ministry. [225004930100] |Instead, staff must use their German equivalents: "Klapprechner", "Fahrschein" and "Besprechung" as well as many other common English words that the minister has translated back into German. 
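The Yelp bake-off described in the next post pitted a Naive Bayes classifier against Mechanical Turk workers. For readers who haven't met the technique, here is a generic, minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written from scratch in Python. It's a textbook sketch, not Yelp's system: the class name, the six training snippets, and the two labels are invented for illustration, and words never seen in training are simply ignored, which is one common convention among several.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior score."""
        # Words never seen in training are dropped (an assumption of this sketch).
        words = [w for w in text.lower().split() if w in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                # Laplace-smoothed log likelihood of the word given the label
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "great pasta and friendly waiter",
    "the pizza was delicious",
    "dinner menu and dessert were excellent",
    "the doctor was kind and the nurse helpful",
    "short wait for my appointment with the doctor",
    "the clinic nurse took my blood pressure",
]
labels = ["restaurant"] * 3 + ["doctor"] * 3

model = NaiveBayes().fit(train_texts, labels)
print(model.predict("delicious pasta for dinner"))             # restaurant
print(model.predict("the nurse called about my appointment"))  # doctor
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters once documents get longer than these toy snippets.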
[225004940010] |naive bayes knows restaurants better than 5,000 mechanical turks [225004940020] |Yelp recently sponsored a bake-off between a Naive Bayes classifier and workers on the crowdsourcing site Mechanical Turk. [225004940030] |The task was classifying web sites according to their business category (e.g., is it a restaurant or a doctor's office?). [225004940040] |The classifier beat the Turkers handily: [225004940050] |Money quote: In almost every case, the algorithm, which was trained on a pool of 12 million user-submitted Yelp reviews, correctly identified the category of a business a third more often than the humans. [225004940060] |In the automotive category, the computer was twice as likely as the assembled masses to correctly identify a business. [225004940070] |There are a variety of qualifications (why did 99% of Turkers who applied for the task fail the basic test? [225004940080] |ESL issues, perhaps?). [225004940090] |But it's an interesting result. [225004940100] |HT kdnuggets [225004950010] |jobs for linguists [225004950020] |As the economy slowly starts to wake, I hope and expect to see more jobs like this one popping up, where general linguistics skills are being sought by innovative tech companies (these were a dime a dozen in the glory days of the 90s tech boom). [225004950030] |Were I a bit younger, and less well paid, I'd probably consider applying myself. [225004950040] |We are seeking a Linguist interested in joining a rapidly growing organization. [225004950050] |The Linguist will work closely with our NLP Team in researching and developing lexica and grammars specific to various languages (“Language Packs”) that will be used for various NLP tasks. [225004950060] |She/he will be expected to contribute substantive insight/action with regard to developing language packs and must have a keen eye for understanding the end-user experience.
[225004950070] |Specific responsibilities include: [225004950080] |- Research specific languages for their lexical, morphological, and grammatical structures [225004950090] |- Develop original lexicons and reformat acquired lexicons [225004950100] |- Create grammatical rules using the research done above or other sources [225004950110] |- Analyze results from the system for mistakes and plan for improvement [225004950120] |- Willingness to focus research and development of Language Packs on meeting the end-user’s needs [225004950130] |If you're a linguist interested in a non-academic career, you could do worse than apply here. [225004950140] |And for the record, I have no association with this company, have never worked for them, get nothing from posting this, but I do know one of their employees (we went to grad school together). [225004960010] |annals of unnecessary censorship, literary canon edition [225004960020] |Upcoming NewSouth 'Huck Finn' Eliminates the 'N' Word. [225004960030] |Twain scholar Alan Gribben and NewSouth Books plan to release a version of Huckleberry Finn, in a single volume with The Adventures of Tom Sawyer, that does away with the "n" word (as well as the "in" word, "Injun") by replacing it with the word "slave." [...] [225004960040] |"What he suggested," said La Rosa, "was that there was a market for a book in which the n-word was switched out for something less hurtful, less controversial. [225004960050] |We recognized that some people would say that this was censorship of a kind, but our feeling is that there are plenty of other books out there—all of them, in fact—that faithfully replicate the text, and that this was simply an option for those who were increasingly uncomfortable, as he put it, insisting students read a text which was so incredibly hurtful." [225004960060] |I'm curious about this notion of replacement as an "option" for two reasons. 
[225004960070] |First, it reminds me of Ted Turner's infamous and ill-fated 1980s colorization project, whereby he went back and artificially colorized black-and-white movies. [225004960080] |As I recall, Turner also spoke of it as an "option", but it failed miserably as a cultural movement. [225004960090] |Second, now that eReaders are becoming commonplace, I wonder if publishers will begin to offer sanitized versions of books as an option. [225004960100] |I don't have an eReader, so maybe this is already available, but I could imagine a filter that you click on and, magically, Henry Miller's Tropic of Cancer becomes a weirdly different novel. [225004960110] |HT kottke [225004970010] |adults process language in a baby way! [225004970020] |Do babies process language in a "grown-up" way? [225004970030] |First, read this from UCSD: [225004970040] |Babies, even those too young to talk, can understand many of the words that adults are saying –and their brains process them in a grown-up way. [225004970050] |Combining the cutting-edge technologies of MRI and MEG, scientists at the University of California, San Diego show that babies just over a year old process words they hear with the same brain structures as adults, and in the same amount of time. [225004970060] |Moreover, the researchers found that babies were not merely processing the words as sounds, but were capable of grasping their meaning [emphasis added]. [225004970070] |It certainly is interesting that infant and adult lexical processing may be similar, but why couch it in asymmetrical phrasing? [225004970080] |Given the facts as this press release states them, couldn't we just as well say that adults process language in a baby way? [225004970090] |This wouldn't get any press attention, though, would it? [225004970100] |Or worse, it would be mocked. [225004970110] |The author of the press release, Debra Kain, is referred to as a spokesperson for the UCSD Medical Center in this article.
[225004970120] |But it's not clear she consulted Jeff Elman, a very well-respected cognitive scientist who participated in the research. [225004970130] |I'm not sure how comfortable he would have been with the somewhat excitable language. [225004980010] |biggest linguistics story of 2010? [225004980020] |I have nothing but respect and admiration for Erin McKean, CEO and co-founder of the awesome Wordnik project as well as the person who has given by far the single greatest lingo TED talk ever; nonetheless, I take exception to her most recent column in the Boston Globe, titled The year in language, an article about the best and worst language stories of 2010. [225004980030] |She notes many worthy events, yet... [225004980040] |With no offense meant, I can say that I was shocked, SHOCKED!, to discover that no mention whatsoever was made of what I consider to be the single most important and shocking linguistics-related story of 2010: the revelation that Harvard's Marc Hauser fabricated data regarding rule learning by monkeys. [225004980050] |For years, Hauser has posed as a giant in the Chomsky camp and created an Ivy League cottage industry based on his research. [225004980060] |2010's revelations of his still-unclear-yet-nonetheless-obvious forgery are a shock wave whose full power and ramifications have yet to be understood. [225004980070] |Plus, it was the Boston Globe itself, the paper Erin publishes in, that broke the original story. [225004980080] |Language Log's extensive discussions of the Hauser story can be found here. [225004990010] |replace QWERTY with little circles? [225004990020] |Android users can look forward to a new typing layout by 8pen, designed specifically for one-handed typing on handheld devices.
[225004990030] |There have long been alternatives to the traditional QWERTY layout, but this one replaces keys with hand motion: rather than landing your finger on the letter you want to type (the conceptual foundation of most keyboard designs), you trace little circles on the screen to reach different letters. [225004990040] |In the words of the horse from Ren and Stimpy, no sir, I don't like it. [225004990050] |Why not? [225004990060] |While inefficient and clumsy, the classic idea of touching the letter you want is fundamentally natural and clear. [225004990070] |Any child or lazy adult can grasp it immediately. [225004990080] |The little-circles idea creates an artificial and unnatural interface that puts you multiple steps away from what you want. [225004990090] |I'm not trying to make circles; I'm trying to type a frikkin k. [225004990100] |I'm sure with practice anyone could get good at this, but I don't wanna practice typing, for frik's sake! [225004990110] |That's why I've been a clumsy hunt-and-pecker for 30 years with the damned QWERTY. [225004990120] |I could have practiced typing on this damn thing too, but I didn't, for the same reason I'm not gonna practice the little circles: I'm lazy. [225004990130] |But at least with keys I can just touch the letter I want and get it. [225004990140] |It's clear and obvious. [225004990150] |I'm sure the little circles would drive me mad. [225005000010] |The Psychological Functions of Function Words [225005000020] |Here is Chung & Pennebaker's 2007 paper on function words, which crucially relies on Pennebaker's LIWC data: The Psychological Functions of Function Words (pdf). [225005000030] |I have long felt that function words have been wrongly ignored by computational linguists and SEO specialists. [225005000040] |While stop lists have sped up processing considerably, they have also wiped out huge amounts of semantically meaningful data.
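To make the stop-list complaint concrete, here is a small Python sketch that strips a stop list from two sentences and compares what survives. The stop list is a short hand-picked sample (an assumption for illustration; real toolkits ship their own, longer lists), but typical English stop lists do contain pronouns like I and we and negators like not, which is exactly the function-word signal this kind of research depends on.

```python
# A tiny, hand-picked stop list for illustration only. Real stop lists
# (such as those bundled with NLP toolkits) are longer, but they typically
# include pronouns, articles, copulas, and negators like these.
STOP_WORDS = {"i", "we", "you", "they", "not", "no", "the", "a", "an",
              "was", "were", "is", "are", "to", "of", "and", "it"}


def content_words(sentence):
    """Lowercase, strip basic punctuation, and drop stop-list words."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


a = "I was not happy with the results."
b = "We were happy with the results."
print(content_words(a))  # ['happy', 'with', 'results']
print(content_words(b))  # ['happy', 'with', 'results']
```

After filtering, a negative first-person-singular complaint and a positive first-person-plural report come out identical: the person, the number, and the negation have all been thrown away as "noise."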
[225005000050] |Nonetheless, I also feel that Pennebaker's LIWC corpus is not as transparent or as comprehensive as I would prefer it to be. [225005010010] |Do rich families talk to their kids more than poor families? [225005010020] |Are children in professional families talked to three times as much as children in welfare families? [225005010030] |That's the underlying assumption behind a new program at Bellevue Hospital designed to coach "poor families on how to talk to their infant children, encouraging more interaction." [225005010040] |At least, that's how the Huffington Post wants you to think about this story: [225005010050] |University of Kansas graduate student Betty Hart and her professor, Todd Risley, wanted to figure out the cause of the education gap between the rich and poor. [225005010060] |So, they targeted early education and headed a study that recorded the first three years of 40 infants' lives. [225005010070] |The conclusion? [225005010080] |Rich families talk to their kids more than poor families. [225005010090] |Pretty impressive, huh? [225005010100] |Sounds cutting edge, right? [225005010110] |With a little searching I discovered the following: [225005010120] |
  • Betty Hart was a grad student at KU in the 1960s.
  • [225005010130] |
  • The research data for this study was collected in the early 1980s.
  • [225005010140] |
  • The paper publishing these results was published in 1995.
  • [225005010150] |I have no problem with the common sense underlying these notions: talking to babies a lot helps them achieve higher academic success later in life. [225005010160] |Good advice all around, no doubt. [225005010170] |But I'm suspicious of several assumptions about the findings of the original paper. [225005010180] |From Alix Spiegel: [225005010190] |According to their research, the average child in a welfare home heard about 600 words an hour while a child in a professional home heard 2,100. [225005010200] |"Children in professional families are talked to three times as much as the average child in a welfare family," Hart says [emphasis added]. [225005010210] |Hearing words in your environment, being talked to, and hearing child-directed speech are three different things that need to be distinguished. [225005010220] |All I have are secondary sources, not the 1995 book (Spiegel's article is the most thorough), so I can't tell how the data were coded or what they looked for (did they make the above three distinctions?). [225005010230] |But more to the point is the contemporary rush to paint these old findings as a rationale for new programs aimed at poor parents, as if being poor somehow makes your language use wrong. [225005010240] |It strikes me as convoluted logic to take a 15-year-old book (based on data more than 20 years old) and decide that poor parents need linguistic intervention. [225005010250] |Exactly how much grant money did Dr. Mendelsohn spend on this program? [225005010260] |Even if the 3-1 ratio holds true (I suspect it would not under close scrutiny), what other factors might be affecting this? [225005010270] |It struck me that people with basically good intentions took a small amount of science out of context and used it to reinforce class stereotypes and class pressure.
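A quick check of the two figures Spiegel reports shows where the "three times" summary comes from. Using only the numbers quoted above (roughly 600 versus 2,100 words an hour), the ratio is actually closer to three and a half, and the absolute gap is 1,500 words an hour. Both figures count words heard, not necessarily words addressed to the child.

```latex
\frac{2100\ \text{words/hour}}{600\ \text{words/hour}} = 3.5,
\qquad
2100 - 600 = 1500\ \text{words/hour}
```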
[225005020010] |doggie do do at the HuffPo [225005020020] |The Huffington Post is resetting the bar for astoundingly stupid science reporting: they report on a dog, Chaser, who has been trained to accurately fetch over 1,000 toys by the sound of their names and conclude that the dog's abilities, wait for it, place her at an intelligence level equivalent to a three-year-old human child! [225005020030] |My oh my, their view of the cognitive ability of 3-year-olds is as depressing as it is profoundly wrong. [225005020040] |Sorry, 3-year-old humans can do more than make one-to-one correspondences between sounds and objects. [225005020050] |They can, for example, recognize that the sound swing can mean an object with a seat attached to ropes OR the action you perform when you move your body back and forth on that thing with the ropes; they can watch TV and follow plot developments, ... sigh, I mean fuck it, it's not worth debunking ... [225005020060] |UPDATE: Sean at Replicated Typo reviews the original research involving Chaser. [225005040010] |how distinctive is app store? [225005040020] |Microsoft is arguing that Apple cannot trademark the term app store because it is a generic term. [225005040030] |"An 'app store' is an 'app store'," Russell Pangborn, Microsoft's associate general counsel, said, according to the BBC. [225005040040] |"Like 'shoe store' or 'toy store', it is a generic term that is commonly used by companies, governments and individuals that offer apps." [225005040050] |A commenter at Hacker News begs to differ: [225005040060] |Ngram data shows no usage of "App Store" or "app store" from the time of 1800 to 2008. [225005040070] |I was suspicious of this, but using the terms "app,store" separately produced lots of data points. [225005040080] |My tentative hypothesis is that Ngram is using data that existed before the App Store went public and thus will not show up in Ngram.
[225005040090] |I'm no trademark expert, but the basic idea, as Wikipedia defines it, is distinctiveness: A trademark may be eligible for registration, or registrable, if amongst other things it performs the essential trademark function, and has distinctive character. [225005040100] |Registrability can be understood as a continuum, with "inherently distinctive" marks at one end, "generic" and "descriptive" marks with no distinctive character at the other end, and "suggestive" and "arbitrary" marks lying between these two points. [225005040110] |First, I used BYU's Corpus of Contemporary American English and found an instance from 2009 of "app store" being used to describe Zune's product: Oh, the Zune has an app store, all right. [225005040120] |As of today, there are exactly nine programs in the Zune App Store. [225005040130] |A quick Google search reveals that it commonly gets applied to non-Apple products as well: Yep, Amazon Launching Their Own App Store For Android Too. [225005040140] |While it may be the case that Apple introduced the term in 2008, it seems to have expanded to generic use in less than a year and now gets used at least semi-regularly for non-Apple products. [225005040150] |I'm not an Apple user myself, and my own reading of app store is definitely generic. [225005040160] |It does not distinctly mean Apple's product at all, to me. [225005040170] |I have no clue if a court would agree. [225005050010] |true grit phonological ambiguity [225005050020] |Thanks to Jeff Bridges' now-infamous mumbling performance, the clever folks at College Humor give True Grit a version of the lip-reading treatment that Star Trek received not too long ago. [225005050030] |Apologies for the weird embedding; I don't know how to fix it (I just pasted the embed code into the Blogger HTML with no option to adjust size)...and yes, I'll have some more of that woop woop, please...
[225005060010] |god awful is an odd phrase [225005060020] |I used the phrase god awful in a comment at Language Log and it occurs to me that it's an odd little creature. [225005060030] |From the OED*: [225005060040] |Pronunciation: /ˌgɒdˈɔːfʊl/ Forms: Also God awful, Godawful. Etymology: extremely unpleasant. [225005060070] |(In quot. [225005060080] |1878 the sense is ‘impressively large’.) [225005060090] |1878 J. H. Beadle Western Wilds xxxvii. [225005060100] |611 Put thirty acres into wheat, and went to work with a hurrah in 1874 to make a God-awful crop. [225005060110] |1897 C. M. Flandrau Harvard Episodes 88 Ellis is such a God awful fool. [225005060120] |1930 W. S. Maugham Breadwinner ii. 124 Your affairs are in a god-awful mess. [225005060130] |1946 ‘S. Russell’ To Bed with Grand Music i. 14 Listen to the most godawful programmes on the radio. [225005060140] |1958 R. Graves in Times Lit. [225005060150] |Suppl. [225005060160] |15 Aug. p. x/4 The credible and vivid story that any context (red-brick, yellow-brick, or otherwise God-awful) offers. [225005060170] |1959 P. McCutchan Storm South iv. 63, I heard the most God-awful racket above my head. [225005060180] |The meaning is derived from using god as an intensifier like very. [225005060190] |Fine, I get this analysis; it makes sense. [225005060200] |But is god ever used in any other construction to intensify a negative quality like awful? [225005060210] |This is a case where corpora are not terribly useful: the instances of god are so frequent, and so frequently NOT in this kind of construction, that it's difficult to discover automatically. [225005060220] |I could go all qualitative and just read a million phrases with god in them, but that would take a really long time and still have a low probability of success. [225005060230] |HT to the OED for making their site freely available this month! [225005060240] |Use name/password trynwoed/trynewoed.
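One rough way to shrink the pile of god hits before reading them by hand is sketched below in Python. It is only an illustration under stated assumptions: it looks at the single word right after god (spaced, hyphenated, or fused, matching the spellings in the OED quotations) and keeps it only if it appears on a hand-picked list of negative adjectives. The adjective list and the function name are hypothetical, not part of any corpus tool, and the result is a shortlist to read, not an answer; anything off the list (or separated from god by another word) is simply missed.

```python
import re
from collections import Counter

# Hypothetical, hand-picked list of negative adjectives to test as hosts for "god"
NEGATIVE_ADJS = {"awful", "horrible", "terrible", "dreadful", "ugly"}

# "god" followed by an optional hyphen or space, then one word (case-insensitive),
# so "god awful", "god-awful" and "godawful" all yield "awful"
PATTERN = re.compile(r"\bgod[- ]?([a-z]+)\b", re.IGNORECASE)

def god_intensifier_candidates(text, adjectives=NEGATIVE_ADJS):
    """Count words immediately following 'god' that appear in the adjective list."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        word = match.group(1).lower()
        if word in adjectives:  # drops e.g. "god it", or "dess" from "goddess"
            counts[word] += 1
    return counts

sample = ("Your affairs are in a god-awful mess. "
          "Listen to the most godawful programmes on the radio. "
          "I heard the most God-awful racket above my head. "
          "Thank god it is over.")
print(god_intensifier_candidates(sample))  # Counter({'awful': 3})
```

Swapping in a bigger adjective list (say, one pulled from a part-of-speech-tagged corpus) is the obvious next step, but every hit would still need a human to check whether god is really acting as an intensifier.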
[225005070010] |the most difficult linguistics sentence ever? [225005070020] |Imagine I give you the sentence template that follows: [225005070030] |
  • If speakers omit X to avoid Y, optional Z should be less likely if W.
[225005070040] |Question: What X, Y, Z and W could possibly make that sentence EASIER to understand? [225005070050] |For no particular reason other than (that) I love linguistics and will read any free article that catches my fancy, I've been reading Florian Jaeger's Phonological Optimization and Syntactic Variation: The Case of Optional that. [225005070060] |Submitted for Proceedings of 32nd BLS (pdf). [225005070070] |I have nothing but respect for Jaeger as a linguist* and this is a very interesting paper that I have enjoyed reading**. [225005070080] |But flo*** has a knack for producing very difficult-to-read sentences. [225005070090] |Here's the original that produced the template above: [225005070100] |If speakers omit optional that to avoid segmental OCP violations with the immediately preceding or following segment, optional that should be less likely if the segments was to share some articulatory feature with the adjacent segment of that. [225005070110] |It actually got worse WITH context, right? [225005070120] |And I read the actual paper, with all kinds o' context. [225005070130] |And I still had to re-read that sentence many, many times. [225005070140] |I'm still not sure I understand it. [225005070150] |I may have to whip out PowerPoint, a laser pointer, and a flashlight before I figure it out for sure. [225005070160] |Now, I'm prepared to admit that the three pints of BBC Bourbon Barrel Stout at Galaxy Hut may have influenced my critique ... [225005070170] |...but not entirely for the worse. [225005070180] |If I ever get around to typing up my awesome and prodigious commentary, it might make a great blog post ... but don't hold your breath.
[225005070190] |I have a stack of linguistics articles I've read and reviewed over the last 12 months, and yet somehow I just never get around to typing up my truly awesome comments (including an in-depth discussion of flo's partner-in-crime Peter Graff's Longitudinal Phonetic Variation in a Closed System; I got mad comments on that one). [225005070200] |Maybe I should have called this blog The Lazy Linguist? [225005070210] |*I've never met the guy, so maybe he's a bastard in person, I dunno, I hope not... **Not least because it has some tangential connection to my somewhat defunct dissertation research. ***Hey, he calls himself that on his site... [225005080010] |oh snap! daume talkin trash 'bout "stupid" penn tree bank [225005080020] |Hal Daume at his excellent NLPers blog is wondering aloud about parsing algorithms doing "real" syntax: [225005080030] |One thing that stands in our way, of course, is the stupid Penn Treebank, which was annotated only with very simple transformations (mostly noun phrase movements) and not really "deep" transformations as most Chomskyan linguists would recognize them [emphasis added]. [225005080040] |Oh no he di'nt! [225005080050] |[UPDATE: Hal responds thoughtfully in the comments and properly corrects my misunderstandings of his post.] [225005080060] |It's certainly fair to say that the Penn Treebank is not annotated for everything. [225005080070] |Sure. [225005080080] |But show me the perfect resource and I'll let you throw all the stones you want. [225005080090] |More to the point, once you get beyond deciding what the basic chunks are (NPs, VPs, PPs, etc.), there's little agreement on what is and what is not a "real" syntactic thing. [225005080100] |In order to annotate anything above this level, you have to choose a theoretical camp to park your tent in. [225005080110] |You have to take sides. [225005080120] |Daume is happy to be a Chomskyan. [225005080130] |He's taken his side. [225005080140] |Good for him.
[225005080150] |In order to annotate Daume's beloved deep transformations, one must first admit such things exist. [225005080160] |I do not. [225005080170] |And if Daume started annotating the Penn Treebank with such things, I wouldn't care. [225005080180] |I would argue he is wasting his time chasing unicorns. [225005080190] |Daume may believe that Chomskyan theory is "real" syntax, but I do not. [225005080200] |Nor do most linguists (if you surveyed all linguists throughout the world, yes, I do believe a majority would disagree with the statement "I believe in Chomskyan deep structure"). [225005080210] |UPDATE: Daume's comments and his responses are well worth reading. [225005090010] |worst word play of the year? [225005090020] |Your call... [225005100010] |like wikipedia with a voice? [225005100020] |It can be difficult to get a feel for what some tech start-ups are going for. [225005100030] |This demo of Qwiki at a TechCrunch event asks us to think of information as an experience. [225005100040] |I'm pretty sure the voice is synthesized because of some odd prosody and the weird way Yelp is pronounced (oh, and the unlikelihood that they could pre-record all the possible narration ... yeah, that too). [225005100050] |At the end, all I could think of was "it's like Wikipedia with a voice..." [225005110010] |the perils of translation: does und mean well? [225005110020] |I'm watching the truly powerful 2009 Oscar-nominated German film The White Ribbon on Netflix. [225005110030] |Even after just a few minutes, it has grabbed me and impressed me with its simplicity and power, in the style of many great films. [225005110040] |Hollywood used to make films like this. [225005110050] |Films that mattered. [225005110060] |Films that taught deep truths about what it means to be human. [225005110070] |Films like "Inherit The Wind", "Guess Who's Coming To Dinner", and "To Kill A Mockingbird".
[225005110080] |Now Hollywood makes three Jennifer Aniston rom-coms a year and panders to fanboys... but I digress ... [225005110090] |My German is pretty rusty, but the film's dialogue is simple enough (in a good way) for me to catch most of it even without the subtitles, and that is exactly the source of the linguistic point I want to discuss. [225005110100] |In one early scene, the narrator, a teacher, recounts an incident involving himself* and a student in which the student was endangering himself, and the teacher demands that the student explain his actions. [225005110110] |When he fails to get a proper response, he repeatedly says Und? ... [225005110120] |Und? ... [225005110130] |German und is easily translated as and, but the film's translators chose to use the English word well instead. (screen grab from Netflix) As a native speaker of English, I can see the reasoning behind well, yet I must say, it's equally plausible to use and in that situation too, maybe even more so. [225005110140] |The use of well in English would suggest a certain formality that the translator felt was proper, but it also makes me, as an English speaker, feel a bit awkward, like I'm being fed an anachronism. [225005110150] |Perhaps that's appropriate for the movie; I'm not sure. It just struck me as an interesting linguistic choice. [225005110160] |It's a nice example of the beautiful ambiguity of lexical items, really. [225005110170] |For example, just a few scenes later the teacher encounters Eva and asks her who she is and about what he's heard about her, namely that she's a new nanny in town, and her response is und, but it is translated into English as so: [225005110180] |(screen grab from Netflix) Again, as a native speaker of English, I can "get" the translation, but still, I'd be perfectly happy with and in both. [225005110190] |I've never been a translator, and I have much respect for the difficult job professional translators do navigating these treacherous waters.
[225005110200] |I don't mean to second-guess. [225005110210] |Rather, it strikes me as an interesting point of discussion. [225005110220] |*Why can't I say hisself? [225005110230] |Oh, where are you, Jeff Runner, when I need you! [225005120010] |Obama's State Of The Union and word frequency [225005120020] |In anticipation of President Obama's 2011 State Of The Union speech tonight, and the inevitable bullshit word frequency analysis to follow, I am reposting my reaction to last year's SOTU, in the hope that maybe, just maybe, some political pundit might be slightly less stupid than they were last year ... sigh ... here's to hope ... [225005120030] |(cropped image from Huffington Post) It has long been a grand temptation to use simple word frequency* counts to judge a person's mental state. [225005120040] |As with Freudian slips, the assumption is that this will give us a glimpse into what a person "really" believes and feels, deep inside. [225005120050] |This trend came and went within linguistics when digital corpora were first being compiled and analyzed several decades ago. [225005120060] |Linguists quickly realized that this was, in fact, a bogus methodology when they discovered that many (most) claims or hypotheses based solely on a person's simple word frequency data were easily refuted upon deeper inspection. [225005120070] |Nonetheless, word of this technique's weakness never quite reached the outside world, and word counts continue to be cited, even by reputable people, as a window into the mind of an individual. [225005120080] |Geoff Nunberg recently railed against the practice here: The I's Don't Have It. [225005120090] |The latest victim of this scam is one of the blogging world's most respected statisticians, Nate Silver, who performed a word frequency experiment on a variety of U.S. presidential State Of The Union speeches going back to 1962 (here).
[225005120100] |I have a lot of respect for Silver, but I believe he's off the mark on this one. [225005120110] |Silver leads into his analysis talking about his own pleasant surprise at the fact that the speech demonstrated "an awareness of the difficult situation in which the President now finds himself." [225005120120] |Then, he justifies his linguistic analysis by stating that "subjective evaluations of Presidential speeches are notoriously useless. [225005120130] |So let's instead attempt something a bit more rigorous, which is a word frequency analysis..." [225005120140] |He explains his methodology this way: [225005120150] |To investigate, we'll compare the President's speech to the State of the Union addresses delivered by each president since John F. Kennedy in 1962 in advance of their respective midterm elections. [225005120160] |We'll also look at the address that Obama delivered -- not technically a State of the Union -- to the Congress in February, 2009. [225005120170] |I've highlighted a total of about 70 buzzwords from these speeches, which are broken down into six categories. [225005120180] |The numbers you see below reflect the number of times that each President used term in his State of the Union address. [225005120190] |The comparisons and analysis he reports are bogus and at least as "subjective" as his original intuition. [225005120200] |Here's why: [225005120210] |
  • We don't know what causes word frequencies.
  • [225005120220] |We don't know what the effects of word frequencies are.
  • [225005120230] |His sample is skewed.
  • [225005120240] |Silver invented categories that have no cognitive reality.
  • [225005120250] |There are good alternatives.
[225005120260] |We don't know what causes word frequencies. Why does a person use one word more than another? [225005120270] |WE. [225005120280] |DON'T. KNOW. [225005120290] |I understand the simple intuition that this should mean something, but no one actually knows what it means. [225005120300] |We simply don't understand the workings of the brain, or of the speech production system, well enough to answer this question (despite these guys' suspect claims). [225005120310] |So we are left with pure intuition (which is generally bad in the cognitive sciences because we don't think the way we think we do). [225005120320] |So, again, this methodology is not "objective" as Silver claims (not the simplistic way he implemented it, anyway). [225005120330] |We don't know what the effects of word frequencies are. The corollary to #1: When a person hears another person use one word more than another, what effect does it have? [225005120340] |WE. [225005120350] |DON'T. KNOW. [225005120360] |Same reasons as above. [225005120370] |This remains the realm of intuition and guesswork. [225005120380] |His sample is skewed. [225005120390] |While I understand that to the layperson the set of SOTU speeches seems like a coherent category to analyze, it is in fact a linguistically incoherent grouping, because these sorts of speeches are constructed slowly, painfully, over time, by teams of individuals, NOT spoken extemporaneously by a single individual. [225005120400] |Silver could spin this as a positive in the sense that the speeches represent presidential administrations as a whole, but this makes the "evidence" (i.e., word frequency) extremely messy. [225005120410] |What factor is driving the frequency of a particular word in a speech? [225005120420] |No clue. [225005120430] |The variables are numerous and unknown (two bad things for "rigorous" analysis).
[225005120440] |Having such a messy data set makes interpretation nearly impossible even if we DID know the answers to #1 and #2 (which we don't). [225005120450] |Silver invented categories that have no cognitive reality. [225005120460] |Silver's 70 buzzwords are shoved into six arbitrary categories. [225005120470] |Linguists have been keen on word categories for ... well ... let's say at least 2,500 years. [225005120480] |This we care about. [225005120490] |Deeply. [225005120500] |William Labov famously wrote, "If linguistics can be said to be any one thing it is the study of categories" (full text here). [225005120510] |In the last few decades, linguists have expanded their repertoire for analyzing lexical categories with psycholinguistic, cognitive linguistic, and computational linguistic tools and methods. [225005120520] |None of these were employed by Silver in determining whether or not his six categories have any coherence or cognitive reality. [225005120530] |He just made them up. [225005120540] |How is this MORE objective than intuition? [225005120550] |There are alternatives. Let me be clear. [225005120560] |I am a fan of corpus linguistics. [225005120570] |Counting words is good (as Nunberg says, and as many linguists say; [225005120580] |we like this). [225005120590] |But this is just the beginning of a long road of analysis. [225005120600] |It must be done in a systematic and sophisticated way to be of any use. [225005120610] |There are numerous software tools and methodologies that Silver could have made use of that would have given him a more nuanced analysis. [225005120620] |There are whole books that teach people how to do this, such as Corpora in Cognitive Linguistics (just one of many). [225005120630] |Again, I have a lot of respect for Silver and his advanced skill set in stats.
[225005120640] |I would love to see Silver bring the full weight of his skills to bear on linguistic analysis (as I've said, every linguist should study math and stats), but this experiment falls far short of the mark and he should know better. [225005120650] |To a certain extent, this critique is unfair to Silver because he seemed implicitly to acknowledge many of these deficits. [225005120660] |All he wanted to do was get a more objective picture of what the SOTU speech meant and how it fits into a bigger picture. [225005120670] |On the other hand, it's a fair critique because he put in a lot of effort and posted the results to his popular and influential blog (yes, I note my blog is neither); one ought not to waste such effort. [225005120680] |There is the glaring risk that his popularity and influence as a statistician will further strengthen the popular but wrong notion that simple word counts are somehow meaningful. [225005120690] |This would be bad. [225005120700] |*By "simple word frequency counts" I mean counting the words a person uses (say, in a speech) without counting anything else or adding any other data to give the frequency counts meaning and context.
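To make the footnote concrete, here is a minimal Python sketch contrasting a simple count with the same count scaled by speech length (occurrences per 1,000 words). The two "speeches" are invented toy strings, not real SOTU text, and the function names are hypothetical. Normalizing is only the most basic piece of context one can add: in the toy data it reverses which speech seems to lean harder on jobs, but it still says nothing about what causes a word's frequency or what effect it has, which is the heart of the critique.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def raw_count(text, word):
    """A 'simple' frequency: how many times word occurs, with no context at all."""
    return Counter(tokenize(text))[word]

def per_thousand(text, word):
    """The same count scaled by text length: occurrences per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)

# Toy "speeches", invented for illustration only
short_speech = "jobs jobs and more jobs"       # 5 tokens, 3 of them "jobs"
long_speech = "jobs " * 6 + "the " * 994        # 1,000 tokens, 6 of them "jobs"

print(raw_count(short_speech, "jobs"), raw_count(long_speech, "jobs"))        # 3 6
print(per_thousand(short_speech, "jobs"), per_thousand(long_speech, "jobs"))  # 600.0 6.0
```

By raw counts the long speech "uses jobs twice as much"; per 1,000 words the short one uses it a hundred times as densely. Neither number, on its own, tells us anything about what the speaker "really" believes.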