[225002970010] |The Audrey Fino Failure [225002970020] |Steven Levy has a new article out on Google's search algorithm (HT Boing Boing). [225002970030] |It has a brief discussion of the problem of parsing n-grams (e.g., how do you know what Times goes with in "New York Times" vs "New York Times Square"). [225002970040] |I spent a brief time working with a person name parsing group and they were just branching out into the business name parsing field while I was there, so I know how challenging this is (you noted how I just helped you with italics, right, hehe). [225002970050] |Unfortunately Levy's article is actually quite a lightweight puff piece of the "gee whiz, ain't Google swell" variety. [225002970060] |Anyone who has spent some time in a morphology class or computational linguistics 101 course will likely find it simplistic at best. [225002980010] |"Gay" vs "Homosexual" [225002980020] |Chris Good at The Atlantic contributes to the discussion that American opinion poll results about DADT are strongly tied to the wording used to describe the sexual orientation of the individuals affected. [225002980030] |Money quote: [225002980040] |Marc has noted that there's a nomenclature issue at play: gays in the military poll a lot better as "gays" in the military, while people don't seem to like "homosexuals" serving as much. [225002980050] |The above phenomenon in CNN's results probably furthers that point--personal opposition to "homosexual relationships" doesn't mean opposition to letting "people who are openly gay or lesbian" serve--but it's hard to see CNN's results not expressing a willingness, on the part of some, to put aside personal moral feelings in their support of a Don't Ask, Don't Tell repeal. [225002980060] |Language Log recently discussed this same issue: Words and opinions. [225002980070] |Nate Silver has also discussed the issue: Republicans are Conservative -- but are they this Conservative? 
[225002990010] |The Linguistics Of Urine [225002990020] |A nice discussion of the origin of the phrase piss poor over at The Grammarphobia blog. [225002990030] |Money Quote: [225002990040] |The word "piss" here is "an intensifier, usually implying excess or undesirability," according to the Oxford English Dictionary. [225002990050] |The usage originated in the United States in the mid-20th century. [225003000010] |"Welfare" vs "Aid to The Poor" [225003000020] |More discussion of how wording affects polling results. [225003000030] |Unfortunately, as Liberman has pointed out, none of this addresses the fundamental question of why. [225003000040] |Why do the words "homosexual" and "welfare" cause more negative polling results than "Gay Men & Lesbians" and "aid to the poor"? [225003000050] |My own weak attempt at a first pass answer (in the LL comments) is that "in both cases cited ("Homosexuals" vs. "Gay Men & Lesbians" & "welfare" vs. "caring for the poor"), the first, seemingly more controversial term is a single word and the second is a phrase. [225003000060] |It may be the case that we silly humans find it easier to attach strong emotional semantics to a single lexical item. [225003000070] |One could imagine a study that looked at the role syntactic heaviness plays in survey response." [225003000080] |It may also be the case that longer phrases are harder to categorize. [225003010010] |City of Languages Game [225003010020] |The Goethe Institute gets all 21st century on our asses: City of Languages (and yes, I'm alluding to It's Always Sunny in Philadelphia). [225003010030] |HT: Giselda dos Santos (via Twitter #linguistics) [225003020010] |Neuro-gestures [225003020020] |Are gestures and words all the same to the brain? [225003020030] |According to this article, yes. [225003020040] |I haven't had time to review it yet, but it's a tantalizing morsel. [225003020050] |Of course, the fact that it's a Business Week article does not bode well. [225003020060] |We'll see. 
[225003020070] |Money Quote (for now): [225003020080] |But new research, co-authored by Patrick J. Gannon, a physical anthropologist and chairman of basic science education at Hofstra University School of Medicine, suggests that the brain doesn't really care how it receives information. [225003020090] |A waving hand up in the air to summon a waiter for "check please" works just fine. [225003020100] |The language areas of the brain -- the highly evolved frontal and temporal lobes -- process simple gestures with the same snippet of tissue that's used to hear the prose of Shakespeare, according to Gannon's study. [225003030010] |How Many Linguists Are There? 5379. [225003030020] |Previously, I ranted, just a bit, about the suggestion that there are more linguists than languages. [225003030030] |I guessed that, in fact, this may not be true. [225003030040] |Thanks to the LSA update email that was just sent out, I was able to follow up a bit. [225003030050] |That email referenced the results of the American Academy of Arts and Sciences’ survey of linguistics departments (pdf, it doesn't load every time, so repeated clicking might be warranted). [225003030060] |Table LN1 (below) gives an estimated 1630 faculty members in linguistics departments across the United States. [225003030070] |That strikes me as a fair base to start a back-of-the-napkin estimation of total linguists worldwide (I noted in my previous rant the problems with defining a linguist, but I'll take this survey as my authority for now). [225003030080] |Let the number games begin. [225003030090] |First, let's assume that this initial estimation is conservative. [225003030100] |I'll throw in another 10% to make up for that. [225003030110] |Let's assume there are about 1793 linguists in the US. [225003030120] |I think it's fair to assume there are about as many linguists in Europe (though you'd never know it by the poor rate at which American linguists cite Europeans, but that's another rant). 
[225003030130] |So that's another 1793 for Europe. [225003030140] |I'd wager that there are at best an equal number of linguists in the rest of the world as in either the States or Europe, so that's another 1793. [225003030150] |By this estimation, there are approximately 5379 linguists in the world (1793 x 3). [225003030160] |That sounds about right to me. [225003030170] |And if this is correct, then my original point stands: there are NOT more linguists than languages. [225003040010] |Qatar, Rhymes With Butter .. or Susan [225003040020] |UPDATE: Fivethirtyeight recently (12.02.2010) tweeted about ESPN broadcasters pronouncing Qatar and linked to this site with variations, neither of which rhyme with butter, hehe: howjsay.com. [225003040030] |I've noticed a lot of Americans pronouncing the country name Qatar as something like kutter* ([kʌɾɚ]). [225003040040] |This is particularly true of US military personnel serving in Iraq who are regularly traveling through there, but I also just heard it on teevee by ESPN's Chris Fowler referencing a tennis tournament in Doha, Qatar. [225003040050] |As a native speaker of American English, I don't think my default pronunciation assignment of the alphabetic string Q-a-t-a-r would be [kʌɾɚ]. [225003040060] |If I were presented with the string of Romanized letters Q-a-t-a-r for the first time, I think my first attempt at a pronunciation would be somewhat closer to American English guitar, something like [kʌt'ɑːʳ]**. [225003040070] |So why do so many Americans use this non-standard, may I say, deviant, pronunciation? [225003040080] |First, I suspect that the soldiers and sports announcers flowing through the region have little confidence in their own default reading of the Romanized letters, so they willingly mimic whoever says the name first, and then, heck, that's how you say it. [225003040090] |It's a nice example of follow-the-leader linguistics. 
[225003040100] |But why is the dominant American pronunciation of Qatar →[kʌɾɚ] to begin with? [225003040110] |It's a nice "proximate cause" question. [225003040120] |And my answer is??? [225003040130] |My disappointing answer is this: I'm not sure. [225003040140] |But my more intriguing answer is that it may have something to do with the fact that most Americans going through Qatar are military personnel. [225003040150] |And most military personnel are wont to use highly Americanized pronunciations of foreign names. [225003040160] |Almost pathologically so. [225003040170] |It's akin to trash talking, imho. [225003040180] |Let's face facts. [225003040190] |Throughout history, when people from one powerful region enter another, less powerful region, it is common for the dominant culture to emasculate the locals and trivialize the foreign culture. [225003040200] |And this emasculation is often linguistic. [225003040210] |Like a cultural domination machine, taking local words and names and running them through a linguistic meat grinder is a way to gain some control over a group. [225003040220] |I'm reminded of The Tick's use of intentional mispronunciations of the Thrakkorzog's name (thank gawd for YouTube and IMDB, right?). [225003040230] |Tick: It's your turn now, Thorace-bog. [225003040240] |Thrakkorzog: It's Thrakkorzog. [225003040250] |Thrakkorzog. [225003040260] |With a K. Tick: We're only serving humble pie, Whatchamazog. [225003040270] |Thrakkorzog: For the last time, it's... [225003040280] |Tick: Thorax-and-a-bog. [225003040290] |Four-yacks-and-a-dog. [225003040300] |Thrakkorzog: No. Tick: Ah, laxative-log. [225003040310] |Thrakkorzog: No, no, no. Tick: Sapsucker-frog. [225003040320] |Thrakkorzog: Thrakkorzog. [225003040330] |Tick: Susan? [225003040340] |Thrakkorzog: Now you're doing it on purpose. [225003040350] |How juvenile. [225003040360] |*Rhymes with American English butter ... 
[225003040370] |I'm not in love with [ɚ] as a representation, but I guess we need SOMETHING for that seriously weird thing that ends such words, so okay, [ɚ] it is. [225003040380] |I'll leave the intriguing issues of how best to represent rhotics to real phoneticists, far better skilled than I to tease apart the deep problems of intervocalic apical stops and rhotacized schwas. [225003040390] |**For the record, Wikipedia gives [ˈ ɑtˤɑɼ] as the pronunciation of Qatar, but ultimately this has little impact, if any, on my point. [225003040400] |Local and "official" pronunciations are simply ignored. [225003040410] |That's my point. [225003040420] |Note for HTML geeks: The amount of effort it would have taken me to fix the font disparity in this post was, simply put, not worth it. [225003040430] |Let it stand. [225003070010] |Grammar Myths Debunked [225003070020] |Motivated Grammar debunks ten grammar myths in honor of National Grammar Day. [225003070030] |Additionally, he honors the better spirit of linguistics by linking to "two papers that really made me fall in love with the field." [225003070040] |I thought that was a nice idea so I'll pick up his cue and link to a couple that got my own linguistics juices flowing back-in-the-day. [225003070050] |
  • Women, Fire, and Dangerous Things by George Lakoff
  • [225003070060] |
  • Everything that Linguists have Always Wanted to Know about Logic . . . [225003070070] |But Were Ashamed to Ask by James D. McCawley
  • [225003070080] |
  • Constructions: A Construction Grammar Approach to Argument Structure by Adele E. Goldberg
  • [225003070090] |Cheers! [225003080010] |Really! Really? Really. [225003080020] |Dear Netflix, is The Importance Of Being Earnest really a "Cerebral Drama"? [225003080030] |Really! [225003080040] |Really? [225003080050] |Really. [225003080060] |I'm not sure your recommender system ever actually read Oscar Wilde. [225003080070] |Pssst, for context on The Three "reallys" Construction see my comment on LL and repeated below: [225003080080] |The Three "reallys" Construction is strictly a spoken construction, as far as I can tell, so I can't do much of a search, but it's common in sitcoms and very commonly used in casual settings amongst friends when a person is faced with a situation that is (1) surprising and (2) intractable. [225003080090] |The three "reallys" provide a cascaded enunciation of the cline from genuine surprise to complete defeatism (i.e., the person realizes there's nothing they can do about the situation). [225003080100] |really 1 = interjection like wow expressing internal surprise; really 2 = interrogative, actually questioning the other person wrt the situation; really 3 = expression of defeat (i.e., I give up). [225003090010] |Linguistic Genocide? [225003090020] |I have ... um ... complicated beliefs about language death. [225003090030] |Nonetheless, I thought this was a list worth reading: 10 Modern Cases of Linguistic Genocide. [225003090040] |HT: celiabcn, via twitter #linguistics. [225003100010] |Hypercorrect Substitutions [225003100020] |I got my morning cuppa joe from a Green Beans Coffee shop at The Great Place on my first day of three weeks in wonderful central Texas. [225003100030] |While sipping ... okay, fine ... gulping my java I noticed the sleeve had the following quote: [225003100040] |Myself and many of the Naval sailors I work with have all had your coffee and love it. [225003100050] |The linguist in me couldn't help but notice that this was a beautiful example of hypercorrection(1). 
[225003100060] |I also couldn't help but wonder why the simple syntactic test of substitution isn't better understood by the average person. [225003100070] |It's such a simple idea, any 6th grader could master it. [225003100080] |The idea is that, when faced with a grammar choice you are unsure of, you simply ask yourself, what else could I put in its place and how does that help me make my choice? [225003100090] |So, here we have a complex subject (i.e., a subject with two NPs): [225003100100] |
  • X and Y have all had your coffee
  • [225003100110] |Where X refers to the speaker and Y = many of the Naval sailors I work with and the decision is what form of personal pronoun is appropriate for X. If we ask ourselves, what if this were a simple subject composed only of X, what form of the personal pronoun would we choose? [225003100120] |
  • I have had my coffee
  • [225003100130] |
  • Myself have had my coffee*
  • [225003100140] |At this point, the decision is quite obvious, isn't it? [225003100150] |But wait! [225003100160] |I'm no prescriptivist. Certainly I must have some descriptivist point to make, mustn't I? [225003100170] |But of course I have, dahling. [225003100180] |The point that is so interesting is that this choice is made automatically by our human language system in ways that are, really, quite baffling. [225003100190] |Exactly what is going on under the hood here is remarkably interesting. [225003100200] |Jeff Runner did lots of really cool psycholinguistics stuff with reflexives and Barbie dolls, once upon a time, but I couldn't find anything freely available (shame on you Jeff, gimme free stuff!). [225003100210] |But the point is that most people are confused by reflexives. [225003100220] |They're weird little beasts (and they only got weirder when Ken and Barbie got involved). [225003100230] |(1) Try as I might, the search function over at Language Log quite thoroughly stymied me when I searched for hypercorrection and hyper-correction, though I'm quite certain the folks at LL have posted about it a non-trivial number of times. [225003100240] |C'est la vie. [225003110010] |Is Language Death All That Bad? [225003110020] |John McWhorter echoes some of my previous musings ... [225003110030] |I like this guy: [225003110040] |Yet the going idea among linguists and anthropologists is that we must keep as many languages alive as possible, and that the death of each one is another step on a treadmill toward humankind’s cultural oblivion. [225003110050] |This accounted for the melancholy tone, for example, of the obituaries for the Eyak language of southern Alaska last year when its last speaker died. [225003110060] |That death did mean, to be sure, that no one will again use the word demexch, which refers to a soft spot in the ice where it is good to fish. 
[225003110070] |Never again will we hear the word 'ał for an evergreen branch, a word whose final sound is a whistling past the sides of the tongue that sounds like wind passing through just such a branch. [225003110080] |And behind this small death is a larger context. [225003110090] |Linguistic death is proceeding more rapidly even than species attrition. [225003110100] |According to one estimate, a hundred years from now the 6,000 languages in use today will likely dwindle to 600. [225003110110] |The question, though, is whether this is a problem (emphasis added). [225003110120] |This guy needs to read my own most excellent ramblings: [225003110130] |Is language death a separate phenomenon from language change? [225003110140] |
  • In terms of linguistic effect, I suspect not
  • [225003110150] |Are there any favorable outcomes of language death? [225003110160] |
  • I suspect, yes
  • [225003110170] |How do current rates of language death compare with historical rates? [225003110180] |
  • Nearly impossible to tell
  • [225003110190] |What is the role of linguists wrt language death? [225003110200] |
  • One might ask: what is the role of mechanics wrt global warming?
  • [225003110210] |HT i09 via Twitter's #linguistics. [225003120010] |Auto-detecting Language [225003120020] |Why doesn't Google's translation tool automatically detect the language I paste in? [225003120030] |This is not a terribly difficult problem to solve computationally. [225003120040] |I suspect that if they took a bag o' trigrams (of characters, that is) and compared to a corpus using some kind of simple tf–idf weight, they'd get a pretty high degree of accuracy. [225003120050] |Here are some distinctive trigrams from a page on Omniglot. [225003120060] |Wanna guess the language based solely on these? [225003120070] |I doubt it will be difficult. [225003120080] |And I suspect that just one or two of these trigrams is distinctive enough to make an accurate guess. [225003120090] |
  • änn
  • [225003120100] |
  • isk
  • [225003120110] |
  • a_m
  • [225003120120] |
  • är_
  • [225003120130] |
  • föd
  • [225003120140] |
  • och
  • [225003120150] |
  • vär
  • [225003120160] |UPDATE: thanks to the commenters for schooling me on this. [225003120170] |In fact, Google DOES have a detect language function. [225003120180] |I've been trying to find documentation on their methods but haven't had much luck. [225003120190] |I did find this discussion of a different language detector that works rather differently than I proposed. [225003120200] |Rather than compare trigrams of letters to language models, it looks up whole words in dictionaries. [225003120210] |While I admit to the greater simplicity of this method, I think my idea is more betterer 'cause it's more linguisticy. [225003120220] |Notes on my searching: [225003120230] |
  • Lots of programming language detecting tools.
  • [225003120240] |
  • Several human language detecting tools, but few discussed methodology
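The character-trigram idea proposed in this post can be sketched roughly as follows. This is only a toy illustration, not Google's method (which the post could not find documented): the three one-sentence training samples, the smoothed tf-idf formula, the cosine-similarity scoring, and the function names `trigrams`, `build_models`, and `detect` are all assumptions made for the example. Spaces are written as underscores, as in the trigram list above.

```python
import math
import re
from collections import Counter

# Hypothetical one-sentence samples per language; a real detector would
# need a far larger corpus for each language model.
SAMPLES = {
    "english": "all human beings are born free and equal in dignity and rights",
    "swedish": "alla människor är födda fria och lika i värde och rättigheter",
    "german": "alle menschen sind frei und gleich an würde und rechten geboren",
}


def trigrams(text):
    """Return character trigrams, with word boundaries shown as '_' (e.g. 'är_')."""
    cleaned = "_".join(re.findall(r"\w+", text.lower()))
    padded = f"_{cleaned}_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_models(samples):
    """Build one tf-idf weighted trigram vector per language, plus the idf table."""
    counts = {lang: Counter(trigrams(text)) for lang, text in samples.items()}
    n_langs = len(counts)
    # Document frequency: in how many languages does each trigram occur?
    doc_freq = Counter(tri for c in counts.values() for tri in c)
    # Smoothed idf (an assumed variant): trigrams found in every language
    # keep a small positive weight instead of dropping to zero.
    idf = {tri: math.log((1 + n_langs) / (1 + df)) + 1 for tri, df in doc_freq.items()}
    models = {
        lang: {tri: tf * idf[tri] for tri, tf in c.items()}
        for lang, c in counts.items()
    }
    return models, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tri, 0.0) for tri, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect(text, models, idf):
    """Return (best_language, scores) for the pasted text."""
    query_counts = Counter(trigrams(text))
    # Trigrams never seen in any sample carry no evidence, so they are skipped.
    query = {tri: tf * idf[tri] for tri, tf in query_counts.items() if tri in idf}
    scores = {lang: cosine(query, model) for lang, model in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    models, idf = build_models(SAMPLES)
    best, scores = detect("hon är född i sverige och bor där", models, idf)
    print(best, scores)
```

With samples this small the scores are crude, but distinctive trigrams such as `är_`, `föd`, and `och` already carry most of the weight, which is the intuition behind the guess-the-language list above.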
  • [225003130020] |Some advice from Mankiw about prospective grad students choosing where to go. #8 struck me as critical for any and all students: [225003130030] |Don't be distressed if you did not get into your top choice. [225003130040] |What you do in graduate school (or college) is far more important than where you go. [225003130050] |Your personal drive matters more than ranking of the school you attend. [225003140020] |A software blogger takes on the heady task of defining categories. [225003140030] |It's not clear to me if this is prescriptivist poppycock, naive descriptive lexicography, or wishful thinking: The Difference Between A Developer, A Programmer And A Computer Scientist. [225003140040] |I made my own foray into this world here: Computational Linguistics vs. NLP. [225003140050] |HT zelandiya (via #linguistics) [225003150010] |Is There A Disfluency Gap? [225003150020] |Watching the health care debate on C-SPAN I find Nancy Pelosi's speaking style to be jarringly disfluent, at least as much so as George W. Bush's ever was (or Sarah Palin for that matter) yet I don't recall Pelosi being as criticized as they were. [225003150030] |My hotel internet connection is not fast enough for me to YouTube around for examples of Pelosi speaking extemporaneously, but I suspect you can find these examples easily and I suspect you'll see what I mean. [225003150040] |Is this a partisan issue? [225003150050] |Are Republicans more likely to be criticized for speech errors than Democrats? [225003150060] |The folks at Language Log have discussed the politics of speech errors many times (see THIS post which includes links to many others) and it's worth quoting Liberman: "Everyone commits speech errors...and anyone who makes a big deal about particular examples is either a fool or a hypocrite." [225003150070] |My gut reaction is that there are many fools and hypocrites reporting on our politicians ... surely I am the first to uncover this rare gem of insight. 
[225003150080] |NOTE: I make no political point by bringing this up other than to ask if there is a statistical difference between the likelihood that a Republican figure will be criticized for speech errors and the likelihood that a Democrat will be criticized for speech errors. [225003150090] |My intuition is that there is a difference, and that difference leans towards Republicans being more likely to be criticized. [225003150100] |I caution the reader against trying to infer my own political beliefs from this post. [225003160010] |Still No 'moist' [225003160020] |A Twitter challenge to list "the ugliest words in the English language" at #uglish. [225003160030] |I vote for uglish. [225003170010] |Doh! Nut Metaphors [225003170020] |Neal Whitman deconstructs the recent doughnut hole metaphor buzzing around the health care reform debate. [225003170030] |Money quote: [225003170040] |... reading about the doughnut hole in the newspaper or hearing about it on the radio, I kept having a feeling I wasn't understanding something. [225003170050] |It was when I called upon my real-world knowledge of doughnut structure that I finally realized it wasn't the issue itself that was troubling me, but the choice of metaphor. [225003170060] |Figure 2 shows a typical donut. [225003170070] |We can observe that it is a glazed, cake doughnut, without sprinkles. [225003170080] |We can also see that the gap in coverage from Figure 1 corresponds not to the doughnut hole, but to the sweet, cakey goodness of the doughnut itself. [225003170090] |He suggests a castle moat as an alternative metaphor. [225003170100] |And yes, he really does use two different spellings: doughnut and donut. [225003170110] |This may reflect an underlying ambivalence on his part, but I suspect it's just a good case of legitimate spelling change that hasn't fixed upon a final form. [225003170120] |I suspect we'll all use donut soon enough. 
[225003190010] |tschüß [225003190020] |Thanks to a desperate need to brush up on my German (i.e., was thoroughly embarrassed at a German meet-up in NOVA), I just discovered that the German farewell tschüß is a cognate of French adieu (I know, right?). [225003190030] |Wiktionary's explanation: From Low Saxon, from Walloon adjüs (the equivalent of adieu in French). [225003200010] |On Statistical Anomalies [225003200020] |(the table lists Hand #, Table Name, My Hole cards, Winner, Pot) Having nothing to do with linguistics, I challenge my fellow online poker player Nate Silver to walk through the probability that I would be dealt pocket 22, 33, 44 successively in NLHE. [225003200030] |I have proof positive that it happened (see image above). [225003200040] |And I note that the probability of being dealt any three pairs in a row should be the same as the probability of being dealt three consecutive pairs; it's us silly humans who care about the difference between 22 and KK, not the poker gods. [225003210010] |It's My Bar Of Chocolate! [225003210020] |I'm having a Veruca Salt moment. [225003210030] |All I want is to read a paper in Cognition, but the dirty bastards at Elsevier have locked it up behind a big dirty wall. [225003210040] |Having left the sweet comfort of The University, my greatest frustration is not having access to papers and data that I used to take for granted. [225003210050] |This is the 21st Century people. [225003210060] |There's lots of free linguistics stuff out there (just look at my own most excellent list of resources to the right). [225003210070] |Everything is supposed to be free. [225003210080] |Google said so, and I believe them. [225003210090] |This goes for you too LDC with all that sweet delicious data locked up behind $$ signs. [225003210100] |Now give me everything I want right now. 
[225003210110] |To quote my hero: I want the works / I want the whole works / Presents and prizes and sweets and surprises / Of all shapes and sizes / And now / Don't care how / I want it now / Don't care how / I want it now [225003220010] |John’s grandmother feeds the monkey every morning [225003220020] |There's a brief and shallow puff piece out discussing new research about differences in how the brain processes word order versus inflection with the absurd title Languages use different parts of the brain. [225003220030] |Even if you know nothing about linguistics you can quickly determine that the title is absurd because the article itself admits that the study used only ONE language! [225003220040] |This was not a cross-linguistic study. [225003220050] |It says nothing about what parts of the brain different languages use. [225003220060] |The author makes the leap of logic assuming that (A) because languages can be typed according to their morphology (fusional, agglutinating, etc) that (B) therefore languages that are predominantly agglutinating must be processed differently than fusional languages. [225003220070] |Nope. [225003220080] |The study did not show this. [225003220090] |The research paper which spawned this puff piece is Dissociating neural subsystems for grammar by contrasting word order and inflection by Aaron J. Newman, Ted Supalla, Peter Hauser, Elissa L. Newport, and Daphne Bavelier, but it's behind a paywall, of course. [225003220100] |As far as I can tell from the abstract, the researchers used sign language stimuli to discover that sentences which relied on word order to convey case information activated different patterns in the brain than sentences using inflections (which the puff piece quaintly calls "tags"). [225003220110] |From the abstract: [225003220120] |During functional (f)MRI, native signers viewed sentences that used only word order and sentences that included inflectional morphology. 
[225003220130] |The two sentence types activated an overlapping network of brain regions, but with differential patterns. [225003220140] |Word order sentences activated left-lateralized areas involved in working memory and lexical access, including the dorsolateral prefrontal cortex, the inferior frontal gyrus, the inferior parietal lobe, and the middle temporal gyrus. [225003220150] |In contrast, inflectional morphology sentences activated areas involved in building and analyzing combinatorial structure, including bilateral inferior frontal and anterior temporal regions as well as the basal ganglia and medial temporal/limbic areas. [225003220160] |These findings suggest that for a given linguistic function, neural recruitment may depend upon on the cognitive resources required to process specific types of linguistic cues. (emphasis added). [225003220170] |The final sentence of the abstract is compelling as it makes a claim about neural recruitment and cognitive resources. [225003220180] |NOT about different languages using different parts of the brain! [225003220190] |There are some respected linguists on the author list, so I suspect the paper is worth reading (if they would let me, that is!). [225003220200] |But the original puff piece did provide two of the stimuli: [225003220210] |
  • John’s grandmother feeds the monkey every morning.
  • [225003220220] |
  • The prison warden says all juveniles will be pardoned tomorrow.
  • [225003220230] |Psycholinguistics stimuli are often funny because they need to be constructed to contain very specific features, so I can forgive them these awkward sentences, but really? [225003220240] |They couldn't have gramma feeding a dog? [225003220250] |It had to be a monkey? [225003220260] |Hmmmmm. [225003220270] |Probably has something to do with the inflections for nouns, but c'mon, a monkey? [225003220280] |Sounds downright lewd. [225003240010] |and a thousand new dissertations were born... [225003240020] |The U.S. Library of Congress will be creating "a digital archive of Twitter as a historical record." [225003240030] |Money quote: [225003240040] |In an extraordinary agreement with Twitter's founders, the Library of Congress –the world's largest library and America's oldest federal institution –is to create a digital archive of the several billion tweets publicly posted on the social networking site since its inception in 2006. [225003240050] |Sounds like one deeeeeeelicious linguistic corpus to me. [225003240060] |Me want. [225003250010] |Word Frequency Lists [225003250020] |Mark Davies and company over at BYU have released quite a collection of English word frequency data HERE. [225003250030] |Here's a taste: [225003250040] |Our data is based on the only large, genre-balanced, up-to-date corpus of American English -- the 400 million word Corpus of Contemporary American English. [225003250050] |You can be sure that the words in these lists and in this dictionary -- sorted from most to least frequent -- are really the most common ones that you will encounter in the real world. [225003250060] |The frequency data comes in a number of different formats: [225003250070] |
  • An eBook containing up to the 20,000 most frequent words, along with the 20-30 most frequent collocates (nearby words) and the synonyms for each word -- which provide valuable insight into meaning and usage.
  • [225003250080] |
  • A printed book (from Routledge) with the top 5,000 words (including collocates) and thematic lists.
  • [225003250090] |
  • Lists with the top 200-300 collocates for each of the 20,000 words, giving more than 4,300,000 node word / collocate pairs
  • [225003250100] |
  • Simple word lists of the top 10,000 or 20,000 words, but without collocates or synonyms.
  • [225003250110] |
  • A free word list -- top 5,000 words, but no collocates or synonyms.
  • [225003250120] |
  • N-grams: more than 155 million trigrams, which can be queried by word form, lemma, part of speech, etc
  • [225003260010] |Boring Volcanoes [225003260020] |While debating the pronunciation of Eyjafjallajökull has been all the rage in the blogosphere (see here), a more ominous threat has emerged, the imminent eruption of the great and powerful Katla! ...yeah, my reaction too. [225003260030] |Somehow, the pronunciation difficulty of Eyjafjallajökull added to its pop cultural cachet. [225003260040] |I fear Katla, regardless of the might of its wrath, will suffer a sort of pop cultural Marsha Marsha Marsha syndrome. [225003260050] |For what it's worth (not much), Wikipedia's pronunciation is here. [225003270010] |Syntactic Structures of the World's Languages [225003270020] |A new free online resource for linguists: Syntactic Structures of the World's Languages. [225003270030] |I haven't had time to play around with it, but the list of contributors is impressive. Money quote: [225003270040] |SSWL is a searchable database that allows users to discover which properties (morphological, syntactic, and semantic) characterize a language, as well as how these properties relate across languages. [225003270050] |This system is designed to be free to the public and open-ended. [225003270060] |Anyone can use the database to perform queries. [225003270070] |Emphasis added (yes, that's for you LDC, haha). [225003270080] |(HT WordAficionada via Twitter #linguistics) [225003280010] |When Is Bilingualism Bad? [225003280020] |When it's a litmus test for Supreme Court nominees, and Canada might go there: Linguistics above knowledge. [225003280030] |Money quote: [225003280040] |If the Senate does not defeat it, Bill C-232 will amend the Supreme Court Act to insist that all future appointees to our highest court be fluently bilingual, and not just fluent in conversational French and English, but in both official legalistic languages. [225003280050] |It will make it a prerequisite for justices to be able to hear all cases without the aid of translation. 
[225003280060] |In practical terms, the bill will restrict appointment to a very small number of bilingual legal scholars and lower-court judges. [225003280070] |It will make it difficult for Canadians outside a narrow strip from Ottawa, through Montreal and Quebec City, and into Moncton, to ever be appointed to the court that has the final say over how the Charter will be interpreted and what rights we may have. [225003280080] |I don't know what the chances are that this Canadian bill passes, but the article suggests it's highly likely. [225003280090] |(HT: morsmal via Twitter #linguistics). [225003290010] |On The Campus Frame [225003290020] |(UT Austin's Main Building) On Saturday morning, I found the above sign pragmatically odd. [225003290030] |Wandering down Guadalupe that morning after my latte at The Hideout (and wishing I'd known about the Texas Round-Up 5k ahead of time so I could have run), I decided to check out the UT Austin campus. [225003290040] |The morning was a gloriously sunny 70 degrees, no clouds or wind, and I love exploring college campuses. [225003290050] |UT has a nice, almost stereotypical layout with large academic buildings, rolling hills, stone staircases, the large football stadium to the West, and the UT Austin Tower ominously presiding over all. [225003290060] |My meandering tour brought me up a series of stairs to the face of the tower's building. [225003290070] |Academic buildings tend to be named after people (e.g., the building next to the tower is called the Dorothy L. Gebauer Building). [225003290080] |But when I walked up to the tower building's sign, all I found was a pragmatics puzzle: Main Building. [225003290090] |I snapped the pic above and strode over to Caffé Medici to ruminate on why I found this sign so pragmatically odd. [225003290100] |It is, in fact, less obscure than Dorothy L. Gebauer, right? [225003290110] |Quite straightforward.
[225003290120] |This is one building amongst many which serves as some sort of center point for activity. [225003290130] |First among equals, to borrow a term from the political realm. [225003290140] |This should be a perfect instantiation of FrameNet's Locale_by_use frame (of which campus is in fact a lexical unit) whereby the NP Main Building evokes a Constituent_part ("Salient parts that make up a Locale") of a Locale (A stable bounded area). [225003290150] |But why did I find it odd? [225003290160] |After lunching at Veggie Heaven (and escaping a near death experience crossing Lavaca), I could only come up with the suspicion that the high frequency of person names for academic buildings trumps the logic of the frame model. [225003290170] |In other words, I accept that there probably exists some cognitively real conceptual object roughly equivalent to a frame, and our human language system uses frames in some way to build a semantic representation of an input like Main Building in order to draw inferences about the role of that object in some state-of-affairs; nonetheless, if objects within that state-of-affairs have a statistically significant tendency to be named using highly specific non-functional terms, then a building with a general and functional name will stand apart as somehow not a proper member of the state-of-affairs. [225003290180] |Membership in the group is NOT determined by its role in a frame, but rather by its similarity to other members of the group. [225003290190] |I'm reminded of the beer from Repo Man: [225003290200] |(image from qbn.com) This generic BEER (which was, ever so briefly, a real product in American stores in the early 1980s) never quite took hold. [225003290210] |It just didn't fit. [225003290220] |I suspect BEER is a nice example of monopolistic competition.
[225003290230] |They flouted the need to distinguish their nearly identical product in a tough competitive market, hoping their floutestation alone would distinguish it (yep, I made that word up and I'm sticking with it). [225003290240] |It would, however, take some logical flips and leaps to make the connection to the Main Building example (not saying there ain't a cognitive connection, just sayin I'm a lazy blogger). [225003290250] |Phew! [225003290260] |That took a lot of words to state the obvious...and explaining the card game frame necessary to understand my use of trumps is another post entirely. [225003290270] |NOTE: Yes, I challenged myself to include as many Austin sites as possible in this post. [225003290280] |Just 'cause I've been spending the last few weekends in Austin. [225003290290] |But rest assured, my morning followed almost exactly this story. [225003290300] |BTW: What the hell is that image on the banner of the UT Austin Linguistics homepage? [225003290310] |Is that an FSA leading into a spectrogram? [225003290320] |Huh? [225003290330] |If yes, shouldn't the nodes have state labels and the arcs have transition labels? [225003290340] |And why does the final node transition to the little stop image? [225003290350] |Oh yeah, and I really hate this: (hint, see source for HTML code). [225003300010] |Text Messaging and Language Use Survey [225003300020] |Brennan Gamwell, a student at Georgetown, has posted an online survey for language and text messaging HERE. [225003310010] |How do you feel this bar? [225003310020] |English speaker walks into a bar in China hoping to practice his Chinese*. [225003310030] |Chinese waiter walks up to the gweilo hoping to practice his English, and the game begins. [225003310040] |A lingo-blogger takes on the heavy challenge of analyzing this linguistic power struggle in a post on sinosplice.
[225003310050] |In classic linguistic fashion, he devises a rule: [225003310060] |John's Rule For Determining Language: Given a conscious choice between a number of languages to use for interaction, speakers will naturally tend to choose the common language in which the poorer speaker’s level is highest. [225003310070] |John wiggles by stipulating that "there’s no strict right or wrong here" (all linguistic "rules" require that same stipulation, haha, so what the hell's the point of a rule!!). [225003310080] |But John uses this rule to define a linguistic strategy: "if I want to improve my Chinese without all this strife, I need to find Chinese speakers with English worse than my Chinese." [225003310090] |While John evokes communication efficiency as his basis for this strategy, he misses a crucial factor: appropriateness. [225003310100] |It's not really appropriate for a customer to use a waiter for language practice, and vice versa. [225003310110] |Even though it's effective for language learning purposes, that's just not why bars exist. [225003310120] |Once John as customer violates the appropriateness, he's all but invited that waiter to do the same. [225003310130] |At that point, all rules are off; it becomes a linguistic jungle with each speaker fighting for survival. [225003310140] |Unfortunately, neither John nor I could find any academic research on this topic (I found tons on intercultural pragmatics, but nothing obviously on this particular situation). [225003310150] |I suspect it's out there; it's just hard to find. [225003310160] |What terms should I search for? [225003310170] |Hmmm, it's an odd one, no doubt. [225003310180] |*The blogger did not specify what dialect, though Mandarin is likely. [225003320010] |In Defense Of Science Blogging [225003320020] |Jason G. Goldman, a science blogger out of USC, posts a thoughtful defense of the emerging role of science blogging. [225003320030] |His major points seem to be: [225003320040] |
  • Science journalism sucks (okay, "sucks" is my word), so science blogging is a potential, and superior, replacement
  • [225003320050] |
  • Blogging is a form of public intellectualism
  • [225003320060] |
  • There are real professional development opportunities
  • [225003320070] |He makes other points as well. [225003320080] |And, he offers some good links to related posts. [225003330010] |mixed modals [225003330020] |I found Ta-Nehisi Coates' use of had have awkward in the following sentence (referring to Rachel Maddow's recent interview with Rand Paul): [225003330030] |That interview would have went a lot better for Rand Paul if Maddow had have just thrown her notes in the air and accused him of being a bigot, and a covert member of the Klan. (emphasis added). [225003330040] |So, the construction is "X would have went a lot better if Y had have just verbbed." [225003330050] |My position is that the tense and aspect of the VP in the embedded subjunctive (the if-clause) normally matches the VP in the main clause. [225003330060] |So, my preference is for "X would have went a lot better if Y would have just verbbed." [225003330070] |This use of had reminds me of the use of past perfect for simple past in black English, in constructions like "He had told me to be here at six." (though this wiki page says nothing about it). [225003330080] |But this is not simple past anyway. [225003330090] |Coates' use of had in the embedded clause may be a function of his dialect, I don't know. [225003330100] |He's from Baltimore, but I don't know which neighborhood. [225003330110] |In a previous post, he talks about his language use as a child just a bit: [225003330120] |The fact is that while I read a ton, and got teased for it, I lived in the neighborhood and talked like people in the neighborhood. [225003330130] |I was in gifted classes at school, but I didn't have the kind of parents who penalized for using a word like "irregardless." [225003330140] |Moreover, I was, if not particularly cool, still really well liked. [225003330150] |My particular and specific black experience was that as long as you had some familiarity with the language, you pretty much were free to do whatever you wanted. (emphasis added). 
[225003330160] |Nonetheless, I'm no prescriptivist; I just thought it curious. [225003340010] |The Politics of Publishing [225003340020] |(image from http://alysha.gather.com/) Let's talk about class warfare in academics, shall we? [225003340030] |I just read a nice little article on speech production from Cognition and while I enjoyed it, I couldn't help but wonder how it got published because it was rather lightweight. [225003340040] |To be fair, Cognition published it as a "Brief article" so it was meant to be short*; nonetheless, it had the feel of a grad student poster, not a publication. [225003340050] |You might argue that this is the point of a "Brief article", but I will argue that similar content would likely not have been published had it not been recognizably associated with a well known scholar. [225003340060] |Despite the precautions of blind reviews, it is not uncommon for a linguistics reviewer to have a pretty good idea of who authored or co-authored a paper, simply because linguistics is a small field, and the sub-fields even smaller. [225003340070] |Most scholars have easy-to-recognize methodologies, content areas, or styles that act almost as a scholarly fingerprint. [225003340080] |I don't mean to be mean-spirited, and I hope this doesn't come across that way, but minus the second author's fingerprint, I don't see this paper getting accepted. [225003340090] |But first, let's look at the paper itself: A purple giraffe is faster than a purple elephant: Inconsistent phonology affects determiner selection in English (full citation below). [225003340100] |From the abstract: "during the production of a determiner–noun phrase, nouns automatically activate the phonological forms of their determiners, which can compete with the phonological forms that are generated by an assimilation rule." [225003340110] |As I understand it, this means that nouns activate default articles when we're about to say them.
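For the programmers in the audience, here's a rough sketch of that competition as I read it from the abstract. Everything in it is my own assumption, not the authors' model: the function names are made up, and I'm approximating "starts with a vowel sound" by spelling, which gets words like hour and university wrong.

```python
# Toy sketch (my assumption, not the paper's model): the noun proposes its
# "default" article, while the assimilation rule picks the surface article
# from whatever word actually comes next.
VOWEL_LETTERS = set("aeiou")  # spelling-based stand-in for vowel sounds


def indefinite_article(next_word):
    """Pick 'a' or 'an' from the first letter of the following word."""
    return "an" if next_word[0].lower() in VOWEL_LETTERS else "a"


def default_and_surface(adjective, noun):
    """Return (default, surface, conflict) for 'ARTICLE adjective noun'."""
    default = indefinite_article(noun)       # what the noun alone would take
    surface = indefinite_article(adjective)  # what is actually said
    return default, surface, default != surface


for adjective, noun in [("purple", "giraffe"), ("purple", "elephant"),
                        ("orange", "giraffe")]:
    print(adjective, noun, default_and_surface(adjective, noun))
```

On this toy account, the phrases flagged as conflicts are the ones where the two candidate articles would have to fight it out.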
[225003340120] |Show me a picture of an orange giraffe, and I think a orange giraffe before I say an orange giraffe, because I haven't yet applied the phonological process that says the indefinite article a becomes an when followed by a vowel. [225003340130] |This leads to competition between a and an to see which one will actually be said out loud. [225003340140] |What the researchers found was that the phrase an orange giraffe was produced more slowly than a purple giraffe and they argue that this slowness (aka, naming latency) is the result of the extra time it takes to apply the phonological process to the indefinite article (aka determiner competition). [225003340150] |By putting the adjective orange in between the article and the noun, they were able to show that it was the noun driving this effect, not the adjective (because the indefinite article agreed with the noun originally, thus the slowness). [225003340160] |Like I said, a nice little article. [225003340170] |Cute little paradigm, good results, nice work. [225003340180] |But here's the thing: more than a dozen previously published articles say the same thing. [225003340190] |There's nothing new here. [225003340200] |What this research does is drill down to test a detail of a well known phenomenon (determiner competition), namely that phonology alone can invoke this competition. [225003340210] |In the authors' own words: "the lexical-syntactic level may not be necessarily involved." [225003340220] |Let me repeat: X may not be necessarily involved in Y. Wow, that's hardly a bold statement worthy of a journal publication. [225003340230] |There was really only one experiment reported (a second experiment was included, but imho, it was so similar to the first, it hardly counts as a second). [225003340240] |I have no problem with this as an experiment and it would be a good poster at a psycholinguistics conference or the LSA,
but I can't imagine being able to publish something like this myself, nor anyone I went to grad school with because we simply didn't have the institutional ooomph to guide something as light as this through the review process. [225003340250] |Yes yes, again I know the process is "blind", but this article does have Kathryn Bock's fingerprint. [225003340260] |She's well known for publishing on sentence production in general as well as number agreement in sentence production, a similar if not exact match to determiner agreement. [225003340270] |And she's a well respected psycholinguist by any measure. [225003340280] |I'd have to re-read a bunch of her papers to see how closely the methods and presentation match her work, but I'd guess that a "blind" reviewer, who would by necessity be familiar with sentence production literature, would not have a hard time guessing that this work was done in conjunction with Bock. [225003340290] |The blind process actually allows such unintentional and intentional biases to fester because it's hidden and hard to prove. [225003340300] |If the process were transparent, I suspect this sort of thing wouldn't happen as much. [225003340310] |I'm all for open reviewing. [225003340320] |I'd prefer all open, transparent reviewing. [225003340330] |Why hide? [225003340340] |Spalek, K., Bock, K., & Schriefers, H. (2010). [225003340350] |A purple giraffe is faster than a purple elephant: Inconsistent phonology affects determiner selection in English. Cognition, 114 (1), 123-128. DOI: 10.1016/j.cognition.2009.09.011 [225003340360] |*The only guideline regarding the content of a "Brief article" I could find on Cognition's site is this: "Brief articles must be no more than three thousand words long." [225003350010] |Psycholinguistics Experiments [225003350020] |The Portal for Psychological Experiments on Language has seven new experiments for May: [225003350030] |
  • Image Caption Generation: In this experiment you will be presented with a news image, an article associated with the image, and a caption describing the image. [225003350040] |Your task is to judge how well the caption describes the content of the image given the accompanying article and how grammatical the caption is. [225003350050] |Some captions will seem appropriate to you, but others will not. [225003350060] |You will make your judgement by choosing a rating from 1 (the caption is not appropriate) to 7 (the caption is appropriate). [225003350070] |All captions were generated automatically by a computer program.
  • [225003350080] |
  • Human-robot Interaction: You will see a series of pictures of a scene with a robot standing at a table with some objects on it. [225003350090] |You will hear the robot asking a question and a human answering it. [225003350100] |Every time you will make a judgment about the robot's question. [225003350110] |Between robot scenes you will check the correctness of simple calculations.
  • [225003350120] |
  • Sentence Reading: In this experiment, you will be shown a set of sentences which describe a situation. [225003350130] |You will have to read the sentences carefully and answer the questions asked at the end of the sets. [225003350140] |Each sentence will appear on a separate slide. [225003350150] |You can move to the next slide by clicking anywhere on the slide. [225003350160] |At no point will you be able to go back and revisit the contents of the previous slide (please do not use the 'Back' button of the browser as this will take you to the beginning of the experiment). [225003350170] |On the slide containing the question, you will be given two options as possible answers and you need to select one of them. [225003350180] |On selecting the answer, you will be presented with the next set.
  • [225003350190] |
  • Image Annotation: In this experiment you will be presented with a news image, an article associated with the image, and a set of keywords describing the image. [225003350200] |Your task is to judge how well each of the keywords describe the content of the image given the accompanying article. [225003350210] |Some keywords will seem appropriate to you, but others will not. [225003350220] |You will make your judgement by choosing a rating from 1 (the keywords are not appropriate) to 7 (the words are appropriate). [225003350230] |All keywords were generated automatically by a computer program.
  • [225003350240] |
  • Sentence Compression: In this experiment you will be asked to judge how well a given sentence compresses the meaning of another sentence. [225003350250] |You will see a series of sentences together with their compressed versions. [225003350260] |Some sentence compressions will seem perfectly OK to you, but others will not. [225003350270] |All compressed versions were generated automatically by a computer program.
  • [225003350280] |
  • Story Generation: In this experiment you will be asked to read a set of short computer generated stories. [225003350290] |Each story will be 5 sentences long and will contain only a couple of characters. [225003350300] |After reading each story you will assess its quality along three dimensions: fluency, coherence and interest. [225003350310] |For each dimension you will provide a rating on a scale from 1 to 5.
  • [225003350320] |
  • Referring expressions: The goal of this short survey is to collect your opinions about the most natural way to refer to objects in a conversation. [225003350330] |Different people might do this in different ways, depending on how they interpret the context in which the dialogue takes place. [225003350340] |We are interested in your opinion, given the context described below.
  • [225003350350] |Enjoy! [225003360010] |yeah right ctd. [225003360020] |Thanks to Twitter #linguistics, I discovered that Hebrew University grad student Oren Tsur will be in DC next week presenting a paper on automatic detection of sarcasm in product reviews (see here and here for reactions). [225003360030] |I've posted on sarcasm before (see here and here) so I'm curious. [225003360040] |The conference is the 4th Int'l AAAI Conference on Weblogs and Social Media at GW and it looks rather interesting (the first interesting thing to happen in Foggy Bottom since Watergate?). [225003360050] |I might could take some PTO and check it out. [225003360060] |Tsur's work can be found here: A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews (pdf). [225003360070] |FYI: while Tsur's work relies solely on written words, Joseph Tepperman et al. from USC work on sarcasm in voice recognition: “YEAH RIGHT”: SARCASM RECOGNITION FOR SPOKEN DIALOGUE SYSTEMS (pdf). [225003370010] |My German [225003370020] |Here's a curious bit of linguistics: American Students refer to studying languages using a possessive phrase, but not other studies: [225003370030] |a. [225003370040] |I have to work on my German. b. *I have to work on my math. c. *I have to work on my biology. [225003370050] |Note that all three could easily include the word "skills" at the end, but only (a) is acceptable bare (to me, anyway). [225003370060] |I wonder if this is related to the creative work metonymy construction that Pullum just posted about over at LL (it was that post which triggered my thinking on this). [225003370070] |Being able to speak a language can be seen as a kind of creative work (i.e., the speaker is producing the language in a way they are not producing biology). [225003380010] |Causative Funny Business in Swedish [225003380020] |Finally saw the excellent Swedish film The Girl with the Dragon Tattoo (Män som hatar kvinnor). 
[225003380030] |While I don't speak Swedish, I noted that the word mörda 'murder' was translated into English as killed. [225003380040] |Since the cognate murder is clearly available, I had to wonder if there was some good reason for this choice. [225003380050] |Is mörda less causative than murder? [225003380060] |I used Google to translate I accidentally murdered him into Swedish and was given Jag mördade av misstag honom. [225003380070] |But when I translated that back into English, I got I accidentally killed him. [225003380080] |Some causative funny business going on here methinks. [225003390010] |Infixation FAIL [225003390020] |(image from Huffington Post) No sir, this just doesn't work. [225003390030] |They violated the prosody. [225003390040] |See here. [225003400010] |Friday Funnies [225003400020] |For your linguistic comedy pleasure, click HERE for 5 comedy videos on language (from ALTA Language Services, via Twitter #linguistics). [225003400030] |Personal Fav, the ever awesome, Catherine Tate: [225003410010] |The Linguistics of a "Perfect Game" [225003410020] |Full disclosure: I am not a baseball fan*. [225003410030] |It seems to me a curious thing, this kerfuffle about the blown "perfect game" because it is an example of bizarro linguistics**. [225003410040] |Despite the fact that incontrovertible evidence exists that proves the game does in fact meet the requirements of a "perfect game," the refusal of MLB to officially sanction it as a "perfect game" has caused a titanic uproar amongst fans. [225003410050] |Why? [225003410060] |We all know it really was a perfect game. [225003410070] |Why does anyone care about the label that MLB puts on it? [225003410080] |We care because they have been granted, by convention, the right to determine what counts as a "perfect game" and what doesn't. [225003410090] |We could call it a "perfect game" amongst ourselves, but it just wouldn't be, because MLB has the ultimate say-so. [225003410100] |It's like Pete Rose.
[225003410110] |We all know he's a hall of famer, but he just isn't. [225003410120] |Because MLB says he isn't. [225003410130] |This is the opposite of the way linguistic items generally form their meanings. [225003410140] |Generally if enough people agree that "wug" means X, then that's what it means. [225003410150] |But not in this case. [225003410160] |300 million Americans (and several million Japanese, Cubans and Venezuelans) all agree that Armando Galarraga pitched a perfect game, but we are linguistically overruled by a governing body, and that's that. [225003410170] |This strikes me as a variation on Putnam's semantic externalism whereby speakers assume that a word's meaning is determined by someone else. [225003410180] |We don't naturally see our own role in determining meaning. [225003410190] |If there is a clearly defined group, like MLB, then it's even easier to surrender our contribution, even when our own intuition about the meaning is so acute. [225003410200] |It's also interesting that almost nothing rides on this label. [225003410210] |They won regardless of what you call it; it doesn't affect the team's record at all. [225003410220] |That label will not likely affect the team's season, except perception. [225003410230] |The pitcher might have been able to use a "perfect game" as a negotiating tactic to get more money, but few fans care about that. [225003410240] |He would have gotten his name in the record books, that's tangible, but again, it does nothing for the team. [225003410250] |*I was a wrestler for 14 years; if there's no blood, it's not a sport. **I invented that term, patent pending, all rights reserved. [225003420010] |War on Americanisms??? [225003420020] |The Twitter world is abuzz with retweets of articles by UK journalist Matthew Engel who has published a few rants railing against the degradation of The Queen's English (see HERE for retweets; Engel's rants can be found HERE and HERE).
[225003420030] |Consisting mostly of whiny British self-pity (the Empire is dead, get over it!), Engel and his readers whimper about the bully Americans and our "useless", "infuriating", "ugly", and "witless" sayings. [225003420040] |Some choice samples of Engel's whimpers: [225003420050] |
  • The battle is almost certainly unwinnable but I am convinced there are millions of intelligent Britons out there who wince as often as I do every time they hear a witless Americanism introduced into British discourse.
  • [225003420060] |
  • British English is being overwhelmed by a tidal wave of mindless Americanisms
  • [225003420070] |
  • Americans rarely hear any of our words, let alone adopt them.
  • [225003420080] |
  • But we are so overwhelmed by everything American that the British have lost their grasp on the difference between our form of English and theirs. [225003420090] |This is the reality of cultural imperialism.
  • [225003420100] |
  • ‘Speciality’ (with the i) is a lovely word, full of rolling syllables. [225003420110] |His version is the kind of usage that comes out of the mid-Atlantic and needs to be dropped back there, from a great height.
  • [225003420120] |
  • And there is widespread loathing of the verbalisation of nouns: incentivizing and all that rot.
  • [225003420130] |Cheers mate. [225003430010] |Is Arabic The Least Positive Language? (hint, no) ... sigh [225003430020] |Sometimes bad science reporting is a function of bad science. [225003430030] |Garbage in, garbage out. [225003430040] |There's been some buzz about new research regarding the bias of negative and positive words in English as well as cross linguistically. [225003430050] |I have refrained from commenting because it sounded like typical bad reporting and misunderstanding of academic research. [225003430060] |Then Andrew Sullivan got involved. [225003430070] |Sigh. [225003430080] |Sullivan has his strengths and weaknesses as a blogger. [225003430090] |His strength shone brightly last summer when he helped publicize the Iranian green movement. [225003430100] |His weakness, however, peeps out anytime he blogs about anything remotely related to science or academics (see HERE and HERE). [225003430110] |His most recent silliness has the title The English Language Is An Optimist. [225003430120] |His megaphone is so big, I feel someone must clear up the foggy facts and murky interpretations currently being disseminated. [225003430130] |To begin, the research under question is from Rozin et al., U Penn psychologists who appear to be focused on emotion research (full citation below). [225003430140] |As far as I can tell, no linguists were involved (and boy oh boy, they should have been. [225003430150] |Ya know, Penn has a linguistics department that is, let's just say, above average). [225003430160] |The basic point of the research cited is this: Positive events are more common (more tokens), but negative events are more differentiated (more types). [225003430170] |Sullivan simply posts a quote from another blog which regurgitates the research as if it were true with no critical analysis on anyone's part. [225003430180] |I will offer the much-needed critical analysis here.
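Since the whole claim hangs on the token/type distinction, here's a minimal sketch of what counting each one looks like. The word lists and the sample text are toy data I made up for illustration; none of it comes from Rozin et al., and a real study would need a principled way to decide which words count as positive or negative in the first place.

```python
# Toy illustration of tokens vs. types. POSITIVE, NEGATIVE and `text` are
# hypothetical data invented for this sketch, not taken from any study.
def type_token_counts(words, lexicon):
    """Return (tokens, types) for the words that appear in `lexicon`."""
    hits = [w.lower() for w in words if w.lower() in lexicon]
    return len(hits), len(set(hits))  # tokens = occurrences, types = distinct


POSITIVE = {"good", "happy"}
NEGATIVE = {"bad", "sad", "dirty", "ugly"}
text = "good good happy good bad sad good dirty happy ugly good".split()

pos_tokens, pos_types = type_token_counts(text, POSITIVE)
neg_tokens, neg_types = type_token_counts(text, NEGATIVE)
print("positive:", pos_tokens, "tokens,", pos_types, "types")
print("negative:", neg_tokens, "tokens,", neg_types, "types")
```

In this made-up sample the positive words win on tokens (7 vs. 4) while the negative words win on types (4 vs. 2), which is the shape of the pattern being claimed. Note that the result is only as good as the two word lists fed into it.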
[225003430190] |Here are the four facts about English that everyone seems to find so fascinating: [225003430200] |
  • Negative words are often composed of the positive root negated with a prefix (e.g., unhappy, insincere, unpleasant), while the reverse is exceptional (e.g., unselfish, uncontaminated).
  • [225003430210] |
  • Negated positive adjectives tend to have a negative valence, whereas negated negative adjectives tend to be neutral in valence
  • [225003430220] |
  • Usually, only positive adjectives are used to refer to the whole positive negative dimension
  • [225003430230] |
  • In conjunctions or disjunctions, positive adjectives are usually mentioned before the opposite negative adjectives
  • [225003430240] |One of the disappointing things about this buzz is that the facts gaining the buzz are decontextualized from the research reported in the paper (a common ingredient in these kinds of stories). [225003430250] |As far as I can tell, all Rozin et al. did was this: take some random linguistic facts published in 1978, look up some arbitrary words in a frequency table, then administer a short one-on-one survey with a small group of informants. [225003430260] |That's it! [225003430270] |And that's not much. [225003430280] |It's modest qualitative analysis masquerading as comprehensive quantitative data gathering. [225003430290] |The whole premise of this paper is based on a bold claim that "most of the events experienced in life have positive implications." [225003430300] |They cite research on this that I have not looked into, so I have no clue what they mean by this. [225003430310] |What do they mean by "event"? [225003430320] |I suspect their use of "event" and the use of this word by semanticists (especially formal semanticists) is quite different. [225003430330] |Linguists, and semanticists in particular, really care about defining what an "event" is. [225003430340] |If you want to have some fun, ask five formal semanticists to define "event". [225003430350] |Sparks will fly. [225003430360] |Then, ask them 'when does the beginning of an event end?'. [225003430370] |Oh my, fisticuffs are certain. [225003430380] |While linguists are obsessed with being quite disciplined with these kinds of things, psychologists don't seem to be. [225003430390] |I confess that what little emotion research I've read is disappointing. [225003430400] |The field seems plagued by vague terms and weak methodology. [225003430410] |But that's not what my principal critique will center on. [225003430420] |I'm more interested in what they actually did. [225003430430] |Let's walk through their methodology, shall we?
[225003430440] |Take some random linguistic facts published in 1978: Rozin et al. report that "positive words (tokens, not types) occur with much higher frequency than negative words in English" but they only cite three studies, all of which were published before 1984, one of them in the 1960s, but their four big facts come from one study, Matlin, M. W., & Stang, D. J. (1978). [225003430450] |The Pollyanna principle. [225003430460] |Selectivity in language, memory, and thought. [225003430470] |Cambridge, MA: Schenkman Publishing Company. [225003430480] |1978!...In other words, before the advent of large, easily searchable corpora. [225003430490] |Rozin et al. give no operational definition of what a "positive word" or a "negative word" is. [225003430500] |They appear to just assume that such things exist and they're easy to identify. [225003430510] |In other words, they assume the linguistics part is easy, so why bother working hard at it. [225003430520] |Bad psychologist, bad (imagine me slapping their noses with a newspaper while saying this). [225003430530] |If we take a "negative word" to simply mean the negated form of another word, well, then, yeah, sure they're gonna be marked. [225003430540] |If it's something else, we need to know what that something else is. [225003430550] |If we don't have a good definition of what these things are, then how do we go about finding them? [225003430560] |Well, Rozin et al. just decided arbitrary intuition was good enough: [225003430570] |These ‘‘reference’’ adjectives were selected in advance by the authors, such that some were negative and some positive. [225003430580] |They were common adjectives in English, but were selected by convenience, with the proviso that we knew in advance for all cases that the positive asymmetries we were exploring were present for these words in English. (emphasis added) [225003430590] |Their eight adjectives were pleasant, sad, dirty, disgusting, bad, sincere, pure, and beautiful.
[225003430600] |Look up some arbitrary words in a single frequency table: Once they came up with their list, they looked up each word in Leech's Word frequencies in written and spoken English. [225003430610] |We confirmed this in a preliminary study, searching for positive and negative valenced adjective frequency in an extensive corpus of over 100 million words of both spoken and written British English (Leech, Rayson, & Wilson, 1971, also available on the Internet). [225003430620] |We searched for frequency of English usage for the seven adjectives we examined across languages in the first part of the present study and their opposites (opposites listed after the solidus: pleasant/aversive, sad/happy, dirty/clean, bad/good, sincere/no obvious opposite in English, pure/contaminated, beautiful/ugly). [225003430630] |We also searched for the negation of any of these words, when it formed a word in English, which was the case for 5 of 7 positive words (unpleasant, unhappy, unclean, insincere, impure) [225003430640] |That right there was the sum total of corpus research they did. [225003430650] |Mere frequency counts don't tell us much. [225003430660] |This is the worst kind of corpus linguistics where simple word counts are imbued with magic and meaning. [225003430670] |Nope. [225003430680] |Nothing terribly meaningful in word counts all by themselves. [225003430690] |It would have been easy to gather collocation data and give us some sense of what significant co-occurrence was going on. [225003430700] |But no, they give us nothing. [225003430710] |Administer a short one-on-one survey with a small group of informants: We interviewed one native speaker of each of 20 languages, not including English. [225003430720] |The languages were: Mandarin, Cantonese, Japanese, Korean, Vietnamese, Thai, Tagalog, Ibo, Arabic, Turkish, Tamil, Hindi, German, Icelandic, Swedish, French, Portuguese (Brazilian), Spanish, Russian, and Polish. 
[225003430730] |The languages were selected by convenience ... [225003430740] |The informants were asked ten questions about eight adjectives, half positive...It was essential that all informants had an intuitive sense of the language they were being interviewed about, since the questions had little to do with ‘‘rules’’ of syntax, but rather relied on what ‘‘sounds right’’. (emphasis &jumps added) [225003430750] |So they took one fluent speaker of Vietnamese, gave her/him an English adjective, and then asked these ten questions (and repeat for each speaker). [225003430760] |
  • Is there a positive word?
  • [225003430770] |
  • Is there a negative word?
  • [225003430780] |
  • Can the positive word be negated?
  • [225003430790] |
  • Can the negative word be negated?
  • [225003430800] |
  • Is the negation of the positive word neutral or extreme? (# extreme)
  • [225003430810] |
  • Is the negation of the negative word neutral or extreme? (# extreme)
  • [225003430820] |
  • Would the informant rather be ‘‘unnegative’’ or ‘‘unpositive’’? (# prefer unneg.)
  • [225003430830] |
  • Can the negative word be used on the positive end of the spectrum? (# yes)
  • [225003430840] |
  • Can the positive word be used on the negative end of the spectrum? (# yes)
  • [225003430850] |
  • Does it sound better to say the negative or positive word first? (# pos. first)
  • [225003430860] |This looks more like a weak study of lexical access, or cross-linguistic priming, than the study of positive/negative semantic space. [225003430870] |Survey tools like this are best as a beginning, very preliminary stage of deeper research (i.e., not published). [225003430880] |Descriptive linguists/grammarians work with speakers for years to tease out these kinds of semantic judgments. [225003430890] |This is no easy thing to do. [225003430900] |With only one informant per language and such little information (except English), it's hard to tell if this is just noise. [225003430910] |The authors dismiss the noise potential in what, to me, sounded entirely illogical: Any uncertainty or inaccuracy of a single informant would, of course, not bias our findings but would add ‘‘noise’’ and make it more difficult to demonstrate a strong commonality across languages. [225003430920] |If one and only one informant were inaccurate, then that's noise, but what if all were slightly confused by an awkward task? [225003430930] |That's garbage. [225003430940] |How would we know? [225003430950] |The authors tried to address this by interviewing "10 native English speakers (8 of them students at the University of Pennsylvania), presenting our full protocol of questions for three of the adjectives: sad, good and pleasant, and for all five ‘‘unique’’ negative nouns." [225003430960] |So they took a sub-set of their already small data and tested 10 informants in one language, assuming that the variation they found in that one language would be a good proxy for the variation in any other language. [225003430970] |I just don't think that's a wise assumption. [225003430980] |Ultimately my impression of this article was that it's weak research about a topic that people love so much, they're willing to take sound-bite blogging at face value. [225003430990] |This is borderline rumor mongering. 
[225003431000] |Did this research say that the English language is "optimistic", Andrew? [225003431010] |No, it did not. [225003431020] |Did this particular study find that positive events outnumber negative events, Andrew? [225003431030] |No, it did not. [225003431040] |Let me make my point by using their own data. [225003431050] |Rozin et al. found that English showed the largest percent of cases with positive dominating negative and Arabic the least*. [225003431060] |Now, imagine I claim that Arabic is the least positive language? [225003431070] |How happy would you be with this interpretation? [225003431080] |Should we be any more happy with Rozin's interpretations? [225003431090] |Sullivan's? [225003431100] |*I'm not sure I completely understand this result because they didn't publish their actual results, but I think it means that for the 7 adjectives, the biases in Arabic were all small (i.e., pos/neg were all similar). [225003431110] |Rozin, P., Berman, L., &Royzman, E. (2010). [225003431120] |Biases in use of positive and negative words across twenty natural languages Cognition &Emotion, 24 (3), 536-548 DOI: 10.1080/02699930902793462 [225003440010] |Linguistic Dodge-ball [225003440020] |The playful linguists at University of Essex have wisely decided to give in to the football/soccer hype and link their great online linguistics game Phrase Detectives to The World Cup in South Africa with a new Dodge The Ball competition: [225003440030] |If you've had enough of the football coverage already, maybe you would be interested in the our Dodge the Ball competition. [225003440040] |Simply play the University of Essex's Phrase Detectives game between 11 June and 11 July and you could be selected to win a prize everyday. [225003440050] |Now you have something better to do than watch 22 men kick a pig's bladder around a field. [225003440060] |Pssst, you can play the game and watch football simultaneously too. [225003440070] |Now go play the game! 
[225003450010] |Car Talk Goes Linguistic (kinda) [225003450020] |If you listened to this week's Car Talk, you heard the answer to last week's puzzler, which contained a semi-linguistic related brain tease. [225003450030] |Unbeknownst to the Tappet Brothers was the fact that there were a few extra puzzles hidden within their main one. [225003450040] |First, the puzzler as it was presented on the show: [225003450050] |This was a puzzler that I'm stealing from the late Martin Gardner. [225003450060] |I'm going to give you a number and you're going to tell me what's unique about the following number: 8,549,176,320. [225003450070] |Now if you want a hint, I'll point out that there are 10 digits in that number. [225003450080] |The question is, what's unique about this number? [225003450090] |The answer requires some linguistic gymnastics. [225003450100] |The extra puzzles are somewhat hidden until the first puzzle is solved: [225003450110] |The number 8,549,176,320 is the result of taking the written form of each numeral and alphabetizing them from A-Z. [225003450120] |So this list [225003450130] |Zero One Two Three Four Five Six Seven Eight Nine [225003450140] |Becomes this number Eight Five Four Nine One Seven Six Three Two Zero [225003450150] |or 8,549,176,320 (note that one could argue that 236,719,458 is every bit as "unique" in this same sense). [225003450160] |While gnawing on this otherwise trivial puzzle, I noticed a couple of more interesting facts. [225003450170] |
  • There are three letters that start two numbers: T (two, three) F ( four, five), S (six, seven). [225003450180] |In all three cases, the numbers are consecutive, hence the letters "pattern together" in a certain sense (also notice that in the alphabetized version, each pair's order is reversed). [225003450190] |Is there any historical reason for this, or just coincidence? [225003450200] |Care to write a FOPC statement that correctly captures this fact?
  • [225003450210] |
  • It became quite clear to me that accessing numbers and lexical items interfere with each other while trying to write the number down. [225003450220] |Since I was driving in a car when I first heard the puzzler, I didn't write the number down, but I figured I could reconstruct it easily, but then I jumped on the DC metro with no writing tool, so I couldn't write down the letters then alphabetize them in front of my eyes, so I had to "figure it out" in my head. [225003450230] |No worries, I thought, I'm a smart guy and a little Saturday morning brain tease is better than coffee. [225003450240] |So I flipped out my archaic, obsolete cell phone and wrote the number into a text message. [225003450250] |What I discovered was something akin to a numerical Stroop effect where the numbers interfered with my ability to choose the correct button to push. [225003450260] |So I wanted to type the letter "T" for "two" (which is the number 8 on my keypad), but instead I typed the letter "A" because that's on the number 2 button. [225003450270] |I had a similar problem just now when typing on a full keyboard. [225003450280] |When I tried typing the number names above, I regularly typed the number from the topmost key row instead. [225003450290] |Somebody must have studied this kind of interference already, right?
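The puzzler's answer, and the side observation about shared initial letters, are easy to check mechanically. Here is a minimal sketch in Python; the function names are made up for illustration and nothing here comes from the show:

```python
# English digit names in numeric order (list index = digit value).
NAMES = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

def alphabetized_digits(reverse=False):
    """Return the digits 0-9 as a string, ordered by the spelling of their names."""
    order = sorted(range(10), key=lambda d: NAMES[d], reverse=reverse)
    return "".join(str(d) for d in order)

def shared_initials():
    """Map each initial letter used by more than one digit name to those digits."""
    groups = {}
    for digit, name in enumerate(NAMES):
        groups.setdefault(name[0], []).append(digit)
    return {letter: digits for letter, digits in groups.items() if len(digits) > 1}

print(alphabetized_digits())              # 8549176320
print(alphabetized_digits(reverse=True))  # 0236719458
print(shared_initials())                  # {'t': [2, 3], 'f': [4, 5], 's': [6, 7]}
```

The Z-to-A ordering starts with zero, which is why it reads as 236,719,458 once the leading zero is dropped. The last line confirms that every shared initial belongs to a pair of consecutive digits.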
  • [225003460010] |The Most Ridiculous Use Of The Umlaut EVER? [225003460020] |(image from a cell phone pic)You tell me...Original here. [225003460030] |(I have nothing to say about the odd use of underlines either). [225003470010] |Grilled Cat With Lemon [225003470020] |I snapped the pic of the sign above at a beach near Hampton, VA. [225003470030] |Yes, the new line and plural "PETS" helps, but still, can we buy a comma Pat? [225003480010] |An X of Y [225003480020] |(a pod of whales from About.com)[reposted from last year with update] [225003480030] |[UPDATE: kottke points to the same blog with added pics here). [225003480040] |10 years ago, when I was teaching English in China, I was surprised by how interested my students were in learning about phrases like "a pod of whales," "a cup of coffee," and "a pride of lions." [225003480050] |When I mentioned a phrase like this, they would perk up immediately (difficult to do in the oppressive Guangzhou 广 summer heat). [225003480060] |This was a year before I began studying linguistics proper so I had no clue what a collective noun was, nor did I know what a classifier was, nor did I know that Chinese languages like Mandarin and Cantonese have elaborate systems of nominal classifiers (this Wiki page is a good primer). [225003480070] |I just thought it was a cute diversion to talk about at the end of an evening's class. [225003480080] |It turns out that collective nouns have very interesting properties which linguists love to obsess over (I regret I do not have access to a copy of The Cambridge Grammar of the English Language because I suspect Huddleston and Pullum have some fascinating points). [225003480090] |Now, Via kottke, I discovered a blog called All Sorts dedicated to culling collective nouns from Twitter feeds. [225003480100] |It relies on a little NLP and some crowd sourcing. [225003480110] |It appears to be restricted to the syntactic construction "an X of Y". 
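To see what a purely syntactic filter of this kind picks up, here is a minimal sketch using a regular expression. This is an assumption for illustration only, not the All Sorts blog's actual method; the pattern and the function name `find_x_of_y` are hypothetical.

```python
import re

# Hypothetical surface pattern: "a" or "an", one word (X), "of", one word (Y).
# Illustrative only; it knows nothing about what X and Y mean.
PATTERN = re.compile(r"\ban?\s+(\w+)\s+of\s+(\w+)", re.IGNORECASE)

def find_x_of_y(text):
    """Return lowercased (X, Y) pairs for every 'a/an X of Y' match in text."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(text)]

examples = [
    "We watched a pod of whales.",       # collective noun
    "A mayor of Buffalo once said so.",  # periphrastic genitive
    "She draws a webcomic of romance.",  # plain PP attachment
]
for sentence in examples:
    print(find_x_of_y(sentence))
```

All three sentences produce a match, so a filter like this cannot by itself separate collective nouns from other uses of the same frame; that would take semantic information about the nouns involved.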
[225003480120] |Since it relies so heavily on syntax, it gathers examples that are weak, at best. [225003480130] |For example, in what way are the following collective nouns? [225003480140] |a conspiracy of theorists; a tantrum of 2 year olds; a pratfall of clowns [225003480150] |My first pass reading of those three phrases is not as collective nouns, but rather as periphrastic genitives (e.g., "a mayor of Buffalo once said..."). [225003480160] |The "a X of Y" syntax is, by itself, ambiguous between the periphrastic genitive and collective noun constructions (as well as simple PP attachment like "a webcomic of romance"). [225003480170] |Do people prefer the use of "a X of Y" for one of these constructions? [225003480180] |I suspect any preference would be based on the semantic features of the nouns involved (once you read the word "group", you pretty much know you've got a collective noun on your hands). [225003480190] |I wonder if anyone has done online reading tasks with subjects reading the two kinds of phrases and experimenting with different features to see what cues one reading over another. [225003480200] |Imagine creating a set of stimuli containing sentence frames that could take either a collective noun or a periphrastic genitive and alternating each, controlling for features like animacy. [225003480210] |I'll take a crack at one such frame. [225003480220] |My goal is to create sentence pairs involving minimal pairs of "a X of Y" constructions which differ only in the Y noun and where the first is a collective noun while the second is a periphrastic genitive. [225003480230] |This relies critically on finding an X word that can be a collective noun like "group" as well as a possessive. [225003480240] |Hmmmmmm, this ain't gonna be easy.... [225003480250] |a. [225003480260] |That cup of coffee that I broke has been cleaned up. b. [225003480270] |That cup of John's that I broke has been cleaned up. 
[225003480280] |My original hypothesis was that people will delay on (b) [meaning, their reading of the following region will be slower than (a)]. [225003480290] |But I dunno, because some will be confused at "broke" in (a) as well. [225003480300] |Part of this will depend on where the delay occurs. [225003480310] |Now, you go write up 100 of these pairs, norm them for acceptability, set up a moving window reading test in ePrime, run at least 30 subjects, then call me when you got results. [225003480320] |I've done my part. [225003490010] |When-Copy-Editing-Prescriptivism_GO-E-s_H-0_rr-i_b_ly-W-R-O_nG [225003490020] |It's almost too easy to beat up on Slate.com these days. [225003490030] |The whole site has devolved into a garbage can of reactionary, simple minded, and flat wrong typists who are making dear Truman turn in his fabulous grave. [225003490040] |One of the latest wastes of pixels is this review of the movie Grown Ups which hinges almost its entire critique on a hyphen (no shit, that's about the entire review): [225003490050] |Grown Ups. [225003490060] |Just to be clear: That's Grown, space, Ups. [225003490070] |What this might mean is a problem of Noam Chomsky-esque proportions. [225003490080] |What's fairly certain is that at no stage of the movie's well-funded production did anybody think to check the spelling of the title. [225003490090] |The dictionary that Copy-Editing the Culture happens to be wedded to (not always happily) is Webster's New World College Dictionary: Fourth Edition. [225003490100] |It's called "college" because it is intended for, as it were, grown-ups—or, as Webster's also allows, grownups. [225003490110] |Never has Copy-Editing the Culture met a prescriptive dictionary that supports Sony's version of the word. [225003490120] |That's because the noun grown ups makes no sense. [225003490130] |To grow up—or to push down, to walk toward, to jump up—is a straightforward verb intensified with a preposition. 
[225003490140] |Grown-up is a single noun compounded from those pieces. [225003490150] |But what's a grown up? [225003490160] |Grammatically, this uncompounded object makes sense only if one is describing an "up" that has grown. [225003490170] |And what's an "up"? [225003490180] |Does it eat? [225003490190] |Need it be socialized? [225003490200] |Way to name drop Chomsky, btw. [225003490210] |As one of the rare people who've actually read The Minimalist Program*, I'm pretty sure Chomsky would not be impressed with this reference. [225003490220] |The author, Nathan Heller, seems to think that the placement of hyphens is the height of grammatical analysis. [225003490230] |As a former college writing instructor, I can tell you that citing a dictionary definition is the clear sign of an author who has ZERO idea what the hell she/he is talking about. [225003490240] |Good authors structure their own arguments; they don't cite dictionaries as their authorities. [225003490250] |Following up the dictionary reference with the incoherent claim that the preposition up some-how** intensifies the verb grown exposes Nathan's complete lack of credibility on all matters of linguistics. [225003490260] |Even the most simple-minded*** grade school teacher would be wary of claiming the preposition up magically intensifies the verb grown in the title Grown Ups. [225003490270] |It is an incoherent claim on all levels. [225003490280] |It makes no sense. [225003490290] |* Believe me folks, I ain't proud of this. [225003490300] |I just happened to be trained at a functionalist grad school which believed in know thy enemy pedagogy, and hence a summer reading group was born. [225003490310] |**Nate, whattaya think about my use of "some-how"? [225003490320] |Hmmm? [225003490330] |Weird hyphen, huh? [225003490340] |Is it "uncompounded"? [225003490350] |***Oh shit, Nate, did you see that? [225003490360] |I used another hyphen! [225003490370] |Hyphens and Chomsky and Intensifiers, Oh My! 
[225003500010] |BEST. HEADLINE. EVER. [225003500020] |(Image from dlisted) And a bravo for this one (which I first heard about on Wait, Wait, Don't Tell Me): Woman in sumo wrestler suit assaulted her ex-girlfriend in gay pub after she waved at man dressed as a Snickers bar (online version of story here, but with simpler headline). [225003500030] |Surprisingly, this is not a crash blossom. [225003500040] |This headline, as far as I can tell, means exactly what you think it means on the first read. [225003500050] |The linguistics question, which is completely obvious, of course, is why did the headline author feel that "Snickers bar" was the only NP that needed an article? [225003510010] |Robo-Linguists At Last!! [225003510020] |(image from evasee) Finally, the tedious job of linguists has been replaced by robots! [225003510030] |The site io9.com proudly trumpets the triumph of algorithm over the comparative method with the post title: Computer program deciphers a dead language that mystified linguists. [225003510040] |io9.com proudly proclaims the following: The lost language of Ugaritic was last spoken 3,500 years ago. [225003510050] |It survives on just a few tablets, and linguists could only translate it with years of hard work and plenty of luck. [225003510060] |A computer deciphered it in hours. [225003510070] |However, just a brief scan of the original article (pdf) suggests that there's less here than meets the eye. [225003510080] |The abstract begins thusly: [225003510090] |In this paper we propose a method for the automatic decipherment of lost languages. [225003510100] |Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. [225003510110] |Producing an alphabetic mapping and a cognate set is nice, but "deciphering a dead language" it ain't. [225003510120] |(HT: Sérgio Bernardino via Twitter #linguistics) [225003530010] |No Problem? You're Welcome. 
[225003530020] |Over at Salon, Matt Zoller Seitz posted a fairly mundane rant in the peevologist tradition complaining about the decline in civility represented by the rise of no problem as a replacement for you're welcome in American courtesy interactions. [225003530030] |What piqued my interest was not the rant itself (I'm growing tired of countering the peevologists, let them rant away, yawn) but rather the fact that we can easily fact-check his intuition that you're welcome is declining in use while no problem is rising thanks to the newly released Corpus of Historical American English from Mark Davies at BYU. [225003530040] |As a caution, this corpus is not really suited to this question because it's not limited to spoken courtesy phrases*, which is what Seitz was specifically ranting about; nonetheless, it gives us a hint at the change in frequency of these two phrases. [225003530050] |Using the freely available online tool, I plotted the frequency of you're welcome over the last 200 years: [225003530060] |Then I plotted the frequency of no problem over the last 200 years: [225003530070] |There's no doubt that no problem has increased in frequency, but its rise started 50+ years ago and it has dwarfed you're welcome since the 1970s. [225003530080] |Just for kicks, I performed searches on Google and Bing. [225003530090] |Here are the results: Google/Bing seem to confirm COHA in that no problem is more frequent than you're welcome. [225003530100] |However, these data are hard to tease apart because no problem can be used in a wider variety of contexts than you're welcome, so it's to be expected that its frequency is greater. [225003530110] |But the dramatic and sudden rise of the phrase in ALL contexts 50 years ago is the truly interesting fact, imho. [225003530120] |I have no explanation for that. 
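A side note on method: frequency comparisons across decades like the ones above only make sense when raw hit counts are normalized by how much text each decade contributes, for example as occurrences per million words. Here is a minimal sketch of that normalization. The counts below are invented for illustration and are NOT taken from COHA, and `per_million` is a hypothetical helper name.

```python
def per_million(hits, corpus_words):
    """Normalize a raw hit count to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits * 1_000_000 / corpus_words

# Hypothetical numbers, made up for this example; NOT real COHA data.
decades = {
    "1960s": {"words": 24_000_000, "hits": 36},
    "2000s": {"words": 30_000_000, "hits": 60},
}
for decade, d in decades.items():
    print(decade, per_million(d["hits"], d["words"]))
```

Without this step, a decade that simply contains more text would look like a decade in which the phrase became more popular.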
[225003530130] |One of Seitz' friends suggested that the influence of Romance language speakers (e.g., Spanish speakers in North America) has led to the semantic/conceptual borrowing of de nada. [225003530140] |A nice thought, but it would take much more research to confirm/deny such a thing. [225003530150] |This does give some credence to half of Seitz' claim (if not his complaint) as yes, indeed, the frequency of no problem has increased. [225003530160] |However, there's no indication that the frequency of you're welcome has declined; quite the contrary. [225003530170] |Its frequency has held solidly over 1.5/mil since the 1960s and hit its all time high just last decade. [225003530180] |According to his IMDB profile, Seitz was born in 1968, which means his entire life has been lived well within the boundary of the roaring no problem period. Exactly when did he experience the golden era of you're welcome? [225003530190] |And how can it be said that no problem is REPLACING you're welcome if you're welcome is as strong as ever? [225003530200] |This is probably an example of the yester year phenomenon whereby history is rewritten because a person assumes the days of his youth were the greatest days on Earth; therefore they must conform to his current beliefs about what the greatest days must be like; therefore the past was different than it really was. [225003530210] |*FYI, the sources for COHA are Fiction, Magazine, Newspaper, and Non-Fiction Books. [225003540010] |Online Dialect Survey [225003540020] |All you swinging North American English speakers, pucker up and get yer vocal folds hummin 'cause you're being called to service! [225003540030] |Claire Bowern of Yale University's Linguistics Department has launched an online North American English Dialect Survey (HT Mr. Verb). [225003540040] |Now quit yer bitchin' and yer belly achin' about the use of ain't, the sissy passive, and no problem and contribute something useful to the interwebs, yer voice!. 
[225003550010] |Salad Too Grill [225003550020] |I can't make sense of this name (just a block from the White House). [225003550030] |It seems to be attempting a play on words of some sort, but I just can't get any coherent meaning. [225003550040] |If it were "Grill Salad Too" at least I could imagine it means a grill plus salads, or a play on "Salad to grill" meaning, I dunno, grilled salads (yuck) but I just can't get either of those from "Salad Too Grill." [225003550050] |Nonetheless, it has a good reputation on Yelp. [225003560010] |Linguists DEBUNK: Does Obama Talk Like a Girl? [225003560020] |For shame, Atlantic Wire. [225003560030] |You wildly mislead your readers with this ridiculous title: Linguists Debate: Does Obama Talk Like a Girl? [225003560040] |This is flat wrong. [225003560050] |Linguists ain't debating this at all. [225003560060] |Linguists, as far as I can tell, are all in COMPLETE AGREEMENT on this topic. [225003560070] |Obama does not talk like a girl. [225003560080] |It's a ridiculous claim with ridiculous presuppositions and ridiculous implicatures. [225003560090] |I don't know a single linguist who disagrees with or wishes to debate this at all. [225003560100] |The Atlantic Wire's roundup of the whole Parker-Krauthammer-Payack kerfuffle treats the delusional scribbles of political partisans on equal terms with the objective, thoughtful and empirically sound analysis of professionals. [225003560110] |This is just wrong. [225003560120] |For shame. [225003560130] |UPDATE: Just noticed that John Lawler makes exactly this point in the comments of The Atlantic Wire's page. [225003560140] |Good for you, John. 
[225003570010] |Implicit Language Policy [225003570020] |Ingrid, over at Language on the Move, tells the story of how difficult it was to get her university to accept the record of a non-English publication, then draws a smart conclusion about linguistic hegemony: [225003570030] |...no one ever made an explicit policy decision that research publications in languages other than English are less desirable than those in English. [225003570040] |However, mundane bureaucratic practices –such as making record entry for a publication in a language other than English more difficult –conspire to have exactly that policy effect. [225003570050] |In this way many decisions that seem to have nothing to do with language end up as implicit language policy decisions –the fact that English-language journals dominate the academic rankings is another example from academic publishing (emphasis added). [225003580010] |Language Is A Battlefield [225003580020] |British author Roz Kaveney discusses Why trans is in but tranny is out - the language of transgender. [225003580030] |Money quote: [225003580040] |Right now, trans is just about universally acceptable - though in recent years there was a fight over whether it should be an adjective or a prefix. [225003580050] |A trans woman, the argument goes, is a woman who happens to be trans as she might be, say, blonde, but a transman is some special and distinct order of being. [225003580060] |For a while, it seemed as if some younger trans men were going to successfully reclaim 'tranny', at least as a 'smile when you say that' epithet, or a 'we can say that about ourselves; you can't' in-group word like 'queer'. [225003580070] |It didn't take, though, partly because it had never stopped being used by would-be hip lad journalists to abuse not only actual trans people, but a list of 'weird' people seen as non-gender-conforming. [225003590010] |what do you say to a linguist? [225003590020] |Here's a clever site by David R. 
MacIver that compiles stereotypical responses to the "What did you study" question. Two responses so far for linguistics. [225003590030] |I'm sure we can add many more. [225003590040] |What do people say to Linguistics? [225003590050] |"Oh, my grammar is horrible. [225003590060] |You must hate that..." [225003590070] |"So how many languages do you speak?" [225003600010] |"Former Hacker"? [225003600020] |The term former hacker is being bandied about quite a lot right now (see examples here). [225003600030] |The term struck me as odd simply because I think of hacking as a skill set, not a job. [225003600040] |You can be a former police officer or former mayor because those are jobs that can end. [225003600050] |But once you have a skill, you tend to retain it forever (like riding a bike....). [225003600060] |My hunch is that the term is meant to suggest that the individual no longer breaks into other people's networks just for fun, even though they could. [225003610010] |Again With The Bad Science Reporting.... [225003610020] |A nice post over at Thoughtomics debunks an all too typical example of bad science reporting run amok involving chickens and eggs and proteins... you see where this is going, right? sigh... [225003610030] |Money Quote: [225003610040] |I didn’t exactly hold mainstream science journalism in high esteem, but I’m amazed that science journalists continue ‘covering’ science stories in this way, even when readers are calling them out. [225003610050] |While the trouble may have started with a misleading introduction and a quirky quote, it is the journalist’s responsibility to check facts and put a story into a context. [225003610060] |Coverage like this does more harm than good for the public image of science reporting and scientists themselves. [225003620010] |Words For Canoe... [225003620020] |The Ottawa Citizen reports that Words for 'canoe' point to long-lost family ties. 
[225003620030] |The story begins thusly: [225003620040] |An obscure language in Siberia has similarities to languages in North America, which might reshape history, writes Randy Boswell. [225003620050] |A new book by leading linguists has bolstered a controversial theory that the language of Canada's Dene Nation is rooted in an ancient Asian tongue spoken today by only a few hundred people in Western Siberia. [225003620060] |The landmark discovery, initially proposed two years ago by U.S. researcher Edward Vajda, represents the only known link between any Old World language and the hundreds of speech systems among First Nations in the Western Hemisphere (emphasis added). [225003620070] |It's a nice story about how hard-working linguist Edward Vajda discovered linguistic relationships between the Ket language of Siberia and Athapaskan languages of North America. [225003620080] |A relationship that goes back maybe 13,000 years. [225003620090] |From his web page, "The "Dene-Yeniseian Hypothesis" is gaining acceptance as the first demonstrated link between an Old World and a New World language family." [225003620100] |Having been trained at a school known for both typology and field linguistics, I have a lot of respect for the skills and talent a linguist like Vajda brings to the field (especially since I lack the patience to do this kind of work). [225003620110] |And his enthusiasm is infectious. [225003620120] |From the article: [225003620130] |He found that the few remaining Ket speakers in Russia and the Dene, Gwich'in and other Athapaskan speakers in North America used almost identical words for canoe and such component parts as the prow and cross-piece. [225003620140] |"Finally, here was the beginning of a system that struck me as beyond the realm of chance," Vajda wrote at the time. 
[225003620150] |"At that moment, I think I realized how an archeologist must feel who peers inside a freshly opened Egyptian tomb and witnesses what no one has seen for thousands of years." (emphasis added). [225003630010] |on withdraw [225003630020] |Like many people, I occasionally find that a word I encounter all the time, and consider perfectly normal, pops out at me and seems odd in some linguistically interesting way. [225003630030] |Today, the word withdraw popped out at the ATM (along with the cash, hehe). [225003630040] |It's the preposition that struck me as odd. [225003630050] |I can still get the use of draw to mean take away (mostly thanks to poker), but what's "with" doing in that word? [225003630060] |To withdraw does not mean draw with. [225003630070] |The preposition with is a tricky one that marks a wide variety of semantic roles. [225003630080] |A brief set of examples should suffice to make the point (forgive my semantic role labels if they don't match your preferred terminology, just trying to make the point obvious): [225003630090] |
  • Chris loaded the truck with hay. hay = object*
  • [225003630100] |
  • Chris loaded the truck with a pitchfork. pitchfork = instrument
  • [225003630110] |
  • Chris loaded the truck with Larry. [225003630120] |Larry = co-agent
  • [225003630130] |
  • Chris loaded the truck with enthusiasm. enthusiasm = manner
  • [225003630140] |
  • Chris loaded the truck with stripes. stripes = modifier
  • [225003630150] |In his big red syntactic theory book, one of my professors wrote a fairly involved analysis on why with is so versatile. [225003630160] |But arguments as to why this is the case are not particularly relevant at the moment. [225003630170] |I'm more interested in how with got there in the first place, not why the contemporary English grammar** allows it. [225003630180] |The Online Etymology Dictionary lists the following definition (sorry, no OED access): withdraw early 13c., "to take back," from with "away" + drawen "to draw," possibly a loan-translation of L. retrahere "to retract." [225003630190] |Sense of "to remove oneself" is recorded from c.1300. (emphasis added) 1200 is a long time ago, so the word has serious English street cred. [225003630200] |But I found the definition of with as 'away', again, just odd, until I followed up on the etymology of with: with: O.E. wið "against, opposite, toward," a shortened form related to wiðer, from P.Gmc. *withro- "against" (cf. O.S. withar "against," O.N. viðr "against, with, toward, at," M.Du., Du. weder, Du. weer "again," Goth. wiþra "against, opposite"), from PIE *wi-tero-, lit. "more apart," from base *wi- "separation" (cf. Skt. vi, Avestan vi- "asunder," Skt. vitaram "further, farther," O.C.S. vutoru "other, second"). [225003630210] |In M.E., sense shifted to denote association, combination, and union, partly by influence of O.N. vidh, and also perhaps by L. cum "with" (as in pugnare cum "fight with"). [225003630220] |In this sense, it replaced O.E. mid "with," which survives only as a prefix (e.g. midwife). [225003630230] |Original sense of "against, in opposition" is retained in compounds such as withhold, withdraw, withstand. (emphasis added). [225003630240] |So, to withdraw is to draw against an account, and that makes perfect sense. [225003630250] |Thank you, freely available online lingo-tools. [225003630260] |It's a nice example of how dramatically a word can change its semantics.
[225003630270] |Virtually all contemporary uses of with involve the sense of together, not against. [225003630280] |But there it is, in black and white (and a little bit of green). [225003630290] |*I think Propbank would use cargo as the role label for hay, I'm not sure, but I figured object was more obvious for lay readers. [225003630300] |U. Illinois has a nifty online Semantic Role Labeler demo, if you want to play around with this kind of thing. **Careful now, I'm using the term English grammar in a fairly technical, psycholinguisticee sense. [225003640010] |the upside of language death? [225003640020] |The bio-blogger Razib Khan steps into the murky waters of language death and proposes a hypothesis about how language death might have favorable outcomes for language evolution. [225003640030] |Money quote: "very high linguistic diversity is not conducive to economic growth, social cooperation, and amity more generally scaled beyond the tribe." [225003640040] |As far as I can tell he has no evidence for this, but rather is drawing an analogy to cultural evolution à la Jared Diamond. [225003640050] |The take-away seems to be: a little language diversity is good; a lot of language diversity is bad. [225003650010] |kids say the darnedest things [225003650020] |Too cute not to pass on...a FB comment posted by my sister Lori (who owns and runs her own pre-school): [225003650030] |One of the funniest things recently said by my preschooler Lilly (4 years old): [225003650040] |After repeating the importance of not unlocking the front door, her Mom said to her, ”What is it that you do not understand?” and Lilly replied, “English.” [225003660010] |pullum bait [225003660020] |Here's an occasionally tongue-in-cheek Q&A from the Chicago Manual of Style Online. [225003660030] |Personal fav: [225003660040] |Q. Can I use the first person? [225003660050] |A. Evidently. [225003660060] |And running a close second: Q. “Between” vs. “among.” [225003660070] |I’m going insane. 
[225003660080] |I think the editor who changed my wording is just clueless or hasn’t given the issue enough thought. [225003660090] |Please help. [225003660100] |I’ve read the advice in CMOS, Garner’s Modern American Usage, Bernstein’s The Careful Writer, The Cambridge Grammar of the English Language, and a few other sources, but I can’t decide. [225003660110] |Should I say “competition between companies” or “competition among companies”? [225003660120] |They’re competing with each other, severally and individually. [225003660130] |At least, that’s what I think. [225003660140] |Or is “among” justified on the grounds that competition implies vague, intricate relationships? [225003660150] |Do I need an economist to clear this usage question up? [225003660160] |Are there right and wrong answers in this case? [225003660170] |The phrase is “competition between/among companies is intensifying.” [225003660180] |A. [225003660190] |It really doesn’t matter. [225003660200] |The editor might well be clueless—it happens—but you are overthinking this. [225003660210] |HT: kottke [225003670010] |Stanford in the news (good and bad) [225003670020] |Several Stanford linguistics related items have been popping up here and there, none worthy of a post by itself, but taken as a whole, something weird is happening over there: [225003670030] |
  • Stanford linguistics recently posted a search for not one but TWO tenure-track faculty positions. [225003670040] |I've always had the impression that linguistics departments at elite universities don't hire all that often and it's quite rare to find two positions simultaneously. [225003670050] |Not sure if they just have money to burn or if this is a special situation.
  • [225003670060] |
  • Mr. Verb linked to a study with the title BRITISH ACCENT NO LONGER SEXY, STUDY FINDS. [225003670070] |The linked-to article makes a variety of claims: 1) the research was done "by the Department of linguistics and the Department of Psychology at Stanford University" (why psychology gets caps but linguistics doesn't is perhaps another question as well), 2) it's called The Comito Study, and 3) either Dr. Linda Masterson or Dr. Lisa Masterson, or possibly both, are involved. [225003670080] |So far, I cannot find any reference to the study anywhere on Stanford's pages (or anywhere in the googlesphere save the original article), nor can I find either Dr. Linda or Dr. Lisa at Stanford (nor can I find any Masterson at all). [225003670090] |UPDATE: I'm so used to seeing bad science reporting, I just assumed this was legit. [225003670100] |A little follow-up shows this to be the work of the classic bat-boy publication The Weekly World News, a not-too-distant cousin to the Onion. [225003670110] |Shame on me, haha.
  • [225003670120] |
  • While searching for Drs Linda and Lisa on Stanford's page, I discovered that Stanford has implemented some kind of algorithm for matching similar sounding names, so the search page asked me if I wanted to "Find last names that sound like my search term." [225003670130] |One of the earliest, if not THEE earliest sound matching algorithms was Soundex, patented in 1918 and now freely available in a variety of implementations (since the patent has expired). [225003670140] |However, there are far superior algorithms involving various minimum edit distance and bag o' sound phonological comparisons (I spent a brief time at IBM with a group working on this). [225003670150] |I don't know how they've implemented their search, but it's a nifty tool to include in a search engine, imho.
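For the curious, here is a rough Python sketch of the two kinds of matching mentioned in that last item: the classic American Soundex code and a plain Levenshtein (minimum edit distance) comparison. To be clear, this is my own illustration, not Stanford's search code and not anything from IBM; the function names `soundex` and `edit_distance` are made up for this example.

```python
# Illustrative sketch only: function names and structure are assumptions,
# not taken from any particular search engine or library.

# American Soundex digit groups: letters in the same group sound alike.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and y) get no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(soundex("Robert"), soundex("Rupert"))    # R163 R163
print(soundex("Masterson"))                    # M236
print(edit_distance("masterson", "masterton")) # 1
```

Soundex throws away vowels and lumps similar consonants together, which is why Robert and Rupert collide. Edit distance keeps every character and just counts the changes, so it is usually the better starting point when you want a graded notion of "how close" rather than a yes/no match.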
  • [225003680010] |on the evolving language of headlines [225003680020] |Gene Weingarten wrote up a nice rant about the evolution of headlines from the era when print headlines were meant to grab a reader's eye to the modern era where headlines are meant to juice SEO. [225003680030] |Money Quote: Newspapers still have headlines, of course, but they don't seem to strive for greatness or to risk flopping anymore, because editors know that when the stories arrive on the Web, even the best headlines will be changed to something dull but utilitarian. [225003680040] |That's because, on the Web, headlines aren't designed to catch readers' eyes. [225003680050] |They are designed for "search engine optimization," meaning that readers who are looking for information about something will find the story, giving the newspaper a coveted "eyeball." [225003680060] |Putting well-known names in headlines is considered shrewd, even if creativity suffers. [225003680070] |The temptation to end this post as he did was great...but modesty has won...this time... [225003690010] |Urdu is the most influential language IN THE WORLD!!! [225003690020] |The competition is over and Urdu has won!!! [225003690030] |According to "Renowned Urdu scholar" Dr Farman Fatehpuri, "the status of a language should be decided in view of its influence and that Urdu was the most influential language in the world [...] [225003690040] |Urdu has the distinction of having the phonetics and the letters that conform to it. [225003690050] |Other languages are mostly devoid of it.” [225003690060] |Well, there you have it. [225003710010] |Andrew Sullivan is the Sarah Palin of science [225003710020] |Again and again, Andrew Sullivan impresses me with his utter incompetence at any and all things scientific-ee. [225003710030] |For all his tirades against Sarah Palin, it's ironic that he can, in less than tongue-in-cheek manner, be accused of being the Sarah Palin of science. 
[225003710040] |Yet again he posts an easily falsifiable scientific claim (regarding conversational analysis, or lack thereof...), fails to do even the most basic Google search, then comments on the false claim as if it were true. [225003710050] |He would NEVER accept this kind of behavior from a political blogger, but he routinely engages in this himself when it comes to science. [225003710060] |It begins with Scott Adams, creator of Dilbert, posting this sentence on his own blog: A conversation, like dancing, has some rules, although I've never seen them stated anywhere. [225003710070] |Any first year linguist, anthropologist, English major, linebacker, Starbucks barista, etc. would see this statement and say, "hmmm, that seems wrong. [225003710080] |I can't believe no one has ever studied conversation from a scientific standpoint. [225003710090] |Let me Google around a bit and see what I can find..." [225003710100] |Sullivan didn't do this; he just took a passage from another blogger, uncritically pasted it into his own rather large megaphone, then added his own misguided, largely wrong comment. [225003710110] |What that little bit of Googling would have given you, dear Scott and Sully, was the fact that there is a rather long history of conversation analysis within linguistics, sociology, anthropology, and now even computational linguistics. [225003710120] |There's a fucking Wiki page for fuck's sake! [225003710130] |And yes, people have been trying to write down the "rules" of conversation for a long time. [225003710140] |They even have a name for them: turn taking. [225003710150] |Though attempts at defining the "rules" of turn-taking have been fraught with problems, scholars and scientists have nonetheless been trying. Here's a brief and incomplete but representative list of some freely available papers and resources on the science of conversation analysis: [225003710160] |
  • A Computational Architecture for Conversation (Microsoft, pdf). [225003710170] |We describe representation, inference strategies, and control procedures employed in an automated conversation system named the Bayesian Receptionist. [225003710180] |The prototype is focused on the domain of dialog about goals typically handled by receptionists at the front desks of buildings on the Microsoft corporate campus.
  • [225003710190] |
  • Turn taking in conversation is universal (Max Planck Institute for Psycholinguistics): Do people take turns in natural conversation in the same basic way in all languages, or does the turn-taking system vary in each language? [225003710200] |Many anthropologists have suggested the latter, but MPI-researchers have found empirical evidence for robust universals in human conversation. [225003710210] |Their study appears in this week's Proceedings of the National Academy of Sciences.
  • [225003710220] |
  • Speaking while monitoring addressees for understanding (pdf, Stanford): Speakers monitor their own speech and, when they discover problems, make repairs. [225003710230] |In the proposal examined here, speakers also monitor addressees for understanding and, when necessary, alter their utterances in progress. [225003710240] |Addressees cooperate by displaying and signaling their understanding in progress.
  • [225003710250] |
  • Sequencing in Conversational Openings (UCLA): An attempt is made to ascertain rules for the sequencing of a limited part of natural conversation and to determine some properties and empirical consequences of the operation of those rules.
  • [225003710260] |As for Sullivan's contribution: "I think of it as a friendly tennis match. [225003710270] |There is no attempt to score a point or win a match." This is more a function of his perception of a conversation than the reality. [225003710280] |Conversations are always governed by goals and there is competition for the floor inherent in the interaction. [225003710290] |It would be interesting if Sullivan would post a lengthy example of one of his friendly tennis match conversations and let a CA scholar have a go at analyzing the content. [225003710300] |My guess is that we would find that Sully's friendly tennis match is more Serena v. Venus than he's willing to admit.