[225002320010] |a deeply frustrating pursuit [225002320020] |Neuroblogger Jonah Lehrer has a new article about the value of failure in science and how it can lead to discovery. [225002320030] |A nice, if somewhat light, read: Accept Defeat: The Neuroscience of Screwing Up. [225002320040] |The basic point is that our brains have two somewhat competing processes, one for perceiving errors (the “Oh shit!” circuit) and one for deleting irrelevant stuff (the Delete key). [225002320050] |If the delete key wins, important discoveries are ignored (something like that). [225002320060] |Money quote: [225002320070] |While the scientific process is typically seen as a lonely pursuit —researchers solve problems by themselves —Dunbar found that most new scientific ideas emerged from lab meetings, those weekly sessions in which people publicly present their data. [225002320080] |Interestingly, the most important element of the lab meeting wasn’t the presentation —it was the debate that followed. [225002320090] |Dunbar observed that the skeptical (and sometimes heated) questions asked during a group session frequently triggered breakthroughs, as the scientists were forced to reconsider data they’d previously ignored. [225002320100] |The new theory was a product of spontaneous conversation, not solitude; a single bracing query was enough to turn scientists into temporary outsiders, able to look anew at their own work. [225002330010] |How Many Linguists Are There? [225002330020] |The Independent recently published an article about the language documentation efforts of Mark Turin and his colleagues at The World Oral Literature Project. [225002330030] |In the article, Turin was lamenting the large number of undocumented languages (a fair lament) and was quoted as saying this: [225002330040] |There are more linguists in universities around the world than there are spoken languages –but most of them aren't working on this issue. 
[225002330050] |To me it's amazing that in this day and age, we still have an entirely incomplete image of the world's linguistic diversity. [225002330060] |People do PhDs on the apostrophe in French, yet we still don't know how many languages are spoken. [225002330070] |I found this passage remarkably agitating. [225002330080] |I appreciate Turin's passion for language documentation and I support language documentation efforts, but there are two claims in this passage (one explicit, one implicit) that I object to: [225002330090] |First, I'm not sure there really are more linguists than languages. [225002330100] |Linguistics is a small field (this fact is relevant to both of my objections). [225002330110] |The article uses the fairly common number of 6500 languages. [225002330120] |This is a guesstimation at best. [225002330130] |We don't have a good definition of a language (vs. a dialect), so it's not clear what counts. [225002330140] |This is a non-trivial point. [225002330150] |Figuring out what exactly language is, is the core of linguistics (imho). [225002330160] |I take the problem seriously. [225002330170] |The answer will likely be disappointing to non-linguists. [225002330180] |The answer will likely be something like: there is no such thing as a language as traditionally conceived. [225002330190] |In any case, I'm fine with using the 6500 number publicly because people like numbers. [225002330200] |They want a number? [225002330210] |Okay, we'll give them 6500. [225002330220] |But within the field, there is no number. [225002330230] |Next, I'm not sure what a linguist is. [225002330240] |This is also non-trivial. [225002330250] |We could say anyone with a PhD in linguistics from an accredited institution is a linguist. [225002330260] |But even that definition requires refinement. [225002330270] |Do cognitive scientists count? [225002330280] |Professors of French? 
[225002330290] |We could define this as anyone with the skill set required to go into the field and document a language. [225002330300] |Wow, that would actually be a highly restricted set. [225002330310] |No computational linguists. [225002330320] |No psycholinguists. [225002330330] |Not most syntacticians. [225002330340] |Not most phonologists. [225002330350] |Even if we were fairly generous in our definition, I think we'd be hard pressed to count up 7000 linguists. [225002330360] |Second, I object to the notion, implicit in Turin's quote, that language documentation is so critical a goal of linguistics that most linguists should devote their careers to it. [225002330370] |Again, I'm pro-documentation, but there are lots of important tasks to be completed within linguistics. [225002330380] |I believe that understanding how language works in the human brain is the absolute center of linguistics. [225002330390] |All efforts follow from that. [225002330400] |Language is first and foremost a cognitive product of individual human brains. [225002330410] |Yes, there are very interesting sociolinguistic processes that are well worth studying; important cultural interactions that language takes part in to be sure. [225002330420] |But understanding how the individual human brain produces and comprehends language is the key to understanding those sociocultural processes. [225002330430] |Look, this was a bit of hyperbole on Turin's part and he doesn't deserve to be beaten up about it, but it just got under my skin. [225002330440] |It all needs to be studied. [225002330450] |I get that. [225002330460] |I want to quadruple the number of linguists in the world and set an army of linguists into every town and hamlet, every village and urban center, documenting and analyzing every linguistic feature they can get their greedy hands on. [225002330470] |I also want an army of theoretical linguists, psycholinguists, computational linguists, neurolinguists, and a host of other *linguists. 
[225002330480] |But there must be method to the madness. [225002330490] |There must be something that coordinates those efforts around a shared goal. [225002330500] |I see that shared goal as understanding how language works in the brain. [225002350010] |Vision Affects Language Processing [225002350020] |Does watching a leaf fall help you process the sentence the leaf is falling down? [225002350030] |Apparently, no, it hurts. [225002350040] |It slows you down. [225002350050] |Cognitive Daily reviews research supporting this conclusion. [225002350060] |Money quote: [225002350070] |...people take longer to process sentences that match the movement of an animation than they do to process sentences that don't match it. [225002350080] |Kaschak's team reasons that we must be using the same region of the brain to process the motion itself as we do to process the language describing that motion. [225002360010] |Brain Farts [225002360020] |The title alone made this worth reading: Anatomy of A Brain Fart. [225002360030] |Money quote: The latest research seems to indicate that brain farts are a unique type of cognitive mistake. [225002360040] |Unlike errors caused by lack of information or experience, or by distractions, brain farts are innate. [225002360050] |They have a predictable neural pattern that emerges up to 30 seconds before they happen. [225002360060] |When you are absorbed in inward-focused thinking such as daydreaming, a collection of brain regions jointly called the default mode network (DMN) starts furiously popping away. [225002360070] |Neuroscientists don’t agree on exactly which parts of the brain compose this network, but they now believe it is one of the busiest neurological systems. [225002370010] |Eskimo Gibberish? [225002370020] |(image from vintage_ads) [225002370030] |Recently, kottke posted this vintage ad featuring two "Eskimos" with one saying , and I quote, "Kripik igloo sop frofu torky." 
[225002370040] |Commenter bluebear2 at vintage_ads notes that none of those words appear in the online Inuktitut Dictionary. [225002370050] |I would be surprised and impressed if this was anything but gibberish, but I know next to nothing about the Eskimo-Aleut family of languages. [225002370060] |I Googled the sentence (to use that word lightly) and found nothing, of course. [225002370070] |Just thought I'd throw it out there. [225002370080] |Any experts out there care to confirm the obvious? [225002380010] |Google Basterds [225002380020] |Since Liberman at LL just re-confirmed "the observation that Google counts no longer have even order-of-magnitude comparative validity in matters of usage (if they ever did)," I thought I'd pass along my own latest discovery: Google double quotes are not as restrictive in queries as they're claimed to be. [225002380030] |From Google's support page: [225002380040] |Phrase search ("")By putting double quotes around a set of words, you are telling Google to consider the exact words in that exact order without any change. [225002380050] |Google already uses the order and the fact that the words are together as a very strong signal and will stray from it only for a good reason, so quotes are usually unnecessary. [225002380060] |By insisting on phrase search you might be missing good results accidentally. [225002380070] |For example, a search for [ "Alexander Bell" ] (with quotes) will miss the pages that refer to Alexander G. Bell. [225002380080] |But this is not what it seems... [225002380090] |This is a classic recall vs precision issue, right? [225002380100] |If you care about recall, you want to return ALL matches, even if you also return other stuff (inclusive). [225002380110] |If you care about precision, you want to make sure that each return is correct with no errors, even if this means you miss some correct matches (restrictive). [225002380120] |Read more here. 
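To make the recall vs precision trade-off concrete, here is a minimal sketch in Python. The document IDs and the two result sets are made up purely for illustration (they are not real search results), and `precision_recall` is a hypothetical helper name, not anything Google exposes.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned items.

    precision = share of returned items that are correct
    recall    = share of correct items that were returned
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four pages truly match what the searcher wants.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# An inclusive query returns every true match plus other stuff.
inclusive = {"doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"}

# A restrictive query returns only correct pages but misses some.
restrictive = {"doc1", "doc2"}

print(precision_recall(inclusive, relevant))    # (0.5, 1.0)
print(precision_recall(restrictive, relevant))  # (1.0, 0.5)
```

In this toy setup the inclusive query has perfect recall but only half its returns are correct, while the restrictive query is never wrong but finds only half of what is out there.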
(psst, note the very issue Liberman posted about is alive and well here. [225002380130] |I tend to say "recall and precision" while it's quite common, perhaps more so, to say "precision and recall"). [225002380140] |Google, like most search engines, allows us to put double quotes around a query to make it highly restrictive. [225002380150] |In theory, this should mean that a query with no quotes around it should always return at least the same number of matches as the exact same query with quotes, and usually more. [225002380160] |The quoted query matches should be a subset of the unquoted query matches. Got it? [225002380170] |If I'm wrong on this, let me know, but that's my assumption. [225002380180] |Yesterday I wanted to know if some of the better quotes from Tarantino's recent movie Inglourious Basterds were being picked up in general usage yet so I Googled some of them and looked at their search results estimate. [225002380190] |As a sort of baseline, I decided to Google some famous lines from film history, to see how many hits famous lines generally get. [225002380200] |However, some of the lines are similar to common phrases (e.g., "I'll be back" vs "I'll be right back"). [225002380210] |To account for this, I put those lines in double quotes, to restrict the returns to exact matches. [225002380220] |Being a semi-trained researcher, I realized that I should go back and put all lines in double quotes and try to compare apples to apples. [225002380230] |Then I discovered something weird. [225002380240] |In some cases, the more restrictive, double-quoted query returned more hits than the unquoted query. [225002380250] |A lot more. [225002380260] |And the results have stood up through repetition. [225002380270] |For example: [225002380280] |Gone With The Wind about 797,000 for "Frankly, my dear, I don't give a damn!" about 163,000 for Frankly, my dear, I don't give a damn! [225002380290] |Taxi Driver about 17,500,000 for "You talkin' to me?" 
about 7,450,000 for You talkin' to me? [225002380300] |Maybe I just don't get what the double quotes are doing. [225002380310] |And Google doesn't make money from helping linguists study language; they make money from pairing ads with search queries, and bully for them. [225002380320] |I'm a capitalist at heart. [225002380330] |I don't begrudge anyone making a buck, especially a bunch of seriously smart Stanford PhDs. [225002380340] |But still, it's disappointing that such a powerful engine as Google's isn't more useful to the research community. [225002380350] |I should re-read Adam Kilgarriff’s “Googleology is bad science.” [225002390010] |Nine as Narcissism Porn [225002390020] |a rare non-linguistics post... [225002390030] |I just saw the much ballyhooed film Nine. Comparisons with Moulin Rouge and Chicago are unavoidable (especially since the director did Chicago also). [225002390040] |It compares favorably with Chicago in style and substance. [225002390050] |That was not a compliment. [225002390060] |It is a movie dedicated to style over substance. [225002390070] |As Marion Cotillard's character exclaims "style is the new content." [225002390080] |This is unfortunately true (also a nice example of X is the new Y). [225002390090] |And as the spurned wife of the lead, she should know. [225002390100] |However, Moulin Rouge is far superior. [225002390110] |But first, let me be kind and detail the movie's considerable strengths: [225002390120] |
  • It is GORGEOUS. [225002390130] |Not only are all the actors beautiful, but the film techniques will make the average NYU film student scribble furiously in a notebook...or iPhone app, whatever. [225002390140] |This movie is an editor's dream come true. [225002390150] |The juxtaposition of scenes, the rapid camera lens switches, the luxurious theatrical song & dance numbers, and oh my, the colors!...well, they'll make your head swirl with awe at the magic that only film can convey. [225002390160] |This is a technically brilliant film. [225002390170] |Screw Avatar. [225002390180] |These editors and cinematographers deserve Oscars. [225002390190] |Don't wait, give it to them now.
  • [225002390200] |
  • I rather liked the cute dance scene homage to Goldie Hawn's 60s image by daughter Kate Hudson. [225002390210] |Nice bit of meta-Hollywood there.
  • [225002390220] |
  • Two words: Sophia Loren.
  • [225002390230] |Now, on to the critique: [225002390240] |
  • This movie displays a strange nostalgia for 1960s Italian misogyny. [225002390250] |Really? [225002390260] |Why? [225002390270] |The misogyny is masked by brilliant film techniques/tricks, but it's there. [225002390280] |Throughout this entire film, women are little more than a prop to a jackass' journey.
  • [225002390290] |
  • Nine = a man is defined by the women in his life...starting with his mother...c'mon, Freud is sooooo dead.
  • [225002390300] |
  • To quote Gertrude Stein, there is no there there. [225002390310] |This movie is not deep. It only pretends to be. [225002390320] |There is little story worth watching. [225002390330] |A self-indulgent, arrogant narcissist who makes garishly bad movies gets to revel in his own image while the world fawns around him. [225002390340] |The movie fails in its attempt to expose this narcissistic orgy (weak plot twists at the end with the wife and producer; Cotillard says in the end "I can see now it is hopeless") and fails to redeem the character himself (weak self-realization at the end). [225002390350] |This is the worst kind of artistic self-indulgence where the art is supposed to be redemptive. [225002390360] |No, it's not. [225002390370] |In the same way a crack addict cannot be redeemed by smoking more crack, an arrogant narcissistic artist cannot be redeemed by making yet another movie. [225002390380] |There is no real redemption in this movie. [225002390390] |There is just voyeurism. [225002390400] |It deserves its own 70 min Phantom Menace take down (as I suspect Avatar does too...haven't seen it, ain't gonna).
  • [225002390410] |
  • It celebrates lechery while superficially condemning it. [225002390420] |Make no mistake, this film exalts this man's lechery. [225002390430] |Make this film with ugly people, it's genius; with beautiful people, it's porn. [225002390440] |The beautiful actors and film tricks mask its utter depravity.
  • [225002390450] |
  • It has to be said: Nicole Kidman is a botoxed cartoon. [225002390460] |I cannot take her seriously as an actress.
  • [225002390470] |PS: Okay, gotta throw in a little linguistics. [225002390480] |Should my title be "narcissism porn" or "narcissistic porn"? [225002390490] |The distinction requires me to decide what exactly I think the porn is about. [225002400010] |Interesting... [225002400020] |Neuroskeptic has pledged to avoid the word interesting in his blog posts because it begets intellectual laziness: [225002400030] |Sadly it's easier to just call something interesting than to explain why it is. Partly this is because "interesting" (or "fascinating", "thought-provoking", "intriguing", "notable" etc.) is just one word, and it's easier to write one word than a sentence. [225002400040] |More important is the fact that you probably don't know why you're interested by something until you do some thinking about it. [225002400050] |Reading this, I couldn't help but be reminded of a conversation between two of my academic advisers, quite early in my graduate linguistics studies, about Chomsky's use of the word "interesting" in that he tended to use it as an insult. [225002400060] |We had formed a reading group one summer to discuss The Minimalist Program and discovered that Chomsky would boldly proclaim that one topic was "interesting" while another was not, seemingly by fiat, with little or no explanation. [225002400070] |Our group consensus was that what he really meant was that a linguistic topic was "interesting" if it helped him make his argument; it was "uninteresting" if it did not (we came to the same conclusion about his notion of "narrow syntax", btw; this wiki page lists a variety of other criticisms). [225002410010] |was was syntax [225002410020] |After re-reading my post below I had a moment of syntactic beguilement at my own use of the was that what X was that X construction: Our group consensus was that what he really meant was that a linguistic topic was "interesting" if it helped him make his argument. 
[225002410030] |I imagined four relevant sentences; two grammatical/acceptable and two ungrammatical/unacceptable according to my own most excellent judgment*. [225002410040] |My challenge to you, dear reader, is to explain why sentences (1-2) are grammatical/acceptable and why sentences (3-4) are ungrammatical/unacceptable. [225002410050] |
  • our consensus was that what he really meant was X
  • [225002410060] |
  • our consensus was he really meant (that) X
  • [225002410070] |
  • *our consensus was what he really meant (that) X
  • [225002410080] |
  • *our consensus was he really meant was X
  • [225002410090] |*having been schooled by a prominent typologist, I avoid using the term "ungrammatical" in any strict sense, grammaticality judgments being such slippery things. [225002420010] |Behold! The Tweet King! [225002420020] |Has Twitter made us all better computational linguists? [225002420030] |Their 140 character limit forces us all to think in terms of characters (including whitespaces) rather than the slippery notion of words. [225002420040] |Betcha tweetheads understand the concept of offset better than the average 1st year linguist. [225002430010] |Manning on NLP [225002430020] |Freely available: The complete set of 18 lectures from Stanford Professor Christopher Manning's Natural Language Processing course. [225002430030] |With a nifty web player that allows you to take notes on the video. [225002430040] |CS224N - Natural Language Processing. [225002430050] |Excellent description of topics so you can pick and choose your lecture. [225002440010] |More Experiments! [225002440020] |I love online demos and live experiments because they give non-experts a user friendly, non-intimidating way to see some of the bread and butter tools of contemporary linguistics. [225002440030] |Thanks to the excellent blog from the Human Language Processing (HLP) lab at the University of Rochester, I've discovered a few more to pass on (I have a list to your left under Call For Participation). [225002440040] |
  • Alex Drummond's self-paced reading demo (this is a common experimental paradigm within psycholinguistics).
  • [225002440050] |
  • Masha Polinsky’s lab has a variety of experiments up for several languages:
  • [225002440060] |English English: Pre-Test Questionnaire English Experiment English Experiment: Acceptability Judgments Only [225002440070] |Czech Čeština: Experiment [225002440080] |Russian Russian: Pre-Test Questionnaire Russian Experiment [225002450010] |Proud Brother [225002450020] |My sister Lori, a long time pre-school teacher throughout Northern California who now owns her own preschool in Orland CA (and who has big plans to be a huge success as a children's book author someday) has started her own blog. [225002450030] |And it's about time. [225002450040] |She has the soul of a blogger. [225002450050] |
  • Preschool Diary
  • [225002460010] |Theory of Meaning [225002460020] |Podcasts of most of the lectures from Professor of Philosophy John Campbell's Theory of Meaning course at Cal. [225002460030] |Philosophy 135 - Theory of Meaning [225002460040] |Unfortunately the site doesn't list what topics each podcast covers, so it's a bit of a gamble. [225002460050] |Just open one and have a listen (some video available as well). [225002470010] |That's why they call it money. [225002470020] |How much are NLP start-ups worth? [225002470030] |About $100 million. [225002470040] |That's about how much Nuance just paid for SpinVox, and that's about how much Microsoft paid for Powerset a year and a half ago. [225002470050] |From TechCrunch: [225002470060] |SpinVox, a London-based technology startup that transcribes voicemails to text so that they can be more easily digitized, searched, and manipulated, has been acquired by speech recognition company Nuance for $102.5 million. [225002470070] |Loyal perusers of The Linguist List's job board should be familiar with all of those companies. [225002470080] |But don't let that price tag fool you: SpinVox also had $200 million in investment, so somebody's still waiting to get paid. [225002470090] |(Disclaimer: yes, I understand that valuation is complicated and this coincidence in price tags means nothing, just funnin'). [225002480010] |Ambiguous Hookers & Psycho Sheep Wrestlers [225002480020] |'Tis the season for lists, and this one caught my eye: 50 Funniest Headlines Of 2009 (HT Daily Dish). [225002480030] |I expected more of them to be linguistically interesting, but few were. [225002480040] |Instead, there are a lot of tasered grammas and schoolboy sex jokes. [225002480050] |Nonetheless, there are a few whose humor lies in the linguistic structure of the headline. [225002480060] |Personal fav: #9 Nutt faces sack. [225002480070] |Here are the others by linguistic category [225002480080] |Lexical Ambiguity #4. 
[225002480090] |Hooker Named Lay Person Of The Year #7. [225002480100] |Pittsburgh Police Want To See Junk In Your Trunk #23. [225002480110] |Facebook Forms Board To Lick Molesters #38. [225002480120] |Courtney Love Banned From Using Hole #44. [225002480130] |Hooker Named Indoor Athlete Of The Year. [225002480140] |Garden Path #6. [225002480150] |Trooper Fired After Hat Fib Wants Back In [225002480160] |Pseudo-Garden Path #31. [225002480170] |Sheep Wrestlers Feared Psycho [225002480180] |Misspelling #30. [225002480190] |Church Kids Raid Panty's For Foodbank Supplies (note: bonus misuse of apostrophe). [225002480200] |Dyslexia??? #21. [225002480210] |Winter Storm Closes Schools Across P.E.I., N.S [225002490010] |Island Constraints and Mr. Snuffleupagus [225002490020] |Tomorrow, 60 Minutes will air a segment called, no joke, Elephant Language (HT Daily Beast). [225002490030] |It's about a group out of Cornell called the Elephant Listening Project, who believe that the low-frequency infrasonic sounds made by elephants might constitute a language. [225002490040] |I am naturally suspicious because these kinds of claims tend to conflate the notion of language with the more general notion of communication system into a muddled mess of a concept. [225002490050] |Without a good definition of human language, how can we say that some non-human communication system is also a "language"? [225002490060] |It's an untestable claim. [225002490070] |There are thousands of human language problems to solve, and few linguists to solve them. [225002490080] |Investigating elephant language is low on the priority list, I'd say. [225002490090] |As I've noted here, animal language stories are just one of those things that gets regular people to say, "gee whiz, really? wow" while it gets academic linguists to say "meh." [225002510010] |Meaning Is A Bit Mysterious... 
[225002510020] |I love iconoclasts, and writer Edmund Blair Bolles is playing the linguistic iconoclast at his intriguing blog Babel's Dawn (a blog about the origins of speech) by posing 10 Hypotheses About Language and Thought. [225002510030] |Here are the ten, but you'll have to click through to Bolles' page to read his complete thoughts. [225002510040] |Money Quote: [225002510050] |For most people meaning is a bit mysterious. [225002510060] |It seems to be some kind of content that is passed from speaker to listener, but all sorts of paradoxes appear when you investigate that idea closely. [225002510070] |Meaning becomes as mysterious as mind. [225002510080] |On this blog, the meaning of words comes from their ability to pilot the attention of both the speaker and listener... [225002510090] |It occurs to me that I’m in a different position. [225002510100] |I don’t have a mysterious definition of meaning, so I ought to just lay out a series of hypotheses about how this non-mysterious power arose, and suggest what might be sought in order to disprove the hypothesis. [225002510110] |So here is my list of what I’d like to see tested. [225002510120] |
  • All apes perceive well enough to understand language at the single-word level.
  • [225002510130] |
  • Apes can direct one another’s attention.
  • [225002510140] |
  • The critical difference between apes and humans at the single-word level is that humans are motivated to share attention in a triangle of speaker, listener, and topic.
  • [225002510150] |
  • We have evolved special mechanisms that give us more control over our powers of attention.
  • [225002510160] |
  • The power to attend to absent things (remembered or imaginary things) is not exclusive to humans but is probably much more common to them and we probably have special brain mechanisms that facilitate it.
  • [225002510170] |
  • The ability to speak in metaphors came after speech was established because metaphors require an ability to pay attention to two things at once—the perceivable world the metaphors point to, and the invisible world the metaphor is about.
  • [225002510180] |
  • Informal abstractions are metaphors whose meaning has been lost.
  • [225002510190] |
  • Speech contracts came late and gain strength through ritual.
  • [225002510200] |
  • Mysterious symbols are special and came even later.
  • [225002510210] |
  • Logical or mathematical symbols came even later, yet rest on very old powers.
  • [225002510220] |Now go read his blog and think deeply about his questions... [225002520010] |Booting Smack [225002520020] |(image from Boing Boing) [225002520030] |Having discovered this NYC guide to shooting smack (HT Boing Boing), I was a little perplexed at the final guideline: Only "boot" once or twice in one shot. [225002520040] |Honestly, I've never heard the term "boot" in reference to drug use before, but drug users are famously inventive linguists (see here) so I rolled with it (note that this definition of "boot" is only 5th on Urban Dictionary's list of meanings), but I'm not clear what the guideline means. [225002520050] |If "to boot" means "to inject" then what does it mean to inject more than once in one shot? [225002520060] |What does "one shot" refer to? [225002520070] |Apparently it does not refer to the injection, otherwise that would mean "only inject once or twice in one injection," which is incoherent. [225002520080] |Since this is buried under Tip #6: Take Care of Your Veins, I wonder if it means "only use a particular vein for one or two injections at a time" where "a time" means multiple injections in a short period or something like that. [225002520090] |Urban Dictionary does not list this as a meaning for "shot," but the Drug Slang & Terminology Vault lists "blow a shot" as meaning "when an injection misses a vein." [225002520100] |But that doesn't seem to quite have the same meaning either. [225002530010] |Hiro Yoda Speaks [225002530020] |(screen shot from NBC.com/heroes) [225002530030] |I couldn't help but notice on Monday night's episode of Heroes, the recently discombobulated Hiro spoke a couple of sentences in Yoda-speak. [225002530040] |Namely these two: [225002530050] |
  • Good you have done, Princess.
  • [225002530060] |
  • Defeat the dark side, we will.
  • [225002530070] |Well, let me clarify that. [225002530080] |He was translated into English subtitles as speaking Yoda-speak, which is, in English, rather already Japanese-like (verb final and all that). [225002530090] |BUT WAIT! [225002530100] |What was Hiro saying in Japanese? [225002530110] |If Yoda-speak means you take verbs and put them at the end of sentences, then how is Yoda-speak represented in a language that's already verb final? [225002530120] |Were his sentences verb-initial? [225002530130] |I'm curious to know what grammatical funny business the writers came up with to pull this off (or was he simply speaking grammatical Japanese and the English subtitles alone contained the allusion?). [225002530140] |Any Japanese speakers care to clear this up? [225002530150] |PS: I believe my title Hiro Yoda Speaks follows acceptable topic comment order for Japanese for me to model the intended sentence Hiro Speaks Yoda. [225002530160] |Were my title Yoda Hiro Speaks, I believe a better English translation would be "It is Yoda that Hiro speaks." [225002530170] |No? [225002550010] |Word Of The Decade [225002550020] |Benjamin Zimmer posted about the American Dialect Society's Word of the Decade vote coming up tonight in Baltimore and I noticed something unusual about one of the candidates: 9/11. I commented thusly: [225002550030] |Hmmm, for word of the decade I find 9/11 most interesting, linguistically speaking. [225002550040] |While google follows a well known pattern of turning a brand name into a verb (e.g., xerox that for me), 9/11 names an infamous event by the date it occurred. [225002550050] |Are there any other examples of this? [225002550060] |We don’t refer to Pearl Harbor as 12/7 or Waterloo as 6/18 (yep, had to wiki that one). [225002550070] |Normally we use place names. [225002550080] |I’m trying to think of another example of this usage and I’m coming up blank. [225002550090] |Only the fourth of July comes to mind as similar. 
[225002550100] |Can anyone think of other examples of this, in any language? [225002550110] |I'll be in Baltimore tonight meeting friends at the LSA. [225002550120] |I might pop into the meeting and put in my two cents. [225002550130] |Hopefully there will be rabid debate, angry protestations, booze...too much to hope for fisticuffs? [225002550140] |UPDATE: Peter Taylor posted a nice response over at LL in the comments: it's far more common in Spanish. [225002550150] |Cinco de Mayo probably rings a bell, even if you can't say what happened then. [225002550160] |My city (Valencia, Spain) has a metro stop, a hospital, and I don't know what else named for the 9th October, commemorating the day it was captured from the Moors in 1238. [225002550170] |There are also streets named for (at minimum) the 3rd April, 25th April, 1st May, and 18th July. [225002560010] |Infiltrating The Secret Cabal [225002560020] |Having managed to infiltrate the secret cabal held in City X (i.e., Baltimore) I discovered a couple of things (pssst, my infiltration might continue into the weekend...I'm not authorized to comment any further). [225002560030] |First, the secret city, famous for its triumphant inner harbor renovation, has a FAR MORE INTERESTING Little Italy neighborhood just a mile further down Pratt street. [225002560040] |My advice, for what it's worth, screw the Inner Harbor's plastic corporate food and walk a mile down the road to a great little restaurant scene. [225002560050] |Second, ensconced within the glassy, plush confines of the Hilton, I couldn't help but hear Jean Baudrillard Ryan Bingham whispering in my ear, "welcome to the desert of the real." [225002560060] |With its vestigial ports rusting before our eyes, this shipping and steel city desperately clings to its hopes and dreams of reclaiming glory's past by flashing the lights of its corporate sponsors ESPN Zone and Cheesecake Factory. 
[225002560070] |Yet, its true charm (and yes, there truly is charm in Baltimore) lies in its people and small businesses. [225002560080] |Third, I shared a few flagons of aqua vitae with the chair of a prominent department of brain and cognitive sciences and we seemed to agree on some critical points (can't rule out the effects of the aqua vitae, of course). [225002560090] |I sum up thusly (with the caveat that these are my explications alone on what was expressed under the influence of said aqua vitae and may not reflect any opinion other than mine, in the here and now, blogging under the influence of said aqua vitae): [225002560100] |
  • The Bayesians are coming: the next linguistic wars will not be between different theoretical factions, but between the traditional theoreticians and the statistical computationalists (not necessarily a bad thing, btw).
  • [225002560110] |
  • The bar was good: the demise of comprehensive exams is a bad thing. [225002560120] |They forced students to live up to a basic standard of competence that the wishy-washy replacement requirements fail to enforce.
  • [225002560130] |
  • Good help is hard to find: the scarcity of people who know both the computational/statistical side AND the linguistic side is frustrating.
  • [225002560140] |
  • Brother, can you spare a dime: what happened to the jobs???? [225002560150] |Ain't no jobs no more, don't matter what you wrote your diss on.
  • [225002570010] |More Russian Illusions Than I [225002570020] |Colin Phillips gave a nice plenary talk at the LSA this afternoon on the role grammatical illusions can play in studying the online processing of sentences (was it just me, or did his English accent seem more pronounced than usual? [225002570030] |Was this a social register effect or am I off my rocker?). [225002570040] |He drew a really nice parallel with optical illusions and the value they have added to the study of vision. [225002570050] |The point is that there are some sentences that seem perfectly grammatical at first, but upon reflection, are completely incoherent. [225002570060] |For example: [225002570070] |
  • More people have been to Russia than I have.
  • [225002570080] |Most native speakers of English will read this sentence and be perfectly happy, but re-read it a few times. [225002570090] |Do you see the incoherence? [225002570100] |It's incoherent because ... it's comparing apples to oranges. [225002570110] |In the more people have Xed than Yed construction, both X and Y should be events that "people" have participated in (e.g., more people have watched Avatar than read Moby Dick). [225002570120] |Be careful not to force an interpretation. [225002570130] |Yes, I (and Colin) understand that you can find an interpretation of this sentence that kinda makes sense, but that's not grammar. [225002570140] |Take the following sentence: [225002570150] |
  • the Wallace ball Gromit threw.
  • [225002570160] |Now, most of us can kinda make some sense out of this if we try, sure. [225002570170] |But that's not the point. [225002570180] |The point is that this is clearly an ungrammatical sentence in the English language. [225002570190] |The same is true of the Russia sentence above (well, its ungrammaticality is less clear, but it's ungrammatical nonetheless). [225002570200] |Colin's point is that sentences like the Russia sentence can give us valuable insight into the online process of parsing sentences. [225002570210] |His other point seemed to be that we have at least two mechanisms for processing a sentence. [225002570220] |I'll have to dig into this one deeper to explain it, but he has a paper in press detailing these findings and a pre-print is available right now HERE: [225002570230] |
  • Grammatical illusions and selective fallibility in real-time language comprehension. [225002570240] |Colin Phillips, Matt Wagers, & Ellen Lau. [225002570250] |26pp. June 2009. [225002570260] |To appear in Language and Linguistics Compass. pdf.
  • [225002570270] |From that paper:...speakers build richly structured representations as they process a sentence, but that they have different ways of navigating these representations to form linguistic dependencies. [225002570280] |The representations can be navigated using either structural information or using structure insensitive retrieval cues. [225002570290] |In order to explain why structural constraints dominate in some situations but are at least temporarily overridden in others, one does not need to assume architectural priority for structural information. [225002570300] |Rather, structural constraints may impact linguistic dependency formation most strongly in situations where relevant structural information is available in advance of potentially interfering material in the bottom up input. [225002580010] |Code-Switching [225002580020] |I'm rather shocked, pleasantly, that Slate has managed to publish a story involving linguistics that is not completely bonkers. [225002580030] |Chris Beam wrote a remarkably sane and thoughtful explanation of the Harry Reid kerfuffle, couching it in terms of code-switching. [225002580040] |He also quotes John McWhorter, the LLer who has written the best analysis of the story, as far as I can tell. [225002590010] |Give it to me! [225002590020] |Sean, a grad student in linguistics at University of Edinburgh who blogs at The Adventures of Auck (which has a nifty header that you get to play with), has a nice post where he walks through competing hypotheses and experiments regarding the role of pragmatic cues in children's word learning. [225002590030] |Read it HERE. [225002590040] |Money quote: [225002590050] |Children try to integrate cues from different domains into one coherent communicative intention. [225002590060] |It is suggested that it may be harder to modify lexical entries for familiar words without a clear reason than to link novel words to familiar objects. 
[225002600010] |silly pronouns [225002600020] |kottke gets silly with pronouns. [225002600030] |Money quote: Lemme get this straight...when me was subtracted from you, what's left over is ours? [225002610010] |Transliteration Preferred Over Translation [225002610020] |Ingrid, at her Language On the Move blog, posts about an interesting M.A. thesis that studied the translation of brand names into Arabic. [225002610030] |Money quote: [225002610040] |...basically, he’s saying that the entire target population of an advertising message doesn’t get it. [225002610050] |Small wonder that Arabic speakers often gripe about the way the Arabic language has become “infested” (Al Agha’s term; p. 82) with English. [225002610060] |Al Agha notes that the preferred “translation” strategy in his corpus of Saudi fast-food ads is transliteration rather than translation. [225002610070] |Having worked in the international branding industry (for an ever so brief amount of time), I can attest to the issues and problems that arise that Ingrid discusses further, and they are indeed non-trivial. [225002610080] |It's a good read. [225002620010] |Blue Meat and Clever Research [225002620020] |Cognitive Daily reviews some really clever research on synesthesia, the phenomenon of associating words with colors, as well as other multimodal associations (not to be confused with its poor cousin sound symbolism). [225002620030] |For example, there are people who will experience seeing the color blue when they hear the word meat (the actual word-color associations are not fixed or predictable, as far as I know). [225002620040] |There is neuroscience research suggesting that people who experience this have some sort of overlap in processing areas for the word-color pairs (read an excellent roundup of the research here at NeuroLogica Blog). [225002620050] |But this is a difficult area to study because there are so few true synesthetes and their experiences are inconsistent. [225002620060] |Bargary et al. 
2009 wanted to discover when the color association was triggered in the time course of lexical recognition. [225002620070] |Exactly how were they going to track that? [225002620080] |Clever people that they are, they fell back on an old standard in psycholinguistics, the classic McGurk effect, which shows how people integrate both auditory cues and visual cues (i.e., lips) to determine what word they're hearing. [225002620090] |More to the point, the McGurk effect shows how people will mis-recognize a word when the word they hear is slightly different from the word they see lips pronounce. [225002620100] |For example, if subjects hear an audio file of the word been and see a soundless video clip of lips pronouncing beep, then they report having heard the word beam (ignore the orthographic difference). [225002620110] |Cognitive Daily has a good YouTube demo here. [225002620120] |Bargary et al. used lexical stimuli that had reliable McGurk effects, but they used several conditions to test synesthetic associations. [225002620130] |In (1), participants got the full McGurk effect and were asked to choose which color they saw. [225002620140] |In (2), participants got the visual cue, but white noise on audio, then were asked which color they saw. [225002620150] |In (3), the lips were pixellated and only the audio cue was present. [225002620160] |The researchers had to do some clever teasing apart of color terms too, but in the end they concluded that synesthetic associations occurred late in lexical processing, after both auditory and visual cues were integrated. [225002620170] |I'd have to take a much closer look at the research to see if I felt their conclusion was valid, and this head cold I'm nursing rather precludes such close reading of empirical research right now; nonetheless, I'm impressed with the cleverness of the methodology.
[225002630010] |Set Match Run [225002630020] |(screen shot from Comedy Central) [225002630030] |Above, Demetri Martin teaches us about the value of punctuation. [225002640010] |Why Linguists Should Study Math [225002640020] |Bob Carpenter recently made the following comment on one of my posts: I'm very excited to hear that linguists are beginning to take statistics seriously (again). [225002640030] |I'd heard the same thing from Chris Manning a year or so ago, but then other linguists I queried were more skeptical about the role of statistics. [225002640040] |This brought to mind a post by Harvard economist Greg Mankiw called Why Aspiring Economists Need Math. [225002640050] |Some of his comments are relevant to linguists (not all, though). [225002640060] |I Googled around to see if anyone had already blogged something like this, but couldn't find much (I'd be happy to hear I missed something). [225002640070] |Being a bold blogger with little fear of humiliation (often a poor combination, btw) I decided to take a stab at it (UPDATE: I finally discovered a post from summer 2009 by Liberman at LL on basically this same topic here). [225002640080] |Linguists should study math* because... [225002640090] |
  • Math is a tool that helps you.
  • [225002640100] |
  • It's not that hard.
  • [225002640110] |
  • Math is good training for the mind.
  • [225002640120] |
  • Math is the future.
  • [225002640130] |
  • You will be left behind without it.
  • [225002640140] |
  • You will be a better linguist.
  • [225002640150] |Math is a tool that helps you. Math helps you find patterns and make reliable predictions, among other things. [225002640160] |If you are truly serious about studying linguistics, you should be greedy to get your hands on any and all tools you can find that help you study whatever sub-field you specialize in. [225002640170] |I have provided a list of Resources for Linguists on the right panel of this blog and I continue to update it as I find more. [225002640180] |Tools are good. [225002640190] |It's not that hard. [225002640200] |The math a linguist needs ain't rocket science. [225002640210] |And no one is asking you to be brilliant, just competent. [225002640220] |And you don't need to obsess over it, just a few courses. [225002640230] |The biggest challenge is to develop a set of learning materials that are geared towards non-majors. [225002640240] |Math and stats books are generally poorly written for the lay audience and that turns off aspiring linguists and such. [225002640250] |As I said in my response to Carpenter: There is a natural hurdle left to encouraging linguistics students to study stats: they don't like it, that's why they're linguists. [225002640260] |I recall a professor promoting linguistics to a large general ed undergrad course by saying it was one of the few analytical, empirical fields that did not require math. [225002640270] |That resonated with a lot of 19-year-olds. [225002640280] |A little hand holding at the undergrad level would go a long way. [225002640290] |A simple "stats for linguists" handbook would be perfect. [225002640300] |I know there are some new R books focused on language data, but I don't know if they do enough hand holding. [225002640310] |Math is good training for the mind. To quote Mankiw: Math is good training for the mind. [225002640320] |It makes you a more rigorous thinker. [225002640330] |Most athletes do push-ups. [225002640340] |Tennis players do push-ups. [225002640350] |Swimmers do push-ups.
[225002640360] |Cricket players do push-ups. [225002640370] |Speed skaters do push-ups. [225002640380] |Why do athletes from such a wide range of sports do the same exercise? [225002640390] |Because it's a good basic exercise that helps them regardless of their sport. [225002640400] |Math is push-ups for your mind. [225002640410] |Nuff said. [225002640420] |Math is the future. Like it or not, mathematical models are fast becoming the best way to understand complex phenomena. [225002640430] |It's no coincidence that biologists, economists, sociologists, neuroscientists, etc. are developing mathematical models to understand their chosen phenomena. [225002640440] |They work. [225002640450] |Once a phenomenon reaches a certain level of complexity, the human mind is simply not able to understand it as a whole. [225002640460] |Our brains evolved to reason about things close in time and space, but complex phenomena like language involve variables that are neither. [225002640470] |How can we understand the interaction of thousands of variables? [225002640480] |With mathematical models and statistical analysis. [225002640490] |Math is not only "a" tool, it's the right tool. [225002640500] |You will be left behind without it. [225002640510] |Any 21st century linguist will be required to read about and understand mathematical models as well as understand statistical methods of analysis. [225002640520] |Whether you are interested in Shakespearean meter (pdf), the sociolinguistic perception of identity (pdf), Hindi verb agreement violations (pdf), or the perception of vowel duration (pdf), the use of math as a tool of analysis is already here and its prevalence will only grow over the next few decades. [225002640530] |If you're not prepared to read articles involving the term Bayesian, or (p<.01), k-means clustering, confidence interval, latent semantic analysis, bimodal and unimodal distributions, N-grams**, etc., then you will be but a shy guest at the feast of linguistics.
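To make one of the terms in that list less intimidating, here is a minimal sketch of N-grams in Python. It is only an illustration, not anything taken from the papers linked above: the helper name `ngrams` is made up for this example, and the sample sentence is simply borrowed from the grammatical illusions post. An N-gram is just a run of N adjacent tokens, and counting them is a few lines of code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n adjacent tokens, each as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Whitespace tokenization is a simplifying assumption; real corpora need more care.
tokens = "more people have been to russia than i have".split()

bigrams = ngrams(tokens, 2)       # pairs of adjacent words
trigrams = ngrams(tokens, 3)      # triples of adjacent words
bigram_counts = Counter(bigrams)  # how often each pair occurs

print(len(bigrams), len(trigrams))
print(bigram_counts[("have", "been")])
```

Nothing here goes beyond counting, which is rather the point: the concept is simple, and the hard part is deciding what the counts mean.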
[225002640540] |You will be a better linguist. [225002640550] |In sum, you want to be a good linguist. [225002640560] |That's why you're getting into this. [225002640570] |That's why you've read this far. [225002640580] |Language problems challenge and fascinate you. [225002640590] |You lie awake at night thinking about them. [225002640600] |You want to be a part of the community of scholars who work to unfold the mysteries of language. [225002640610] |Math is a tool that will help you enter that community and contribute to it in a highly productive way. [225002640620] |HAVING SAID THAT... [225002640630] |It's equally fair to say that those who are more math oriented than linguistics oriented (like the NLPers, computational linguists and such who barge into our language territory with their fancy schmancy algorithms) should tread softly as well. [225002640640] |Yes, it is our responsibility as linguists to understand the math, but it is your responsibility to understand the linguistics, and failing to do so can lead to flawed, vacuous, and even comical results. [225002640650] |To quote Cab Calloway in The Blues Brothers, "your lazy butts are in this too." [225002640660] |I have consistently used this blog to critique such foolishness (and the folks at Language Log have perfected the genre). [225002640670] |It is a mistake to take the linguistics part lightly. [225002640680] |It's not all math. [225002640690] |It's a little math, but it's mostly linguistics. [225002640700] |Here are some of my previous attempts to hold non-linguists accountable for their failure to take the linguistics part seriously enough: [225002640710] |
  • The Full Liberman (taking aim at a psychologist)
  • [225002640720] |
  • Thinking Words (taking aim at a philosopher)
  • [225002640730] |
  • SEX! [225002640740] |TORTURE! [225002640750] |BANANA! (taking aim at psychologists)
  • [225002640760] |
  • On Linguistic Fingerprinting (taking aim at physicists)
  • [225002640770] |
  • Draft of a post on sentiment analysis, in press, so to speak (taking aim at NLPers)
  • [225002640780] |*For simplicity's sake, I chose to conflate the fields of mathematics and statistics into the single term "math." [225002640790] |I'm sure objections can be raised. [225002640800] |**I can imagine a reader complaining that these terms are not necessarily math/stats terms, strictly speaking. [225002640810] |Fair enough. [225002640820] |But I believe it is basically a math/stats education that will help an aspiring linguist understand and make use of them. [225002640830] |Also fair enough? [225002650010] |Ambivalent Unintelligible Syntax [225002650020] |The folks at Talking Brains posted a detailed walk-through HERE of a neurolinguistic experiment that looked at where in the brain syntax and intelligibility are processed. [225002650030] |They are happy that the research concludes that it is NOT "a left hemisphere function that primarily involves anterior temporal regions" nor is it "a portion of Broca's area, BA44, [that] is critical for hierarchical structure processing." [225002650040] |Their ambivalence is based on their perception that the original authors don't see the contradictions inherent in their study. [225002650050] |Money Quote: [225002650060] |What possible syntactic computation could be invoked BOTH by a grammatical violation and unintelligible noises but not by grammatical sentences? [225002650070] |Yes, what computation indeed. [225002660010] |The Daft Effect [225002660020] |Gotta love a scientist who sneaks the word "daft" through the peer review process: Body in Mind. [225002670010] |The Linguistics of Food [225002670020] |At Gambler's House, blogger teofilo provides a very nice walk-through of a couple of studies that use linguistics to study the spread of agriculture into the Southwestern United States from Mexico. [225002670030] |The methodology hinges on 1) tracking loanwords and 2) the assumption that Proto-Northern-Uto-Aztecan (PNUA) is a valid genetic unit.
[225002670040] |I'm not qualified to comment, but I felt the post was thorough and raised some fair objections as well as noting strengths. [225002670050] |Money quote: [225002670060] |"...the fact that the loans seem to have gone both ways shows that whatever contact took place involved both groups continuing to exist as social entities of some sort. [225002670070] |This is not evidence for assimilation, in other words, but for peaceful contact between agricultural and hunter-gatherer groups involving the exchange of information that enhanced the subsistence options of both parties." [225002680010] |Blob Wars [225002680020] |(images from Neuroskeptic) [225002680030] |Neuroskeptic reports on some disturbing news that the results of fMRI studies can be seriously impacted by the software package used to analyze the results. [225002680040] |There are several packages available and while most do much the same thing, at least one uses a unique statistical approach which produces different results. [225002680050] |Not "better" or "worse" mind you, just different. [225002680060] |The image above contrasts results using the same data but different analysis software. [225002680070] |Money quote: [225002680080] |Analysis using both programs revealed that during the processing of emotional faces, as compared to the baseline stimulus, there was an increased activation in the visual areas (occipital, fusiform and lingual gyri), in the cerebellum, in the parietal cortex [etc] ... [225002680090] |Conversely, the temporal regions, insula and putamen were found to be activated using the XBAM analysis software only (emphasis added). [225002680100] |The comments on Neuroskeptic's post are detailed and instructive. [225002690010] |A Most Excellent Blog [225002690020] |The excellent blog Cognitive Daily is calling it quits. 
[225002690030] |This is one of the blogs that has most inspired me and helped me understand just how good academic blogs can be in reviewing and critiquing academic research. [225002690040] |They leave a legacy not only of their excellent posts, but also the founding of the exceptional aggregator Research Blogging which continues to be a constant source of intriguing and thought-provoking reviews of the most current academic research (it happily dominates my feed). [225002690050] |One of the prime architects, Dave Munger, promises a "Mystery Project to be named later" (soon?) and I'm holding my breath (so hurry up Dave!). [225002710010] |Bad Linguistics ... sigh [225002710020] |(cropped image from Huffington Post) [225002710030] |It has long been a grand temptation to use simple word frequency* counts to judge a person's mental state. [225002710040] |Like Freudian Slips, there is an assumption that this will give us a glimpse into what a person "really" believes and feels, deep inside. [225002710050] |This trend came and went within linguistics when digital corpora were first being compiled and analyzed several decades ago. [225002710060] |Linguists quickly realized that this was, in fact, a bogus methodology when they discovered that many (most) claims or hypotheses based solely on a person's simple word frequency data were easily refuted upon deeper inspection. [225002710070] |Nonetheless, the message of the weakness of this technique never quite reached the outside world and word counts continue to be cited, even by reputable people, as a window into the mind of an individual. [225002710080] |Geoff Nunberg recently railed against the practice here: The I's Dont Have It. [225002710090] |The latest victim of this scam is one of the blogging world's most respected statisticians, Nate Silver who performed a word frequency experiment on a variety of U.S. presidential State Of The Union speeches going back to 1962 HERE. 
[225002710100] |I have a lot of respect for Silver, but I believe he's off the mark on this one. [225002710110] |Silver leads into his analysis talking about his own pleasant surprise at the fact that the speech demonstrated "an awareness of the difficult situation in which the President now finds himself." [225002710120] |Then, he justifies his linguistic analysis by stating that "subjective evaluations of Presidential speeches are notoriously useless. [225002710130] |So let's instead attempt something a bit more rigorous, which is a word frequency analysis..." [225002710140] |He explains his methodology this way: [225002710150] |To investigate, we'll compare the President's speech to the State of the Union addresses delivered by each president since John F. Kennedy in 1962 in advance of their respective midterm elections. [225002710160] |We'll also look at the address that Obama delivered -- not technically a State of the Union -- to the Congress in February, 2009. [225002710170] |I've highlighted a total of about 70 buzzwords from these speeches, which are broken down into six categories. [225002710180] |The numbers you see below reflect the number of times that each President used term in his State of the Union address. [225002710190] |The comparisons and analysis he reports are bogus and at least as "subjective" as his original intuition. [225002710200] |Here's why: [225002710210] |
  • We don't know what causes word frequencies.
  • [225002710220] |
  • We don't know what the effects of word frequencies are.
  • [225002710230] |
  • His sample is skewed.
  • [225002710240] |
  • Silver invented categories that have no cognitive reality.
  • [225002710250] |
  • There are good alternatives.
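For concreteness, the kind of "simple word frequency count" at issue can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a reconstruction of Silver's actual procedure: the function names, the tokenizing pattern, and the one-line sample "speech" are all invented for the example.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, pull out runs of letters/apostrophes, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def rate_per_thousand(counts, word):
    """Occurrences of `word` per 1,000 tokens, so texts of different lengths compare."""
    total = sum(counts.values())
    return 1000 * counts[word] / total if total else 0.0


# A made-up sentence standing in for a speech; it is not from any real address.
speech = "Jobs, jobs, jobs: we will create jobs and we will protect them."

counts = word_counts(speech)
print(counts["jobs"])                     # raw count of one buzzword
print(rate_per_thousand(counts, "jobs"))  # the same count scaled by text length
```

Both numbers are trivial to produce, and neither one says anything about why the word was used or what effect it had on listeners, which is where the objections listed above come in.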
  • [225002710260] |We don't know what causes word frequencies. [225002710270] |Why does a person use one word more than another? [225002710280] |WE. [225002710290] |DON'T. KNOW. [225002710300] |I understand the simple intuition that this should mean something, but no one actually knows what it means. [225002710310] |We simply don't understand the workings of the brain well enough to study the speech production system well enough to answer this question (despite these guys' suspect claims). [225002710320] |So we are left with pure intuition (which is generally bad in the cognitive sciences because we don't think the way we think we do). [225002710330] |So, again, this methodology is not "objective" as Silver claims (not the simplistic way he implemented it, anyway). [225002710340] |We don't know what the effects of word frequencies are. [225002710350] |The correlate to #1: When a person hears another person use one word more than another, what effect does it have? [225002710360] |WE. [225002710370] |DON'T. KNOW. [225002710380] |Same reasons as above. [225002710390] |This remains the realm of intuition and guesswork. [225002710400] |His sample is skewed. [225002710410] |While I understand that to the lay person, the set of SOTU speeches seems like a coherent category to analyze, it is in fact a linguistically incoherent grouping because these sorts of speeches are constructed slowly, painfully, over time, by teams of individuals, NOT spoken extemporaneously by a single individual. [225002710420] |Silver could spin this as a positive in the sense that the speeches represent presidential administrations as a whole, but this makes the "evidence" (i.e., word frequency) extremely messy. [225002710430] |What factor is driving the frequency of a particular word in a speech? [225002710440] |No clue. [225002710450] |The variables are numerous and unknown (two bad things for "rigorous" analysis). 
[225002710460] |Having such a messy data set makes interpretation nearly impossible even if we DID know the answers to #1 and #2 (which we don't). [225002710470] |Silver invented categories that have no cognitive reality. [225002710480] |Silver's 70 buzzwords are shoved into six arbitrary categories. [225002710490] |Linguists have been keen on word categories for ... well ... let's say at least 2500 years. [225002710500] |This we care about. [225002710510] |Deeply. [225002710520] |William Labov famously wrote, "If linguistics can be said to be any one thing it is the study of categories" (full text here). [225002710530] |More recently, in the last few decades, linguists have expanded their repertoire of tools for analyzing lexical categories using psycholinguistic, cognitive linguistic, and computational linguistic tools and methods. [225002710540] |None of these were employed by Silver in determining whether or not his six categories have any coherence or cognitive reality. [225002710550] |He just made them up. [225002710560] |How is this MORE objective than intuition? [225002710570] |There are good alternatives. [225002710580] |Let me be clear. [225002710590] |I am a fan of corpus linguistics. [225002710600] |Counting words is good (as Nunberg says, and as many linguists say. [225002710610] |We like this). [225002710620] |But this is just the beginning of a long road of analysis. [225002710630] |It must be done in a systematic and sophisticated way to be of any use. [225002710640] |There are numerous software tools and methodologies that Silver could have made use of that would have given him a more nuanced analysis. [225002710650] |There are whole books that teach people how to do this, such as Corpora in Cognitive Linguistics (just one of many). [225002710660] |Again, I have a lot of respect for Silver and his advanced skill set in stats.
[225002710670] |I would love to see Silver bring the full weight of his skills to bear on linguistic analysis (as I've said, every linguist should study math and stats), but this experiment falls far short of the mark and he should know better. [225002710680] |To a certain extent, this critique is unfair to Silver because he implicitly seemed to be acknowledging many of these deficits. [225002710690] |All he wanted to do was get a more objective picture of what the SOTU speech meant and how it fits into a bigger picture. [225002710700] |On the other hand, it's a fair critique because he put in a lot of effort and posted the results to his popular and influential blog (yes, I note my blog is neither); one ought not to waste such effort. [225002710710] |There is the glaringly negative possibility that his popularity and influence as a statistician will actually serve to further strengthen the popular but wrong notion that simple word counts are somehow meaningful. [225002710720] |This would be bad. [225002710730] |*By "simple word frequency counts" I mean counting the words a person uses (say, in a speech) without counting anything else or adding any other data to give the frequency counts meaning and context. [225002720010] |yo xochitl [225002720020] |I'm a sucker for installation art projects. [225002720030] |Even though most of them suck, I'm an optimist and I find the genre interesting and compelling so I keep waiting until I find one that doesn't suck. [225002720040] |You can find me sitting in the little white rooms in the basement of the Hirshhorn on many Saturday mornings ... waiting. [225002720050] |So I was happy to stumble upon (though, not via StumbleUpon) the video above called "Barrio Linguistics: An experimental study of the linguistic landscape of Spanish Harlem, New York" (HT Mahalo). [225002720060] |While it is neither experimental nor a study in the academic sense, I found it enjoyable as linguistics related art.
[225002720070] |FYI, I'm not sure how to translate the phrase from the video "yo xochitl" (apparently, xochitl is Nahuatl for 'flower' ... [225002720080] |I flower???). [225002720090] |From the video's YouTube page: [225002720100] |Barrio Linguistics is an exploration of nomadic Spanish found in the linguistic landscape of Spanish Harlem, New York City through interactive video installation. [225002720110] |Makes use of photography, animation, poetry, video projection and electroacoustic audio to incite dialog about the role of language and advertising in contemporary society. [225002720120] |The piece is an experimental documentary of the linguistics of south-north migration and the changing face of urban semiotics. [225002720130] |The piece was created by sampling all publicly viewable Spanish language text within a five block radius in Spanish Harlem. [225002720140] |Later, a poem was written using exclusively this vocabulary. [225002720150] |In the installation, the order of the videopoem is dictated by the user. [225002740010] |Good For Them [225002740020] |Titled Software Company Helps Revive 'Sleeping' Language, NPR just did a story on software-based revitalization efforts for Chitimacha, a dead language once spoken by the Chitimacha tribe in Southern Louisiana. [225002740030] |According to the story, "the last native speaker died in 1940" so the revitalization efforts utilize "hundreds of hours of scratchy recordings on wax cylinders, along with extensive notes from linguist Morris Swadesh." [225002740040] |Since I did my graduate work at a linguistics department steeped in descriptive field linguistics, the name Swadesh is well known to me (I've actually used the Swadesh lists). [225002740050] |He was crucial to the early 20th century efforts to classify the indigenous languages of North America.
[225002740060] |But the story really piqued my interest when they noted that Rosetta Stone, who is creating the software package, will not own the final product. [225002740070] |Rather, the Chitimacha tribe will and they will have the right to distribute it for free (or charge, whatever they want, they'll own it). [225002740080] |Rosetta Stone has a web page describing their revitalization and preservation efforts here. [225002740090] |They appear to work with communities to procure funding through government and private foundation grants. [225002740100] |I was impressed with the description of their process: [225002740110] |You select the team of language experts, teachers, and speakers from your community. [225002740120] |Rosetta Stone provides the language teaching template, training, technology, recording and photography services, and project planning. [225002740130] |Rosetta Stone turns your knowledge into the final user-ready software. [225002740140] |After 5 years in industry, I have come to respect the value of smart leadership at the project planning level. [225002740150] |It sounds like Rosetta Stone is leveraging their considerable skills and resources at the project planning and execution level to help small communities realize their language and culture related goals. [225002740160] |Good for them. [225002740170] |(PS: just to be clear, I have absolutely no connection, professional or otherwise, to Rosetta Stone. [225002740180] |I've never even used any of their products; this just struck me as a good example of corporate responsibility). [225002750010] |Unreasonable Effectiveness [225002750020] |Let's be honest, many of us find math intimidating. [225002750030] |But it need not be. 
[225002750040] |I recently explained why linguists should study math; now, over the next several weeks, Steven Strogatz, professor of applied mathematics at Cornell, will be blogging an informal introduction to the basic concepts of mathematics from pre-school to grad school. [225002750050] |He starts with Sesame Street and counting fish to explain the basic idea that numbers are abstractions: [225002750060] |The creative process here is the same as the one that gave us numbers in the first place. [225002750070] |Just as numbers are a shortcut for counting by ones, addition is a shortcut for counting by any amount. [225002750080] |This is how mathematics grows. [225002750090] |The right abstraction leads to new insight, and new power. [225002750100] |This is a NYT blog, so let's hope they don't put it behind their new paywall. [225002750110] |HT kottke [225002760010] |100 Years and Counting [225002760020] |(image from The MacGuffin) Neuroblogger, and all-around skeptic, The MacGuffin has a nice review of the remarkable relevance of Brodmann's 100-year-old map of functional areas of the brain HERE. [225002760030] |Money quote: [225002760040] |Brodmann's work helped to revolutionize modern neuroscience. [225002760050] |While many other maps have followed Brodmann's, and even though contemporary research has shown that "his map is incomplete or even wrong in some of the brain regions," many of the areas do correlate very well with various functional areas of the cortex, which is why his work still has relevance 100 years later. 
[225002780010] |My Many Words for Snow [225002780020] |As the snow descends upon Northern Virginia in the latest winter storm, and as DC's elite line-up at their local Whole Foods and Trader Joe's clutching their reusable bags filled with heavily packaged prepared meals, cardboard-container salads, 6 bottles of wine, and one bottle of water ('cause, ya know, it's an "emergency"), I am struck by the fact that the great Eskimo vocabulary hoax (pdf) is no hoax at all! [225002780030] |It turns out that I too have a great many words for snow. [225002780040] |This evening, while running a few modest errands before the night's predicted 20 inch snow drop, I meticulously recorded the various terms I uttered as synonyms for the fluffy white stuff which descended, rather gracefully, upon the landscape. [225002780050] |A few choice examples (NSFW): [225002780060] |shit [225002780070] |
  • "Why do people drive like such morons in this shit?"
  • [225002780080] |
  • "Hey asshole! [225002780090] |This shit's not Vaseline! [225002780100] |You can drive faster than 6 miles an hour!"
  • [225002780110] |crap [225002780120] |
  • "This crap's gonna be piled up in disgusting dirty brown heaps for weeks."
  • [225002780130] |fuck [225002780140] |
  • "Fuck these fucking fuckers who can't drive in this fuck!"
  • [225002780150] |asshole-shit-motherfucker* [225002780160] |
  • "Ahhhh! [225002780170] |You drive on this asshole-shit-motherfucker like it's nuclear!"
  • [225002780180] |fucking-fuck-fuck [225002780190] |
  • (directed at a plow driver) "push the fucking fuck fuck onto the curb, not back into the road!"
  • [225002780200] |grrrrrrr [225002780210] |
  • "gawd I hate everybody! [225002780220] |All of you! [225002780230] |All because of this ... grrrrrrr!" (picture head exploding)
  • [225002780240] |*asshole-shit-motherfucker is actually quite productive in my dialect. [225002780250] |It replaces a great many phrases. [225002790010] |Dolphin-Bikes and The Iconicity Effect [225002790020] |Since the journal Cognition typically allows free online access to its current volume, I was able to read a recent paper on a topic that I've always found interesting: the role of embodied experience in language processing. [225002790030] |The basic question is, how does our size and shape and orientation as human beings affect our language? [225002790040] |Think about a creature that's physically very different from us, like jelly fish or bacteria or dolphins. [225002790050] |Now imagine those creatures magically had the same cognitive capacity that we do. [225002790060] |Would our language system work for them or would it necessarily have to be different? [225002790070] |As an analogy, think of a simple bicycle. [225002790080] |Bicycles are designed for human bodies. [225002790090] |Now think of dolphins. [225002790100] |Dolphins are smart and can be trained to do many things, but could you train a dolphin to ride a bicycle? [225002790110] |No. [225002790120] |Because a dolphin body would not fit a bicycle properly. [225002790130] |You would have to design another mechanism around the dolphin body. [225002790140] |In all likelihood, the dolphin-bike would not be ridable by a human because our bodies just wouldn't fit properly. [225002790150] |And we could point to the features that were not right. [225002790160] |Now, think of language as a mechanism that was designed by evolution to fit humans. [225002790170] |Can we discover what parts of the language mechanism require human-like experiences (e.g., standing upright, two-eyes in front of our head, etc). [225002790180] |Yes, we can. [225002790190] |For example, researchers have already shown what's called an iconicity effect for semantic similarity recognition. 
[225002790200] |When participants are shown the word ceiling visually presented above the word basement, they are faster to recognize that the words are semantically related than when the same words are shown in the opposite presentation. [225002790210] |Note that a creature like bacteria might perceive the relationship between ceiling/basement very differently than we humans do. [225002790220] |In The linguistic and embodied nature of conceptual processing, Louwerse & Jeuniaux (full citation below) performed four related experiments to test what exactly was driving these kinds of results. [225002790230] |In particular, they wanted to know "to what extent, and under what conditions, both embodied and linguistic factors are used in conceptual processing." [225002790240] |From their abstract The embodiment factor predicted error rates and response time better for pictures, whereas the linguistic factor predicted error rates and response time better for words. [225002790250] |These findings were modified by task, with the embodiment factor being strongest in iconicity judgments for pictures and the linguistic factor being strongest in semantic judgments for words. [225002790260] |Both factors predicted error rates and response time for both semantic and iconicity judgments. [225002790270] |These findings support the view that conceptual processing is both linguistic and embodied, with a bias for the embodiment or the linguistic factor depending on the nature of the task and the stimuli. [225002790280] |This was well done research and I liked the paper, but I feel the need to nit-pick (it's what I'm best at). [225002790290] |
  • Interesting use of LSA to determine semantic similarity (LSA, or latent semantic analysis, is a sophisticated statistical analysis of corpora). [225002790300] |But I wonder if this is the right measure. [225002790310] |In fact, they discuss a 15% error rate (where participants decided two particular words are not similar when LSA says they are) and they didn't feel this was an issue. [225002790320] |But in the bigger picture, why use LSA at all? [225002790330] |These are small data sets. [225002790340] |You could just run a bunch of word pairs through a paper-and-pencil judgment task and use those results. [225002790350] |That would match the experiment's task more closely. [225002790360] |Don't get me wrong, I like LSA. [225002790370] |I'm impressed with its value as a tool for linguistic research. [225002790380] |But it's not right for everything.
  • [225002790390] |
  • I didn't see the frequency of word pairs as a specifically linguistic factor. [225002790400] |This is a quibble regarding terminology, not methodology. [225002790410] |I think it was a smart metric to utilize. [225002790420] |There certainly are plenty of frequency effects in language, but to call those effects linguistic factors seems misleading.
  • [225002790430] |
  • They only used one iconic dimension (over/under) to create all of their iconic stimuli (e.g., monitor/keyboard, boot/heel, steeple/church). [225002790440] |It may be the case that it's just easier to find example pairs for this dimension (which would be an interesting fact unto itself). [225002790450] |But it occurred to me that a mix of dimensions would be good.
  • [225002790460] |NOTE: In my first draft of this post I used dogs and doggie bikes in my analogy, then got wise and checked with the youtubes; sure enough, some Japanese guy taught a dog to ride a bike. [225002790470] |Cute video here. [225002790480] |Full Citation: Louwerse, M., & Jeuniaux, P. (2010). [225002790490] |The linguistic and embodied nature of conceptual processing. Cognition, 114(1), 96-104. DOI: 10.1016/j.cognition.2009.09.002 [225002800010] |Speaking in Tongues [225002800020] |A couple of good blog posts on neurolinguistic research on the phenomenon of glossolalia (aka, speaking in tongues). [225002800030] |The takeaway message seems to be that yes, there is some curious brain activity correlated with speaking in tongues; it's just not clear what it means, and there's so little data that not much can be confirmed or denied. [225002800040] |But as Brain Blogger put it, the studies point to "the act of speaking in tongues as a verifiable language phenomenon that invites further study." [225002800050] |
  • Speaking in Tongues –A Neural Snapshot at Brain Blogger (Feb 7, 2010)
  • [225002800060] |
  • Glossolalia – The Neurocritic (Nov 4, 2006)
  • [225002810010] |Math Rocks [225002810020] |(image from NYT) The post title is intentionally ambiguous. [225002810030] |In this case, rather than it being a full clause where math is the subject and rocks is the intransitive verb, it is a simple NP where math modifies the plural noun rocks. [225002810040] |This is the better reading in relation to this post simply because I am referring to the second installment of Steven Strogatz's excellent NYT series wherein he explains the elements of mathematics to a lay audience. [225002810050] |His first topic was the value of abstractness. [225002810060] |His second, the value of rocks (or rather, the value of concrete teaching methods like using groups of rocks to demonstrate the meaning of squares, primes, odd vs even numbers, etc). [225002810070] |This series is fast turning into a must read. [225002810080] |In case anyone wonders why a linguist is referencing a math blog, read THIS. [225002820010] |Snowmageddon 2010!!! [225002820020] |(image from AP) As winter's fury descends yet again on the Metro DC area (and my personal list of words for snow grows even larger), two words are competing for the right to name this bloody awful event. [225002820030] |Snowmageddon &Snowpocalypse. [225002820040] |So which is it to be? [225002820050] |As of right now, Snowmageddon is leading the Google/Bing frequency counts. [225002820060] |I'm not sure if Bing always gives higher counts, but my faith, what little there ever was, in Google counts is all but gone (see here, here, here for relevant discussion). [225002820070] |Snowmageddon = 801,000/1,880,000 Snowpocalypse = 375,000/1,060,00 [225002820080] |UPDATE (02/13/2010): Snowmageddon maintains its lead. [225002820090] |Snowmageddon = 855,000/2,280,000 Snowpocalypse = 791,000/1,350,000 [225002820100] |For what it's worth, I personally prefer Snowmageddon because the w-m transition seems more natural (i.e., in accord with English phonotactics) than the w-p transition. 
[225002820110] |Diphones are the backbone of speech synthesis systems. [225002820120] |Surely someone has published frequencies of diphone transitions, right? [225002820130] |I found one paper referencing frequency counts but I haven't found the data. [225002820140] |Citation: Kuperman, V., Ernestus, M., and Baayen, R. H. (2008). [225002820150] |Frequency distributions of uniphones, diphones and triphones in spontaneous speech. [225002820160] |The Journal of the Acoustical Society of America 124(6), 3897-3908. [225002830010] |Terminator 2.0 [225002830020] |Watch Skynet become self-aware at the 0:48 mark (HT Boing Boing) [225002840010] |A Brief History of 'Snowmageddon' [225002840020] |Following a lead from a Facebook response I saw on a friend's comment, I thought I had discovered the origin of the term Snowmageddon from a 2008 storm in Minnesota HERE. [225002840030] |However, being a linguist, I decided to follow up a bit. [225002840040] |Of course, I started with Mark Davies' BYU Corpora, but had no luck discovering the term. [225002840050] |Then I did some Googling/Binging. [225002840060] |In fact, the earliest instance of the term I could find comes from that distant year 2007 HERE. [225002840070] |2008 seems to have been a banner year for the term across the whole country. [225002840080] |Numerous examples follow: [225002840090] |
  • Here
  • [225002840100] |
  • Here (YouTube)
  • [225002840110] |
  • Here (Urban Dictionary, def 3)
  • [225002840120] |
  • Here
  • [225002840130] |
  • Here
  • [225002840140] |
  • Here
  • [225002840150] |
  • etc...
  • [225002850010] |#snowtoriousBIG [225002850020] |(screen grab from The Daily Show) The Twitter world is abuzz with snowmageddon-fever and the synonyms are being coined at a rapid pace. [225002850030] |Here's a modest list of known hashtags referring to the recent storms hitting the East Coast of the US (personal fav = #KaiserSnowze) [225002850040] |Twitter Hashtags #snowpocalypse #Snowzilla #snotoriousBIG #snowtoriousBIG #snOMG [225002850050] |#snowmageddon #Snowverkill #snogasm #Snowdiculous #thundersnow #snowdiculous #sNoMAS #snotastic #snomulent #snorrific #snolysmokes #snowverit #Snowsanity #snoverlords #toocoldtokill [225002850060] |I'd love to see an app that would give me co-occurrence counts for Twitter hashtags. [225002850070] |My understanding is that the Twitter API limits the number of returns though. [225002850080] |UPDATE: there seems to be good evidence that the hashtag #snowmageddon originated in Minnesota in 2008 HERE. [225002850090] |Just discovered, evidence of the word's existence in 2007 HERE. [225002850100] |UPDATE 2: Two new additions [225002850110] |#tsnownami #snownami [225002860010] |Figure This One Out [225002860020] |This headline writer is just flat messin' with us: [225002860030] |Summer born lucky are born rich. [225002860040] |And yes, it's grammatical. [225002870010] |Having Reason To Discourse Upon The Particle -soever [225002870020] |Having spent the better part of this weekend reading Thomas More's Utopia for Monday's book club meeting, for truly no more suitable exercise of mind fits me than a quiet afternoon's reading, I'm naturally predisposed to write in a style more favorable to the musty halls of libraries, once the repositories of great and wonderful learning, now the lodgings of vagabonds and stools of too too solid a material, than this the new and vast tubular nebula...(shakes it off). 
[225002870030] |I discovered in the free PDF version I downloaded from HERE* a use of the particle -soever, that I found odd. [225002870040] |In my dialect (Northern Californian American English), there is one and only one acceptable use of -soever: 'whatsoever.' [225002870050] |All other uses sound awkward or flat ungrammatical. [225002870060] |But in this book, I discovered five distinct uses: [225002870070] |
  • 12 - whatsoever
  • [225002870080] |
  • 8 - 'how X soever'
  • [225002870090] |
  • 1 - whichsoever
  • [225002870100] |
  • 1 - whithersoever
  • [225002870110] |
  • 1 - 'as X soever'
  • [225002870120] |The 'how X soever' construction first jumped out at me as surprising, then I noticed the other uses. [225002870130] |For me, 'whichsoever' is flat ungrammatical and 'whithersoever' is clearly archaic (whither anything sounds archaic to me). [225002870140] |I decided to do just a tiny bit of research on these constructions to see what I could find (in a short time, using freely available resources). [225002870150] |What I discovered was ... [225002870160] |Google/Bing counts [225002870170] |
  • soever 4,170,000/222,000
  • [225002870180] |
  • whatsoever 32,200,000/21,900,000 "what * soever" 13,200,000/25,300,000 (included whatsoever)
  • [225002870190] |
  • howsoever 1,330,000/121,000 "how * soever" 9,750,000/654,000
  • [225002870200] |
  • whichsoever 155,000/17,400 "which * soever" 13,600,000/664,000
  • [225002870210] |
  • whithersoever 296,000/35,900 "whither * soever" 2,320,000/658,000
  • [225002870220] |
  • "as * soever" 12,200,000/27
  • [225002870230] |
  • whomsoever 1,380,000/231,000 "whom * soever" 4,650,000/661,000
  • [225002870240] |
  • wheresoever 637,000/67,400 "where * soever" 6,170,000/661,000
  • [225002870250] |
  • whysoever 3,060/294 "why * soever" 6/14
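The wildcard counts listed above can be dropped into a few lines of Python to check how many of the Bing figures land in the same narrow band. This is only a sketch: the numbers are copied from the list, while the dictionary name `wildcard_counts`, the helper `bing_counts_in_range`, and the 600,000 to 700,000 window are assumptions made for illustration, not anything Google or Bing provides.

```python
# (Google, Bing) hit counts for the wildcard queries, copied from the list above.
# The names used here are illustrative only.
wildcard_counts = {
    "what * soever": (13_200_000, 25_300_000),
    "how * soever": (9_750_000, 654_000),
    "which * soever": (13_600_000, 664_000),
    "whither * soever": (2_320_000, 658_000),
    "as * soever": (12_200_000, 27),
    "whom * soever": (4_650_000, 661_000),
    "where * soever": (6_170_000, 661_000),
    "why * soever": (6, 14),
}


def bing_counts_in_range(counts, low, high):
    """Return the queries whose Bing count falls within [low, high], sorted by name."""
    return sorted(
        query
        for query, (_google, bing) in counts.items()
        if low <= bing <= high
    )


# The window is an arbitrary choice for this sketch.
suspicious = bing_counts_in_range(wildcard_counts, 600_000, 700_000)
print(len(suspicious), suspicious)
```

Five quite different queries returning nearly the same Bing figure looks more like a cap or an estimation artifact than real frequency data, although the script cannot say which.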
  • [225002870260] |Note that the "wh- * soever" counts were very noisy and are not worth the bits they're written on. [225002870270] |Also note that Bing returned 5 counts in the 600,000 range. [225002870280] |I found that a tad suspicious. [225002870290] |Clearly, whatsoever is the most frequent, and that accords with my intuition. [225002870300] |But whysoever is virtually nonexistent. [225002870310] |Time Corpus Next, and last, I went to the trusty Time Magazine Corpus of American English. [225002870320] |This lets me sketch the frequency of a word decade by decade over the last 100 years. [225002870330] |I searched for the whole words only (not the "wh- * soever" constructions). [225002870340] |soever [225002870350] |whosoever whomsoever whatsoever wheresoever whensoever whysoever whithersoever What these tables tell us (apart from the fact that whatsoever has always been the most frequent variation) is that all of these forms have been in use at some point over the last 100 years. (pssst, is it possible to download the data to a csv file or something so I can display the data in different ways, rather than screen grabs?). [225002870360] |Thus endeth the day's blogging. [225002870370] |*Try as I might, I could not find any info on who the translator was for this version. [225002870380] |I choose not to follow that thought to its plausible conclusion. [225002880010] |Tumblr, Flickr, rrrrrrrrrrrrrrrrr [225002880020] |After considering a post on names like Tumblr and Flickr, I discovered that linguistic mystic was a couple of years ahead of me, having posted on the use of syllabic consonants in Web 2.0 apps HERE. [225002880030] |Money quote: [225002880040] |...people seem to be recognizing the syllabicity of these final consonants, and skipping the written vowels altogether when creating their site names. 
[225002880050] |The flickr -r may well have started the game, but now completely unrelated sites are becoming Web 2.0 by not including the written vowel in words with syllabic endings. [225002880060] |Pooln chose its site name over “Poolin” or “Poolen”, tumblr over “tumbler”, and I suspect it’s only a matter of time before the first sites ending in /l/ pop up (at the time of writing, rumbl, tumbl and bumbl were already reserved). [225002880070] |Interestingly, I’m yet to see a syllabic M site (perhaps because we generally just write the m with no vowel, as in “chasm” or “orgasm”). [225002880080] |Who knows, though, maybe “phantm” is the next Web 2.0 ghost hunting site. [225002890010] |Eureka! [225002890020] |Generally I'm not a fan of new journals. [225002890030] |Too much academic fluff is getting published already; I see no reason to fluff even more. [225002890040] |However, this new journal struck me as having a novel and valuable mission behind it: The Journal of Serendipitous and Unexpected Results (JSUR). [225002890050] |An important component of scientific discovery is a disciplined examination of research results that contradict or negate extant hypotheses. [225002890060] |Indeed the history of science is rife with examples of important discoveries arising from such results. [225002890070] |However, there is a distinct lack of a forum in which such results can be presented and discussed in any meaningful way. [225002890080] |We believe a forum for and dialogue on serendipitous and unexpected results will provide valuable insight and inform modern research practices (emphasis added). [225002890090] |It's like they created a whole journal just for Dan Everett! [225002890100] |My first reaction was to double check that this wasn't coming from The Onion, but it appears to be legit. [225002890110] |Jonah Lehrer recently made a similar point (see here) about the value of failure in science. 
[225002890120] |In fact, there are informal forums for this kind of discussion; namely, meetings with advisors and lab meetings (as Lehrer points out). [225002890130] |But rarely does this discussion get formalized and published. [225002890140] |To pique the imagination of researchers, the journal editors pose a serious series of question templates. [225002890150] |Which of the following are relevant to linguistics? [225002890160] |Can you demonstrate that: [225002890170] |
  • Technique X fails on problem Y.
  • [225002890180] |
  • Hypothesis X can't be proven using method Y.
  • [225002890190] |
  • Protocol X performs poorly for task Y.
  • [225002890200] |
  • Method X has unexpected fundamental limitations.
  • [225002890210] |
  • While investigating X, you discovered Y.
  • [225002890220] |
  • Model X can't capture the behavior of phenomenon Y.
  • [225002890230] |
  • Failure X is explained by Y.
  • [225002890240] |
  • Assumption X doesn't hold in domain Y.
  • [225002890250] |
  • Event X shouldn't happen, but it does.
  • [225002890260] |(HT Boing Boing) [225002900010] |Inuktitut's Millionth Word!! [225002900020] |For some time now, the English speaking linguistics world has anxiously awaited the arrival of our millionth word in English (see here and here). [225002900030] |I have a bottle of Freixenet permanently on ice just for that wondrous day. [225002900040] |But alas! [225002900050] |It appears that Inuktitut has beaten my native language to the prize of all prizes. [225002900060] |According to this story about Microsoft's Inuit language software, [225002900070] |More than one million words have been programmed in Inuktitut through the collaboration, about 5,000 of which are new Inuktitut words (emphasis added). [225002900080] |I'm a gracious loser. [225002900090] |Congratulations Inuktitut. [225002900100] |See you at the 2 millionth mark. [225002910010] |A Constraint Based Approach To Figure Skating [225002910020] |While perhaps not quite a pure crash blossom, this headline caught me off guard: [225002910030] |Is Figure Skating Fixed? [225002910040] |Honestly, my first reaction was to wonder if there was a new scoring system (yes, there is) and what was wrong with the old one (bias and collusion). [225002910050] |In other words, what was broken and how was it improved? [225002910060] |Of course, there's another meaning of fixed -- 'rigged.' [225002910070] |In other words, are figure skating outcomes rigged by cheating? [225002910080] |Were this headline from any publication other than the increasingly dumbed down Slate, I'd assume the ambiguity was intentional, but with Slate these days, you just never know. [225002910090] |Note that there are at least two other senses for the word fixed: to spay/neuter a pet and to have a sufficient amount of something like money (British English as in 'You Kev mate, you fixed for goin' out later?' [225002910100] |HT Urban Dictionary). [225002910110] |With at least 4 senses to choose from, no wonder I was a tad confused. 
[225002910120] |But how did my super duper human language processing system resolve this? [225002910130] |This headline reminded me of James Pustejovsky's somewhat older work on lexical ambiguity (aka polysemy) and the mechanism of lexical underspecification. [225002910140] |For example, in The Semantics of Lexical Underspecification (1998) Pustejovsky contrasts the various senses of good: [225002910150] |
  • a good book
  • [225002910160] |
  • a good meal
  • [225002910170] |
  • a good knife
  • [225002910180] |The adjective good does not mean the same thing in these three phrases because a book is good in a different sense than a knife is good. [225002910190] |The qualities that make a book good are mental while the qualities that make a knife good are physical. [225002910200] |There is, however, some core meaning of good that's consistent across its different senses. [225002910210] |In Pustejovsky's words, the adjective good "can be analyzed as an event modifier which subselects for a relational interpretation available in the head noun." [225002910220] |In lay terms, he's proposing that each noun (book, meal, knife) carries with it as part of its meaning some notion of the kinds of things people normally do with it. [225002910230] |So, people normally read books, eat meals, and cut with knives. [225002910240] |If the meaning of the nouns (book, meal, knife) includes these events, then the adjective good could be modifying the events (the verbs read, eat, cut), not the noun. [225002910250] |Note that this cuts against the grade school definition of adjectives as words that modify nouns. [225002910260] |To work, this hypothesis requires a very complex lexical definition stored in the mental lexicon (the brain dictionary). [225002910270] |Here's what the definition for book would have to look like: [225002910280] |For our purposes, we simply need to see that the lexical entry for book contains a TELIC feature read. [225002910290] |When the noun book is modified by the adjective good, so says this hypothesis, the adjective is modifying the act of reading, not the book itself. [225002910300] |Since each noun (book, meal, knife) has its own TELIC event feature (read, eat, cut), the polysemy of good can be explained rather elegantly. [225002910310] |The adjective good isn't necessarily polysemic at all; rather, the word good picks out something different depending on the noun it's modifying. 
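The idea that good modifies a noun's TELIC event rather than the noun itself can be sketched in a few lines of Python. This is a toy illustration, not Pustejovsky's formalism: the `Noun` class, the `LEXICON` table, and the `interpret_good` function are all hypothetical names, and a real qualia structure carries far more than a single TELIC value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """A drastically simplified lexical entry (hypothetical)."""
    form: str
    telic: str  # the event people typically perform with the thing


# Toy lexicon using the three nouns from the examples above.
LEXICON = {
    "book": Noun("book", telic="read"),
    "meal": Noun("meal", telic="eat"),
    "knife": Noun("knife", telic="cut with"),
}


def interpret_good(noun_form: str) -> str:
    """Paraphrase 'a good N' by applying 'good' to the noun's TELIC event."""
    noun = LEXICON[noun_form]
    return f"a {noun.form} that is good to {noun.telic}"


for form in ("book", "meal", "knife"):
    print(f"a good {form} = {interpret_good(form)}")
```

Note that `interpret_good` holds only one meaning for good; all of the variation comes from the noun entries, which is the point of the underspecification story.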
[225002910320] |It's the nouns that differ, not the adjective. [225002910330] |Back when I was a grad student, I really liked Pustejovsky's work, as well as HPSG, which bears a lot of similarity (Stanford has a nice page describing the Leading Ideas of HPSG, first amongst them is "Strict Lexicalism"). [225002910340] |But I always had a sneaking suspicion that it was more engineering than science. [225002910350] |In other words, there was a tremendous amount of literature explaining how a lexicalist system could account for a large variety of language phenomena, and precious little literature on whether or not there was any psychological reality to any of it. [225002910360] |This suspicion was part of my own evolution into a very psycholinguistics oriented linguist (i.e., I mostly want to know how the brain does language). [225002910370] |I'll note that this is a bit unfair for two reasons: [225002910380] |
  • There actually is some psycholinguistics research on HPSG and lexicalist approaches, just not as much as I'd like. [225002910390] |Maybe my expectations are unfair but I spent a few minutes searching through Stanford's lengthy HPSG Bibliography and found only two citations that looked like experimental tests of HPSG hypotheses (see here, and here). [225002910400] |I didn't search thoroughly though.
  • [225002910410] |
  • It's really really hard to design psycholinguistic experiments to test these hypotheses. Until the tools and field of neuroscience progress further, there's not much syntacticians can do about this.
  • [225002910420] |Neither of these reasons is unique to HPSG either. [225002910430] |Grammatical theory is hard to test in general because we just don't have the tools and understanding of the brain necessary to thoroughly investigate all the complex hypotheses. [225002910440] |Nonetheless, as a student I tired of reading yet-another-HPSG-analysis-of-construction-X-in-language-Y papers. [225002910450] |In any case, the grammar game was fixed by Chomsky's thugs for decades when they took even the most tame non-dominant grammarians to the CFG/Transformational/P&P/Minimalist vet to be fixed, so I'm happy to see any and all lively debate supported. [225002910460] |I'll leave it to Ivan and Tom to fix HPSG as needed to account for new psycholinguistic data. [225002910470] |Besides, I'm fixed with blogging topics for now; I don't need any more. [225002910480] |Pustejovsky, J. (1998). [225002910490] |The Semantics of Lexical Underspecification. Folia Linguistica. [225002920010] |The World's Lousy Fart [225002920020] |Dear gawd I love Sitemeter. [225002920030] |The Brits will never get over their love of fart jokes, will they? [225002930010] |Paper Is The Enemy Of Words [225002930020] |Thanks to the Twitter hashtag #linguistics, I discovered 5 Must-See TED Talks On Language. [225002930030] |It's an interesting collection of short videos from past TED talks (still waiting for most of the 2010 TED talks to be available). [225002930040] |I found Pinker's 2005 talk enjoyable, if a bit conventional for anyone who has spent time in a linguistics department, that is. [225002930050] |He runs the gamut of ditransitive/direct object alternation, Gricean maxims, game theory, etc. [225002930060] |His key point is that language is a way of negotiating relationships. [225002930070] |But the real gem by far is the 2007 TED talk by Erin McKean, then editor in chief of the New Oxford American Dictionary. 
[225002930080] |She is one of those rare people whose enthusiasm and bright personality are infectious and delightful. [225002930090] |Highlights of her talk: [225002930100] |
  • Dictionaries are compiled, not carved.
  • [225002930110] |
  • Lexicographers get to say fun words like lexicographical = double dactyl like Higgledy Piggledy.
  • [225002930120] |
  • Lexicographers are not linguistic traffic cops, they're fishermen.
  • [225002930130] |
  • The idea of the dictionary was fixed in the 1800s by the OED (this is bad).
  • [225002930140] |
  • "Dictionaries are Victorian design merged with modern propulsion".
  • [225002930150] |
  • OMG! [225002930160] |She references steampunk at TED (3:47 mark). [225002930170] |This is awesome!
  • [225002930180] |
  • Bad online dictionaries take away serendipity -- this is bordering on brilliant.
  • [225002930190] |
  • She ascends into sublime genius as she explains the ham-butt problem with dictionaries (5:01 mark).
  • [225002930200] |
  • Don't hate bad words, hate bad dictionaries.
  • [225002930210] |
  • Paper is the enemy of words (6:12 mark).
  • [225002930220] |
  • Interesting analogy: what if biologists only studied cute animals?
  • [225002930230] |
  • How do you know if a word is real? [225002930240] |Not because it's in a dictionary; rather, a word is real because people use it.
  • [225002930250] |
  • Worry less about control, more about description.
  • [225002930260] |
  • Undictionaried words. [225002930270] |Brilliant.
  • [225002930280] |
  • Asking for help is good.
  • [225002930290] |
  • "We're missing California from American English." [225002930300] |(11:55 mark)
  • [225002930310] |
  • "If we can find comets without a telescope, shouldn't we be able to find words?" [225002930320] |Preach it sistah!
  • [225002930330] |
  • "The internet is made up of words and enthusiasm."
  • [225002930340] |
  • Nice point: a word without its context is pretty... pretty useless.
  • [225002930350] |
  • In which she uses a word with which I am not familiar, and as yet am unable to discover: synochdocaly or signicdocically or cynicdocically...
  • [225002930360] |
  • Right now, dictionaries are imperfect samples, but we could make THE dictionary with ALL the words.
  • [225002930370] |
  • Web dictionaries mean we can discard the artificial distinction between good words and bad words.
  • [225002930380] |
  • I love this woman.
  • [225002940010] |affect effect its it's dolphin [225002940020] |I find this search query remarkably disturbing. [225002940030] |I just don't understand what this person could have possibly been searching for. [225002940040] |I want searching to be more .. well .. rational. [225002940050] |I may not get to sleep tonight. [225002950010] |senses and metaphors [225002950020] |NLP guru Hal Daume (who just announced he's taking a new position at U Maryland) has a nice post on senses vs metaphors with interesting comments as well. [225002950030] |Money quote: [225002950040] |But I can imagine a system roughly like the following. [225002950050] |First, find the verb and it's frame and true literal meaning (maybe it actually does have more than one). [225002950060] |This verb frame will impose some restrictions on its arguments (for instance, drive might say that both the agent and theme have to be animate). [225002950070] |If you encounter something where this is not true (eg., a "car" as a theme or "passion" as an agent), you know that this must be a metaphorical usage. [225002950080] |At this point, you have to deduce what it must mean. [225002950090] |That is, if we have some semantics associated with the literal interpretation, we have to figure out how to munge it to work in the metaphorical case. [225002950100] |For instance, for drive, we might say that the semantics are roughly "E = theme moves &E' = theme executes E &agent causes E'" If the patient cannot actually execute things (it's a nail), then we have to figure that something else (eg., in this case, the agent) did the actual executing. [225002950110] |Etc. [225002950120] |Sounds like a job for FrameNet (if FrameNet were better ... and the page actually loaded, that is, you may have to settle for the Wikipedia entry). [225002950130] |My own review of a sense disambiguation hypothesis here. 
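For the curious, the first step of the system Hal describes (check a verb's arguments against the restrictions its frame imposes, and flag a violation as a likely metaphor) can be sketched in a few lines of Python. This is only a toy under loud assumptions: the `ANIMATE` word list, the `FRAMES` table, and both function names are made up for illustration. None of it comes from Hal's post or from FrameNet, and it says nothing about the harder step of working out what the metaphor means.

```python
# Hypothetical toy lexicon: nouns we treat as animate (an assumption for illustration).
ANIMATE = {"driver", "farmer", "horse", "cattle"}

# Hypothetical verb frames: each role maps to the property its filler must have.
# The "drive" entry mirrors the example in the quote (agent and theme both animate).
FRAMES = {
    "drive": {"agent": "animate", "theme": "animate"},
}


def violated_roles(verb, args):
    """Return the roles whose fillers break the verb's selectional restrictions."""
    bad = []
    for role, requirement in FRAMES[verb].items():
        filler = args.get(role)
        if filler is None:
            continue  # role not expressed in this sentence, nothing to check
        if requirement == "animate" and filler not in ANIMATE:
            bad.append(role)
    return bad


def is_metaphorical(verb, args):
    """Flag a usage as (possibly) metaphorical if any restriction is violated."""
    return bool(violated_roles(verb, args))


if __name__ == "__main__":
    print(is_metaphorical("drive", {"agent": "farmer", "theme": "cattle"}))
    print(violated_roles("drive", {"agent": "driver", "theme": "car"}))
    print(violated_roles("drive", {"agent": "passion", "theme": "farmer"}))
```

A real version would need an actual frame inventory and something better than a hand-typed set for animacy, which is presumably where a (better) FrameNet would come in.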
[225002960010] |Who Dat in Maryland [225002960020] |Is the University of Maryland the hottest linguistics school in the US? [225002960030] |I started thinking about this after reading that Hal Daume will be joining the faculty. [225002960040] |We don't normally talk about schools this way, but we talk about sports teams like this every day. [225002960050] |So I'm gonna play a little game and cast a few linguistics departments as contemporary NFL teams. [225002960060] |While goofing off on this, I was surprised by how similar some of the schools are to their local NFL teams. [225002960070] |
  • U. Maryland = NO Saints. [225002960080] |Spent the last few years quietly building a top team and now everyone sees how good they are. [225002960090] |All-around quality in all positions and solid special teams. [225002960100] |Depth and breadth in one team. [225002960110] |Still adding new skill players, they're looking to the future. [225002960120] |Tough to beat. [225002960130] |Fun to watch. [225002960140] |Can they repeat?
  • [225002960150] |
  • MIT = NE Patriots. [225002960160] |They still get a respectable number of wins, but the dynasty is over and no one fears them any more. [225002960170] |Not likely to be a factor in the near future. [225002960180] |Who will replace Brady?
  • [225002960190] |
  • Penn = Philly Eagles. [225002960200] |Always in the playoffs. [225002960210] |Always tough. [225002960220] |Lots of weapons. [225002960230] |McNabb scares everyone. [225002960240] |Fearsome reputation. [225002960250] |But the Lombardi trophy haunts them.
  • [225002960260] |
  • SUNY Buffalo = Buffalo Bills. [225002960270] |Flashes of greatness here and there, but you can't win the big one on special teams alone. [225002960280] |Lots of talent has come through, but too many top players have come and gone without staying. [225002960290] |Loyal fans, but still longing for the good old days. [225002960300] |They need to retain players and show they can upset the big dogs to regain their reputation.
  • [225002960310] |
  • Stanford = Indy Colts. [225002960320] |Always a factor. [225002960330] |Always a threat to win it. [225002960340] |Too many great players not to be pre-season #1.
  • [225002960350] |
  • UC Santa Barbara = Oakland Raiders. [225002960360] |Still got some big names. [225002960370] |Still can make the big play. [225002960380] |But the brash boldness of its reputation doesn't carry as much weight these days. [225002960390] |Who's the next Howie Long?
  • [225002960400] |
  • UT Austin = Dallas Cowboys. [225002960410] |Dangerous team. [225002960420] |Some scary weapons. [225002960430] |They can beat anybody on any given day. [225002960440] |But they can be beaten on any given day too. [225002960450] |Need a spark to be seen as a top dog.
  • [225002960460] |
  • Harvard = Cleveland Browns. [225002960470] |Umm ... they still have a team?
  • [225002960480] |
  • UC Berkeley = SF 49ers. [225002960490] |My sentimental pick. [225002960500] |I've been a fan for too long to give up on you, but the glory days are fading fast. [225002960510] |The 80s ended 20 years ago and you're still looking for Joe's replacement. [225002960520] |Your Hall of Fame is impressive, but what have you done for me lately?