[225005130010] |Call For Participation [225005130020] |More and more researchers are using the web to gather data for serious research, but they need your help as participants. [225005130030] |As a proponent, I like to do my part and share the calls for participation that I know about. [225005130040] |If you know of any others, I'm happy to add: [225005130050] |MPI -- The Max Planck Institute for Psycholinguistics: Investigates how people use language. [225005130060] |Cue-word memories -- Clare Rathbone, University of Reading: The study is specifically interested in the way people remember events from their lives. [225005130070] |You will be asked to recall 16 memories and then rate them for details, such as vividness and how often you have thought about the memories before. [225005130080] |You will also be asked to fill in two short questionnaires. [225005130090] |Please note, this questionnaire is for people over the age of 40 only - please do not take part if you are aged 39 or younger. [225005130100] |Phrase Detectives -- University of Essex: Lovers of literature, grammar and language, this is the place where you can work together to improve future generations of technology. [225005130110] |By indicating relationships between words and phrases you will help to create a resource that is rich in linguistic information. [225005130120] |Games With Words -- Joshua Hartshorne, Harvard University: Test your language sense! [225005130130] |Play a game while participating in cutting-edge research. [225005130140] |How good is your language sense? [225005130150] |Color Naming -- Dimitris Mylonas, London College of Communication: This is a multi-lingual colour naming experiment. [225005130160] |It is part of research on colour naming and colour categorisation within different cultures, and aims to improve the inter-cultural colour dialogue. 
[225005130170] |By taking part you are helping us to develop an online colour naming model which will be based on the "natural" language provided from your responses. [225005130180] |CogLab 2.0 -- The Cognitive Psychology Online Laboratory: Aggregated set of many online research projects. [225005140010] |do we need parsed corpora? [225005140020] |Maybe not, according to Google: For many tasks, words and word combinations provide all the representational machinery we need to learn from text...invariably, simple models and a lot of data trump more elaborate models based on less data. [225005140030] |I've been wondering about this very issue for 5 years or so. [225005140040] |When I first started collecting parsed BNC data for my defunct dissertation, I needed sentences involving various verbs and prepositions, but the examples I found were often of the wrong structural type because of preposition attachment ambiguity. [225005140050] |I used Tgrep2 queries to find proper examples, but even then there were false positives, so I did some error correction. [225005140060] |One of the more interesting discoveries I made was a relationship between a verb's role in its semantic class and its error rate. [225005140070] |I was trying to find a way to objectively define core members of a semantic verb class and peripheral members. [225005140080] |I had a pretty good intuition about which were which, but I wanted to get beyond intuition (yes yes, it's all very Beth Levin). [225005140090] |For example, one of the objective clues for barrier verbs (a class of negative verbs encoding obstruction, like prevent, ban, exclude, etc) was the unusual role of the preposition from in sentences like these: [225005140100] |
  • She prevented them from entering the pub.
  • [225005140110] |
  • He banned them from the pub.
  • [225005140120] |
  • They were excluded from the pub.
  • [225005140130] |The preposition from is usually used to mark sources (He drove here from Buffalo) but in these sentences it's acting much more like a complementizer. [225005140140] |This is fairly unique to barrier verbs and I felt it was distinctive of the verb class, so I wanted a bunch of examples. [225005140150] |Because I needed to exclude examples involving old-fashioned source from, I used a Tgrep2 search that required the PP to be in a particular relationship to the verb (the BNC parse was a bit odd as I recall, and required some gymnastics). [225005140160] |Again, I had a lot of false positives even with Tgrep2 so I did some manual error analysis and discovered that certain verbs had very low error rates while others had very high rates and the difference coincided nicely with my intuition about which verbs were core members of the class and which were peripheral: core members like prevent had very low error rates. [225005140170] |This means that when prevent is followed by a from-PP, it's almost always the complementizer from; obvious to adults, the meaning of a barrier verb doesn't easily include source (necessary for old-fashioned from), but how would a kid learn that? [225005140180] |If I ban you from the pub, how does a kid know the pub is NOT where you started (source) but rather the opposite, it's where you're not allowed to end up (goal)? [225005140190] |Cool little learning problem, I thought ... and with a data set other than frikkin dative (which Pinker and Levin have, let's face it, done to death). [225005140200] |I assumed there was something central to the meaning of the verb class that caused this special use of from. [225005140210] |Then it occurred to me, if this is true, why do I need the parse? [225005140220] |Imagine I ignore structure, take all sentences where from follows a relevant verb, then sample for false positives. [225005140230] |That should give me basically the same thing. 
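A rough sketch of that parse-free approach, just to make the idea concrete. It is an illustration under stated assumptions, not what I actually ran on the BNC: the verb list, the four-token window, and the function names (find_candidates, sample_for_review, error_rates) are all hypothetical. The plan is to grab every sentence where from shows up shortly after a barrier verb, pull a random sample to check by hand, and compute a per-verb false positive rate from the hand labels.

```python
import random
import re
from collections import defaultdict

# Hypothetical mini-lexicon for illustration; a real study would need a fuller verb list.
BARRIER_VERBS = {
    "prevent": {"prevent", "prevents", "prevented", "preventing"},
    "ban": {"ban", "bans", "banned", "banning"},
    "exclude": {"exclude", "excludes", "excluded", "excluding"},
}


def find_candidates(sentences, max_gap=4):
    """Return (lemma, sentence) pairs where 'from' occurs within
    max_gap tokens after a barrier verb. No parse is consulted."""
    form_to_lemma = {
        form: lemma for lemma, forms in BARRIER_VERBS.items() for form in forms
    }
    hits = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            lemma = form_to_lemma.get(tok)
            if lemma and "from" in tokens[i + 1 : i + 1 + max_gap]:
                hits.append((lemma, sentence))
                break  # count each sentence at most once
    return hits


def sample_for_review(hits, k, seed=0):
    """Draw up to k candidates at random for manual checking."""
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))


def error_rates(labelled):
    """labelled: iterable of (lemma, is_false_positive) pairs from manual review.
    Returns the share of false positives per verb."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for lemma, is_false_positive in labelled:
        totals[lemma] += 1
        errors[lemma] += int(is_false_positive)
    return {lemma: errors[lemma] / totals[lemma] for lemma in totals}
```

If the intuition holds, core verbs like prevent should come out of error_rates with low values and peripheral verbs with high ones, with no tree in sight. The window size is a guess; too wide and source from sneaks in, too narrow and long objects get missed.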
[225005140240] |I became increasingly fascinated with this methodology. [225005140250] |I was now interested in how I was studying language, not what I was studying. [225005140260] |And that led me to ask whether or not the parse info was all that valuable for other linguistic studies? [225005140270] |But then I realized that when big news stories start getting old, the media always, always starts reporting on themselves, on how the news gets made ... [225005140280] |I didn't like where I was heading ... [225005140290] |...and then I got a job and that was that ... [225005140300] |HT: Melodye [225005150010] |more jobs for linguists [225005150020] |As the economy continues to grow (Dow over 12,000), so do the non-academic opportunities for linguists. [225005150030] |Here's an interesting one for an Analyst in the California* Bay Area : [225005150040] |The Analyst looks for opportunities to improve our Natural Language and Directed-dialog applications using the data logged by them. [225005150050] |The Analyst is primarily the responsible team member charged with analyzing the data and making new implementation recommendations [...] [225005150060] |Besides analyzing our speech applications and improving our Analytics framework, you will also have the opportunity to carry out independent research, which forms a big part of the success of our speech applications [...] [225005150070] |*living in the metro DC region has taught this Northern California boy that there's more than one "Bay Area." [225005160010] |can a machine learn jazz? [225005160020] |There's a contest dedicated to trying to answer that question: ISMIS 2011 Contest: Music Information Retrieval. [225005160030] |Computer scientists and engineers have long used contests and bake-offs to stimulate cutting edge research in linguistics (e.g., MUC), but linguists have lagged in this department. 
[225005160040] |You rarely if ever hear about contests that pit one linguistic theory against another using a standardized data set (or maybe I've just missed them). [225005160050] |Nobel Prize-winning economist Joseph Stiglitz argues here that prizes are good for stimulating academic research. [225005160060] |I agree wholeheartedly and would like to see more direct competition between theorists. [225005160070] |Exactly how a contest would be constructed is up for debate (I have a vague memory of some group trying to devise criteria by which to evaluate linguistic theories, maybe out of UCLA, but I can't seem to track it down; it's a remarkably difficult Google query to form). [225005160080] |HT: Jochen L. Leidner [225005170010] |the linguistics of heaven and hell [225005170020] |The value of pop culture data for legitimate research is being put to the test. [225005170030] |Exactly what, if anything, can the reality show Big Brother tell us about language change over time? [225005170040] |Voice Onset Time is a measure of how long you wait to begin vibrating your vocal folds after you release a stop consonant. [225005170050] |Voiced stop consonants like /b/ and /d/ require two things: 1) stop all airflow from escaping the airway by closing the glottis and 2) after the air is released, begin vibrating the glottis (by using the rushing air). [225005170060] |For non-linguists, think of a garden hose. [225005170070] |Imagine you use your thumb to stop the water for a second and you let the pressure build, then you let go and water rushes out, but then you use your thumb to clamp down just a bit on the water to spray it. [225005170080] |This is kinda like the speech production of voiced stop consonants in human language. 
[225005170090] |(image from Kval.com) Though I’m no phoneticist, I really like VOT as a target of linguistic study for one crucial reason: it’s a clear example of a linguistic feature that varies according to your human language system but which you do NOT have conscious control over. [225005170100] |What that means is that you cannot consciously change the length of your own personal VOT. [225005170110] |Go ahead, try it. [225005170120] |Make your VOT 20 milliseconds longer. [225005170130] |Go ahead, I’ll wait… [225005170140] |Of course you can’t. [225005170150] |Well, not consciously, but what researchers have found is that your brain, quite independent of conscious will or knowledge, can! [225005170160] |Lab studies have found that people will unknowingly alter their VOTs according to certain situations, and the results are predictable. [225005170170] |For example, they found that when listening to a set of long VOT stimuli, subjects will begin to lengthen their own VOTs, in essence accommodating the longer VOTs. [225005170180] |Over the longer term it has also been shown that people will lengthen their VOT over their lifetime to accommodate cultural shifts. [225005170190] |It has been shown that The Queen Mother herself had a longer VOT in her later life than during her younger days (few other people have been recorded consistently over a long period to provide such valuable data). [225005170200] |Here’s what Bane et al. did: They took recordings of confessional sequences from the UK reality TV show Big Brother (where groups of strangers are made to live with each other and occasionally speak to a camera alone like a video diary) and tested what happened to 4 crucial individuals (the ones that stayed on the show long enough to provide several months worth of data points). [225005170210] |What they found was that their VOTs did in fact change, though no linear pattern was discovered (i.e., they did not simply get longer in a steady line). 
[225005170220] |This paper is labeled as a progress report because they don't have a firm hypothesis about what actually is happening. [225005170230] |Nice trick there boys, ;) [225005170240] |They did find one interesting thing: During part of the show, the housemates were physically divided into basically a caste system where half the people were low caste and half were high (a heaven and a hell). [225005170250] |And this seemed to have an effect on VOT as well (sociolinguists are slap happy about this, I'm sure). [225005170260] |I haven’t looked at the actual numbers very closely, but in section 6, they say “Housemate trajectories seem to diverge when the divide is present…” However, just taking a glance at Figure 3, it looks like they diverge at the beginning, then converge at the end, episode 65 (and remain somewhat similar until several episodes of non-DIVIDE have gone by). [225005170270] |If my cursory glance is correct, I would assume it takes a while for the convergence to manifest, and then it persists for a while after DIVIDE is gone. [225005170280] |But this is just me looking at the picture, not the actual data. [225005170290] |Finally, and this is just a readability point, but I would order the names in Figure 3 in the same order as the end point of each trajectory, making it easier to follow who is doing what. [225005170300] |Max Bane, Peter Graff, & Morgan Sonderegger (2011). [225005170310] |Longitudinal phonetic variation in a closed system. Linguistic Society of America 2011 Annual Meeting. [225005180010] |chomsky and performance art [225005180020] |Artist Annie Dorsen has created a chatbot performance piece around the debate between Noam Chomsky and Michel Foucault on Dutch TV in 1971 (videos here). [225005180030] |This snippet is mildly interesting, but I couldn't help wondering what technology was used, especially the speech synthesis, because, frankly, it's a bit clunky and old-fashioned. [225005180040] |Perhaps that's part of the point. 
[225005180050] |The computer screens appear to be running DOS shells too. [225005180060] |Nothing wrong with that, purists will likely prefer it even, but combined with the clunky speech, the performance appears to be trading on a very outdated computational linguistic aesthetic. [225005180070] |Truly the desert of the real? (okay, I had to throw in some Baudrillard) [225005190010] |the sociolinguistics of height in China [225005190020] |Ingrid at Language on the Move has some thoughtful comments on the relationship between height and learning English in China. [225005190030] |If you're under 1.6 meters, forget it. [225005190040] |There are subtle but very real socioeconomic barriers in your way. [225005190050] |Money quote: [225005190060] |I have supervised research related to English language learning and teaching in China for almost a decade and have read most of the research on the topic published in English. [225005190070] |However, never before have I come across the importance of height. [225005190080] |I take this as evidence for the importance of doing ethnographic research. [225005190090] |Otherwise, what is the point of doing sociolinguistic research if you can’t discover anything you hadn’t already decided in advance would be important?! [225005190100] |I taught EFL in China back in 1998 at a private school in Guangzhou catering mostly to working professionals. [225005190110] |Much has changed since then, as China has changed so much. [225005190120] |I don't recall ever talking about height as a factor, but certainly cost and hours were a significant issue that made it virtually impossible for any poor workers to consider taking English classes. [225005190130] |As a 6 foot 4 blond American, though, I was treated like a rock star. [225005190140] |It was kinda weird. 
[225005200010] |the false narrative of small town slang [225005200020] |There is a common critique of journalists that they often let an internal narrative color their reporting, to the point where they simply parrot back the narrative in their head rather than report the facts on the ground (see here for a discussion of this). [225005200030] |My hometown of Chico got its spotlight in the sun recently because its favorite son Aaron Rodgers is the star quarterback of the Packers about to play in the Super Bowl. [225005200040] |Unfortunately, the NYT's article is a near perfect example of journalists letting a narrative do the talking when the facts blatantly contradict their claims: [225005200050] |The usual slang words like awesome or cool are not heard much. [225005200060] |Nice is in. [225005200070] |As in: “You won the lottery? [225005200080] |Nice.” [225005200090] |The narrative this spins is that small towns are all Mayberrys where everyone is pure and innocent and righteous and better than them damned city-folk. [225005200100] |It has been evoked routinely in political reporting. [225005200110] |I'm a Chico boy. [225005200120] |I graduated from Chico jr. [225005200130] |High and walked across the street and graduated from Chico High then walked across the street and graduated from Chico State*. [225005200140] |And I can assure you that awesome and cool are every bit as frequent there as anywhere else (personally, I had an unhealthy fondness for hella back in 1987). [225005200150] |And believe me, if you won the lottery in Chico, no one would say nice. [225005200160] |They would say, "No fukkin way! [225005200170] |No fukkin way! [225005200180] |Really! [225005200190] |No fukkin way!" ... just like everywhere else. [225005200200] |*no joke, those three schools are literally across the street from each other. [225005210010] |how (not) to do linguistics [225005210020] |Jonah Lehrer, the neuro-blogger, has a mixed track record, as far as I'm concerned. 
[225005210030] |His initial blogging was nice, but a tad lightweight, then he started to sound a bit too Malcolm Gladwell-ee (in that I wasn't entirely sure he knew what he was talking about beyond having a few short phone calls with one or two scientists then babbling on about a topic). [225005210040] |But he's hit a home run with this long New Yorker piece about the failure of the journal review process in science: The Truth Wears Off. [225005210050] |He draws examples from medicine, physics, and psychology. [225005210060] |Perhaps the most disappointing part is the realization that the standards of testing and conclusiveness in linguistics are so far from those in more established sciences. [225005210070] |Before the effectiveness of a drug can be confirmed, it must be tested and tested again. [225005210080] |Different scientists in different labs need to repeat the protocols and publish their results. [225005210090] |The test of replicability, as it’s known, is the foundation of modern research. [225005210100] |Replicability is how the community enforces itself. [225005210110] |It’s a safeguard for the creep of subjectivity. [225005210120] |Repeating studies is virtually unheard of in linguistics. [225005210130] |Also, Lehrer mentions the publication bias in journals. [225005210140] |When a result is discovered, there is a bias towards positive results. [225005210150] |After a while, once the result is accepted, then only negative results are published because only that is "interesting" anymore. [225005210160] |But I would expand this point to say this same bias exists at every stage of the research process. [225005210170] |We want to find things that happen, we don't care about spending 5 years and thousands of hours discovering that X does NOT cause Y! So when young grad students begin scoping out a new study, they throw away anything that doesn't seem fruitful, where fruitful is defined as yielding positive results. 
[225005210180] |This bias affects the very foundation of the research process, namely answering the basic question: what should I study? [225005210190] |As a side note, engineers seem perfectly happy to follow through on null results. [225005210200] |They need to know the full scope of their problem before solving it. [225005210210] |Scientists can learn a lot from engineers (and vice versa). [225005210220] |[Psychology professor Jonathan] Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. [225005210230] |“I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. [225005210240] |“It would help us finally deal with all these issues that the decline effect is exposing.” [225005210250] |Coincidentally, I was recently tweeting with moximer and jasonpriem about this and we agreed that research wikis are worth exploring. [225005210260] |My vision would be something akin to Wikipedia but where a researcher stores all of their data, stimuli, results, etc, finished or not. [225005210270] |The data could be tagged as tentative, draft, failed, successful, etc. [225005210280] |As the research goes on, the data get updated. [225005210290] |Not only would this record failure (which, as Lehrer points out in the article, is as valuable as success), it also records change. [225005210300] |How did a study evolve over time? [225005210310] |True, the data would become huge over time across many disciplines, but that just means we need better and better data mining tools (and the boys at LingPipe are working away at those tools). 
[225005210320] |HT rapella [225005220010] |my classic snowclone rant [225005220020] |As yet another winter storm threatens the US, lingo-tweeter cum lingo-grad student Lauren Ackerman marvels at the media's lust for snowmageddon and terms of its ilk, and I was reminded of my own ruminations on the many words for snow in my own peculiar dialect (it helped that I spent 6 hours in near motionless traffic a few days ago while the DC metro region was castrated by a vicious and sudden sleet storm that halted traffic as well as sanity). [225005220030] |So I offer this re-post from February 5 2010: [225005220040] |As the snow descends upon Northern Virginia in the latest winter storm, and as DC's elite line-up at their local Whole Foods and Trader Joe's clutching their reusable bags filled with heavily packaged prepared meals, cardboard-container salads, 6 bottles of wine, and one bottle of water ('cause, ya know, it's an "emergency"), I am struck by the fact that the great Eskimo vocabulary hoax (pdf) is no hoax at all! [225005220050] |It turns out that I too have a great many words for snow. [225005220060] |This evening, while running a few modest errands before the night's predicted 20 inch snow drop, I meticulously recorded the various terms I uttered as synonyms for the fluffy white stuff which descended, rather gracefully, upon the landscape. [225005220070] |A few choice examples (NSFW): [225005220080] |shit [225005220090] |
  • "Why do people drive like such morons in this shit?"
  • [225005220100] |
  • "Hey asshole! [225005220110] |This shit's not Vaseline! [225005220120] |You can drive faster than 6 miles an hour!"
  • [225005220130] |crap [225005220140] |
  • "This crap's gonna be piled up in disgusting dirty brown heaps for weeks."
  • [225005220150] |fuck [225005220160] |
  • "Fuck these fucking fuckers who can't drive in this fuck!"
  • [225005220170] |asshole-shit-motherfucker* [225005220180] |
  • "Ahhhh! [225005220190] |You drive on this asshole-shit-motherfucker like it's nuclear!"
  • [225005220200] |fucking-fuck-fuck [225005220210] |
  • (directed at a plow driver) "push the fucking fuck fuck onto the curb, not back into the road!"
  • [225005220220] |grrrrrrr [225005220230] |
  • "gawd I hate everybody! [225005220240] |All of you! [225005220250] |All because of this ... grrrrrrr!" (picture head exploding)
  • [225005220260] |*asshole-shit-motherfucker is actually quite productive in my dialect. [225005220270] |It replaces a great many phrases. [225005220280] |Addendum (1-31-2011): I want to see a Visual Thesaurus word map of my words for snow! [225005230020] |Neuro-blogger Bradley Voytek posts a nice discussion helping us all understand how to consume neuroscience in the news: [225005230030] |In this post, I will teach you all how to be proper, skeptical neuroscientists. [225005230040] |By the end of this post, not only will you be able to spot "neuro nonsense" statements, but you'll also be able to spot nonsense neuroscience questions. [225005230050] |Well worth the read. [225005250010] |Linguist List FAIL [225005250020] |I've been kicked around a few NLP blocks in my time so I've developed a sixth sense about what employers are looking for when they post job announcements. [225005250030] |When I read this one from Intelius on The Linguist List today, my reaction was clear, concise, and unconditional: This is NOT for linguists. [225005250040] |This posting says engineers only to me! [225005250050] |There's nothing wrong with that, but why use the Linguist List's job postings board with a job that no actual linguist will be considered for? [225005250060] |My reaction is based on what I consider to be engineering dog-whistles that are designed to encourage the "right" people to apply (i.e., engineers) and the wrong people to go away (i.e., linguists). [225005250070] |A quick breakdown of their rhetorical dog-whistles: [225005250080] |
  • The Data Research Group is a team of scientists at Intelius... [225005250090] |Much as I would like linguists to be considered scientists, the truth is, in the "real world" of job announcements, they are not. [225005250100] |This is a red flag.
  • [225005250110] |
  • Team members have published papers in top research conferences...Ah hah, not "conferences" per se, but "research conferences". [225005250120] |This means ACL.
  • [225005250130] |
  • Mentors will include Dr. Vitor R. Carvalho and Dr. Andrew Borthwick (diss PDF)... [225005250140] |NOT linguists.
  • [225005250150] |
  • Required Skills: Strong hands-on skills in Java and/or Python... i.e., we assume you lie awake at night worrying about arrays and functions, not unaccusative marking and tone sandhi.
  • [225005250160] |
  • Required Skills: Self-motivated, creative, and independent researching skills ... we will teach you nothing. [225005250170] |You are on your own. [225005250180] |Your teachers are gone. [225005250190] |What can you give us?
  • [225005250200] |FYI: Recently, bulbul has quite rightly taken me to task for being a tad hypocritical in arguing two seemingly contradictory points: (1) that 21st Century linguists should study math and (2) that the time-consuming effort of learning computational tools is a deterrent to being a linguist. [225005250210] |I can imagine this post as falling victim to that same complaint. [225005250220] |My pre-defense is that I believe there is a skill set distinct to linguists that is valuable and worthy of investment by NLP capitalists that has been largely ignored. [225005250230] |Engineers alone will not solve the critical language issues necessary to create the great products of the next generation of NLP tools. [225005250240] |I believe in team building where linguists and engineers work together as equals. [225005260010] |our foundational tongues? [225005260020] |A commentator at The Daily Dish writes: I recently learned that in our foundational tongues of Latin, Greek, and Hebrew the words for breath and spirit are one and the same: spiritus, pneuma, and ruach [emphasis added]. [225005260030] |I'm not sure what the author had in mind for "our foundational tongues." [225005260040] |Assuming the author is referring to English, then Latin, okay sure, Romance languages have had an important influence on English. [225005260050] |Greek, less so. [225005260060] |But Hebrew??? [225005260070] |What's most striking is the notable lack of Germanic languages as "foundational." [225005260080] |This author needs a Ling 101 class. [225005260090] |And as for the author's claim about words for breath and spirit being the same, there is a related poetic pairing common to good ol' fashioned English. [225005260100] |The word breath is often used as a metonymy for life or spirit. 
[225005260110] |Here are a few choice examples: [225005260120] |The Bard Henry V -- King Henry's Once more unto the breach, dear friends speech (III, 1): [225005260130] |Now set the teeth and stretch the nostril wide, Hold hard the breath and bend up every spirit To his full height. [225005260140] |On, on, you noblest English. [225005260150] |Whose blood is fet from fathers of war-proof! [225005260160] |In my reading of this line, King Henry pairs holding of the breath with spiritual courage to draw a parallel between the two. [225005260170] |Hamlet -- Hamlet's Mother, Queen Gertrude, whilst arguing with her tortured son (III, 4): [225005260180] |Be thou assured, if words be made of breath, And breath of life, I have no life to breathe What thou hast said to me . [225005260190] |Prior to this line, Hamlet prods his mother to stop sleeping with his uncle/king and to "break your own neck down." [225005260200] |In my reading of her lines, Gertrude connects the dots between words, breath, and spirit because of her son's harsh words. [225005260210] |She is saying it is not in my spirit to do what you are asking of me. [225005260220] |And here is a really nice 2009 discussion of poetry and breath by Melissa Zeiger: Grace Paley's Poetics of Breath. [225005260230] |Money quote: [225005260240] |The Romantic poets reemphasized breath as a force in poetry, liking to imagine that poetic breath mediated between the human and the transcendent, as, famously, in Coleridge's “The Eolian Harp,” where the wind joins breath to participate in “one Life within us and abroad,/ Which meets all motion and becomes its soul” [225005260250] |And this trope is not limited to Western literature either. [225005260260] |The traditional Chinese concept of Qi is deeply rooted in an analogy of breath = life. [225005260270] |From the Wikipedia page: [225005260280] |Qi is frequently translated as "energy flow". 
[225005260290] |Qi is often compared to Western notions of energeia or élan vital (vitalism), as well as the yogic notion of prana, meaning vital life or energy, and pranayama, meaning control of breath or energy. [225005260300] |The literal translation of "qi" is air, breath, or gas. [225005260310] |Compare this to the original meaning of the Latin word "spiritus", meaning breathing; or the Koine Greek "πνεῦμα", meaning air, breath, or spirit; and the Sanskrit term "prana", meaning breath. [225005260320] |What this suggests to me is that there is something deeply natural to our cognitive perceptions about this analogy between breath and life. [225005260330] |It is natural for humans to perceive breathing and thinking to be related somehow. [225005260340] |Without breath, you cannot think. [225005260350] |Fair enough. [225005260360] |But this might be a deeply human logic insofar as ants or dolphins may not conceive of this relationship in the same way. [225005260370] |I blogged about this last year in Dolphin-Bikes and The Iconicity Effect. [225005260380] |I'm still waiting for a dolphin bike. [225005270010] |Ngram Viewer sucks, true dat [225005270020] |proof positive.... [225005270030] |true dat... [225005280010] |evolution = chaos? [225005280020] |Kottke points to a graphical variation of the Chinese whispers game whereby an original sign (in this case, a line drawn by a human) is rapidly degraded by multiple repetitions (the more people try to repeat the original line, the less line-like it becomes, eventually degrading into chaos). [225005280030] |A Sequence of Lines Traced by Five Hundred Individuals from clement valla on Vimeo. [225005280040] |Kottke marvels that "The lines get really messy surprisingly fast [...] this is a nice demonstration of evolution." [225005280050] |But is it? [225005280060] |Is it the case that evolution leads to chaos*? [225005280070] |I don't think so. [225005280080] |Evolution leads to variation and change, sure, but chaos? 
[225005280090] |The difference between evolution and this line transformation, I think, is pressures. [225005280100] |In evolution there are pressures that greatly affect which changes last more than one generation and hence become permanent and stable. [225005280110] |But in this game, there are no pressures, as far as I can tell. [225005280120] |There is no survival of the fittest because each turn gets to survive for exactly one generation with no pressure to be fitter than another in order to persist beyond one generation. [225005280130] |So this exercise, cute as it may be, does not resemble evolution at all, I don't think. [225005280140] |*or messiness in Kottke's phrasing [225005290010] |economists are bad linguists [225005290020] |Dominik Lukes at Metaphor Hacker has a thorough discussion of Harvard economist Ed Glaeser's mis-use of metaphor theory by trying to use NYC restaurants as a metaphor for schools. [225005290030] |Lukes teases out the mis-mappings that Glaeser fails to recognize. [225005290040] |Money quote: [225005290050] |[Restaurants] also use a number of tricks to make the dining experience better –cheat on ingredients, serve small portions on large plates, etc. [225005290060] |They rely on ‘secret recipes’ –the last thing we want to see in education. [225005290070] |And this is exactly the experience of schools that compete in the market. [225005290080] |They fudge, cheat and flat out lie to protect their competitive advantage. [225005290090] |They provide the minimum of education that they can get away with to look good. [225005290100] |Glaeser, as he conveniently forgets, there is a huge amount of centralized oversight of New York restaurants –much more, in some ways, than on charter schools. [225005290110] |The full discussion is thorough and well worth reading. 
[225005300010] |fuck C++ [225005300020] |Andrew Vos provides us with valuable data analysis of the correlation between programming languages and profanity: [225005300030] |The plan was to find out how much profanity I could find in commit messages, and then show the stats by language. [225005300040] |These are my findings: Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words). [225005300050] |Oh, Python, beautiful Python ... no wonder the NLTK guys chose it as their NLP language of choice. [225005310010] |the linguistics of 404 FILE NOT FOUND [225005310020] |A cute site providing humorous translations of the world's most frustrating search result. [225005310030] |Personal favs: [225005310040] |
  • American South - Ah cain't find th' page yer lookin' fer.
  • [225005310050] |
  • Australia - Strewth mate yer bloody page has shot through.
  • [225005310060] |
  • Blond - like omg! ur file has not been found, go paint ur nails and try back later, lol^^....I FOUND A QUARTER!
  • [225005310070] |
  • Cockney - No chance luv, carrnt find it neever.
  • [225005310080] |
  • Pirate - Haaarr, Lubber! [225005310090] |I've sailed yon seas with toil and trial, and yet I cannot find ye file!
  • [225005310100] |
  • Pittsburghese - This page needs fixed n'at... it's all caddywhompus! [225005310110] |Yinz needs look somewheres else.
  • [225005310120] |
  • Zombie - Arrgrg 404 BrAiNs aAAArrggh No ggrrgrh page brAiNz heRe BrAAAAIIINNSSSS!
  • [225005320010] |Hosni prefers "Hosny" in transliterated attire [225005320020] |Rachel Maddow et al. discovered a delicious gem fit for the annals of transliteration. [225005320030] |Namely, how to write a specific Arabic name in the Roman alphabet (what we English speakers like to call "regular spelling"). [225005320040] |She (and her staff) reported that Hosni Mubarak attended a head-of-state meeting in Albania a couple years ago wearing the world's most narcissistic pinstriped suit*, where the pin stripes were actually composed of lines of his name written in Roman alphabetic transliteration (this man really knows how to live the life of a tyrant, am I right?): [225005320050] |It is a troublesome fact of human language that writing the damned thing down is never easy. [225005320060] |It's difficult enough to construct a writing system that is consistent for a single language, more difficult still to take a linguistic term (like a person's name) and write it down in a script which was not designed for that particular language. [225005320070] |So when English language writers (like journalists) have to write down Arabic names in "regular spelling" they inevitably face difficult choices about which letters to use to represent particular sounds. [225005320080] |Vowels are particularly difficult creatures to pin down with alphabetic rope (e.g., the whole a, e, i, o, u, and sometimes y fiasco). [225005320090] |The act of writing a linguistic term in a foreign script is called transliteration, and it's troublesome enough to have spawned a cottage industry sub-field within computational linguistics. [225005320100] |For example, if you wanted to Google information about the currently exiled president of Egypt, you would be wise to Google the term "Hosni Mubarak." [225005320110] |That is by far the most common spelling of the man's name on the internet (by a better than 20-1 margin, at least according to Google hit counts). 
[225005320120] |Even if you choose the "Hosny" variant, you're basically just redirected to the "Hosni" results anyway. [225005320130] |Yet the tyrant himself, ever the maverick, prefers the road less traveled. [225005320140] |Sadly, there's not much more to say about this than to emphasize the simple fact that transliteration is largely arbitrary and disputes about guidelines are largely trivial. [225005320150] |Just flip a coin and move on ... [225005320160] |(I just seriously pissed off the world's four transliteration experts). [225005320170] |...and in closing I'd like to repeat my assertion that Hosni/y Mubarak looks suspiciously like The Face of Bo**: *FYI, I have no independent verification of the truth of this story. [225005320180] |If Maddow's staff got punk'd, their bad. **Damn you Captain Jack!! [225005330010] |turning gaga into water = 200 terabytes [225005330020] |How much storage would it take to store the first 5 years of a child's linguistic environment? [225005330030] |Apparently, 200 terabytes. [225005330040] |From Fast Company: [225005330050] |...cognitive scientist Deb Roy Wednesday shared a remarkable experiment that hearkens back to an earlier era of science using brand-new technology. [225005330060] |From the day he and his wife brought their son home five years ago, the family's every movement and word was captured and tracked with a series of fisheye lenses in every room in their house. [225005330070] |The purpose was to understand how we learn language, in context, through the words we hear. [225005330080] |A combination of new software and human transcription called Blitzscribe allowed them to parse 200 terabytes of data to capture the emergence and refinement of specific words in Roy’s son’s vocabulary. [225005330090] |The data visualization techniques he uses are pretty cutting edge ... and awesome! 
[225005330100] |I love the fact that he is trying to use visualization techniques to help us understand something beyond raw statistics (which is where most graphs and pie charts die miserable deaths). [225005330110] |Statistics are like molecules. [225005330120] |Visualize them one by one and it's difficult for the average person to conceptualize the big picture of how they work together to create a grander whole. [225005330130] |Roy appears to be trying to get beyond the yawn-inducing graphs that plague modern science. [225005330140] |I mean, he uses freaky-deaky time-worms! [225005330150] |How cool is that! [225005330160] |Roy talks about feed-back loops as well: [225005330170] |..."Caregiver speech dipped to a minimum and slowly ascended back out in complexity.” [225005330180] |In other words, when mom and dad and nanny first hear a child speaking a word, they unconsciously stress it by repeating it back to him all by itself or in very short sentences. [225005330190] |Then as he gets the word, the sentences lengthen again. [225005330200] |The infant shapes the caregivers’ behavior, the better to learn. [225005330210] |He gave a TED talk recently, but the video is not yet available. [225005340010] |Korean in Killeen [225005340020] |Having spent nearly 4 months of the last year and a half working at Fort Hood, in Killeen, Texas, I finally decided to leave the safe confines of the hotel-centric chain restaurants and Target/Wal-Mart shopping centers and take a drive to historic downtown Killeen. [225005340030] |I found pretty much what I expected to find, empty one storey store fronts, dusty unused parking spaces, and lots and lots of Hangul ... (screeching sound) ... huh? [225005340040] |Yep, turns out historic downtown Killeen, heartland of America, is being somewhat revitalized by Korean immigration. 
[225005340050] |My favorite grocery store by far is the Korean O-Mart (not the one pictured above, btw), where I can find genuinely fresh vegetables and dumplings (as well as shiitake mushrooms, plenty of seaweed for soup, and a wide array of spicy sauces that I have been eagerly experimenting with). [225005340060] |It was a nice lesson in American multilingualism. [225005350010] |open science [225005350020] |Recently in North Carolina, moximer & David Dobbs and others discussed the value of opening up science research (such that all research is freely available for searching and interpretation, even draft versions and failed experiments, at least under the strong proposal). [225005350030] |It's an interesting discussion (audio is a bit crappy, but whaddayagonnado?): [225005350040] |What's Keeping Us from Open Science? [225005350050] |Is It the Powers That Be, Or Is It... [225005350060] |Us? from Smartley-Dunn on Vimeo. [225005350070] |Hence, I thought it might be nice to list some open source journals offering free access to scientific research: [225005350080] |
  • PLoS is a nonprofit organization of scientists and physicians committed to making the world's scientific and medical literature a freely available public resource.
  • [225005350090] |
  • The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. [225005350100] |Like a paper library, we provide free access to researchers, historians, scholars, and the general public.
  • [225005350110] |
  • CiteSeer: The NEC Scientific Literature Digital Library incorporating autonomous citation indexing, awareness and tracking, citation context, related document retrieval.
  • [225005350120] |
  • arXiv.org e-Print archive: Open access to 664,014 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.
  • [225005350130] |
  • Directory of open access journals: This service covers free, full text, quality controlled scientific and scholarly journals. [225005350140] |We aim to cover all subjects and languages. [225005350150] |There are now 6271 journals in the directory. [225005350160] |Currently 2722 journals are searchable at article level.
  • [225005350170] |
  • Wikipedia Open access journals. [225005350180] |A list of open access journals...
  • [225005350190] |
  • Google "open journals"
  • [225005350200] |
  • Cognitive Science Network directed by Mark Turner (HT Sport Linguist).
  • [225005350210] |
  • Free Full Text: a search engine returning full text scientific articles with no access fees.
  • [225005360010] |Google linguist interview [225005360020] |Purpose: This post reviews my experience interviewing for a Linguist position at Google in Santa Monica, CA on February 29, 2008. [225005360030] |I've long meant to post this but only now got around to it. [225005360040] |There are lots of Google interview stories on the web. [225005360050] |It appears to be its own genre. [225005360060] |This is my contribution to the genre. [225005360070] |I originally wrote it as an email to a friend who wanted to know how my big day at Google went. [225005360080] |It’s rather long, but then again, you don’t have to read it, you clearly have better things to do… [225005360090] |I found a job posting on the Google jobs board for a full time Linguist. [225005360100] |I applied and was given a phone interview with a recruiter around late January, 2008: [225005360110] |Thank you for your interest in Google. [225005360120] |I'd like to set up a time for us to discuss Google Linguist opportunities and your qualifications. [225005360130] |Please let me know a day/time when you would be available to speak with me as well as the best phone number for me to contact you. [225005360140] |I'll email you back to confirm. [225005360150] |I hope to hear from you soon! [225005360160] |Cheers, JF Google Staffing [225005360170] |During that phone interview the recruiter shared a Google doc which I was instructed to complete in about 45 minutes… [225005360180] |I do not recall the exact questions, but they mostly dealt with asking me to rank ads against web page results (e.g., if I do a Google search for “lock” and an ad for a locksmith appears, how relevant to the search term was that ad?). [225005360190] |It also gave me topics and asked me to provide example web pages that matched the topics. [225005360200] |Once done with this task it was clear that the position I was applying for was not a linguist in any traditional sense. 
[225005360210] |After they reviewed my answers I was granted a phone interview with a “Linguist” at Google that occurred about a week after the online doc interview. [225005360220] |During this week I snooped around and discovered some interesting facts: The back story is that Google acquired a company called Applied Semantics in 2003. [225005360230] |This company was based in Santa Monica, so the office stayed there. [225005360240] |Applied Semantics created AdSense, a Latent Semantic Analysis-style algorithm to compare the linguistic similarity of web pages and ads. [225005360250] |That company hired people they called "linguists" to 1) evaluate the quality of the comparisons and 2) build and test taxonomies of web pages and ads. [225005360260] |These people now work for Google. [225005360270] |They also routinely hire people on 1 year contracts to act as test evaluators (see here), but I was interviewing for a full-time, permanent position. [225005360280] |Before the trip, I had a phone interview with LS in the taxonomy group. [225005360290] |He had a PhD in formal semantics (I’m pretty sure he was a UCLA linguistics alum). [225005360300] |I would say his academic background more closely matched his Google work than anyone else. [225005360310] |I don’t recall the specifics of that interview, but I must have done well enough to be invited to fly to Santa Monica for a full day of interviews in late February 2008. [225005360320] |Hi Christopher, [225005360330] |We are very excited that you are coming to talk with us about opportunities here at Google! [225005360340] |I've scheduled your interviews per the availability that you gave to JF. [225005360350] |We look forward to seeing you on Friday, February 29, 2008 at 10:45am. [225005360360] |Your interviews will last approximately 2 to 5 hours. [225005360370] |Please ask for MC upon arrival. [225005360380] |Google takes an academic approach to its interviewing process. 
[225005360390] |This means that we are interested in your thought process, your approach to problem solving, and in your programming skills as well. [225005360400] |You may also be asked questions that relate to algorithms, data structures, and distributed systems. [225005360410] |The dress is business casual/something you are comfortable in (we are more interested in what you have to say than what you are wearing). [225005360420] |The trip began oddly. [225005360430] |I was booked into a more expensive hotel than I was originally told (I had to pay for it myself and wait for re-imbursement). [225005360440] |Plus, my whole coast-to-coast trip was 36 hours long. [225005360450] |Little time to adjust. [225005360460] |I want to give my general impressions of those Google interviews. [225005360470] |Nothing specific (because I don’t recall the specifics, hehe) but rather the "vibe" I got. [225005360480] |Keep in mind I interviewed with what they call "linguists", not with any engineers, so these are people outside their core employee constituency. [225005360490] |I’m pretty sure most, if not all, of the people I interviewed with were hired by the previous company Applied Semantics prior to the Google acquisition. [225005360500] |Generally speaking, I walked away thinking I was qualified to perform any and all of the tasks they described (at least as qualified as anyone I met with given their own backgrounds). [225005360510] |I felt like all of these people were smart, but performing tasks for which they had no special training. [225005360520] |I met with five people on-site in addition to my phone interview with LS earlier. 
[225005360530] |One had a PhD in linguistics but on a theoretical topic and one was ABD in art history (both non-empirical methods training) and the phone interview guy had a PhD in linguistics and formal semantics (again, non-empirical training) but he had done empirical, lab-based linguistics work as a post-doc at a well respected east coast university. [225005360540] |One person had a B.A. in linguistics and another had a B.A. in literature and classics. [225005360550] |One guy had a B.A. in French and literature and was a professional translator before being hired. [225005360560] |Almost all of them had been hired prior to the Google acquisition, so they mostly had 6 or more years of experience doing the job, but none seemed to have prior training that matched. [225005360570] |I was encouraged by this as my own background did not match what they did. [225005360580] |Apparently, that was fine with them. [225005360590] |There were two "groups" of linguists that I met with (literally in different buildings a couple blocks from each other in downtown Santa Monica, near the 3rd street promenade and very close to the beach). [225005360600] |Group one dealt mostly with the taxonomy, or categorization, of web pages and ads (e.g., is web page X a kind of sports page or auction page, etc). [225005360610] |The second group dealt with human evaluation of relevance of ad + search correspondence (e.g., ‘how relevant is ad X to web page Y?’). [225005360620] |The first interview I had during the trip was with a taxonomy guy M. who had a BA in French and was a translator who was hired pre-acquisition. [225005360630] |He had been with this group for almost 10 years, and he was now a "project manager" and he had the authority to originate his own projects. [225005360640] |We discussed a series of things that I felt were close to what I am good at and I liked that interview the best. 
[225005360650] |But, I realized that this guy I was talking to had had little background for the stuff he was doing. [225005360660] |He must have learned entirely on-the-job. [225005360670] |He asked me two semi-technical questions: He asked me to define “precision” and “recall” (easy) and then he asked me which would be better to use on a new project and why, ASCII or UTF-8 (UTF-8). [225005360680] |He described two projects he was working on (one involved trying to auto-detect compound nouns and their relationships, like the difference between “hotdog” and “wedding cake”) and I thought to myself, "why is a guy like you working on that? [225005360690] |These are computational linguistics problems. [225005360700] |There are people who have solid training in this. [225005360710] |Why is a French translator doing this?" [225005360720] |Part of the answer to this is the corporate culture of Google. [225005360730] |They let anyone at his level initiate any project they want. [225005360740] |He just has to show some kind of results at some point. [225005360750] |That probably works great when you have a small group of talented engineers working on tasks within their field of specialization, but when you get so big that you let French translators try to solve problems that PhDs in computational linguistics try to solve, it's bound to go bad. [225005360760] |The second guy I met with, J., only had lunch with me so it wasn't really an "interview". [225005360770] |He had a B.A. in lit and classics. [225005360780] |I got the worst vibe from him. [225005360790] |He wasn't very talkative and I got the impression he wasn't very ambitious. [225005360800] |He had been there as long as the first guy, but seemed to be doing lower level tasks, and I think he liked it that way. [225005360810] |I felt I had to work a bit too hard to get him to talk, like he was barely tolerating my presence. 
[225005360820] |After lunch I moved to the second group who deal with human evaluators where I met with three people. [225005360830] |I liked everyone I met. [225005360840] |They were all responsible for designing online studies to test relevance. [225005360850] |These tests were web based human evaluations. [225005360860] |I had spent a few years tangentially associated with a psycholinguistics lab in grad school. [225005360870] |That lab ran experiments on human subjects testing the natural processing of sentences. [225005360880] |I also taught a psycholinguistics course. [225005360890] |I have a modest background in testing methodology. [225005360900] |I'm not a pro by any means, but at least I've had some exposure to experimental methods. [225005360910] |I felt like most of these people had nothing even close to training in experimental design. [225005360920] |The first person, TW, had a BA in linguistics from Berkeley. [225005360930] |That’s a nice degree, but it’s unlikely she had much experimental design experience at that level. [225005360940] |Unfortunately, I was also suffering a bit of a post-lunch lull in energy and jet lag during her interview, so my concentration and energy were off and I fumbled a bit with some of her hypothetical situations. [225005360950] |I felt I did the worst with her. [225005360960] |The second evaluation guy, C., was ABD in art history. [225005360970] |I tended to over-talk with him and he had to cut me off a few times in order to ask more questions. [225005360980] |I may have been over-compensating given my weak performance with TW just before. [225005360990] |The final guy, A.M., had a PhD in linguistics (again, from UCLA). [225005361000] |His diss was an OT analysis of something. [225005361010] |His advisors were Steriade and Hayes as I recall. [225005361020] |We seemed to have the most fun together and laughed quite a bit (I was also loopy from jet lag and a full day of interviews). 
[225005361030] |He accidentally stabbed me with his pencil at one point. [225005361040] |It didn’t hurt, but I joked about it. [225005361050] |“The guy’s almost outta here and you killed him!” [225005361060] |All of the second group’s interviews went roughly the same. [225005361070] |Each one kept asking me how I would design an evaluation for specific kinds of relevancy tasks. [225005361080] |Essentially, it was a bunch of what-if questions. [225005361090] |They all seemed most interested in my ability to understand experimental methodology. [225005361100] |After the day was done, I walked to the beach (it was over 70 degrees and I was about to fly back to the east coast with temps in the teens). [225005361110] |I literally took my shoes off and just stood in the warm Pacific ocean for an hour. [225005361120] |So, they’ve got a French translator doing computational linguistics and an art historian doing experimental human evaluation. [225005361130] |Hmmmm? [225005361140] |What to make of this? [225005361150] |I don’t want to give the impression that I think a person’s college degree pigeon-holes their career. [225005361160] |Not at all; my own career proves the opposite, but Google is famous for hiring highly trained subject matter experts. [225005361170] |The Google Linguists are different. [225005361180] |They are not doing linguistics. [225005361190] |Therefore, my own education and experience should have been no more of an issue than the people with whom I interviewed. [225005361200] |I feel my resume was at least as good as anyone there for the tasks they described. [225005361210] |Another suspicious part was the fact that at least four of these interviewers asked me if I would be comfortable doing all the little, day-to-day tasks associated with the job. [225005361220] |M. even went so far as to say “we support the engineers”. [225005361230] |The message I received was that they are low men on the totem pole. 
[225005361240] |I gave the same answer I always give to this question: This is what being a professional means. [225005361250] |You do whatever tasks it takes to get the job done. [225005361260] |I left feeling very good. [225005361270] |I felt my resume matched well with the people I met and the interviews mostly went well. [225005361280] |Three years later, I still haven’t heard from them. [225005361290] |Any chance they’re still mulling it over?