[225001830010] |Cupertino Who? [225001830020] |(screen shot from Variety.com) [225001830030] |In publicizing the cancellation of the ABC show 'Samantha Who' (NOTE the network, kids, it becomes critical...), the famed entertainment industry mag Variety published a brief story rife with typos and bizarre errors, but in particular one jumped out at me: [225001830040] |ABC had been mulling a plan to decrease “Samantha’s” budget by changing the way the show is shot -- moving from single-cam to a multicamera format. [225001830050] |Alphabet was looking to slash as much as $500,000 per episode from the show’s budget. [225001830060] |I highlighted in blue the strange error. [225001830070] |It appears as though the network name 'ABC' has been miscorrected to 'Alphabet' (see Language Log's extensive discussions of this phenomenon, which they have called the Cupertino Effect, for example here as well as Ben Zimmer's discussion here). [225001830080] |Upon first glance, this makes some sense, right? [225001830090] |'ABC' is a common way to refer to the Roman alphabet ("do you know your ABCs" one might ask a child). [225001830100] |But what is truly perplexing is the randomness of the effect. [225001830110] |The string 'ABC' occurs 5 times (including the title) and the string 'alphabet' occurs twice. [225001830120] |As far as I can tell, there is no consistent context causing this. [225001830130] |The first example of 'alphabet' occurs at the beginning of a sentence as a bare noun, and there are 4 examples of 'ABC' occurring in this context (including the title). [225001830140] |The second example occurs in a definite NP headed by 'the' while the string 'ABC' does not occur in this context. [225001830150] |If you can find a triggering context, please let me know. [225001830160] |Your guess is as good as mine... 
[225001830170] |UPDATE: myl makes a critical point in the comments that this is an inside joke at Variety and not a Cupertino, then I wonder what the joke context is, just fyi, ya know...too much color? [225001850010] |Adam's Tongue pt 1 [225001850020] |(pic of hardback cover of Adam's Tongue) [225001850030] |On July 6th, I will be leading my DC area book club, Books and Banter, in our meeting on the new book Adam's Tongue: How Humans Made Language, How Language Made Humans by Derek Bickerton (Hardcover - Mar 17, 2009). [225001850040] |Amazon’s Product Description:Language is unique to humans, but it isn’t the only thing that sets us apart from other species—our cognitive powers are qualitatively different. [225001850050] |So could there be two separate discontinuities between humans and the rest of nature? [225001850060] |No, says Bickerton; he shows how the mere possession of symbolic units—words—automatically opened a new and different cognitive universe, one that yielded novel innovations ranging from barbed arrowheads to the Apollo spacecraft. [225001850070] |(opening page 3 of my copy of Adam's Tongue) [225001850080] |Since this book coheres closely with this blog’s topic of linguistics, I’m going to be posting my notes and thoughts as I read and prep for the discussion. [225001850090] |I won’t guarantee that I’ll revise and clean up my notes into entirely coherent prose (see pics above for my typically messy method of “reading” a book on linguistics...the left page was originally blank), but if you’re reading the book too, I hope this encourages your thoughtfulness and stimulates your critical reading. [225001850100] |UPDATE: My second post on this is here and my third is here. [225001850110] |This first post will cover only the Introduction, pages 3-15. 
[225001850120] |One general note: as I am no longer affiliated with a university, it is remarkably difficult for me to follow leads involving academic papers; therefore, many of the references Bickerton makes to published works (such as Derek Penn’s intriguing list of things humans can do that non-humans cannot, p8) are, for the time being, locked behind an impenetrable vault for the lowly lay Lousy Linguist and as such must go un-reviewed. [225001850130] |Apologies. [225001850140] |I shall review all that time and Google together permit. [225001850150] |Shall we begin? [225001850160] |My first reaction is that the intro is written as a teaser (like most pop writing intros) and as such it leaves lots of questions to be answered. [225001850170] |This begs the question: will the rest of the book live up to the tease? [225001850180] |I’m a skeptic by nature, so I’m guarded in my expectations. [225001850190] |We shall see. [225001850200] |MAJOR POINTS [225001850210] |1. Thought experiment (p 3) –“imagine for a moment that you don’t have language and nobody else has either.” [225001850220] |Okay…hmmm…uh…wait, what? [225001850230] |First, as a linguist, I HAVE to ask: what is your definition of language? [225001850240] |This is a non-trivial question. [225001850250] |If you want me to understand how X originated, then you should help me understand exactly what X is. Note: the book index contains no entry for “language” per se. [225001850260] |UPDATE (June 17): the excellent blog Babel's Dawn (on the origins of speech), responds to Bickerton by asking a similar question: how is language to be defined, and then offering definitions here (HT The Outer Horde): [225001850270] |2. Language makes thought meaningful by putting thoughts together into meaningful wholes (pp 3-4). [225001850280] |Okay, so language is combinatorial syntax? [225001850290] |Can’t we say the same thing about logic? [225001850300] |Language is logic? [225001850310] |3. 
Darwin: having the tool of language caused us to develop greater cognitive capacity (p 5). [225001850320] |Is this similar to Jared Diamond’s Guns, Germs and Steel argument that the coincidental co-occurrence of geography, people, and ecosystems was the “ultimate cause” of Western dominance? [225001850330] |I fear Bickerton may have an “ultimate cause” nightmare on his hands. [225001850340] |4. FOXP2 –Only one indexed reference to FOXP2 (p110). [225001850350] |He disparages the “brouhaha” around FOXP2 and I agree with his point here (there’s no such thing as a “gene for language”); nonetheless, FOXP2 is an interesting gene worth discussing at length, I think. [225001850360] |Yet, I wonder if this brevity isn’t an editorial function. [225001850370] |I recall the physicist Stephen Hawking retelling a caveat his publisher gave him when writing A Brief History of Time that every mathematical equation he chose to include in the book would cut his readership in half. [225001850380] |Perhaps the same could be said for each gene referenced. [225001850390] |5. Magic Moment (p 6) –Apparently he’s looking to explain the “magic moment” when our distant ancestors broke from other communication methods and started using language (uh...cough...hmm...please see 1). [225001850400] |6. Discontinuity (p 9) –evolutionary leaps = differences between species not attributable to gradual change. [225001850410] |7. Niche construction (p 11) –we “guide” our own evolution. [225001850420] |I don’t like the use of the word “guide” here. [225001850430] |Sounds too intentional. [225001850440] |Better if it’s just “affect”. [225001850450] |8. Learning vs. instinct (p11) –he writes “we adapt our environment to suit ourselves, in the same way ants and termites adapt the environment to suit them. [225001850460] |We do it by learning, they do it by instinct; big deal." [225001850470] |Whoa! [225001850480] |Whoa! [225001850490] |Yes, this IS a big deal. 
[225001850500] |Let us not trivialize the distinction between learning and instinct. [225001850510] |I’ve had just enough exposure to computational neuroscience to recognize that this is no small distinction. [225001850520] |MINOR POINTS [225001850530] |P 4 –“without language there wouldn’t be scientific questions” –here’s my interpretation of what he means: 1) the things we ask questions about exist apart from us but 2) the fact that we ask questions about them (and not others) is a function of our cognitive apparatus (this is a variation on Lakoff’s embodied consciousness, right?). [225001850540] |The fact that our embodied consciousness leads us to ask certain questions (and not others) does NOT mean that those questions are a priori more important than other questions; it only means that we consider them more important. [225001850550] |We could be wrong. [225001850560] |P 5 –Quoting Darwin does not impress me any more than quoting Aristotle or Buddha or Chomsky: it’s all argument from authority and I have little patience for it. [225001850570] |P 9 –“in this book, for the first time ever, I’m going to show...” [225001850580] |This reminds me of a point Foucault made in, I believe, History of Sexuality vol 1, that there is a tempting addiction to being the one who sees and reports the “truth” that others do not. [225001850590] |As I recall, his point was that this temptation leads people to report “truths” that are, in fact, not true. [225001850600] |Rather narcissistic, really, don’t you think? [225001850610] |Is Bickerton a wise man or a narcissist? [225001850620] |We shall see. [225001850630] |P 10 –I like this idea of niche construction and “constant feedback loop”. [225001850640] |Sounds entirely commonsensical. [225001850650] |Of course we affect our environment (despite the claims of global warming skeptics). 
[225001850660] |P 13 –I like this point that any given communication system is suited only to take care of that species' needs (not some lego block building up of features and functions). [225001850670] |P 15 –the big question: what did our ancestors do (that other species did not do) that caused language to explode? [225001850680] |I am a skeptic by nature but I am intrigued, yet doubtful. [225001850690] |The difficult part lies before me. [225001850700] |12 chapters of challenging linguistic exploration. [225001850710] |Okay, Professor Bickerton. [225001850720] |I accept the challenge. [225001850730] |Lay on, Macduff, And damn'd be him that first cries, 'Hold, enough!' [225001860010] |Yo! Google This! [225001860020] |(screen shot of Google Spell Check) [225001860030] |Apparently, Gmail spell check does not recognize "googled" as a word (past tense of "to google"). [225001860040] |Will Microsoft spell checkers recognize "binged" as a word? [225001860050] |...and that's all I have to say about that. [225001860060] |UPDATE: It looks like Roger Shuy at Language Log has gotten on this bandwagon (a month after us, but he's welcome aboard). [225001860070] |He re-iterates Faldone's point (see my comments) and suggests that Microsoft might be bucking this trend of verbing a trademark with bing. [225001860080] |He then makes (ahem) the same joke I did. [225001860090] |Welcome aboard, Roger. [225001870010] |Adam's Tongue (pt 2) [225001870020] |This is the second in a series of posts detailing my notes and thoughts about the book Adam's Tongue as I prepare to lead a book discussion meeting July 6, 2009 in the DC metro area (see my first post here. UPDATE: My third post is here). [225001870030] |Ch 1 - The size of the problem [225001870040] |
  • This chapter is designed to walk through what's wrong with other theories of language evolution.
  • [225001870050] |
  • The basic point of the chapter seems to be this: no animal communication system (ACS) allows itself to refer to things distant in time and space, therefore they are not likely the precursors of language. [225001870060] |Everyone who has taken or taught a Language Files course knows these two criteria as Hockett's two communication features unique to human language (Bickerton gets to Hockett in due course).
  • [225001870070] |
  • On the very first page of this chapter, I noted, "Is there a gavagai problem here?" [225001870080] |By which I meant, how can we know what one of these ACS references really refers to? [225001870090] |Bickerton's index lists nothing for either "Quine" or "gavagai," though he skirts this issue repeatedly for the next few chapters (and possibly the whole book). [225001870100] |This dilemma becomes particularly critical in chapter 4 Chatting Apes, but I'll come to that later.
  • [225001870110] |As a background, here's a passage from Wikipedia's Indeterminacy of translation page describing Quine's famous example: [225001870120] |Consider Quine's example of the word "gavagai" uttered by a native upon seeing a rabbit[1]. [225001870130] |The linguist could do what seems natural and translate this as "Lo, a rabbit." [225001870140] |But other translations would be compatible with all the evidence he has: "Lo, food"; "Let's go hunting"; "There will be a storm tonight" (these natives may be superstitious); "Lo, a momentary rabbit-stage"; "Lo, an undetached rabbit-part." [225001870150] |Some of these might become less likely –that is, become more unwieldy hypotheses –in the light of subsequent observation. [225001870160] |Others can only be ruled out by querying the natives: An affirmative answer to "Is this the same gavagai as that earlier one?" will rule out "momentary rabbit stage," and so forth. [225001870170] |But these questions can only be asked once the linguist has mastered much of the natives' grammar and abstract vocabulary; that in turn can only be done on the basis of hypotheses derived from simpler, observation-connected bits of language; and those sentences, on their own, admit of multiple interpretations, as we have seen. [225001870180] |
  • No gradual move from ACS to human language (17): Since evolution is gradual and slow, there would have to be a "missing link" (my term, not DB's); an ACS that made the jump from referring to the here and now to referring to the distant and far. [225001870190] |No such link exists.
  • [225001870200] |
  • Therefore, ACSs grew out of non-communication behaviors
  • [225001870210] |
  • Uniqueness of language also not relevant because many species have unique features (Pinker's elephant trunk, 20).
  • [225001870220] |
  • Humans suddenly had something else to "talk" about other than the here and now and THAT'S what spurred language.
  • [225001870230] |
  • The new thing humans had was abstract concepts (22). [225001870240] |We can talk about dogs as a category (he makes an important distinction between categories and concepts much later in chapter 4 at the bottom of page 87).
  • [225001870250] |
  • This new ability to abstract is not associated with evolutionary fitness.
  • [225001870260] |
  • Critical Point: other species didn't develop language because they didn't need language (p 24).
  • [225001870270] |
  • Bickerton's 4 tests for any theory of language evolution: 1) uniqueness, 2) ecology, 3) credibility, 4) selfishness (p28). [225001870280] |Bolles' blog Babel's Dawn discusses these criteria at length here.
  • [225001880010] |Adam's Tongue (pt 3) [225001880020] |(classic depiction of Saussure's arbitrariness of the sign claim) This is the third in a series of posts detailing my notes and thoughts about the book Adam's Tongue as I prepare to lead a book discussion meeting July 6, 2009 in the DC metro area (see my first post here and second here). [225001880030] |Ch 2 - Thinking Like Engineers I've spent the last 5 years working in natural language processing and with engineers and I agree that there is something very valuable for a linguist to "think like an engineer" so I was curious from the start about this chapter, but I was also wary because the Chomskyan syntacticians also "think like engineers" and I believe they have led linguistics down a garden path of false starts and flawed theories for 40 years. [225001880040] |So I read on cautiously. [225001880050] |
  • DB notes that he came into linguistics via pidgins and creoles and they bear on his thinking about language evolution. [225001880060] |But does this bias him too, like the man who has a hammer and sees everything as a nail? [225001880070] |We shall see.
  • [225001880080] |
  • DB says there's no syntax when we try to speak with people who don't share our language (p 39) because we don't know enough of the language, the foreign words just pop out as we grope for them. [225001880090] |Now, I certainly defer to DB's far greater expertise in pidgin & creole formation, but this thought experiment of his does not jibe with my own experiences. [225001880100] |Like many travelers, I've had this exact experience in places like Guangzhou, China and Prague, but I don't think the foreign words "just popped out" quite as randomly as he suggests. [225001880110] |I'm tending to side with Slobin here.
  • [225001880120] |
  • He claims that protowords must not have had any internal morphological structure (41) because early language users would have had no rules defining that structure. [225001880130] |On its face, this makes sense; nonetheless, this begs the question: which came first, the word or the morphology? [225001880140] |Is it not plausible that some neurologically based process for seeking internal structure to sounds developed prior to the advent of words? [225001880150] |I just don't know.
  • [225001880160] |
  • The boom vocalization of the Campbell's monkey occurs 30 seconds before the alarm (42). [225001880170] |My first reaction: wow! this is stretching the limits of transitional probabilities, isn't it? [225001880180] |Can we plausibly claim that an association between sounds 30 seconds apart is neurologically feasible?
  • [225001880190] |
  • DB claims these booms are not modifiers (p42) because the boom "cancels out" the alarm. [225001880200] |I'd have to review the literature on these booms carefully, but my first reaction is: does it really cancel the alarm? [225001880210] |If I understand the context, it simply means "not immediate threat (but still a threat)". [225001880220] |That's not a cancellation. [225001880230] |It's more like epistemic modality: "there MIGHT be danger."
  • [225001880240] |
  • Page 44 -- The gavagai problem restated.
  • [225001880250] |
  • Confused: I'm confused by DB's claim on page 45 that "words combine as separate units -- they never blend. [225001880260] |They're atoms, not mudballs." [225001880270] |I'm not sure what he means. [225001880280] |Blending and combining are different, in that blending suggests some elements of both previous words/calls are preserved in the new word/call. [225001880290] |This happens all the time in contemporary linguistic change (classic example: motel blends motor + hotel, preserving bits of each word's morphology as well as semantic blending). [225001880300] |But I suspect DB is not referencing that. [225001880310] |So what is he referencing?
  • [225001880320] |
  • He makes a nice distinction between ACSs and Language: ACSs are primarily for manipulation of behavior while language is primarily for information sharing. [225001880330] |I have no clue if this is really true, but if yes, it's a good point (p 47).
  • [225001880340] |
  • He writes "language units are symbolic because they're designed to convey information." [225001880350] |A nice follow-up point on the difference point above, but it begs the question: what is "information"? [225001880360] |Any answer which supports DB would have to couch a definition in abstraction, right? [225001880370] |E.g., Information is a conceptualization that is independent from direct reference.
  • [225001880380] |
  • DB makes a bold claim on page 52 that strikes at the heart of post-Saussurean linguistics: displacement is a more important factor to language evolution than arbitrariness. [225001880390] |But it's worth noting that both are functions of abstraction, so perhaps this is just another version of his previous point that the jump to abstract thought is the key.
  • [225001880400] |On to chapter 3 -- Singing Apes.... [225001890010] |Space and Thought [225001890020] |(two of Boroditsky's stimuli, pdf here) [225001890030] |Yet again, Andrew Sullivan treads into the area of linguistics and cognition research. [225001890040] |But at least this time he's wise enough to make no comments about the studies he links to (he's typically misguided, or flat out wrong in his linguistic sensibilities, see here, here and here). [225001890050] |This time he reprints a quote here from an article titled How Does Our Language Shape The Way We Think? written by Stanford assistant professor Lera Boroditsky regarding how language influences thought. [225001890060] |Of course, Sullivan reprints the least interesting piece of information in the article, a mere behavioral anecdote about how speakers of different languages use different direction terms. [225001890070] |This fact has been well known for a long time (I first learned about it in an introductory cog sci course in 1998 and it was old news then). [225001890080] |The more interesting fact is the following effect she observed during a test to compare Russian and English speakers' ability to discriminate shades of blue (color terms are a classic topic within cognitive science going back to Berlin & Kay's work in the sixties, see here): [225001890090] |The disappearance of the advantage when performing a verbal task shows that language is normally involved in even surprisingly basic perceptual judgments —and that it is language per se that creates this difference in perception between Russian and English speakers. [225001890100] |After skimming Boroditsky's article, I felt it was a very good review of the field of language and thought studies as I remember it, though it didn't add much, if anything. It's clearly a layperson's article, so I looked at her Stanford page and skimmed her list of publications and, more critically, the references she cites. 
[225001890110] |My first impression was, "she doesn't cite much, does she?" [225001890120] |I'm used to experimental psychology articles containing lists of references almost as long as the article itself, but most of her (first author) papers have a handful of citations. [225001890130] |But the more surprising thing was the notable absence of two names, Len Talmy and Jürgen Bohnemeyer. [225001890140] |I'll grant that I'm a little biased because both of them were professors at my grad school, but the granting ends there. [225001890150] |I can't imagine writing a serious research paper on how language shapes thought without references to one or both of these researchers, especially as Talmy has written an extensive, typologically rich, two volume set on the relationship between language and thought: Toward a Cognitive Semantics and he has a forthcoming book The Attention System of Language (his work in progress handout on the same topic can be read in this pdf). [225001890160] |Don't get me wrong, I basically like Boroditsky's research methods and approach. [225001890170] |I just think it's time for her to review Talmy and Bohnemeyer. [225001900010] |Immortality? [225001900020] |(pic from Huffington Post) [225001900030] |Headline: Henry Allingham: World's Oldest Man Dies At 113. [225001900040] |Am I wrong, or is it logically impossible for the world's oldest man to die? [225001910010] |Them Maths Is Hard [225001910020] |This morning's NYT contained an article on search engines which contained a claim of such discombobulated mathematical incompetence, I just had to share: [225001910030] |It’s no secret that even with their recently-announced alliance, Yahoo and Microsoft will lag well behind Google in the hugely profitable search and search advertising business. [225001910040] |How far behind? 
[225001910050] |With a combined 28 percent of the American search market, Yahoo and Microsoft could double their usage and still trail Google, which accounts for 65 percent of the market. [225001910060] |I don't have to get all Mark Liberman on you to explain what's wrong with this claim. [225001910070] |If Microsoft/Yahoo! doubled their 28% market share, that's 56%, at which point they would no longer trail Google, which could have no more than 44% of the market. [225001910080] |Maybe it's finally time to stop reading the NYT... [225001920010] |Against Prescriptivism [225001920020] |It's all too common for prescriptivists to complain about word usage deviations, as if a word had one fixed meaning forever and ever. [225001920030] |This is not true. [225001920040] |A couple of good examples popped up on The Daily Dish when guest blogger Conor Clarke, a smart and well educated journalist, used two words (arbitrary and cynical) in ways that deviate from the way I would use them (and from what I would consider traditional usage); yet, his usage conforms to the way both of these words seem to be evolving in general usage in American English: [225001920050] |"We are all born with talents that are equally arbitrary -- strength and intelligence and social grace -- and yet we all compete for prizes under the impression that the outcomes are fair. [225001920060] |Perhaps something called free will enters the picture at some point. [225001920070] |And perhaps not: The ability to work hard might be doled out just as arbitrarily at a Y Chromosome or a great voice. [225001920080] |I don't know how you'd prove it either way. [225001920090] |Anyway, the cynical conclusion here is that there's nothing inherently just or fair about these outcomes." [225001920100] |On Arbitrary: For me, something is arbitrary when it is a function of decision making (note its obvious relationship to arbitrate). 
[225001920110] |For example, WordNet's definition: "based on or subject to individual discretion or preference or sometimes impulse or caprice." [225001920120] |But Clarke uses it to mean something like 'not under our direct control' when he describes genetic traits as arbitrary. [225001920130] |I can imagine an historical shift whereby decisions that are arbitrary came to be viewed as being made on the idiosyncratic whim of the decider (rather than based on some sound, objective, logical reasoning). [225001920140] |Hence, the word came to mean 'unfair or without sound reason'. [225001920150] |Then, quite recently I believe, the word shifted again when users found a salient connection between 'lacking sound reason' and 'out of one's direct control'. [225001920160] |And this seems to be how American English speakers of Clarke's generation (I believe he's about 15 years younger than I am) use the word. [225001920170] |And this helps explain why it's now commonly used for situations where an outcome is indifferent to fairness. [225001920180] |On Cynical: For me, a person is cynical when they reduce the intentions of others down to one root cause, namely selfishness. [225001920190] |For example, WordNet's definition: "believing the worst of human nature and motives; having a sneering disbelief in e.g. selflessness of others." [225001920200] |But Clarke uses it to mean something like 'preferring the explanation that is most indifferent to fairness'. [225001920210] |The conclusion he predicates as cynical has nothing to do with human motives or intentions. [225001920220] |I believe what he's saying in the last sentence of the passage above is that there are two competing beliefs: [225001920230] |Belief A = competition outcomes are fair because all competitors start out equal. [225001920240] |Belief B = competition outcomes are indifferent to fairness because they are rooted in genetic differences (which themselves are indifferent to fairness). 
[225001920250] |Clarke then says that to prefer Belief B is to be cynical. [225001920260] |Final Thought: As for me, I believe word meanings are arbitrary, but then again, I'm cynical. [225001920270] |PS: For font geeks, that's Bradley Hand ITC. [225001930010] |Fuck The Bills!!! [225001930020] |A rare (but much deserved) non-linguistics rant: [225001930030] |The Bills suck ass. [225001930040] |Deal with it. [225001930050] |How can that moron run it back? [225001930060] |What's he thinking? [225001930070] |Why? [225001930080] |What value is there to a run back? [225001930090] |This reminds me of the game against Dallas when they led by 8 and all they had to do was position for a field goal and they would have had the greatest upset in NFL history. [225001930100] |But no, those fucking morons throw it and it gets intercepted and they lose the fucking game. [225001930110] |Fuck the Bills. [225001930120] |Fuck 'em. [225001930130] |Don't watch their games. [225001930140] |Don't buy their merchandise. [225001930150] |Let them move to Toronto. [225001930160] |They don't deserve fans. [225001930170] |Fuck the Bills. [225001930180] |Fuck 'em. [225001930190] |The Buffalo Bills are like the worst relationship anyone has ever been in. [225001930200] |The one you're totally, blindly in love with but who just keeps fucking you over and you let them because you're so fucking deep you're willing to be shit on just to be in the same room with them and you'll never be the one to cut the cord. [225001930210] |Until the day they just stop being interested in toying with you and they just go away, and it's the best thing in the world for you, but you don't get that right away, you’re crushed until years later you get it. [225001930220] |They sucked. [225001930230] |They sucked ass and they should die, but they’re gone now and that’s good. [225001930240] |Fuck ‘em. [225001930250] |Fuck the Bills. [225001930260] |Go to Toronto and stop fucking us over. 
[225001930270] |We’ll fall in love with the Steelers soon anyway, because they actually win shit! [225001930280] |That’s right, I said it. [225001930290] |Fuck the Bills because they lose a lot and the Steelers actually win. [225001940010] |on "High Speed" [225001940020] |(screen shot of hotel connection speed) [225001940030] |It has become painfully clear that the meaning of "high speed" with respect to internet connection is being co-opted by hotel franchises as a marketing tool and as a result is fast being cleansed of any valuable meaning. [225001940040] |Case in point, I'm in Kansas this weekend for business, but it turns out that Kansas State and U. Kansas both have home football games this weekend and they're both less than an hour's drive from where I am, so the hotels around the area have all been booked solid. [225001940050] |Hence, I was forced to take a room at a modestly priced hotel (with a poor reputation) but at least they had "high speed internet" so I could get work done, right? [225001940060] |This is a business trip, remember. [225001940070] |Not so fast (literally): I'll spare you the rant about the many other issues with this hotel and point you to the screen shot above which shows the speed of my connection (when it actually worked, that is). [225001940080] |How does a 305 kbps download speed count as "high speed"? [225001940090] |I hereby call upon the ISO to determine a minimum speed that shall henceforth be the standard for determining whether a connection is "high speed" or not...pretty please? [225001940100] |It will surely embiggen the hearts of my more gentle readers to know that I convinced my company to waive their per diem and find me more suitable lodgings for the remainder of the trip. [225001940110] |NOTE: This is another good example of the commercialization of Google's search engine. [225001940120] |Any query with "high speed internet" in it will be riddled with advertisements for service, not discussions about it. 
[225001940130] |Yet another reason Google is not a good linguistics research tool. [225001940140] |See more discussion here. [225001950020] |I guess I'm a few months behind the curve on this one, but I just watched the Google Wave demo about the new social media/collaboration tool and I'm seriously impressed. [225001950030] |They said it should go live in 2009, so maybe by Christmas? [225001950040] |Pretty please... [225001950050] |In any case, when the video first started, the guy said something that caught my ear. [225001950060] |He mentioned that traditional email is built around the snail mail model where a message is an object that goes from a sender to a receiver. [225001950070] |But Google Wave discards that model in favor of a "conversational" model where the conversation as a whole is a single object which simply gets updated in a single place, not sent around (like a chat session). [225001950080] |This struck me as linguistically interesting because this is more in line with traditional conversation analysis theory which centers around "the floor" where one can "hold the floor", or "interrupt the floor", etc. [225001950090] |This more natural model of conversations has yielded a beautiful and elegant collaboration tool that I can't wait to get my hands on. [225001950100] |Hopefully Google's model of conversations is more coherent than the ragtag sloppiness that pervaded the linguistic analysis of conversations. [225001950110] |It's a tough field, no doubt. [225001950120] |Also, near the end, the speaker made what I took to be a geek version of a linguistic relativity claim: he said that it was only the Google Web Toolkit (HTML 5 & Java) that allowed him to think of possibilities for Wave that he never would have thought of otherwise. [225001950130] |I'm not sure this is really true, of course, but a cute thought nonetheless. [225001960010] |Judge This! 
[225001960020] |(screen shot from NBC's Community) [225001960030] |I'm not normally a spelling fanatic, mainly because I'm such a horrible speller myself. [225001960040] |However, I'm also not a set designer for a major network sitcom production. [225001960050] |Unlike the person who designed the backdrop for the recent episode of NBC's Community which featured a gigantic sign reading "JUDGES BOOTH." [225001960060] |I'm reasonably certain that the booth in question is possessed by the judges in question, rendering the preferred orthography as "JUDGES' BOOTH" (same in MLA and APA, see Purdue's excellent OWL site for MLA and for APA): [225001960070] |add ' to the end of plural nouns that end in -s: --- houses' roofs --- three friends' letters [225001960080] |See for yourself: Community, Season 1 : Ep. [225001960090] |5 "Advanced Criminal Law", 9:26 minute mark (on Hulu) [225001970010] |Infrequently Asked Questions [225001970020] |A nice example of a linguistic construction is Frequently Asked Questions because, as far as I can tell from the lists of questions on most of these pages, they are almost cerytainly NOT frequently asked at all. [225001970030] |I've never once seen a page that lists the number of times a particular question has been asked nor any discussion of the method of counting said frequency. [225001970040] |It simply goes without saying that "Frequently Asked Questions" are simply those that the creator of the page either a) perceives as important or b) wants readers to think about (some are clearly designed by marketers to push certain points of view). [225001980010] |The Right to Write [225001980020] |Last Friday, one of the world's most articulate and brave bloggers, Yoani Sánchez, was brutally beaten and kidnapped by her own government. [225001980030] |Read her description of the events here A gangland style kidnapping. [225001980040] |Read her blog Generation Y. 
[225001980050] |Thankfully, she is recovering and remains resolute as a blogger and dissident. [225001980060] |In her own words: [225001980070] |"Thank you to friends and family who have looked after and supported me, the effects are fading, even the psychological ones which are the hardest. [225001980080] |Orlando and Claudia are still in shock, but they are incredibly strong and also will overcome it. [225001980090] |We have already begun to smile, the best medicine against abuse. [225001980100] |The principal therapy for me remains this blog, and the thousands of topics still waiting to be touched on." [225001990010] |Crowdsourcing Annotation [225001990020] |(image from Phrase Detectives) [225001990030] |Thanks to the LingPipe blog here, I discovered an online annotation game called Phrase Detectives designed to encourage people to contribute to the creation of hand annotated corpora by making a game of it. [225001990040] |It was created by the University of Essex, School of Computer Science and Electronic Engineering. [225001990050] |Of course, they have a wiki, Anawiki. [225001990060] |I'm not crazy about the cutesy cartoon mascot (they given it a name: Sherlink Holmes. [225001990070] |Ugh. [225001990080] |I guess Annie would be a bit too obvious?) . [225001990090] |I've wondered aloud about this kind of thing before, so I'm glad to see it coming to fruition. [225001990100] |I haven't started playing the game yet, but I'm looking forward to it. [225001990110] |For now, here is the project description: [225001990120] |The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora, but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words. 
[225001990130] |However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to contribute to collaborative resource creation. [225001990140] |AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora). [225001990150] |Cheers. [225002000010] |Random Linguistics [225002000020] |(randomly discovered blog miresua conlang) [225002000030] |For reasons that are not entirely clear to me, there is a remarkable prevalence of what I'll call quazi-linguistics blogs on blogger.com. [225002000040] |Try, as I just did, using the "Next Blog" button above at the top left of this page ten or more times. [225002000050] |Each time it will take you to a randomly selected blog within the blogger network of blogs (No, I'm wrong here. see update below). [225002000060] |It's pretty cool. [225002000070] |Almost as good as StumbleUpon. [225002000080] |But I suspect you'll find, as I did, a preponderance of language/linguistic related blogs. [225002000090] |My rough estimate was 60% of the blogs were language related. [225002000100] |Now, this was driven up a bit by many ESL sites, but that counts, as far as I'm concerned. [225002000110] |Unfortunately, the quality of these blogs was poor, at best (e.g., see the tiresome anti-passive voice post here). [225002000120] |Why are so many bloggers blogging about language issues? [225002000130] |Maybe Geoff Nunberg is right and "the Internet turns everybody into a linguist" (see here). [225002000140] |UPDATE: Commenter MPJ cleared up the mystery. [225002000150] |Blogger.com's Next Blog button is NOT random (it used to be). [225002000160] |Blogger.com's explanation here (HT The Real Blogger Status). 
[225002000170] |Money quote: [225002000180] |We've made the Next Blog link more useful, by taking you to a blog that you might like. [225002000190] |The new and improved Next Blog link will now take you to a blog with similar content, in a language that you understand. [225002000200] |If you are reading a Spanish blog about food, the Next Blog link will likely take you to another blog about food. [225002000210] |In Spanish! [225002000220] |I'd be interested to know if they're using the same technology as their Ad Sense product to detect "similarity." [225002000230] |How do they determine the anchor blog? [225002000240] |Also, I think I can still make a similar claim to my original one: of the blogs that are related to language, most are prescriptivist. [225002000250] |Fair? [225002010010] |Are All Writing Systems Alike? [225002010020] |(image from The Topography of Language) [225002010030] |Just started reading an interesting article by the evolutionary biologist Marc Changizi who claims in The Topography of Language that all the world's writing systems utilize the same set of shapes because these shapes were selected for during the evolution of our visual system (or something like that). [225002010040] |More as I digest this interesting claim. [225002010050] |Money quote:Amongst both non-linguistic and linguistic signs, some visual signs are representations of the world­ e.g., cave paintings and pictograms, respectively­ and it is, of course, not surprising that these visual signs look like nature. [225002010060] |It would be surprising, however, to find that non-pictorial visual signs look, despite first appearances, like nature. [225002010070] |Although writing began with pictograms, there have been so many mutations to writing over the millenia that if writing still looks like nature, it must be because this property has been selectively maintained. 
[225002010080] |For non-linguistic visual signs, there is not necessarily any pictorial origin as there is for writing, because amongst the earliest non-linguistic visual signs were non-pictorial decorative signs. [225002010090] |The question we then ask is, Why are non-pictorial visual signs shaped the way they are? [225002010100] |HT: Stanislas Dehaene (via The Daily Dish) [225002020010] |Abracadabra! I Win! [225002020020] |(image from Slate.com) I tend to avoid Slate.com these days because, frankly, I typically find myself scoffing at some idiot article they've published that promotes such a ridiculous mis-reading of academic research that it's hardly worth finishing... like this one from today: A Better Way to Fight With Your Husband which linked to this article: The Healthiest Way To Fight With Your Husband. [225002020030] |It's a classic piece of idiot journalism worthy of a Full Liberman* if only it weren't so trivial and obvious as to be beneath the man, so I'll take a crack at it. [225002020040] |The big point is that fabulous new research from real life scholars (psychologists nonetheless, and they're almost like scientists) proves that women should use particular words when yelling at their husbands (the experiment used heterosexual married couples). [225002020050] |Pretty awesome, ain't it! [225002020060] |Just use the right words, and like a magic key you can unlock the mysteries of the brain and make it do what you please (okay, I'm starting to exaggerate, but less than you might think). [225002020070] |First let's look at the way the academic article is summarized in the puff piece that Slate linked to: [225002020080] |A new study of married couples, however, has found physiological evidence for one technique to diffuse tension: choosing the right fighting words. 
[225002020090] |Couples who used analytical language, such as “think,” “understand,” “because,” or “reason,” during heated arguments were able to keep important stress-related chemicals in check, according to research published in the latest issue of the journal Health Psychology. [225002020100] |Cytokines are inflammatory chemicals that spike during periods of prolonged tension and can lower your immunity and lead to early frailty, Type 2 diabetes, arthritis, and some cancers. [225002020110] |The authors noted a curious gender twist in their results. [225002020120] |Husbands benefitted from their wives’ measured language, but a man’s carefully chosen words had little effect on a woman’s cytokine balance. [225002020130] |To be fair, here is a passage from the authors' abstract of the original article: [225002020140] |Effects of word use were not mediated by ruminative thoughts after conflict. [225002020150] |Although both men and women benefited from their own cognitive engagement, only husbands' IL-6 patterns were affected by spouses' engagement. [225002020160] |Conclusion: In accord with research demonstrating the value of cognitive processing in emotional disclosure, this research suggests that productive communication patterns may help mitigate the adverse effects of relationship conflict on inflammatory dysregulation. [225002020170] |And here is a passage from this interview with the first author, Jennifer Graham, Penn State assistant professor of biobehavioral health: [225002020180] |"We specifically looked at words that are linked with cognitive processing in other research and which have been predictive of health in studies where people express emotion about stressful events," explained Graham. [225002020190] |"These are words like 'think,' 'because,' 'reason' (and) 'why' that suggest people are either making sense of the conflict or at least thinking about it in a deep way." 
[225002020200] |For the study, the 42 couples made two separate overnight visits over two weeks. [225002020210] |"We found that, controlling for depressed mood, individuals who showed more evidence of cognitive discussion during their fights showed smaller increases in both Il-6 and TNF-alpha cytokines over a 24-hour period," said Graham, whose findings appear in the current issue of Health Psychology. [225002020220] |During their first visit, couples had a neutral, fairly supportive discussion with their spouse. [225002020230] |But during the second visit, couples focused on the topic of greatest contention between them. [225002020240] |"An interviewer figured out ahead of time what made the man and woman most upset in terms of their relationship, and we gave each person a turn to talk about that issue," said Graham. [225002020250] |Researchers measured the levels of cytokines before and after the two visits and used linguistic software to determine the percentage of certain types of words from a transcript of the conversation. (my italics) [225002020260] |The researchers' results suggest that people who used more cognitive words during the fight showed a smaller increase in the Il-6 and TNF-alpha. [225002020270] |Cognitive words used during the neutral discussion had no effect on the cytokines. [225002020280] |When they averaged the couples' cognitive words during the fight, they found a low average translated into a steeper increase in the husbands' Il-6 over time. [225002020290] |There were no effects on the TNF-alpha. [225002020300] |However, neither couple's nor spouse's cognitive word use predicted changes in wives' Il-6, or TNF-alpha levels for either wives or husbands. [225002020310] |Graham speculates that women may be more adept at communication and perhaps their cognitive word use had a bigger impact on their husbands. [225002020320] |Wives also were more likely than husbands to use cognitive words. 
[225002020330] |Well, thank gawd they used fancy computers to count cognitive words! [225002020340] |After reading these three descriptions, it was clear to me that the original work is likely flawed. [225002020350] |I don't have access to the original study, unfortunately, but taken together, the abstract and first author's interview suggest to me that it makes the same mistake most non-linguists make: they assume the linguistics part is easy and don't put enough effort into it. [225002020360] |Dr. Graham's initial claim in the interview jumps out at me: "We specifically looked at words that are linked with cognitive processing in other research..." [225002020370] |Hmm? [225002020380] |Words that are "linked with cognitive processing?" [225002020390] |What does this mean? [225002020400] |I would love to see the references page to follow up on this "other research." [225002020410] |Graham later refers to these as "cognitive words." [225002020420] |They are alternately referred to as analytical language, measured language, conflict-resolution words, and cerebral words. [225002020430] |From the puff piece and the interview we have five examples: [225002020440] |
  • because
  • [225002020450] |
  • reason
  • [225002020460] |
  • why
  • [225002020470] |
  • think
  • [225002020480] |
  • understand
  • [225002020490] |Huh? [225002020500] |One conjunction, one interrogative, and three verbs of cognition. [225002020510] |Hmmm. [225002020520] |Is there any intuitive reason to believe that "because" is "linked with cognitive processing" in some special way that other words are not? [225002020530] |Is it the fact that it grammatically links clauses? [225002020540] |Many words do this. [225002020550] |Are the verbs on the list simply because they are verbs of cognition? [225002020560] |Are run and jump less "linked with cognition" because they are verbs of motion? [225002020570] |I would have to speculate on what this "other research" discovered about the magical properties of the special words that make them the key to brain chemicals. [225002020580] |Abracadabra! [225002020590] |Poof! [225002020600] |Also, it's not at all clear to me why they averaged the couples' frequency count. [225002020610] |What is this average supposed to tell us? [225002020620] |However, the puff piece makes the leap into idiotsville all by itself: [225002020630] |"The study is significant because it’s one of the first to link language with biological markers and show what kinds of words help sparring couples rather than just recommending they “communicate more,” explains James Pennebaker, chair of the department of psychology at the University of Texas-Austin, who has studied the role of language on relationships." (my italics). [225002020640] |Nope. [225002020650] |No link. [225002020660] |Just a transcript. [225002020670] |Given the study's methodology of counting words in a transcript, at no point could they possibly have been able to show any causal relationship between a particular word's utterance and the levels of a particular chemical in a person's brain. 
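To make concrete what counting words in a transcript can and cannot do, here is a minimal Python sketch. The word list is just the five examples quoted above; the function name and the sample sentence are made up for illustration, and I have no idea what software or word list the researchers actually used.

```python
import re

# The five example words quoted in the puff piece and the interview.
# The researchers' real list is unknown to me; this is only an illustration.
COGNITIVE_WORDS = {"because", "reason", "why", "think", "understand"}


def cognitive_word_percentage(transcript: str) -> float:
    """Return the percentage of tokens in the transcript that are on the list."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in COGNITIVE_WORDS)
    return 100.0 * hits / len(tokens)


# Hypothetical snippet of a transcript, invented for this example.
sample = "I think you left because you did not understand why I was upset."
print(f"{cognitive_word_percentage(sample):.1f}% cognitive words")
```

Note that the output is a single frequency for the whole transcript. It matches exact word forms only (so "thinking" would not count), and nothing in it records when a word was uttered or what happened to anyone's cytokines afterwards.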
[225002020680] |The puff piece authors pull the classic journalist's trick of "being fair" by adding actual linguist Deborah Tannen's skepticism of the "link" between particular words and particular chemicals, but they abandon all skepticism just a few sentences later and end with a bang! [225002020690] |"”Even when it seems like he is ignoring you, your words may be having an effect—at least on a chemical level,” says Graham" [225002020700] |Sigh. [225002020710] |*I'm going to start using the term "The Full Liberman" to refer to Mark Liberman's excellent manner of debunking bad journalism (see here and here for examples). [225002020720] |UPDATE (11/28/09): A nice summary of Full Liberman's at LL here. [225002030010] |Delicious Martian Fruit [225002030020] |(screen shot from University of Edinburgh) [225002030030] |I assume you'll be having some yummy neluka pie, fresh kapihu, or baked lanepi with cinnamon to finish off your Thanksgiving meal tomorrow. [225002030040] |Personally, I can't resist a stiff vodka &mola juice cocktail (only a radish garnish will do, people, I'm a stickler for proper cocktail garnishment). [225002030050] |Well, maybe this is what we'd eat if we spoke the spooky Alien Language Simon Kirby et al. are growing (HT LL). [225002030060] |The good folks across the pond at the University of Edinburgh's School of Philosophy, Psychology and Language Sciences, Department of Linguistics and English Language Language Evolution and Computation Research Unit (takes a breath) have been trying to discover how languages evolve. [225002030070] |To further this, they have been conducting some interesting experiments with artificial (aka 'alien') languages that begin small (e.g., with just a few fruit names), but which are then grown via cultural transmission of subsequent participants. [225002030080] |What they are finding, not unlike Marc Changizi in some ways (see here) is that "language has adapted to be good at being learned by us. 
[225002030090] |This can happen because language evolves culturally through being repeatedly learned and used by generations of individuals." [225002030100] |They have also posted online what they call "an early version of an online cultural evolution experiment game relating to this work." [225002030110] |However, it seems to be, at first at least, a version of the classic toy/game Simon (a sort-of prehistoric Play Station) where players have to repeat a series of sound/color stimuli. [225002030120] |Unfortunately, unlike the familiar kid's toy, this one starts out at a fairly difficult level. [225002030130] |No easy warm up period (hmmm, much like babies learning language???). [225002030140] |In any case, I found it frustrating and my gaze was quickly distracted by milk and cookies...well, beer and cookies (I'm saving the vodka molas for tomorrow). [225002040010] |Gee Wiz, Alien Language [225002040020] |(image of USC professor Paul R. Frommer from LA Times) [225002040030] |There are certain topics in linguistics that are far more interesting to non-linguists than linguists themselves. [225002040040] |Animal language is a classic example, as well as language evolution. [225002040050] |And third on the list is alien languages from movies (as opposed to Kirby's artificial languages). [225002040060] |For example, for decades now people have been fascinated by Marc Okrand's Klingon (this guy took it a little too far though; isn't this child abuse?). [225002040070] |When people hear that someone has "invented a language," they seem shocked, shocked! to discover that such a thing occurred. [225002040080] |As if it's a difficult feat. [225002040090] |There seems to be a gee wiz factor. [225002040100] |In fact, the average second year grad student in linguistics can do it, and typically they do, just for fun. [225002040110] |Logicians are required to do it. 
[225002040120] |Here, let's make up a language right now: [225002040130] |Language X [225002040140] |lexicon: bbhl = /bel/, intransitive verb, 'to run', (actor); hhli = /hla:/, transitive verb, 'to hit', (undergoer, actor); ttrsh = /dos/, proper noun, 'Wally'; pploi = /pli/, proper noun, 'Sparky'; 8_9 = /ha_mu/, particle, simple past [225002040150] |rules: S --> V + N; S --> V + N + N; V --> prt + V + prt [225002040160] |There. [225002040170] |Done. [225002040180] |I just invented language X and it took all of 20 minutes. [225002040190] |Now, which of the following sentences are grammatical in language X and what do they mean? [225002040200] |Which rules do the ungrammatical sentences break? [225002040210] |
  • bbhl ttrsh
  • [225002040220] |
  • ttrsh bbhl
  • [225002040230] |
  • 8hhli9 pploi ttrsh
  • [225002040240] |
  • ttrsh pploi
  • [225002040250] |
  • 8hhli9 ttrsh pploi
  • [225002040260] |
  • hhli9 ttrsh pploi
  • [225002040270] |Answers below. [225002040280] |The latest variation of this hoopla comes to us from James Cameron's latest big budget movie Avatar. [225002040290] |Cameron recruited a linguist from USC, Paul Frommer, to create a language for his goofy blue aliens. [225002040300] |But an article about this from the LA Times involved a bit of an exaggeration: "USC professor creates an entire alien language for 'Avatar'" (my emphasis). [225002040310] |Wow! [225002040320] |An entire language, you say? [225002040330] |That's gotta be at least 30 or 40 thousand words and at least a couple thousand rules, right? [225002040340] |Nope. [225002040350] |In fact, the language only contains about 1000 words. [225002040360] |From the article itself: "Between the scripts for the film and the video game, Frommer has a bit more than 1,000 words in the Na'vi language, as well as all the rules and structure of the language itself." [225002040370] |It seems a tad redundant to say "rules and structure" of a language, but that's neither here nor there. [225002040380] |As far as I can tell (after just a little bit of Googling) the Na'vi language has not been released so it's not possible to follow up on just how extensive this language is beyond the word count reported in the article. [225002040390] |I'm sure a grammar is on the way. [225002040400] |Sci fi fans are notoriously detail oriented. [225002040410] |But it brings up a more serious issue: what counts as a language? [225002040420] |Language X above certainly counts as a language in the simple sense of having a lexicon and set of rules for combining them. [225002040430] |Heck, I even threw in some phonetics. [225002040440] |If we want to claim that language X is not an entire language, we're gonna have to come up with some guidelines for what counts as an entire language. [225002040450] |The logicians have their rules for formal languages, of course, but we need some natural human language guidelines. 
[225002040460] |I'm sure the pidgin/creole experts have thoughts on this, and this is one of the things that pidgin & creole expert Derek Bickerton ruminates on in his book Adam's Tongue. [225002040470] |See my reviews here. [225002040480] |He's concerned with what proto-language must have looked like when humans first used language. [225002040490] |Now, I do not mean to belittle Professor Frommer's accomplishment. [225002040500] |I can certainly imagine spending a lot of time and energy on creating a language. [225002040510] |But it's not rocket science. [225002040520] |It's closer to knitting. [225002040530] |Answers: [225002040540] |
  • bbhl ttrsh = 'Wally runs'
  • [225002040550] |
  • *ttrsh bbhl -- bad because all sentences in X begin with a verb
  • [225002040560] |
  • 8hhli9 pploi ttrsh = 'Wally hit Sparky'
  • [225002040570] |
  • *ttrsh pploi -- bad because all sentences in X must have a verb
  • [225002040580] |
  • 8hhli9 ttrsh pploi = 'Sparky hit Wally'
  • [225002040590] |
  • *hhli9 ttrsh pploi -- bad because past tense morpheme is not properly realized
  • [225002040600] |UPDATE: cute HTML note. [225002040610] |My original argument structure definitions used angle brackets and I only just now realized they didn't show up in the post, because, of course, those are interpreted as HTML tags. [225002040620] |So I used parens. [225002040630] |UPDATE 2: a commenter points out a more complete interview with Frommer here. [225002040640] |UPDATE 3: I scooped Ben Zimmer on this one (HT Language Hat), another LL scoop for me. [225002040650] |UPDATE 4: Ben Zimmer has posted a guest post by Frommer in which he gives a brief description of the language here.
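For the programmers in the audience, Language X is small enough to check by machine. Below is a rough Python sketch of a recognizer for it. The function and variable names are invented for this example. I'm assuming the past particle 8_9 is optional, since the answers above treat bare bbhl ttrsh as grammatical, and I'm assuming an intransitive verb takes exactly one noun and a transitive verb exactly two.

```python
# Lexicon of Language X: verbs map to (present gloss, past gloss, number of nouns).
VERBS = {"bbhl": ("runs", "ran", 1), "hhli": ("hits", "hit", 2)}
NOUNS = {"ttrsh": "Wally", "pploi": "Sparky"}


def parse_verb(word):
    """Split a verb token into (stem, is_past); return None if it is not a verb.

    The past particle wraps the verb as 8...9 (rule V --> prt+V+prt).
    Assumption: the particle is optional, but both halves must appear together.
    """
    if word in VERBS:
        return word, False
    if word.startswith("8") and word.endswith("9") and word[1:-1] in VERBS:
        return word[1:-1], True
    return None


def translate(sentence):
    """Return an English gloss for a grammatical sentence, or None otherwise."""
    words = sentence.split()
    if not words:
        return None
    # Rules S --> V + N and S --> V + N + N: the verb always comes first.
    verb = parse_verb(words[0])
    if verb is None:
        return None
    stem, is_past = verb
    present, past, arity = VERBS[stem]
    args = words[1:]
    if len(args) != arity or any(a not in NOUNS for a in args):
        return None
    gloss = past if is_past else present
    if arity == 1:
        return f"{NOUNS[args[0]]} {gloss}"
    # Transitive argument order is (undergoer, actor).
    undergoer, actor = args
    return f"{NOUNS[actor]} {gloss} {NOUNS[undergoer]}"


for s in ["bbhl ttrsh", "ttrsh bbhl", "8hhli9 pploi ttrsh",
          "ttrsh pploi", "8hhli9 ttrsh pploi", "hhli9 ttrsh pploi"]:
    print(s, "->", translate(s) or "ungrammatical")
```

The whole thing fits in a few dozen lines, which is rather the point: a lexicon plus a handful of combination rules is easy to write down, and easy to check.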
    [225002050010] |Online Psycholinguistics Experiments (repost) [225002050020] |NOTE: Given this blog's recent surge in popularity (props to Language Log, Language Hat, and something called EastSouthWestNorth blog) I decided to update and repost this because I believe in increasing the use of online methodologies for linguistic research and I hope to send some of you good folks reading this right now over to these good folks below and hopefully you will participate in their experiments. [225002050030] |Generally it takes little of your time and the results could help further our understanding of just how the heck language works ('cause honestly, no one really knows). [225002050040] |I happily request submissions of other online linguistics-related experiments. [225002050050] |Original post here. [225002050060] |Experimental psycholinguistics requires experimental subjects, like any other empirical cognitive science. [225002050070] |Unfortunately, researchers are often constrained by limited resources. [225002050080] |Typically, psycholinguists use college students bribed with money or extra credit as subjects. [225002050090] |It's not unheard of for a published psycholinguistics study to have involved as few as 12 subjects. [225002050100] |This has been a necessary evil because there has never been a good way to collect large numbers of subjects together and provide them with a coherent experimental design. [225002050110] |Lately, however, researchers are turning to the web as a place to conduct experiments with large groups of subjects. [225002050120] |Yes, there are issues regarding control (e.g., if you need native speakers of English, how can you ensure that a subject really is a native speaker?), but these issues come up in all types of experimental paradigms. [225002050130] |I believe that good standards and practices to ensure quality online psycholinguistic experiments will emerge over time. [225002050140] |So, I'm all for moving ahead. 
[225002050150] |With that in mind, here is a set of sites offering online psycholinguistic experiments: [225002050160] |
  • The Max Planck Institute for Psycholinguistics -- Online Experiments.
  • [225002050170] |
  • The Portal for Psychological Experiments on Language (largest selection of experiments, that's a modified screen grab of some of their experiments above)
  • [225002050180] |
  • Cognition and Language Laboratory (they have a blog too!)
  • [225002050190] |
  • The Colour Imaging Research Group at the London College of Communication: Color Naming.
  • [225002050200] |
  • CogLab2 (the Cognitive Psychology Online Laboratory)
  • [225002050210] |
  • Other Web Experiments (from The Portal above).
  • [225002050220] |
  • University of Essex's Lexical Decision Task demo.
  • [225002050230] |
  • Psychological Research on the Net (Hanover College).
  • [225002060010] |Purplish Blue [225002060020] |I just completed a nifty little online color naming experiment that is being conducted by The Colour Imaging Research Group at the London College of Communication. [225002060030] |I'm a fan of using the web for linguistic experiments so I'm always looking for these kinds of things (see a related post here). [225002060040] |The experiment is being conducted in four languages: English, German, Greek, and Spanish (and they are adding more). [225002060050] |Try it for yourself here. [225002060060] |As you see from my responses above, I'm lacking in nuanced color naming skills. [225002060070] |Apparently my world is a giant purplish nightmare. [225002060080] |I had two impressions from my own responses: [225002060090] |1. I tended to want to blend names. [225002060100] |Partly this was my own lack of lexical items (who knew there was a color named catawba?), but it was equally due to my visual perception. [225002060110] |I perceived the colors as blends. [225002060120] |Now, is this because I only had a few color names and language constrained my thinking about what I was seeing? [225002060130] |Not sure and I ain't goin' there. [225002060140] |2. I tended to use a basic level term like blue when I first encountered a variation, then I was forced to come up with an adjectival variant like purplish blue when I encountered the next variation. [225002060150] |However, the original color was not necessarily what I actually think of as basic level blue when given the colors together. [225002060160] |I could imagine a second version of this experiment where all colors are given together and visual comparisons are made. [225002060170] |I believe I would have assigned the color names differently. [225002060180] |I do have a sense that there is such a thing as basic level blue, but I can't make that distinction in isolation. [225002060190] |BTW, there are thousands of color names. 
[225002060200] |Check out this extensive site of various color name dictionaries: Color-Name Dictionaries. [225002060210] |And here's a nice Wikipedia page on the classic work by Berlin and Kay that started a revolution in cognitive linguistics: Basic Color Terms: Their Universality and Evolution. [225002070010] |Google Linguistics 2 [225002070020] |(screen shot from WebCorp) [225002070030] |I have posted before about the use of Google as a linguistics search engine here. [225002070040] |Today, I ran across WebCorp Live, which allows a user to perform some linguistically interesting searches over the web as a corpus. [225002070050] |From their site: [225002070060] |WebCorp LSE is a fully-tailored linguistic search engine to cache and process large sections of the web. [225002070070] |WebCorp LSE offers: [225002070080] |* enhanced sentence boundary detection [225002070090] |* date identification [225002070100] |* 'boilerplate' removal [225002070110] |* collocation and other statistical analyses [225002070120] |* grammatical tagging [225002070130] |* language detection [225002070140] |* full pattern matching and wildcard search [225002070150] |In spirit, this is quite similar to Mark Davies's excellent BYU Corpus resources. [225002070160] |If I get a chance to play with it some more, I might try running some of my old dissertation searches through it. [225002070170] |That should be a good test. [225002070180] |UPDATE: see my original post titled Google Linguistics which more specifically talks about using Google for research. [225002080010] |The Myth of 'Ghoti' [225002080020] |(cartoon found at Caldwell Reading) [225002080030] |In reviewing the new book Reading in the Brain by neuroscientist Stanislas Dehaene (do check out the cool Matrix-like book page), neuro-journalist Jonah Lehrer repeats the common claim that George Bernard Shaw coined the use of the spelling of fish as ghoti to demonstrate how weird English spelling is. 
[225002080040] |I myself repeated this same claim to many students in the past, and in a few business presentations. [225002080050] |Within linguistics, it has long been a truism. [225002080060] |Rarely did anyone think to challenge its veracity. [225002080070] |Until April 23, 2008 at 11:59 pm that is. Over a year and a half ago, Benjamin Zimmer debunked this claim as false on Language Log (see his post here). [225002080080] |Zimmer showed not only that there is no record of Shaw having used it, but also that the use of ghoti goes back at least to "1855, a year before Shaw was born." [225002080090] |It remains a fun little example, mind you, just not attributable to Shaw. [225002080100] |BTW, if you do a Google image search on ghoti, as I just did, you will discover an underground, almost cultish devotion to the word involving Jedis, bimbos, and indie rock bands, oh my. [225002090010] |On Pointiness [225002090020] |(screen grab from Stamp and Shout) [225002090030] |I've seen the Coexist bumper sticker above several times in the last week. [225002090040] |I don't know how long it's been around, but a thought struck me the last time I saw it: there's no 'x'. [225002090050] |All of the symbols used actually contain a version of the letter they are replacing, except the Star of David. [225002090060] |There's no actual X figure within that symbol. [225002090070] |Rather, it's the prevalence of pointiness that allows it to make for a suitable X replacement. [225002090080] |I wonder if there is a different cognitive process at work? [225002090090] |While we are reading the other letters, perhaps we are not actually reading the Star of David as an X, but rather engaging in some form of visual approximation (at least at first). [225002090100] |Similar issues arise with textings like l8ter. [225002090110] |Perhaps neuroscientist and reading expert Stanislas Dehaene has an answer in his new book Reading in the Brain. 
[225002090120] |In Jonah Lehrer's review of that book, he suggests a possible answer: [225002090130] |One of the most intriguing findings of this new science of reading is that the literate brain actually has two distinct pathways for reading. [225002090140] |One pathway is direct and efficient, and accounts for the vast majority of reading comprehension -- we see a group of letters, convert those letters into a word, and then directly grasp the word's meaning. [225002090150] |However, there's also a second pathway, which we use whenever we encounter a rare and obscure word that isn't in our mental dictionary. [225002090160] |As a result, we're forced to decipher the sound of the word before we can make a guess about its definition, which requires a second or two of conscious effort. [225002090170] |Perhaps this second pathway is the route needed to decipher the Star of David as X and 8 as -ate-. [225002090180] |Just wondering out loud... [225002090190] |Oh, and btw, after staring at it a moment, I see that my initial reaction was wrong. [225002090200] |There are actually six Xs in the Star of David (thanks Q. Pheevr!), two between each set of parallel lines. [225002090210] |It takes a bit of magic picture blurry eye technique to see them (there's a more scientific term for that, right?). [225002090220] |However, I doubt those Xs are recognized during the initial reading of the bumper sticker. [225002110010] |Thinking Words (part 1) [225002110020] |(image from make-noise.com) I’d like to present a brief lesson in contemporary linguistic research with the goal of showing that we live in a marvelous age of quick and ready research tools freely available to even the most humble of internet users. [225002110030] |Hence, a little effort goes a long way. 
[225002110040] |My point is that when we make claims about language usage (and by "we" I mostly mean those of us who present our claims about language to the public via the interwebz) we need not make such claims based on our intuitions and emotions; rather, we can perform a little due diligence in a way that linguistic pontificators of the past simply could not. [225002110050] |And bully for us. [225002110060] |My subject for today’s Full Liberman is this classic example of language mavenry from Prospect magazine: Words that think for us by Edward Skidelsky, lecturer in philosophy at Exeter University (HT Arts and Letters Daily). [225002110070] |In this article, Skidelsky laments the following “linguistic shift”: [225002110080] |No words are more typical of our moral culture than “inappropriate” and “unacceptable.” [225002110090] |They seem bland, gentle even, yet they carry the full force of official power. [225002110100] |When you hear them, you feel that you are being tied up with little pieces of soft string. [225002110110] |Inappropriate and unacceptable began their modern careers in the 1980s as part of the jargon of political correctness. [225002110120] |They have more or less replaced a number of older, more exact terms: coarse, tactless, vulgar, lewd. [225002110130] |They encompass most of what would formerly have been called “improper” or “indecent.”…“Inappropriate” and “unacceptable” are the catchwords of a moralism that dare not speak its name. [225002110140] |They hide all measure of righteous fury behind the mask of bureaucratic neutrality. [225002110150] |For the sake of our own humanity, we should strike them from our vocabulary. [225002110160] |UPDATE: A very lively discussion of the meaning of the words in question (something I largely ignore) has broken out on Language Log here) [225002110170] |This article makes four testable linguistic claims: [225002110180] |
  • The words inappropriate and unacceptable have increased in frequency over the last couple decades.
  • [225002110190] |
  • This frequency increase is due to replacing other words: coarse, tactless, vulgar, lewd, improper, and indecent.
  • [225002110200] |
  • These other words are “older”
  • [225002110210] |
  • These other words are “more exact”
  • [225002110220] |With a little investigation using entirely freely available online linguistics tools, we can easily fact check each of these claims. [225002110230] |In the interest of time, I'll answer the first two together. [225002110240] |First and Second -- Has the frequency of inappropriate and unacceptable increased since the 1980s, and have they replaced the other words? [225002110250] |In order to quickly get some data, I took this to mean that the frequency of the first two words has increased while the frequency of the other words has decreased since the 1980s (is this an unfair interpretation? [225002110260] |In any case, that’s how I operationalized my methodology). [225002110270] |Thanks to Mark Davies' excellent resource, the TIME Corpus of American English (100 million words, 1923-2006, requires registration, but it's free) we can quickly get a snapshot of the frequency of each word’s usage for the last 9 decades (not bad, huh? Thanks Mark!!). [225002110280] |Caveat: raw frequency is a poor data point by itself. [225002110290] |What we really need is a way to compare apples to apples and oranges to oranges, and the problem we have is different sized corpora for each decade. [225002110300] |Fear not, Davies does this work for us. [225002110310] |His handy dandy interface allows us to report frequency per million, thus giving us comparable frequencies across different decades. [225002110320] |Using the TIME corpus, I looked up the frequency per million of each word per decade. [225002110330] |Then I entered that data into a spreadsheet. [225002110340] |I used Excel 2007 to create a line graph of these frequencies. [225002110350] |Here's the relevant data: [225002110360] |And here's the graph: [225002110370] |UPDATE (2hrs after original post): original graph was confusing (same graph, just confusing labels) so I fixed it. 
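For anyone who wants to see what "frequency per million" amounts to, here is a minimal sketch in Python. The function name and the counts are made up for illustration (they are NOT figures from the TIME corpus, and this is not a description of how Davies' interface computes anything); the point is only that dividing by corpus size makes decades of different sizes comparable.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw word count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical (raw count, corpus size) pairs, invented for illustration only.
decades = {
    "1950s": (40, 16_000_000),
    "1990s": (30, 10_000_000),
}

normalized = {decade: per_million(count, size)
              for decade, (count, size) in decades.items()}

for decade, rate in normalized.items():
    print(f"{decade}: {rate:.2f} per million")
```

In this invented example the raw count is higher in the 1950s, but the normalized rate is higher in the 1990s, which is exactly why raw frequency on its own can mislead.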
[225002110380] |What this shows us is that both inappropriate and unacceptable do in fact show a rise in frequency (consistent with Skidelsky's claim), but starting in the 1960s, not 1980s. [225002110390] |However, unacceptable shows a more recent dramatic decline, which is inconsistent with his claim. [225002110400] |Lewd actually made a bit of a comeback in the 1990s (thank you Mr. Clinton?), but has since dropped back (it's a bit of a jumpy word, isn't it?). [225002110410] |The other words do seem to be falling off in usage, consistent with Skidelsky's claim. [225002110420] |So the picture is not quite what Skidelsky thinks it is, though he does seem to be on to something. [225002110430] |UPDATE: See myl's plot of this same data (but grouping the words as Skidelsky does) here which suggests that "'coarse', 'tactless', 'vulgar' etc. declined until WWII and then stayed about the same, perhaps with an additional decline in past decade; while 'inappropriate' and 'unacceptable' rose gradually from the 1930s to 1970 or so, and then leveled off. " The plot does suggest that we could view the two groups as having roughly inverted frequency, somewhat conforming to Skidelsky's hunch. [225002110440] |Third -- Are these other four words “older”? [225002110450] |Unfortunately, as I am no longer affiliated with a university, therefore I have no access to the OED (I’ve decided not to pay the $295 for their individual subscription. [225002110460] |Condemn me if you must). [225002110470] |If anyone would care to look those up and post them in comments, I’d be happy to update. [225002110480] |Most of these words have multiple senses and the question is, when did the most relevant sense enter usage? [225002110490] |For that, the OED is most valuable. [225002110500] |Again, you can do that work for me, or send me a check for $295. 
[225002110510] |However, a simple search of the Merriam-Webster online dictionary gives us a quick answer: [225002110520] |unacceptable = 15th century; inappropriate = 1804; coarse = 14th century; tactless = circa 1847; vulgar = 14th century; lewd = 14th century; improper = 15th century; indecent = circa 1587 [225002110530] |This data suggests these eight words fall into roughly two groups: [225002110540] |A -- words that entered the language around the 19th century [225002110550] |
  • Set A = inappropriate, tactless
  • [225002110560] |B -- words that entered the language around the 14th-16th centuries [225002110570] |
  • Set B = unacceptable, coarse, vulgar, lewd, improper, indecent
  • [225002110580] |This grouping does not conform to Skidelsky’s assumption that inappropriate and unacceptable fall together in a newer class and the others in an older class. [225002110590] |UPDATE: much thanks to commenter panoptical who provides the following OED dates, which appear to largely confirm the Merriam-Webster dates, with the notable exception of lewd, which dates back to Old English, it seems...does have a certain Beowulf ring to it, doesn't it? [225002110600] |unacceptable: 1483; inappropriate: 1804; coarse: 1424; tactless: 1847; vulgar: 1391; lewd: c890; improper: 1531; indecent: 1563 [225002110610] |Fourth -- Are the other words "more exact"? [225002110620] |Finding a way to empirically test this is a challenge I will take up in a later post (you can see Wordnet coming, can't you?). [225002110630] |It will require teasing apart senses and relationships between senses (oh my, I wish I had the OED right now...). [225002120010] |Lexical Decision Tasks [225002120020] |(screen grab from University of Essex demo) [225002120030] |Just found this online demo of a classic lexical decision experiment from the University of Essex here. [225002120040] |Some images on the page don't seem to load, but the experiment runs just fine. [225002120050] |It's a nice example of a simple psycholinguistics methodology that is commonly used in many experiments. [225002120060] |I'll let the good folks at Essex explain the task: [225002120070] |One of the key methods of investigating the processes involved in reading is the lexical decision task. [225002120080] |Any model of reading needs to explain how a particular word can be selected from many similarly featured items, (known collectively as the neighbourhood). [225002120090] |Neighbourhood size is a measure of the orthographic similarity between words (Coltheart et al., 1977). 
[225002120100] |If a target word is orthographically similar to many words, then the target word is said to have a large neighbourhood (e.g the word sell has many neighbours such as tell, well, bell, yell and sill). [225002120110] |A target word which is orthographically similar to few words is described as having a small neighbourhood (e.g. deny only has the neighbours defy and dent. [225002120120] |In lexical decision tasks, Andrews (1989), found that words from large neighbourhoods elicit quicker responses than words from small neighbourhoods. [225002120130] |This finding has been observed in a number of studies (e.g. Laxon et al., 1992: Scheerer, 1987). [225002120140] |The facilitatory effect of neighbourhood size suggests that presentation of a target word results in activation of all the lexical entries which are similar to the target, and this local activation somehow speeds up target access. [225002120150] |However, the precise nature of this facilitatory effect is a matter of continuing debate. [225002120160] |Now go enjoy the demo! [225002120170] |BTW, check out these other online psycholinguistics experiments here. [225002130010] |Google Words [225002130020] |TechCrunch reviews Google's newish dictionary app here (Google's dictionary has been lurking around for awhile, but now it gets its own page here). [225002130030] |I did a quick comparison of Google &Merriam Webster's entries for inappropriate and found they were remarkably different in scope. [225002130040] |Google returns a lot more data (plus they provided links to other web definitions, which seemed to mostly be Wordnet links). [225002130050] |I prefer Google's phonetic guide as it seems to be straight IPA (although their transcription of -pro- as pr'oʊ seems odd to me as it suggests a diphthong when I think they're just indicating rounding, but I never was much of a phoneticist, so no biggie). 
[225002130060] |I was particularly impressed to see some constructional patterns listed in Google entries (e.g., |'it' v-link ADJ to-inf| representing something like 'it is inappropriate to yell'). [225002130070] |However, Merriam-Webster still wins on historical data, minimal as it is. [225002140010] |The Naked Vulnerability Of His Sentences [225002140020] |(pic from davidfosterwallace.com) [225002140030] |The grammatically whimsical author of Infinite Jest, the late David Foster Wallace, was, apparently, a prescriptivist. [225002140040] |Blogger Amy McDaniel at HTMLGIANT recently posted what she claims is a "complete text of a worksheet from his class" (HT kottke) which is, basically, a grammar test which begins with the following admonition: [225002140050] |IF NO ONE HAS YET TAUGHT YOU HOW TO AVOID OR REPAIR CLAUSES LIKE THE FOLLOWING, YOU SHOULD, IN MY OPINION, THINK SERIOUSLY ABOUT SUING SOMEBODY, PERHAPS AS CO-PLAINTIFF WITH WHOEVER’S PAID YOUR TUITION [225002140060] |Feel free to take the test yourself here, or to troll the answers folks are giving. [225002140070] |Personal fav: [225002140080] |2. I’d cringe at the naked vulnerability of his sentences left wandering around without periods and the ambiguity of his uncrossed “t”s. [225002140090] |UPDATE: it's always nice to scoop Language Log. [225002140100] |A day late and a dollar short, Chris Potts posts about the DFW grammar test here. [225002140110] |Psst, my post title is wayyyyy more cleverer. thhhpppt! [225002140120] |UPDATE 2: More LL on DFW and his prescriptivist bent here. [225002140130] |UPDATE 3: Looks like scooping LL is becoming a habit for me (pats self on back). [225002150010] |Paul Reubens on Rails [225002150020] |kottke was in a goofy mood recently and started a twitter game whereby users come up with blends of celebrity names and online apps. [225002150030] |Some of them are pretty good. [225002150040] |Personal favs: [225002150050] |
  • daniel craigslist
  • [225002150060] |
  • Gwyneth Paypaltrow
  • [225002150070] |
  • Sid Del.icio.us
  • [225002150080] |
  • Katrina and the (google) Waves (I'm a sucker for '80s retro)
  • [225002150090] |
  • Ali G(mail)
  • [225002150100] |
  • Bing crosby
  • [225002150110] |
  • Simon and Garflickr
  • [225002150120] |
  • Michael J FireFox
  • [225002150130] |
  • Ben Afflickr
  • [225002150140] |
  • Black IP's
  • [225002150150] |See more at #webappcelebs. [225002160010] |Outsourcing Fact Checking [225002160020] |Paul Spinrad guest blogs at boingboing and floats the idea of outsourcing fact checking (I'll support any proposal whatsoever that improves the fact checking process, believe me) but he adds the notion of, in essence, crowd sourcing linguistic annotation: [225002160030] |Now, what if these fact-checkers didn't just vet and correct the text? [225002160040] |While they dig into the logic and accuracy of everything, as usual, they could also use some simple application to diagram the sentences and disambiguate the semantics into a machine-friendly representation. [225002160050] |Just a little extra clicking, and they could bind all the pronouns to their antecedents, and select from a dropdown box to specify whether an instance of the string "Prince" refers to the musician Prince or to Erik Prince-- the president of XE, the company formerly known as Blackwater-- within an article that for whatever reason mentions both of them. [225002160060] |I have zero interest in diagramming sentences, mind you (because it's a dated and frankly messy pseudo-logical method of representing the syntax of a sentence), but there is a good idea at the core. [225002160070] |While it's true that the web has given us greater access to large corpora, these corpora remain unstructured text. [225002160080] |I'd like to see larger parsed corpora available (like the BNC). [225002160090] |With minimal training, editors and fact checkers could be utilized to mark up text with simple phrase boundaries and labels (this is an NP, this is a VP) as well as PP attachment ambiguity and co-reference, etc. [225002160100] |There would be messiness in this approach too, but Breck Baldwin has noted that this can be done effectively (for recall, at least) and the major issue is adjudicating the error rate of a set of crowd-sourced raters (see my previous post here and Baldwin's original post here). 
[225002160110] |A little sampling could adjudicate nicely. [225002170010] |Unsolved Problems in Linguistics [225002170020] |(pic from the Donders Institute) [225002170030] |Just discovered this page called Unsolved problems in linguistics. [225002170040] |It's a rather incomplete list, but a start. [225002170050] |This is the sort of topic that could easily form the core of a very interesting conference debate. [225002170060] |Linguistics remains a wide open field with competing theories and emerging methodologies, and the big questions remain dark and murky. [225002170070] |However, this page claims that the origin of language is the major unsolved problem. [225002170080] |I definitely disagree. [225002170090] |The main goal of linguistics, as I would state it, is to figure out how language works in the brain (hence, that is our major unsolved problem). [225002170100] |From that, most other questions can be answered (btw, see The Language Guy's take down of a recent report regarding the word most here). [225002170110] |As our understanding of the brain improves, so will our understanding of language. [225002170120] |I don't dispute that understanding the origin of language could be of use, but it is hardly the center of the linguistics world (I realize that Derek Bickerton might disagree). [225002170130] |NOTE: After Googling the phrase "Unsolved Problems in Linguistics" I found a number of other sites dedicated to the same topic, including a Wikipedia page; however, there is clear plagiarism/borrowing going on somewhere as there is word for word similarity between these sites; not sure who's cutting and pasting from whom. [225002170140] |But you need only go to one site to see the same stuff. [225002180010] |Which One Does Shakira Speak? [225002180020] |I thought I was going to have another Full Liberman on my hands (haven't finished the last one yet) but thankfully the article provocatively titled Do You Know Your 'Love Language'? 
doesn't really have much at all to do with language, and nothing to do with linguistics. [225002180030] |In this case, the word language is used as a metaphor for behaviors associated with personal relationships. [225002180040] |It's common to use language in this way, but I'd be happier with, say, the semiotics of love, or something like that. [225002190010] |The Noughties [225002190020] |The BBC is sponsoring a contest (with no prize, they had to stop doing that, hehe) to come up with the single word that best sums up the 2000s. [225002190030] |Some of their suggestions: [225002190040] |bling, tweets, green [225002190050] |My suggestion: meh [225002200010] |ooops, forgot to carry the one [225002200020] |(image from MIT) [225002200030] |MIT has launched a well funded re-think of Artificial Intelligence principles called The Mind Machine Project, what they're calling a "do-over." [225002200040] |"MMP group members span five generations of artificial-intelligence research, Gershenfeld says. [225002200050] |Representing the first generation is Marvin Minsky, professor of media arts and sciences and computer science and engineering emeritus, who has been a leader in the field since its inception. [225002200060] |Ford Professor of Engineering Patrick Winston of the Computer Science and Artificial Intelligence Laboratory is one of the second-generation researchers, and Gershenfeld himself represents the third generation. [225002200070] |Ed Boyden, a Media Lab assistant professor and leader of the Synthetic Neurobiology Group, was a student of Gershenfeld and thus represents the fourth generation. [225002200080] |And the fifth generation includes David Dalrymple, one of the youngest students ever at MIT, where he started graduate school at the age of 14, and Peter Schmidt-Nielsen, a home-schooled prodigy who, though he never took a computer science class, at 15 is taking a leading role in developing design tools for the new software." 
[225002210010] |Scooping Language Log [225002210020] |Looks like I've managed to scoop Language Log authors twice in the last couple weeks. [225002210030] |
  • Ben Zimmer's Dec 4 NYT article on the alien language in Avatar here. [225002210040] |My Nov 26 post on the same topic here.
  • [225002210050] |
  • Chris Potts' Dec 5 LL post on David Foster Wallace's grammar test here. [225002210060] |My Dec 4 post on the same topic here.
  • [225002210070] |...pats self on back. [225002220010] |Boom Boom Syntax [225002220020] |Mr. Verb has a post up about yet another NYT article on animal language that does a poor job of reporting the facts: [225002220030] |I've been wondering about what syntax really is and how we would show it exists since reading this in the NYT this morning. [225002220040] |It reports work by Klaus Zuberbühler and others arguing that Campbell's monkeys (cute critters, see pic) in Ivory Coast not only have some sound-meaning correspondences (boom boom mean 'come here once', krak means 'leopard', etc.), but that they have what they're calling inflectional morphology, a suffix -oo, which sounds like an auditory evidential —indicating you've heard but not seen something. [225002220050] |As Mr. Verb points out, the original scholarly article is not yet available so we are unable to fact check this one...yet. [225002230010] |Monkey ThreatDown! [225002230020] |Following up on Mr. Verb's coverage of Zuberbühler's monkeys go boom boom in the banana patch story (see here), Stephen Colbert issued a threat down against the primates: [225002230030] |However, my own two-year-old war on Colbert continues. [225002230040] |I shall not rest! [225002250010] |On Linguistic Fingerprinting [225002250020] |Can an author's writing style be defined by the frequency of unique words in their writings? [225002250030] |According to physicist Sebastian Bernhardsson, the answer is yes. [225002250040] |He found a couple of interesting facts: 1) the more we write, the more we repeat words and 2) the rate of repetition (or rate of change) seems to be unique to individual authors (creating a "linguistic fingerprint"... literally his words, not mine). [225002250050] |Let me walk through his claims and findings, just a bit. [225002250060] |Bernhardsson et al. are in press with a corpus linguistics study which compared rates of unique words between short and long form writing (short stories vs. novels vs. 
corpora). [225002250070] |I stumbled onto this research earlier this week when a BBC News title caught my eye: Rare words 'author's fingerprint': Analyses of classic authors' works provide a way to "linguistically fingerprint" them, researchers say. [225002250080] |The idea of linguistically fingerprinting authors has been around for a while. [225002250090] |In some ways it acted as a loss leader decades ago, piquing interest in the use of corpora and statistical methods to study language, and now there is even a whole journal called Literary and Linguistic Computing. [225002250100] |Plus, there is an established practice of forensic linguistics where linguistic methods are used to establish authorship of critical legal documents. [225002250110] |However, Bernhardsson makes a bold claim. [225002250120] |He claims that the process of writing (a cognitively complex process) can be described as the process of pulling chunks out of a large meta-book which shows the same statistical regularities as an author's real work (he hedges on this a bit, of course). [225002250130] |I always shiver when I run across a non-linguist jumping head first into linguistics making bold claims like this, but I also recognize that Bernhardsson and his co-authors are pretty smart folks so I gave them the benefit of the doubt and skimmed one of their two available papers (freely available here). [225002250140] |
  • The meta book and size-dependent properties of written language. [225002250150] |Authors: Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen. [225002250160] |New Journal of Physics (2009), accepted.
  • [225002250170] |First, I concentrated on the first section because the rest of the paper goes in a different direction that I didn't need to cover (and had lots of scary algorithms; it is Sunday and I do want to watch football, hehe). [225002250180] |What they did was count the number of words in a text, then count the number of unique words (this is a classic type/token distinction). [225002250190] |Here's what they found: [225002250200] |When the length of a text is increased, the number of different words is also increased. [225002250210] |However, the average usage of a specific word is not constant, but increases as well. [225002250220] |That is, we tend to repeat the words more when writing a longer text. [225002250230] |One might argue that this is because we have a limited vocabulary and when writing more words the probability to repeat an old word increases. [225002250240] |But, at the same time, a contradictory argument could be that the scenery and plot, described for example in a novel, are often broader in a longer text, leading to a wider use of ones vocabulary. [225002250250] |There is probably some truth in both statements but the empirical data seem to suggest that the dependence of N (types) on M (tokens) reflects a more general property of an authors language. (my emphasis and additions). [225002250260] |First, let's make sure we get what the authors did. [225002250270] |We have to use words more than once, right? [225002250280] |I've already repeated the word "we" in just the last two sentences. [225002250290] |And we repeat words like "the" and "of" all the time. [225002250300] |We have to. [225002250310] |So there are types of words, like "the", and then there is the number of times those words actually occur (tokens). [225002250320] |It's pretty straightforward to simply count the total number of words in a story, then count the total number of types of words. [225002250330] |That gives us a ratio. 
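To make the type/token counting concrete, here is a minimal Python sketch. The function name and the tokenization rule (lowercase everything, treat runs of letters and apostrophes as words) are my own simplifying assumptions; I don't know how Bernhardsson et al. tokenized their texts, and no stemming is done here.

```python
import re
from collections import Counter


def type_token_counts(text):
    """Return (number of tokens, number of types, per-word counts).

    Assumption: a "word" is a lowercase run of letters or apostrophes.
    Morphological variants (e.g. "word" and "words") count as different types.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return len(words), len(counts), counts


sample = "We have to use words more than once, right? We have to."
tokens, types, counts = type_token_counts(sample)

print("tokens:", tokens)
print("types:", types)
print("most repeated:", counts.most_common(3))
```

Even in this tiny sample the two numbers come apart, because "we", "have" and "to" each occur twice.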
[225002250340] |For example, let's say we have a short story by Author X with 1000 words in it (= tokens). [225002250350] |Then we count how many times each word is repeated and we find that there are only 250 unique words (= types). This means there is a ratio of 1000/250, or 100/25 (for comparison's sake I'm using this ratio). [225002250360] |This means that only 25% of the words are unique, which also means that, on average, a word is repeated 4 times in this story. [225002250370] |Now let's take a novel by Author X with 100,000 words (= tokens). [225002250380] |After counting repetitions we find it has 11000 unique words. [225002250390] |Our token/type ratio = 100,000/11000, or 100/11. [225002250400] |This means that only 11% of the words are unique, which means, on average, a word gets repeated about 9 times. [225002250410] |That's higher than in the short story. [225002250420] |Words are being repeated more in the novel. [225002250430] |Now let's imagine we take all of Author X's written work, put it together into a single corpus and repeat the process and discover that the ratio is 100/7 (on average, a word gets repeated about 14 times). [225002250440] |UPDATE: whoa, my maths was off a bit the first time I did this. [225002250450] |That'll teach me to write a blog post while watching Indie crush Denver. [225002250460] |Sorry, eh. [225002250470] |This is what the authors found: "The curve shows a decreasing rate of adding new words which means that N grows slower than linear (α less than 1)." [225002250480] |They also discovered something potentially even more interesting: the rate of change between these ratios is unique to each author. Here is their graph from the article (H = Thomas Hardy, M = Herman Melville, and L = D.H. Lawrence): FIG. [225002250490] |1: The number of different words, N, as a function of the total number of words, M, for the authors Hardy, Melville and Lawrence. 
[225002250500] |The data represents a collection of books by each author. [225002250510] |The inset shows the exponent = lnN/ lnM as a function of M for each author. [225002250520] |Their conclusions about the meta-book and linguistic fingerprint: [225002250530] |These findings lead us towards the meta book concept : The writing of a text can be described by a process where the author pulls a piece of text out of a large mother book (the meta book) and puts it down on paper. [225002250540] |This meta book is an imaginary infinite book which gives a representation of the word frequency characteristics of everything that a certain author could ever think of writing. [225002250550] |This has nothing to do with semantics and the actual meaning of what is written, but rather to the extent of the vocabulary, the level and type of education and the personal preferences of an author. [225002250560] |The fact that people have such different backgrounds, together with the seemingly different behavior of the function N(M) for the different authors, opens up for the speculation that every person has its own and unique meta book, in which case it can be seen as a fingerprint of an author. (my emphasis) [225002250570] |They are quick to point out that this finding says nothing about the semantic content of the writings. [225002250580] |So what does it say? [225002250590] |I admit I was having a hard time seeing any conclusion about cognition or the writing process, even while finding this methodology interesting, I'm just not at all sure what it really says about the human brain and language, if anything at all. [225002250600] |The speculation that "every person has their own unique meta book" is bold. [225002250610] |Unfortunately, it is also almost entirely untestable. [225002250620] |Keep in mind that this research had zero psycholinguistic component. [225002250630] |They were just counting words on pages. 
[225002250640] |I'd caution against drawing any conclusion about the human language system based solely on this work. [225002250650] |(I should note that I skipped one of the most interesting findings: the section of the work doesn't matter, only the size. That is, they took random chunks from their corpora and found the same patterns, if I understood that part correctly.) [225002250660] |Which begs the question: why is this being published in a physics journal? [225002250670] |It's being published in The New Journal of Physics and a quick perusal of the articles from previous editions doesn't show anything remotely similar to this work (no surprise). [225002250680] |I'm a fan of corpus linguistics, but I'm also a fan of caution. [225002250690] |I'm not convinced any conclusions about the psycholinguistics of the complex writing process can be drawn from this work. [225002250700] |Not as yet. [225002250710] |But interesting, nonetheless. [225002250720] |FYI: it's easy enough to fact check some of these results using freely available tools, namely KWIC Concordance. [225002250730] |This tool will take any text and count the total tokens and number of repeats for us. [225002250740] |I did this for Melville's Bartleby, the Scrivener and Moby Dick. [225002250750] |I got text versions of each from Project Gutenberg, then ran the wordlist function within KWIC and here are my results: [225002250760] |Bartleby Total Tokens: 18111 Total Types: 3462 Type-Token Ratio: 0.191155 [225002250770] |Moby Dick Total Tokens: 221912 Total Types: 17354 Type-Token Ratio: 0.078202 [225002250780] |Bartleby = 0.191155 Moby Dick = 0.078202 [225002250790] |Yep, the short story Bartleby has a higher proportion of unique words than the longer Moby Dick. [225002250800] |FYI, this is a weak test simply because the tokens are not stemmed, meaning morphological variants are treated as different words. [225002250810] |I don't know if this is consistent with Bernhardsson's methodology or not. 
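The arithmetic behind those numbers is easy to reproduce. The Python sketch below just plugs in the token and type counts reported above and computes the type-token ratio, the average number of times a word is repeated, and the exponent ln N / ln M from the inset of the paper's figure. The function names are mine, and applying that exponent to a single text rather than to a whole collection of books is my own assumption, so treat the exponent values as illustrative only.

```python
import math


def type_token_ratio(types, tokens):
    """Share of the running words that are distinct: N / M."""
    return types / tokens


def growth_exponent(types, tokens):
    """ln N / ln M, the quantity plotted in the inset of the paper's figure."""
    return math.log(types) / math.log(tokens)


# (tokens M, types N) as reported by the KWIC wordlist runs above.
texts = {
    "Bartleby": (18111, 3462),
    "Moby Dick": (221912, 17354),
}

for title, (tokens, types) in texts.items():
    ttr = type_token_ratio(types, tokens)
    avg_repeats = tokens / types
    alpha = growth_exponent(types, tokens)
    print(f"{title}: TTR={ttr:.6f}, avg repeats={avg_repeats:.2f}, "
          f"ln N / ln M={alpha:.3f}")
```

Both exponents come out below 1, which is at least in the spirit of the quoted claim that N grows slower than linearly in M.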
[225002260010] |Without The Hats [225002260020] |Ingrid at Language on the Move blog reports that the Student Council at Zayed University in the UAE is conducting a poll to see which languages students want to see offered, and Korean is winning (HT Research Blogging). [225002260030] |Rarely do language students get an actual say in institutional offerings and a current polling initiative by the Student Council at Zayed University is therefore the more exciting. [225002260040] |This internal poll has been running for a couple of days and I can’t take my eyes of it: for a sociolinguist this is like Melbourne Cup Day without the hats! [225002260050] |Needless to say, I had to google Melbourne Cup Day. [225002270010] |How good is your language sense? [225002270020] |The use of the web for language experiments is growing (see my Call for Participation links to the right) and the use of games to facilitate experiments makes the whole process fun for the subjects/users/participants (whatever the word du jour is for the people actually taking the experiment). [225002270030] |The site Games With Words, run by Joshua Hartshorne, a graduate student in Psychology at Harvard University, is a great example of this. [225002270040] |They have two games running right now: [225002270050] |
  • Pronoun Sleuth
  • [225002270060] |
  • Puntastic!
  • [225002270070] |...and coming soon [225002270080] |
  • The Communication Game
  • [225002280010] |The Snowclone Cometh [225002280020] |(image from On The Scene) [225002280030] |Just bought the box set of seasons 1-4 of the sublime comedy It's Always Sunny In Philadelphia. [225002280040] |Catching up on episodes missed, I just watched the 2008 season 4 finale "The Nightman Cometh" (episode #13, 45). [225002280050] |When I saw the title as the episode began, I was struck by this thought: it might be the case that Eugene O'Neill's 1939 play "The Iceman Cometh" is the single most mimicked play title in history. [225002280060] |Can you think of a play title that has more homages than this one? [225002280070] |Then I wondered, is this a snowclone? [225002280080] |A snowclone is a linguistic construction like a cliché: it has a somewhat rigid syntactic pattern and a somewhat recognizable meaning, but allows substitutions. [225002280090] |A classic example is "X is the new Y" like "gray is the new black" or "knitting is the new yoga." [225002280100] |The Snowclone database lists two primary criteria for inclusion (these should be taken to be neither necessary nor sufficient; rather, they are a guide): [225002280110] |
  • high number of Google hits
  • [225002280120] |
  • significant variation
  • [225002280130] |So, I Googled the query "the *man cometh" and found about 3,390,000 hits. [225002280140] |No small number that (oooh, that construction might also be a snowclone...). [225002280150] |The first page of Google hits alone shows 9 variations out of 12 hits. [225002280160] |That's a lot of variation. [225002280170] |
  • The Meatman Cometh
  • [225002280180] |
  • The Tax Man Cometh
  • [225002280190] |
  • The Monkey Man Cometh
  • [225002280200] |
  • The Dark Man Cometh
  • [225002280210] |
  • The Repo Man Cometh
  • [225002280220] |
  • The Yogurt Man Cometh
  • [225002280230] |
  • The H-Man Cometh
  • [225002280240] |
  • The ad man cometh
  • [225002280250] |
  • The Con Man Cometh
  • [225002280260] |As with many snowclones, I suspect that the users of this construction rarely know of its origin. [225002280270] |I skimmed the first 10 pages of Google hits and found that almost NONE referenced the original play. [225002280280] |Might this be history's most successful snowclone? [225002280290] |As a side note, the writers of It's Always Sunny In Philadelphia chose their homage wisely. [225002280300] |Anyone who watches even a few episodes will note the clear synchronicity with this Wikipedia description of O'Neill's play: "It expresses the playwright's disillusionment with the American ideals of success and aspiration, and suggests that much of human behavior is driven by bitterness, envy and revenge." [225002280310] |Just FYI, if you haven't purchased your Hanukkah/Christmas/Kwanzaa/Festivus gift Kitten Mittons yet, I believe operators are standing by. [225002280320] |Finally, there's an elegant, comfortable mitton, for kats! [225002280330] |Meeeeoowww! [225002290010] |translated.by [225002290020] |Here's a new site that aims to crowdsource translation (HT Boing Boing): Translated by humans. [225002290030] |What's going on here? [225002290040] |It's called collaborative translation. [225002290050] |To make it simple, people help each other translate interesting foreign language texts into their native language. [225002290060] |It's mostly blog posts, magazine articles, short stories and another materials licensed for free redistribution. [225002300010] |Analogy as the Core of Cognition [225002300020] |Here is a YouTube video of a February 6, 2009 Stanford University Presidential Lecture by Douglas Hofstadter, one of the most interesting cognitive science/artificial intelligence thinkers of our lifetime: [225002300030] |In this Presidential Lecture, cognitive scientist Douglas Hofstadter examines the role and contributions of analogy in cognition, using a variety of analogies to illustrate his points. [225002310010] |SEX! TORTURE! BANANA! 
[225002310020] |Do some words grab your attention more than others because of their semantic content? [225002310030] |If I want to get the attention of 12 screaming kids, would I be better off yelling "SEX!" or "EGGPLANT!"? [225002310040] |This was the topic (kinda) of a study recently reviewed by the excellent Cognitive Daily blog: Huang, Y., Baddeley, A., & Young, A. (2008). [225002310050] |Attentional capture by emotional stimuli is modulated by semantic processing. [225002310060] |Journal of Experimental Psychology: Human Perception and Performance, 34 (2), 328-339 DOI: 10.1037/0096-1523.34.2.328. [225002310070] |The study used an interesting methodology: rapid serial visual presentation, or RSVP, which involves showing participants a random stream of stimuli, flashing by one every tenth of a second. [225002310080] |Whiz bang! [225002310090] |That's a lot of flashing. [225002310100] |Let Cognitive Daily explain: [225002310110] |Typically if you're asked to spot two items in an RSVP presentation, you'll miss the second one if it occurs between about 2/10 and 4/10 of a second after the first one, but not sooner or later. [225002310120] |This phenomenon is called Attentional Blink -- a blind spot caused by the temporary distraction of seeing the first item... [225002310130] |Their streams were simply random strings of letters and digits, with two words embedded in each stream. [225002310140] |Then they asked students to look for words naming fruit as they flashed by. [225002310150] |If a fruit word appeared, it was always the second word in a stream. [225002310160] |The key was in the first word: half the time, this first word was a neutral word like bus, vest, bowl, tool, elbow, or tower, and half the time it was an emotional word like rape, grief, torture, failure or morgue. [225002310170] |So a sequence might look like this: [225002310180] |
  • JW34KA
  • [225002310190] |
  • QPLX12
  • [225002310200] |
  • MC15KW
  • [225002310210] |
  • 083FLB
  • [225002310220] |
  • TORTURE
  • [225002310230] |
  • S21L0C
  • [225002310240] |
  • DJW09S
  • [225002310250] |
  • BANANA
  • [225002310260] |
  • 3LW8Z9
  • [225002310270] |
  • XOWL01
  • [225002310280] |And so on. [225002310290] |The first word acts as a distractor: the students are looking for fruit words, but this is always a non-fruit word. [225002310300] |The question is, are emotional words more distracting? [225002310310] |The result of the experiments was ... [225002310320] |a qualified yes. [225002310330] |When the participants were asked to pay attention to the meaning of the words (e.g., "look for words that mean fruit"), then yes, there was a distractor effect (i.e., participants were less accurate at identifying the fruit words at the relevant lag; they simply flashed by without being recognized). [225002310340] |However, when asked to perform a different task, like "look for words that are all caps," then no, there was no effect. [225002310350] |From the authors' abstract: [225002310360] |Only when semantic processing of stimuli was required did emotional distractors capture more attention than neutral distractors and increase attentional blink magnitude. [225002310370] |Combining the results from 5 experiments, the authors conclude that semantic processing can modulate the attentional capture effect of emotional stimuli. [225002310380] |The original paper is behind PsycNET's paywall so I don't have access to it, but of course, my curiosity is piqued. [225002310390] |I recall that the time course of visual word recognition is no simple thing. [225002310400] |This task requires participants to recognize words and make a decision about the words at a very rapid pace. [225002310410] |Trying to tease apart what's occurring in the word recognition process during this would probably fill a dissertation, or at least a really good series of publishable papers. [225002310420] |Also, how did they determine what counted as an emotional word? [225002310430] |Was some kind of experiment performed whereby participants were shown a series of words and their blood pressure was measured? [225002310440] |EEG? fMRI? 
[225002310450] |Galvanic skin response? [225002310460] |What then? [225002310470] |How does one determine that one out-of-context lexical item has more emotional effect than another? [225002310480] |There may be good research on this, I don't know. [225002310490] |But it seems intuitive that context has a lot to do with our emotional response to meaning. [225002310500] |We know that torture was one of the words. [225002310510] |But what is its emotional effect in the following context: I love my kids, but watching Barney with them is torture. [225002310520] |They also used failure. [225002310530] |Really? [225002310540] |That has enough of a predictable emotional effect to be used in an experiment like this? [225002310550] |Why only negative words? [225002310560] |Why not wealthy, powerful, gorgeous? [225002310570] |Did they use the word moist? [225002310580] |They should've used moist. [225002310590] |Even if some words can be shown to have significant and predictable emotional effects all by themselves, these effects could easily be mitigated by the experimental design. [225002310600] |For this experiment, they showed 16 participants 128 sequences. [225002310610] |That's a lot of rapid flashing. [225002310620] |If the word torture is in the 120th sequence, I don't think I'm necessarily going to be processing the full range of semantic associations anymore. [225002310630] |I'm going to be doing the minimal amount of linguistic work necessary. [225002310640] |As a participant, I will, in essence, be gaming the system. [225002310650] |But then again, psycholinguistics is tough. [225002310660] |I respect anyone who explores new paradigms for studying something that is, currently, impossible to see: how language works in the brain. [225002310670] |But by the same token, it's healthy to put these methods through rigorous debate (if a blog post can be counted as rigorous debate...). 
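To make the stream design concrete, here is a rough Python sketch of how one such RSVP sequence could be assembled. It is purely illustrative and is not the authors' stimulus code: the stream length, the six-character fillers, the distractor position, and every name in it (`random_filler`, `build_stream`, `first_pos`, `lag`) are my own assumptions, chosen to reproduce the shape of the example sequence above. At one item per tenth of a second, a lag of 3 puts the fruit word about 3/10 of a second after the distractor, inside the attentional blink window.

```python
import random
import string

# Assumption: filler items use uppercase letters and digits only.
ALPHABET = string.ascii_uppercase + string.digits


def random_filler(rng, length=6):
    """Return one random letter/digit string to pad the stream."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def build_stream(distractor, target, lag, n_items=10, first_pos=4, rng=None):
    """Build one RSVP stream as a list of strings.

    The distractor word sits at index first_pos and the target
    (fruit) word follows it by `lag` items; everything else is filler.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if first_pos + lag >= n_items:
        raise ValueError("target would fall outside the stream")
    rng = rng or random.Random()
    stream = [random_filler(rng) for _ in range(n_items)]
    stream[first_pos] = distractor.upper()
    stream[first_pos + lag] = target.upper()
    return stream


if __name__ == "__main__":
    # One emotional-distractor trial; items would be shown ~100 ms apart.
    for item in build_stream("torture", "banana", lag=3, rng=random.Random(0)):
        print(item)
```

Passing a seeded `random.Random` makes a trial reproducible, which would matter if you wanted neutral and emotional trials to share the same filler items.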
[225002310680] |BTW, there are some nifty online demos for Attentional Blink methodology. [225002310690] |Enjoy: [225002310700] |
  • RIT (read instructions at bottom)
  • [225002310710] |
  • Cognitive Daily
  • [225002310720] |
  • Patrick Craston, University of Kent, demo
  • [225002310730] |PS: on a humor note. [225002310740] |My original example to contrast with "SEX!" was "TOMATO!", then I changed it to "EGGPLANT!" [225002310750] |Eggplant just seems funnier. [225002310760] |I resisted the urge to go with the obvious kumquat (when deciding which spelling to use, I Googled kumquat and discovered that it truly is the devil's fruit; it had 666,000 hits).