Why does linguistic structure exist? The common-sense view of language is that it is an adaptation for communication, but all of the crazy nested structures (i.e., phrases inside phrases) that we see in language don't seem necessary for communicating well: just look at Morse code! We could define an alternative, nested Morse code that had to be parsed with a context-free grammar to decode the message, but this seems not only unnecessary but wasteful. An alternative view comes from the Chomskyans, who hold that language is not about communication but about thought, and who would attribute the nested structure to the inherently hierarchical and recursive nature of thought.
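To make that contrast concrete, here's a toy sketch (entirely my own invention, not a real coding scheme): a flat, Morse-like code can be decoded one symbol at a time, while a nested variant forces the decoder to track recursive structure, even though the brackets add nothing to the message.

```python
# Toy contrast between a flat, Morse-like code and a nested one.
# Flat code: decode symbol by symbol, no memory of structure needed.
FLAT = {".-": "A", "-...": "B", "-.-.": "C"}

def decode_flat(tokens):
    return "".join(FLAT[t] for t in tokens)

# Nested code: messages can contain bracketed sub-messages, so the
# decoder must track recursion depth -- i.e., parse a context-free
# structure -- even though the brackets carry no information.
def decode_nested(tokens, i=0):
    out = []
    while i < len(tokens):
        t = tokens[i]
        if t == "(":                       # open a sub-message
            sub, i = decode_nested(tokens, i + 1)
            out.append(sub)
        elif t == ")":                     # close the current sub-message
            return "".join(out), i + 1
        else:                              # an ordinary symbol
            out.append(FLAT[t])
            i += 1
    return "".join(out), i

print(decode_flat([".-", "-...", "-.-."]))                 # -> ABC
print(decode_nested([".-", "(", "-...", ")", "-.-."])[0])  # -> ABC
```

The nested decoder recovers exactly the same message; the recursion is pure overhead, which is the sense in which nesting looks wasteful for communication.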
In this post, I'm going to discuss an alternative resolution to this paradox that I've been playing around with. It started developing while I was writing a paper with Sharon Goldwater (which will appear soon in the Journal of Memory and Language). Briefly, several people have been exploring the proposal that human speech is adapted for efficient communication, in the sense defined by information theory. Information theory places limits on how quickly information can be transferred, and part of our motivation for the paper was the realization that, for natural language, there is a conflict between incremental speech (producing and comprehending speech one word or unit at a time) and information-theoretic efficiency.
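For a concrete sense of the limits involved, here's a minimal, textbook-style illustration (not from our paper): Shannon entropy lower-bounds the average number of bits per symbol that any code can achieve for a given source distribution.

```python
import math

# Shannon entropy: the information-theoretic lower bound on the average
# number of bits per symbol any code can achieve for this distribution.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A toy "word" distribution: frequent words carry little information,
# rare words carry a lot.
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))  # 1.75 bits/symbol

# An optimal code meets this bound exactly here: codeword lengths
# of 1, 2, 3, and 3 bits give an expected length of
expected_len = 0.5 * 1 + 0.25 * 2 + 0.125 * 3 + 0.125 * 3
print(expected_len)    # 1.75
```

No code for this source can average fewer than 1.75 bits per symbol; that kind of hard floor is what "efficiency" means in the information-theoretic sense.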
The details of this realization were beyond the scope of that paper, but in this post I want to describe this conflict and explore some future directions that capitalize on it directly.
I just got VPN access set up with Macquarie, my new institution, and I thought I'd make a quick post on how to set it up under Linux and why it can be nice. At both Macquarie and Edinburgh, I had a desktop machine that was more powerful than my laptop, but it was not publicly accessible. One solution is simply to use it only when I'm actually at my desk in the building, but then I don't get to feel like a ninja. (Well, and I also can't use the machine when I'm away from my desk for whatever reason.)
I just spent a week with neuroscience and vision people, and was pointed to a pair of relatively recent neuroscience papers (by Rodrigo Perin, Henry Markram, and Thomas K Berger) with potentially interesting computational-level implications that I want to think through. First, a disclaimer: I'm not a neuroscientist, and I can't personally evaluate the biological methodology. But, assuming their basic measurements and interpretations are correct, I think the work is really cool.
I just saw this article by Mike Konczal about the politically influential 2010 result that economies with debt-to-GDP ratios over 90% perform very poorly. The article discusses one dubious methodological choice and two outright methodological errors, and shows how a re-analysis of the same data with more solid methodology finds that debt-to-GDP ratio doesn't really seem to affect economic performance. I'm not going to talk about the substance of the papers; as I said in the first post, I don't want this blog to be political, and I'm not an economist anyway. I do want to talk about a potential source of the errors, however.
I've recently been reading a lot of papers from the more traditional Generativist literature on computational models of language acquisition, so I'm going to write a post discussing how these traditional approaches relate to the kinds of natural language processing (NLP)-style, data-driven, structured probabilistic models I work with (and hinted at in the previous post). Specifically, I'm going to outline the Universal Grammar and Principles & Parameters (P&P)-based approach to language acquisition. We will see that it looks pretty different from the approaches adopted for grammar induction in natural language processing, which typically involve estimating probability distributions over structural analyses given sentences (and possibly sentence meanings). I'll then argue that they are actually related in a deep way: both propose that children simplify their exploration of the space of possible grammars by learning in a smaller space that is related to it, and the space proposed by P&P-based approaches is potentially a special case of the spaces used by data-driven techniques.
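To give a flavour of that NLP-style objective, here is a toy sketch (the parse names and rule probabilities are invented purely for illustration): a probabilistic grammar assigns each candidate structural analysis of a sentence a joint score, and conditioning on the observed sentence renormalizes those scores into a distribution over analyses.

```python
# Toy version of "a distribution over structural analyses given a
# sentence": an ambiguous sentence has two candidate parses, each scored
# by the product of the (invented) probabilities of the rules it uses.
joint = {
    "tree_A": 0.9 * 0.2,  # PCFG-style joint score P(tree, sentence)
    "tree_B": 0.6 * 0.1,
}

# Conditioning on the sentence: renormalize the joint scores to get
# P(tree | sentence) = P(tree, sentence) / sum over candidate trees.
total = sum(joint.values())
posterior = {tree: p / total for tree, p in joint.items()}
print(posterior)  # tree_A ~= 0.75, tree_B ~= 0.25
```

Grammar induction then amounts to estimating the rule probabilities themselves from data, so that posteriors like this one come out right across a whole corpus.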
I'm just back to Scotland after two weeks in the US, and was getting on a bus into the city center. I stopped and asked the bus driver the ticket price, and heard "2.10." I said "OK" and got my wallet out to pay. He looked at me like I was an idiot and said forcefully, "Sit down!" Confused, I went and sat down. After we arrived, I asked if I should pay, and he looked at me like I was an idiot (again) and said that the bus is paid for by two other, bigger bus companies as a connection, and is free to riders. My expectation of a price was so strong that I must have heard "sit down" as "2.10." It's also possible I wasn't quite re-acclimated to the language variety on this side of the pond.
I've got marking to do along with a dissertation chapter to write, so naturally it's about time for another blog post! A+ discipline. I've been wanting to write a response to an interview with Noam Chomsky provocatively titled "Where Artificial Intelligence Went Wrong." Go ahead and look through the interview; I'll put my reaction below the fold.
Hey all, I realized I promised pictures but the last post had zero (0) pictures. So in this post, I'll sort of recap the previous post but put in some pictures.
Computational cognitive science and artificial intelligence go through periods when different kinds of models are fashionable. Early on, it was all symbolic processing with rules and predicate logic; then people got excited about connectionism; then everyone started putting weights or probabilities on their rules. The current big fad in cognitive science is Bayesian modeling. I like this fad, so I'm going to talk about it.
In an earlier post about computational modeling, I promised to talk about how a computational model could improve our understanding of cognition without asserting that people actually do what the computational model does. The basic point is that we can use probabilistic modeling to explore the shape of our data, which in turn constrains what kinds of strategies a human learner could successfully use. In this post, I'll discuss how Bayesian modeling allows us to explore the statistical structure of data and characterize what any (decent) probabilistic algorithm would try to do with that data.
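Here is a minimal sketch of the idea (a toy example of my own, not from the post): Bayes' rule pins down what any well-calibrated probabilistic learner should conclude from a dataset, whatever algorithm it uses to get there.

```python
from math import comb

# Two hypotheses about a coin, updated on observed data. Bayes' rule
# fixes what any well-calibrated probabilistic learner should conclude
# from these observations, independent of the algorithm it uses.
def posterior(priors, likelihoods):
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Likelihood of the data (8 heads in 10 flips) under a coin whose
# heads-probability is p.
def binom_lik(p, heads=8, n=10):
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

# H1: fair coin (p = 0.5); H2: biased coin (p = 0.8); equal priors.
post = posterior([0.5, 0.5], [binom_lik(0.5), binom_lik(0.8)])
print(post)  # most of the posterior mass lands on the biased hypothesis
```

The interesting move for cognitive science is to read this normatively: the posterior describes what the data support, and any learner that systematically deviates from it is leaving information on the table.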
I haven't forgotten about this blog! In the last post, I promised to describe basic computational modelling approaches and what they can tell us about cognition. I've drafted that post a few times, but haven't been satisfied with it. So I'm going to adopt a different strategy and spend more than one post on each computational modelling approach. I want to present background on each approach that is technical enough to understand what's going on, but not so technical that it dominates readers' time. This involves drawing lots of pictures, which is nice but slow. Then I'll talk about a specific application of each approach. So the posts are coming! And currently half-baked (or one-fifth-baked?) on my hard drive :p