Queen's doctoral students Daniel Cownden (Maths & Statistics) and Timothy Lillicrap (Neuroscience) have won the international game-theory tournament called Cultaptation, and 10,000 euros, for their innovative and intuitive program called "DiscountMachine".
Cultaptation (from the words culture and adaptation) is a social learning strategies tournament in which the "world" that programs find themselves in is complex and uncertain, and players (programs) have a chance to "learn," not only about the environment but also about the strategies used by other players. The dilemma they face is that learning has a cost: winning is about gathering more rewards than the opposition, and time spent learning is time lost from gathering those rewards.
Against a field of 104 entries from 16 countries, representing the fields of Biology, Physics, Management, Psychology, Anthropology, Ethology, Environmental Science, Primatology, Sociology, Mathematics, Computer Science, Philosophy, Neuroscience and Engineering, Dan & Tim should be congratulated on a great job.
The official announcement of the results contains this description of "DiscountMachine":
"The winning strategy was sophisticated and complex, incorporating learning procedures based on neural networks... There are a number of ways in which this strategy stands out and that may have contributed to its success. The first is that it attempts to characterise the environment in a highly comprehensive way, extracting a lot of information from its prior experiences, and is responsive to that information. The second is that this strategy explicitly makes decisions based on expected lifetime outcomes, rather than immediate payoff benefits. Finally, the strategy is noteworthy for the sophistication of the mathematics it uses to calculate expected payoffs."
It all started in 1980, when Robert Axelrod, a professor of political science at the University of Michigan, asked himself a perplexing but fundamental question: when should an organism cooperate with others and when should it decide to be selfish and "defect"? To find an answer, Axelrod devised and announced a computer tournament based on a standard game known as the "prisoner's dilemma" and received entries from game theorists in psychology, sociology, political science, economics and mathematics from all over the world. Some strategies were subtle and complicated but, remarkably enough, the simplest strategy submitted was the winner. Known as TIT-FOR-TAT, it was contributed by Anatol Rapoport of the University of Toronto. Essentially, in a series of interactions with a partner, it cooperates on the first move and thereafter copies whatever its partner did on the previous move. Axelrod made these results public and then invited entries for a second round in which everyone knew that TIT-FOR-TAT was the strategy to beat. Sixty-one entries from six countries were received for this second round, again some quite sophisticated. Again Rapoport submitted TIT-FOR-TAT, and again it was the winner. This historic tournament has now entered the folklore of what has become a vigorous and fascinating area of scholarly inquiry.
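For readers who like to see the rule spelled out, the TIT-FOR-TAT idea described above can be sketched in a few lines of Python. This is only an illustration and not Axelrod's or Rapoport's actual code: the function names, the `always_defect` opponent and the payoff values (the conventional textbook 3/0/5/1 scheme) are all assumptions chosen for the example.

```python
# Illustrative sketch of TIT-FOR-TAT in a repeated prisoner's dilemma.
# All names and the payoff table are assumptions made for this example.
COOPERATE, DEFECT = "C", "D"

# (my move, partner's move) -> (my payoff, partner's payoff); textbook values.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}


def tit_for_tat(partner_history):
    """Cooperate on the first move, then copy the partner's previous move."""
    if not partner_history:
        return COOPERATE
    return partner_history[-1]


def always_defect(partner_history):
    """A hypothetical opponent that never cooperates."""
    return DEFECT


def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both move lists and scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only what its partner has done so far.
        move_a = strategy_a(moves_b)
        move_b = strategy_b(moves_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return moves_a, moves_b, score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect, rounds=5))
    print(play(tit_for_tat, tit_for_tat, rounds=5))
```

Against a partner that always defects, the sketch is exploited exactly once and then defects in return; against a copy of itself it cooperates throughout.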
In 2007, the European research consortium called Cultaptation decided to mount the next international tournament. A description of the tournament and its results is available at: http://www.intercult.su.se/cultaptation/tournament.php
The tournament was overseen by a committee of researchers from universities around the world, and the "call" on its website began with the following invitation: "Suppose you find yourself in an unfamiliar environment where you don't know how to get food, avoid predators, or travel from A to B. Would you invest time working out what to do on your own, or observe other individuals and copy them? If you copy, who would you copy? The first individual you see? The most common behaviour? Do you always copy, or do so selectively?"
Contestants were given a year to come up with a programmed strategy and ultimately over 100 entries were submitted from all over the world, some simple and some extremely complex. The winning program was submitted by two PhD students from Queen's University in Canada--Daniel Cownden in the Department of Mathematics and Statistics, and Timothy Lillicrap in the Centre for Neuroscience. Daniel is a student in the theory of games, particularly those involving evolution and learning, and Timothy is interested in understanding the brain using the mathematical theory of optimal control. It was a perfect match.
To understand just what it was that set their entry apart from most of the others, think about what you and I do as we wander each day through the world of our lives, most of which is familiar and predictable, some of which is unexpected and perplexing. It is our response to the latter that often opens a new door and lays down a new red carpet. And it is this response that constituted the fundamental challenge of the tournament.
How in fact do we respond? Do we turn up our analytical skills and work out the optimal strategy in terms of some precise measure of the amount of uncertainty of the different states? Well, perhaps a few of us do, but for the most part, the uncertainties are so great that such calculations are beyond our power. What we use instead is something that might be called our intuition. And that's what Daniel and Timothy decided to build into their program.
What they did was to endow their "creature" with a "neural net"--a network of virtual brain cells which could be "trained" to respond in those situations in which the level of novelty and uncertainty overwhelmed the analytic powers of the program. Of course the net had to be trained and that required a teacher, so they also built what they called a "guru" who was given perfect knowledge of the world and was therefore always able to act optimally. By paying careful attention to the guru in a wide range of environments over many weeks of computer time, the neural net developed an intuition for how to get out of sticky situations, an intuition which, in the event, when it finally entered the competition, enabled it to win the day.
Phase 1 of the tournament, which took 7.3 years of computer time, was a round-robin of pairwise contests in which each strategy took turns attempting to invade the other. From here, the top ten entries were selected to duke it out under two different kinds of conditions, systematic and random. In both cases Daniel and Timothy's "DiscountMachine" came out resoundingly on top. For example, under random conditions they won an average of 40% of matches compared to 12% for the next highest entry.
Daniel and Timothy will travel to St Andrews in early April to receive their prize of 10,000 euros and to take part in an international conference where the results of the competition will be analyzed to see what it might teach us about how we ourselves navigate in a complex and uncertain world.