Many media outlets are reporting that a computer programme pretending to be a 13-year-old boy from Ukraine passed the Turing test for artificial intelligence on Saturday.
At an event held at the Royal Society in London, the programme, called Eugene Goostman, duped 33% of judges into thinking it was a real person rather than a machine during five-minute-long text-based conversations.
Four other machines also took part in the test, along with a total of 30 judges. The judges included Lord Sharkey, who led the call for Turing’s posthumous pardon for his 1952 conviction for homosexuality; Robert Llewellyn, who played humanoid robot Kryten in Red Dwarf; and Aaron Sloman, a professor of computer science at the University of Birmingham.
The test was proposed by computer science pioneer and World War II codebreaker Alan Turing as a more concrete way to understand whether machines can "think". Essentially, it tests a computer programme's ability to exhibit intelligent behaviour indistinguishable from that of a human.
But some are sceptical about whether this counts as a pass.
Professor Murray Shanahan of the Department of Computing at Imperial College London told BuzzFeed:
Of course the Turing Test hasn't been passed. I think it's a great shame it has been reported that way, because it reduces the worth of serious AI research. We are still a very long way from achieving human-level AI, and it trivialises Turing's thought experiment (which is fraught with problems anyway) to suggest otherwise.
1. The 30% pass mark did not come from Turing himself.
It’s widely quoted that a machine must fool the interrogator only 30% of the time in order to pass, but Turing himself never set a pass rate. He just said that the test would be passed if “the interrogator decide[s] wrongly as often when the game is played [between a computer and a human] as he does when the game is played between a man and a woman.”
In his 1950 paper proposing the test, "Computing Machinery and Intelligence", Turing writes:
I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.
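This is where the 30% figure comes from: if the interrogator is right no more than 70% of the time, the machine fools them at least 30% of the time. But this was only Turing's prediction of what machines would achieve by around 2000, not a criterion for passing.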
2. Turing probably did not intend a real test to have a five-minute time limit.
The only mention of a five-minute limit in Turing's original paper is in that prediction. "He doesn't say that meeting these weak criteria would constitute success in achieving human-level AI. I imagine that, for that, he would require much longer conversations," Professor Shanahan told BuzzFeed.
3. Eugene is a chatbot, not a supercomputer.
A chatbot is a computer programme designed to imitate intelligent conversation. Some use natural language processing systems; others just scan for keywords and pull responses from a database. Whether they can really be called artificial intelligence is a matter of debate.
The supercomputer claim seems to have originated in a press release issued by the University of Reading.
4. This is not the first time a chatbot has been said to have passed a Turing test.
In an example from 1972, psychiatrists asked to tell apart real patients and PARRY, a chatbot made by psychiatrist Kenneth Colby to imitate a person with paranoid schizophrenia, identified them correctly only 48% of the time, roughly what random guessing would achieve.
5. Turing specified an “average interrogator”.
A spokesman from the University of Reading says the event organisers made sure the computer programmes faced a variety of judges to ensure this criterion was met, but according to Shanahan, "the small number of judges is not enough to be representative of the 'average interrogator'".
6. Pretending to be a 13-year-old non-native English speaker is technically within the rules of the test, but it’s not what Turing had in mind.
One of Eugene’s creators said: “Eugene was ‘born’ in 2001. Our main idea was that he can claim that he knows anything, but his age also makes it perfectly reasonable that he doesn’t know everything.”
Turing was clearly talking about a general ability to impersonate a human, not the (much easier) ability to impersonate a specific kind of person with very limited communication skills and few cultural reference points in common with the judges.
7. The Turing test is a “very bad” test for artificial intelligence anyway.
"It overemphasises language at the expense of the issue of embodiment," Professor Shanahan told BuzzFeed. A lot of our intelligence concerns how we interact with the physical world, said Shanahan, and the Turing test explicitly ignores this.
The test can be passed through trickery rather than genuine intelligence, and its outcome depends heavily on the judge: a naive interrogator is less likely than an expert in computer science or philosophy to spot a machine's failures.
Even Eugene's creators recognise that the test cannot really address the question Turing set out to answer. In a 2009 book, they said: "Turing's Test is, actually, no more than a joke of that genius British mathematician. This game has nothing (or very little) in common with the question 'Can machines think?'"