The Winograd Schema Challenge was originally suggested by Hector Levesque in 2011; developed in subsequent years by Ernest Davis, Leora Morgenstern, Charles Ortiz, and Gary Marcus; and sponsored by Nuance Communications. The competition was run at IJCAI 2016 and offered at AAAI 2018. The announcement of the 2016 challenge can be found below. A write-up of the challenge appeared in AI Magazine, 2017.
Nuance is no longer sponsoring the competition, and the $25,000 prize mentioned below is no longer offered. The challenge lives on in the many research groups, at Microsoft Research, Facebook, and the Allen Institute, among other places, that are currently (as of 2019) working on aspects of the problem.
Commonsense Reasoning is keen to promote the Winograd Schema Challenge and Nuance Communications' competition, in which contestants attempt to pass an alternative to the Turing Test.
Background: The Turing Test is intended to serve as a test of whether a machine has achieved human-level intelligence. In one of its best-known versions, a person attempts to determine whether he or she is conversing (via text) with a human or a machine. However, the test has been criticized as inadequate. At its core, the Turing Test measures a human's ability to judge deception: can a machine fool a human into thinking that it too is human? Chatbots like Eugene Goostman can fool at least some judges into thinking they are human, but that likely reveals more about how easy it is to fool some humans, especially in the course of a short conversation, than about the bot's intelligence. It also suggests that the Turing Test may not be an ideal way to judge a machine's intelligence.
The answers to the questions (in the above examples, 0 for the sentences if the bolded words are used; 1 for the sentences if the italicized words are used) are expected to be obvious to a layperson. A human answering these questions correctly would likely use his or her knowledge of the typical sizes of objects and ability to do spatial reasoning to solve the first example, and knowledge of how political demonstrations unfold and ability to do interpersonal reasoning to solve the second. Due to the wide variety of commonsense knowledge and commonsense reasoning that humans would presumably use to solve Winograd Schema problems, it was proposed during Commonsense-2013 that the Winograd Schema Challenge could be a promising method for tracking progress in automating commonsense reasoning. The challenge received further attention after Eugene Goostman fooled 30% of judges into thinking it was human in 2014, which sparked interest in developing alternatives to the Turing Test; the Winograd Schema Challenge was one of several such alternatives proposed at the AAAI 2015 workshop Beyond the Turing Test.
Features of the Challenge:
Winograd Schemas typically share the following features: (Details can be found in Levesque (2011) and Levesque et al. (2012).)
Ernest Davis has created a collection of more than 140 sample Winograd Schemas that can be used by participants to test their systems during development, at the WSC Collection. Leora Morgenstern has collected more than 60 sample Pronoun Disambiguation Problems, a more general form of Winograd Schemas that is explained below, and in (Morgenstern, Davis, and Ortiz, AI Magazine 2015), at the PDP Collection. These collections will be augmented over time with examples from previous tests.
Further details are below.
Babar wonders how he can get new clothing. Luckily, a very rich old man who has always been fond of little elephants understands right away that he is longing for a fine suit. As he likes to make people happy, he gives him his wallet.
he is longing for a fine suit
A. Babar
B. old man
There may be multiple problems that use the same text but ask about different pronouns, as with problems 2-5 in the example file.
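To make this concrete, here is a minimal sketch (not the official input format; the field names and problem numbers are illustrative only) of how several problems that ask about different pronouns can point at one shared text:

```python
# Hypothetical sketch: two problems sharing one text but asking about
# different pronouns. Field names and numbering are illustrative only.
shared_text = (
    "Babar wonders how he can get new clothing. Luckily, a very rich old "
    "man who has always been fond of little elephants understands right "
    "away that he is longing for a fine suit. As he likes to make people "
    "happy, he gives him his wallet."
)

problems = [
    # One problem record per pronoun of interest in the same text.
    {"number": 2, "text": shared_text,
     "excerpt": "he is longing for a fine suit",
     "candidates": ["Babar", "old man"]},
    {"number": 3, "text": shared_text,
     "excerpt": "he likes to make people happy",
     "candidates": ["Babar", "old man"]},
]
```

Each record carries the full text, the excerpt containing the pronoun in question, and the candidate referents.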
The two rounds differ in the source of the texts. In the first round, the texts are "Pronoun Disambiguation Problems"; that is, they are drawn from actual texts, possibly with some editing. In the second round, each text is one half of a Winograd schema. A detailed discussion and justification is given in (Morgenstern, Davis, and Ortiz, 2016).
Only contestants who achieve at least 90% in the first round will be allowed to compete in the second round. If no contestants qualify, then the second round will not be given.
For each problem, there will be four lines in the output, separated by line breaks:
Line 1: Problem number, and echo of text of problem.
Line 2: Echo of the excerpt for the problem.
Line 3: "Answer" problemNumber.answerLetter answer
Line 4: Blank, as a separator between problems
For example, if the problem above is problem 2 in the input, then the corresponding four lines of the output file would be as follows:
2 Babar wonders how he can get new clothing. Luckily, a very rich old man who has always been fond of little elephants understands right away that he is longing for a fine suit. As he likes to make people happy, he gives him his wallet.
he is longing for a fine suit
Answer 2.A Babar

At the end of the file, there should be a comma-separated list of all the answers in order. E.g.
A, A, B, B, A, A, B
The submission will be graded on the final list of answers. The remaining material is there for human inspection.
No problems should be omitted. Any problem that is omitted will be marked as wrong, so it always pays to guess.
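The output format above can be sketched as follows. This is a hypothetical helper, not official contest tooling; the `problems` structure and the `guess_answer` placeholder are assumptions for illustration:

```python
# Hypothetical sketch of writing the answer file in the format described
# above; the problem fields and guess_answer are illustrative only.
problems = [
    {
        "number": 2,
        "text": ("Babar wonders how he can get new clothing. Luckily, a very "
                 "rich old man who has always been fond of little elephants "
                 "understands right away that he is longing for a fine suit. "
                 "As he likes to make people happy, he gives him his wallet."),
        "excerpt": "he is longing for a fine suit",
        "candidates": {"A": "Babar", "B": "old man"},
    },
]

def guess_answer(problem):
    # Placeholder for the actual disambiguation logic. Since omitted
    # problems are marked wrong, always returning some guess is better
    # than skipping a problem.
    return "A"

def write_answer_file(problems, path):
    answers = []
    with open(path, "w") as f:
        for p in problems:
            letter = guess_answer(p)
            answers.append(letter)
            # Line 1: problem number and echo of the problem text.
            f.write(f"{p['number']} {p['text']}\n")
            # Line 2: echo of the excerpt.
            f.write(f"{p['excerpt']}\n")
            # Line 3: "Answer", problemNumber.answerLetter, and the answer.
            f.write(f"Answer {p['number']}.{letter} {p['candidates'][letter]}\n")
            # Line 4: blank separator between problems.
            f.write("\n")
        # Final comma-separated list of answers, which is what is graded.
        f.write(", ".join(answers) + "\n")

write_answer_file(problems, "answers.txt")
```

Note that only the final comma-separated list is graded; the per-problem lines exist for human inspection.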
If, in the judgement of the contest committee, the description of the program in the submitted paper is entirely inadequate or implausible as an explanation of the program's success, then the team involved will be asked to demonstrate in detail that the behavior of the program is in fact as described in the paper.
The aim of this contest is to advance science; all results obtained must be reproducible, and communicable to the public. As such, any winning entry is encouraged to furnish to the organizers of the Winograd Schema Challenge Competition its source code and executable code, and to use open source databases or knowledge bases or make its databases and knowledge structures available for independent verification of results.
At IJCAI-2016, three smaller prizes of $1000, $750, and $500 will be awarded to the top three programs that score over 65% on the first round of the contest.
For questions or comments about commonsensereasoning.org please email leora@cs.nyu.edu.