Winograd Schema Challenge
The Winograd Schema Challenge: Participate in Nuance Communications’ annual competition to successfully pass an alternative to the Turing Test
Commonsense Reasoning is keen to promote the Winograd Schema Challenge.
Nuance Communications, Inc. is sponsoring an annual competition to encourage efforts to develop programs that can solve the Winograd Schema Challenge, an alternative to the Turing Test developed by Hector Levesque, winner of the 2013 IJCAI Award for Research Excellence. The test will be organized, administered, and evaluated by CommonsenseReasoning.org (http://www.CommonsenseReasoning.org), which is dedicated to furthering and promoting research in the field of formal commonsense reasoning.
The Turing Test is intended to serve as a test of whether a machine has achieved human-level intelligence. In one of its best-known versions, a person attempts to determine whether he or she is conversing (via text) with a human or a machine. However, it has been criticized as inadequate. At its core, the Turing Test measures a human’s ability to judge deception: Can a machine fool a human into thinking that it too is human? Chatbots like Eugene Goostman can fool at least some judges into thinking they are human, but that likely reveals more about how easy it is to fool some humans, especially in the course of a short conversation, than about the bots’ intelligence. It also suggests that the Turing Test may not be an ideal way to judge a machine’s intelligence.
The alternative: The Winograd Schema Challenge. Rather than base the test on the sort of short free-form conversation suggested by the Turing Test, the Winograd Schema Challenge (WSC) poses a set of multiple-choice questions that have a particular form. Two examples follow; the second, from which the WSC gets its name, is due to Terry Winograd.
I. The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?
Answer 0: the trophy
Answer 1: the suitcase
II. The town councilors refused to give the demonstrators a permit because they feared (advocated) violence. Who feared (advocated) violence?
Answer 0: the town councilors
Answer 1: the angry demonstrators
The answers to the questions (in the above examples, 0 when the first word of each pair, “big” or “feared”, is used; 1 when the alternate word in parentheses, “small” or “advocated”, is used) are expected to be obvious to a layperson.
A human who answers these questions correctly typically draws on his or her abilities in spatial and interpersonal reasoning, knowledge of the typical sizes of objects and of how political demonstrations unfold, and other types of commonsense reasoning to determine the correct answer. During Commonsense-2013, the Winograd Schema Challenge was therefore proposed as a promising method for tracking progress in automating commonsense reasoning.
Features of the Challenge. Winograd Schemas typically share the following features: [Details can be found in Levesque (2011) and Levesque et al. (2012).]
- Two entities or sets of entities, not necessarily people or sentient beings, are mentioned in the sentences by noun phrases.
- A pronoun or possessive adjective is used to refer to one of the parties (it is of the right sort, e.g., matching in number and gender, so that it could refer to either party).
- The question involves determining the referent of the pronoun.
- There is a special word that appears in the sentence and possibly in the question. When it is replaced with an alternate word, the answer changes, although the question still makes sense (e.g., in the above examples, “big” can be changed to “small”, and “feared” can be changed to “advocated”).
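The features listed above can be made concrete as a small data structure. The Python sketch below is one possible representation, assumed for illustration only: the class WinogradSchema and its field names are hypothetical, not a format used by the organizers or by the sample library. Each schema holds a sentence template with a slot for the special word, the two candidate referents, and the correct answer index for each word choice. The check in __post_init__ encodes the requirement that swapping the special word for its alternate changes the answer.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical representation of one Winograd schema.

    The sentence and question are templates with a "{word}" slot that is
    filled by either the special word or its alternate.
    """
    template: str                  # sentence containing "{word}"
    question: str                  # question, which may also contain "{word}"
    candidates: Tuple[str, str]    # the two parties the pronoun could refer to
    words: Tuple[str, str]         # (special word, alternate word)
    answers: Tuple[int, int]       # correct candidate index for each word

    def __post_init__(self) -> None:
        # Swapping the special word for its alternate must flip the answer.
        if sorted(self.answers) != [0, 1]:
            raise ValueError("the two words must select different candidates")

    def variants(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (sentence, question, correct answer) for both word choices."""
        for word, index in zip(self.words, self.answers):
            yield (
                self.template.format(word=word),
                self.question.format(word=word),
                self.candidates[index],
            )


trophy = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    question="What was too {word}?",
    candidates=("the trophy", "the suitcase"),
    words=("big", "small"),
    answers=(0, 1),
)

councilors = WinogradSchema(
    template=("The town councilors refused to give the demonstrators a permit "
              "because they {word} violence."),
    question="Who {word} violence?",
    candidates=("the town councilors", "the angry demonstrators"),
    words=("feared", "advocated"),
    answers=(0, 1),
)

# Print both variants of each example schema with its expected answer.
for schema in (trophy, councilors):
    for sentence, question, answer in schema.variants():
        print(f"{sentence} {question} -> {answer}")
```

Storing the answer for each word explicitly, rather than assuming the special word always maps to answer 0, keeps the representation valid for schemas where the pairing is the other way around.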
Administration and evaluation of the test. The test is projected to consist of at least 40 Winograd Schemas and will be administered on a yearly basis, with a new, non-repeating set of test questions supplied each year. Ernest Davis has created an initial library of more than 100 sample Winograd Schemas that participants can use to test their systems during development, available at http://www.cs.nyu.edu/davise/papers/WS.html. This library will be augmented each year with the examples from the previous year’s test.
Further details regarding the establishment of a baseline for human performance on each year’s test, and the minimum threshold that entries must meet to qualify for prizes, will be given at the WSC website, http://www.commonsensereasoning.org/winograd.
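Because every question has exactly two answer choices, a program that guesses at random would be expected to answer about half of them correctly, so a qualifying threshold near 50% would reward chance performance. The actual human baseline and threshold are to be published on the WSC website; the Python sketch below only illustrates how an entry’s answers might be scored against an answer key. The functions accuracy and qualifies, the answer key, and the 0.9 threshold are hypothetical placeholders, not the official scoring procedure.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions answered correctly; each answer is 0 or 1."""
    if len(predictions) != len(gold):
        raise ValueError("need exactly one prediction per question")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def qualifies(predictions: Sequence[int], gold: Sequence[int], threshold: float) -> bool:
    """True if accuracy reaches a qualifying threshold (placeholder value)."""
    return accuracy(predictions, gold) >= threshold


gold = [0, 1, 0, 1]           # made-up answer key for four questions
predictions = [0, 1, 1, 1]    # made-up system output
print(accuracy(predictions, gold))                   # 0.75
print(qualifies(predictions, gold, threshold=0.9))   # False
```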
Rules for entering.
Individuals or teams may enter. If approved by the organizers, a team can include an industry partner.
The winner will receive a grand prize of $25,000, provided that the winning entry meets the baseline for human performance. Details of other prizes are given at the WSC website.
The test will be administered on a yearly basis starting in 2015. The first submission deadline will be October 1, 2015. Additional details will appear at http://www.commonsensereasoning.org/winograd. The 2015 Commonsense Reasoning Symposium, to be held at the AAAI Spring Symposium at Stanford from March 23-25, 2015, will include a special session for presentations and discussions on progress and issues related to the Winograd Schema Challenge.
Visit http://www.commonsensereasoning.org/winograd or contact Leora Morgenstern or Charlie Ortiz.