Last night, I gleefully skipped celebration of Valentine’s Day, in favor of sitting rapt in front of the television to watch Jeopardy! mega-winner (and longtime friend of BCC’s Police Beat Roundtable) Ken Jennings go up against IBM’s latest massively parallel Artificial Intelligence engine, Watson.
The Atlantic has dubbed their coverage of the matchup “Liveblogging the robot takeover or humanity’s finest hour,” and it is hard not to read this confrontation in such sweeping, maybe-apocalyptic terms. Especially when there’s a Mormon in the mix!
But of course, as a geek, what fascinates me is the technology. When I first heard that IBM was undertaking this project, I was awed and incredulous. Having worked on artificial intelligence, specifically natural language processing and document search and organization, I immediately thought of all the unique challenges of the Jeopardy! format. It is one thing to pull up a list of possibly relevant web pages in response to a few keywords. But Jeopardy!’s format involves so much more: it is a “who’s-who” list of the things that are most difficult for computers to handle, such as wordplay, puns, indirect allusions, jokes, and sometimes complex formats (for example, the “before & after” question type). A question featured on the mini-documentary that aired with the show last night illustrates this kind of challenge:
Answer: This trusted friend was the first non-dairy powdered creamer.
The response (it bears stating for a Mormon audience!) is “What is Coffee Mate?” Notice the classically Jeopardyesque wordplay: “trusted friend” is supposed to clue you in to “mate” in Coffee Mate. If you simply break up this clue into a jumble of keywords, as many internet search engines will do[1], it is hard or impossible to know what the exact response should be. Is it asking for the name of a friend who famously likes non-dairy, maybe some movie or novel character with that trait? Watson’s response in the documentary, which was depicting a trial run, was “What is milk?” This is a laughable response to any human, since “non-dairy” obviously excludes milk!
A similar gaffe happened last night, when Ken buzzed in and gave the incorrect response, “What are the 1920’s?” at which point Watson jumped in for the steal and said, “What are the 1920’s?” Oops! It was an error few if any humans would make. (Interestingly, if someone did, we would blame it on nervousness–something Watson doesn’t have to deal with!) But it makes perfect sense, since Watson, from what I understand, is not wired for speech recognition (it receives the clues by text message). So it would have no way of knowing what Ken had just said. Watson also flubbed an obvious Harry Potter question, which was just painful to watch. (Correct response: “Who is Voldemort?” Come on!)
Those glaring errors aside, last night started as a complete rout: curtains for humanity. (If humans can’t dominate on the topic of Beatles music, what are we good for?!) But Watson had a hard time with a category asking for the decade of various events, giving a window of opportunity for the humans to regain some dignity. It seems Watson didn’t understand what was being asked (the decade–not names or places or other facts). Last night’s episode only covered the first round (Single Jeopardy), with Double Jeopardy and Final Jeopardy each taking place on separate nights, tonight and tomorrow. (Much of the time was taken up with the behind-the-scenes vignettes, which were interesting, though I’m not sure audiences will appreciate more of them on the second and third nights.)
For more on the technology and story behind Watson, I recommend the NOVA episode dedicated to the topic, “Smartest Machine on Earth.” Use this thread as a liveblog/open thread tonight and tomorrow.
UPDATE (SPOILER): Watson wins! Now we just need Colbert’s handgun with a flag pin to run for president as a Republican, and totally dominate Romney in the primary, to make it 0-2 in the Mormon v. Machine wars.
———————-
[1] This is only partially true. Most search engines will group words that (probably) go together into n-grams that represent a single thought, or a specially associated phrase. So, for example, “shopping cart” is really one thing, even though (in English, at least) it happens to be represented by two words. It helps a search engine to group these terms together as “shopping cart,” rather than treating them as a “bag of words” with “shopping” and “cart” just thrown in the jumble separately.
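To make the footnote concrete, here is a minimal sketch of the difference between a plain "bag of words" and a version that groups known two-word phrases first. The phrase list and the function names are made up for illustration; a real search engine would presumably learn which words go together from statistics rather than from a hand-typed set.

```python
from collections import Counter

# Hypothetical phrase list, hand-typed for this example only.
KNOWN_PHRASES = {("shopping", "cart"), ("powdered", "creamer")}

PUNCTUATION = ".,!?\"'"


def tokenize(text):
    """Lowercase the text, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def bag_of_words(text):
    """Count each word separately, ignoring order and adjacency."""
    return Counter(tokenize(text))


def group_phrases(text):
    """Merge adjacent word pairs found in KNOWN_PHRASES into a single term."""
    tokens = tokenize(text)
    grouped = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in KNOWN_PHRASES:
            grouped.append(" ".join(pair))
            i += 2  # skip both words of the phrase
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped
```

Calling group_phrases("Add it to the shopping cart.") keeps "shopping cart" together as one term, while bag_of_words on the same sentence counts "shopping" and "cart" as unrelated entries.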
Bonus points to anyone who can come up with a numerology-type connection between the word “Jeopardy!” and the White Horse Prophecy, or any verse from Revelations. Talking about that Jeopardy logo font will get you extra bonus points, since I find it kind of creepy. (The thick upper parts of the letters look like they are ominously crushing the lower parts of the letters under their weight.)
If the computer creams the two guys, can Watson be invited to write for the next BCC PBR?
I’m not sure Jeopardy is a great format for this kind of test. From talking to a former player, it seems that one of the key factors is who’s the first to hit their button once the question is finished being read, and it’s just not interesting to observe that a computer can receive and act on electrical signals faster than a human brain can process and human muscles can act on the same signals. Now, if they allowed players to interrupt the question, that would be interesting. Ken and Watson should be going head-to-head in College Bowl.
Jonathan, I have heard the same thing (that Jeopardy success depends largely on clicking strategy and timing). I agree there are huge problems pitting a machine against people who have to listen to the question being read (sound travels slower than light, for one thing, and though they can see the words at light speed, they still need to pan their eyes to take it all in), and that was bugging me last night as well. It seems very hard to make that fair to both Watson and the human players when Watson receives the questions by text message. You could have Watson receive the clue only at the instant the reading is done, in order to even it up, but that might disadvantage the system *too* much.
Did anyone else suspect foul play when Watson seemingly omnisciently went straight for the Daily Double clue early on?! That would be great if they programmed him to hack into the Jeopardy! question system and detect the locations of Daily Doubles. I’m on to you, Watson!
There is a light, visible to the players only, that goes on when the clue is completed. So, success seems based to a great extent on how quickly one can react to that signal.
I suspect that the poor showing by Ken Jennings last night was due more to his failure to get that signalling button pressed fast enough, and not because he didn’t come up with the answer as quickly as Watson or Brad.
I’m all for the old College Bowl format–but you’d have to get rid of the written clues. Let the players jump in as soon as they think they’ve heard enough of the clue to figure out the question. In that format, I think Watson wouldn’t have a chance.
I’d try to find some numerological connection between Jeopardy and “Revelations” but my bible seems to be missing a book of that name.
I will here break Ken’s confidence and report a private comment he made to me in a chat some weeks before taping the Watson episode. (I ordinarily protect confidences, but I’m shattering this one because 1. it’s funny and 2. it makes Ken look awesome.)
Warning him of the dangers of the upcoming Watson match, I said, “Remember John Henry.” Ken shot right back, “Screw John Henry. Remember John Connor!”
gst, let’s just hope that the IBM engineers remembered to include this all-important line of code.
#6: “I’d try to find some numerological connection between Jeopardy and “Revelations” but my bible seems to be missing a book of that name.”
Think like a search engine! They often strip plural and verb endings off words before processing. :-)
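For the curious, the ending-stripping trick (usually called stemming) can be sketched in a few lines. This is a deliberately crude version with a made-up suffix list, not the Porter stemmer or any particular engine's method, and it will mangle plenty of real words.

```python
def naive_stem(word):
    """Strip a few common English endings. A crude sketch, not a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three letters of stem so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word
```

Under this sketch, naive_stem("Revelations") and naive_stem("Revelation") both come out as "revelation", so a search for one would match the other.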
I thought it was a reference to Terminator, not the White Horse Prophecy…
The final Jeopardy category was “U.S. Cities.” Watson’s guess was Toronto.
OK, then.
So what happened on today’s episode?
That kind of thing doesn’t surprise me at all, Julie. A hard and fast rule that any clue or category name that includes the keyword “U.S.” should totally preclude any response of a location outside the U.S. could backfire just as often as help. For example, if the category had been “U.S. Cities,” and a clue had been, “Seattle’s space needle used to be the world’s tallest building, now the tallest building is in this city,” (or something like that–I don’t actually know if it was ever the tallest) it would be a mistake to rule out non-U.S. cities.
The NOVA special does a great job of explaining why machine learning wins on average, but can be prone to really groan-worthy mistakes when it does fail. It starts by explaining why, over the past few decades, Artificial Intelligence researchers have been forced to abandon the ever-present temptation to hand code elaborate rules systems, and instead let computers deduce their own statistical patterns from provided “training data” (millions of pieces of raw data, often including examples explicitly marked “correct” and “incorrect,” and/or giving correct/incorrect feedback on the system’s output). A loosely analogous approach to child-rearing would be to never give your children explicitly stated rules, but instead offer praise or punishment anytime the child does something. This kind of approach seems chaotic and destined to fail, but virtually every major advance in artificial intelligence for the last while has come from this “machine learning” philosophy, not rules-based philosophy.
So, given the total dominance of the performance of this approach generally, it becomes more understandable that you would want to weather the occasionally embarrassingly bad outcome, instead of going running back to a rules-based system. And often the outcome is embarrassing in ways that only humans and their social structures would care about. There’s a great example of this in the NOVA special, starting around 38:30. Just hilarious. “People get offended when they are referred to as the wrong gender. Watson doesn’t care about stuff like that. It’s making statistical judgments….”
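The praise-or-punishment idea can be shown with a toy learner. What follows is a minimal perceptron sketch, a textbook technique and certainly not what Watson actually runs. The two yes/no features and the tiny training set are invented for illustration. Nobody writes a rule here; the weights are nudged only when the guess is marked wrong.

```python
def train_perceptron(examples, epochs=10, learning_rate=1):
    """Learn weights from (features, label) pairs using only right/wrong feedback."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            guess = 1 if score > 0 else 0
            error = label - guess  # 0 when correct, +1 or -1 when wrong
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 if the weighted evidence is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


# Hypothetical training data: two yes/no signals about a candidate response,
# each marked "correct" (1) or "incorrect" (0).
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

After training on EXAMPLES, predict reproduces every label in the training set, even though the pattern ("either signal is enough") was never stated anywhere in the code.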
Cynthia, I Cor. 15:30 reads “And why stand we in jeopardy every hour?” The Greek underlying “in jeopardy” is kinduneuomen (“we are in jeopardy/danger”). If you don’t listen too closely, the word sounds a little bit like “Ken don’t know–omen!”
Kevin FTW! I just knew you’d come through!
Here’s one prediction of final Jeopardy.
OK, I’ll answer my own question. SPOILER ALERT. (Jeopardy runs at 3:30 here in Chicago, so it is long over with, but I don’t know whether it might play in the evenings in some markets.) Anyway, Watson totally dominated day 2. The humans only answered five questions between them in double jeopardy. The humans both wagered everything in final jeopardy and each answered correctly, but they were too far behind Watson, who missed but only wagered $1,000. It was Watson $35,734, Rutter $5400, Jennings $2400.
This has been amazingly interesting to watch. Even though pitting it against humans is unfair due to the buzz-in factor, in the end I don’t really care that it beats the humans so much as that artificial intelligence has come so far. I always thought computers *should* be good at chess, so Deep Blue was not shocking, just cool. But I am pretty shocked that it is as good at Jeopardy as it is.
Wow. Meat is being trounced by electronics.
I’m sitting here watching, feeling very obsolete. This is a stunning achievement. I remember programming in LISP in the wayback and imagining this day.
Great. Now we just need Colbert’s handgun with a flag pin to run for president as a Republican, and totally dominate Romney in the primary, to make it 0-2 in the Machine v. Mormon wars.
Julie #11,
But isn’t Canada just another territory of the US of A? (ducking for cover)….
Watson gets the question sent to him as a text file (or whatever) all at once. That isn’t fair. I think they should send it to him one word at a time, the way the humans have to read the question one word at a time and need 2+ seconds to take in the entire question.
I’m with jks….what’s with a computer getting the information in text? Let the freaking computer also have to deal with analyzing Alex’s words as they come out of his mouth toward our ears. Let him analyze in his mind the inflection of the voice, and have to deal with possible misinterpretation based solely on the words coming out of Alex’s mouth. Let the computer process words as humans process words and then we’ll see which one has the faster processor. My guess is that the human will still have a faster processor.
To be fair, Daniel, the contestants can also read the words, not just hear them.
Watson should have to read the words. Send them in a pdf file.
Exactly…he should have to read the words, not have them handed to him. The thing is, Cynthia, that even if one can read the words, the fact that Alex is talking disrupts our processing, diverting our attention from the words we read to the words he says. Even if the words are the same. Attention is given to the spoken word and taken away from the written word. The computer does not have to worry about that.
I cannot bring myself to care that Watson is beating the buzzer. That’s a funny thing to focus on because, as Cynthia points out in the OP, what is amazing is that it is happening at all: picking up complex word play and hints, making guesses, putting together complex ideas. The speed is trivial next to the amazing fact that it can do this at all! I’m scared. This will change everything. This is closer than we think!
Ken Jennings himself tackled many of these issues over on a Washington Post chat yesterday. Go team human!
http://live.washingtonpost.com/jeopardy-ken-jennings.html
I was just noting that contestants can read not just hear. I wasn’t saying it’s fair. In fact, I’m pretty sure in comments above I said the whole timing and question distribution issue was tricky and bound to be unfair in one way or another no matter how they arranged it.
Getting the words as a PDF, or gif, or even having to use a webcam, and then having to do OCR, is such a trivial problem from a technical perspective that to me it makes sense that they just skipped it. (The general problem of reading isn’t easy, though the technology is great and getting better all the time. But reading within a constrained, specific context, i.e. predetermined locations to look at, a single predetermined font, etc., is trivial.)
Ah, thanks Markie. Ken makes the same point I was trying to make about how making it “fair” is sort of hopeless, given that the whole point is to compare the strengths and weaknesses of meat vs. machine:
Cynthia,
What I’m saying is that the fact that the machine cannot hear removes one obstacle in the path toward analyzing the data given it. A human being on that show is allowed to read, but also has to process Alex’s voice. A true competition would be one where the computer would also have to deal with that distraction. In any case, Ken himself said in that WPost link that a human brain will still be faster than the fastest computer processor.
Just finished reading that link in #28. Ken is hilarious. Watson can never take that away from you, Ken!
Daniel, I’m not sure I really understand your complaint. Yes, it is much, much harder and slower for humans to input the question than Watson getting a text message.
But how would Alex’s voice distract a computer? It wouldn’t be any distraction, because even a rudimentary machine learning algorithm could quickly realize that one of its channels is useless noise, and shut it out completely. That’s how machine learning works. So, even if Watson was physically connected to a listening device, the software would have learned to ignore that device before the game even started, during the practice/learning rounds. There’s no point.
Put it this way, Daniel. Do you think it is unfair that Watson hasn’t watched and listened to hours of television as part of its training, like all humans have? Do you think it was unfair that they just had it read the entire NYT archives and IMDB, instead of having it watch CNN, and watch all those movies in raw form? Maybe instead of having it read the wikipedia page on NYC, they should have just planted it, outfitted with webcam and microphones, in the middle of the city and have it wander around absorbing everything, learning about NYC that way?
At some point, I think you just accept that Watson’s programmers aren’t going to go way out of their way to provide it with input training sources that are useless and wildly noisy, and wildly inefficient. That would be a really interesting next research step! But they were simply solving a different problem this time around. A problem that is already breathtaking in its ambition and scope.
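The point above about a learner shutting out a useless channel can be illustrated with a toy calculation. This is only a sketch under invented assumptions: the two "channels" and the practice-round labels are made up, and the weighting rule (how far a channel's agreement with the right answer departs from a coin flip) is a crude stand-in for what a real learning algorithm would do.

```python
def channel_weight(channel, labels):
    """Weight a channel by how far its agreement with the labels is from chance."""
    agreement = sum(1 for c, y in zip(channel, labels) if c == y) / len(labels)
    # 0.5 agreement is a coin flip (weight 0); 1.0 or 0.0 is fully informative.
    return abs(2 * agreement - 1)


# Hypothetical practice-round data: the correct outcome for eight clues,
# a text channel that tracks it, and an audio channel unrelated to it.
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]
TEXT_CHANNEL = [1, 1, 0, 0, 1, 1, 0, 0]
AUDIO_CHANNEL = [1, 0, 1, 0, 1, 0, 1, 0]
```

With this data the text channel gets full weight and the audio channel gets a weight of zero, which is the sense in which the software would have "learned to ignore" the microphone before the game began.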
#33 – Amen. Hilarious conversation.
Cynthia,
I don’t expect someone to put a computer through the training of life. I was just observing that a computer has an advantage over a human in the scenario presented by the format of this game, in that the computer can, as you note, simply turn off the sound input, or completely ignore sound input. A human being cannot.
Definitely read the Washington Post QA link! It’s hysterical. I’m totally putting Ken’s books on my Amazon wishlist right now.
All of this has happened before, all of this will happen again.
So say we all.
Wow, I just read the Q&A linked in no. 29. I think all those police beat roundtables prepared Ken for that kind of instant interaction. His humor was off the charts sharp there. Way to go, Ken!
I just saw that KenJen took second place. Way to go, Ken!
It’s true, I owe it all to Police Beat. As a result, I have decided to change my charity…instead of VillageReach, the prize money will all go to the BYU Rent-a-Cop Benevolent Association. Suspicious males beware.
#42 – Ken, was it really necessary to call out gst in public like that?
Bad fingers – should have been DKL.
Ray, there are some who aren’t aware of gst’s secret identity, Long Juan Silver. The people need to know.
I always suspected gst of hanging out at the Wilkinson Center too much. You’d think a grown man would stop trying to hit on the 18 year old college students, especially after PBR outed him a few times.
Ken, do you think Watson could have a guest appearance on the next PBR? Steve, could you guys work that? Or do you think Watson would respond too quickly to his own comments?
http://www.wired.com/wiredscience/2011/02/what-watson-can-learn-from-the-human-brain/
A good article on this issue