Sunday, June 19, 2022

Don't Be Fooled: Well-Trained Chatbots Aren't Minds

This week we have a story in the news about artificial intelligence. It seems that a Google engineer named Blake Lemoine told the Washington Post that he thought a Google project called LaMDA had reached "sentience," a term implying some degree of consciousness. The Washington Post article said, “Most academics and AI practitioners … say the words and images generated by artificial intelligence systems such as LaMDA produce responses based on what humans have already posted on Wikipedia, Reddit, message boards, and every other corner of the internet. And that doesn’t signify that the model understands meaning.”

Humans are rather easily fooled by chatbots, computer programs designed to imitate human speech. The first chatbot was a program called ELIZA, developed in the 1960s by Joseph Weizenbaum. The program was designed to imitate a psychotherapist. ELIZA used simple programming tricks. For example, if someone typed a statement with a form such as "I am bothered by X," ELIZA might ask a question such as "How long have you been bothered by X?" or "Why do you think you are bothered by X?"
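
To give a sense of how little machinery that kind of trick requires, here is a minimal Python sketch of an ELIZA-style transformation rule. The rules, wording, and pronoun swaps below are my own illustration, not Weizenbaum's actual script, which was considerably more elaborate.

```python
# A minimal ELIZA-style rule: match a sentence template, then reflect the
# captured phrase back as a question. Illustrative only; not Weizenbaum's code.
import random
import re

RULES = [
    (re.compile(r"i am bothered by (.+)", re.IGNORECASE),
     ["How long have you been bothered by {0}?",
      "Why do you think you are bothered by {0}?"]),
    (re.compile(r"i feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "Do you often feel {0}?"]),
]

def reflect(phrase: str) -> str:
    """Swap first-person words for second-person so the echo reads naturally."""
    swaps = {"my": "your", "i": "you", "me": "you", "am": "are"}
    return " ".join(swaps.get(word, word) for word in phrase.lower().split())

def respond(sentence: str) -> str:
    """Return a reflected question if a template matches, else a stock prompt."""
    for pattern, templates in RULES:
        match = pattern.search(sentence)
        if match:
            return random.choice(templates).format(reflect(match.group(1).rstrip(".!?")))
    return "Please tell me more."

print(respond("I am bothered by my job."))
# Possible output: "Why do you think you are bothered by your job?"
```

There is no model of the person or the problem anywhere in such code; the program only rearranges the words it was given.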

Weizenbaum experimented with ELIZA by having people type on a computer terminal, interacting with an unseen agent that could have been either a real person or a mere computer program. He was surprised to find that a large fraction of the people interacting with the ELIZA program thought that they were conversing with a real person, even though computer programming was then in a very primitive state. The lesson was clear: even rudimentary programming tricks can be enough to fool people into thinking that they are talking to a real person, when they are merely talking to a chatbot.

Now software is far more advanced, and we have systems that make ELIZA look very primitive in comparison. One type of chatbot is the expert-system chatbot, which has been well-trained in some very specific knowledge domain. A person talking to such a chatbot may be convinced he is talking to someone who really understands the subject matter involved. For example, if you talk to a podiatrist chatbot, the program may seem to know so much about foot health problems that you might swear you are talking to someone who really understands feet. But whenever there is a very limited knowledge domain, thousands of hours of computer programming can be sufficient to create an impression of understanding.
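
As a rough illustration of that point, here is a minimal Python sketch of a narrow-domain chatbot. The knowledge base, the keywords, and the canned answers are invented placeholders; a real expert-system chatbot would have thousands of carefully curated entries, but the basic move (look up a keyword, return a prepared answer) is the same.

```python
# A toy "expert system" style chatbot for a narrow domain.
# The facts and keywords below are invented for illustration only.

# Hypothetical knowledge base: keyword -> canned expert-sounding answer.
FOOT_FACTS = {
    "bunion": "Bunions are bony bumps at the base of the big toe; wider "
              "shoes and padding often help, and surgery is a last resort.",
    "plantar fasciitis": "Plantar fasciitis usually causes heel pain that is "
                         "worst with the first steps of the day; stretching "
                         "and supportive footwear are common first measures.",
    "ingrown toenail": "Ingrown toenails can often be managed by soaking the "
                       "foot and trimming the nails straight across.",
}

def podiatrist_bot(question: str) -> str:
    """Look for a known keyword in the question and return its canned answer."""
    q = question.lower()
    for keyword, answer in FOOT_FACTS.items():
        if keyword in q:
            return answer
    return "Could you describe the foot problem in more detail?"

print(podiatrist_bot("What should I do about my bunion?"))
```

The bot sounds knowledgeable about feet only because a programmer typed the knowledge in ahead of time; it understands nothing about feet.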

Then there are what we may call general-knowledge chatbots. Such programs are trained on many thousands of hours of online conversations between real humans. After such training, it is relatively easy for a program to pick up response rules through pattern matching.

I will give an example. The game Elden Ring is currently very popular, largely because of its wonderful graphics. Imagine that you train your pattern-matching chatbot software by letting it eavesdrop on thousands of conversations between young men, many of which include an exchange like this:

Human #1: So, dude, you played any good PS4 or Xbox games recently?

Human #2: Yeah, I'm playing Elden Ring. Man, the graphics are out-of-this-world! But it's freaking hard. You gotta earn so many of these "rune" things.

[Image: a visual from the "Elden Ring" game]

After training on many conversations that include an exchange like this, our AI chatbot pattern-matching software picks up a rule: when asked about good recent PS4 or Xbox games, mention Elden Ring, and mention that the game has great graphics but is hard to play. Through similar training, the software picks up thousands of response rules, which can change from month to month. A person interacting with the software will be very impressed. For example:

  • Ask the software about computer games, and it will talk about whichever game is now popular, and say the things people are saying about that game.
  • Ask the software about TV shows, and it will talk about whatever shows are the most popular, and will say the kind of things people are saying about such shows.
  • Ask the software about recent movies, and it will talk about whatever movies are the most popular, and will say the kind of things people are saying about such movies.
  • Ask the software about celebrities, and it will repeat whatever celebrity gossip is making the rounds these days. 
  • Ask the software about its politics, and it will say whatever political sentiments are the most popular in recent days.
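
To make that kind of "rule" concrete, here is a minimal Python sketch of how mere counting over a tiny invented corpus could yield the Elden Ring rule described above. The replies, the candidate word lists, and the function names are all hypothetical illustrations; real systems such as LaMDA use vastly more elaborate statistical machinery, but the point is the same: frequency patterns, not understanding, drive the output.

```python
# A toy illustration of "picking up a response rule" by counting patterns
# in overheard replies. All data and names here are invented for illustration.
from collections import Counter

# Hypothetical replies overheard after "played any good games recently?"
REPLIES = [
    "Yeah, Elden Ring. The graphics are amazing but it's so hard.",
    "Elden Ring, man. Gorgeous graphics, brutally hard though.",
    "Mostly Elden Ring. The graphics look incredible, really hard.",
]

GAMES = ["Elden Ring", "Fortnite", "Minecraft"]      # candidate game names
BUZZWORDS = ["graphics", "hard", "boring", "easy"]   # candidate buzzwords

def learn_rule(replies):
    """Count which game and which buzzwords appear most often in the replies."""
    game_counts = Counter(g for r in replies for g in GAMES if g in r)
    word_counts = Counter(w for r in replies for w in BUZZWORDS if w in r.lower())
    top_game = game_counts.most_common(1)[0][0]
    top_words = [w for w, _ in word_counts.most_common(2)]
    return top_game, top_words

def answer_about_games(replies):
    """Apply the learned rule: name the popular game, echo the common buzzwords."""
    game, words = learn_rule(replies)
    return f"I've been playing {game}. The {words[0]} are great, but it's really {words[1]}."

print(answer_about_games(REPLIES))
# "I've been playing Elden Ring. The graphics are great, but it's really hard."
```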

With such powerful pattern-matching going on, it's all too easy to be fooled into thinking you are chatting with someone who has real understanding of a topic. In fact, the software has zero understanding of any of the topics it is talking about. For example, well-designed pattern-matching software trained on thousands of hours of conversations about baseball may end up sounding like someone who understands baseball, even though the software doesn't understand the slightest thing about the game.

Psychology professor Gary Marcus states the following:

"Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, drawn from massive statistical databases of human language. The patterns might be cool, but language these systems utter doesn’t actually mean anything at all. And it sure as hell doesn’t mean that these systems are sentient. Which doesn’t mean that human beings can’t be taken in. In our book Rebooting AI, Ernie Davis and I called this human tendency to be suckered by The Gullibility Gap — a pernicious, modern version of pareidolia, the anthromorphic bias that allows humans to see Mother Theresa in an image of a cinnamon bun....To be sentient is to be aware of yourself in the world; LaMDA simply isn’t. It’s just an illusion, in the grand history of ELIZA, a 1965 piece of software that pretended to be a therapist (managing to fool some humans into thinking it was human), and Eugene Goostman, a wise-cracking 13-year-old-boy impersonating chatbot that won a scaled-down version of the Turing Test....What these systems do, no more and no less, is to put together sequences of words, but without any coherent understanding of the world behind them, like foreign language Scrabble players who use English words as point-scoring tools, without any clue about what that mean."

Imagine if someone could get silicon computers to really understand things. Then we would very soon see computer systems that did not just sound as smart as humans, but sounded much smarter than humans. Since thousands of computer CPUs can be connected together, without any limitation such as having to fit within a skull, once truly comprehending computers had been invented we would soon see computers speaking ten or a hundred times more intelligently than humans. But you will never see that. All you will ever see is chatbots that use pattern matching well enough to sound like humans of average intelligence, when asked average questions. And such chatbots won't even perform well when asked subtle, rarely asked questions using words that have multiple meanings. For example, if you mention that there are three types of Mustangs (a mustang horse, a Ford Mustang car, and a P-51 Mustang fighter-bomber plane), and then ask how well each type could fit inside the others, or whether each type could be disassembled and then successfully reassembled, or how well each type could be made without human assistance, a chatbot will "flame out and crash" like a P-51 Mustang shot down by an anti-aircraft gun.
