Learning environments: kind, wicked and… fiendish?

This morning I came across a great article by Joshua Sokol – Why Artificial Intelligence Like AlphaZero Has Trouble With the Real World – that illustrates the differences between kind and wicked learning environments and (I think) highlights a third type, which I’m calling “fiendish”.

Kind learning environments

For those who came in late, kind learning environments are consistent, highly constrained situations with limited numbers of variables and limited choices available for each variable. They are characterised by clearly defined success criteria (e.g. a clear definition of victory, or a small set of “right” answers), by a high level of information availability (at their “kindest,” all the relevant information and variables are visible), and by tight and understandable feedback loops (that is, it’s readily discernible which decisions or actions lead to success or failure).

The less true these things are (the more variables, the less visible the information, the slower, less understandable or more misleading the feedback loops), the less “kind” the environment is.

Sokol points out that human games fit this description of kind learning environments well, and lays out a rough hierarchy from checkers (a computer first defeated a world champion in 1994) to chess (1997) to Go (2016). These games are kind learning environments in that they share high levels of constraint, unambiguous success criteria and perfect information availability, but they become harder as the numbers of variables and choices facing players grow.

Wicked learning environments

At the other end of the spectrum, Robin Hogarth and David Epstein describe “wicked” learning environments:

In “wicked” learning environments often some information is hidden. Even when it isn’t, feedback may be delayed, it may be infrequent, it may be nonexistent, or it may be partly accurate or inaccurate in many of the cases. So the most wicked learning environments will reinforce the wrong types of behaviour.

More here

Relatively “wicked” games might include other board games and computer-based real-time strategy games like Dota 2 and StarCraft II (at which an AI could defeat more than 99% of opponents in 2019). These are more complex, with far greater numbers of choices: around 10^26 possibilities per move in StarCraft II, compared to an average of something like 35 in chess and a maximum of 361 in a game of Go. More significantly, these games make use of the “fog of war”, in which much important information is hidden from players.

Like Epstein, Sokol points out that even these “wicked” games are vastly simpler than most of the decisions we make in real life. It’s not just the sheer number of choices available, or the incomplete availability of information, or the less explicit feedback loops – it’s a question of ends:

Despite its challenges, StarCraft II comes down to a simply enunciated goal: Eradicate your enemy. That’s something it shares with chess, Go, poker, Dota 2 and just about every other game. In games, you can win.

From an algorithm’s perspective, problems need to have an “objective function,” a goal to be sought. When AlphaZero played chess, this wasn’t so hard. A loss counted as minus one, a draw was zero, and a win was plus one. AlphaZero’s objective function was to maximize its score. The objective function of a poker bot is just as simple: Win lots of money.

Real-life situations are not so straightforward. For example, a self-driving car needs a more nuanced objective function, something akin to the kind of careful phrasing you’d use to explain a wish to a genie. For example: Promptly deliver your passenger to the correct location, obeying all laws and appropriately weighing the value of human life in dangerous and uncertain situations. How researchers craft the objective function, Domingos said, “is one of the things that distinguishes a great machine-learning researcher from an average one.”

Joshua Sokol – Why Artificial Intelligence Like AlphaZero Has Trouble With the Real World
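
To make the contrast concrete, here is a minimal Python sketch of the two kinds of objective function Sokol describes. The game scoring (minus one, zero, plus one) comes straight from the passage above. The self-driving part is purely illustrative: the function name trip_objective, its three inputs and every weight are my own assumptions, not anything from the article or from a real system.

```python
from enum import Enum


class Outcome(Enum):
    """Game results scored as in the passage: loss -1, draw 0, win +1."""
    LOSS = -1
    DRAW = 0
    WIN = 1


def game_objective(outcomes):
    """Total score over a series of games. Nothing here needs judgement."""
    return sum(outcome.value for outcome in outcomes)


def trip_objective(minutes_late, laws_broken, collision_risk,
                   weights=(1.0, 50.0, 10_000.0)):
    """Hypothetical score for one car trip (higher is better).

    The three penalty terms and their weights are assumptions made up
    for illustration. Choosing them is exactly the hard part.
    """
    w_late, w_law, w_risk = weights
    penalty = (w_late * minutes_late
               + w_law * laws_broken
               + w_risk * collision_risk)
    return -penalty


# Two hypothetical trips: one slow and careful, one fast and sloppy.
cautious = dict(minutes_late=5, laws_broken=0, collision_risk=0.001)
hasty = dict(minutes_late=0, laws_broken=1, collision_risk=0.002)

# With safety-heavy weights the cautious trip scores higher.
safety_first = (1.0, 50.0, 10_000.0)
prefers_cautious = (trip_objective(**cautious, weights=safety_first)
                    > trip_objective(**hasty, weights=safety_first))

# With punctuality-heavy weights the ranking flips.
punctual_first = (20.0, 1.0, 100.0)
prefers_hasty = (trip_objective(**hasty, weights=punctual_first)
                 > trip_objective(**cautious, weights=punctual_first))
```

The game objective has nothing to argue about, whereas the trip objective ranks the same two trips in opposite orders depending on weights that somebody had to choose.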

Fiendish learning environments

The most challenging learning environments of all, which I’m choosing to call “fiendish,” are those in which we’re not even sure of what success looks like (note that even the “more nuanced” objective function Sokol describes above applies to what is, by human standards, a very simple situation). We lack an objective function, or at least are often confused about it, and find it changes over time.

This uncertainty – combined with almost limitless choice, incomplete information, and obscure feedback loops for each person involved – is what makes the infinite game of life (or of running an organisation) so fiendishly difficult.

See also:

David Epstein on Kind and Wicked Learning Environments

Peter Senge on the limits of learning from experience

Kind and Wicked Learning Environments and Learning, Feedback and Intuition on Judgement and Decision Making (j-dm.org)

I'd love to hear your thoughts and recommended resources...