How Did We Get Here? From Symbolic to Stochastic (Part 2)
08/14/2025

By Qwerky Editorial Staff

In part 1 of this series, we examined how a number of developments in the history of AI pushed the field away from the symbolic programming paradigm, pursued since its beginnings in the mid-twentieth century, toward the probabilistic and stochastic approaches that dominate it today.


In part 2, we’ll take a closer look at some of those problems, particularly the ones that were later addressed by approaches still in use today. As we’ll see, for all of the problems they do solve, stochastic approaches bring unique challenges of their own. Contemporary attempts to meet those challenges will be our focus in part 3.


Machine Learning, Connectionism, and Artificial Neural Networks


While symbolic approaches achieved a degree of financial success in the 1980s with expert systems like R1/XCON, alternative approaches began to emerge as personal computers became increasingly cost-effective, rendering the specialized hardware used to run knowledge representation systems (like Lisp machines) obsolete for many business applications. In that regard, it was not the symbolic approach itself that failed to deliver solutions; rather, implementing it on such specialized machines was expensive, and the opportunity cost of failing to move off that hardware quickly enough to stay ahead of the widespread adoption of personal computers was simply too high.


But the symbolic approach had indeed failed to achieve many of the more grandiose goals its proponents had set for it over the years, and by the end of the 1980s the field had weathered more than one “winter.” The constantly changing nature of expert-level knowledge domains resisted the strict, certainty-bound framework of symbolic systems, which could not respond efficiently to uncertainty or to new categories of data without being updated from the top down (by modifying or adding axioms). What was clearly needed was a system that could efficiently update itself as it took in new data that diverged from its initial conditions.


In 1982, physicist John Hopfield developed an early form of artificial neural network, while around the same time Geoffrey Hinton and David Rumelhart popularized backpropagation for multi-layer networks, an algorithm originally conceived by Paul Werbos in his 1974 dissertation. Both approaches drew loosely on a then-new theory of cognitive science (known as “connectionism”) in which networks of simple, interconnected units respond to new data without explicit symbolic processing. What these approaches brought to the table was a sort of strategic agnosticism about the objects the systems were purportedly about. Just as we do with the real world (à la Rodney Brooks’ claim that “the world is its own best model”), the inputs to a neural network could be just as messy and shifting as the phenomena they modeled. As a result, various forms of early artificial neural networks were successfully applied in a number of domains, including natural language processing, handwritten digit recognition (such as LeNet), and robotics.
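To make the connectionist idea a little more concrete, here is a minimal sketch of a Hopfield-style associative memory in NumPy. It is purely illustrative (not any historical implementation): a couple of patterns are stored in the weights by a simple Hebbian rule, and a corrupted input settles back onto the nearest stored pattern without any explicit symbolic lookup.

```python
import numpy as np

# Minimal Hopfield-style associative memory (illustrative sketch only).
# Patterns are vectors of +1/-1; weights come from a Hebbian outer-product rule.

def train_hopfield(patterns):
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)            # no self-connections
    return W / patterns.shape[0]

def recall(W, state, steps=10):
    # Synchronous updates: each unit moves toward the sign of its weighted input.
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1,  1, 1,  1, -1, -1, -1, -1]])
W = train_hopfield(stored)

noisy = stored[0].copy()
noisy[0] *= -1                        # corrupt one unit
print(recall(W, noisy))               # settles back onto the first stored pattern
```

Nothing here is stored as a rule or an axiom; the “knowledge” lives entirely in the weights, which is exactly the kind of self-updating representation the expert systems lacked.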


But these were not wholly new approaches, despite their timely resurgence after the second AI winter that followed the 1980s boom in expert systems. The idea of a non-linear approach to modeling intelligence was not simply an immediate offshoot of interest in connectionist theories, nor was it a response to philosophical criticisms of symbolic approaches (like that of Hubert Dreyfus contra Newell and Simon). The “perceptron” had been theorized as early as 1958, but confusion about the differences between the capabilities of single-layer and multi-layer versions of the model (thanks to an often mis-cited line in Minsky and Papert’s 1969 book Perceptrons) meant that the idea would not be explored in depth again until the late 1980s, during the heyday of the connectionist-inspired probabilistic approaches. With the stagnation of public and private interest in symbolic research, the success of probabilistic approaches, and increasingly affordable hardware (and especially, later, GPUs), the idea would be extended to its more mature form of multi-layer perceptrons trained with backpropagation, as in 1987’s NETtalk.
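The technical point buried in that history is worth seeing in miniature. XOR is not linearly separable, so no single-layer perceptron can compute it, but a multi-layer network trained with backpropagation learns it easily. The toy NumPy sketch below is illustrative only (it is not NETtalk or any historical code):

```python
import numpy as np

# XOR is not linearly separable, so no single-layer perceptron can compute it.
# A two-layer network trained with backpropagation can (toy sketch only).

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # hidden layer of 4 units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # single output unit
lr = 0.5

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))    # typically close to [0, 1, 1, 0]
```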


The current research paradigm in artificial intelligence represents the most recent step in this journey, culminating in the attention-based neural networks that have dominated the landscape since their arrival in 2017. While this novel architecture has produced unprecedented successes in the field at large, it still has well-publicized limitations. Because probabilistic approaches ultimately bottom out in Bayesian-inspired probabilities rather than Boolean certainties, current models face empirical limitations on tasks that demand deductive certainty (like counting), tasks that could otherwise be performed trivially and efficiently by traditional symbolic programs.
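A toy contrast makes the limitation concrete. In the snippet below (illustrative only; the “next-token” distribution is a hypothetical stand-in, not taken from any real model), the symbolic count is deductively certain every time, while the sampled answer is only probably right.

```python
import random

# Symbolic counting is deterministic and deductively certain.
text = "strawberry"
print(text.count("r"))        # always prints 3

# A generative model instead samples its answer from a probability distribution.
# (Hypothetical distribution, purely for illustration.)
next_token_probs = {"2": 0.55, "3": 0.40, "4": 0.05}
tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights)[0])   # sometimes right, sometimes not
```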


While at first glance this might seem like an insurmountable barrier for probabilistic models altogether, a number of contemporary approaches seek to address it by parsing out certain problems as deterministic rather than probabilistic, and treating them as moments of circumscribed certainty within an otherwise probabilistic, transformer-based deep learning system (like ChatGPT). These hybridized and supplementary approaches (such as neuro-symbolic methods, tool use, and constrained decoding), and how they handle the sorts of difficulties mentioned above, will be the focus of the third and final installment of this series.
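As a small preview of that installment, here is a rough sketch of the tool-use pattern: instead of generating the arithmetic answer token by token, the model emits a call to a deterministic function, and the exact symbolic result is spliced back into its output. Both the model output and the tool registry below are hypothetical stand-ins, not any particular vendor’s API.

```python
import re

# Illustrative sketch of the "tool use" pattern: deterministic sub-problems are
# routed to symbolic code, and the exact result is spliced back into the text.
# `fake_model_output` and `TOOLS` are hypothetical stand-ins, not a real API.

TOOLS = {
    "count_chars": lambda s, c: str(s.count(c)),   # deterministic, symbolic
}

fake_model_output = 'The word has {{count_chars("strawberry", "r")}} r\'s in it.'

def run_tools(text):
    # Replace each {{tool("arg1", "arg2")}} span with the tool's exact result.
    pattern = r'\{\{(\w+)\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)\}\}'
    return re.sub(pattern, lambda m: TOOLS[m.group(1)](m.group(2), m.group(3)), text)

print(run_tools(fake_model_output))   # -> The word has 3 r's in it.
```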