SAMPLING:

**S**earching for the **A**pproximation **M**ethod used to
**P**erform rationa**L** inference by **I**ndividuals
a**N**d **G**roups

Over the past two decades, Bayesian models have been used to explain behaviour in domains from intuitive physics and causal learning, to perception, motor control and language. Yet people produce clearly incorrect answers in response to even the simplest questions about probabilities. How can a supposedly Bayesian brain paradoxically reason so poorly with probabilities? Perhaps brains do not represent or calculate probabilities at all and are, indeed, poorly adapted to do so. Instead, they could be approximating Bayesian inference through sampling: drawing samples from a distribution of likely hypotheses over time.

This promising approach has been used in existing work to explain biases in judgment. However, different algorithms have been used to explain different biases, and the existing data does not distinguish between sampling algorithms. The first aim of this project is to identify which sampling algorithm is used by the brain by collecting behavioural data on the sample generation process and comparing it to a variety of sampling algorithms from computer science and statistics. The second aim is to show how the identified sampling algorithm can systematically generate classic probabilistic reasoning errors in individuals, with the goal of upending the longstanding consensus on these effects. Finally, the third aim is to investigate how the identified sampling algorithm provides a new perspective on group decision making biases and errors in financial decision making and harness the algorithm to produce novel and effective ways for human and artificial experts to collaborate.

Since the beginning of the project, we have worked on the theoretical framework underlying the project and have explored how sampling can explain individual probabilistic reasoning errors. In particular, we have developed a model, the Bayesian sampler, of how people might make probability estimates from samples. The model trades off the coherence of probabilistic judgments for improved accuracy, and provides a single framework for explaining phenomena associated with diverse biases and heuristics such as conservatism and the conjunction fallacy.

**Key Publications from the Project:**

Chater, N., Zhu, J.-Q., Spicer, J., Sundh, J., León-Villagrá, P., &
Sanborn, A.N. (2020). Probabilistic biases meet the Bayesian brain.
*Current Directions in Psychological Science, 29*(5), 506-512. doi:
https://dx.doi.org/10.1177/0963721420954801

Zhu, J.-Q., Sanborn, A.N. & Chater, N. (2020). The Bayesian sampler:
generic Bayesian inference causes incoherence in human probability.
*Psychological Review, 127*(5), 719-748. doi:
https://doi.org/10.1037/rev0000190

Sanborn, A. N., Zhu, J.-Q., Spicer, J., Sundh, J., León-Villagrá, P. &
Chater, N. (2021). Sampling as the human approximation to probabilistic
inference. In S. Muggleton & N. Chater (Eds).
*Human-Like Machine Intelligence*. Oxford: Oxford University Press.
url:
https://global.oup.com/academic/product/human-like-machine-intelligence-9780198862536

Zhu, J.-Q., Newall, P.W.S., Sundh, J., Chater, N. & Sanborn, A.N.
(2022). Clarifying the relationship between coherence and accuracy in
probability judgments. *Cognition, 223*, 1-8. doi:
https://doi.org/10.1016/j.cognition.2022.105022

Spicer, J., Zhu, J.-Q., Sanborn, A.N. & Chater, N. (2022). Perceptual
and cognitive judgments show both anchoring and repulsion.
*Psychological Science, In Press*.

Zhu, J.-Q., León-Villagrá, P., Chater, N., & Sanborn, A.N.
(2022). Understanding the structure of cognitive noise.
*PLoS Computational Biology, 18*(8). doi:
https://doi.org/10.1371/journal.pcbi.1010312

Human cognition is fundamentally noisy. While routinely regarded as a nuisance in experimental investigation, the few studies investigating properties of cognitive noise have found surprising structure. A first line of research has shown that inter-response-time distributions are heavy-tailed. That is, response times between subsequent trials usually change only a small amount, but with occasional large changes. A second, separate, line of research has found that participants’ estimates and response times both exhibit long-range autocorrelations (i.e., 1/f noise). Thus, each judgment and response time not only depends on its immediate predecessor but also on many previous responses. These two lines of research use different tasks and have distinct theoretical explanations: models that account for heavy-tailed response times do not predict 1/f autocorrelations and vice versa. Here, we find that 1/f noise and heavy-tailed response distributions co-occur in both types of tasks. We also show that a statistical sampling algorithm, developed to deal with patchy environments, generates both heavy-tailed distributions and 1/f noise, suggesting that cognitive noise may be a functional adaptation to dealing with a complex world.

Repeated forecasts of changing targets are a key aspect of many everyday tasks, from predicting the weather to financial markets. Random walks provide a particularly simple and informative case study, as new values represent random deviations from the preceding value only, with further previous points being irrelevant. Moreover, random walks often hold simple rational solutions in which predictions should repeat the most recent state, and hence replicate the properties of the target. In previous experiments, however, we have found that human forecasters do not adhere to this standard, showing systematic deviations from the properties of a random walk such as excessive volatility and extreme movements between subsequent predictions. We suggest that such deviations reflect general statistical signatures of human cognition displayed across multiple tasks, offering a window into underlying cognitive mechanisms. Using these deviations as new criteria, we here explore several cognitive models for predicting random walks drawn from various approaches developed in the existing literature, including Bayesian, error-based learning, autoregressive and sampling mechanisms. These models are contrasted to determine which best accounts for the particular statistical features displayed by experimental participants. We find support for sampling models in both aggregate and individual fits, suggesting that these variations are attributable to the use of inherently stochastic prediction systems. We thus argue that variability in predictions is driven by computational noise in the decision making process, rather than “late” noise at the output stage.

One of the most robust effects in cognitive psychology is anchoring, in which judgments show a bias toward previously viewed values. However, in what is essentially the same task as used in anchoring research, a perceptual illusion demonstrates the opposite effect of repulsion. Here, we united these two literatures, testing in two experiments with adults (total N = 200) whether prior comparative decisions bias cognitive and perceptual judgments in opposing directions or whether anchoring and repulsion are two domain-general biases whose co-occurrence has so far gone undetected. We found that in both perceptual and cognitive tasks, anchoring and repulsion co-occur. Further, the direction of the bias depends on the comparison value: Distant values attract judgments, whereas nearby values repulse judgments. Because none of the leading theories for either effect account for both biases, theoretical integration is needed. As a starting point, we describe one such joint theory based on sampling models of cognition.

Bayesian approaches presuppose that following the coherence conditions of probability theory makes probabilistic judgments more accurate. But other influential theories claim accurate judgments (with high “ecological rationality”) do not need to be coherent. Empirical results support these latter theories, threatening Bayesian models of intelligence; and suggesting, moreover, that “heuristics and biases” research, which focuses on violations of coherence, is largely irrelevant. We carry out a higher-power experiment involving poker probability judgments (and a formally analogous urn task), with groups of poker novices, occasional poker players, and poker experts, finding a positive relationship between coherence and accuracy both between groups and across individuals. Both the positive relationship in our data, and past null results, are captured by a sample-based Bayesian approximation model, where a person's accuracy and coherence both increase with the number of samples drawn. Thus, we reconcile the theoretical link between accuracy and coherence with apparently negative empirical results.

Human probability judgments are variable and subject to systematic biases. Sampling-based accounts of probability judgment have successfully explained such idiosyncrasies by assuming that people remember or simulate instances of events and base their judgments on sampled frequencies. Biases have been explained either by noise corrupting sample accumulation (the Probability Theory + Noise account), or as a Bayesian adjustment to the uncertainty implicit in small samples (the Bayesian sampler). While these two accounts closely mimic one another, here we show that they can be distinguished by a novel linear regression method that relates the variance of repeated judgments to their means. First, the efficacy of the method is confirmed by model recovery, and it more accurately recovers parameters than computationally complex methods. Second, the method is applied to both existing and new probability judgment data, which confirm that judgments are based on a small number of samples that are adjusted by a prior, as predicted by the Bayesian sampler.
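To illustrate why relating the variance of repeated judgments to their means can be informative, here is a minimal sketch. It is not the published procedure. It assumes that each judgment equals (S + beta) / (n + 2 * beta), where S is the number of "successes" among n independent samples and beta is the parameter of a symmetric Beta prior. The function names and the parameters `n`, `beta`, and `probs` are hypothetical, chosen only for this example.

```python
from math import comb


def judgment_moments(p, n, beta):
    """Exact mean and variance of the estimate (S + beta) / (n + 2*beta),
    where S ~ Binomial(n, p). Assumes independent samples (illustrative)."""
    mean = 0.0
    second_moment = 0.0
    for s in range(n + 1):
        weight = comb(n, s) * p**s * (1 - p) ** (n - s)
        estimate = (s + beta) / (n + 2 * beta)
        mean += weight * estimate
        second_moment += weight * estimate * estimate
    return mean, second_moment - mean * mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_variance_regression(n, beta, probs):
    """Regress judgment variance on mean * (1 - mean) across event probabilities."""
    xs, ys = [], []
    for p in probs:
        mean, variance = judgment_moments(p, n, beta)
        xs.append(mean * (1 - mean))
        ys.append(variance)
    return fit_line(xs, ys)


if __name__ == "__main__":
    probabilities = [k / 10 for k in range(1, 10)]
    slope, intercept = mean_variance_regression(n=5, beta=1.0, probs=probabilities)
    print(f"slope={slope:.4f}  intercept={intercept:.4f}")
```

Under these assumptions the variance is exactly linear in mean × (1 − mean): the slope is 1/n and the intercept is −beta(n + beta) / (n(n + 2·beta)²), so a negative intercept signals an adjustment by a prior. If judgments were instead plain proportions of noisily read samples with no prior adjustment, the same algebra would give a zero intercept; that contrast is a simplification made here for illustration.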

Estimation, choice, confidence, and response times are the primary behavioural measures in perceptual and cognitive tasks. These measures have attracted extensive modeling efforts in the cognitive sciences, but there is a lack of a unified approach to explain all measures simultaneously within one framework. We propose an Autocorrelated Bayesian Sampler (ABS), assuming that people sequentially sample from a posterior probability distribution of hypotheses on each trial of a perceptual or cognitive task. Unlike most accounts of choice, we do not assume that the system has global knowledge of the posterior distribution. Instead it uses a sophisticated sampling algorithm to make do with local knowledge, and so produces autocorrelated samples. The model assumes that each sample takes time to generate, and that samples are used by well-validated mechanisms to produce estimates, choices, and confidence judgments. This relatively simple framework clears a number of well-known empirical hurdles for models of choice, confidence, and response time. The autocorrelation between samples also allows the model to predict the long-range between-trial dependence observed in both estimates and response times.

Price series in speculative markets show a common set of statistical properties, termed ‘stylised facts’. While some facts support simple efficient markets composed of homogenous rational agents (e.g., the absence of autocorrelation in price increments), others do not (e.g., heavy-tailed distributions of price changes and volatility clustering) (Campbell et al., 1997; Fama, 1970; Mandelbrot, 1966; Mandelbrot, 1963; Cont, 2001). Collectively, these facts have been explained by either more complex markets or markets of heterogeneous agents (Cont 2007; Giardina & Bouchaud, 2003; Hommes, 2006; Barberis & Thaler, 2005), with asset-market experiments validating the latter approach (Hommes 2011; Kirchler & Huber, 2009). However, it is unknown whether markets are necessary to produce these features. Here we show that within-individual variability alone is sufficient to produce many of the stylised facts. In a series of experiments, we increasingly simplified a price prediction task by first removing external information, then removing any interaction between participants. Finally, we removed any resemblance to an asset market by asking participants to simply reproduce temporal intervals. All three experiments produced the main stylised facts. The robustness of the results across tasks suggests a common cognitive-level mechanism underlies these patterns, and we identify a candidate that is a general-purpose approximation to rational behavior. We recommend a stronger focus on individual psychology in macroeconomic theory, and particularly within-individual variability. Combining these insights with existing economic mechanisms could help explain price changes in speculative markets.

Much categorization behavior can be explained by family resemblance: new items are classified by comparison with previously learned exemplars. However, categorization behavior also shows a variety of dimensional biases, where the underlying space has so-called ‘separable’ dimensions: ease of learning categories depends on how the stimuli align with the separable dimensions of the space. For example, if a set of objects of various sizes and colors can be accurately categorized using a single separable dimension (e.g., size), then category learning will be fast, while if the category is determined by both dimensions, learning will be slow. To capture these dimensional biases, almost all models of categorization supplement family resemblance with either rule-based systems or selective attention to separable dimensions. But these models do not explain how separable dimensions initially arise; they are presumed to be unexplained psychological primitives. We develop, instead, a pure family resemblance version of the Rational Model of Categorization, which we term the Rational Exclusively Family RESemblance Hierarchy (REFRESH), which does not presuppose any separable dimensions in the space of stimuli. REFRESH infers how the stimuli are clustered and uses a hierarchical prior to learn expectations about the variability of clusters across categories. We first demonstrate the dimensional alignment of natural category features and then show how through a lifetime of categorization experience REFRESH will learn prior expectations that clusters of stimuli will align with separable dimensions. REFRESH captures the key dimensional biases and also explains their stimulus-dependence and how they are learned and develop.

Many models of cognition assume that people can generate independent samples, yet people fail to do so in random generation tasks. One prominent explanation for this behavior is that people use learned schemas. Instead, we propose that deviations from randomness arise from people sampling locally rather than independently. To test these explanations, we teach people one- and two-dimensional arrangements of syllables and ask them to generate random sequences from them. Although our results reproduce characteristic features of human random generation, such as a preference for adjacent items and an avoidance of repetitions, we also find an effect of dimensionality on the patterns people produce. Furthermore, model comparisons revealed that local sampling accounted better for participants' sequences than a schema account. Finally, evaluating the importance of each model's constituents, we show that the local sampling model proposed new states based on its current trajectory, rather than an inhibition-of-return-like principle.

Author-accepted copy of a chapter in
*Human-Like Machine Intelligence* published in 2021 by
Oxford University Press.

Human probability judgments are systematically biased, in apparent tension with Bayesian models of cognition. But perhaps the brain does not represent probabilities explicitly, but approximates probabilistic calculations through a process of sampling, as used in computational probabilistic models in statistics. Naïve probability estimates can be obtained by calculating the relative frequency of an event within a sample, but these estimates tend to be extreme when the sample size is small. We propose instead that people use a generic prior to improve the accuracy of their probability estimates based on samples, and we call this model the Bayesian sampler. The Bayesian sampler trades off the coherence of probabilistic judgments for improved accuracy, and provides a single framework for explaining phenomena associated with diverse biases and heuristics such as conservatism and the conjunction fallacy. The approach turns out to provide a rational reinterpretation of “noise” in an important recent model of probability judgment, the probability theory plus noise model (Costello & Watts, 2014, 2016a, 2017; Costello & Watts, 2019; Costello, Watts, & Fisher, 2018), making equivalent average predictions for simple events, conjunctions, and disjunctions. The Bayesian sampler does, however, make distinct predictions for conditional probabilities and distributions of probability estimates. We show in 2 new experiments that this model better captures these mean judgments both qualitatively and quantitatively; which model best fits individual distributions of responses depends on the assumed size of the cognitive sample.
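The contrast between a naïve relative-frequency estimate and a prior-adjusted estimate can be sketched as follows. This is a minimal illustration, assuming that the "generic prior" is a symmetric Beta(beta, beta) distribution and that the judgment is the resulting posterior mean. The function names and the default `beta = 1.0` are hypothetical choices for this example, not values taken from the text above.

```python
import random


def relative_frequency(successes, n_samples):
    """Naive estimate: proportion of sampled instances in which the event occurs."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return successes / n_samples


def bayesian_sampler_estimate(successes, n_samples, beta=1.0):
    """Posterior-mean estimate under an assumed symmetric Beta(beta, beta) prior."""
    if n_samples < 0 or not 0 <= successes <= n_samples:
        raise ValueError("need 0 <= successes <= n_samples")
    return (successes + beta) / (n_samples + 2 * beta)


def simulate_judgment(p_true, n_samples, beta=1.0, rng=None):
    """Draw n_samples independent mental samples and return one adjusted judgment."""
    rng = rng or random.Random()
    successes = sum(rng.random() < p_true for _ in range(n_samples))
    return bayesian_sampler_estimate(successes, n_samples, beta)


if __name__ == "__main__":
    # With only three samples, all showing the event, the naive estimate is extreme.
    print(relative_frequency(3, 3))         # 1.0
    print(bayesian_sampler_estimate(3, 3))  # 0.8
    print(simulate_judgment(0.9, 5, rng=random.Random(0)))
```

With small samples the adjusted estimate is pulled toward 0.5, which is one way to picture conservatism: extreme sample proportions are reported as less extreme judgments.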

In Bayesian cognitive science, the mind is seen as a spectacular probabilistic-inference machine. But judgment and decision-making (JDM) researchers have spent half a century uncovering how dramatically and systematically people depart from rational norms. In this article, we outline recent research that opens up the possibility of an unexpected reconciliation. The key hypothesis is that the brain neither represents nor calculates with probabilities but approximates probabilistic calculations by drawing samples from memory or mental simulation. Sampling models diverge from perfect probabilistic calculations in ways that capture many classic JDM findings, which offers the hope of an integrated explanation of classic heuristics and biases, including availability, representativeness, and anchoring and adjustment.

Resource rationality is useful for choosing between models with the same cognitive constraints but cannot settle fundamental disagreements about what those constraints are. We argue that sampling is an especially compelling constraint, as optimizing accumulation of evidence or hypotheses minimizes the cost of time, and there are well-established models for doing so which have had tremendous success explaining human behavior.

When asked to combine two pieces of evidence, one diagnostic and one non-diagnostic, people show a dilution effect: the addition of non-diagnostic evidence dilutes the overall strength of the evidence. This non-normative effect has been found in a variety of tasks and has been taken as evidence that people inappropriately combine information. In a series of five experiments, we found the dilution effect, but surprisingly it was not due to the inaccurate combination of diagnostic and non-diagnostic information. Because we have objectively correct answers for our task, we could see that participants were relatively accurate in judging diagnostic evidence combined with non-diagnostic evidence, but overestimated the strength of diagnostic evidence alone. This meant that the dilution effect – the gap between diagnostic evidence alone and diagnostic evidence combined with non-diagnostic evidence – was not caused by dilution. We hypothesized that participants were filling in “missing” evidence in a biased fashion when presented with diagnostic evidence alone. This hypothesis best explained the experimental results.

Previous research has established that numeric estimates are based not just on perceptual data but also past experience, and so may be influenced by the form of this stored information. It remains unclear, however, how such experience is represented: numerical data can be processed by either a continuous analogue number system or a discrete symbolic number system, with each predicting different generalisation effects. The present paper therefore contrasts discrete and continuous prior formats within the domain of numerical estimation using both direct comparisons of computational models of this process using these representations, as well as empirical contrasts exploiting different predicted reactions of these formats to uncertainty via Occam’s razor. Both computational and empirical results indicate that numeric estimates commonly rely on a continuous prior format, mirroring the analogue approximate number system, or ‘number sense’. This implies a general preference for the use of continuous numerical representations even where both stimuli and responses are discrete, with learners seemingly relying on innate number systems rather than the symbolic forms acquired in later life. There is however remaining uncertainty in these results regarding individual differences in the use of these systems, which we address in recommendations for future work.

The judgments that people make are not independent – initial decisions can bias later perception. This has been shown in tasks in which participants first decide whether the direction of moving dots is to one side or the other of a reference line: their subsequent estimates are biased away from this reference line. This interesting bias has been explained in past work as either a consequence of weighting sensory neurons, or as a consequence of participants adjusting their estimate to match their decision. We propose a new explanation: that people sequentially sample evidence to make their decision, and reuse these samples to make their estimate (i.e., amortised inference). Because optimal stopping leads to samples that strongly favor one or another decision alternative, the subsequent estimates are also biased away from the reference line. We introduce a sequential sampling model for posterior samples that does not assume constant thresholds, and provide evidence for our explanation in a new experiment that generalizes the perceptual bias to a new domain.

Both resources in the natural environment and concepts in a semantic space are distributed "patchily", with large gaps in between the patches. To describe people's internal and external foraging behavior, various random walk models have been proposed. In particular, internal foraging has been modeled as sampling: in order to gather relevant information for making a decision, people draw samples from a mental representation using random-walk algorithms such as Markov chain Monte Carlo (MCMC). However, two common empirical observations argue against people using simple sampling algorithms such as MCMC for internal foraging. First, the distance between samples is often best described by a Levy flight distribution: the probability of the distance between two successive locations follows a power-law on the distances. Second, humans and other animals produce long-range, slowly decaying autocorrelations characterized as 1/f-like fluctuations, instead of the 1/f^2 fluctuations produced by random walks. We propose that mental sampling is not done by simple MCMC, but is instead adapted to multimodal representations and is implemented by Metropolis-coupled Markov chain Monte Carlo (MC3), one of the first algorithms developed for sampling from multimodal distributions. MC3 involves running multiple Markov chains in parallel but with target distributions of different temperatures, and it swaps the states of the chains whenever a better location is found. Heated chains more readily traverse valleys in the probability landscape to propose moves to far-away peaks, while the colder chains make the local steps that explore the current peak or patch. We show that MC3 generates distances between successive samples that follow a Levy flight distribution and produce 1/f-like autocorrelations, providing a single mechanistic account of these two puzzling empirical phenomena of internal foraging.
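The structure of Metropolis-coupled MCMC described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the bimodal one-dimensional target, the temperature ladder, the step size, and all function names are hypothetical, and swaps between chains use the standard Metropolis acceptance rule for adjacent temperatures. The sketch does not itself check for Lévy-flight distances or 1/f autocorrelations.

```python
import math
import random


def log_target(x):
    """Hypothetical 'patchy' target: equal mixture of unit Gaussians at -4 and +4."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)  # log-sum-exp keeps the computation stable far from the modes
    return m + math.log(math.exp(a - m) + math.exp(b - m)) + math.log(0.5)


def accept(log_ratio, rng):
    """Metropolis acceptance test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_steps, temperatures=(1.0, 2.0, 4.0, 8.0), step_size=1.0, seed=0):
    """Run coupled chains; return the samples of the cold (temperature 1) chain."""
    rng = random.Random(seed)
    states = [-4.0 for _ in temperatures]  # every chain starts in the left patch
    cold_samples = []
    for _ in range(n_steps):
        # Local random-walk Metropolis update within each chain; chain i
        # targets the original density raised to the power 1 / temperature.
        for i, temp in enumerate(temperatures):
            proposal = states[i] + rng.gauss(0.0, step_size)
            log_ratio = (log_target(proposal) - log_target(states[i])) / temp
            if accept(log_ratio, rng):
                states[i] = proposal
        # Propose swapping the states of one randomly chosen adjacent pair.
        i = rng.randrange(len(temperatures) - 1)
        j = i + 1
        log_swap = (log_target(states[j]) - log_target(states[i])) * (
            1.0 / temperatures[i] - 1.0 / temperatures[j]
        )
        if accept(log_swap, rng):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    samples = mc3(20000, seed=1)
    jumps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    print(f"largest jump between successive cold samples: {max(jumps):.2f}")
```

The hotter chains see a flattened landscape and cross the gap between patches easily; a successful swap hands such a distant state down to the cold chain, which otherwise takes only small local steps. That mix of many small moves and occasional long jumps is the intuition behind the heavy-tailed distances discussed above.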

Alday, Schlesewsky, and Bornkessel-Schlesewsky (Alday et al., 2017) provide a stimulating commentary on the issues discussed in our paper (Sanborn and Chater, 2016), highlighting important connections between sampling, Bayesian inference, neural networks, free energy, and basins of attraction. We trace here some relevant history of computational theories of the brain.

Bayesian models in cognitive science and artificial intelligence operate over domains such as vision, motor control and language processing by sampling from vastly complex probability distributions. Such models cannot, and typically do not need to, calculate explicit probabilities. Sampling naturally generates a variety of systematic probabilistic reasoning errors on elementary probability problems, which are observed in experiments with people. Thus, it is possible to reconcile probabilistic models of cognitive and brain function with the human struggle to master even the most elementary explicit probabilistic reasoning. Bayesian explanations have swept through cognitive science over the past two decades, from intuitive physics and causal learning, to perception, motor control and language. Yet people flounder with even the simplest probability questions. What explains this apparent paradox? How can a supposedly Bayesian brain reason so poorly with probabilities? In this paper, we propose a direct and perhaps unexpected answer: that Bayesian brains need not represent or calculate probabilities at all and are, indeed, poorly adapted to do so. Instead, the brain is a Bayesian sampler. Only with infinite samples does a Bayesian sampler conform to the laws of probability; with finite samples it systematically generates classic probabilistic reasoning errors, including the unpacking effect, base-rate neglect, and the conjunction fallacy.

This list is inevitably weighted toward our interest in internal
sampling from a probability distribution, though we have included a
few key references to the information sampling literature (tag:
**Sampling From the Environment**) and sequential sampling
literatures (tag: **Sequential Sampling**). The term sampling could
also describe a number of other approaches to human cognition, such as
using only certain aspects of an observable or remembered stimulus (Gigerenzer, Hertwig, & Pachur, 2011;
Vlaev, Stewart, Chater, & Brown, 2007), though in order to keep the list focussed we do not include them
here.
