Theoretically, a calibrated date should be a continuous probability density function (PDF); however, in practice a date is represented as a discrete vector of probabilities corresponding to each calendar year, and is therefore a probability mass function (PMF). This discretization (of both a proposed model probability distribution and a calibrated date probability distribution) provides the advantage that numerical methods can be used to calculate likelihoods.
Hypothetically, if a calibrated date were available with such precision that it could be attributed with certainty to a single calendar year, the model likelihood would trivially be the model probability at that date. Similarly, if the data comprised just two such point estimates (at calendar time points A and B), the model’s relative likelihood would trivially be the model probability at date A multiplied by the model probability at date B.
However, a single calibrated 14C date is not a point estimate but a probability distribution over all possible calendar years. The probability of a single calibrated date given the model is therefore the model probability at year A, or at year B, and so on for all possible years, each weighted by how probable the calibrated 14C date is at that year. This is the scalar product of the model probabilities and the calibrated date probabilities. The calculation is repeated for every calibrated date, and the overall product gives the relative likelihood of the model given the whole dataset.
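The scalar-product calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the five-year grid and all probability values are assumptions made for the example, and the product over dates is computed as a sum of logarithms purely for numerical convenience.

```python
import math

def date_probability(model_pmf, date_pmf):
    """Scalar product of the model PMF and one calibrated-date PMF.

    Both vectors are assumed to be defined over the same calendar years.
    """
    return sum(m * d for m, d in zip(model_pmf, date_pmf))

def log_relative_likelihood(model_pmf, date_pmfs):
    """Log of the product of per-date probabilities (sum of logs)."""
    return sum(math.log(date_probability(model_pmf, d)) for d in date_pmfs)

# Toy data on a five-year grid (illustrative values, not real dates).
uniform_model = [0.2, 0.2, 0.2, 0.2, 0.2]
rising_model = [0.05, 0.10, 0.20, 0.30, 0.35]
dates = [
    [0.0, 0.1, 0.2, 0.3, 0.4],  # calibrated date concentrated in later years
    [0.1, 0.1, 0.2, 0.3, 0.3],
]

ll_uniform = log_relative_likelihood(uniform_model, dates)
ll_rising = log_relative_likelihood(rising_model, dates)
print(ll_uniform, ll_rising)
```

Because both toy dates place most of their probability in the later years, the rising model receives the higher relative likelihood.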
This approach assumes each date is a fair and random sample, but where many dates are available from a single site-phase, it is sensible to first bin dates into phases. This is an important step in modelling population dynamics, because it adjusts for the ascertainment bias whereby some archaeological finds have more dates simply by virtue of greater research interest or budget. The calibrated dates within each phase are summed and normalized to form a phase-SPD. These phase-SPDs are then combined and normalized to create a final SPD. This procedure ensures that a phase with multiple dates contributes the same overall probability mass as a phase with a single date. The probability of each phase-SPD can then be calculated in exactly the same way as the probability of a single calibrated date.
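The phase-binning step might look like the following sketch. It assumes that each phase-SPD is formed by summing the calibrated-date PMFs in that phase and rescaling the sum to unit mass; the function names, phase labels and probability values are hypothetical.

```python
def normalise(pmf):
    """Rescale a vector of non-negative weights so that it sums to 1."""
    total = sum(pmf)
    return [p / total for p in pmf]

def phase_spds(date_pmfs, phase_labels):
    """Sum calibrated-date PMFs within each phase, then normalise each sum."""
    summed = {}
    for pmf, phase in zip(date_pmfs, phase_labels):
        if phase not in summed:
            summed[phase] = [0.0] * len(pmf)
        summed[phase] = [s + p for s, p in zip(summed[phase], pmf)]
    return {phase: normalise(total) for phase, total in summed.items()}

def final_spd(phase_pmfs):
    """Combine the phase-SPDs year by year and normalise to unit mass."""
    combined = [sum(column) for column in zip(*phase_pmfs.values())]
    return normalise(combined)

# Three dates from one heavily sampled phase, one date from another.
dates = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
labels = ["siteA.1", "siteA.1", "siteA.1", "siteB.1"]

phases = phase_spds(dates, labels)
spd = final_spd(phases)
print(spd)
```

In this toy case the three-date phase and the single-date phase each contribute half of the final SPD's mass, which is the weighting the binning procedure is meant to achieve.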
6. Avoiding edge effects
It is common for a research question to be targeted at a specific time range that spans only part of the overall calibrated date range of the 14C dataset being used. This is of no consequence if merely generating an SPD, as regions outside the range of interest can be ignored or truncated. Indeed, simulation approaches benefit from considering a slightly wider range by pushing any potential edge effects outside the target range. By contrast, any modelling approach that calculates likelihoods will be influenced by the entire dataset provided, including dates that fall well outside the modelled date range. These external dates must be excluded, since they can have a substantial and mischievous influence on the parameter search.
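One way to exclude external dates is sketched below. The text does not specify an exclusion rule, so the criterion used here (keep a date only if at least half of its probability mass lies inside the modelled range) and the names `mass_inside`, `keep_internal` and `min_mass` are assumptions made purely for illustration.

```python
def mass_inside(years, pmf, start, end):
    """Probability mass of a calibrated date within [start, end]."""
    return sum(p for y, p in zip(years, pmf) if start <= y <= end)

def keep_internal(years, date_pmfs, start, end, min_mass=0.5):
    """Keep dates with at least min_mass inside the range.

    The 0.5 threshold is a hypothetical choice, not taken from the text.
    """
    return [
        pmf for pmf in date_pmfs
        if mass_inside(years, pmf, start, end) >= min_mass
    ]

# Toy grid of six calendar years; the modelled range is years 3 to 6.
years = [1, 2, 3, 4, 5, 6]
date_a = [0.6, 0.3, 0.1, 0.0, 0.0, 0.0]  # mostly external
date_b = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1]  # mostly internal

kept = keep_internal(years, [date_a, date_b], start=3, end=6)
print(len(kept))
```

Only `date_b` survives the filter here, since `date_a` has just a tenth of its mass inside the modelled range.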
The influence of external dates can be attributed to the behaviour of the tails of the Gaussian distribution from which a calibrated date is derived. A calibrated 14C date is not a point estimate but a complex multimodal probability distribution, with a non-zero probability at all calendar dates. As a consequence, a mostly external date still has a tiny tail within the model’s date boundaries. Although the absolute probability values of this tail are extremely small, their relative values increase hugely (approximately exponentially) towards the model boundary. As a result, given a dataset in which all or most dates are external to the date range of interest, the most likely model shape will have massive upticks at the boundaries. The overall likelihood of such a model will be extremely small, but it will still be the best available explanation when so much of the data lies outside the date range.
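The size of this effect can be illustrated numerically. The sketch below treats the tail of an external date as a plain Gaussian in calendar years, which ignores the calibration curve and is therefore only a rough stand-in for a real calibrated date; the mean, standard deviation and boundary position are hypothetical values.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical external date centred 300 years outside a model range
# that starts at year 0, with a standard deviation of 50 years.
mean, sd = -300.0, 50.0

at_boundary = gaussian_pdf(0.0, mean, sd)     # tail value at the model boundary
inside_100 = gaussian_pdf(100.0, mean, sd)    # tail value 100 years inside

ratio = at_boundary / inside_100
print(at_boundary, inside_100, ratio)
```

Both tail values are tiny in absolute terms, yet under these assumptions the value at the boundary exceeds the value 100 years inside the range by a factor of exp(14), roughly a million. A model fitted to many such tails is rewarded for piling probability against the boundary, which produces the upticks described above.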