KILSBY AUSTRALIA transport policy, planning and management advice
 

Calibration and Validation

How many times have you heard traffic modellers refer to the process of tweaking their network descriptions to get a better fit to traffic counts as "model calibration", or describe a model whose ability to replicate known screenline volumes (or some similar aggregate statistic) has not been checked as "uncalibrated"? To me this is a sign that the actual model is a bit of a black-box mystery to the practitioner, whose knowledge is restricted to manipulating what is fed into the box.

The following notes are adapted from a lecture on model development and application that I gave to UNSW transport engineering students in 1999 and 2000. Please don't say "calibration" when you mean "validation" or vice versa.

The Implementation Process [for a transport model]

Between specifying a model [covered earlier in lecture] and using it [covered later in lecture] lies a very interesting and complex process, outlined here. The terms used to describe the stages in the model implementation process are not always applied consistently by modelling practitioners.

Calibration

The model specification will stipulate that something is a function of something else, and will usually suggest the form of that relationship (linear, exponential, differential, etc.). However, it should not identify the numerical value of any coefficients used in the relationships (or their equivalents, e.g. heuristic distribution functions) - that is the function of the calibration process. This involves taking a data set which is considered to encapsulate the relationships specified in the model, and using appropriate analytical techniques to estimate the parameters implicit in that data set.
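As a minimal sketch of what calibration means in practice, suppose the specification says that daily trips produced by a zone are a linear function of the number of households in it. The specification supplies the form; calibration supplies the two coefficients. The data values and the function name `fit_linear` below are invented for illustration, and ordinary least squares is only one of many estimation techniques that might be appropriate.

```python
def fit_linear(xs, ys):
    """Estimate a and b in y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: trips per household
    a = mean_y - b * mean_x    # intercept
    return a, b


# Hypothetical calibration data set (not from any real survey):
# households per zone and observed daily trips produced by each zone.
households = [120, 340, 560, 800, 1050]
trips = [410, 1080, 1750, 2490, 3230]

a, b = fit_linear(households, trips)
print(f"trips = {a:.1f} + {b:.3f} * households")
```

Note that nothing here involves comparing model outputs with traffic counts. That comes later, under validation.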

Calibration does not necessarily have to be done with data from the "base year" adopted when using the model - in fact it does not necessarily have to be done with data from the same geographical area either. The model developer must be satisfied that whatever relationships apply in the calibration place and year also apply in the application place and year.

Most non-trivial models are actually systems rather than single models, and the calibration process must be undertaken individually for each component in the system.

The calibration process is sometimes also referred to as "model estimation".

Coding

The most tangible manifestation of the model is the computer program or set of programs which turns inputs into outputs by means of the algorithms from which the model is constructed. The production of these computer programs is not an insignificant task. Many approaches are possible, for instance:

  • use of proprietary "tool-boxes" of tested software components (e.g. EMME/2) only requiring a linking "script"
  • bespoke development by a specialist organisation, to a given specification
  • an organic in-house development process which reacts over time to perceived priorities (maintenance over time of model inputs such as network descriptions becomes a major issue under this approach)

This process must be subject to quality control procedures on the part of the organisation responsible for the coding, in the first instance to ensure that the programs do what the specification says they are supposed to do.

This quality assurance process is sometimes also called "code verification".

The production of program code to build software to implement the model is complemented by the process of coding the networks, another prerequisite for model use. This involves the preparation of input data to describe the road and public transport networks. Road networks are largely described as a collection of connected links, each with individual characteristics. Public transport networks are more complex, because they consist of a collection of services, each with defined characteristics such as routes and frequencies (which may vary during the day), operating on an infrastructure network which includes the road network and also in some cases off-road busways, tramways or railways.

Validation

Validation is the process of making sure that your model - either the whole system, or individual components of it - is capable of producing outputs which satisfactorily describe known conditions, when given inputs relating to those same conditions.

Failure to reproduce known conditions adequately would invalidate the model, hence the term "validation".

The outputs to be checked can be anything derived from the model's calculations which is capable of real world measurement - traffic flows on road links, mode split, public transport revenues, travel speeds or times, traffic volumes crossing screenlines, and so on.

Judging what is an acceptable fit tends to be more of an art than a science. One area where greater quality control could be introduced in modelling is in the specification of validation acceptance criteria before validation.
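One way of fixing acceptance criteria in advance is to write them down as an executable check before any model outputs are examined. The sketch below uses the GEH statistic, a measure commonly applied to hourly link flows in traffic modelling practice. Neither the statistic nor the thresholds shown (GEH below 5 on at least 85% of counted links) come from these notes; they are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def geh(modelled, counted):
    """GEH statistic for one pair of hourly flows (modelled vs counted)."""
    if modelled + counted == 0:
        return 0.0
    return math.sqrt(2 * (modelled - counted) ** 2 / (modelled + counted))


def passes_validation(pairs, geh_limit=5.0, required_share=0.85):
    """True if enough (modelled, counted) pairs fall within the GEH limit.

    The limit and the required share are assumed acceptance criteria,
    to be agreed before validation rather than after.
    """
    within = sum(1 for m, c in pairs if geh(m, c) < geh_limit)
    return within / len(pairs) >= required_share


# Hypothetical hourly flows on four counted links: (modelled, counted)
flows = [(1100, 1000), (500, 480), (760, 800), (1450, 1500)]
print(passes_validation(flows))
```

The point is less the particular statistic than the discipline: the pass mark exists before the results do.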

If the validation process shows that the model or some part of it (and can you tell which?) does not reproduce existing conditions to an acceptable degree, what do you do (assuming no time or budget constraints)?

  • It is possible that you have specified a model form which simply does not reflect real relationships. If this is considered to be the case, you could respecify and recalibrate.

  • It is possible that the dataset used for calibration does not contain the same relationships as those applicable in your study area. If this is considered to be the case, you could change the dataset and recalibrate.

  • It is possible that the description of existing conditions which you are seeking to replicate is in fact inapplicable or wrong. Traffic counts, for instance, can vary considerably from day to day and week to week. It is not unknown for modellers to claim that their model is right and the count data is wrong. If this is considered to be the case, you could organise fresh counts or adjust those being used in some defensible fashion, and then revalidate.

  • It is possible that your zoning system is too coarse and as a result your trip matrix is not an adequate representation of complex movement patterns, particularly for short trips. If this is considered to be the case, you could seek to distort the trip matrix you have in ways which allow it to better reproduce known traffic counts when assigned. This process is known as "matrix estimation". It raises a philosophical issue - if it is necessary to distort your base trip matrix in this way to make the model fit better, what should you do to your future trip matrices?

  • It is possible - indeed, likely - that the most complicated input to the process, the description of the transport network, either contains errors or is incomplete. A road network will inevitably be incomplete, because no analyst ever codes up every single road, street, avenue, lane and alleyway in the city. There is no point in a network description at much finer resolution than the model zoning system, or vice versa. Tweaking the network data is perhaps the most commonly applied response to poor initial validation - changing link descriptors, adding links, changing the connections of zone centroids (i.e. the connections between the zone-based trip matrix and the link-based transport network) and so on.
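To make the matrix estimation idea in the list above concrete, here is a deliberately crude sketch. It assumes each origin-destination pair uses one fixed route (so there is no reassignment between iterations), and it scales each cell by the geometric mean of the count-to-flow ratios on the counted links along that route. This is a simplified heuristic invented for illustration, not the method used by any particular software package, and all names and numbers are hypothetical.

```python
def adjust_matrix(matrix, routes, counts, iterations=20):
    """Scale trip matrix cells so that assigned link flows approach counts.

    matrix: {(origin, dest): trips}
    routes: {(origin, dest): [link, ...]}  fixed route per OD pair (assumed)
    counts: {link: observed flow}
    """
    adjusted = dict(matrix)
    for _ in range(iterations):
        # "Assign" the matrix: add up trips on every counted link.
        flows = {link: 0.0 for link in counts}
        for od, trips in adjusted.items():
            for link in routes[od]:
                if link in flows:
                    flows[link] += trips
        # Scale each cell by the geometric mean of count/flow ratios
        # on the counted links it passes through.
        for od in adjusted:
            ratios = [counts[link] / flows[link]
                      for link in routes[od]
                      if link in counts and flows[link] > 0]
            if ratios:
                product = 1.0
                for r in ratios:
                    product *= r
                adjusted[od] *= product ** (1.0 / len(ratios))
    return adjusted


# Two hypothetical OD pairs sharing one counted link "x".
base = {("A", "B"): 100.0, ("A", "C"): 50.0}
paths = {("A", "B"): ["x"], ("A", "C"): ["x", "y"]}
observed = {"x": 300.0}
print(adjust_matrix(base, paths, observed))
```

The adjusted matrix now reproduces the count, but only because it has been bent to do so, which is exactly why the question about future matrices arises.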

Many traffic engineers, whose interest lies solely in the traffic assignment stage of the process, call this network-tweaking and matrix-estimating make-it-fit validation process "calibration". Those with a broader view would assert that calibration activities in the assignment stage would be restricted to such things as trying to improve the speed-flow relationships used by assignment algorithms.
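By way of contrast, here is a sketch of something that would count as calibration within the assignment stage: estimating a coefficient of a speed-flow relationship from observed travel times. The functional form used is the widely known BPR (Bureau of Public Roads) curve, which is an assumption for illustration and not something these notes prescribe; the exponent is held fixed and only `alpha` is estimated, by least squares through the origin. The observations are invented.

```python
def bpr_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Link travel time under the BPR form: t0 * (1 + alpha * (v/c)**beta)."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


def fit_alpha(observations, free_flow_time, capacity, beta=4.0):
    """Estimate alpha from (volume, observed_time) pairs, with beta fixed.

    Rearranging the BPR form gives (t / t0 - 1) = alpha * (v/c)**beta,
    which is a straight line through the origin with slope alpha.
    """
    numerator = 0.0
    denominator = 0.0
    for volume, observed_time in observations:
        z = (volume / capacity) ** beta
        y = observed_time / free_flow_time - 1.0
        numerator += y * z
        denominator += z * z
    return numerator / denominator


# Hypothetical observations on one link: (hourly volume, travel time in seconds)
surveyed = [(600, 61.0), (1200, 66.5), (1500, 76.0), (1800, 93.0)]
alpha = fit_alpha(surveyed, free_flow_time=60.0, capacity=1800.0)
print(f"estimated alpha = {alpha:.3f}")
```

The estimated parameter describes how links of this type behave in general, so it should transfer to other links and other areas, which is not true of a centroid connector moved to make one count fit.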

Models where the network and travel data have been fine-tuned to improve the validation tend not to be very portable, i.e. they only work well for the areas where this fine-tuning has been done. Even a switch to a different assignment method for the same area, e.g. equilibrium to stochastic or vice versa, will tend to throw out the tuning.

Sanity check

When the model is satisfactorily validated, it is almost ready for application. It would, however, be a brave analyst who leapt straight into forecasting mode without trying out a few hypothetical future situations to assess how the model reacts. If, for instance, the analyst doubled public transport fares and found that the model then predicted a doubling of public transport patronage, he or she would probably have to accept that further development work was required. Conversely, a few test forecasts that give intuitively acceptable results serve to increase confidence in the model's ability to forecast future outcomes.
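The fare-doubling test can be written as a simple automated check. In the sketch below the "model" is a toy binary logit mode split with made-up coefficients, standing in for whatever real model is being tested; the function names and all parameter values are hypothetical.

```python
import math


def pt_share(fare, car_cost=6.0, cost_coeff=-0.3, pt_constant=-0.5):
    """Toy binary logit: public transport share as a function of fare.

    All coefficients are invented for illustration.
    """
    u_pt = pt_constant + cost_coeff * fare
    u_car = cost_coeff * car_cost
    return math.exp(u_pt) / (math.exp(u_pt) + math.exp(u_car))


def fare_test_is_plausible(model, base_fare, factor=2.0):
    """Sanity check: raising the fare should reduce predicted PT use."""
    return model(base_fare * factor) < model(base_fare)


print(fare_test_is_plausible(pt_share, base_fare=3.0))
```

A check like this says nothing about whether the size of the response is right, only that its direction is not absurd.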

Certification

The final stage is that of certification. This calls for an independent review by a recognised authority who is then prepared to state that in his or her opinion the model is fit to be used for the purpose for which it was intended.

This process is sometimes also called "auditing". Models are often applied without such certification or auditing, particularly if there is no feasible alternative course if certification were to be refused.
