Map of Measures to Improve Reliability in Technology Evaluation

Research areas


The project studies the measures used in statistical machine learning, artificial intelligence, and document analysis to evaluate how correctly methods process their input. Although choosing the right measure is crucial for reliable results, the connections between the many available measures are unclear and, confusingly, the same or almost the same measure often goes by several names. Cataloguing the measures, relating them to one another, and discussing their pros and cons not only promotes the use of the right measures but also improves understanding of technological similarities and differences.

For example, in binary classification for HIV coding, accuracy, the most common measure, gives misleading results. Because 99.989 per cent of Australians are HIV-negative, a coder that always assigns negative is 99.989 per cent accurate yet never identifies a single HIV-positive patient. Refining the evaluation to true positives and true negatives does not remove the problem: even if the coder correctly codes 99.9 per cent of HIV-positive patients and 99.99 per cent of HIV-negative patients, the probability that a patient has HIV given a positive code is only about 52 per cent. Conversely, perfect sensitivity is trivially achieved by coding everything as positive, but only at the expense of precision.
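The arithmetic behind the HIV coding example can be checked with a few lines of Python. The sketch below is illustrative only: the function and variable names are made up for this page, and it assumes that the quoted 99.9 and 99.99 per cent figures are the coder's sensitivity and specificity and that 0.011 per cent of patients are HIV-positive.

```python
from fractions import Fraction

# Figures quoted in the example above (assumed interpretation).
PREVALENCE = Fraction(11, 100_000)     # 0.011% HIV-positive, so 99.989% negative
SENSITIVITY = Fraction(999, 1000)      # 99.9% of HIV-positive patients coded positive
SPECIFICITY = Fraction(9999, 10_000)   # 99.99% of HIV-negative patients coded negative


def accuracy(sensitivity, specificity, prevalence):
    """Share of all patients that receive the correct code."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


def precision(sensitivity, specificity, prevalence):
    """P(HIV-positive | coded positive), obtained with Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# A coder that always assigns negative: sensitivity 0, specificity 1.
always_negative_accuracy = accuracy(Fraction(0), Fraction(1), PREVALENCE)

# The seemingly excellent coder from the example.
realistic_precision = precision(SENSITIVITY, SPECIFICITY, PREVALENCE)

# A coder that always assigns positive: sensitivity 1, specificity 0.
always_positive_precision = precision(Fraction(1), Fraction(0), PREVALENCE)

if __name__ == "__main__":
    print(f"always-negative accuracy:  {float(always_negative_accuracy):.3%}")
    print(f"realistic coder precision: {float(realistic_precision):.1%}")
    print(f"always-positive precision: {float(always_positive_precision):.3%}")
```

Under these assumptions the always-negative coder reaches 99.989 per cent accuracy, a positive code from the realistic coder is right only roughly half the time, and the always-positive coder's precision collapses to the prevalence itself.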


Catalogue of measures with their mathematical relations specified
Discussion of the pros and cons of the measures, with empirical results to illustrate the theoretical conclusions


Solid programming skills, preferably using MATLAB, Java, or Python
Success in the ANU course(s) of Artificial Intelligence and/or Document Analysis and/or Introduction to Statistical Machine Learning

Updated: 1 June 2019 / Responsible Officer: Dean, CECS / Page Contact: CECS Marketing