Sample size for a Ballot Paper Scanning Assurance Plan for ACT Legislative Assembly elections
Executive summary
The Australian Bureau of Statistics (ABS) has agreed to work with Elections ACT on the design of a
sampling scheme for auditing scanned paper ballots. The audit clerically checks a sample of scanned
ballot papers to confirm that each vote has been correctly captured by the scanning process. The ABS
has previously assisted the Australian Electoral Commission (AEC) with a sampling scheme for auditing
Senate ballot papers.
The ABS proposes a sample of 1 in 90 scanned ballot papers. Approximately 92,000 ballot papers are
expected to be scanned in the 2024 ACT Legislative Assembly election, in which case a 1 in 90 sample
gives a sample size of 1022 ballots. This is slightly larger than the sample size used by the AEC
within the ACT for Senate elections. Under the anticipated outcome of zero errors detected in the
audit sample, it will allow Elections ACT to state with 99% confidence that the number of errors in
the full population of ballot papers is smaller than 0.009 of the average electorate quota, that is,
that there are no more than 82 erroneous ballot papers on average within an electorate (no more than
409 errors across all 5 electorates).
This report also assesses the options of a 1 in 200 sample (sample size of approximately 460) and a
1 in 60 sample (sample size of approximately 1500).
Using a sampling rate (e.g. 1 in 90) instead of a set sample size (e.g. a sample of 1000) means the
sample provides the same confidence limit for the number of errors even if the total number of
scanned ballot papers changes. That is, if zero errors are detected in the audit sample, the
recommended sampling scheme gives 99% confidence that the number of errors is not more than 409,
whether the total number of ballot papers is as small as 50,000 or as large as 120,000. However, a set
sample size of, say, 1000 also has an advantage: it ensures the sample size does not fall below what
may be perceived as a defensible amount. In practice there is little difference between using a set
sampling rate and a set sample size.
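A short approximate derivation shows why a fixed rate keeps the limit on the number of errors roughly constant. It is sketched under two assumptions: zero errors are found, and the binomial (Clopper-Pearson) limit described later in this report is used. The symbols N (population size), k (a 1 in k sampling rate), n (sample size), α (one minus the confidence level) and p_U (upper limit on the error rate) are introduced here for illustration only.

```latex
\begin{aligned}
n &= N/k, \qquad (1 - p_U)^{n} = \alpha
  \;\Longrightarrow\; p_U = 1 - \alpha^{1/n} \approx \frac{-\ln\alpha}{n} \quad (\text{small } p_U),\\
N p_U &\approx -\frac{N}{n}\ln\alpha = -k\ln\alpha,\\
k = 90:\quad N p_U &\approx -90\ln 0.05 \approx 270 \ (95\%), \qquad -90\ln 0.01 \approx 414 \ (99\%).
\end{aligned}
```

Neither figure depends on N. The approximation sits slightly above the report's figures of 267 and 409, so it is best read as an explanation of the behaviour rather than a replacement for the report's own calculation.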
Key elements of a ballot paper scanning audit
A paper, "Assessing the accuracy of the Australian Senate count" by Michelle Blom, Philip B. Stark,
Peter J. Stuckey, Vanessa Teague and Damjan Vukcevic (May 31, 2022) [1], suggests that an election
audit should have the following characteristics.
• Provide sufficient evidence that the error rate is low enough that the results deserve to be
trusted.
• Estimate the average number of errors per ballot, and the percentage of ballots that have at
least one error, separately in each electorate.
• Choose the sequence of ballots to be audited transparently and unpredictably, e.g. by using a
random seed and an algorithm, committed to in advance, that transforms the random seed into a
sequence of ballot papers.
[1] https://arxiv.org/pdf/2205.14634.pdf
The approach proposed in this report is consistent with the advice in the paper by Blom et al. The
first point, that the error rate is low enough, is difficult to establish objectively in advance of
an election, as the paper recognizes. The paper suggests establishing (or estimating) how many votes
would need to change to change the outcome of an election, but this can only be done after the
election is held. In the event that an election outcome is decided by a margin of very few votes,
additional auditing can be conducted to increase the total audit sample size and reduce the
confidence limit for the number of errors to below the margin that could change the result. The
sampling scheme options discussed in this report provide a high level of confidence that the number
of scanning errors within an electorate is at most of the order of 0.01 quotas, assuming that a very
small number of errors (if any) are detected in the audit sample.
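The additional-auditing option can be framed as a sample-size calculation. Below is a minimal Python sketch that assumes no errors are found in the enlarged sample and uses the binomial limit described later in this report, ignoring any finite-population adjustment. It returns the smallest total sample for which the upper confidence limit on the number of erroneous ballots falls below a given margin. The function name and the margin of 200 ballots are hypothetical.

```python
import math


def zero_error_sample_size(margin: int, population: int, confidence: float) -> int:
    """Smallest audit sample size n such that, if no errors are found, the
    one-sided upper confidence limit on the number of erroneous ballots,
    population * (1 - alpha**(1/n)), is strictly below `margin`.
    """
    alpha = 1 - confidence
    target_rate = margin / population
    # With 0 errors the upper limit p_U solves (1 - p_U)**n = alpha, so
    # p_U < target_rate  <=>  n > ln(alpha) / ln(1 - target_rate).
    threshold = math.log(alpha) / math.log(1 - target_rate)
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Hypothetical margin of 200 ballots among 92,000 scanned ballots.
    print(zero_error_sample_size(200, 92_000, 0.99))
```

For that hypothetical margin at 99% confidence it returns 2117, roughly double the recommended 1 in 90 sample. In practice the population would be the scanned ballots for the contest whose margin is in question, which is an assumption of this sketch.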
On the second point, using a set sampling rate (or, alternatively, allocating a set sample size
across electorates in proportion to the number of scanned ballots in each electorate) makes it very
easy to estimate the average number of errors per scanned ballot (the sample average) and the
percentage of scanned ballot papers with at least one error (the percentage within the sample).
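Because every ballot has the same selection probability under a set sampling rate, both estimates are plain sample statistics. Here is a minimal Python sketch. The record layout (`AuditedBallot` with an electorate label and an error count per paper) and the electorate labels "A" and "B" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuditedBallot:
    """One audited ballot paper (hypothetical record layout)."""
    electorate: str
    errors: int  # number of discrepancies between the paper and its scanned record


def summarise(sample: list[AuditedBallot]) -> dict[str, tuple[float, float]]:
    """Per electorate: (average errors per ballot, % of ballots with at least one error).

    Under a fixed sampling rate every ballot has the same selection
    probability, so unweighted sample averages are used.
    """
    by_electorate: dict[str, list[int]] = {}
    for ballot in sample:
        by_electorate.setdefault(ballot.electorate, []).append(ballot.errors)
    return {
        name: (sum(errs) / len(errs), 100 * sum(e > 0 for e in errs) / len(errs))
        for name, errs in by_electorate.items()
    }


if __name__ == "__main__":
    audit = [AuditedBallot("A", 0), AuditedBallot("A", 0), AuditedBallot("A", 2),
             AuditedBallot("A", 0), AuditedBallot("B", 0), AuditedBallot("B", 0)]
    print(summarise(audit))  # {'A': (0.5, 25.0), 'B': (0.0, 0.0)}
```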
On the third point, Elections ACT has existing processes to generate a random seed and use it to
select a sample of ballot papers; this meets the principle expressed in the paper by Blom et al.
The paper discusses a method of geometric skipping designed to ensure that every possible
combination of papers with a set sampling probability can be realized, but it presents this approach
as just one possible method. The method would add complexity to the existing Elections ACT method of
selecting a set sample size within a batch of ballot papers, for negligible benefit. The method of
selecting 5 ballot papers for audit, with a random start, from each batch that is randomly selected
for audit is statistically sound, widely used and highly defensible.
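The two-stage selection described above can be illustrated with a minimal Python sketch. It is not Elections ACT's actual procedure. Several details are assumptions: batches of 50 papers (implied by taking 1 in 10 papers to get 5 per batch), systematic selection of 1 in k batches from a random start, and Python's `random.Random` as the seeded generator. That generator is reproducible but not cryptographic, and the process for generating and committing to the seed is not modelled.

```python
import random


def select_audit_sample(num_batches: int, batch_size: int, batch_interval: int,
                        paper_interval: int, seed: int) -> list[tuple[int, int]]:
    """Hypothetical two-stage systematic selection with random starts.

    Stage 1: every `batch_interval`-th batch, starting from a random batch.
    Stage 2: within each selected batch, every `paper_interval`-th paper,
             starting from a random position.
    Returns 0-based (batch, position) pairs.
    """
    rng = random.Random(seed)  # same seed gives the same sample, so it can be re-checked
    first_batch = rng.randrange(batch_interval)
    sample = []
    for batch in range(first_batch, num_batches, batch_interval):
        start = rng.randrange(paper_interval)
        sample.extend((batch, pos) for pos in range(start, batch_size, paper_interval))
    return sample


if __name__ == "__main__":
    # Hypothetical: 92,000 papers in 1,840 batches of 50; 1 in 9 batches, 1 in 10 papers.
    audit = select_audit_sample(1840, 50, 9, 10, seed=12345)
    print(len(audit), audit[:5])
```

With these inputs the sample is 1020 or 1025 papers depending on the random start, close to the 1022 expected from a 1 in 90 rate.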
Use of a set sampling rate
An appropriate sampling scheme can use either a set sampling rate (e.g. select 1 in every 90 ballot
papers) or a set sample size (e.g. a sample of 1000 ballot papers). There is little difference in
performance: if the total number of ballot papers is known, a sampling rate can be set to give a
specified sample size, or vice versa. For example, if there are 92,000 ballot papers in total, a
sampling rate of 1 in 92 gives a sample size of 1000, and a sample size of 1022 corresponds to a
sampling rate of 1 in 90. The two approaches do have different advantages and disadvantages for the
operation of the audit.
Advantages of a fixed sample size:
• Sample size is fixed and so the resourcing required for the auditing is known.
• A fixed sample size will give a fixed accuracy for the estimates of error rate, e.g. for a fixed
sample size of 1000 there is 95% confidence that the error rate is not more than 0.30% when
0 errors are detected, regardless of the number of ballot papers in the population.
Advantages of a fixed sampling rate:
• A fixed sampling rate gives a fixed upper limit for the number of ballot papers with a scanning
error, e.g. for a fixed sampling rate of 1 in 90, there is 95% confidence that not more than 267
ballot papers have a scanning error when 0 errors are detected in the sample, for population sizes
of 76,000 ballot papers and higher. There is only a small difference for smaller populations, e.g.
if there are only 20,000 ballot papers, a 1 in 90 sample gives a 95% confidence limit of 265 errors.
• A fixed sampling rate helps ensure all ballot papers have the same chance of being selected for
audit: an even rate is applied across all electorates, locations and days on which the auditing is
undertaken.
If the total number of votes to be scanned can be accurately estimated, these advantages matter less
than they would if the total number of ballot papers were more uncertain; that is, there is little
practical difference between the approaches.
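The contrast between the two approaches can be checked numerically with a minimal Python sketch. It assumes zero detected errors and the binomial (Clopper-Pearson) limit, and it converts rate limits to counts by multiplying by the population. The function names are illustrative. Its counts for the 1 in 90 scheme come out a few ballots above the report's 265 and 267, which may reflect a different calculation in the report.

```python
def zero_error_upper_rate(n: int, confidence: float) -> float:
    """Clopper-Pearson upper limit on the error rate when 0 errors are found in a sample of n."""
    return 1 - (1 - confidence) ** (1 / n)


def count_limits(population: int, fixed_size: int = 1000, interval: int = 90,
                 confidence: float = 0.95) -> tuple[float, float]:
    """Upper limits on the number of erroneous ballots (rate limit * population)
    under a fixed sample size and under a fixed 1-in-`interval` sampling rate."""
    rate_sample = population // interval  # fixed rate: the sample grows with the population
    return (zero_error_upper_rate(fixed_size, confidence) * population,
            zero_error_upper_rate(rate_sample, confidence) * population)


if __name__ == "__main__":
    for N in (20_000, 92_000, 120_000):
        fixed, rate = count_limits(N)
        print(f"N={N:>7}: fixed n=1000 -> {fixed:4.0f} errors, 1 in 90 -> {rate:4.0f} errors")
```

The fixed sample of 1000 keeps the rate limit at about 0.30%, so its count limit grows with N (from about 60 to about 360 errors across these populations). The 1 in 90 rate keeps the count limit close to 270 throughout.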
In this report, options for sample size are presented as sampling rates, with the corresponding
sample size based on a population of 92,000 paper ballots. The use of a sampling rate is proposed in
the paper by Blom et al. (section 2.2).
Options for sample size
Sample of 1 in 200 scanned ballot papers (sample size of approximately 460 papers)
This option is suggested as the minimum viable audit sample size and can be sufficient to say with
confidence that the quality of the scanning is comparable to the quality of scanning of ACT Senate
papers for Federal elections. The option can be implemented by selecting 1 in every 20 batches of
ballot papers, and then selecting 1 in every 10 ballot papers from within the batch (i.e. 5 papers
from each batch).
The AEC samples 1 in 300 Senate papers in the ACT to quality assure the ballot paper scanning
process. However, there is no option for electronic voting in a federal Senate election, so the
total population of papers is greater. For the 2022 Senate election, 290,308 votes were cast in the
ACT, and a 1 in 300 sample gave a total of 969 ballot papers selected for audit. The sample size
under this option is less than half the sample size used for Senate elections.
The AEC senate process was designed to provide 95% confidence that the senate scanning error rate
is lower than 0.81% (or not more than 2351 errors in the population of 290,308), and 99%
confidence that the error rate is lower than 0.95% (not more than 2757).
If this sampling option is used for the 2024 ACT Legislative Assembly election, then if:
• 0 errors are detected in the sample, there is 95% confidence the error rate is lower than 0.65%
and 99% confidence it is lower than 1.00%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 595 erroneous ballot papers and 99% confidence there are not more
than 913. In this scenario the 99% confidence limit for the error rate is slightly higher than the
corresponding AEC limit for Senate scanning (0.95%), although the number of ballot papers that could
have errors is substantially smaller, due to the smaller population of scanned ballot papers. (If a
sample of 1 in 190 is selected instead of 1 in 200, the 99% limit for the error rate will match the
AEC value of 0.95%.)
• 1 error is detected in the sample, there is 95% confidence the error rate is lower than 1.03% and
99% confidence it is lower than 1.44%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 942 erroneous ballot papers and 99% confidence there are not more
than 1315. In this scenario the confidence limits for the error rate exceed the AEC limits, although
the confidence limits for the number of erroneous ballot papers are lower than for the AEC.
• 2 errors are detected in the sample, there is 95% confidence the error rate is lower than 1.36%
and 99% confidence it is lower than 1.82%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 1250 erroneous ballot papers and 99% confidence there are not
more than 1663. In this scenario the confidence limits for the error rate exceed the AEC limits,
although the confidence limits for the number of erroneous ballot papers are lower than for the AEC.
As no scanning errors were detected in the three previous Legislative Assembly elections, with a
combined audit sample size of 9000, it is unlikely that errors will be detected in the 2024 election,
but the possibility cannot be ruled out. If this option is selected and an error is detected in the
audit sample, a second sample can be selected to reduce the confidence limit for the error rate.
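The size of such a second sample can be estimated by searching for the smallest total sample whose Clopper-Pearson upper limit falls below a chosen target. Below is a minimal Python sketch that assumes no further errors turn up in the additional papers. The target of 0.95% (the AEC 99% limit quoted above) is used purely for illustration, since the report does not set one, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Pr(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def cp_upper(x: int, n: int, confidence: float) -> float:
    """One-sided Clopper-Pearson upper limit: p with Pr(X <= x) = 1 - confidence."""
    alpha, lo, hi = 1 - confidence, 0.0, 1.0
    for _ in range(60):  # bisection; the CDF decreases as p increases
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if binom_cdf(x, n, mid) > alpha else (lo, mid)
    return hi


def total_sample_needed(errors_found: int, target_rate: float, confidence: float) -> int:
    """Smallest total sample size whose upper limit is below target_rate,
    assuming no further errors are found in the additional sample."""
    n = errors_found + 1
    while cp_upper(errors_found, n, confidence) >= target_rate:
        n += 1
    return n


if __name__ == "__main__":
    # One error found in the 1 in 200 sample; illustrative 99% target of 0.95%.
    print(total_sample_needed(1, 0.0095, 0.99))
```

For this illustration the total comes to roughly 700 papers, i.e. roughly 240 more than the original sample of 460.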
Recommended: Sample of 1 in 90 scanned ballot papers (sample size of approximately 1000 papers)
This is the recommended option and will give Elections ACT a high degree of confidence that the
audit sampling plan is highly defensible and withstands external scrutiny. The option can be
implemented by selecting 1 in every 9 batches of ballot papers, and then selecting 1 in every 10
ballot papers from within the batch (i.e. 5 papers from each batch).
The total sample size under this option is 1022 if there are 92,000 paper ballots. The sample size
will be at least 1000 so long as at least 90,000 ballot papers are scanned in total. The audit sample
size in the ACT for the 2022 Senate election was 969, so under this option a larger sample size will
be used for the 2024 Legislative Assembly election so long as at least 87,210 paper ballots are
submitted.
If this sampling option is used for the 2024 ACT Legislative Assembly election, then if:
• 0 errors are detected in the sample, there is 95% confidence the error rate is lower than 0.29%
and 99% confidence it is lower than 0.45%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 267 erroneous ballot papers and 99% confidence there are not more
than 409. As there are five electorates, the 95% limit is 53 and the 99% limit is 82 erroneous
papers per electorate (for an electorate with a total of 18,400 scanned papers). The 99% limit of 82
errors is approximately 0.009 quotas.
• 1 error is detected in the sample, there is 95% confidence the error rate is lower than 0.46% and
99% confidence it is lower than 0.65%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 423 erroneous ballot papers and 99% confidence there are not more
than 591. The 99% confidence limit equates to 118 errors within an electorate where 18,400 ballot
papers are scanned, or approximately 0.013 quotas.
• 2 errors are detected in the sample, there is 95% confidence the error rate is lower than 0.62%
and 99% confidence it is lower than 0.82%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 562 erroneous ballot papers and 99% confidence there are not more
than 748. The 99% confidence limit equates to 150 errors within an electorate where 18,400 ballot
papers are scanned, or approximately 0.017 quotas.
This option is recommended because it delivers an audit sample size that is similar to, but slightly
higher than, the audit sample size used for ACT Senate elections. In the most likely outcome of no
detected errors, it provides 99% confidence that the number of errors in an electorate is less than
0.01 of an average quota, as well as 99% confidence that the error rate is lower than the 0.45% error
rate observed in the 2022 Australian Senate election (for the whole of Australia) [2]. Even in the
unlikely event of 2 errors being detected in the audit sample, there is still 99% confidence that the
error rate sits below the error rate limits used for the AEC Senate audit design within the ACT.
[2] In the 2022 Australian Senate election, the audit of scanned ballot papers estimated an error
rate of 0.45% (Senate ballot paper sampling outcomes statement, Tom Rogers (Australian Electoral
Commissioner), 7 July 2022).
It is also noted that the Commonwealth Electoral Act 1918 [3] specifies that at least 1000 ballot
papers be checked for an Australian Senate election within a Senate electorate (i.e. a state or
territory). While the Act does not apply to ACT Legislative Assembly elections, choosing an expected
sample size of 1000 avoids potential criticism of auditing fewer ballots than are considered
necessary for a Senate election.
Sample of 1 in 60 scanned ballot papers (sample size of approximately 1500 papers)
This option provides a higher level of confidence that election outcomes are not affected by
scanning errors. The total sample size under this option is 1/60th of the number of scanned ballots,
or 1533 if there are 92,000 paper ballots. The option can be implemented by selecting 1 in every 6
batches of ballot papers, and then selecting 1 in every 10 ballot papers from within the batch (i.e.
5 papers from each batch).
If this sampling option is used for the 2024 ACT Legislative Assembly election, then if:
• 0 errors are detected in the sample, there is 95% confidence the error rate is lower than 0.20%
and 99% confidence it is lower than 0.30%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 177 erroneous ballot papers and 99% confidence there are not more
than 271. As there are five electorates, the 95% limit is 35 and the 99% limit is 54 erroneous papers
per electorate (for an electorate with a total of 18,400 scanned papers). The 99% limit of 54 errors
is approximately 0.006 quotas.
• 1 error is detected in the sample, there is 95% confidence the error rate is lower than 0.31% and
99% confidence it is lower than 0.43%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 281 erroneous ballot papers and 99% confidence there are not more
than 393. The 99% confidence limit equates to 79 errors within an electorate where 18,400 ballot
papers are scanned, or approximately 0.009 quotas.
• 2 errors are detected in the sample, there is 95% confidence the error rate is lower than 0.41%
and 99% confidence it is lower than 0.55%. In a population of 92,000 scanned votes this means 95%
confidence there are not more than 374 erroneous ballot papers and 99% confidence there are not more
than 498. The 99% confidence limit equates to 100 errors within an electorate where 18,400 ballot
papers are scanned, or approximately 0.011 quotas.
Under this option, even if 1 error is detected during the audit, there is still 99% confidence that the
error rate is lower than the 0.45% error rate detected in the 2022 Australian senate election.
Calculation of a confidence interval for the error rate
A confidence interval can be calculated using the well-known Clopper-Pearson method [4], often
referred to as the 'exact' method of calculating binomial confidence intervals. The paper by Blom et
al. discusses it as an appropriate method. To calculate a one-sided 95% confidence interval for the
error rate when x errors have been found in an audit sample of size n, determine the value of p for
which Pr(X ≤ x) = 0.05, where X follows a binomial Bin(n, p) distribution. The confidence interval
is then (0, p); i.e. there is 95% confidence that the error rate is p or smaller.
[3] Commonwealth Electoral Act 1918, Section 273AC(3)(b) (legislation.gov.au).
[4] Clopper, C.J. and Pearson, E.S. The use of confidence or fiducial limits illustrated in the case
of the binomial. Biometrika Vol 26 Issue 4, December 1934, pages 404-413 (Oxford Academic, oup.com).
This means that if the true error rate in the full population of scanned ballot papers is p, there
is only a 5% chance of observing as few as x errors in the audit sample. If the error rate is any
value greater than p, the chance of observing as few as x errors in the audit sample is smaller than
5%. The probability of observing x or fewer errors for a Bin(n, p) distribution can be calculated
using the Excel function BINOM.DIST.RANGE, or with other statistical software or spreadsheet packages.
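The calculation above can be sketched in Python using only the standard library. The binomial CDF is summed directly, and the upper limit is found by bisection, which works because Pr(X ≤ x) decreases as p increases. The function names are illustrative. For the three sample sizes in Appendix A, this reproduces the tabulated error rates to within about 0.01 percentage points. Multiplying each rate by 92,000 gives counts close to, but a few ballots above, the tabulated numbers of errors; the report does not state exactly how those counts were derived.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Pr(X <= x) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x: int, n: int, confidence: float) -> float:
    """One-sided Clopper-Pearson upper limit for the error rate.

    Finds p with Pr(X <= x) = 1 - confidence by bisection; this works
    because the binomial CDF decreases as p increases.
    """
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # CDF still above alpha: the limit lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    population = 92_000  # planning figure used throughout the report
    for label, n in [("1 in 200", 460), ("1 in 90", 1022), ("1 in 60", 1533)]:
        for x in (0, 1, 2):
            cells = []
            for conf in (0.95, 0.99):
                p = clopper_pearson_upper(x, n, conf)
                cells.append(f"{conf:.0%}: {p:.2%} (~{p * population:,.0f} ballots)")
            print(f"{label}, n={n}, {x} error(s): " + "; ".join(cells))
```

Packages that provide a beta distribution give the same limit directly as the beta quantile at the confidence level with parameters x + 1 and n - x, for example `scipy.stats.beta.ppf(0.95, x + 1, n - x)` in SciPy.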
Appendix A: Performance of sampling schemes
The table below shows the performance of a 1 in 200, a 1 in 90 and a 1 in 60 sampling scheme for a
population of 92,000 scanned ballots, in the case of 0, 1 or 2 detected errors within the audit
sample. It gives both 95% and 99% confidence limits for the error rate and for the total number of
erroneous ballots in the population of 92,000.
For example, if a 1 in 90 sampling scheme is used and no errors are detected in the audit sample,
there is 95% confidence that the error rate in the population does not exceed 0.29%, i.e. at most
0.29% of all scanned ballot papers have an error (267 erroneous ballots within the population of
92,000). If the actual error rate were greater than this limit (about 0.29%), the probability of
selecting a sample of 1,022 and finding no errors in it would be smaller than 5%.
Similarly, if a 1 in 90 sampling scheme is used and no errors are detected in the audit sample, there
is 99% confidence that the error rate does not exceed 0.45%, i.e. there are at most 409 erroneous
ballot papers in the population of 92,000.
Upper confidence bounds for the error rate and number of erroneous ballots, population of 92,000 scanned ballots

| Sampling rate | Sample size | Errors detected in sample | 95% upper bound: error rate | 95% upper bound: no. of errors | 99% upper bound: error rate | 99% upper bound: no. of errors |
|---|---|---|---|---|---|---|
| 1 in 200 | 460 | 0 | 0.65% | 595 | 1.00% | 913 |
| 1 in 200 | 460 | 1 | 1.03% | 942 | 1.44% | 1,315 |
| 1 in 200 | 460 | 2 | 1.36% | 1,250 | 1.82% | 1,663 |
| 1 in 90 | 1,022 | 0 | 0.29% | 267 | 0.45% | 409 |
| 1 in 90 | 1,022 | 1 | 0.46% | 423 | 0.65% | 591 |
| 1 in 90 | 1,022 | 2 | 0.62% | 562 | 0.82% | 748 |
| 1 in 60 | 1,533 | 0 | 0.20% | 177 | 0.30% | 271 |
| 1 in 60 | 1,533 | 1 | 0.31% | 281 | 0.43% | 393 |
| 1 in 60 | 1,533 | 2 | 0.41% | 374 | 0.55% | 498 |