I had the opportunity recently to be asked to join a committee which had to select candidates for an upcoming election. In the candidate selection phase, the members of the committee use preferential voting to rank the candidates. Prior to the meeting, I did some research on preferential voting systems. I found two main classes of methods used to determine winners-those based on some form of Borda count and those based on some form of Condorcet method.
In previous years, this committee selected the order of the candidates purely on an average rank basis, which is functionally equivalent to a Borda count (although not a modified Borda count). Being informed that I was going to be the tabulator and calculator, I decided that this year I also wanted to use a Condorcet method and compare it with the average rank. I decided that my Condorcet method was going to take the following path:
- Calculate if there is a pure Condorcet method
- If there is a cycle (no Condorcet winner) use the Schulze method
- If there is no unique member of the Schwartz set (the Schulze winner) then select one of a number of tiebreakers
I will gloss over the embarrassing details of how I missed an edge case while checking for stopping conditions and ended up in an infinite while loop, and focus on how I used Rcpp to greatly speed up the process. Condorcet methods require making a pairwise comparison between every candidate. This is slow—. The Schulze method has a rather simple implementation , but it is even slower—.
Here is a sample ballot:
1 2 3 4 5 6 |
## Candidates Vote A Vote B Vote C Vote D Vote E Vote F Vote G Vote H ## 1 Albert 1 2 1 3 2 1 3 4 ## 2 Bruce 2 4 5 4 5 4 1 2 ## 3 Charles 3 1 3 1 1 2 4 5 ## 4 David 4 5 2 5 4 5 2 1 ## 5 Edward 5 3 4 2 3 3 5 3 |
Here is some simple code to calculate the average rank of the candidates:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
AvgRank <- function(BallotMatrix) { Ballots <- as.matrix(BallotMatrix[, -1], mode = "numeric") Num_Candidates <- dim(Ballots)[1] Names <- BallotMatrix[, 1] Ballots[is.na(Ballots)] <- Num_Candidates + 1 #Treat blanks as one worse than min MeanRanks <- rowMeans(Ballots) Rankings <- data.frame(Names, MeanRanks) Rankings <- Rankings[order(rank(Rankings[, 2], ties.method = "random")), ] #Ties handled through random draw Rankings <- data.frame(Rankings, seq_along(Rankings[, 1])) names(Rankings) <- c("Names", "Average Rank", "Position") return(Rankings) } |
The above ballot would result in the following ranking:
1 2 3 4 5 6 |
## Names Average Rank Position ## 1 Albert 2.125 1 ## 3 Charles 2.500 2 ## 2 Bruce 3.375 3 ## 5 Edward 3.500 4 ## 4 David 3.500 5 |
Here is some (simplified) code to calculate Condorcet and Schulze winners:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 |
VoteExtract <- function(BallotMatrix) { Votes <- as.matrix(BallotMatrix[, -1], mode = "numeric") Num_Candidates <- dim(Votes)[1] Votes[is.na(Votes)] <- Num_Candidates + 1 #Treat blanks as one worse than min return(Votes) } PairCount <- function(Votes) { Num_Candidates <- dim(Votes)[1] Pairwise <- matrix(nrow = Num_Candidates, ncol = Num_Candidates) for (CurCand in 1:Num_Candidates) { CandRank <- as.vector(as.matrix(Votes[CurCand, ])) Pref_Cur_Cand <- t(Votes) - CandRank for (Pairs in 1:Num_Candidates) { Pairwise[CurCand, Pairs] <- sum(Pref_Cur_Cand[, Pairs] > 0) } } return(Pairwise) } Schulze <- function(PairsMatrix) { size <- dim(PairsMatrix)[1] p <- matrix(nrow = size, ncol = size) for (i in 1:size) { for (j in 1:size) { if (i != j) { if (PairsMatrix[i, j] > PairsMatrix[j, i]) { p[i, j] <- PairsMatrix[i, j] } else { p[i, j] <- 0 } } } } for (i in 1:size) { for (j in 1:size) { if (i != j) { for (k in 1:size) { if (i != k && j != k) { p[j, k] <- max(p[j, k], min(p[j, i], p[i, k])) } } } } } diag(p) <- 0 return(p) } CondorcetRank <- function(BallotMatrix) { Num_Candidates <- dim(BallotMatrix)[1] Rankings <- matrix(nrow = Num_Candidates, ncol = 3) CurrentBallot <- BallotMatrix CurrentRank <- 1 while (CurrentRank <= Num_Candidates) { CurrentNames <- as.vector(CurrentBallot[, 1]) CurrentSize <- length(CurrentNames) CurrentVotes <- VoteExtract(CurrentBallot) Pairwise <- matrix(nrow = CurrentSize, ncol = CurrentSize) Pairwise <- PairCount(CurrentVotes) Winner <- vector(length = CurrentSize) # Check for Condorcet Winner for (i in 1:CurrentSize) { Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize - 1) } if (sum(Winner == TRUE) == 1) { # Condorcet Winner Exists CurrentWinner <- which(Winner == TRUE) Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank, "Condorcet") } else { # Condorcet Winner does not exist, calculate Schulze beatpaths Pairwise <- Schulze(Pairwise) for (i in 1:CurrentSize) { Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize - 1) } if (sum(Winner == TRUE) == 1) { # Schwartz set has unique member CurrentWinner <- which(Winner == TRUE) Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank, "Schulze") } } CurrentBallot <- CurrentBallot[-CurrentWinner, ] CurrentRank = CurrentRank + 1 } Rankings <- data.frame(Rankings) names(Rankings) <- c("Name", "Rank", "Method") return(Rankings) } |
The pairwise matrix would be:
1 2 3 4 5 6 |
## [,1] [,2] [,3] [,4] [,5] ## [1,] 0 6 5 6 6 ## [2,] 2 0 3 5 3 ## [3,] 3 5 0 5 7 ## [4,] 2 3 3 0 4 ## [5,] 2 5 1 4 0 |
The beatpath matrix for rank 1 (all ballots) would be:
1 2 3 4 5 6 |
## [,1] [,2] [,3] [,4] [,5] ## [1,] 0 6 5 6 6 ## [2,] 0 0 0 5 0 ## [3,] 0 5 0 5 7 ## [4,] 0 0 0 0 0 ## [5,] 0 5 0 5 0 |
and just for completeness, the full Condorcet ranking (selecting a winner, removing that entry from the ballot, and repeating on the smaller set) would be:
1 2 3 4 5 6 |
## Name Rank Method ## 1 Albert 1 Condorcet ## 2 Charles 2 Condorcet ## 3 Edward 3 Schulze ## 4 Bruce 4 Condorcet ## 5 David 5 Condorcet |
When profiling this code, 81% of the time was spent in the Schulze algorithm, another 12% was spent in the PairCount algorithm, and the remaining 7% was spent on everything else (the actual ranking had multiple steps to handle cases when there was no Schulze winner). To speed up the procedure, I ported the Schulze and PairCount functions to C++ using Rcpp. Below is the C++ code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] IntegerMatrix Schulze_C(IntegerMatrix Pairs) { int nrow = Pairs.nrow(); IntegerMatrix Schulze(nrow, nrow); for (int i = 0; i < nrow; i++) { for (int j = 0; j < nrow; j++) { if (i != j) { if (Pairs(i, j) > Pairs(j, i)) { Schulze(i, j) = Pairs(i, j); } else { Schulze(i, j) = 0; } } } } for (int i = 0; i < nrow; i++) { for (int j = 0; j < nrow; j++) { if (i != j) { for (int k = 0; k < nrow; k++) { if ((i != k) && (j != k)) { Schulze(j, k) = (std::max)(Schulze(j, k), (std::min)(Schulze(j, i), Schulze(i, k))); } } } else { if ((i = j)) { Schulze(i, j) = 0; } } } } return(Schulze); } // [[Rcpp::export]] IntegerMatrix PairCount_C(IntegerMatrix Votes) { int Num_Candidates = Votes.nrow(); int Num_Ballots = Votes.ncol(); IntegerMatrix Pairwise(Num_Candidates, Num_Candidates); for (int CurCand = 0; CurCand < Num_Candidates; CurCand++) { IntegerVector CandRank = Votes(CurCand, _); IntegerMatrix Pref_Cur_Cand(Num_Candidates, Num_Ballots); for (int i = 0; i < Num_Candidates; i++) { for (int j = 0; j < Num_Ballots; j++) { Pref_Cur_Cand(i, j) = Votes(i, j) - CandRank(j); } } for (int i = 0; i < Num_Candidates; i++) { int G0 = 0; for (int j = 0; j < Num_Ballots; j++) { if (Pref_Cur_Cand(i, j) > 0) G0 += 1; } Pairwise(CurCand, i) = G0; } } return(Pairwise); } |
First, we need to check that the functions return the same values:
1 2 3 4 |
all.equal(PairCount(VoteExtract(Ballot)), PairCount_C(VoteExtract(Ballot))) ## [1] TRUE all.equal(Schulze(PairCount(VoteExtract(Ballot))), Schulze_C(PairCount_C(VoteExtract(Ballot)))) ## [1] TRUE |
Now let's compare the functions' speed:
1 2 3 4 |
library(microbenchmark) V <- VoteExtract(Ballot) P <- PairCount(V) microbenchmark(PairCount(V), PairCount_C(V), Schulze(P), Schulze_C(P), times = 100L) |
1 2 3 4 5 6 |
## Unit: microseconds ## expr min lq median uq max neval ## PairCount(V) 184.360 189.126 191.657 195.678 267.754 100 ## PairCount_C(V) 5.064 5.958 6.851 7.447 22.637 100 ## Schulze(P) 363.955 369.912 376.762 386.292 1006.682 100 ## Schulze_C(P) 1.490 2.085 2.979 3.277 7.446 100 |
The PairCount function is over 25 times faster in C++ and the Schulze function is over 120 times as fast! Moreover, the PairCount algorithm reads more logically in C++, as the way R handles vectors and matrices, when subtracting the current rank, the resulting matrix ended up transposed with the candidates across the columns.
So with easy-to-read code that results in speed gains of multiple orders of magnitude, what's not to like?!
[…] pode ser feita neste programa super-flexível? Pois é. Vejam, portanto, esta aplicação da famosa regra de Condorcet. Não sabe do que se trata? Seja esperto: Wikipedia em língua […]
Great post, I love the Rcpp implementation, one can learn a lot from these! Do you think that a combination of ‘outer’, ‘pmax’ and ‘ifelse’ might make some of your loops obsolete and vectorizable?
Cheers, Andrej
Thanks, anspiess! I’m not sure these functions are more easily vectorizable in R than what I’ve already done. For example, in PairCount, we have to count the number of times Candidate “X” had more pairwise wins than Candidate “Y” for all combinations of X and Y, and that is not a symmetric function, so the simplest way of doing that requires stepping through each cell. I’ve already used R’s overloading of subtraction to subtract vectors from matrix rows. I am not sure how to create m sets of mxn matrices, each one reflecting the net wins of the candidate, outside of a for loop stepping through the candidate sets. As for the Schulze routine, perhaps if I understood the Floyd–Warshall algorithm better, I’d be able to vectorize it in R. However, I’m not sure that is easily done as it needs to compare the “shortest found to date” with the current edge as it steps through all edges. If you come up with any ideas, I’d love to see them!
What I immediately spot in ‘PairCount’ is that you can take t(Votes) out of the loop because it is static and define tVotes <- t(Votes) beforehand and Pref_Cur_Cand <- tVotes – CandRank in the loop so you don't transpose NumCandidates times.
Cheers,
Andrej
Great point; looks like I’ll need to do a follow up post one of these days.
Awesome! I love alternate voting systems (big fan of approval voting).
Just out of curiosity did you try using the compiler package? I’d be curious to see how the timing compares.
PairCount_cmp<-cmpfun(PairCount)
Schulze_cmp<-cmpfun(Schulze)
Thanks, Seth! I don’t know if the formatting will work in replys, but here is your answer. It’s on a different computer, so the actual timing won’t be the same, but the relative timings should still be accurate.
So while compiling the function shows a significant 40% increase in the paircount routine and a stunning 215% increase in the Schulze routine, that pales in comparison to the 2670% and 9870% speed increase moving to C++. In the actual committee meetings, we had ballots with over 25 people, so the speed gains are even more dramatic as the algorithm is O(n³)!
Very cool, I wrote an R package based on ranking data (available here: https://github.com/Errrriikkkk/RMallow), and have been meaning to port some the necessarily-iterative computations to C++. If I get to that, I may request/modify your code for pair_count.
Also, for comparison of results (my package is slower and uses a different algorithm, so not worth the speed benchmark)
datas <- t(BallotMatrix[, 2:ncol(BallotMatrix)])
dimnames(datas)[[2]] <- as.character(BallotMatrix[,1])
rnks <- ranking$new(datas)
m.1 <- Mallows(rnks, G = 1)
m.1
Mallows model with modes:
Modes:
[1] "Albert, Charles, Edward, Bruce, David"
Thetas:
[,1] [,2] [,3] [,4]
[1,] 0.4462629 0.4462629 0.4462629 0.4462629
Proportions:
1
Thanks for the great post, Avraham. We used the Schulze method practically on a competition (with an Excel spreadsheet). Reading this post in the rcpp gallery helped understanding the mechanics of the method.
I am trying to currently focus on tie-breaking (of winners) explained in section 5 of the original Schulze paper and this seems more complicated than the basics of the method.
Regarding your post, I am having two questions:
1) Have you thought of implementing “margins” as explained in http://m-schulze.9mail.de/schulze1.pdf
2) Have you thought of possible caveats of sequential dropping (your while loop, as I understand it). Basically, given N candidates, we calculate (using Condorcet and/or Schulze scheme) a winner. Then we drop the winner and compute the next winner from N-1 candidates. We proceed till there is one member left.
I have tried to find clear indications of this procedure in the 1st Schulze paper but couldn’t. Would you have any comment on this?
Thanks.
Alfred