by Deirdre Connolly, Conrado Gouvea
We optimized our implementation of FROST by upwards of 50% over the trivial implementation, without changing the protocol and therefore maintaining its existing security guarantees. We use a known trick to do so: multi-scalar multiplication, which is exactly designed to give this kind of performance speedup.
In the FROST threshold signing protocol, we perform many elliptic curve operations for key generation, signing, and signature verification. Because FROST is a Schnorr threshold signing scheme, the signature that is produced is compatible with single-party Schnorr signature verification. As such, there is no additional computation overhead to verifying signatures produced by FROST vs single-party.
However, when performing FROST signing, signers must perform a linear number of group element multiplications, proportionate to the number of signers, as shown below (see the FROST specification for details).
If implemented trivially, the computational overhead of FROST signing grows computationally more expensive as more parties are involved. When the number of parties is small, which is typically the case for threshold signing, (i.e. 2-out-of-3 or 3-out-of-5) this extra computational overhead is marginal. However, we would like to reduce the number of expensive elliptic curve operations wherever possible.
In the context of elliptic curves, a scalar multiplication is written as kP where k is an integer mod a prime p and P an elliptic curve point, an abelian group element; points can be added or subtracted. With only those operations it’s possible to compute kP. The naïve approach would be to simply add k copies of P together with k-1 additions, but there are more efficient approaches that take a number of additions in the order of log(k). These go through the bits of the scalar, doubling the point for every bit and adding the point P if the bit is 1. For example, 5P can be computed with 3 additions:
2P = P + P
4P = 2P + 2P
5P = 4P + P
In order to speed up FROST signing, we must do more efficient point multiplications with respect to multiple variable base points, which is called multi-scalar multiplication. It consists of computing the sum aP + bQ + … + dS for some number of points and scalars. It can be naïvely computed by doing each scalar multiplication and then summing them all up. Thankfully, we have several algorithms at our disposal that can do better.
Algorithms to Optimize Multi-scalar Multiplication
Most of the multi-scalar multiplication algorithms rely on the observation that you do some operations on all of the points at the same time. For example, you can compute 3P + 3Q with only 3 additions:
P + Q
2(P + Q)
2(P + Q) + (P + Q)
The NAF (non-adjacent form) is a way to encode the scalar with digits -1, 0, and 1 (instead of the regular bits 0 and 1). This is useful because point subtraction is as easy as a point addition, and the NAF has fewer non-zero elements, which speed up the point multiplication algorithm (recall that there is a point addition for every non-zero digit). The wNAF is a windowed version of the NAF (e.g. a 2NAF can have digits -3, -1, 0, 1, and 3). We have been using an interleaved width-w non-adjacent form in our scalar implementation to support multi-scalar multiplication. We pre-populate a lookup table of multiples of the points being multiplied (e.g. P, 3P and 5P for 3NAF), which are then used to add the non-zero terms of the scalar being multiplied in the non-adjacent form.
Interleaved wNAF is often used where part of the points are fixed, and then a larger window is used for those and their table can be precomputed in advance once, instead of being computed on-the-fly. However, that is not useful for FROST: we will describe an alternative solution later in this post.
Other algorithms such as Pippenger and Bos-Coster can be more efficient than the interleaved wNAF, but they are more complex to implement. We will eventually look into them. (We mostly went for interleaved wNAF because we already had an implementation of it used in batch verification!)
In our FROST libraries, we have already used a variable-time multi-scalar multiplication implementation to verify batches of Schnorr signatures all in one go. We now describe how we used this multi-scalar multiplication implementation to speed up how signers generate the group commitment R when performing the second round of FROST signing.
As a reminder, during the second round of the FROST signing protocol, each party computes the group commitment based on the nonce commitments sent by each i-th signer in the first round of the signing protocol. This group commitment is also computed by the coordinator in the final aggregate step, after all signing participants have created and sent their signature shares.
Computing this group commitment is a ripe opportunity to use multi-scalar multiplication, because we have to compute a multiplication of varying elliptic curve element bases (the nonce commitments from each participant) by a varying scalar (the binding factor). Previously, we would do a variable-base scalar multiplication for each participant, and then add the result to an accumulator elliptic curve group element. However, we can restructure our algorithm to accumulate the hiding commitments, and save the variable base multi-scalar multiplication of the binding commitments and the binding factor scalar to the end, in one shot. Then we add the result to the accumulator, to result in the full group commitment.
Because we already had a variable time multi-scalar multiplication implementation in our code base, this change only touched a few lines of code, but resulted in an over 50% speed up at the high values of threshold and max possible participants. The speed up was seen in the second round computation and the final aggregate step, as both are computing the group commitment.
This optimization is compliant with the FROST specification, as the change to use multi-scalar multiplication only involves a rearrangement of equation terms in the generation of the group commitment. The speed up is available with any multi-scalar multiplication implementation, variable-time or constant-time. The underlying elliptic curve group software implementation used by your FROST implementation might already have this optimization available.
Comparing Optimized FROST to FROST Variants
There are now several different variants of FROST in the literature, all that offer speedups with respect to the overhead of the group commitment. Notably, FROST2 allows for constant overhead when computing the nonce, and another variant presented in the context of ROAST improves on the bandwidth that is sent from the coordinator to each signing participant. However, FROST2 achieves weaker security than FROST, and the variant in the ROAST paper has not been demonstrated to have any stronger notion of security (i.e. TS-UF-1 and higher) other than unforgeability. As a result, we chose to keep the CFRG draft and our implementation pinned to the original FROST design.
Using multi-scalar multiplication to optimize computing the group commitment over the full execution of the FROST protocol is significant, because it brings the performance overhead of FROST closer to these alternatives, while retaining stronger security properties.
As opposed to making breaking changes to the protocol itself, we use known optimization tricks under the hood to speed up our implementation. Making protocol changes requires re-analysis and new security proofs, so such changes are not done lightly. Happily, in this case, we can get the best of both worlds: performance that is better than the trivial implementation of FROST (i.e. from linear overhead in the number of signers to close to constant), without having to compromise on the security or flexibility of the scheme.