Revamping BLS: Optimize on-chain verification of arbitrary compute in the shared security era

Background

BLS (Boneh-Lynn-Shacham) signatures are a cryptographic tool that can be used to achieve consensus in a distributed system. The BLS signature scheme became popular within the crypto community following its inclusion in the Ethereum consensus protocol. 

​​The key benefit of the BLS scheme is its ability to efficiently aggregate signatures. Signatures of the same message can be ״combined״ and verified simultaneously. This feature is crucial for Ethereum’s beacon chain as thousands of validators worldwide attest to the validity of blocks at a high rate. Without BLS aggregation, validating that amount of signatures in isolation would have been unfeasible.

In the new shared security era, node clients are evolving to support novel AVS (Actively Validated Services) use cases, and Operators are required to execute arbitrary computational tasks1 such as retrieve information from API, run complex ML calculations, analyze transactions to prevent exploits, solve transactions optimal route, automate on-chain and off-chain workflows, index and query databases, and more.

Othentic Stack introduces a programmable consensus engine to reach on-chain agreements on specialized AVS tasks, which uses the BLS signature scheme.
Similar to Ethereum validators that attest to blocks others have produced, AVS Operators2 attest to the execution of tasks by designated Task Performers, which are nodes in the network that execute arbitrary computation. 

In order to facilitate consensus of arbitrary computing in a decentralized and permissionless manner, signature verification has to be done within the EVM environment. The implications of that are extremely high gas costs.



The Gas Bottleneck

One of the primary goals of Othentic is to support specialized node tasks to enable AVS’s Operators to submit results of arbitrary computations and come to a consensus on the execution.

The consensus algorithm leverages staked-weight leader election mechanism, a process for Task Attesters to make claims about a Task Performer. These claims are cryptographically signed to ensure the authenticity and immutability of task execution and attestation.

  • Task Performer - AVS Operator that executes arbitrary computation according to the network’s specification, and publishes the result via peer-to-peer networking for Attester nodes to discover.

  • Task Attesters - AVS Operators quorum that attests to the validity of the executed task.

  • Task Aggregator - AVS Operator collects all the attestation signatures, and aggregates them into one compact signature. The aggregator submits the task to the on-chain OBLS contract, specifying which operators attested.
  • OBLS contract - The OBLS contract constructs an aggregated public key (APK) corresponding to the attesters quorum, and verifies that the aggregated signature and the APK match.

  • Distributes rewards, penalties, and slashes to the appropriate operators.

The construction of the APK is the most gas-intensive part of this process. Aggregating two public keys costs around ~30k units of gas3, and the cost grows linearly with respect to the number of operators.

Possible Workarounds

We have been researching possibilities to improve the efficiency and cost structure of the BLS scheme to allow high throughput of task submissions. 

First approach:
Intermediate Jacobian representation

In cryptography, Elliptic Curves are defined by an equation and a finite field. Points that satisfy the equation are considered “on the curve”. Usually, points on the curve are represented with 2 coordinates (X, Y) and in this form they are said to be in their “Affine representation”. 

To optimize, we can project into 3D space (X, Y, Z). After this process, the points are said to be in “projective representation” or “Jacobian representation4. This process is more efficient in performing arithmetic on points when they are in their Jacobian representation.

The first notable optimization in this workaround is removing redundant conversions between affine representation of points and Jacobian representation of points.

This is the function we’re using to aggregate two public keys:

function ecTwistAdd(
       uint256 pt1xx,
       uint256 pt1xy,
       uint256 pt1yx,
       uint256 pt1yy,
       uint256 pt2xx,
       uint256 pt2xy,
       uint256 pt2yx,
       uint256 pt2yy
   ) internal view returns (uint256, uint256, uint256, uint256) {
       //... Input validation ...


       uint256[6] memory pt3 = ecTwistAddJacobian(pt1xx, pt1xy, pt1yx, pt1yy,
       1, 0, pt2xx, pt2xy, pt2yx, pt2yy, 1, 0);


       return _fromJacobian(pt3[PTXX], pt3[PTXY], pt3[PTYX],
        pt3[PTYY], pt3[PTZX], pt3[PTZY]);
   }


The implementation suggests that whenever two points are added, they are added in their Jacobian form, but they are immediately converted back to their affine form.

This back-and-forth conversion is expensive and can be prevented by performing the conversion back to affine only once.
However, there is a problem. The implementation for ecTwistAddJacobian we are using doesn’t work for adding more than 2 points. Here is a reproducible fuzz test that demonstrates this issue: 

// SPDX-License-Identifier: MIT
pragma solidity >=0.8.19;


import { Test } from "forge-std/Test.sol";
import { BN256G2 } from "musalbas/solidity-BN256G2/BN256G2.sol";


contract BN256G2PlaygroundTest is Test {
   uint256 internal constant G2_NEG_X_RE = 0x198E9393920D483A7260BFB731FB5D25F1AA493335A9E71297E485B7AEF312C2;
   uint256 internal constant G2_NEG_X_IM = 0x1800DEEF121F1E76426A00665E5C4479674322D4F75EDADD46DEBD5CD992F6ED;
   // slither-disable-next-line similar-names
   uint256 internal constant G2_NEG_Y_RE = 0x275dc4a288d1afb3cbb1ac09187524c7db36395df7be3b99e673b13a075a65ec;
   // slither-disable-next-line similar-names
   uint256 internal constant G2_NEG_Y_IM = 0x1d9befcd05a5323e6da4d435f3b617cdb3af83285c2df711ef39c01571827f9d;


   function test_jacobianWorks(uint256[10] memory seeds) public {
       // affine additions
       uint256[4][] memory affinePoints = new uint256[4][](10);
       for (uint256 i = 0; i < 10; i++) {
           if (seeds[i] == 0) {
               seeds[i] = 1;
           }
           (affinePoints[i][0], affinePoints[i][1], affinePoints[i][2], affinePoints[i][3]) = BN256G2.ECTwistMul(seeds[i], G2_NEG_X_IM, G2_NEG_X_RE, G2_NEG_Y_IM, G2_NEG_Y_RE);
       }


       uint256[4] memory affineSum;
       uint256[6] memory jacobianSum;
       for (uint256 i = 0; i < 10; i++) {
           jacobianSum = BN256G2._ECTwistAddJacobian(
               jacobianSum[0], jacobianSum[1], jacobianSum[2], jacobianSum[3], jacobianSum[4], jacobianSum[5],
               affinePoints[i][0], affinePoints[i][1], affinePoints[i][2], affinePoints[i][3], 1, 0
           );
           (affineSum[0], affineSum[1], affineSum[2], affineSum[3]) = BN256G2.ECTwistAdd(
               affineSum[0], affineSum[1], affineSum[2], affineSum[3],
               affinePoints[i][0], affinePoints[i][1], affinePoints[i][2], affinePoints[i][3]
           );
       }


       uint256[4] memory fromJacobian;
       (fromJacobian[0], fromJacobian[1], fromJacobian[2], fromJacobian[3]) = BN256G2._fromJacobian(jacobianSum[0], jacobianSum[1], jacobianSum[2], jacobianSum[3], jacobianSum[4], jacobianSum[5]);
       assertEq(affineSum[0], jacobianSum[0]);
       assertEq(affineSum[1], jacobianSum[1]);
       assertEq(affineSum[2], jacobianSum[2]);
       assertEq(affineSum[3], jacobianSum[3]);
   }
}

Second approach:
Switching to short public keys

Ethereum’s elliptic curve precompiles work with the BN2545 curve (G1) and its field extension of degree 2 (G2).

The BLS signature scheme uses two elliptic curves:

G1 - The “Small” curve. Less data per point.

G2 - The “Big” curve. More data per point.

The BLS signature scheme is devised such that these curves are interchangeable; G1 can be used for signatures and G2 for public keys, or vice versa.

This leaves us with two variants:

  • Use G1 for signatures and G2 for public keys.
  • Use G2 for signatures and G1 for public keys.

Using the first variant, where the signatures are short and the public keys are long, the aggregation of public keys on-chain is significantly more expensive. 

Unfortunately, the EVM makes the switching to the second variant unfeasible and gas-intensive.

The state of EVM

In the BLS signature scheme, signatures are constructed from the private key of the signer and a hash of the message they want to sign.

The hash of the message in this context is obtained through a process known as hashing to a curve. In this process, unlike a regular hash function that produces a uniformly random string of bits, a hash_to_curve function produces a uniformly-randomly chosen point from an elliptic curve.

To use short public keys (G1), the scheme requires long signatures (G2), and because signatures are constructed from the message hashes, the hash function needs to hash to G2.

In regular contexts, this would not be a problem., Unfortunately, the EVM implements elliptic curve arithmetic only for G1 points. As a result, Implementing a hash to curve function for G2 in the EVM costs about 1 to 4 million units of gas.

Solution: Revamping BLS using caches and diffs

AVS Operators regularly attest to new tasks, and the aggregated public keys (APKs) overlap from task to task. This results in a significant amount of duplicate APK computation that could be optimized by saving results from past APK computations and reusing them.

The simplest form of this is an exact-match cache:

Each time before initiating APK compute (aggregating each individual operator public key), we check if this specific operator quorum has already attested task in the past, and if it did, we use its cached APK:

// solidity psuedo code:


mapping(AddressSet => APK) cache;


function generateAPK(AddressSet operators) returns (APK) {
   if operators in cache {
       return cache[operators];
   } else
     return aggregateIndividualPKs(operators);
   }
}

This solution works, and significantly reduces gas costs when the cache hits, but as the number of operators grow, the number of possible operator quorums increases exponentially:

  • For 2 operators, there are 3 possible operator quorums: [A], [B], [A, B]
  • For 3 operators, there are 7 possibilities (23-1).
  • For 10 operators, there 1023 possibilities
  • For 50 operators, there are  250-1 ≅ 1015 possibilities.

Of course, not every possibility is equally likely. It depends on the voting power6 of the Operators and other factors. Considering that Operators will register, unregister, and switch their BLS keys from time to time, the gas saved by implementing this simple cache is just not worth it.

We have explored incorporating tree data structures and alternative data structuring algorithms to increase efficiency in the construction of a logical cache system…

And then came the big revelation.

Diffs

Elliptic curve point aggregation leverages the power of arithmetic, and that’s where the magic happens. You may have heard that two public keys are “added” together (even in this article), or that an elliptic curve point is “multiplied” by a number, and these terms are not coincidences; these elliptic curve points really do work like numbers:

A - the public key of Alice
B - the public key of Bob


// aggregating the two public keys is actually adding them together
APK = A + B


// adding them together implies we can subtract as well
APK - A = B

Using the power of subtraction in our hands, we can now construct the following algorithm:

  • On each task, retrieve the APK and the attesters quorum of the last submitted task, and compute the new APK based on the difference in the new attesters quorum and the last attesters quorum.

  • For each operator that attests that hasn’t attested previously, add their PK to the APK.

  • For each operator that doesn’t attest but has attested previously, subtract their PK from the APK.

Here it is as solidity pseudocode:

AddressSet lastOperators;
APK lastAPK;


function generateAPK(AddressSet operators) returns (APK) {
   // load cache
   apk = lastApk;
  
   // consider only operators that appear in exactly one of the sets
   changedOperators = xor(lastOperators, operators);
  
   for operator in changedOperators {
       if operator is in operators {
           // operator is attesting now, but hasn't previous time
           apk = apk + operator.pk;
       } else {
           // operator has attested last time, but hasn't now
           apk = apk - operator.pk;
       }
   }
  
   return apk;
}

Performance

μ is the mean gas used, and ~ is the median.

We ran fuzz tests comparing the original aggregation algorithm with the new implementation.

In each test there were 10 subsequent APK aggregations with n operators, with each operator having a 10% chance of changing its attestation behavior from task to task:


By graphing the results, we can see that the new implementation is more gas efficient by a factor of 2.81, which is about ~64% less gas than the original implementation.

We can also see that for a really small number of operators, the original implementation is actually more efficient. That is because the new optimization incurs overhead in storing and retrieving the APK cache. The crossover point at which the optimized implementation becomes more efficient than the old is at around ~4 AVS Operators.

* It should be noted that when the entire flow of task submission is taken into account, the effective gas saving is closer to a half.

Conclusion 

The optimization presented here offers a way to mitigate the high costs of dealing with cryptography within the EVM, by utilizing the strong suites of the EVM - In this case, it is the usage of the persistent storage system that plays to our advantage.

There is still a lot of room for improvement and research. Some of the things that could be improved upon include:

  • Mathematically correct Jacobian library
  • APK indexing system that allows the aggregator to specify the closest cached APK
  • Research on how often caches should be updated

By using the revamped BLS process, average gas usage in the aggregation process was cut down by two-thirds, and that’s under conservative estimates of erratic attestation behavior (10% chance of changing attestation behavior each task).
In practice, the gas usage will probably be much lower since stable AVSs are expected to have consistent operators that do not change their behavior regularly. The optimization reaps the benefits of exact cache hits, while the cost of cache misses becomes proportional to the difference in APKs.

Special thanks to Kobi Gurkan and Ron Turetzky for providing help and advice in the process of reviewing this optimization.

Epilogue

It should be mentioned that EIP-2537 will add BLS12-381 precompiles which are significantly more extensive and efficient than the existing ones for the BN254 curve. These precompiles will significantly reduce the costs of BLS signature verification, potentially rendering the cost margin saved by this optimization to be negligible.

Another significant optimization for BLS signature verification was devised by Kobi Gurkan: https://geometry.xyz/notebook/Optimized-BLS-multisignatures-on-EVM
This implementation uses short public keys (G1) while not having to hash onto G2
. This is achieved by slightly modifying the BLS signature scheme itself.

Footnotes

Footnote 1 - AVS Tasks
The protocol’s core service can be broken down into units called “Tasks”. Each Task represents a unit of work to be carried out by Operators.


Footnote 2 - AVS Operators
Operators, who can be either individuals or organizations, opt-in to work for AVS and provide (1) task execution (computation, lookups, etc) by running the AVS node client and (2) crypto-economic security by delegating slashable stake in the form of native ETH, LST, LRT, etc.and attesting that the task execution done by other operators has been done correctly.

Footnote 3 - This cost metric is for aggregating alt_bn128 G2 points. The reason for not using G1 public keys is explained later in the article.

Footnote 4 - Jacobian coordinates are a specific type of projective coordinates.

Footnote 5 - BN254 in this article refers to alt_bn128 as specified in the EIP-196. Not to be confused with Fp254BNb

Footnote 6 - Voting power 
The influence of individual Operator in the consensus process, proportional to the amount of (re)stake assets locked on the L1.

References

https://hackmd.io/@liangcc/bls-solidity

https://hackmd.io/@benjaminion/bls12-381

https://geometry.xyz/notebook/Optimized-BLS-multisignatures-on-EVM

https://hackmd.io/@cailynyongyong/HkuoMtz6o

https://github.com/musalbas/solidity-BN256G2

https://www.ietf.org/archive/id/draft-irtf-cfrg-hash-to-curve-10.html#SBCDK09