Othentic Blog | Revamping BLS: Optimize on-chain verification of arbitrary compute in the shared security era

Background

BLS (Boneh-Lynn-Shacham) signatures are a cryptographic tool that can be used to achieve consensus in a distributed system. The BLS signature scheme became popular within the crypto community following its inclusion in the Ethereum consensus protocol.

The key benefit of the BLS scheme is its ability to efficiently aggregate signatures. Signatures of the same message can be ״combined״ and verified simultaneously. This feature is crucial for Ethereum’s beacon chain as thousands of validators worldwide attest to the validity of blocks at a high rate. Without BLS aggregation, validating that amount of signatures in isolation would have been unfeasible.

In the new shared security era, node clients are evolving to support novel AVS (Actively Validated Services) use cases, and Operators are required to execute arbitrary computational tasks such as retrieve information from API, run complex ML calculations, analyze transactions to prevent exploits, solve transactions optimal route, automate on-chain and off-chain workflows, index and query databases, and more.

Othentic Stack introduces a programmable consensus engine to reach on-chain agreements on specialized AVS tasks, which uses the BLS signature scheme.

Similar to Ethereum validators that attest to blocks others have produced, AVS Operators attest to the execution of tasks by designated Task Performers, which are nodes in the network that execute arbitrary computation.

In order to facilitate consensus of arbitrary computing in a decentralized and permissionless manner, signature verification has to be done within the EVM environment. The implications of that are extremely high gas costs.

The Gas Bottleneck

One of the primary goals of Othentic is to support specialized node tasks to enable AVS’s Operators to submit results of arbitrary computations and come to a consensus on the execution.

The consensus algorithm leverages staked-weight leader election mechanism, a process for Task Attesters to make claims about a Task Performer. These claims are cryptographically signed to ensure the authenticity and immutability of task execution and attestation.

Task Performer - AVS Operator that executes arbitrary computation according to the network’s specification, and publishes the result via peer-to-peer networking for Attester nodes to discover.
Task Attesters - AVS Operators quorum that attests to the validity of the executed task.
Task Aggregator - AVS Operator collects all the attestation signatures, and aggregates them into one compact signature. The aggregator submits the task to the on-chain OBLS contract, specifying which operators attested.
OBLS contract - The OBLS contract constructs an aggregated public key (APK) corresponding to the attesters quorum, and verifies that the aggregated signature and the APK match.
Distributes rewards, penalties, and slashes to the appropriate operators.

The construction of the APK is the most gas-intensive part of this process. Aggregating two public keys costs around ~30k units of gas, and the cost grows linearly with respect to the number of operators.

Possible Workarounds

We have been researching possibilities to improve the efficiency and cost structure of the BLS scheme to allow high throughput of task submissions.

First approach: Intermediate Jacobian representation

In cryptography, Elliptic Curves are defined by an equation and a finite field. Points that satisfy the equation are considered “on the curve”. Usually, points on the curve are represented with 2 coordinates (X, Y) and in this form they are said to be in their “Affine representation”.

To optimize, we can project into 3D space (X, Y, Z). After this process, the points are said to be in “projective representation” or “Jacobian representation”. This process is more efficient in performing arithmetic on points when they are in their Jacobian representation.

The first notable optimization in this workaround is removing redundant conversions between affine representation of points and Jacobian representation of points.

This is the function we’re using to aggregate two public keys:

function ecTwistAdd(
       uint256 pt1xx,
       uint256 pt1xy,
       uint256 pt1yx,
       uint256 pt1yy,
       uint256 pt2xx,
       uint256 pt2xy,
       uint256 pt2yx,
       uint256 pt2yy
   ) internal view returns (uint256, uint256, uint256, uint256) {
       //... Input validation ...


       uint256[6] memory pt3 = ecTwistAddJacobian(pt1xx, pt1xy, pt1yx, pt1yy,
       1, 0, pt2xx, pt2xy, pt2yx, pt2yy, 1, 0);


       return _fromJacobian(pt3[PTXX], pt3[PTXY], pt3[PTYX],
        pt3[PTYY], pt3[PTZX], pt3[PTZY]);
   }

‍
The implementation suggests that whenever two points are added, they are added in their Jacobian form, but they are immediately converted back to their affine form.

This back-and-forth conversion is expensive and can be prevented by performing the conversion back to affine only once.

When refactored to work this way, BLS aggregation gas costs are cut by ~40%:

It should be noted that initially this approach was deemed impractical due to a false conclusion that the solidity library used for key aggregation had bugs in its implementation, and this article was published before finding this out.

As a consequence, the narrative of the article revolves around the latter optimization that will be presented, as it was the only working solution in the article when the article came out.

Nevertheless, being unable to implement this approach prompted further research into the problem, and by combining this method together with the latter, the gas savings become even more significant (in the optimal case - an order of magnitude less gas)

Special thanks to Skate AVS for pointing out the mistaken conclusion

Second approach: Switching to short public keys

Ethereum’s elliptic curve precompiles work with the BN254 curve (G1) and its field extension of degree 2 (G2)

The BLS signature scheme uses two elliptic curves:

G1 - The “Small” curve. Less data per point.

G2 - The “Big” curve. More data per point.

The BLS signature scheme is devised such that these curves are interchangeable; G1 can be used for signatures and G2 for public keys, or vice versa.

This leaves us with two variants:

Use G1 for signatures and G2 for public keys.
Use G2 for signatures and G1 for public keys.

Using the first variant, where the signatures are short and the public keys are long, the aggregation of public keys on-chain is significantly more expensive.

So why not switch to the second variant?
As often the problem with things are, the reason is EVM.

The state of EVM

In the BLS signature scheme, signatures are constructed from the private key of the signer and a hash of the message they want to sign.

The hash of the message in this context is obtained through a process known as hashing to a curve. In this process, unlike a regular hash function that produces a uniformly random string of bits, a hash_to_curve function produces a uniformly-randomly chosen point from an elliptic curve.

To use short public keys (G1), the scheme requires long signatures (G2), and because signatures are constructed from the message hashes, the hash function needs to hash to G2.

In regular contexts, this would not be a problem., Unfortunately, the EVM implements elliptic curve arithmetic only for G1 points. As a result, Implementing a hash to curve function for G2 in the EVM costs about 1 to 4 million units of gas.

Third approach: Caches and Diffs

AVS Operators regularly attest to new tasks, and the aggregated public keys (APKs) overlap from task to task. This results in a significant amount of duplicate APK computation that could be optimized by saving results from past APK computations and reusing them.

The simplest form of this is an exact-match cache:

Each time before initiating APK compute (aggregating each individual operator public key), we check if this specific operator quorum has already attested task in the past, and if it did, we use its cached APK:

// solidity psuedo code:


mapping(AddressSet => APK) cache;


function generateAPK(AddressSet operators) returns (APK) {
   if operators in cache {
       return cache[operators];
   } else
     return aggregateIndividualPKs(operators);
   }
}

This solution works, and significantly reduces gas costs when the cache hits, but as the number of operators grow, the number of possible operator quorums increases exponentially:

For 2 operators, there are 3 possible operator quorums: [A], [B], [A, B]
For 3 operators, there are 7 possibilities (23-1).
For 10 operators, there 1023 possibilities
For 50 operators, there are 250-1 ≅ 1015 possibilities.

Of course, not every possibility is equally likely. It depends on the voting power (footnote6) of the Operators and other factors. Considering that Operators will register, unregister, and switch their BLS keys from time to time, the gas saved by implementing this simple cache is just not worth it.

We have explored incorporating tree data structures and alternative data structuring algorithms to increase efficiency in the construction of a logical cache system…

And then came the big revelation.

Diffs

Elliptic curve point aggregation leverages the power of arithmetic, and that’s where the magic happens. You may have heard that two public keys are “added” together (even in this article), or that an elliptic curve point is “multiplied” by a number, and these terms are not coincidences; these elliptic curve points really do work like numbers:

A - the public key of Alice
B - the public key of Bob


// aggregating the two public keys is actually adding them together
APK = A + B


// adding them together implies we can subtract as well
APK - A = B

Using the power of subtraction in our hands, we can now construct the following algorithm:

On each task, retrieve the APK and the attesters quorum of the last submitted task, and compute the new APK based on the difference in the new attesters quorum and the last attesters quorum.
For each operator that attests that hasn’t attested previously, add their PK to the APK.
For each operator that doesn’t attest but has attested previously, subtract their PK from the APK.

‍

Here it is as solidity pseudocode:

This means the amount of G2 arithmetic operations done are exactly the difference between the number of Task Attesters attested to the last task and the current task.

How much gas does it actually save?

Performance

μ is the mean gas used, and ~ is the median.

We ran fuzz tests comparing the original aggregation algorithm with the new implementation.

In each test there were 10 subsequent APK aggregations with n operators, with each operator having a 10% chance of changing its attestation behavior from task to task:

‍

By graphing the results, we can see that the new implementation is more gas efficient by a factor of 2.81, which is about ~64% less gas than the original implementation.

We can also see that for a really small number of operators, the original implementation is actually more efficient. That is because the new optimization incurs overhead in storing and retrieving the APK cache. The crossover point at which the optimized implementation becomes more efficient than the old is at around ~4 AVS Operators.

* It should be noted that when the entire flow of task submission is taken into account, the effective gas saving is closer to a half.

‍

Conclusion

The optimizations presented here offer ways to mitigate the high costs of dealing with cryptography within the EVM. For the caches and diffs approach, this is achieved by utilizing the strong suites of the EVM - In this case, it is the usage of the persistent storage system that plays to our advantage.

There is still a lot of room for improvement and research. Some of the things that could be improved upon include:

APK indexing system that allows the aggregator to specify the closest cached APK
Research on how often caches should be updated

By getting rid of redundant format conversions, together with implementing the diff optimization results in a total ~80% reduction in gas usage. Moreover, the gas savings measured for the diff optimization relied on conservative estimates that assume erratic attestation behavior (10% change of changing attestation behavior each task). In practice, the gas usage will probably be much lower since stable AVSs are expected to have consistent operators that do not change their behavior regularly., The optimization reaps the benefits of exact cache hits, while the cost of cache misses become proportional to the difference in APKs.

Special thanks to Kobi Gurkan for providing help and advice in the process of reviewing this optimization.

‍

Epilogue

It should be mentioned that EIP-2537 will add BLS12-381 precompiles which are significantly more extensive and efficient than the existing ones for the BN254 curve. These precompiles will significantly reduce the costs of BLS signature verification, potentially rendering the cost margin saved by this optimization to be negligible.

Another significant optimization for BLS signature verification was devised by Kobi Gurkan: https://geometry.xyz/notebook/Optimized-BLS-multisignatures-on-EVM
This implementation uses short public keys (G1) while not having to hash onto G2. This is achieved by slightly modifying the BLS signature scheme itself.

‍

References

https://hackmd.io/@liangcc/bls-solidity

https://hackmd.io/@benjaminion/bls12-381

https://geometry.xyz/notebook/Optimized-BLS-multisignatures-on-EVM

https://hackmd.io/@cailynyongyong/HkuoMtz6o

https://github.com/musalbas/solidity-BN256G2

https://www.ietf.org/archive/id/draft-irtf-cfrg-hash-to-curve-10.html#SBCDK09

‍

Footnote 1 - AVS Tasks
The protocol’s core service can be broken down into units called “Tasks”. Each Task represents a unit of work to be carried out by Operators.

Footnote 2 - AVS Operators
Operators, who can be either individuals or organizations, opt-in to work for AVS and provide (1) task execution (computation, lookups, etc) by running the AVS node client and (2) crypto-economic security by delegating slashable stake in the form of native ETH, LST, LRT, etc.and attesting that the task execution done by other operators has been done correctly.

‍

Footnote 3 -

This cost metric is for aggregating alt_bn128 G2 points. The reason for not using G1 public keys is explained later in the article

‍

Footnote 4 -

Jacobian coordinates are a specific type of projective coordinates

‍

Footnote 5 -

BN254 in this article refers to alt_bn128 as specified in the EIP-196. Not to be confused with Fp254BNb

Footnote 6 - Voting power

The influence of individual Operator in the consensus process, proportional to the amount of (re)stake assets locked on the L1.