secp256k1: Implement FieldVal64 field element.#3708
Conversation
davecgh
left a comment
There was a problem hiding this comment.
Thanks for the PR.
Before anything else, I preface this by saying I really appreciate the submission and I know how much work this area of the code can be!
I mention that because this is an extremely critical area of consensus code that is notoriously easy to make a small mistake that goes unnoticed due to the inexhaustible testing surface and, as a result, reviews in this area fairly are critical in nature and particularly vigilant about consistency. So, please don't take the general strictness as being unappreciative.
I'll have to spend some more time reviewing this more meticulously, but to get started, I gave it a once over, and on the positive side, most of the code looks good and tight (aside from a few cases of code that isn't constant time). It makes judicious use of bits.Add64 and bits.Sub64 which weren't available when the code was originally written. That is particularly nice in general because the compiler automatically replaces them with the single instruction hardware intrinsics when it can (which is basically always on even remotely modern hardware). I notice that it also further takes advantage of that fact by arranging for long carry chains which is nice and great for efficiency.
On the negative side, unfortunately, most of the comments look very LLM generated. They basically ignore all of the existing style, and, well, have that distinctive, less than desirable LLM style to them. They do things like boldly claiming some things are true when they aren't and change wording out willy nilly versus the existing comments (e.g. reports versus returns, etc), which is another hallmark of LLMs.
Even worse, it strips almost all of the descriptive and helpful comments and replaces them with ones that really would not at all be helpful for anyone who isn't already intimately familiar with the literature and doesn't already know what it's doing. Not only that, it fails to document extremely important considerations for any consumers that the existing one does such as whether or not they're constant time and the implications of truncation.
As it stands, and I hate to say it since I know it sounds a bit harsh, but it would probably be easier to just scrap pretty much all of the public method comments, because they have very little, if any, redeeming value to them as it stands.
As a case in point, one of the comments is:
//
// Since the secp256k1 prime is ≡ 3 (mod 4), the square root is a^((p+1)/4),
// computed via the same addition chain used by FieldVal26 (254 squarings, 13
// multiplications).
func (f *FieldVal64) SquareRootVal(val *FieldVal64) bool {That is an exported method and thus public documentation, yet it is referencing internal implementation details that are meaningless to anyone reading the docs given they neither need to know, nor can see, the addition chain used in the 10x26 implementation. What information does the fact it's calculating it as a^((p+1)/4) give to an API consumer?
Another example is:
// Pass 2: subtract p. Since p = 2^256 - C, the low 256 bits of t - p are
// identical to t + C, i.e. the folded result for a 2^256 overflow.The 'C' there is introduced out of the blue and completely unexplained. I happen to know what it's talking about because I'm intimately familiar with the literature and the different reduction techniques such as Solinas and, in this case Crandall, which involves primes of the form "2^k - c", so I'm sure that is where the LLM pulled it from, but I can guarantee anyone who isn't will have no idea what that comment means.
A third example:
// Inverse sets f = f^(-1) (mod p) via Fermat's little theorem (a^(p-2)).
func (f *FieldVal64) Inverse() *FieldVal64 {Once again, why would the caller care how it's calculated? How is that in any way useful to them?
|
Benchmark comparison with |
|
Thank you for your review. |
|
No problem. As long as you don't have a problem with it, I would be fine with going through and polishing the comments. I don't mind doing it myself. I generally don't because I know that a lot of people tend to get upset when you start rewording their comments. The code is the most important part and it is a rather large improvement over the existing implementation. The new type is also significantly easier to use because it doesn't require manually tracking the magnitude and sprinkling in calls to |
|
I just pushed the commit with the fixes addressing your notes:
|
There was a problem hiding this comment.
Thanks for the updates. I've reviewed them and they look great. The comments are significantly improved, the non-constant time paths are now constant time as they should be, the addition of the MulByX is clean, and the conversion of MulInt to use a temporary + full multiply ensures that it continues to work with all ints for any consumers.
I still see various comment related things, but, as mentioned, I don't mind polishing those up separately.
I still need to go through in detail to assert there are no potential unhandled overflows and the generated assembly is constant time which I probably won't be able to get to for a couple of days since I'm working on some other things at the moment and reviewing this on my own time, but otherwise, I only spotted two small remaining things which I commented on inline.
|
I did the cosmetic fixes in the tests that you noticed |
| p1, c = bits.Add64(p1, h0, 0) | ||
| p2, c = bits.Add64(p2, h1, c) | ||
| p3, c = bits.Add64(p3, h2, c) | ||
| p4 := h3 + c |
There was a problem hiding this comment.
Asserting overflow conditions:
Max possible value of h3 is (2^64 - 1)*(2^64 - 1) = 2^128 - 2^65 + 1, so the high word will be a max of 2^64-2. Max value of c is 1. Overflow not possible because 2^64-2+1 = 2^64-1 < 2^64.
Same for the other cases below.
✔️
| t1, carry = bits.Add64(t1, x[1], carry) | ||
| t2, carry = bits.Add64(t2, x[2], carry) | ||
| t3, carry = bits.Add64(t3, x[3], carry) | ||
| t4 += carry |
There was a problem hiding this comment.
Asserting overflow conditions:
Max possible value of t4 before the carries is (2^64 - 1)*(2^32 + 977), so the high word will be a max of 2^32 + 976 < 2^33 and a max of +2 added for carries. Overflow not possible.
✔️
|
I removed |
|
Thanks for the updates. I believe the last things are just to squash the last few commits and rebase over master. Also, if you would, please fix the description on the Other than that, I have a bit more review of the generated asm to assert everything is being generated with constant time. I expect it should be fine based on the Go code, but I make it a point to double check that actual generated code. |
Introduce FieldVal64, an alternative secp256k1 field element that packs the value into four uint64 limbs (base 2^64) and stays fully reduced after every operation. AI-assisted: developed with the help of an AI coding assistant; all code reviewed and tested by the author.
Add a test suite for FieldVal64 mirroring the existing FieldVal tests: serialization round-trips, arithmetic (add, negate, multiply, square), normalization, inverse, square root, and the bit predicates, plus randomized checks. Cases that only apply to the unnormalized FieldVal representation are adjusted since FieldVal64 is always fully reduced. AI-assisted: developed with the help of an AI coding assistant; all code reviewed and tested by the author.
Add micro-benchmarks for the FieldVal64 multiply, square, inverse, and related operations to track the pure Go performance. AI-assisted: developed with the help of an AI coding assistant; all code reviewed and tested by the author.
4bc1f00 to
5658396
Compare
Reorder methods to mirror field.go and tidy the exported doc comments so the two field backends stay consistent. Drop Normalize and magnitude tracking since FieldVal64 is always fully reduced, add MulBy2, MulBy3, MulBy4, and MulBy8 for the common small-constant multiplications, and back them with a generic MulInt. Rework the edge-case tests to exercise the 4x64 limbs rather than the 10x26 word boundaries inherited from FieldVal, remove the inconsistent TestField64ReductionCarry, and ensure the operations remain constant time, along with assorted small fixes.
5658396 to
5b6e6ef
Compare
|
My implementation of Bernstein-Yang "safegcd" constant-time modular inversion is about 25% faster than current implementation that is based on Ferma's little theorem. Do you want me to publish? |
There was a problem hiding this comment.
This looks good to go to me. I've manually reviewed all generated assembler to ensure constant time and asserted there are no unhandled cases where potential overflow could occur. That is, of course, in addition to reviewing all of the code and tests.
Thanks a ton for your efforts. Very nice work!
That would be great (in a separate PR though please)! I saw the paper a while back and had it on my list of things to implement at some point. I expect it is much simpler to implement with the new saturated field since it doesn't have to worry about dealing with denormalized forms and conversion. That would help speed up signing and some other ways inversion is used a bit. I don't imagine it will have any impact on signature verification though because, IIRC, we avoid the expensive inversion to affine that is usually needed during verification by staying in Jacobian and taking advantage of secp256k1's cofactor of 1. |
Overview
This is the first slice of the
FieldVal64work proposed in #3707: a pure-Goalternative secp256k1 base-field element that stores the value as 4×uint64
limbs (base 2^64) instead of the existing
FieldVal's 10×uint32 (base 2^26).It stays fully reduced after every operation.
This PR is purely additive — it only adds new files under
dcrec/secp256k1/and does not touch or alter the existingFieldValor anydefault build. Nothing is wired into the curve/ECDSA/Schnorr paths yet; that
is deferred to a later PR (see #3707 for the full plan).
Crandall reduction
The secp256k1 prime is a Crandall (pseudo-Mersenne) prime,
p = 2^256 - c with c = 0x1000003D1. A 512-bit product is folded modulo p by
multiplying the high 256 bits by the small constant c and adding them back
into the low 256 bits, repeating until the result fits in 256 bits, followed
by a single final conditional subtraction of p. Because c is small this
replaces a general division with a couple of multiply-add passes.
Constant time
The reduction and the final conditional subtraction are constant time
(borrow-mask select; no secret-dependent branches or memory access).
No new dependencies
This PR adds no new module dependencies. The CPU-feature detection and
assembly fast paths discussed in #3707 are intentionally not part of this
PR and will follow separately.
Commits
Testing
math/bigoracle, plus edge cases andrandomized inputs.
go build,go vet, andgo test -run Field64pass underdcrec/secp256k1.AI disclosure
AI-assisted.
Refs #3707