Skip to content

secp256k1: Implement FieldVal64 field element.#3708

Open
valery-osheter-cb wants to merge 4 commits into
decred:masterfrom
valery-osheter-cb:field64-purego
Open

secp256k1: Implement FieldVal64 field element.#3708
valery-osheter-cb wants to merge 4 commits into
decred:masterfrom
valery-osheter-cb:field64-purego

Conversation

@valery-osheter-cb

Copy link
Copy Markdown

Overview

This is the first slice of the FieldVal64 work proposed in #3707: a pure-Go
alternative secp256k1 base-field element that stores the value as 4×uint64
limbs (base 2^64) instead of the existing FieldVal's 10×uint32 (base 2^26).
It stays fully reduced after every operation.
This PR is purely additive — it only adds new files under
dcrec/secp256k1/ and does not touch or alter the existing FieldVal or any
default build. Nothing is wired into the curve/ECDSA/Schnorr paths yet; that
is deferred to a later PR (see #3707 for the full plan).

Crandall reduction

The secp256k1 prime is a Crandall (pseudo-Mersenne) prime,
p = 2^256 - c with c = 0x1000003D1. A 512-bit product is folded modulo p by
multiplying the high 256 bits by the small constant c and adding them back
into the low 256 bits, repeating until the result fits in 256 bits, followed
by a single final conditional subtraction of p. Because c is small this
replaces a general division with a couple of multiply-add passes.

Constant time

The reduction and the final conditional subtraction are constant time
(borrow-mask select; no secret-dependent branches or memory access).

No new dependencies

This PR adds no new module dependencies. The CPU-feature detection and
assembly fast paths discussed in #3707 are intentionally not part of this
PR and will follow separately.

Commits

  • secp256k1: Add pure Go FieldVal64 (4x64) field element
  • secp256k1: Add tests for FieldVal64
  • secp256k1: Add benchmarks for FieldVal64

Testing

  • Differential tests against a math/big oracle, plus edge cases and
    randomized inputs.
  • go build, go vet, and go test -run Field64 pass under
    dcrec/secp256k1.

AI disclosure

AI-assisted.
Refs #3707

@davecgh davecgh changed the title SECP256K1 Field64 purego secp256k1: Implement FieldVal64 field element. Jun 22, 2026

@davecgh davecgh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Before anything else, I preface this by saying I really appreciate the submission and I know how much work this area of the code can be!

I mention that because this is an extremely critical area of consensus code that is notoriously easy to make a small mistake that goes unnoticed due to the inexhaustible testing surface and, as a result, reviews in this area fairly are critical in nature and particularly vigilant about consistency. So, please don't take the general strictness as being unappreciative.

I'll have to spend some more time reviewing this more meticulously, but to get started, I gave it a once over, and on the positive side, most of the code looks good and tight (aside from a few cases of code that isn't constant time). It makes judicious use of bits.Add64 and bits.Sub64 which weren't available when the code was originally written. That is particularly nice in general because the compiler automatically replaces them with the single instruction hardware intrinsics when it can (which is basically always on even remotely modern hardware). I notice that it also further takes advantage of that fact by arranging for long carry chains which is nice and great for efficiency.

On the negative side, unfortunately, most of the comments look very LLM generated. They basically ignore all of the existing style, and, well, have that distinctive, less than desirable LLM style to them. They do things like boldly claiming some things are true when they aren't and change wording out willy nilly versus the existing comments (e.g. reports versus returns, etc), which is another hallmark of LLMs.

Even worse, it strips almost all of the descriptive and helpful comments and replaces them with ones that really would not at all be helpful for anyone who isn't already intimately familiar with the literature and doesn't already know what it's doing. Not only that, it fails to document extremely important considerations for any consumers that the existing one does such as whether or not they're constant time and the implications of truncation.

As it stands, and I hate to say it since I know it sounds a bit harsh, but it would probably be easier to just scrap pretty much all of the public method comments, because they have very little, if any, redeeming value to them as it stands.

As a case in point, one of the comments is:

//
// Since the secp256k1 prime is ≡ 3 (mod 4), the square root is a^((p+1)/4),
// computed via the same addition chain used by FieldVal26 (254 squarings, 13
// multiplications).
func (f *FieldVal64) SquareRootVal(val *FieldVal64) bool {

That is an exported method and thus public documentation, yet it is referencing internal implementation details that are meaningless to anyone reading the docs given they neither need to know, nor can see, the addition chain used in the 10x26 implementation. What information does the fact it's calculating it as a^((p+1)/4) give to an API consumer?

Another example is:

// Pass 2: subtract p. Since p = 2^256 - C, the low 256 bits of t - p are
// identical to t + C, i.e. the folded result for a 2^256 overflow.

The 'C' there is introduced out of the blue and completely unexplained. I happen to know what it's talking about because I'm intimately familiar with the literature and the different reduction techniques such as Solinas and, in this case Crandall, which involves primes of the form "2^k - c", so I'm sure that is where the LLM pulled it from, but I can guarantee anyone who isn't will have no idea what that comment means.

A third example:

// Inverse sets f = f^(-1) (mod p) via Fermat's little theorem (a^(p-2)).
func (f *FieldVal64) Inverse() *FieldVal64 {

Once again, why would the caller care how it's calculated? How is that in any way useful to them?

Comment thread dcrec/secp256k1/field64.go Outdated
Comment thread dcrec/secp256k1/field64.go Outdated
Comment thread dcrec/secp256k1/field64.go Outdated
Comment thread dcrec/secp256k1/field64.go Outdated
Comment thread dcrec/secp256k1/field64.go Outdated
Comment thread dcrec/secp256k1/field64_test.go
Comment thread dcrec/secp256k1/field64_test.go Outdated
Comment thread dcrec/secp256k1/field64_test.go Outdated
Comment thread dcrec/secp256k1/field64_test.go Outdated
Comment thread dcrec/secp256k1/field64_test.go Outdated
@davecgh

davecgh commented Jun 22, 2026

Copy link
Copy Markdown
Member

Benchmark comparison with benchstat on 10 runs on a Ryzen 7 5800X3D (rebased on #3709):

name                          old time/op    new time/op    delta
FieldNegate                     3.92ns ± 5%    2.49ns ± 4%  -36.60%  (p=0.000 n=10+9)
FieldAdd                        4.31ns ± 2%    3.77ns ± 4%  -12.46%  (p=0.000 n=10+10)
FieldMul                        57.8ns ± 3%    19.0ns ± 2%  -67.11%  (p=0.000 n=10+10)
FieldSqrt                       9.55µs ± 4%    4.03µs ± 1%  -57.76%  (p=0.000 n=10+10)
FieldSquare                     33.9ns ± 2%    14.2ns ± 3%  -58.20%  (p=0.000 n=10+10)
FieldInverse                    9.57µs ± 2%    4.17µs ± 3%  -56.39%  (p=0.000 n=10+10)
FieldIsGtOrEqPrimeMinusOrder    8.69ns ± 4%    0.33ns ± 7%  -96.15%  (p=0.000 n=10+8)

name                          old alloc/op   new alloc/op   delta
FieldNegate                      0.00B          0.00B          ~     (all equal)
FieldAdd                         0.00B          0.00B          ~     (all equal)
FieldMul                         0.00B          0.00B          ~     (all equal)
FieldSqrt                        0.00B          0.00B          ~     (all equal)
FieldSquare                      0.00B          0.00B          ~     (all equal)
FieldInverse                     0.00B          0.00B          ~     (all equal)
FieldIsGtOrEqPrimeMinusOrder     0.00B          0.00B          ~     (all equal)

name                          old allocs/op  new allocs/op  delta
FieldNegate                       0.00           0.00          ~     (all equal)
FieldAdd                          0.00           0.00          ~     (all equal)
FieldMul                          0.00           0.00          ~     (all equal)
FieldSqrt                         0.00           0.00          ~     (all equal)
FieldSquare                       0.00           0.00          ~     (all equal)
FieldInverse                      0.00           0.00          ~     (all equal)
FieldIsGtOrEqPrimeMinusOrder      0.00           0.00          ~     (all equal)

@valery-osheter-cb

Copy link
Copy Markdown
Author

Thank you for your review.
I absolutely agree with your unflattering assessment of LLM.
I implemented saturated Field64 in both pure Go and ASM about one year ago without AI.
Recently I had an Idea to publish it, and I saw the best way is to make it as an alternative (optimized version) for existing implementation(s). As you can see, my English is poor, that why I decided to use AI to polish, but it seems bad idea, because it added bad comments. Do you want me to push my original handmade code in different branch?

@davecgh

davecgh commented Jun 22, 2026

Copy link
Copy Markdown
Member

No problem. As long as you don't have a problem with it, I would be fine with going through and polishing the comments. I don't mind doing it myself. I generally don't because I know that a lot of people tend to get upset when you start rewording their comments.

The code is the most important part and it is a rather large improvement over the existing implementation. The new type is also significantly easier to use because it doesn't require manually tracking the magnitude and sprinkling in calls to Normalize at the right times to avoid getting wrong results.

@valery-osheter-cb

Copy link
Copy Markdown
Author

I just pushed the commit with the fixes addressing your notes:

  • Reordered methods to match field.go and tidied exported doc comments.
  • Removed Normalize() and magnitude tracking (always fully reduced).
  • Added MulBy2–MulBy8, backed by a generic MulInt.
  • Reworked edge-case tests for the 4x64 limbs; removed TestField64ReductionCarry.
  • Audited for constant time, plus small fixes.

@davecgh davecgh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I've reviewed them and they look great. The comments are significantly improved, the non-constant time paths are now constant time as they should be, the addition of the MulByX is clean, and the conversion of MulInt to use a temporary + full multiply ensures that it continues to work with all ints for any consumers.

I still see various comment related things, but, as mentioned, I don't mind polishing those up separately.

I still need to go through in detail to assert there are no potential unhandled overflows and the generated assembly is constant time which I probably won't be able to get to for a couple of days since I'm working on some other things at the moment and reviewing this on my own time, but otherwise, I only spotted two small remaining things which I commented on inline.

Comment thread dcrec/secp256k1/field64_test.go Outdated
Comment thread dcrec/secp256k1/field64_test.go Outdated
@valery-osheter-cb

Copy link
Copy Markdown
Author

I did the cosmetic fixes in the tests that you noticed

Comment thread dcrec/secp256k1/field64.go Outdated
p1, c = bits.Add64(p1, h0, 0)
p2, c = bits.Add64(p2, h1, c)
p3, c = bits.Add64(p3, h2, c)
p4 := h3 + c

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserting overflow conditions:

Max possible value of h3 is (2^64 - 1)*(2^64 - 1) = 2^128 - 2^65 + 1, so the high word will be a max of 2^64-2. Max value of c is 1. Overflow not possible because 2^64-2+1 = 2^64-1 < 2^64.

Same for the other cases below.

✔️

t1, carry = bits.Add64(t1, x[1], carry)
t2, carry = bits.Add64(t2, x[2], carry)
t3, carry = bits.Add64(t3, x[3], carry)
t4 += carry

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserting overflow conditions:

Max possible value of t4 before the carries is (2^64 - 1)*(2^32 + 977), so the high word will be a max of 2^32 + 976 < 2^33 and a max of +2 added for carries. Overflow not possible.

✔️

Comment thread dcrec/secp256k1/field64.go
@valery-osheter-cb

Copy link
Copy Markdown
Author

I removed MulBy{5,6,7} and added the test for MulBy{2,3,4,8}
Please suggest next steps.

@davecgh

davecgh commented Jun 23, 2026

Copy link
Copy Markdown
Member

Thanks for the updates. I believe the last things are just to squash the last few commits and rebase over master.

Also, if you would, please fix the description on the secp256k1: Refine FieldVal64 API and tests. commit. It's missing a blank line after the title line which makes the entire message show as if it were the summary in the oneline output as can be seen in the following output:

| * 4bc1f00b5 seck256k1: cosmetic changes in tests of Field64
| * 566f6edf8 seck256k1: cosmetic changes in tests of Field64
| * 7a8b3dec1 secp256k1: Refine FieldVal64 API and tests. Reorder methods to mirror field.go and tidy the exported doc comments so the two field backends stay consistent. Drop Normalize and magnitude tracking since FieldVal64 is always fully reduced, add MulBy2 through MulBy8 for the common small-constant multiplications, and back them with a generic MulInt. Rework the edge-case tests to exercise the 4x64 limbs rather than the 10x26 word boundaries inherited from FieldVal, remove the inconsistent TestField64ReductionCarry, and ensure the operations remain constant time, along with assorted small fixes.
| * 66df468c8 secp256k1: Add benchmarks for FieldVal64
| * f2441a82c secp256k1: Add tests for FieldVal64
| * 65e4f8f00 secp256k1: Add pure Go FieldVal64 (4x64) field element

Other than that, I have a bit more review of the generated asm to assert everything is being generated with constant time. I expect it should be fine based on the Go code, but I make it a point to double check that actual generated code.

Introduce FieldVal64, an alternative secp256k1 field element that packs
the value into four uint64 limbs (base 2^64) and stays fully reduced
after every operation.

AI-assisted: developed with the help of an AI coding assistant; all code
reviewed and tested by the author.
Add a test suite for FieldVal64 mirroring the existing FieldVal tests:
serialization round-trips, arithmetic (add, negate, multiply, square),
normalization, inverse, square root, and the bit predicates, plus
randomized checks. Cases that only apply to the unnormalized FieldVal
representation are adjusted since FieldVal64 is always fully reduced.

AI-assisted: developed with the help of an AI coding assistant; all code
reviewed and tested by the author.
Add micro-benchmarks for the FieldVal64 multiply, square, inverse, and
related operations to track the pure Go performance.

AI-assisted: developed with the help of an AI coding assistant; all code
reviewed and tested by the author.
Comment thread dcrec/secp256k1/field64_test.go
Reorder methods to mirror field.go and tidy the exported doc comments
so the two field backends stay consistent.
Drop Normalize and magnitude tracking since FieldVal64 is always fully
reduced, add MulBy2, MulBy3, MulBy4, and MulBy8 for the common
small-constant multiplications, and back them with a generic MulInt.
Rework the edge-case tests to exercise the 4x64 limbs rather than the
10x26 word boundaries inherited from FieldVal, remove the inconsistent
TestField64ReductionCarry, and ensure the operations remain constant
time, along with assorted small fixes.
@valery-osheter-cb

Copy link
Copy Markdown
Author

My implementation of Bernstein-Yang "safegcd" constant-time modular inversion is about 25% faster than current implementation that is based on Ferma's little theorem. Do you want me to publish?

@davecgh davecgh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to go to me. I've manually reviewed all generated assembler to ensure constant time and asserted there are no unhandled cases where potential overflow could occur. That is, of course, in addition to reviewing all of the code and tests.

Thanks a ton for your efforts. Very nice work!

@davecgh

davecgh commented Jun 24, 2026

Copy link
Copy Markdown
Member

My implementation of Bernstein-Yang "safegcd" constant-time modular inversion is about 25% faster than current implementation that is based on Ferma's little theorem. Do you want me to publish?

That would be great (in a separate PR though please)! I saw the paper a while back and had it on my list of things to implement at some point. I expect it is much simpler to implement with the new saturated field since it doesn't have to worry about dealing with denormalized forms and conversion.

That would help speed up signing and some other ways inversion is used a bit. I don't imagine it will have any impact on signature verification though because, IIRC, we avoid the expensive inversion to affine that is usually needed during verification by staying in Jacobian and taking advantage of secp256k1's cofactor of 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants