fix(consensus,framework,actuator): use Locale.ROOT for case-insensitive#6698
fix(consensus,framework,actuator): use Locale.ROOT for case-insensitive#6698halibobo1205 wants to merge 2 commits intotronprotocol:developfrom
Conversation
String.toLowerCase()/toUpperCase() without an explicit Locale uses Locale.getDefault(), which on Turkish (tr) or Azerbaijani (az) systems folds 'I' to dotless-ı (U+0131) instead of 'i' (U+0069). Changes: - Fix all toLowerCase()/toUpperCase() calls to use Locale.ROOT - Enable the ErrorProne StringCaseLocaleUsage checker at ERROR level to prevent future regressions at compile time - Add one-time data migration (MigrateTurkishKeyHelper) to normalize all Turkish legacy keys (ı → i) at startup.
| addresses[i] = randomBytes(21); | ||
| String turkishLower = accountIds[i].toLowerCase(TURKISH); | ||
| turkishKeys[i] = turkishLower.getBytes(StandardCharsets.UTF_8); | ||
| accountIdIndexStore.put(turkishKeys[i], new BytesCapsule(addresses[i])); |
There was a problem hiding this comment.
The test design is really crisp — 14 realistic accountIds covering both divergent (I-containing) and non-divergent names, symmetric pre-/post-migration assertions, and using Locale.forLanguageTag("tr") to faithfully reproduce legacy behavior is lovely. 🎯
One branch that doesn't get exercised today: the if (ArrayUtils.isEmpty(existing)) guard in MigrateTurkishKeyHelper.doWork(). Because this setup only seeds Turkish-form keys and never seeds a pre-existing ROOT-form key for the same accountId, the conflict-recovery path (where a ROOT entry already exists and the legacy Turkish value is silently dropped) is always skipped. That branch encodes the conservative conflict policy you describe in the PR body — would you consider adding a small case that:
- puts a ROOT-form key for
"ISSRWallet"with one address, - then puts the Turkish-form key
"ıssrwallet"with a different address, - runs
doWork(), - asserts the ROOT value wins and the Turkish-form key is gone?
It would also make the intentional policy visible to future readers who might otherwise wonder whether a "keep legacy" policy would be safer.
There was a problem hiding this comment.
Thanks — I considered this carefully and want to keep the test surface as-is.
The conflict branch only fires when the same accountId is committed to the DB under both Turkish-form and ROOT-form. For that state to exist on any deployed node, the node would have had to either :
(a) switch locale mid-operation while remaining in consensus,
(b) restore a tr-locale snapshot onto a ROOT-locale node.
Both of which would have produced contradictory SetAccountIdActuator.validate() verdicts versus the rest of the network and forked the chain. Mainnet and Nile have not observed such a fork, so no deployed DB contains the state this branch handles.
The branch itself stays in the code as a defensive check (snapshot-stitching tools, debugging operations, etc., should still not corrupt ROOT entries), but exercising it in a unit test would document a behavior that production data can never actually reach. I'd rather keep the test suite focused on the paths that real DBs traverse.
Summary
Fix a locale-dependent code hygiene issue: no-arg
String.toLowerCase()/toUpperCase()calls use the JVM default locale, and under a minority of locales (tr/az) the folding of'I'diverges from every other locale. Switching uniformly toLocale.ROOTdecouples the behavior from the deployment environment, backed by a compile-time guard and a one-time data normalization.This PR does three things:
toLowerCase()/toUpperCase()call with itsLocale.ROOTcounterpart so that string folding is independent of the JVM's startup locale.StringCaseLocaleUsagechecker atERRORlevel, and add a customStringCaseLocaleUsageMethodRefchecker to cover theString::toLowerCasemethod-reference form that upstream misses. Any future no-arg call fails compilation.MigrateTurkishKeyHelperruns once atManager.init()to normalize any non-ROOT legacy keys that may exist inAccountIdIndexStore.What this fix really does
Make the case-insensitive index behavior a pure function of the input bytes, no longer implicitly dependent on the JVM's startup locale. Before this fix,
AccountIdIndexStorewas, in principle, "case-insensitive relative to whichever locale this JVM happens to use" rather than truly locale-independent. After the fix it is the latter.This is a data-structure-level correctness convergence — remove an implicit environment dependency and align the implementation with the intended semantics.
Scope
Only one site reaches persistent storage:
AccountIdIndexStore.getLowerCaseAccountId— the lowercased result becomes a DB key. All other touched call sites (DB engine selection, disabled-API list, log topic classification, hex display, OS detection, etc.) are in-process runtime comparisons — they neither write to disk nor cross nodes.The input domain itself is tightly constrained: input to
setAccountIdis gated byvalidReadableBytesto printable ASCII (0x21–0x7E). Within this domain, the locale divergence in lowercase only manifests on a single character; every other character (letters, digits, symbols) folds identically under any locale.Why
MigrateTurkishKeyHelperis bundled inA code-only fix does not fully cover "nodes that were ever started under a non-default locale" — such nodes may carry legacy-format keys in their DB. The cleanest way to keep the code fix and the data state in sync is a one-shot normalization at startup.
Design notes:
MoveAbiHelperpattern, gated by aTURKISH_KEY_MIGRATION_DONEflag inDynamicPropertiesStoreso it runs at most once.AccountIdIndexStoreis a sparse index with very low cardinality — only 14 entries on mainnet today; scan cost is negligible).For nodes that were never affected, the scan yields zero candidates and the helper is a no-op.
Why the simplified normalization strategy
In principle one could enumerate every possible legacy variant of each ROOT-canonical key and merge them, but that enters a combinatorial space (per-key variant count can reach 2^k, where k is the occurrence count of the divergent character). Even with full enumeration, picking the correct value across conflicts requires knowing the historical insertion order — information the DB layer does not retain.
The conservative strategy chosen here:
Impact
Note for downstream fork chains
A downstream fork chain needs to treat this PR as a non-transparent upgrade only if both of the following hold:
SetAccountIdtransaction whose accountId contains a character that triggers the locale divergence (e.g.I).On such a chain, the
AccountIdIndexStorehistorical state was produced under non-ROOT folding; after this PR,has()lookups for the same accountId may yield different verdicts, which requires a coordinated hard-fork activation height and operator-defined historical data migration strategy.Mainnet and Nile are not in this category: condition (1) has never held historically.
Release scope