diff --git a/assets/contributors.csv b/assets/contributors.csv
index 6c417f27e0..a8249390c3 100644
--- a/assets/contributors.csv
+++ b/assets/contributors.csv
@@ -120,3 +120,4 @@ Parichay Das,,parichaydas,parichaydas,,
 Johnny Nunez,NVIDIA,johnnynunez,johnnycano,,
 Raymond Lo,NVIDIA,raymondlo84,raymondlo84,,
 Kavya Sri Chennoju,Arm,kavya-chennoju,kavya-sri-chennoju,,
+Akash Malik,Arm,akashmalik19973,akash-malik-a65bab219,,
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_index.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_index.md
new file mode 100644
index 0000000000..83dcc558d5
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_index.md
@@ -0,0 +1,60 @@
+---
+title: Post-Quantum Cryptography on Arm Cortex-M4
+
+description: Learn how to implement and test post-quantum cryptographic algorithms on Arm Cortex-M4 microcontrollers using the pqm4 library.
+
+minutes_to_complete: 120
+
+who_is_this_for: This is an advanced topic for software developers and cryptography enthusiasts who want to implement and test post-quantum cryptographic algorithms on Arm Cortex-M4 microcontrollers.
+
+learning_objectives:
+    - Understand the design goals of the pqm4 library.
+    - Set up the development environment for Arm Cortex-M4.
+    - Implement and test post-quantum cryptographic algorithms.
+    - Benchmark and profile cryptographic implementations.
+    - Integrate new cryptographic schemes into the pqm4 framework.
+
+prerequisites:
+    - An Arm Cortex-M4 development board (for example, NUCLEO-L4R5ZI or STM32F4 Discovery)
+    - A computer with Python 3.8 or higher
+    - The Arm GNU Toolchain (arm-none-eabi)
+    - stlink or OpenOCD for flashing binaries
+    - QEMU 5.2 or higher (optional, for simulation)
+
+author:
+    - Akash Malik
+    - Odin Shen
+
+### Tags
+skilllevels: Advanced
+subjects:
+    - Performance and Architecture
+    - Security
+armips:
+    - Cortex-M
+operatingsystems:
+    - Linux
+    - macOS
+tools_software_languages:
+    - C
+    - Python
+    - ARM toolchain
+    - stlink
+    - QEMU
+
+further_reading:
+    - resource:
+        title: PQCRYPTO Project
+        link: https://pqcrypto.eu.org
+        type: website
+    - resource:
+        title: PQClean GitHub Repository
+        link: https://github.com/PQClean/PQClean
+        type: repository
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1
+layout: "learningpathall"
+learning_path_main_page: "yes"
+---
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_next-steps.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_next-steps.md
new file mode 100644
index 0000000000..e20dfa6d43
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21
+title: "Next Steps"
+layout: "learningpathall"
+---
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/adding-new-schemes.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/adding-new-schemes.md
new file mode 100644
index 0000000000..8dbdb40672
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/adding-new-schemes.md
@@ -0,0 +1,113 @@
+---
+title: Adding New Schemes and Implementations to pqm4
+
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Adding New Schemes and Implementations
+
+The pqm4 build system makes it straightforward to add new schemes and implementations, provided they follow the **NIST/SUPERCOP/PQClean API**. Follow these steps to add a scheme such as NewHope-512-CPA-KEM to pqm4, starting from its reference implementation:
+
+
+#### Step 1: Download the Scheme Implementation
+
+Download the NewHope implementation from GitHub:
+```bash
+git clone https://github.com/newhopecrypto/newhope.git
+```
+Navigate to the reference implementation:
+```bash
+cd newhope/ref
+```
+This directory contains the source files you need for the integration.
+
+#### Step 2: Create the Scheme Directory
+Inside the pqm4 directory, create a directory for the scheme:
+```bash
+mkdir -p crypto_kem/newhope512cpa/m4
+```
+
+#### Step 3: Copy the Implementation Files
+Copy the required files from `newhope/ref` into `crypto_kem/newhope512cpa/m4`, including:
+* Core algorithm files (`.c`, `.h`)
+* Polynomial arithmetic and NTT operations
+* CPA KEM logic (`cpakem.c`, `cpapke.c`)
+
+Do not copy:
+* `randombytes.c`
+* `PQCgenKAT_kem.c`
+* Standalone test and benchmark files (`speed.c`, `test.c`)
+* Object files (`.o`)
+
+#### Step 4: Create the API File
+
+Create a file named **api.h** at the following path:
+
+```text
+crypto_kem/newhope512cpa/m4/api.h
+```
+
+Define `CRYPTO_SECRETKEYBYTES`, `CRYPTO_PUBLICKEYBYTES`, `CRYPTO_CIPHERTEXTBYTES`, and `CRYPTO_BYTES` using the values from the **params.h** file in the NewHope reference implementation, and declare the required functions **crypto_kem_keypair**, **crypto_kem_enc**, and **crypto_kem_dec**, which are defined in the copied CPA KEM source.
+
+An example `api.h` file:
+
+```c
+#ifndef API_H
+#define API_H
+/* NewHope-512-CPA-KEM sizes; params.h must use NEWHOPE_N = 512 */
+#define CRYPTO_SECRETKEYBYTES 896
+#define CRYPTO_PUBLICKEYBYTES 928
+#define CRYPTO_CIPHERTEXTBYTES 1088
+#define CRYPTO_BYTES 32
+
+#define CRYPTO_ALGNAME "NewHope512-CPAKEM"
+
+int crypto_kem_keypair(unsigned char *pk, unsigned char *sk);
+int crypto_kem_enc(unsigned char *ct, unsigned char *ss, const unsigned char *pk);
+int crypto_kem_dec(unsigned char *ss, const unsigned char *ct, const unsigned char *sk);
+
+#endif
+```
+#### Step 5: Handle Randomness
+
+* Do not include your own `randombytes.c`.
+* pqm4 provides its own `randombytes` implementation.
+
+#### Step 6: Build and Test the Scheme
+
+```bash
+make clean && make -j4 PLATFORM=<platform>
+python3 test.py -p <platform> --uart <port> newhope512cpa
+```
+Expected output from `test.py` includes:
+```text
+SUCCESSFUL
+```
+
+### Using Optimized Cryptographic Functions
+
+- **FIPS202 (Keccak, SHA3, SHAKE)**: Use the optimized Keccak code available in `mupq/common/fips202.h`.
+- **SHA-2**: Use the C implementations available in `sha2.h`.
+- **AES**: Use the assembly-optimized implementations available in `common/aes.h`.
+
+For the NewHope-512-CPA-KEM implementation above, we use the optimized Keccak code in `mupq/common/fips202.h`.
+
+### Contributing Implementations
+
+- For reference implementations, contribute to [PQClean](https://github.com/PQClean/PQClean).
+- For optimized C implementations, contribute to [mupq](https://github.com/mupq/mupq).
+- For Cortex-M4 optimized implementations, contribute directly to pqm4.
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/benchmarks.png b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/benchmarks.png
new file mode 100644
index 0000000000..98fef3f33d
Binary files /dev/null and b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/benchmarks.png differ
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/introduction.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/introduction.md
new file mode 100644
index 0000000000..a9e26d0598
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/introduction.md
@@ -0,0 +1,60 @@
+---
+title: Introduction to pqm4 and Post-Quantum Cryptography
+
+weight: 2
+
+layout: learningpathall
+---
+
+### Post-Quantum Cryptography
+
+The [pqm4](https://github.com/mupq/pqm4) framework is a benchmarking and implementation suite for post-quantum cryptography (PQC) on Arm Cortex-M4 microcontrollers. It originated from the [PQCRYPTO](https://pqcrypto.eu.org) project, funded by the European Commission, and has since evolved into a widely used platform for evaluating PQC in embedded environments.
+
+Once sufficiently large quantum computers exist, Shor's algorithm will break widely used public-key schemes such as RSA and elliptic curve cryptography. This is a particular challenge for embedded systems, which often remain deployed for 10 to 20 years and must be designed with long-term security in mind.
+
+Post-quantum cryptography is expected to play a critical role in securing a wide range of embedded applications, including secure firmware updates, device authentication, encrypted communication (for example, IoT sensor-to-cloud), and integrity protection for edge AI models. These use cases require cryptographic mechanisms that remain secure over the lifetime of the device, even in the presence of future quantum adversaries.
+
+To address this, NIST has standardized new PQC algorithms, including ML-KEM for key establishment and ML-DSA for digital signatures. Compared with classical cryptography, however, these algorithms use much larger keys, ciphertexts, and signatures and can demand considerably more memory and code size, which makes deploying them on constrained microcontrollers non-trivial.
+
+The pqm4 framework addresses this by letting developers evaluate PQC implementations under real embedded constraints. It provides standardized benchmarks for performance (cycle counts), memory usage (stack), and code size, along with optimized implementations tailored to the Cortex-M4 architecture. This allows developers to move beyond theoretical analysis and make informed decisions about deploying PQC in real-world embedded systems.
+
+
+### Two Public-Key Primitives
+
+Two public-key primitives are particularly fundamental to modern cryptography:
+- Key encapsulation mechanisms (KEMs)
+- Digital signature algorithms (DSAs)
+
+KEMs allow two parties to establish a shared secret over an insecure channel, which is the foundation of encrypted communication in protocols such as TLS. Digital signatures provide authentication and integrity, ensuring that a message genuinely comes from its claimed sender and has not been tampered with. Together, these primitives underpin everything from secure web browsing to firmware updates on embedded devices.
+
+Post-quantum cryptography replaces classical algorithms with new designs built on mathematical problems that are believed to remain hard even for quantum computers. Among the various primitives, KEMs and signatures are the most critical for most applications and have been the focus of NIST's standardization effort. KEMs are particularly urgent because of "harvest now, decrypt later" attacks: adversaries can record encrypted communications today and decrypt them once quantum computers become available.
+
+This makes protecting data in transit an immediate priority, even though quantum computers may still be years away. Unlike classical public-key cryptography, which relies almost entirely on integer factorization and discrete logarithms, PQC draws on a variety of foundations: lattices, hash functions, error-correcting codes, multivariate polynomials, and more. This diversity means that different PQC schemes have very different performance characteristics and trade-offs.
+
+In this learning path, we focus on KEM implementations on Cortex-M4.
+
+### Benefits of pqm4 for Arm Developers
+
+- Efficient evaluation of post-quantum cryptographic algorithms on Arm Cortex-M4 microcontrollers.
+- Accurate measurement of execution cycles, stack usage, and code size on real embedded hardware.
+- A standardized framework for testing, benchmarking, and comparing multiple cryptographic implementations.
+- Simplified integration and experimentation with new cryptographic schemes and optimizations for Arm platforms.
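+
+A KEM exposes three operations: key generation, encapsulation, and decapsulation (in pqm4, `crypto_kem_keypair`, `crypto_kem_enc`, and `crypto_kem_dec`). To make the KEM flow described above concrete, the following Python sketch models these three operations with a toy, classical Diffie-Hellman-style KEM. The parameters and function names are illustrative assumptions. The construction is **not** post-quantum and not secure; it only shows that encapsulation and decapsulation derive the same shared secret.
+
+```python
+import hashlib
+import secrets
+
+# Toy parameters (illustration only): a Mersenne prime modulus and a small base.
+# This classical construction is NOT post-quantum and NOT secure.
+P = 2**127 - 1
+G = 3
+
+
+def kdf(element: int) -> bytes:
+    """Hash a group element into a 32-byte shared secret."""
+    return hashlib.sha3_256(element.to_bytes(16, "big")).digest()
+
+
+def keypair():
+    """Toy analogue of crypto_kem_keypair: returns (pk, sk)."""
+    sk = secrets.randbelow(P - 2) + 1
+    return pow(G, sk, P), sk
+
+
+def encaps(pk: int):
+    """Toy analogue of crypto_kem_enc: returns (ct, ss) for the holder of pk."""
+    r = secrets.randbelow(P - 2) + 1
+    return pow(G, r, P), kdf(pow(pk, r, P))
+
+
+def decaps(ct: int, sk: int) -> bytes:
+    """Toy analogue of crypto_kem_dec: recovers ss from ct using sk."""
+    return kdf(pow(ct, sk, P))
+
+
+pk, sk = keypair()
+ct, ss_sender = encaps(pk)
+ss_receiver = decaps(ct, sk)
+print("shared secrets match:", ss_sender == ss_receiver)
+```
+
+Running it prints `shared secrets match: True`. ML-KEM replaces the modular exponentiation with lattice arithmetic over polynomial rings, but it exposes the same keypair, encapsulate, and decapsulate contract that the pqm4 test binaries exercise.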
+
+### Design Goals
+
+The primary design goals of the pqm4 library are:
+
+- Automated functional testing and test vector generation, with validation against reference implementations.
+- Comprehensive benchmarking, including speed, stack usage, and code size analysis.
+- Profiling of the time spent in symmetric primitives such as SHA-2, SHA-3, and AES.
+- Easy integration of clean and optimized implementations (for example, from [PQClean](https://github.com/PQClean/PQClean)) and of new schemes.
+
+### Scope of pqm4
+
+The pqm4 library includes schemes that are:
+
+- Standardized by NIST in FIPS 203, FIPS 204, or FIPS 205.
+- Selected for standardization by NIST.
+- Part of the 4th round of the NIST PQC standardization process.
+- Part of the first round of additional signatures of the NIST PQC standardization process.
+- Part of the second round of the KpqC competition.
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/running-tests-and-benchmarks.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/running-tests-and-benchmarks.md
new file mode 100644
index 0000000000..7c38b4e2fb
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/running-tests-and-benchmarks.md
@@ -0,0 +1,286 @@
+---
+title: Running Tests and Benchmarks with pqm4
+
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+### Testing and Benchmarking
+
+After you build pqm4, several binaries are generated for each scheme and implementation.
+These binaries are used to verify correctness, measure performance, and analyze resource usage.
+
+For example, for the **m4fspeed** implementation of a KEM such as **ML-KEM-768**, the following binaries are generated:
+
+#### 1. Test Binary
+
+```bash
+bin/crypto_kem_ml-kem-768_m4fspeed_test.bin
+```
+This binary verifies that the scheme works correctly.
+
+For KEMs, it:
+* Generates a keypair
+* Performs encapsulation
+* Performs decapsulation
+* Verifies that both parties derive the same shared secret
+
+It also checks failure cases such as:
+* An invalid secret key
+* An invalid ciphertext
+
+Expected output:
+
+```text
+==========================
+DONE key pair generation!
+DONE encapsulation!
+DONE decapsulation!
+OK KEYS
+
++
+...
+OK invalid sk_a
+
++
+OK invalid ciphertext
+
++
+#
+```
+
+#### 2. Speed Binary
+
+```bash
+bin/crypto_kem_ml-kem-768_m4fspeed_speed.bin
+```
+This binary measures the execution time, in CPU cycles, of:
+* crypto_kem_keypair
+* crypto_kem_enc
+* crypto_kem_dec
+
+This is used to evaluate performance on embedded hardware.
+
+Example output (the cycle counts are placeholders):
+
+```text
+==========================
+keypair cycles:
+123456
+
+encaps cycles:
+234567
+
+decaps cycles:
+210000
+=
+```
+
+#### 3. Hashing Binary
+
+```bash
+bin/crypto_kem_ml-kem-768_m4fspeed_hashing.bin
+```
+This binary measures how many cycles are spent in symmetric primitives:
+* SHA-2
+* SHA-3
+* AES
+
+This helps you analyze how much of the algorithm's cost comes from symmetric cryptography.
+
+Example output (values are illustrative):
+
+```text
+==========================
+keypair hash cycles:
+50000
+
+encaps hash cycles:
+80000
+
+decaps hash cycles:
+75000
+=
+```
+
+#### 4. Stack Binary
+
+```bash
+bin/crypto_kem_ml-kem-768_m4fspeed_stack.bin
+```
+This binary measures the stack usage of:
+* Key generation
+* Encapsulation
+* Decapsulation
+
+Note: On some boards, stack measurement **may not work correctly** due to platform-specific memory layout.
+
+Note: Memory allocated outside the measured functions (for example, public keys and ciphertexts) is not included.
+
+Example output (values are illustrative):
+
+```text
+==========================
+keypair stack usage:
+2048
+
+encaps stack usage:
+3072
+
+decaps stack usage:
+2800
+#
+```
+
+#### 5. Test Vectors Binary
+
+```bash
+bin/crypto_kem_ml-kem-768_m4fspeed_testvectors.bin
+```
+This binary generates deterministic test vectors using a fixed random seed.
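+
+Test vectors are reproducible because the random bytes a scheme consumes come from a generator seeded with a known, fixed value instead of a hardware RNG. The Python sketch below illustrates the idea with a hypothetical `DeterministicRandomBytes` class that expands a seed and a call counter with SHAKE-128. This construction is an assumption for illustration; pqm4's own test-vector generator may work differently.
+
+```python
+import hashlib
+
+
+class DeterministicRandomBytes:
+    """Hypothetical seeded byte source (not pqm4 code); shows why vectors repeat."""
+
+    def __init__(self, seed: bytes):
+        self._seed = seed
+        self._counter = 0
+
+    def randombytes(self, n: int) -> bytes:
+        # Expand seed || counter with SHAKE-128, then advance the counter so
+        # each call returns different, but fully reproducible, bytes.
+        block = self._seed + self._counter.to_bytes(8, "little")
+        self._counter += 1
+        return hashlib.shake_128(block).digest(n)
+
+
+board = DeterministicRandomBytes(bytes(32))  # stands in for the board binary
+host = DeterministicRandomBytes(bytes(32))   # stands in for the host binary
+
+# Same seed and same sequence of calls, so every draw matches.
+for _ in range(3):
+    print(board.randombytes(16) == host.randombytes(16))
+```
+
+Because both sides consume random bytes in the same order from the same seed, their outputs can be compared byte for byte. Such a generator is deterministic by design, so it belongs only in test code and must never replace a real RNG in production firmware.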
+
+These vectors are used to:
+* Validate correctness
+* Compare different implementations
+
+#### 6. Host Test Vectors
+
+```bash
+bin-host/crypto_kem_ml-kem-768_m4fspeed_testvectors
+```
+
+This binary runs on the host (PC) and generates the same deterministic test vectors for comparison.
+
+### Running Binaries Manually
+
+To test a binary on the board:
+
+#### Flash the binary
+
+```bash
+st-flash write bin/<binary>.bin 0x8000000
+```
+
+Example:
+
+```bash
+st-flash write bin/crypto_kem_ml-kem-768_m4fspeed_test.bin 0x8000000
+```
+#### Read output from the board
+
+```bash
+python3 hostside/host_unidirectional.py
+```
+Note: After starting the script, press the RESET button on the board to see the output.
+
+
+### Automated Testing and Benchmarking
+
+pqm4 provides Python scripts to automate testing and benchmarking.
+
+#### 1. Run Functional Tests
+
+```bash
+python3 test.py -p <platform> --uart <port>
+```
+
+Example for the NUCLEO-L476RG board:
+
+```bash
+python3 test.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
+```
+
+This will:
+* **Flash** the test binaries
+* **Run** them on the board
+* **Check** correctness automatically
+
+Expected output:
+```text
+ml-kem-768 - m4fspeed SUCCESSFUL
+ml-kem-768 - m4fstack SUCCESSFUL
+ml-kem-768 - clean SUCCESSFUL
+test: 100%|█████████████████████████████████████████████| 3/3 [00:12<00:00, 4.29s/it, ml-kem-768 - clean]
+
+```
+
+
+#### 2. Run Test Vectors
+
+```bash
+python3 testvectors.py -p <platform> --uart <port>
+```
+Example for the NUCLEO-L476RG board:
+
+```bash
+python3 testvectors.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
+```
+This will:
+* **Generate** test vectors on the board
+* **Compare** them with the host-side results
+
+Expected output:
+
+```text
+ml-kem-768 - m4fspeed SUCCESSFUL
+ml-kem-768 - m4fstack SUCCESSFUL
+ml-kem-768 - clean SUCCESSFUL
+
+test: 100%|█████████████████████████████████████████████| 3/3 [00:12<00:00, 4.29s/it, ml-kem-768 - clean]
+```
+
+#### 3. Run Benchmarks
+
+```bash
+python3 benchmarks.py -p <platform> --uart <port>
+```
+Example for the NUCLEO-L476RG board:
+
+```bash
+python3 benchmarks.py -p nucleo-l476rg --uart /dev/tty.usbmodemXXXX ml-kem-768
+```
+This will:
+* Run the speed and stack benchmarks
+* Store the results in **benchmarks/**
+
+Expected output:
+
+```text
+speed: 33%|███████████████▍ | 1/3 [00:20<00:40, 20.00s/it, ml-kem-768 - m4fspeed]
+speed: 66%|██████████████████████████▊ | 2/3 [00:40<00:20, 20.00s/it, ml-kem-768 - m4fstack]
+speed: 100%|██████████████████████████████| 3/3 [01:00<00:00, 20.00s/it, ml-kem-768 - clean]
+```
+
+Most schemes have multiple implementations.
+They are named as follows:
+
+- clean: the clean reference implementation from PQClean.
+- ref: the reference implementation submitted to NIST (will be replaced by clean in the long term).
+- opt: an optimized implementation in plain C (for example, the optimized implementation submitted to NIST).
+- m4: an implementation with Cortex-M4-specific optimizations (typically in assembly).
+- m4f: an implementation with Cortex-M4F-specific optimizations (typically assembly using floating-point registers); variants such as m4fspeed and m4fstack target speed and stack usage, respectively.
+
+The benchmark results are summarized in a **benchmarks.csv** file, for example:
+
+![benchmarks.csv](./benchmarks.png)
+
+Note: On some boards, stack measurement may not work correctly due to platform-specific memory layout.
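+
+The speed and hashing binaries report raw cycle counts as label and value pairs. If you capture that output, for example by redirecting `host_unidirectional.py` to a file, a short script can extract the numbers and convert them to wall-clock time. The Python sketch below assumes the output format shown in the Speed Binary and Hashing Binary sections. The function names are hypothetical, and the 24 MHz core clock in the example is only a placeholder, so substitute your board's actual clock frequency.
+
+```python
+def parse_cycle_output(text: str) -> dict:
+    """Map labels such as 'keypair' to integer cycle counts (hypothetical helper)."""
+    results = {}
+    label = None
+    for raw in text.splitlines():
+        line = raw.strip()
+        if line.endswith("cycles:"):
+            # "encaps cycles:" -> "encaps", "keypair hash cycles:" -> "keypair hash"
+            label = line[: -len("cycles:")].strip()
+        elif label is not None and line.isdigit():
+            results[label] = int(line)
+            label = None
+    return results
+
+
+def cycles_to_ms(cycles: int, clock_hz: float) -> float:
+    """Convert a cycle count to milliseconds at the given core clock."""
+    return cycles * 1000.0 / clock_hz
+
+
+sample = """==========================
+keypair cycles:
+123456
+
+encaps cycles:
+234567
+
+decaps cycles:
+210000
+="""
+
+CLOCK_HZ = 24_000_000  # placeholder; use your board's real core clock
+for name, cycles in parse_cycle_output(sample).items():
+    print(f"{name}: {cycles} cycles = {cycles_to_ms(cycles, CLOCK_HZ):.2f} ms")
+```
+
+Converting to milliseconds is useful for latency budgets. When comparing implementations, the raw cycle counts are usually the more meaningful number.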
+
+### Using QEMU (Optional)
+
+For the mps2-an386 platform (built with `make -j4 PLATFORM=mps2-an386`), binaries can be executed in QEMU:
+
+```bash
+qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/<binary>.elf
+```
+
+Example:
+
+```bash
+qemu-system-arm -M mps2-an386 -nographic -semihosting -kernel elf/crypto_kem_ml-kem-512_m4fspeed_test.elf
+```
+
+To exit QEMU:
+* Press Ctrl+A, then X.
\ No newline at end of file
diff --git a/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/setup-installation.md b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/setup-installation.md
new file mode 100644
index 0000000000..1a266e4fe6
--- /dev/null
+++ b/content/learning-paths/embedded-and-microcontrollers/pqc_pqm4/setup-installation.md
@@ -0,0 +1,156 @@
+---
+title: Setting Up the Development Environment
+
+weight: 3
+
+layout: learningpathall
+---
+
+## Required Hardware and Software
+
+Before you begin, ensure you have the following hardware and software:
+
+- **Development board**: an Arm Cortex-M4-based board, such as:
+  - NUCLEO-L476RG
+  - NUCLEO-L4R5ZI (the default platform in pqm4)
+  - STM32F4 Discovery
+- **Arm GNU Toolchain**: the arm-none-eabi toolchain
+- **Flashing tools**: stlink or OpenOCD
+- **Python 3.8+**
+- **Python modules**: pyserial, tqdm
+
+## Installing the Arm GNU Toolchain
+
+
+Download the Arm GNU Toolchain from the official website:
+
+https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
+
+Download and install the **arm-none-eabi** (AArch32 bare-metal target) toolchain for your host system.
+
+Recommended version: **12.x**.
+Newer versions (for example, 15.x) may cause build or linker issues with pqm4.
+
+## Installing stlink
+
+To flash binaries, install stlink using your package manager or compile it from source:
+
+```bash
+git clone https://github.com/stlink-org/stlink.git
+cd stlink
+make release
+```
+
+Verify the connection:
+
+```bash
+st-info --probe
+```
+The expected output includes the line:
+`Found 1 stlink programmers`
+
+## Installing OpenOCD
+
+For the NUCLEO-L4R5ZI board, OpenOCD is used. Install it using your package manager or compile it from source:
+
+```bash
+git clone https://github.com/openocd-org/openocd.git
+cd openocd
+./bootstrap && ./configure
+make
+```
+## Installing ChipWhisperer (Optional)
+The ChipWhisperer module is only required if you are using the `cw308t-stm32f3` platform.
+If you are using other boards (for example, NUCLEO or STM32 Discovery), you can skip this step.
+
+```bash
+python3 -m pip install chipwhisperer
+```
+
+## Installing QEMU (Optional)
+QEMU is required only if you are using the mps2-an386 platform (an emulated Arm Cortex-M4 system).
+If you are using a physical board, you can skip this step.
+
+For macOS:
+```bash
+brew install qemu
+```
+For Linux:
+```bash
+sudo apt-get install qemu-system-arm
+```
+Note: Ensure the version is 5.2 or higher; you can check it with `qemu-system-arm --version`.
+
+## Installing Python Dependencies
+
+Install the required Python modules using pip:
+
+```bash
+python3 -m pip install pyserial tqdm
+```
+
+## Downloading pqm4 and Submodules
+
+```bash
+git clone --recursive https://github.com/mupq/pqm4.git
+cd pqm4
+
+```
+
+## Building for a Target Platform
+
+```bash
+make -j4 PLATFORM=<platform>
+```
+
+Example for the NUCLEO-L476RG board:
+```bash
+make -j4 PLATFORM=nucleo-l476rg
+```
+## Configuring the Serial Port in host_unidirectional.py
+
+The script `host_unidirectional.py` uses a hardcoded default serial port (often `/dev/ttyUSB0`), which may not match your system.
+
+You must update it to match your board's serial port.
+
+Open the file:
+```bash
+nano hostside/host_unidirectional.py
+```
+Replace the port in the following line with your board's actual port:
+```python
+dev = serial.Serial("/dev/tty.usbmodemXXXX", 38400)
+```
+To find your port (typically `/dev/tty.usbmodem*` on macOS and `/dev/ttyACM*` or `/dev/ttyUSB*` on Linux), run:
+```bash
+ls /dev/tty.usbmodem* /dev/ttyACM* /dev/ttyUSB* 2>/dev/null
+```
+
+
+## Flashing and Testing Communication
+
+Connect the board to your host machine through its ST-LINK USB port (mini-USB or micro-USB, depending on the board). This powers the board and allows you to flash binaries onto it.
+
+
+Flash a basic test:
+
+```bash
+st-flash write bin/boardtest.bin 0x8000000
+```
+
+Read the output:
+
+```bash
+python3 hostside/host_unidirectional.py
+```
+
+Then press the RESET button on the board.
+
+Expected output:
+```text
+Hello world
+Stack Size
+Random number
+```
+
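+As an alternative to editing the hardcoded port in `host_unidirectional.py`, you could pass the port in as a parameter. The Python sketch below is a hypothetical helper and is not part of pqm4. It uses pyserial to open the port at 38400 baud, the rate used in `host_unidirectional.py` above, and prints lines until it sees the `#` end marker that the test binaries print or until a read times out. The board test output shown above may not end with `#`, in which case the timeout ends the loop.
+
+```python
+def read_until_marker(dev, marker: str = "#", max_lines: int = 1000):
+    """Yield decoded lines from a file-like device until marker, EOF, or timeout."""
+    for _ in range(max_lines):
+        raw = dev.readline()
+        if not raw:
+            # readline() returns b"" at EOF or when the serial timeout expires
+            break
+        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
+        yield line
+        if line.strip() == marker:
+            break
+
+
+def read_board_output(port: str, baud: int = 38400, timeout: float = 10.0):
+    """Hypothetical helper: print board output from `port` (requires pyserial)."""
+    import serial  # imported lazily so read_until_marker() works without pyserial
+
+    with serial.Serial(port, baud, timeout=timeout) as dev:
+        for line in read_until_marker(dev):
+            print(line)
+```
+
+For example, call `read_board_output("/dev/tty.usbmodemXXXX")` with your own port, then press RESET on the board. Because the line-reading loop is separate from the port handling, you can also test it without hardware by passing an `io.BytesIO` object instead of a serial port.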