Partial SHA-2 Collision with Prefix/Suffix (oneAPI SYCL)

Link to GitHub Repository: https://github.com/jianxun-p/partial-sha2-collision-with-oneapi

This project searches for a partial collision in SHA-2 outputs: two different inputs that share the same first N bytes of hash output.

Inputs are constrained to this format:

input = prefix || variable_middle(N bytes) || suffix

The implementation uses a Van Oorschot–Wiener (VOW) style collision search with distinguishable points (DPs) and runs in parallel with oneAPI SYCL.

Table of Contents


Results

Here are the results for SHA256 (prefix = 0x00112233 and suffix = 0x33221100):

Collision Size (N)KDeviceHash CountDurationAverage Hash Speed
72AMD R9-9900X CPU2,000,114,392 hashes32 seconds62,503,574 hashes/second
82AMD R9-9900X CPU10,000,252,285 hashes82 seconds121,954,296 hashes/second
92AMD R9-9900X CPU26,000,102,890 hashes179 seconds145,251,971 hashes/second
72Intel Arc B580 GPU2,000,114,392 hashes2 seconds1,000,057,196 hashes/second
82Intel Arc B580 GPU10,000,252,285 hashes7 seconds1,428,607,469 hashes/second
92Intel Arc B580 GPU26,000,102,890 hashes22 seconds1,181,822,858 hashes/second

Potential Optimizations (CPU)

The number of work-item (THREADS) and BATCH_SIZE can be reduced.

Memory allocation/release is time consuming, which increased duration and average hash speed. The effect is more significant for smaller collision sizes.

Bottleneck: There exists 2 copies of the same data in the memory, and copying between them is unnecessary and time consuming.

The step of reduction with a single thread of CPU can be optimized, other threads are running idle at that time.

Potential Optimizations (GPU)

States are written/updated by device, the host only reads it, using device memory with memcpy to host memory for State may be a better alternative.

Bottleneck: The step of reduction with a single thread of CPU can be optimized, the GPU is running idle at that time.


Secure Hash Algorithm 2 (SHA-2) & The Merkle–Damgard Construction

The SHA-2 is based on the Merkle–Damgard construction.

The Merkle–Damgard construction consists of a compression function, (see function sha2_compression), which needed to be:

  • pre-image resistant
    • i.e. when given , it is computationally infeasible to find , such that
  • second pre-image resistant
    • i.e. when given , it is computationally infeasible to find , such that
  • collision resistant
    • i.e. it is computationally infeasible to find any pairs of such that
    • collision resistance implies second pre-image resistance

For a given message , the hash value of a hash function using the Merkle–Damgard construction is , where:

  • : intial hash values (i.e. initial vectors), see NIST's publication
  • for

Secure Hash Algorithm 2 (SHA-2)

See NIST's publication for details: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf


Van Oorschot–Wiener

Distinguishable Point (DP) are values such that the first K bytes equal to zero.

Create chains of start_DP -> end_DP (Stage 1).

If two chains have the same end_DP and different start_DP, then these two chains must have merged together at some point.

We backtrack (Stage 2) to find the point where they merge together.

Each worker thread:

  1. Starts from a deterministic seed
  2. Repeatedly hashes prefix || middle || suffix
  3. Treats outputs with first K bytes equal to zero as a distinguishable point
  4. Stores chains ending at DPs

When two chains hit the same DP key (matching first N bytes), stage 1 returns two chain starts (X, Y) and distances to that DP.

Stage 2: Backtracking to find the actual partial collision

The two chains are aligned by step count and advanced together until the first point where their first N hash bytes match. The corresponding two inputs are reported.


Features

  • Supports SHA-2 family variants:
    • SHA224, SHA256, SHA384, SHA512, SHA512_224, SHA512_256
  • Configurable partial-collision size (N bytes)
  • Configurable DP condition (K bytes, where K <= N)
  • Prefix/suffix constrained input space
  • Parallel stage-1 walk on CPU/GPU SYCL device
  • Header-only SHA-2 implementation in sha2.hpp

Repository Layout

  • main.cpp: VOW search logic, SYCL kernels, collision reporting
  • sha2.hpp: Header-only SHA-2 implementations
  • Makefile: Build and run targets

Configuration

Edit constants near the top of main.cpp:

  • hash_type: select SHA-2 variant
  • N: number of leading output bytes that must collide
  • K: DP prefix length in bytes (K <= N)
  • prefix, suffix: fixed bytes around the variable N-byte middle
  • THREADS: number of parallel walkers
  • BATCH_SIZE: steps per walker before host merge/check
  • DP_ARRAY_LEN: max DPs stored per thread per batch

Notes

  • Larger N increases expected work roughly as for birthday-style partial collisions (in bits: ).
  • Larger K reduces DP frequency; smaller K increases merge overhead.
  • Tune THREADS, BATCH_SIZE, and DP_ARRAY_LEN for your device memory and throughput.

Build

Prerequisites:

  • Intel oneAPI DPC++/C++ compiler (icpx)
  • oneAPI/SYCL runtime properly configured in shell environment

Option 1: Build with Makefile

The project Makefile compiles to sha2_collision (or sha2_collision.exe on Windows).

Option 2: Direct compile command

icpx -fsycl -O3 -std=c++20 -Wall -Wextra -march=native -o sha2_collision main.cpp

Run

Run the built binary.

Program output includes:

  • selected SYCL device
  • stage-1 batch progress and hash counts
  • detected DP collision
  • stage-2 alignment/backtracking logs
  • final partial collision inputs and outputs
  • total hashes, duration, and hashing speed

Example Collision Condition

If N = 8, success means the first 8 bytes of the two outputs are identical:

hash(input1)[0..7] == hash(input2)[0..7]

with input1 != input2 and both matching the prefix || middle || suffix format.


Disclaimer

This project is for educational and research use (parallel hash search, SYCL programming, and collision-search techniques). Do not use it for unauthorized security testing.