SHA-256 Algorithm Explanation And Implementation in Python

In this article, I will implement the SHA-256 algorithm from scratch in Python. very simple and eas...

ThefCraft
12 min · Jun 19, 2024
962
223

SHA-256 Algorithm Explanation And Implementation in Python

https://www.ssldragon.com/wp-content/uploads/2023/04/sha-256.webp

https://www.ssldragon.com/wp-content/uploads/2023/04/sha-256.webp

Introduction

What SHA-256 is? SHA-256 is a Cryptography Hashing Algorithm Which is widely used method for Hashing messages.

In this article, I will implement the SHA-256 algorithm from scratch in Python.

What is Hashing

Hashing is the process of scrambling information or data beyond recognition. We use hash function to convert input data into hash Digest. These function are irreversible by nature.

HashFunction("Company") = "G#2$F5*L@" (Hash Value or Digest)

Requirements for a hash function

  • It should be impossible to create two different messages produces same hash values.
  • It should be impossible to generate a message that produces a specific hash value.
  • It should be irreversible.

What is SHA

  • Secure Hash Algorithm
  • NSA & NIST joint development
  • Has multiple families such as SHA-O, SHA-1, SHA-2 & SHA-3
  • Declared FIPS in 1993

How SHA-256 Convert data to hash value

Step 1: Padding

The Input message is padded to make sure its length is a multiple of 512 bits (64 bytes).

  • First append a single '1' bit to the message.
  • Append zeroes until the message length is become 64 bits short of any multiple of 512. i.e., length in bits ≡ 448 (mod 512).
  • Padding length to increase complexity of the function.
  • Length of the original data is padded to the result.
  • Length is expressed in the form of 64 bits.
  • The length of the new message should becomes a multiple of 512 bits.

Step 2: Initialize the hash (H) Values for SHA-256

Theses are specific constants defined by the algorithm.

# Initialize hash values (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19)
H0 = 0x6a09e667
H1 = 0xbb67ae85
H2 = 0x3c6ef372
H3 = 0xa54ff53a
H4 = 0x510e527f
H5 = 0x9b05688c
H6 = 0x1f83d9ab
H7 = 0x5be0cd19

Step 3: Processing the data

To preocess the data we need to do some more processing

  • Break the padded message into 512-bit blocks.
  • Initialize Working Variables a, b, ..., h with values H0 through H7.
  • Now process each blocks

Process Each Block

  1. Extend the 512-bit block into 64, 32-bit words (W[0] to W[63]).
  2. For each word from 16 to 63, compute the word using the formula:
W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16]
Here, σ0 and σ1 are two specific functions (defined in the SHA-256 Algorithm) that runs on 32-bit words.
  1. Then calculates 64 rounds (t = 0 to 63) of hash computations using the following operations:
T1 = h + Σ1(e) + Ch(e, f, g) + K[t] + W[t]
T2 = Σ0(a) + Maj(a, b, c)
h = g
g = f
f = e
e = d + T1
d = c
c = b
b = a
a = T1 + T2

Here, Σ0, Σ1, Ch, Maj are specific bitwise operations, and K[t] are constants defined for each round.
  1. After processing all 64 rounds for a block, update the hash values: css
H0 = H0 + a
H1 = H1 + b
H2 = H2 + c
H3 = H3 + d
H4 = H4 + e
H5 = H5 + f
H6 = H6 + g
H7 = H7 + h
  1. Continue for Next Block And If there are more blocks in the message, repeat steps 3 and 4.
  2. After processing all blocks, the hash value is the concatenation of H0 through H7, represented as a 256-bit hexadecimal number.

You can watch this video for more information.

Python Implementation

prepreocessing the data

def preprocess(data):
    length_bits = len(data) * 8 # Initial length of the message in bits
    data += b'\x80' # b'\x80' is hexadecimal for binary '10000000'
    # Append '0' bits until length is 448 bits modulo 512
    while len(data) % 64 != 56: # while bit_len(data) % 512 != 448:
        data += b'\x00'
    # Append the length of the original message as a 64-bit integer
    data += length_bits.to_bytes(8, byteorder='big')
    return data

rotation of a 32 bit integer

def rotate(x, n):
    """clockwise rotate a 32-bit integer to n positions"""
    # `x >> n` shifts the bits of `x` to the right by `n` positions. This moves the bits that fall off the right end back to the left end.
    # `x << (32 - n)` shifts the bits of `x` to the left by `(32 - n)` positions. This moves the bits that fall off the left end back to the right end.
    # `|` combines the results of the right shift and left shift operations.
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

Some sha256 operations

def s0(x):
    return (rotate(x, 7) ^ rotate(x, 18) ^ x>>3) & 0xFFFFFFFF
def s1(x): 
    return (rotate(x, 17) ^ rotate(x, 19) ^ x>>10) & 0xFFFFFFFF

def sigma0(x):
    return (rotate(x, 2) ^ rotate(x, 13) ^ rotate(x, 22)) & 0xFFFFFFFF
def sigma1(x):
    return (rotate(x, 6) ^ rotate(x, 11) ^ rotate(x, 25)) & 0xFFFFFFFF

def Ch(e, f, g):
    # take bit from f if e is one else take bit from g
    # e = 0b01010001_00001110_01010010_01111111
    # f = 0b10011011_00000101_01101000_10001100
    # g = 0b00011111_10000011_11011001_10101011
    # out 0b00011111_10000101_11001001_10001100
    # format(Ch(e,f,g), '032b')
    return (e & f) ^ (~e & g)

def Maj(a, b, c):
    # if the majority of the bits at that position are 1, the result bit will be 1
    # a = 0b01010001_00001110_01010010_01111111
    # b = 0b10011011_00000101_01101000_10001100
    # c = 0b00011111_10000011_11011001_10101011
    # out 0b00011011_00000111_01011000_10101111
    # format(Maj(a,b,c), '032b')
    return (a & b) ^ (a & c) ^ (b & c)

sha256 Constants

# Initialize hash values (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19)
H0 = 0x6a09e667
H1 = 0xbb67ae85
H2 = 0x3c6ef372
H3 = 0xa54ff53a
H4 = 0x510e527f
H5 = 0x9b05688c
H6 = 0x1f83d9ab
H7 = 0x5be0cd19
H = [
    H0, H1, H2, H3,
    H4, H5, H6, H7
]
# Initialize array of round constants (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311)
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
]

sha256

import struct

def chunk_iter(msg, chunk_size=64):
    # Process the message in successive 512-bit chunks
    for i in range(0, len(msg), chunk_size):
        yield msg[i:i+chunk_size]

def sha256(data):
    h0, h1, h2, h3, h4, h5, h6, h7 = H

    data = preprocess(data)

    for chunk in chunk_iter(data, chunk_size=64):
        # chunk is a bytes object of length 64
        M = struct.unpack('>16I', chunk) # M0...M15 
        # slow M = [int.from_bytes(chunk[i:i+4], byteorder='big') for i in range(0, 64, 4)]

        W = list(M) + [0] * 48

        # Extend the sixteen 32-bit words into sixty-four 32-bit words
        for t in range(16, 64):
            # wt = s1(wt-2) + wt-7 + s0(wt-15) + wt-16
            W[t] = (s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16]) & 0xFFFFFFFF

        # Initialize working variables
        a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7

        # main loop
        for t in range(64):
            # T1 = h + Σ1(e) + Ch(e, f, g) + K[t] + W[t]
            # T2 = Σ0(a) + Maj(a, b, c)
            T1 = (h + sigma1(e) + Ch(e, f, g) + K[t] + W[t]) & 0xFFFFFFFF
            T2 = (sigma0(a) + Maj(a, b, c)) & 0xFFFFFFFF
            h = g
            g = f
            f = e
            e = (d + T1) & 0xFFFFFFFF
            d = c
            c = b
            b = a
            a = (T1 + T2) & 0xFFFFFFFF

        # computer the intermediate hash     
        h0 = (a + h0) & 0xFFFFFFFF
        h1 = (b + h1) & 0xFFFFFFFF
        h2 = (c + h2) & 0xFFFFFFFF
        h3 = (d + h3) & 0xFFFFFFFF
        h4 = (e + h4) & 0xFFFFFFFF
        h5 = (f + h5) & 0xFFFFFFFF
        h6 = (g + h6) & 0xFFFFFFFF
        h7 = (h + h7) & 0xFFFFFFFF

    concatenate_hashes = (h0 << 224) | (h1 << 192) | (h2 << 160) | (h3 << 128) | (h4 << 96) | (h5 << 64) | (h6 << 32) | h7    
    return hex(concatenate_hashes)

example usage

# Example usage:
message = b'Hello, world!'
hashed = sha256(message)
print("SHA-256 Hash:", hashed[2:])
import hashlib
print("SHA-256 Hash:", hashlib.sha256(message).hexdigest())

Output

SHA-256 Hash: 315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3
SHA-256 Hash: 315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3

generating sha256 constants programmatically

def constant_generator():
    def is_prime(n:int)->bool:
        if n < 2: return False
        if n in (2, 3): return True
        if n % 2 == 0 or n % 3 == 0: return False
        i = 5
        while i * i <= n:
            if n % i == 0 or n % (i + 2) == 0: return False
            i += 6
        return True
    def first_prime(n:int):
        i = 2
        while n:
            if is_prime(i): 
                yield i
                n-=1
            i+=1

    def sqrt(n:int)->int: return n**(1/2)
    def cbrt(n:int)->int: return n**(1/3)
    def fractionalpart(x:int)->float:
        assert x>=0, "not implemented for negative"
        return x - x//1

    H = []
    K = []
    for p in first_prime(8):
        f = fractionalpart(sqrt(p))
        H.append(int(f * (1 << 32)))
    for p in first_prime(64):
        f = fractionalpart(cbrt(p))
        K.append(int(f * (1 << 32)))
    return H, K

H, K = constant_generator()

Full Code

you can download full code from my github account.

import struct

def preprocess(data):
    length_bits = len(data) * 8 # Initial length of the message in bits
    data += b'\x80' # b'\x80' is hexadecimal for binary '10000000'
    # Append '0' bits until length is 448 bits modulo 512
    while len(data) % 64 != 56: # while bit_len(data) % 512 != 448:
        data += b'\x00'
    # Append the length of the original message as a 64-bit integer
    data += length_bits.to_bytes(8, byteorder='big')
    return data

# & 0xFFFFFFFF Ensures that the result is masked to 32 bits, handling any overflow that may occur during the operations.

def rotate(x, n):
    """clockwise rotate a 32-bit integer to n positions"""
    # `x >> n` shifts the bits of `x` to the right by `n` positions. This moves the bits that fall off the right end back to the left end.
    # `x << (32 - n)` shifts the bits of `x` to the left by `(32 - n)` positions. This moves the bits that fall off the left end back to the right end.
    # `|` combines the results of the right shift and left shift operations.
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def s0(x):
    return (rotate(x, 7) ^ rotate(x, 18) ^ x>>3) & 0xFFFFFFFF
def s1(x): 
    return (rotate(x, 17) ^ rotate(x, 19) ^ x>>10) & 0xFFFFFFFF

def sigma0(x):
    return (rotate(x, 2) ^ rotate(x, 13) ^ rotate(x, 22)) & 0xFFFFFFFF
def sigma1(x):
    return (rotate(x, 6) ^ rotate(x, 11) ^ rotate(x, 25)) & 0xFFFFFFFF

def Ch(e, f, g):
    # take bit from f if e is one else take bit from g
    # e = 0b01010001_00001110_01010010_01111111
    # f = 0b10011011_00000101_01101000_10001100
    # g = 0b00011111_10000011_11011001_10101011
    # out 0b00011111_10000101_11001001_10001100
    # format(Ch(e,f,g), '032b')
    return (e & f) ^ (~e & g)

def Maj(a, b, c):
    # if the majority of the bits at that position are 1, the result bit will be 1
    # a = 0b01010001_00001110_01010010_01111111
    # b = 0b10011011_00000101_01101000_10001100
    # c = 0b00011111_10000011_11011001_10101011
    # out 0b00011011_00000111_01011000_10101111
    # format(Maj(a,b,c), '032b')
    return (a & b) ^ (a & c) ^ (b & c)

def chunk_iter(msg, chunk_size=64):
    # Process the message in successive 512-bit chunks
    for i in range(0, len(msg), chunk_size):
        yield msg[i:i+chunk_size]

# Initialize hash values (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19)
H0 = 0x6a09e667
H1 = 0xbb67ae85
H2 = 0x3c6ef372
H3 = 0xa54ff53a
H4 = 0x510e527f
H5 = 0x9b05688c
H6 = 0x1f83d9ab
H7 = 0x5be0cd19
H = [
    H0, H1, H2, H3,
    H4, H5, H6, H7
]
# Initialize array of round constants (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311)
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
]


def sha256(data):
    h0, h1, h2, h3, h4, h5, h6, h7 = H

    data = preprocess(data)

    for chunk in chunk_iter(data, chunk_size=64):
        # chunk is a bytes object of length 64
        M = struct.unpack('>16I', chunk) # M0...M15 
        # slow M = [int.from_bytes(chunk[i:i+4], byteorder='big') for i in range(0, 64, 4)]

        W = list(M) + [0] * 48

        # Extend the sixteen 32-bit words into sixty-four 32-bit words
        for t in range(16, 64):
            # wt = s1(wt-2) + wt-7 + s0(wt-15) + wt-16
            W[t] = (s1(W[t-2]) + W[t-7] + s0(W[t-15]) + W[t-16]) & 0xFFFFFFFF

        # Initialize working variables
        a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7

        # main loop
        for t in range(64):
            # T1 = h + Σ1(e) + Ch(e, f, g) + K[t] + W[t]
            # T2 = Σ0(a) + Maj(a, b, c)
            T1 = (h + sigma1(e) + Ch(e, f, g) + K[t] + W[t]) & 0xFFFFFFFF
            T2 = (sigma0(a) + Maj(a, b, c)) & 0xFFFFFFFF
            h = g
            g = f
            f = e
            e = (d + T1) & 0xFFFFFFFF
            d = c
            c = b
            b = a
            a = (T1 + T2) & 0xFFFFFFFF

        # computer the intermediate hash     
        h0 = (a + h0) & 0xFFFFFFFF
        h1 = (b + h1) & 0xFFFFFFFF
        h2 = (c + h2) & 0xFFFFFFFF
        h3 = (d + h3) & 0xFFFFFFFF
        h4 = (e + h4) & 0xFFFFFFFF
        h5 = (f + h5) & 0xFFFFFFFF
        h6 = (g + h6) & 0xFFFFFFFF
        h7 = (h + h7) & 0xFFFFFFFF

    concatenate_hashes = (h0 << 224) | (h1 << 192) | (h2 << 160) | (h3 << 128) | (h4 << 96) | (h5 << 64) | (h6 << 32) | h7    
    return hex(concatenate_hashes)


if __name__ == "__main__":
    # Example usage:
    message = b'Hello, world!'
    hashed = sha256(message)
    print("SHA-256 Hash:", hashed[2:])
    import hashlib
    print("SHA-256 Hash:", hashlib.sha256(message).hexdigest())