白乐天

道阻且长,行则将至。

详解MD5

MD5简介

MD5(Message-Digest Algorithm 5) 是一种广泛使用的哈希函数,其核心功能是将任意长度的输入数据转换为固定长度的128位(16字节)哈希值,通常以32位十六进制字符串表示。

数据填充

把输入数据填充到64字节(512位)的倍数(最少填充9个字节,最多填充64+8个字节)。

填充方式:

  • 在原始数据的后面填充0x80
  • 填充N个0
  • 使用小端序在最后8个字节填充输入数据的bit长度

分块处理

将填充后的数据分割为多个512位(64字节)的块,每个块进一步分为16个32位子块(记为M[0]M[15])。

初始化寄存器

初始化4个32位寄存器(A、B、C、D),初始值为:

1
2
3
4
A = 0x67452301  
B = 0xEFCDAB89
C = 0x98BADCFE
D = 0x10325476

四轮非线性变换

每个512位块会经过四轮处理(每轮16步,共64步),每轮使用不同的逻辑函数和参数

四轮逻辑函数

1
2
3
4
a = A = 0x67452301  
b = B = 0xEFCDAB89
c = C = 0x98BADCFE
d = D = 0x10325476

每轮使用不同的非线性函数(共4个):

  • Round 1F(b, c, d) = (b & c ) | ((~b) & d)
  • Round 2G(b, c, d) = (b & d) | (c & (~d))
  • Round 3H(b, c, d) = b ⊕ c ⊕ d
  • Round 4I(b, c, d) = c ⊕ (b | (~d))

消息子块的使用顺序

每轮以不同顺序使用16个消息子块 M[k]

  • Round 1k = 0, 1, 2, ..., 15(顺序使用)
  • Round 2k = (1 + 5i) mod 16(i从0到15)
  • Round 3k = (5 + 3i) mod 16(i从0到15)
  • Round 4k = 7i mod 16(i从0到15)

常量表与循环左移

每步操作使用一个预定义的32位常量 T[1..64],其值为 ⌊2^32 × |sin(i)|⌋(i为步数)。循环左移位数 s 按如下规则变化:

  • Round 1: s = [7, 12, 17, 22] 循环使用
  • Round 2: s = [5, 9, 14, 20]
  • Round 3: s = [4, 11, 16, 23]
  • Round 4: s = [6, 10, 15, 21]

单步操作详解(以第i步为例)

每步操作对寄存器 A, B, C, D 进行更新,公式如下:

计算中间值

temp= FF(a,b,c,d,M[k],s[i],T[i])

FF(a,b,c,d,M[k],s[i],T[i]) = b + ((a+F(b,c,d) + M[k] + T[i]) <<< s[i])

(其中 + 是模 2^32 加法,<<< s 表示循环左移s位)

更新寄存器

1
a = b + temp

寄存器轮换

生成最终哈希值

将所有块处理完成后,将寄存器A、B、C、D的值按小端序拼接,转换为十六进制字符串,即为MD5哈希值。

python实现md5

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
# T Table
T = [[0xD76AA478, 0xE8C7B756, 0x242070DB, 0xC1BDCEEE, 0xF57C0FAF, 0x4787C62A, 0xA8304613, 0xFD469501, 0x698098D8, 0x8B44F7AF, 0xFFFF5BB1, 0x895CD7BE, 0x6B901122, 0xFD987193, 0xA679438E, 0x49B40821],
[0xF61E2562, 0xC040B340, 0x265E5A51, 0xE9B6C7AA, 0xD62F105D, 0x02441453, 0xD8A1E681, 0xE7D3FBC8, 0x21E1CDE6, 0xC33707D6, 0xF4D50D87, 0x455A14ED, 0xA9E3E905, 0xFCEFA3F8, 0x676F02D9, 0x8D2A4C8A],
[0xFFFA3942, 0x8771F681, 0x6D9D6122, 0xFDE5380C, 0xA4BEEA44, 0x4BDECFA9, 0xF6BB4B60, 0xBEBFBC70, 0x289B7EC6, 0xEAA127FA, 0xD4EF3085, 0x04881D05, 0xD9D4D039, 0xE6DB99E5, 0x1FA27CF8, 0xC4AC5665],
[0xF4292244, 0x432AFF97, 0xAB9423A7, 0xFC93A039, 0x655B59C3, 0x8F0CCC92, 0xFFEFF47D, 0x85845DD1, 0x6FA87E4F, 0xFE2CE6E0, 0xA3014314, 0x4E0811A1, 0xF7537E82, 0xBD3AF235, 0x2AD7D2BB, 0xEB86D391]]

ROLStep = [[7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22],
[5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20],
[4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23],
[6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]]

# Logic Functions
f = lambda b, c, d : (b & c) | ((~b) & d)
g = lambda b, c, d : (b & d) | (c & (~d))
h = lambda b, c, d : b ^ c ^ d
i = lambda b, c, d : c ^ (b | (~d))
rho_2 = lambda i : (1 + 5 * (i)) % 16
rho_3 = lambda i : (5 + 3 * (i)) % 16
rho_4 = lambda i : (7 * (i)) % 16

def ROLs(x, y):
x, y = int(x), int(y)
mask1 = (1 << y) - 1
return ((x >> (32 - y)) & mask1) | ((x << y) & ~mask1)

# Compress Hash Functions
def FF(a, b, c, d, M, s):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%M, s, "%x"%((f(b, c, d)) % (2 ** 32)))
a = (b + ROLs((a + f(b, c, d) + M + T[0][s]) % (2 ** 32), ROLStep[0][s])) % (2 ** 32)
return d, a, b, c

def GG(a, b, c, d, M, s):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%M, s, "%x"%((g(b, c, d)) % (2 ** 32)))
a = (b + ROLs((a + g(b, c, d) + M + T[1][s]) % (2 ** 32), ROLStep[1][s])) % (2 ** 32)
return d, a, b, c

def HH(a, b, c, d, M, s):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%M, s, "%x"%((h(b, c, d)) % (2 ** 32)))
a = (b + ROLs((a + h(b, c, d) + M + T[2][s]) % (2 ** 32), ROLStep[2][s])) % (2 ** 32)
return d, a, b, c

def II(a, b, c, d, M, s):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%M, s, "%x"%((i(b, c, d)) % (2 ** 32)))
a = (b + ROLs((a + i(b, c, d) + M + T[3][s]) % (2 ** 32), ROLStep[3][s])) % (2 ** 32)
return d, a, b, c

# Read 32bit Litte Endian From y to x
def load32L(y):
# print(y)
x = (((ord(y[3]) & 255) << 24) + ((ord(y[2]) & 255) << 16) + ((ord(y[1]) & 255) << 8) + (ord(y[0]) & 255))
return x

# Store 32bit Little Endian From x to y
def store32L(x):
y = [""] * 4
y[0] = chr(x & 255)
y[1] = chr((x>>8) & 255)
y[2] = chr((x>>16) & 255)
y[3] = chr((x>>24) & 255)
return "".join(y)

def gen64LS(x):
x, y = int(x), ""
y += chr(x & 255)
y += chr((x>>8) & 255)
y += chr((x>>16) & 255)
y += chr((x>>24) & 255)
y += chr((x>>32) & 255)
y += chr((x>>40) & 255)
y += chr((x>>48) & 255)
y += chr((x>>56) & 255)
return y

def compressMD5(msg, a, b, c, d):
for i in range(int(len(msg) / 64)):
aa, bb, cc, dd = a, b, c, d
s = msg[64 * i : 64 * (i + 1)]
w = [0] * 16
for j in range(16):
w[j] = load32L(s[4 * j: 4 * (j + 1)])
for j in range(16):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%w[j])
a, b, c, d = FF(a, b, c, d, w[j], j)
for j in range(16):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%w[rho_2(j)])
a, b, c, d = GG(a, b, c, d, w[rho_2(j)], j)
for j in range(16):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%w[rho_3(j)])
a, b, c, d = HH(a, b, c, d, w[rho_3(j)], j)
for j in range(16):
# print("%x"%a, "%x"%b, "%x"%c, "%x"%d, "%x"%w[rho_4(j)])
a, b, c, d = II(a, b, c, d, w[rho_4(j)], j)

# a, b, c, d = (a + aa) & 0xFFFFFFFF, (b + bb) & 0xFFFFFFFF, (c + cc) & 0xFFFFFFFF, (d + dd) & 0xFFFFFFFF
a, b, c, d = a + aa, b + bb, c + cc, d + dd
return a, b, c, d

def hexDigest(hexNum):
s = ""
s += "%0.2x"%(hexNum & 255)
s += "%0.2x"%((hexNum >> 8) & 255)
s += "%0.2x"%((hexNum >> 16) & 255)
s += "%0.2x"%((hexNum >> 24) & 255)
return s

def hashMD5(msg):
a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
msg = str(msg)
strlen, endLen = len(msg), len(msg) % 64
segments = []
msg += chr(0x80)
endLen += 1
fill = 0

while endLen > 56:
while endLen < 64:
msg += chr(fill)
endLen = (endLen + 1) % 64
# print("endLen =", endLen, "filled =", chr(fill), "Msg =", msg)
endLen = 0

while endLen < 56:
msg += chr(fill)
endLen = (endLen + 1) % 64

msg = msg + gen64LS(strlen * 8)
a, b, c, d = compressMD5(msg, a, b, c, d)
# print(hex(a), hex(b), hex(c), hex(d))

output = hexDigest(a) + hexDigest(b) + hexDigest(c) + hexDigest(d)

return output

# Main Program
if __name__ == "__main__":
message = "123456"

print("Original:", message)
print("My MD5 Hash:", hashMD5(message))
# print("HashLib MD5 Hash:", hashlib.md5(message.encode("UTF-8")).hexdigest())

版本2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
import struct
import math

# 左旋转
def left_rotate(x, c):
return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF

# 定义四个非线性函数
def F(x, y, z): return (x & y) | (~x & z)
def G(x, y, z): return (x & z) | (y & ~z)
def H(x, y, z): return x ^ y ^ z
def I(x, y, z): return y ^ (x | ~z)

# 初始化向量
A0 = 0x67452301
B0 = 0xefcdab89
C0 = 0x98badcfe
D0 = 0x10325476

# 常量表 T
T = [int(2**32 * abs(math.sin(i + 1))) & 0xFFFFFFFF for i in range(64)]

# 每轮的位移数
S = [ # 4轮,每轮16步
[7, 12, 17, 22]*4,
[5, 9, 14, 20]*4,
[4, 11, 16, 23]*4,
[6, 10, 15, 21]*4
]

# 主函数
def md5(message):
message = bytearray(message) # 拷贝为可变数组
orig_len_in_bits = (8 * len(message)) & 0xffffffffffffffff

# 添加 0x80 和填充 0
message.append(0x80)
while (len(message) * 8) % 512 != 448:
message.append(0)

# 添加原始长度(64位)
message += struct.pack('<Q', orig_len_in_bits)

# 初始化缓冲区
A, B, C, D = A0, B0, C0, D0

# 分块处理
for chunk_offset in range(0, len(message), 64):
chunk = message[chunk_offset:chunk_offset + 64]
M = list(struct.unpack('<16I', chunk))

a, b, c, d = A, B, C, D

for i in range(64):
if 0 <= i <= 15:
f = F(b, c, d)
g = i
s = S[0][i]
elif 16 <= i <= 31:
f = G(b, c, d)
g = (5*i + 1) % 16
s = S[1][i - 16]
elif 32 <= i <= 47:
f = H(b, c, d)
g = (3*i + 5) % 16
s = S[2][i - 32]
else:
f = I(b, c, d)
g = (7*i) % 16
s = S[3][i - 48]

temp = (a + f + T[i] + M[g]) & 0xFFFFFFFF
a, d, c, b = d, c, b, (b + left_rotate(temp, s)) & 0xFFFFFFFF

# 将结果加回缓冲区
A = (A + a) & 0xFFFFFFFF
B = (B + b) & 0xFFFFFFFF
C = (C + c) & 0xFFFFFFFF
D = (D + d) & 0xFFFFFFFF

# 输出最终结果(小端序拼接)
return ''.join(f'{word:02x}' for word in struct.pack('<4I', A, B, C, D))

# 示例:
if __name__ == "__main__":
print(md5(b'Bileton')) # 应输出: 900150983cd24fb0d6963f7d28e17f72