Algorithms Design and its Applications Algorithms with numbers Back Ground • Number theory was once viewed as a beautiful but largely useless subject in pure mathematics. • Today number- theoretic algorithms are used widely , due in part to the invention of cryptographic schemes based on large prime numbers. • The feasibility of these schemes rests on our ability to find large primes easily, while their security rests on our inability to factor the product of large primes . Size of Cost • In this lecture, a "large input" typically means an input containing "large integers" rather than an input containing "many integers". • We measure the size of an input in terms of the number of bits required to represent that input. • An algorithm with integer inputs a1, a2, ..., ak is a polynomial - time algorithm if it runs in time polynomial in lg a1, lg a2, ..., lg ak, that is, polynomial in the lengths of its binary- encoded inputs. Basic arithmetic Bases and logs • How many digits are needed to represent the number N>=0 in base b? • How much does the size of a number change when we change the base? Addition • The sum of any three single-digit numbers (with base b>=2) is at most two digits long. • Given two binary number x and y, how long does our algorithm take to add them? O(n), n is the number of bits of x and y. Multiplication • 13 * 11 Multiplication • 乘法的时间复杂度是多少呢? • 对于长度为n的乘数来说,将产生n个中间结果, 而对这些中间结果进行相加的次数是n-1次,从而 时间复杂度为O(n(n-1))=O(n2) Multiplication • another way to multiply Multiplication Multiplication • Al Khwarizmi乘法算法的时间复杂度? • 由于乘数每次都被取半,对于二进制来说,取半 意味着去掉最右边一位数,而乘数的长度为n,因 此该算法在递归n次后终结。每次递归需要进行一 次长度为n位的加法运算,其时间复杂度为O(n)。 因此Al Khwarizmi乘法算法的时间复杂度为O(n2) 。 Division • To divide an integer x by another integer y≠0 means to find a quotient q and a remainder r, where x = yq+r and r <y. Modular arithmetic Modular Arithmetic Basic Modular Arithmetic Basic Modular Addition • x+y mod N • Since x and y are in the range 0 to N-1, their sum is between 0 and 2(N-1). If the sum exceeds N-1, subtract off N. • The overall computation consists of an addition, and possibly a subtraction, of numbers that never exceed 2N. the running time is O(n), where n = log N. Modular multiplication • xy mod N • start with regular multiplication, then reduce the answer modulo N. The product can be as large as (N1)2, at most 2n bits long since log(n-1)2 = 2log(N1)≤2n. • The running time is O(n2). Modular exponentiation • 在密码学中,常需计算xy mod N. 这个的x,y和N均 为几百位长的整数。如何快速计算? • 直接算xy,运算结果很大!即便x和y只有20位长 , xy也要大概1千万位长。 • 为保证中间运算结果不要太大,每步运算都模N. Modular exponentiation • Simple idea: repeatedly multiplying by x modulo N. problem: if y is 500 bits long, we need to perform y -1 ≈ 2500 multiplications! Modular exponentiation • better idea: starting with x and squaring repeatedly modulo N, we get we need to perform log y multiplications, ach takes O(log2N) to compute. • To determine xy mod N, we simply multiply together and appropriate subset of these powers, those corresponding to 1’s in the binary representation of y. • A polynomial-time algorithm is within reach! Modular exponentiation sicily 1294. 高级机密 • 信息加密。 • 目前比较流行的编码规则称为RSA,是由美国麻 省理工学院的三位教授发明的。这种编码规则是 基于一种求密取模算法的:对于给出的三个正整 数a,b,c,计算a的b次方除以c的余数。 • 题目要求:计算 ab mod c sicily 1294. 高级机密 问题分析 • 不好的算法: – 先求出a的b次方,再模c。但题目给出的a,b,c的 范围比较大,要算出ab要用到高精度乘法,然 后模c还要用到高精度除法; • 较好的算法: – 利用同余的性质, xy mod c = x * (y mod c) mod c sicily 1294. 高级机密 代码 d = 1; for (i = 1; i <= b; ++i) { d = d * a % c; } cout << d; Euclid’s algorithm for greatest common divisor • Euclid’s rule If x and y are positive integer with x≥y, then gcd(x,y)= gcd(x mod y, y) • Proof. 因为 gcd(x,y)能整除x和y,因此整除x-y,即是x-y 的因子,因此gcd(x,y)≤ gcd(x-y, y). 而反过来推,同理可得gcd(x-y, y) ≤ gcd(x,y)。 故gcd(x,y)=gcd(x-y, y)。由此显然可得结论。 Euclid’s algorithm for greatest common divisor Euclid’s algorithm for greatest common divisor • This means that after any two consecutive round, both a and b, are at the very least halved in value – the length of each decreases by at least one bit. If they are initially n-bits integers, then the base case will be reached within 2n recursive calls. And since each call involves a quadratic-time division, the total time is O(n3) An extension of Euclid algorithm 只要找到两个整数x和y,使得ax+by=d,且d是a和b 的因子,则d就是a和b的最大公因子;如果d是a和b 的最大公因子,则d一定可以表示为ax+by形式。 只要对欧几里得算法稍加扩展,即可找到所需的系 数x和y。 An extension of Euclid algorithm An extension of Euclid algorithm • Lemma For any positive integers a and b, the extended Euclid algorithm returns integers x, y, and d such that gcd(a,b) = d = ax+by • Proof. 对b做归纳假设。当b=0, 验证可知算法正确。 算法调用gcd(b,a mod b)来计算gcd(a,b)。 由于a mod b < b, 由归纳假设知返回结果是正确的. Modular division 减法求最大公约数 于大整数而言,取模运算(其中用到除法)是非常昂贵的 开销,将成为整个算法的瓶颈。有没有办法能够不用取模 运算呢? 如果一个数能够同时整除x和y,则必能同时整除x-y和y; 而能够同时整x-y和y的数也必能同时整除x和y,即x和y的 公约数与x-y和y的公约数是相同的,其最大公约数也是相 同的,即f(x, y)= f(x-y, y),那么就可以不再需要进 行大整数的取模运算,而转换成简单得多的大整数的减法。 实例:f(42, 30)=f(30, 12)=f(12, 18)= f(18, 12) = f(12, 6)= f(6, 6)= f(6, 0)= 6 不足之处。最大的瓶颈就是迭代的次数比之前的算法多了 不少,如果遇到(10 000 000 000 000, 1) 减法求最大公约数 代码 BigInt gcd(BigInt x, BigInt y) { if(x < y) return gcd(y, x); if(y == 0) return x; else return gcd(x - y, y); } 求最大公约数算法三 算法一(欧几里得算法)的问题在于计算复杂的大整数除法 运算,而算法二虽然将大整数的除法运算转换成了减法运 算,降低了计算的复杂度,但它的问题在于减法的迭代次 数太多,如果遇到(10 000 000 000 000, 1)的情况就很 糟糕。 能否结合算法一和算法二从而使其成为一个最佳的算法呢? 求最大公约数算法三 记x和y的最大公约数为f(x, y)。 若x, y均为偶数,f(x, y)= 2 * f(x/2, y/2)= 2 * f (x>>1, y>>1) 若x为偶数,y为奇数,f(x, y)= f(x/2, y)= f(x>>1, y) 若x为奇数,y为偶数,f(x, y)= f(x, y/2)= f(x, y>>1) 若x, y均为奇数,f(x, y)= f(x, x - y), 那么在f(x, y)= f(x, x - y)之后,(x - y)是一个偶 数,下一步一定会有除以2的操作。 最坏情况下的时间复杂度是O(log2(max(x, y))。 求最大公约数算法三 示例: f(42, 30)= f(1010102, 111102) = 2 * f(101012, 11112) = 2 * f(11112, 1102) = 2 * f(11112, 112) = 2 * f(11002, 112) = 2 * f(112, 112) = 2 * f(02, 112) = 2 * 112 =6 求最大公约数算法三 BigInt gcd(BigInt x, BigInt y) { if(x < y) return gcd(y, x); if(y == 0) return x; else { if(IsEven(x)){ if(IsEven(y)) return (gcd(x >> 1, y >> 1) << 1); else return gcd(x >> 1, y); } else { if(IsEven(y)) return gcd(x, y >> 1); else return gcd(y, x - y); } } } 同余 • 同余 – 设m是正整数,a,b是整数,如果m|(a-b),则称a和b关于模m同余, 记作a≡b(mod m)或者说,如果a,b除以m的余数相等,则称a和b关 于模m同余 • 同余的性质 – – – – a≡a(mod m) 如果a≡b(mod m),则b≡a(mod m) 如果a≡b(mod m)且b≡c(mod m), a≡c(mod m) 如果a≡b(mod m)且c≡d(mod m),则a±c≡b± d(mod m), ac≡bd(mod m) 同余 • 同余的性质(cont.) – – – – – 如果a≡b(mod m),则an≡bn(mod m),n∈N 如果ac≡bc(mod m),则a≡b(mod (m/gcd(c,m)) 如果a≡b(mod m)且d|m,则a≡b(mod d) 如果a≡b(mod m),则ad≡bd(mod m) 如果a≡b(mod mi),i=1,2,…,n,l=lcm(m1,m2,…,mn),则 a≡b(mod l) – 如果p为素数,则ap ≡ a(mod p);如果gcd(a,p)=1,则ap-1 ≡ 1(mod p) Primality Testing 筛法求素数 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 筛法求素数 代码 const int MAX = 10000; bool prime[MAX + 1]; void searchprime() { memset(prime, true, sizeof(prime)); prime[1] = false; for (int i = 2; i*i<= MAX; ++i) { if (prime[i]) { int j = i * 2; while (j <= MAX) { prime[j] = false; j += i; } } } } 代码(筛法求素数) for (int i = 2; i <= (int) floor(sqrt(MAX)); ++i) { if (prime[i]) { int j = i * 2; while (j <= MAX) { prime[j] = false; j += i; } } } } Fermat’s little theorem Fermat’s little theorem • Proof. Algorithm for testing primality Algorithm for testing primality Algorithm for testing primality An algorithm for testing primality, with low error probability Carmichael numbers • 561 = 3*11*17, not a prime. • fool the Fermat test, because a560 ≡1 (mod 561) for all values of a relatively prime to 561. • Rabin and Miller algorithm. Generating random primes Generating random primes • Q: what is the probability that the output of the algorithm is really prime? • A: suppose we perform the test with base a=2 for all numbers N≤25∙109. RSA • RSA基本原理 • 选定一个数N,再选择一个N到N的双射函数f作为 加密密钥(公钥),该函数的逆函数作为解密密 钥(密钥)。f的选定必须使得其逆函数无法从f 推出。 RSA RSA