Number Theory and Public Key Cryptography
Introduction to Number Theory
Modular Arithmetic
 modular arithmetic is 'clock arithmetic'
 a congruence a = b mod n says when divided by n
that a and b have the same remainder
 100 = 34 mod 11
 usually have 0<=b<=n1
 12mod7 = 5mod7 = 2mod7 = 9mod7 _{}
 _{} b is called the residue of a mod n
 can do arithmetic with integers modulo n with all results between 0 and n
 addition
 a+b mod n
 subtraction
 ab mod n = a+(b) mod n
 multiplication
 a.b mod n
 derived from repeated addition
 can get a.b=0 where neither a,b=0
 division
 a/b mod n
 is multiplication by inverse of b: a/b = a.b^{1} mod n
 if n is prime b^{1} mod n exists s.t
b.b^{1} = 1 mod n
 eg 2.3=1 mod 5 hence 4/2=4.3=2 mod 5
 integers modulo n with addition and multiplication form a commutative ring
with the laws of
 Associativity
 (a+b)+c = a+(b+c) mod n
 Commutativity
 a+b = b+a mod n
 Distributivity
 (a+b).c = (a.c)+(b.c) mod n _{}
 also can chose whether to do an operation and then reduce modulo n, or
reduce then do the operation, since reduction is a homomorphism from the ring
of integers to the ring of integers modulo n
 a+/b mod n = [a mod n +/ b mod n] mod n
 (the above laws also hold for multiplication)
 if n is constrained to be a prime number p then this forms a
Galois Field modulo p denoted GF(p) and all the normal laws
associated with integer arithmetic work
Exponentiation in GF(p)
 many encryption algorithms use exponentiation  raising a number
a (base) to some power b (exponent) mod p
 exponentiation is basically repeated multiplication, which take s
O(n) multiples for a number n
 a better method is the square and multiply algorithm[1]
let base = a, result =1
for each bit ei (LSB to MSB) of exponent
if ei=0 then
square base mod p
if ei=1 then
multiply result by base mod p
square base mod p (except for MSB)
required ae is result
 only takes O(log_{2} n) multiples for a number
n
see Sebbery p9 Fig2.1 + example
Discrete Logarithms in GF(p)
 the inverse problem to exponentiation is that of finding the discrete
logarithm of a number modulo p
 find x where a^{x} = b mod p
Seberry examples p10
 whilst exponentiation is relatively easy, finding discrete logarithms is
generally a hard problem, with no easy way
 in this problem, we can show that if p is prime, then there
always exists an a such that there is always a discrete logarithm for
any b!=0
 successive powers of a "generate" the group mod p
 such an a is called a primitive root and these are also
relatively hard to find
Greatest Common Divisor
 the greatest common divisor (a,b) of a and b
is the largest number that divides evenly into both a and b
 Euclid's Algorithm is used to find the Greatest Common Divisor
(GCD) of two numbers a and n, a<n
 use fact if a and b have divisor d so does
ab, a2b
GCD (a,n) is given by:
let g0=n
g1=a
gi+1 = gi1 mod gi
when gi=0 then (a,n) = gi1
eg
find (56,98)
g0=98
g1=56
g2 = 98 mod 56 = 42
g3 = 56 mod 42 = 14
g4 = 42 mod 14 = 0
hence
(56,98)=14
see Seberry Fig 2.2 p11 GCD alg [2]
Inverses and Euclid's Extended GCD Routine
 unlike normal integer arithmetic, sometimes a number in modular arithmetic
has a unique inverse
 a^{1} is inverse of a mod n if
a.a^{1} = 1 mod n
 where a,x in {0,n1}
 eg 3.7 = 1 mod 10
 if (a,n)=1 then the inverse always exists
 can extend Euclid's Algorithm to find Inverse by keeping
track of g_{i }= u_{i}.n + v_{i}.a
 Extended Euclid's (or Binary GCD) Algorithm to find
Inverse of a number a mod n (where (a,n)=1) is:
Inverse(a,n) is given by:
g0=n u0=1 v0=0
g1=a u1=0 v1=1
let
y = gi1 div gi
gi+1 = gi1  y.gi = gi1 mod gi
ui+1 = ui1  y.ui
vi+1 = vi1  y.vi
when gi=0 then Inverse(a,n) = vi1
Example
eg: want to find Inverse(3,460):
i y g u v
0  460 1 0
1  3 0 1
2 153 1 1 153
3 3 0 3 460
hence Inverse(3,460) = 153 = 307 mod 460
see Seberry Fig 2.3 p14 Inverse alg [3]
Euler Totient Function [[phi]](n)
 if consider arithmetic modulo n, then a reduced set of residues is
a subset of the complete set of residues modulo n which are relatively prime to
n
 eg for n=10,
 the complete set of residues is {0,1,2,3,4,5,6,7,8,9}
 the reduced set of residues is {1,3,7,9}
 the number of elements in the reduced set of residues is called the
Euler Totient function [[phi]](n)
 there is no single formula for [[phi]](n) but for various cases count how
many elements are excluded[4]:
p (p prime) [[phi]](p) =p1
pr (p prime) [[phi]](p) =pr1(p1)
p.q (p,q prime) [[phi]](p.q) =(p1)(q1)
see Seberry Table 2.1 p13
 several important results based on [[phi]](n) are:
 Theorem (Euler's Generalization)
 let gcd(a,n)=1 then
 a^{[[phi]](n)} mod n = 1
 Fermat's Theorem
 let p be a prime and gcd(a,p)=1 then
 a^{p1} mod p = 1
 Algorithms to find Inverses a^{1} mod n
 search 1,...,n1 until an a^{1}_{ }is found with
a.a^{1} mod n
 if [[phi]](n) is known, then from Euler's Generalization
 a^{1} = a^{[[phi]](n)1} mod n
 otherwise use Extended Euclid's algorithm for inverse
Computing with Polynomials in GF(qn)
 have seen arithmetic modulo a prime number GF(p)
 also can do arithmetic modulo q over polynomials of degree
n, which also form a Galois Field GF(q^{n})
 its elements are polynomials of degree (n1) or lower

a(x)=a_{n1}x^{n1}+a_{n2}x^{n2}+...+a_{1}x+a_{0}
 have residues for polynomials just as for integers
 p(x)=q(x)d(x)+r(x)
 and this is unique if deg[r(x)]<deg[d(x)]
 if r(x)=0, then d(x) divides p(x), or is a factor of p(x)
 addition in GF(q^{n}) just involves summing equivalent terms in
the polynomial modulo q (XOR if q=2)

a(x)+b(x)=(a_{n1}+b_{n1})x^{n1}+...+(a_{1}+b_{1})x+(a_{0}+b_{0})
Multiplication with Polynomials in GF(qn)
 multiplication in GF(q^{n}) involves [5]
 multiplying the two polynomials together (cf longhand multiplication; here
use shifts & XORs if q=2)
 then finding the residue modulo a given irreducible polynomial of
degree n
 an irreducible polynomial d(x) is a 'prime' polynomial, it has no
polynomial divisors other than itself and 1
 modulo reduction of p(x) consists of finding some r(x) st:
p(x)=q(x)d(x)+r(x)
 nb. in GF(2^{n}) with d(x)=x^{3}+x+1 can do
simply by replacing x^{3} with x+1
 eg in GF(2^{3}) there are 8 elements:
 0, 1, x, x+1, x^{2}, x^{2}+1, x^{2}+x,
x^{2}+x+1
 with irreducible polynomial d(x)=x^{3}+x+1* arithmetic
in this field can be summarised as:
Seberry Table 2.3 p20
 can adapt GCD, Inverse, and CRT algorithms for GF(q^{n})
 [[phi]](p(x)) = 2^{n}1 since every poly except 0 is
relatively prime to p(x)
 arithmetic in GF(q^{n}) can be much faster than integer
arithmetic, especially if the irreducible polynomial is carefully chosen
 eg a fast implementation of GF(2^{127}) exists
 has both advantages and disadvantages for cryptography, calculations are
faster, as are methods for breaking

Complexity Theory  A Potted Intro
 Complexity theory  study of how hard a problem is to solve in general
 allows classification of types of problems
 some problems intrinsically harder than others, eg
 multiplying numbers O(n^(2))
 multiplying matrices O(n^(2)(2n1))
 solving crossword O(26^(n))
 recognizing primes O(n^(log log n))
 deal with worst case complexity
Complexity Theory  Some Terminolgy
 an instance of a problem is a particular case of a general problem
 the input length of a problem is the number n of symbols
used to characterize a particular instance of it
 the order of a function f(n)  is some O(g(n)) of
some function g(n) s.t.
 f(n)<=c.g(n), for all n>=0, for some c
 a polynomial time algorithm (P) is one which solves any
instance of a particular problem in a length of time O(p(n)), where p is some
polynomial on input length
 an exponential time algorithm (E)  is one whose solution
time is not so bounded
 a nondeterministic polynomial time algorithm (NP)  is one
for which any guess at the solution of an instance of the problem may be
checked for validity in polynomial time
 NPcomplete problems  are a subclass of NP problems for which it
is known that if any such problem has a polynomial time solution, then all
NP problems have polynomial solutions. These are thus the hardest NP
problems
 CoNP problems  are the complements of NP problems, to
prove a guess at a solution of a CoNP problem may well require an exhaustive
search of the solution space
PublicKey Ciphers
 traditional secret key cryptography uses a single key shared by
both sender and receiver
 if this key is disclosed communications are compromised
 also does not protect sender from receiver forging a message &
claiming is sent by sender, parties are equal
 publickey (or twokey) cryptography involves the use
of two keys:
 a publickey, which may be known by anybody, and can be used to
encrypt messages, and verify signatures
 a privatekey, known only to the recipient, used to decrypt
messages, and sign (create) signatures
 the publickey is easily computed from the private key and other
information about the cipher (a polynomial time (Ptime) problem)
 however, knowing the publickey and public description of the cipher, it
is still computationally infeasible to compute the private key (an NPtime
problem)
 thus the publickey may be distributed to anyone wishing to communicate
securly with its owner (although secure distribution of the publickey is a
nontrivial problem  the key distribution problem)
 have three important classes of publickey algorithms:
 PublicKey Distribution Schemes (PKDS)  where the scheme is used
to securly exchange a single piece of information (whose value depends on the
two parties, but cannot be set).
 This value is normally used as a session key for a privatekey scheme
 Signature Schemes  used to create a digital signature only, where
the privatekey signs (create) signatures, and the publickey verifies
signatures
 Public Key Schemes (PKS)  used for encryption, where the
publickey encrypts messages, and the privatekey decrypts messages.
 Any publickey scheme can be used as a PKDS, just by selecting a message
which is the required session key
 Many publickey schemes are also signature schemes (provided encryption
& decryption can be done in either order)
Digital Signatures and Signature Schemes
 where the privatekey signs (create) signatures, and the publickey
verifies signatures
 only the owner can create the digital signature, hence it can be used to
verify who created a message
 generally don't sign the whole message (doubling the size of information
exchanged), but just a digest or hash of the message,
 a hash function takes the message, and produces a fixed size
(typically 64 to 512 bits) value dependent on the message
 it must be hard to create another message with the same hash value
(otherwise some forgeries are possible)
 developing good hash functions is another nontrivial problem
DiffieHellman PublicKey Distribution Scheme
 first publickey type scheme proposed was a PKDS by Diffie & Hellman
in 1976:
W Diffie, M E Hellman, "New directions in Cryptography", IEEE Trans.
Information Theory, IT22, pp644654, Nov 1976
 an excellent overview of cryptography at this time is:
W Diffie, M E Hellman, "Privacy and Authentication: An Introduction to
Cryptography", Proc. of the IEEE, Vol 67 No 3, pp 397427, Mar 1979
 it is a publickey distribution scheme
 it cannot be used to exchange an arbitrary message
 only a key, whose value depends on the participants (and their private and
public key information)
 is based on exponentiation in a finite (Galois) field, either over
integers modulo a prime, or a polynomial field
 nb exponentiation takes O((log n)^{3}) operations
 its security relies on the difficulty of computing logarithms in these
fields
 nb discrete logarithms takes O(e^{ log n log log n}) operations
 DiffieHellman PKDS works as follows:
 two people A & B, who wish to exchange some key over an insecure
communications channel can:
 select a large prime p (~200 digit), and
 [[alpha]] a primitive element mod p
 A has a secret number x_{A}
 B has a secret number x_{B}
 A and B compute y_{A }and y_{B }respectively, which
are then made public
 y_{A} = [[alpha]]^{xA} mod p
y_{B} = [[alpha]]^{xB} mod p
 the key is then calculated as
 K_{AB} = [[alpha]]^{xA.xB} mod p
 = y_{A}^{xB} mod p (which
B can compute)
 = y_{B}^{xA} mod p (which
A can compute)
 and may then be used in a privatekey cipher to secure communications
between A and B
 nb: if the same two people subsequently wish to communicate, they will
have the same key as before, unless they change their publickey
(usually not often)
RSA PublicKey Cryptosystem
 best known and widely regarded as most practical publickey scheme was
proposed by Rivest, Shamir & Adleman in 1977:
R L Rivest, A Shamir, L Adleman, "On Digital Signatures and Public Key
Cryptosystems", Communications of the ACM, vol 21 no 2, pp120126, Feb 1978
 it is a publickey scheme which may be used for encrypting messages,
exchanging keys, and creating digital signatures
 is based on exponentiation in a finite (Galois) field over integers modulo
a prime
 nb exponentiation takes O((log n)^{3}) operations
 its security relies on the difficulty of calculating factors of large
numbers
 nb factorization takes O(e^{ log n log log n}) operations
 (same as for discrete logarithms)
 the algorithm is patented in North America (although algorithms cannot be
patented elsewhere in the world)
 this is a source of legal difficulties in using the scheme
 RSA is a public key encryption algorithm based on exponentiation using
modular arithmetic
 to use the scheme, first generate keys:
 KeyGeneration by each user consists of:
 selecting two large primes at random (~100 digit), p, q
 calculating the system modulus R=p.q p, q primes
 selecting at random the encryption key e,
 e < R, gcd(e, F(R)) = 1
 solving the congruence to find the decryption key d,
 e.d [[equivalence]] 1 mod [[phi]](R) 0
<= d <= R
 publishing the public encryption key: K1={e,R}
 securing the private decryption key: K2={d,p,q}
 Encryption of a message M to obtain ciphertext C is:
 C = M^{e }mod R 0 <= d <=
R
 Decryption of a ciphertext C to recover the message M is:
 M = C^{d} = M^{e.d} =
M^{1+n.}^{[[phi]](R)} = M mod R
 the RSA system is based on the following result:
_{ }if R = pq where p, q are distinct large primes then
X [[phi]](R) = 1 mod R
for all x not divisible by p or q
and [[Phi]](R) = (p1)(q1)
RSA Example
 usually the encryption key e is a small number, which must be relatively
prime to [[phi]](R) (ie GCD(e, [[phi]](R)) = 1)
 typically e may be the same for all users (provided certain precautions
are taken), 3 is suggested
 the decryption key d is found by solving the congruence:_{ }
e.d [[equivalence]] 1 mod [[phi]](R), 0 <= d <= R,
 an extended Euclid's GCD or Binary GCD calculation is done to do this.
given e=3, R=11*47=517, [[phi]](R)=10*46=460
then d=Inverse(3,460) by Euclid's alg:
i y g u v
0  460 1 0
1  3 0 1
2 153 1 1 153
3 3 0 3 460
ie: d = 153, or 307 mod 517
 a sample RSA encryption/decryption calculation is:_{ }
M = 26
C = 263 mod 517 = 515
M = 515307 mod 517 = 26
Security of RSA
 The security of the RSA scheme rests on the difficulty of factoring the
modulus of the scheme R
 best known factorization algorithm (BrentPollard) takes:
operations on number R whose largest prime factor is p
Decimal Digits in R #Bit Operations to Factor R
20 7200
40 3.11e+06
60 4.63e+08
80 3.72e+10
100 1.97e+12
120 7.69e+13
140 2.35e+15
160 5.92e+16
180 1.26e+18
200 2.36e+19
 This leads to R having a length of 200 digits (or 600 bits) given that
modern computers perform 1100 MIPS the above can be divided by 10^{6}
to get a time in seconds
 nb: currently 1e+14 operations is regarded as a limit for computational
feasability and there are 3e+13 usec/year
 but most (all!!) computers can't directly handle numbers larger than
32bits (64bits on the very newest)
 hence need to use multiple precision arithmetic libraries to handle
numbers this large
MultiPrecision Arithmetic
 involves libraries of functions that work on multiword (multiple
precision) numbers
 classic references are in Knuth vol 2  "Seminumerical Algorithms"
 multiplication digit by digit
 do exponentiation using square and multiply[6]
 are a number of well known multiple precision libraries available  so
don't reinvent the wheel!!!!
 can use special tricks when doing modulo arithmetic, especially with the
modulo reductions
Faster Modulo Reduction
* Chivers (1984) noted a fast way of performing modulo reductions whilst doing
multiprecision arithmetic calcs
Given an integer A of n characters (a_{0}, ... , a_{n1}) of
base b
then
ie: this implies that the MSD of a number can be removed and its remainder mod
m added to the remaining digits will result in a number that is congruent mod m
to the original.
* Chivers algorithm for reducing a number is thus:
1. Construct an array R = (b^{d}, 2.b^{d}, ... ,
(b1).b^{d})(mod m)
2. FOR i = n1 to d do
WHILE A[i] != 0 do
j = A[i];
A[i] = 0;
A = A + b^{id}.R[j];
END WHILE
END FOR
where A[i] is the i^{th} character of number A
R[j] is the j^{th} integer residue from the array R
n is the number of symbols in A
d is the number of symbols in the modulus
Speeding up RSA  Alternate Multiplication Techniques
 conventional multiplication takes O(n^{2}) bit operations, faster
techniques include:
 the SchonhageStrassen Integer Multiplication Algorithm:
 breaks each integer into blocks, and uses them as coefficients of a
polynomial
 evaluates these polynomials at suitable points, & multiplies the
resultant values
 interpolates these values to form the coefficients of the product
polynomial
 combines the coefficients to form the product of the original integer
 the Discrete Fourier Transform, and the Convolution Theorem are used to
speed up the interpolation stage
 can multiply in O(n log n) bit operations
 the use of specialized hardware because:
 conventional arithmetic units don't scale up, due to carry propogation
delays
 so can use serialparallel carrysave, or delayed carrysave techniques
with O(n) gates to multiply in O(n) bit operations,
 or can use parallelparallel techniques with O(n^{2}) gates to
multiply in O(log n) bit operations
RSA and the Chinese Remainder Theorem
 a significant improvement in decryption speed for RSA can be obtained by
using the Chinese Remainder theorem to work modulo p and q respectively
 since p,q are only half the size of R=p.q and thus the arithmetic is much
faster
 CRT is used in RSA by creating two equations from the decryption
calculation:_{ }
M = Cd mod R
as
follows:
M1 = M mod p = (C mod p)d mod (p1)
M2 = M mod q = (C mod q)d mod (q1)
then
the pair of equations
M = M1 mod p M = M2 mod q
has
a unique solution by the CRT, given by:
M = [((M2 +q  M1)u mod q] p + M1
where
p.u mod q = 1
Primality Testing and RSA
 The first stage of keygeneration for RSA involves finding two large
primes p, q
 Because of the size of numbers used, must find primes by trial and error
 Modern primality tests utilize properties of primes eg:_{ }
 a^{n1} = 1 mod n where GCD(a,n)=1
 all primes numbers 'n' will satisfy this equation
 some composite numbers will also satisfy the equation, and are called
pseudoprimes.
 Most modern tests guess at a prime number 'n', then take a large number
(eg 100) of numbers 'a', and apply this test to each. If it fails the number is
composite, otherwise it is is probably prime.
 There are a number of stronger tests which will accept fewer composites as
prime than the above test. eg:
RSA Implementation in Practice
 Software implementations_{ }
 generally perform at 110 bits/second on block sizes of 256512 bits_{
}
 _{} two main types of implementations:
  on micros as part of a key exchange mechanism in a hybrid scheme
  on larger machines as components of a secure mail system
 Harware Implementations_{ }
 generally perform 10010000 bits/sec on blocks sizes of 256512 bits_{
}
 _{} all known implementations are large bit length conventional ALU
units
El Gamal
 a variant of the DiffieHellman key distribution scheme, allowing secure
exchange of messages
 published in 1985 by ElGamal in
T. ElGamal, "A Public Key Cryptosystem and a Signature Scheme Based on
Discrete Logarithms", IEEE Trans. Information Theory, vol IT31(4), pp469472,
July 1985.
 like DiffieHellman its security depends on the difficulty of factoring
logarithms
 Key Generation
 select a large prime p (~200 digit), and
 [[alpha]] a primitive element mod p
 A has a secret number x_{A}
 B has a secret number x_{B}
 A and B compute y_{A }and y_{B }respectively,
which are then made public
 y_{A} = [[alpha]]^{xA} mod p
 y_{B} = [[alpha]]^{xB} mod p
 to encrypt a message M into ciphertext C,
 selects a random number k, 0 <= k <= p1
 computes the message key K
 computes the ciphertext pair: C = {c1,c2}
 C_{1} = [[alpha]]^{k} mod p
C_{2} = K.M mod p
 to decrypt the message
 extracts the message key K
 K = C_{1}^{xB} mod p =
[[alpha]]^{k.xB} mod p
 extracts M by solving for M in the following equation:
Other PublicKey Schemes
 a number of other publickey schemes have been proposed, some of the
better known being:
 Knapsack based schemes
 McEleice's Error Correcting Code based schems
 ALL of these schemes have been broken
 the only currently known secure public key schemes are those based on
exponentiation
(all of which are patented in North America)
 it has proved to be very very difficult to develop secure public key
schemes
 this in part is why they have not been adopted faster, as their
theorectical advantages might have suggested
[1] examples: 7^5 > a=7, e={101}
[2] more examples: GCD(84,60)=12; GCD(323,238)=17;
GCD(323,299)=1;
[3]more examples: Inv(299,323)=148 show trace of g=u.n+v.a
[4] show examples: eg [[phi]] of 3, 3.3, 3.5
[5] example: a=x^2+1, find a^2 mod (x^3+x+1)=x^2+x+1 showing
modulo reduction process
[6] follow with examples of multiple precision arithmetic
[CSC Info]
Lawrie.Brown@adfa.oz.au / 24Apr96