Hash Functions- Applications and Properties
A hash function $H$ accepts a variable-length block of data $M$ as input and produces a fixed-size hash value $h = H(M)$. A “good” hash function has the property that the results of applying the function to a large set of inputs will produce outputs that are evenly distributed and apparently random. In general terms, the principal object of a hash function is data integrity. A change to any bit or bits in $M$ results, with high probability, in a change to the hash value.
The kind of hash function needed for security applications is referred to as a cryptographic hash function. A cryptographic hash function is an algorithm for which it is computationally infeasible (because no attack is significantly more efficient than brute force) to find either (a) a data object that maps to a pre-specified hash result (the one-way property) or (b) two data objects that map to the same hash result (the collision-free property). Because of these characteristics, hash functions are often used to determine whether or not data has changed.
Figure 11.1 depicts the general operation of a cryptographic hash function. Typically, the input is padded out to an integer multiple of some fixed length (e.g., 1024 bits), and the padding includes the value of the length of the original message in bits. The length field is a security measure to increase the difficulty for an attacker to produce an alternative message with the same hash value,
The cryptographic hash function is used in a wide variety of security applications and Internet protocols.
Message Authentication
Message authentication is a mechanism or service used to verify the integrity of a message. Message authentication assures that data received are exactly as sent (i.e., there is no modification, insertion, deletion, or replay). In many cases, there is a requirement that the authentication mechanism assures that purported identity of the sender is valid. When a hash function is used to provide message authentication, the hash function value is often referred to as a message digest.
The essence of the use of a hash function for message integrity is as follows. The sender computes a hash value as a function of the bits in the message and transmits both the hash value and the message. The receiver performs the same hash calculation on the message bits and compares this value with the incoming hash value.If there is a mismatch, the receiver knows that the message (or possibly the hash
value) has been altered (Figure 11.2a).
The hash value must be transmitted in a secure fashion. That is, the hash value must be protected so that if an adversary alters or replaces the message, it is not feasible for adversary to also alter the hash value to fool the receiver. This type of attack is shown in Figure 11.2b. In this example, Alice transmits a data block and attaches a hash value. Darth intercepts the message, alters or replaces the data block, and calculates and attaches a new hash value. Bob receives the altered data with the new hash value and does not detect the change. To prevent this attack, the hash value generated by Alice must be protected.
Figure 11.3 illustrates a variety of ways in which a hash code can be used to provide message authentication, as follows.
a. The message plus concatenated hash code is encrypted using symmetric encryption. Because only $A$ and $B$ share the secret key, the message must have come from $A$ and has not been altered. The hash code provides the structure or redundancy required to achieve authentication. Because encryption is applied to the entire message plus hash code, confidentiality is also provided.
b. Only the hash code is encrypted, using symmetric encryption. This reduces the processing burden for those applications that do not require confidentiality.
c. It is possible to use a hash function but no encryption for message authentication.The technique assumes that the two communicating parties share a common secret value $S$. $A$ computes the hash value over the concatenation of $M$ and $S$ and appends the resulting hash value to $M$. Because $B$ possesses $S$, it can recompute the hash value to verify. Because the secret value itself is not sent, an opponent cannot modify an intercepted message and cannot generate a false message.
d. Confidentiality can be added to the approach of method (c) by encrypting the entire message plus the hash code.
When confidentiality is not required, method (b) has an advantage over methods (a) and (d), which encrypts the entire message, in that less computation is required.
More commonly, message authentication is achieved using a message authentication code (MAC), also known as a keyed hash function. Typically, MACs are used between two parties that share a secret key to authenticate information exchanged between those parties. A MAC function takes as input a secret key and a data block and produces a hash value, referred to as the MAC, which is associated with the protected message. If the integrity of the message needs to be checked, the MAC function can be applied to the message and the result compared with the associated MAC value. An attacker who alters the message will be unable to alter the associated MAC value without knowledge of the secret key. Note that the verifying party also knows who the sending party is because no one else knows the secret key.
Note that the combination of hashing and encryption results in an overall function that is, in fact, a MAC (Figure 11.3b). That is, $E(K, H(M))$ is a function of a variable-length message $M$ and a secret key $K$, and it produces a fixed-size output that is secure against an opponent who does not know the secret key. In practice, specific MAC algorithms are designed that are generally more efficient than an encryption algorithm.
Digital Signatures
Another important application, which is similar to the message authentication application, is the digital signature. The operation of the digital signature is similar to that of the MAC. In the case of the digital signature, the hash value of a message is encrypted with a user’s private key. Anyone who knows the user’s public key can verify the integrity of the message that is associated with the digital signature. In this case, an attacker who wishes to alter the message would need to know the user’s private key.
Figure 11.4 illustrates, in a simplified fashion, how a hash code is used to provide a digital signature.
a. The hash code is encrypted, using public-key encryption with the sender’s private key. As with Figure 11.3b, this provides authentication. It also provides a digital signature, because only the sender could have produced the encrypted hash code. In fact, this is the essence of the digital signature technique.
b. If confidentiality as well as a digital signature is desired, then the message plus the private-key-encrypted hash code can be encrypted using a symmetric secret key. This is a common technique.
Other Applications
Hash functions are commonly used to create a one-way password file.Hash of a password is stored by an operating system rather than the password itself. Thus, the actual password is not retrievable by a hacker who gains access to the password file. In simple terms, when a user enters a password, the hash of that password is compared to the stored hash value for verification.This approach to password protection is used by most operating systems.
Hash functions can be used for intrusion detection and virus detection. Store H(F) for each file on a system and secure the hash values (e.g., on a CD-R that is kept secure). One can later determine if a file has been modified by recomputing H(F). An intruder would need to change F without changing H(F).
A cryptographic hash function can be used to construct a pseudorandom function (PRF) or a pseudorandom number generator (PRNG). A common application for a hash-based PRF is for the generation of symmetric keys.
Simple Hash Functions
All hash functions operate using the following general principles. The input (message, file, etc.) is viewed as a sequence of n -bit blocks. The input is processed one block at a time in an iterative fashion to produce an $n$-bit hash function.One of the simplest hash functions is the bit-by-bit exclusive-OR (XOR) of every block. This can be expressed as
$C_i = b_{i1} ⊕b_{i2 }⊕ \cdots ⊕b_{im}$
where
$C_i = i$th bit of the hash code, $1 \le i \le n$
$m$ = number of $n$-bit blocks in the input
$b_{ij }= i$th bit in $j$th block
⊕ = XOR operation
This operation produces a simple parity bit for each bit position and is known as a longitudinal redundancy check.A simple way to improve matters is to perform a one-bit circular shift, or rotation, on the hash value after each block is processed. The procedure can be summarized as follows.
1. Initially set the $n$-bit hash value to zero.
2. Process each successive $n$-bit block of data as follows:
a. Rotate the current hash value to the right by one bit.
b. XOR the block into the hash value.
This has the effect of “randomizing” the input more completely and overcoming any regularities that appear in the input. Figure 11.5 illustrates these two types of hash functions for 16-bit hash values.
Given a message, it is an easy matter to produce a new message that yields that hash code: Simply prepare the desired alternate message and then append an n-bit block that forces the new message plus block to yield the desired hash code.
Although a simple XOR or rotated XOR (RXOR) is insufficient if only the hash code is encrypted, you may still feel that such a simple function could be useful when the message together with the hash code is encrypted (Figure 11.3a).
Security Requirements for Cryptographic Hash Functions
Before proceeding, we need to define two terms. For a hash value $h = H(x)$, we say that $x$ is the preimage of $h$. That is, $x$ is a data block whose hash value, using the function $H$, is $h$. Because $H$ is a many-to-one mapping, for any given hash value $h$, there will in general be multiple preimages. A collision occurs if we have $x ≠ y$ and $H(x) = H(y)$. Because we are using hash functions for data integrity, collisions are clearly undesirable.
Let us consider how many preimages are there for a given hash value, which is a measure of the number of potential collisions for a given hash value. Suppose the length of the hash code is $n$ bits, and the function $H$ takes as input messages or data blocks of length $b$ bits with $b \gt n$. Then, the total number of possible messages is $2^b$ and the total number of possible hash values is $2^n$. On average, each hash value corresponds to $2^{b-n}$ preimages. If $H$ tends to uniformly distribute hash values then, in fact, each hash value will have close to $2^{b-n}$ preimages.
Table 11.1 lists the generally accepted requirements for a cryptographic hash function.The first three properties are requirements for the practical application of a hash function.
The fourth property, preimage resistant, is the one-way property: it is easy to generate a code given a message, but virtually impossible to generate a message given a code. This property is important if the authentication technique involves the use of a secret value (Figure 11.3c).
The fifth property, second preimage resistant, guarantees that it is infeasible to find an alternative message with the same hash value as a given message. This prevents forgery when an encrypted hash code is used (Figures 11.3b and 11.4a). If this property were not true, an attacker would be capable of the following sequence:
First, observe or intercept a message plus its encrypted hash code; second, generate an unencrypted hash code from the message; third, generate an alternate message with the same hash code.
A hash function that satisfies the first five properties in Table 11.1(above) is referred to as a weak hash function. If the sixth property, collision resistant, is also satisfied, then it is referred to as a strong hash function. A strong hash function protects against an attack in which one party generates a message for another party to sign.For example, suppose Bob writes an IOU message, sends it to Alice, and she signs it. Bob finds two messages with the same hash, one of which requires Alice to pay a small amount and one that requires a large payment. Alice signs the first message, and Bob is then able to claim that the second message is authentic.
Figure 11.6 shows the relationships among the three resistant properties.A function that is collision resistant is also second preimage resistant, but the reverse is not necessarily true. A function can be collision resistant but not preimage resistant and vice versa. A function can be preimage resistant but not second preimage resistant and vice versa.Table 11.2 shows the resistant properties required for various hash function applications.
The final requirement in Table 11.1, pseudorandomness, has not traditionally been listed as a requirement of cryptographic hash functions but is more or less implied. Cryptographic hash functions are commonly used for key derivation and pseudorandom number generation, and that in message integrity applications, the three resistant properties depend on the output of the hash function appearing to be random. Thus, it makes sense to verify that in fact a given hash function produces pseudorandom output.
Comments
Post a Comment