Understanding hashCode() and equals() in Java
Java’s hashCode() and equals() methods are fundamental to the functioning of many core Java classes, particularly those in the Collections framework, such as HashMap, HashSet, and Hashtable. These methods define how Java objects behave when stored in collections that use hashing for efficient retrieval.
In this blog, we’ll dive deep into how these methods work, the rationale behind them, and their impact on performance. We’ll also look at the relationship between hashCode() and equals(), explore best practices, and investigate a real-world example with Java's String class.
Overview of hashCode() and equals()
In Java, objects are often stored in collections such as HashMap
or HashSet
, which use hashing for efficient access and storage. For these collections to work as expected, the objects need to adhere to certain rules regarding equality (equals()
) and hashing (hashCode()
).
hashCode()
: This method returns an integer hash code, which is used by hash-based collections to determine the "bucket" where an object should be placed.equals()
: This method checks if two objects are meaningfully equal. It compares their internal state to determine if they represent the same logical entity.
The correct implementation of these methods ensures that objects can be retrieved efficiently from collections and behave correctly when compared.
Deep Dive into hashCode()
What is a Hash Code?
A hash code is a numerical value generated for an object. It serves as a compact representation of the object’s contents. Hash codes are particularly useful for storing objects in hash-based collections (like
HashMap
,HashSet
, andHashtable
) because they allow these collections to quickly find an object in a large dataset by using the hash code to determine the storage location (bucket).
How hashCode() Works
The hashCode()
method is defined in Java’s Object
class and can be overridden to provide custom hash codes for objects. Here’s the method signature:
public int hashCode();
When you insert an object into a hash-based collection, the collection first calls the object’s hashCode()
method to get its hash code. This hash code is then used to find the correct bucket where the object should be stored.
But what happens if two objects have the same hash code? This is called a hash collision, and it’s something we’ll explore in more depth later. For now, the important takeaway is that two objects with the same hash code may be stored in the same bucket, and the collection will use equals()
to differentiate between them.
Overriding hashCode()
Here’s an example of how you can override
public class Person {
private String name;
private int age;
@Override
public int hashCode() {
int result = 17; // Start with a non-zero constant
result = 31 * result + name.hashCode(); // Combine fields
result = 31 * result + age;
return result;
}
}
Example of Hash Code Calculation:
Let's consider the string
String str = "abc";
System.out.println(str.hashCode());
The hash code is calculated as follows:
For the first character
'a'
(ASCII 97):h = 31 * 0 + 97 = 97
For the second character
'b'
(ASCII 98):h = 31 * 97 + 98 = 3105
For the third character
'c'
(ASCII 99):h = 31 * 3105 + 99 = 96354
Thus, the hash code for "abc"
is 96354
.
The Role of Prime Numbers in hashCode()
You might have seen prime numbers like
31
commonly used in hash code calculations (e.g.,31 * h + val[i]
). This is not arbitrary—prime numbers help reduce the likelihood of hash collisions. By multiplying intermediate hash values by a prime number, you distribute the hash values more evenly across possible buckets, ensuring better performance for hash-based collections.Prime numbers ensure that hash codes are distributed more uniformly because their factors are less predictable, reducing the likelihood that different combinations of object fields will produce the same hash code.
Hash Collisions and Buckets
A hash collision occurs when two different objects return the same hash code. When this happens, both objects are stored in the same bucket, but they still need to be distinguishable. This is where the
equals()
method comes into play.In a
HashMap
, for instance, if two objects have the same hash code, they will be placed in the same bucket. However, the collection will then useequals()
to check if the objects are truly equal. If they are, the collection considers them duplicates; if not, both objects are stored in the same bucket but treated as distinct elements.
Deep Dive into equals()
What is Equality?
In Java, the equals()
method defines logical equality between two objects. It is used to determine whether two objects represent the same logical entity, even if they are different instances.
Here’s the method signature of equals()
:
public boolean equals(Object obj);
The default implementation of equals()
in the Object
class compares object references using ==
, which checks for reference equality—i.e., whether two objects point to the same memory location. However, this default behavior is not suitable for most applications, where you want to compare the actual contents or values of objects, not just their memory addresses.
Overriding equals()
Here’s an example of how you might override equals()
in a custom class:
public class Person {
private String name;
private int age;
@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
Person person = (Person) obj;
return age == person.age && name.equals(person.name);
}
}
The Contract Between equals() and hashCode()
There is an important contract between
If two objects are equal according to
equals()
, they must have the same hash code.If two objects are not equal, they can have the same or different hash codes (hash collisions).
Hash-based collections rely on this contract to function properly. If two equal objects have different hash codes, they might be stored in different buckets, causing the collection to behave incorrectly (e.g., failing to retrieve an object or incorrectly treating two objects as distinct).
Reflexive: For any non-null object
x
,x.equals(x)
must returntrue
.Symmetric: For any non-null objects
x
andy
, ifx.equals(y)
returnstrue
, theny.equals(x)
must also returntrue
.Transitive: For any non-null objects
x
,y
, andz
, ifx.equals(y)
returnstrue
andy.equals(z)
returnstrue
, thenx.equals(z)
must also returntrue
.Consistent: For any non-null objects
x
andy
, repeated calls tox.equals(y)
must consistently returntrue
orfalse
, provided no information used inequals()
comparisons is modified.
The Importance of hashCode() and equals() in Collections
In hash-based collections like HashMap
, HashSet
, and Hashtable
, the hashCode()
and equals()
methods are used together to store and retrieve elements.
Insertion in a HashMap:
Hashing: The
hashCode()
method is called on the key to determine the bucket (array index) where the object should be placed.Equality Check: If another object already exists in the bucket (due to a hash collision), the
equals()
method is used to check if the two objects are equal. If they are equal, the existing value is replaced with the new one; otherwise, the new object is stored in the same bucket using techniques like chaining or probing.
Retrieval from a HashMap:
When retrieving a value from a HashMap
using a key:
Hashing: The
hashCode()
of the key is used to find the correct bucket.Equality Check: The
equals()
method is used to identify the correct key-value pair within the bucket (if multiple objects are stored due to hash collisions).
Impact of Incorrect Implementation:
If
equals()
is overridden withouthashCode()
, hash-based collections might not work correctly. Two objects that are equal according toequals()
could end up in different buckets because their hash codes differ.If
hashCode()
is poorly implemented, you might experience frequent hash collisions, leading to performance degradation in collections likeHashMap
.
Common Mistakes and Best Practices:
Always Override hashCode()
When Overriding equals()
:
Ensure that equal objects have the same hash code.
Failing to override
hashCode()
when overridingequals()
violates the contract between the two methods and can lead to bugs in collections.
Use Immutable Fields
It’s a good practice to use immutable fields (e.g.,
final
fields) inhashCode()
andequals()
to prevent the state of an object from changing after it’s been inserted into a collection.If the fields that
equals()
andhashCode()
rely on can be modified, the object’s behavior in hash-based collections may become unpredictable.
Use Prime Numbers for Hashing
Prime numbers like 31 are commonly used in hash functions because they help distribute hash values more uniformly across the hash table, reducing collisions.
Avoid Using Floating-Point Numbers
Using
float
ordouble
inhashCode()
can be tricky due to precision issues. If you must include them, consider converting them toint
usingFloat.floatToIntBits()
orDouble.doubleToLongBits()
.
Caching Hash Codes for Immutable Objects
If an object is immutable (like
String
), you can cache its hash code to avoid recomputation, improving performance. This is done in theString
class, where thehashCode
is computed once and stored for future use.
String Class implementation
In Java, String
is a class that represents a sequence of characters.
Implementation of equals() in String
Here’s the implementation of the equals()
method in the String
class:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String) anObject;
int n = value.length;
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
Breakdown of equals() Logic:
Reference Check: The method first checks if the two references (
this
andanObject
) point to the same object (this == anObject
). If true, they are considered equal, and the method returnstrue
.Type Check: If the two objects are not the same reference, the method checks whether
anObject
is an instance ofString
. If not,false
is returned.Length Check: If both objects are strings, their lengths are compared. If the lengths differ, the strings cannot be equal, and the method returns
false
.Content Comparison: If the lengths are the same, the method compares the individual characters of the two strings. If any character differs,
false
is returned. If all characters match,true
is returned, indicating that the strings are equal.
This implementation ensures that two strings are considered equal if and only if they contain the exact same sequence of characters in the same order.
Implementation of hashCode() in String
Here’s the actual implementation of
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
Breakdown of hashCode() Logic:
Cached Hash Code: The
String
class caches the hash code once it has been calculated. The variablehash
is used to store the hash code, and the method first checks ifhash
is already set (h == 0
). If it’s non-zero, the method returns the cached hash code.Hash Calculation: If the hash code hasn’t been calculated yet, the method iterates over the characters in the string (
char val[] = value
). For each character, the current hash is multiplied by 31 and added to the character’s value (h = 31 * h + val[i]
). This results in a final hash code that represents the string.Return Value: Once the hash code is computed, it is cached in the
hash
variable and returned.