Jan 9, 2009

Equals and HashCode

It happens far too often that developers take shortcuts when creating equals() and hashCode() for their objects. This discussion hopes to enlighten those short-cutters and introduce a simple solution to their problem.

Implementing equals() and hashCode() rests on a single assumption: that an object is distinguished by its content, not by its instantiation. If there is no reason to tell one instance of a class apart from another by its values, then there is no reason to implement these methods.

If, however, the contents of the object do need to be evaluated, which may be more often than one thinks, then implementing them is not only strongly encouraged but quite likely required. When it is desired or required, it’s the responsibility of every good developer to get it right.

Equals and HashCode Contract

There are far too many other resources explaining the “equals-hashCode contract” in detail to repeat it all here, but I’ll hit the high points to make sure that everyone is on the same page.

To start with, a basic understanding of Object (http://java.sun.com/javase/6/docs/api/java/lang/Object.html) is assumed, particularly the implications of equals().

Equals

The basic idea of equals() is to determine if two objects have the same value. The trouble lies in determining what the “value” of an object is. An object, for this discussion, is an aggregation of properties (member variables) and optionally methods (member “functions” and “procedures”) that act on those properties. An object without properties doesn’t concern this discussion, and the presence or absence of methods is probably not going to affect the evaluation of its properties.

Some rules for equals() must hold for any notion of equality. They are spelled out in the Object JavaDoc, but here is the gist of what must be maintained. They’re essentially the same rules that govern equality in mathematics.

An object must be equal to itself. In Java, a.equals(a) must return true.

If A is equal to B, then B must be equal to A. In Java, that means that a.equals(b) must return the same result as b.equals(a).

If A is equal to B and B is equal to C, then A must also be equal to C. In Java, that means that if a.equals(b) and b.equals(c) both return true, then a.equals(c) must return true as well.

Finally, the results need to be consistent. Calling a.equals(b) repeatedly must keep giving the same answer as long as the values it compares have not changed, and every time we make a new instance of the object and fill its members with the same values, we should get the same result out of the equals() evaluation. The JavaDoc adds one more rule: no instance is equal to null, so a.equals(null) must return false.
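The rules above can be spot-checked mechanically. The sketch below is a hypothetical helper, EqualsContractCheck.holdsFor(), that is not part of the post or of any library. It loops over every combination of a few sample objects and tests reflexivity, the null rule, symmetry, and transitivity.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not from the post or any library) that spot-checks
 * the equals() rules against a handful of sample objects.
 */
final class EqualsContractCheck {

    /** Returns true if the equals() rules hold for every combination of the samples. */
    static boolean holdsFor(List<?> samples) {
        for (Object a : samples) {
            // Reflexive: an object always equals itself.
            if (!a.equals(a)) return false;
            // No instance is equal to null.
            if (a.equals(null)) return false;
            for (Object b : samples) {
                // Symmetric: a.equals(b) and b.equals(a) must agree.
                if (a.equals(b) != b.equals(a)) return false;
                for (Object c : samples) {
                    // Transitive: a equals b and b equals c imply a equals c.
                    if (a.equals(b) && b.equals(c) && !a.equals(c)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Strings compare by content, so every rule holds for these samples.
        System.out.println(holdsFor(Arrays.asList("x", new String("x"), "y"))); // true
    }
}
```

A check like this only covers the samples you feed it. It can expose a broken equals(), but it cannot prove that one is correct.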

With that understanding, consider the Object.equals() that every object automatically inherits. This implementation technically meets those criteria. However, it does not truly reflect the equality of an object based on its values.

Object.equals() compares object references. It returns true if, and only if, both references point to the same instance, even when they are held in different variables. In effect, Object.equals() only reports equality in the a.equals(a) case.

Object a = new Object();
Object b = a;
a.equals(b); // This is true
b = new Object();
a.equals(b); // This is false

Knowing this, we can see that we can’t rely on Object.equals() to compare objects by value.

One other rule is that a class that overrides equals() must also override hashCode(). The explanation is pretty straightforward.

HashCode

An object’s hashCode() relates to equals() through one simple statement:

Equal objects must have equal hash codes.

That’s all. It is a common misconception that hashCode() needs to somehow uniquely identify an object or reflect differences in equals(), but that is expressly not the case. It is simply the case that if a.equals(b), then it must be true that a.hashCode() == b.hashCode().

hashCode() must also be consistent. For any given set of values in an object, at least those that matter to equals(), it must return the same value every time it is called. That’s all.

If the objects do not equal each other, their hashCode() is irrelevant; they may be the same or different. It is not necessary to go to any lengths to ensure that if a.equals(b) == false that a.hashCode() != b.hashCode().

One may argue that this means hashCode() could simply return a constant value and the contract would be met. This is indeed true. It is also fair to expect, however, that objects with distinct values return different hashCode() values as far as is reasonably possible. Hash-based collections such as HashMap and HashSet use the hash code to choose a bucket, so a constant value puts every entry in the same bucket and loses the benefit of hashing. The word “reasonable” allows some flexibility and lets us avoid going crazy with our hashCode() implementations.
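To show that a constant hashCode() is legal but unhelpful, here is a minimal sketch using a hypothetical ConstantHash class. It requires Java 7 or later for the diamond operator. The HashSet still behaves correctly because equals() settles every comparison inside the single shared bucket. It simply gets no help from the hash code in narrowing the search.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical value class whose hashCode() is a constant: legal, but unhelpful. */
final class ConstantHash {
    private final int value;

    ConstantHash(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHash && ((ConstantHash) o).value == value;
    }

    @Override
    public int hashCode() {
        // Meets the contract (equal objects share a hash code), but gives a
        // HashSet or HashMap no way to spread entries across buckets.
        return 42;
    }

    public static void main(String[] args) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(new ConstantHash(i));
        }
        // Lookups are still correct; they just cannot use the hash to skip candidates.
        System.out.println(set.contains(new ConstantHash(500))); // true
        System.out.println(set.size());                          // 1000
    }
}
```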

Object gets in the way here, too. If a class does not provide its own hashCode(), the one it inherits from Object is derived from the object’s identity (typically its location in memory), not from its values. Again, this meets the equals-hashCode contract, because comparing an object with itself gives a true equals() and the same hashCode(). Two instances with identical values, however, will usually get different hash codes.

Reasons to Use Equals and HashCode

Almost as important as understanding how equals() and hashCode() relate to each other, along with some of their pitfalls, is understanding why a developer might need to implement these methods. This can be summed up with one word: Collection.

OK, a few more words are needed, but this gives a good starting point. If an object will ever be put into a collection of any kind (Map, Set, List, etc.), then it should implement equals() and hashCode().

This again supposes that the uniqueness of an object is determined by the values of its members. For an object to be correctly identified by Collection.contains(), or to be correctly distinct in a Set or as a key in a Map, equals() needs to be implemented. For hash-based collections such as HashSet and HashMap, hashCode() must also agree with it. Otherwise, distinct objects with identical member values will not be recognized as “the same,” because Object.equals() will not indicate this.
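Here is a minimal sketch of the difference, assuming two hypothetical classes that hold the same data. IdentityPoint inherits Object.equals() and Object.hashCode(), while ValuePoint overrides both. The 31 * x + y formula is just one reasonable choice. Java 7 or later is assumed for the diamond operator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical point that inherits Object.equals() and Object.hashCode(). */
final class IdentityPoint {
    final int x, y;
    IdentityPoint(int x, int y) { this.x = x; this.y = y; }
}

/** The same data, but with value-based equals() and hashCode(). */
final class ValuePoint {
    final int x, y;
    ValuePoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ValuePoint)) return false;
        ValuePoint p = (ValuePoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y; // uses exactly the fields compared in equals()
    }
}

final class CollectionDemo {
    public static void main(String[] args) {
        List<IdentityPoint> identityList = new ArrayList<>();
        identityList.add(new IdentityPoint(1, 2));
        // contains() calls equals(), which here only compares references.
        System.out.println(identityList.contains(new IdentityPoint(1, 2))); // false

        Set<ValuePoint> valueSet = new HashSet<>();
        valueSet.add(new ValuePoint(1, 2));
        valueSet.add(new ValuePoint(1, 2)); // same value, so not stored twice
        System.out.println(valueSet.contains(new ValuePoint(1, 2))); // true
        System.out.println(valueSet.size());                         // 1
    }
}
```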

Many other good examples exist, but this one is simple, tremendously common, and reason enough for nearly every object with properties to implement equals() and hashCode().

Simple Sample

For this discussion, take this very trivial object:

public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;
}

This bean (for the purposes of this and most other discussions, an object with properties and trivial methods) simply contains a small number of, well, numbers. We have one each of the object Integer and the primitive int. Both are public to keep the source short; in a proper bean they would quite likely be private, with getter and setter methods to access them.

If we take this bean as written, there is no way to determine if two instances are equal based on their values.

Simple a = new Simple();
Simple b = new Simple();
a.primitiveInt = 1;
b.primitiveInt = 1;
a.equals(b); // This is false!

Since this snippet isn’t executed here, you’ll have to trust the comment. Go ahead and create the necessary files and test it; the result will be as noted. The code instantiates two Simple objects and gives each the same int value of one.

It should probably be the case that these objects are evaluated as equal, however. Looking at it logically, the Integer member is null in both instances, and the int value in each has been set to one. These objects likely represent the same thing and should, at least for our purposes, be considered “equal.”

The equals() method for this object is pretty simple to write. There is an object member, so care needs to be taken to account for null, but this class can be corrected in one pass.

public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        // instanceof is false for null and for unrelated types
        if(!(object instanceof Simple))
            return false;
        Simple simple = (Simple)object;
        return (this == simple)  ||
               ((primitiveInt == simple.primitiveInt) &&
                ((objectInteger == null)
                   ? (simple.objectInteger == null)
                   : objectInteger.equals(simple.objectInteger)));
    }
}

Looking at the new equals() method in detail, the first step ensures that the comparison object is of the same type (or a subclass of it). If that is not the case, there is surely no match. One helpful thing about instanceof is that it also catches a null argument, since null is never an instance of anything, and an instance of a class is definitely not equal to null. The next line simply casts the parameter to our type. This could be done in-line, but since the comparison uses the value a few times, this is more readable.

The last line (the return) is likely to be assaulted by style hawks. The bean is simple, with two members, so rather than a series of if statements, one properly grouped boolean expression can do the job.

The first part checks whether this instance is the same as the passed instance. Since the next operator is a short-circuit OR (||), the expression returns true at this point if someone is doing some variant of a.equals(a). The same object can’t have different values for its members, so evaluation stops here. For our simple example this may be overkill, but with more members to compare it would make a bigger performance difference.

If a different instance is passed, the second half of the OR starts comparing the members. The short-circuit AND (&&) between the member checks stops evaluation at the first failure, so the cheap primitive comparison comes first. If the primitives match, the object member is compared next. That comparison uses a ternary operator to check for null first and then compares the values of the objects. The result is a null-safe comparison that returns true if the primitives have the same value and the objects are either both null or represent the same Integer value.
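On Java 7 and later, well after this post was written, java.util.Objects offers the same null-safe comparison as a library call. The sketch below is a variant of Simple, renamed SimpleWithObjects for illustration. It swaps the ternary for Objects.equals() and uses Objects.hash() for the matching hashCode(). Objects.hash() combines values differently from the long-sum hashCode() this post builds later, but both satisfy the contract.

```java
import java.util.Objects;

/** Variant of the post's Simple class using java.util.Objects (Java 7+). */
final class SimpleWithObjects {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof SimpleWithObjects)) return false;
        SimpleWithObjects other = (SimpleWithObjects) object;
        // Objects.equals(a, b) is true when both are null, false when only one is,
        // and otherwise delegates to a.equals(b): the same logic as the ternary.
        return primitiveInt == other.primitiveInt
                && Objects.equals(objectInteger, other.objectInteger);
    }

    @Override
    public int hashCode() {
        // Objects.hash() treats a null argument as 0, so this is null-safe as well.
        return Objects.hash(primitiveInt, objectInteger);
    }
}
```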

In this example, the comparison uses the members directly. In some cases it may make sense to use the getters instead, if they do more work than just returning the members. One could argue that, to allow for this, we should always use the getters, but that debate is beyond the scope of this discussion.
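To show why the getter question matters, here is a hedged sketch with a hypothetical Contact bean whose getEmail() lower-cases the stored address. Because equals() and hashCode() both go through the getter, two contacts whose addresses differ only in case compare as equal. Comparing the raw field would give a different answer.

```java
import java.util.Locale;

/** Hypothetical bean whose getter normalizes the stored value. */
final class Contact {
    private String email;

    public void setEmail(String email) { this.email = email; }

    /** Returns the address in lower case, or null if none has been set. */
    public String getEmail() {
        return email == null ? null : email.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof Contact)) return false;
        String mine = getEmail();
        String theirs = ((Contact) object).getEmail();
        // Compares getter results, not raw fields, so case differences are ignored.
        return mine == null ? theirs == null : mine.equals(theirs);
    }

    @Override
    public int hashCode() {
        // Must use the same normalized value as equals(), or equal contacts could hash differently.
        String normalized = getEmail();
        return normalized == null ? 0 : normalized.hashCode();
    }
}
```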

An object may also have some members that don’t factor into its equality. If so, simply leave them out of the boolean expression and they’ll be ignored. A good example is a bean that represents a row in a database. In this case, the equals comparison could include only the members that make up the database primary key (e.g., a unique ID). There’s more to that idea, some of which is touched on later in this discussion.
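Here is a minimal sketch of key-only equality, assuming a hypothetical CustomerRow bean whose long id is the primary key. The display name is deliberately left out of both methods, so editing it never changes equality or the hash code.

```java
/** Hypothetical bean for a database row; only the primary key decides equality. */
final class CustomerRow {
    private final long id;      // primary key: the only member used below
    private String displayName; // ignored by equals() and hashCode()

    CustomerRow(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    String getDisplayName() { return displayName; }
    void setDisplayName(String displayName) { this.displayName = displayName; }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof CustomerRow)) return false;
        return id == ((CustomerRow) object).id;
    }

    @Override
    public int hashCode() {
        // Same formula Long.hashCode() uses to fold 64 bits into an int.
        return (int) (id ^ (id >>> 32));
    }
}
```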

Implementing hashCode() is just about as simple. The key is to get a consistent result out of whatever calculation is done inside. The sample below shows a hashCode() that will work for the related equals().

public class Simple {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        if(!(object instanceof Simple))
            return false;
        Simple simple = (Simple)object;
        return (this == simple)  ||
               ((primitiveInt == simple.primitiveInt) &&
                ((objectInteger == null)
                   ? (simple.objectInteger == null)
                   : objectInteger.equals(simple.objectInteger)));
    }

    @Override
    public int hashCode() {
        // Arbitrary nonzero seed, accumulated in a long
        long hashCode = 10002003L + primitiveInt;
        if(objectInteger != null)
            hashCode += objectInteger.hashCode();
        // Truncate back to an int for return
        return (int) hashCode;
    }
}

I’ve repeated the equals() to avoid scrolling back and forth. One key thing to note is that hashCode() includes both member variables, since our equals() does as well. Objects for which equals() returns true must have the same hashCode(), so hashCode() should use only members that equals() compares. It is also recommended that it include a term for each of them, so that different values spread across different hash codes.

Another thing that is easy to spot is that the hashCode() calculation uses a long but returns only an int. This is done to avoid overflow in the intermediate sum. Note that int arithmetic in Java wraps around silently rather than raising an error, so either approach yields a valid hashCode(). It is also obvious that the calculation does not start with zero. It is recommended that each class start with a distinct value. It’s unclear exactly why, but it’s easy to accommodate. An easy trick for Serializable classes is to use the serialVersionUID as the seed, but since this trivial class isn’t Serializable, it uses an arbitrary constant instead.
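For comparison, a widely used alternative is the recipe from Joshua Bloch’s Effective Java (second edition). Start from a nonzero constant such as 17. Then, for each field, multiply the running result by 31 and add the field’s hash. The sketch below applies this recipe to a copy of Simple, renamed SimpleHashed for illustration. Because of the multiplication, each field’s contribution depends on its position. As a result, an object with primitiveInt = 1 and one with objectInteger = 1 hash differently here, whereas the additive version gives them the same value. Either approach satisfies the contract.

```java
/** Copy of Simple hashed with the multiply-by-31 recipe instead of a long sum. */
final class SimpleHashed {
    public Integer objectInteger = null;
    public int primitiveInt = 0;

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof SimpleHashed)) return false;
        SimpleHashed other = (SimpleHashed) object;
        return primitiveInt == other.primitiveInt
                && (objectInteger == null ? other.objectInteger == null
                                          : objectInteger.equals(other.objectInteger));
    }

    @Override
    public int hashCode() {
        int result = 17;                     // nonzero seed
        result = 31 * result + primitiveInt; // fold in each field used by equals()
        result = 31 * result + (objectInteger == null ? 0 : objectInteger.hashCode());
        return result;                       // int overflow simply wraps; no long needed
    }
}
```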

It is easy to see that, after being checked for null, the member object is asked for its hashCode(), not its value. For an Integer, hashCode() happens to be its value. If this pattern is repeated, though, it becomes easy to add any kind of object, including a String or even a member variable of type Simple.

A word of caution applies to a class that contains a member reference of its own type (a home-made linked list, for example). If that member refers back to the object itself, hashCode() will call itself recursively without end and fail with a StackOverflowError. A simple check protects the code from this. Assuming our Simple class had a member of type Simple named simple, the calculation in hashCode() would add:

if(simple != null && simple != this) hashCode += simple.hashCode();

This only guards against an object referring directly to itself. Longer cycles (a refers to b, which refers back to a) still recurse, and an equals() that compares the member needs the same kind of guard.
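Here is a minimal sketch of the guard in context, assuming a hypothetical Node class rather than the post’s Simple. Its next member may point at another node, be null, or point back at itself. hashCode() follows the post’s long-sum style, and equals() applies the matching guard so that equal nodes still produce equal hash codes. Like the one-line check, it only handles direct self-references; a longer cycle would still recurse without end.

```java
/** Hypothetical self-referencing node, in the spirit of a home-made linked list. */
final class Node {
    int value;
    Node next; // another Node, null, or (the risky case) this very node

    Node(int value) { this.value = value; }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof Node)) return false;
        Node other = (Node) object;
        if (value != other.value) return false;
        if (next == this || other.next == other) {
            // Guard: a self-pointing node only matches another self-pointing node,
            // instead of recursing forever.
            return next == this && other.next == other;
        }
        return next == null ? other.next == null : next.equals(other.next);
    }

    @Override
    public int hashCode() {
        long hashCode = 10002003L + value;
        // Skip null and direct self-references, as in the one-line check.
        if (next != null && next != this) hashCode += next.hashCode();
        return (int) hashCode;
    }
}
```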

Finally, the long used in the calculation is truncated to an int for return.

Looking at this, objects with different but similar values can end up with the same hashCode(). This is perfectly acceptable and does not violate any part of the equals-hashCode contract. The following example shows how the Simple class above can produce the same hashCode() for objects that are not equal.

Simple a = new Simple();
Simple b = new Simple();
a.primitiveInt = 1;
b.objectInteger = Integer.valueOf(1);
boolean sameValue = a.equals(b);                 // false
boolean sameHash = a.hashCode() == b.hashCode(); // true, and that's OK

As before, this is a fragment; drop it into a main() method to try it. Note that the comparisons are assigned to variables. A bare expression such as a.hashCode() == b.hashCode(); is not a legal statement in Java, although it would be in C or C++.

Hard Part Remains

Now we can see that it’s easy to do, and doesn’t need to be overly complex.

Of course, the hard part is determining which members of an object belong in the equals() and hashCode() methods. It may seem reasonable, and may occasionally be the case, that every member needs to be in there. It’s probably more likely that a “key” of sorts is all that’s needed.

The earlier mention of using an “id,” as when the object represents a database-persisted object, invites plenty of discussion as well. It’s not necessarily adequate to rely only on a key, because an ORM (such as Hibernate) may not assign the key until the object is actually persisted. In that case, it might be necessary to implement a quick “are the ids the same” check that falls back to other key members if one or both ids are not yet assigned. A bean representing a person, for example, might have an “id” as its primary key, while its “username” is unique enough to serve as a good second-level check. This allows an instance pulled from the database and another created and populated in the program to still be recognized as equal. Whatever the check, hashCode() must be computed only from values available on both sides, such as the username. Otherwise, two “equal” objects could end up with different hash codes.
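Here is a minimal sketch, assuming a hypothetical Person bean with a Long id that stays null until the object is saved and a required, unique username. It takes the simplest route that keeps the contract intact: both methods use only the username. Adding an id comparison whenever both ids are set can break transitivity, because two saved rows with different ids would each be “equal” to the same unsaved copy. Keying on the username also keeps the hash code stable when the id is assigned after the object is already in a HashSet.

```java
/** Hypothetical persisted Person; id is assigned by the database or ORM on save. */
final class Person {
    private Long id;               // null until persisted
    private final String username; // unique, known before and after saving

    Person(String username) {
        if (username == null) throw new IllegalArgumentException("username is required");
        this.username = username;
    }

    Long getId() { return id; }

    // Hypothetical: called when the row is saved or loaded.
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof Person)) return false;
        // Business key only: a freshly built Person equals the row loaded from
        // the database whether or not either id has been assigned yet.
        return username.equals(((Person) object).username);
    }

    @Override
    public int hashCode() {
        // Same field as equals(); unaffected when the id is set later.
        return username.hashCode();
    }
}
```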

About the Author

Object Partners profile.