ETags and Spring Data REST
ETags are a useful part of HTTP that don’t get a ton of love in our current API-everything world. Luckily, the Spring Data REST team added ETag support in the recent 2.3 and upcoming 2.3.1 releases, so we can leverage them out of the box! Here we will explore what ETags offer APIs and see them in action in a RESTful Spring Boot service.
What are ETags?
ETags are opaque identifiers that get assigned to resources by the server. It’s easiest to think of them as resource versions that get updated each time the state of the resource is updated. This allows you to take versioning into consideration at the API level and instruct the server to only carry out an action if the state of the resource is what you expect – these are called “conditional requests”. These conditional requests afford you some interesting capabilities as an API designer.
On the server side, ETags can be created for each resource however you like, including:
- Hashes of the response headers/body (the resource’s representation)
- Version numbers already maintained at the data store level
- Hashes of selected parts of the resource
or anything else that produces a distinct value for each state of a resource.
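To make the hashing option concrete, here is a minimal sketch that turns a resource representation into a quoted ETag value using SHA-256. It assumes the representation is a JSON string and that Java 11+ is available; the class and method names are hypothetical, and this is not how Spring Data REST computes its ETags (it uses the version field, as described in the next section).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a strong ETag from the bytes of a resource representation.
class RepresentationEtags {

    static String fromRepresentation(String body) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            // ETag values travel as quoted strings, e.g. ETag: "e3b0c442..."
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String v1 = fromRepresentation("{\"name\":\"Ada\"}");
        String v2 = fromRepresentation("{\"name\":\"Grace\"}");
        System.out.println(v1);
        System.out.println(v1.equals(v2)); // different state, different ETag
    }
}
```

The same representation always hashes to the same ETag, and any change to the bytes yields a new one, which is exactly the "one value per resource state" property the list above asks for.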
Clients of the API don’t have to understand or care how the server calculates ETags; they just grab the value of the “ETag” response header and send it back in their next request. There are two* request headers that clients use to take advantage of ETags: “If-Match” and “If-None-Match” (more on these shortly).
*There is also the If-Range header, as well as the topic of weak vs. strong validation, but I won’t cover those here since they aren’t being added to Spring Data REST at this time.
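On the client side, using an ETag is just a matter of copying it into the right request header. The sketch below builds both kinds of conditional request with the JDK’s `java.net.http` client (Java 11+). The URL `http://localhost:8080/people/1` and the helper names are hypothetical placeholders, not endpoints from the sample project.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client helper: builds conditional requests from an ETag seen in an earlier response.
class ConditionalRequests {

    // "Only apply this update if the resource still has this ETag"
    static HttpRequest conditionalPut(URI uri, String etag, String jsonBody) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .header("If-Match", etag)
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // "Only send me the representation if it no longer has this ETag"
    static HttpRequest conditionalGet(URI uri, String etag) {
        return HttpRequest.newBuilder(uri)
                .header("If-None-Match", etag)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost:8080/people/1"); // assumed local endpoint
        String etag = "\"0\""; // value copied from a previous response's ETag header
        HttpRequest put = conditionalPut(uri, etag, "{\"name\":\"Ada\"}");
        System.out.println(put.method() + " " + put.headers().firstValue("If-Match").orElse("none"));
        // To actually send it: HttpClient.newHttpClient().send(put, HttpResponse.BodyHandlers.ofString())
    }
}
```

The client never inspects or computes the ETag; it simply echoes back whatever opaque value the server handed it.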
ETags in Spring Data REST
Spring Data REST has added ETag support out of the box for data stores that provide optimistic locking through @Version (such as JPA and MongoDB). The ETag for each resource is simply built from the value of the field annotated with @Version, so you don’t have to compute one manually. The project I’m using for the tutorial in the rest of this post is available on GitHub and uses MongoDB.
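The idea of a version-derived ETag can be sketched in plain Java without any Spring dependencies. In this hypothetical example (Java 16+ for the nested record), `Person` stands in for your document class; in a real Spring Data MongoDB project the `version` field would carry `@Version` (`org.springframework.data.annotation.Version`) and the framework would bump it on each save. This is an illustration of the concept, not Spring Data REST’s own code.

```java
// Hypothetical illustration of deriving an ETag from an optimistic-locking version field.
class VersionEtags {

    // Stand-in for a document class; in Spring Data the version field would be annotated with @Version
    record Person(String id, String name, long version) {}

    // Builds an ETag header value from the version field, e.g. version 3 -> "3" (quoted)
    static String etagOf(Person person) {
        return "\"" + person.version() + "\"";
    }

    // Simulates what the data store does on save: bump the version along with the change
    static Person rename(Person person, String newName) {
        return new Person(person.id(), newName, person.version() + 1);
    }

    public static void main(String[] args) {
        Person created = new Person("1", "Ada", 0);
        Person updated = rename(created, "Ada Lovelace");
        System.out.println(etagOf(created) + " -> " + etagOf(updated)); // "0" -> "1"
    }
}
```

Because the data store already increments the version on every write, the ETag changes exactly when the resource’s state does, with no hashing required.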
Optimistic Concurrency Control
With ETags we can achieve an optimistic-locking effect at the API level for HTTP methods that update a resource’s state (PUT, PATCH, and DELETE). This helps you guard against the classic scenario:
- Client A GETs a resource
- Client B GETs the same resource
- Client B updates the resource with a PATCH (beat Client A to the punch)
- Client A doesn’t know about Client B’s update and overwrites it with a PATCH of its own
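The scenario above is the classic lost update, and it can be reproduced with a few lines of plain Java. The in-memory “API” below is hypothetical, and a whole-value `update` call stands in for the PATCH requests; the point is that Client A builds its write from a stale read and silently wipes out Client B’s change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "API" with no concurrency control: the last write always wins.
class LostUpdateDemo {

    private final Map<String, String> resources = new HashMap<>();

    String get(String id) {
        return resources.get(id);
    }

    // Unconditional update: blindly overwrites whatever is currently stored
    void update(String id, String newState) {
        resources.put(id, newState);
    }

    public static void main(String[] args) {
        LostUpdateDemo api = new LostUpdateDemo();
        api.update("1", "tags=java");

        String seenByA = api.get("1");      // Client A GETs the resource
        String seenByB = api.get("1");      // Client B GETs the same resource
        api.update("1", seenByB + ",spring"); // Client B updates it first
        api.update("1", seenByA + ",rest");   // Client A overwrites it, based on its stale copy

        System.out.println(api.get("1"));   // B's "spring" tag is gone and nobody was told
    }
}
```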
ETags allow clients to say “only execute this state-changing API call if the resource hasn’t changed since I last retrieved it”. This protects clients from inadvertently overwriting each other’s changes.
To do this we use ETag values and the “If-Match” header. When you POST a new resource or GET an existing one, the response includes an “ETag” header. All you have to do is send that value in an “If-Match” header on your next request; if it doesn’t match the resource’s current version, the server responds with “412 Precondition Failed”, telling you to do a fresh GET and retry the update if it is still appropriate. Here is an example of this interaction using the sample project on GitHub:
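To show the server-side half of this handshake in isolation, here is a hypothetical in-memory store (Java 16+) that derives ETags from a version counter and refuses updates whose If-Match value is stale. It is a simplification: real If-Match handling also accepts `*` and lists of ETags, which this sketch ignores, and it is not Spring Data REST’s implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory resource store that honors If-Match on updates.
class IfMatchStore {

    // Stored state plus its version, from which the ETag is derived
    record Versioned(String state, long version) {
        String etag() { return "\"" + version + "\""; }
    }

    private final Map<String, Versioned> resources = new HashMap<>();

    // Stands in for POST: stores version 0 and returns the ETag header value
    synchronized String create(String id, String state) {
        Versioned v = new Versioned(state, 0);
        resources.put(id, v);
        return v.etag();
    }

    synchronized String etagOf(String id) { return resources.get(id).etag(); }

    synchronized String stateOf(String id) { return resources.get(id).state(); }

    // Stands in for a PUT/PATCH carrying an If-Match header; returns the HTTP status code.
    // synchronized makes the compare-and-update atomic within this in-memory sketch.
    synchronized int update(String id, String ifMatch, String newState) {
        Versioned current = resources.get(id);
        if (current == null) {
            return 404;
        }
        if (!current.etag().equals(ifMatch)) {
            return 412; // Precondition Failed: re-GET and decide whether to retry
        }
        resources.put(id, new Versioned(newState, current.version() + 1));
        return 200;
    }

    public static void main(String[] args) {
        IfMatchStore api = new IfMatchStore();
        String etag = api.create("1", "tags=java");

        String etagA = etag; // Client A's copy of the ETag
        String etagB = etag; // Client B's copy of the ETag
        System.out.println(api.update("1", etagB, "tags=java,spring")); // 200
        System.out.println(api.update("1", etagA, "tags=java,rest"));   // 412
        System.out.println(api.stateOf("1"));                            // tags=java,spring
    }
}
```

Replaying the lost-update scenario against this store, Client B’s write succeeds and bumps the version, so Client A’s stale If-Match value no longer matches and its overwrite is rejected instead of silently applied.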
This functionality has been around for a long time in many data stores, but using it at the API level gives you a way to standardize this type of concurrency control for your resources no matter what kind of data stores (or downstream services) are used behind the scenes.
You can also use ETags during reads (GET) to save valuable bandwidth. In this case you are telling the server “ONLY send the representation of this resource if it has changed”; this is called a “conditional GET”. Instead of issuing a plain GET every time you need the resource state, you send the ETag returned by the original POST or a previous GET in an “If-None-Match” header, and if it matches the current state of the resource, the server responds with “304 Not Modified”. A 304 tells you that the state you already have in memory is still current, so the data doesn’t have to be retransmitted over the wire. An example:
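Both sides of a conditional GET fit in one small sketch. The hypothetical `Server` below answers 304 with no body when the If-None-Match value matches its current version-based ETag, and the hypothetical `CachingClient` keeps the last representation and ETag in memory, reusing the cached copy on a 304. Plain method calls stand in for HTTP here, and Java 16+ is assumed for the record.

```java
// Hypothetical sketch of a conditional GET: a server that honors If-None-Match,
// and a client that caches the last representation together with its ETag.
class ConditionalGetDemo {

    record Response(int status, String etag, String body) {}

    static final class Server {
        private String state = "{\"name\":\"Ada\"}";
        private long version = 0;

        Response get(String ifNoneMatch) {
            String etag = "\"" + version + "\"";
            if (etag.equals(ifNoneMatch)) {
                return new Response(304, etag, null); // Not Modified: no body on the wire
            }
            return new Response(200, etag, state);
        }

        void update(String newState) {
            state = newState;
            version++; // new state, new ETag
        }
    }

    static final class CachingClient {
        private String cachedEtag;
        private String cachedBody;
        int bodiesReceived = 0;

        String fetch(Server server) {
            Response r = server.get(cachedEtag); // null on the first call: no If-None-Match sent
            if (r.status() == 304) {
                return cachedBody; // reuse the in-memory copy
            }
            bodiesReceived++;
            cachedEtag = r.etag();
            cachedBody = r.body();
            return cachedBody;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        CachingClient client = new CachingClient();
        client.fetch(server);                      // 200, body transferred
        client.fetch(server);                      // 304, cached body reused
        server.update("{\"name\":\"Grace\"}");
        System.out.println(client.fetch(server));  // 200 again: {"name":"Grace"}
        System.out.println(client.bodiesReceived); // 2
    }
}
```

Three reads cost only two body transfers here; the savings grow with every repeated read of a resource that hasn’t changed.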
This may not seem like that big a deal, but when you factor in how many users (each with several devices providing all-day access) will eventually be pounding on your API, the reduction in data transmission really adds up. GitHub offers a good example: it does not count conditional GETs that result in a 304 against its rate limits.