Masking sensitive data in Log4j 2
A growing practice across many organizations is to log as much information as is feasible, to allow for better debugging and auditing. Tools like Splunk and ELK make it even easier to index the logs, treating them almost like databases. However, under standards such as PCI and HIPAA, those same organizations may need to mask much of that data to prevent unauthorized or unprotected access to sensitive information. In this blog post I’ll detail one potential approach to masking that data, so developers do not need to worry about filtering individual log statements.
You’re going to need to use Log4j 2 (potentially with SLF4J as well). A sample pom.xml for just these dependencies would include the lines:
If you’re using a different logging framework, then I imagine this guide may not be very helpful.
I’m going to dive right in, as there are a few different files we need to create or modify to get log masking to work. The first file is a pretty basic one. It holds all of our logging markers, so that we can tell Log4j to run the masking only on the log statements that need it. Masking our logs means taking a performance hit, so we should not do it any more than we need to:
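A minimal sketch of such a marker holder, assuming you log through the Log4j 2 API directly. The class name LoggingMarkers and the marker names "JSON" and "XML" are my own placeholders, not anything Log4j requires:

```java
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;

/** Central place for the markers that flag which log statements should be masked. */
public final class LoggingMarkers {
    // Marker names are arbitrary labels (assumed here); they only need to match
    // whatever the converter checks for.
    public static final Marker JSON = MarkerManager.getMarker("JSON");
    public static final Marker XML = MarkerManager.getMarker("XML");

    private LoggingMarkers() {
        // Constants only, no instances.
    }
}
```

MarkerManager.getMarker returns the same Marker instance for a given name, so it is safe to look the marker up again elsewhere by name.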
I’ve got two basic Markers in that class, one for JSON, and one for XML. You can define as many as you need, for different content types, data types, and so on. For this tutorial we’re only going to be using the JSON marker.
Let’s continue by extending the LogEventPatternConverter:
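Here is a sketch of what such a converter can look like, assuming log4j-core is on the classpath. The class name LogMaskingConverter, the field names (ssn, password, cardNumber) and the "****" mask are hypothetical choices for illustration. The regex only handles simple string values and is not a real JSON parser:

```java
import java.util.regex.Pattern;

import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.pattern.ConverterKeys;
import org.apache.logging.log4j.core.pattern.LogEventPatternConverter;
import org.apache.logging.log4j.core.pattern.PatternConverter;

@Plugin(name = "LogMaskingConverter", category = PatternConverter.CATEGORY)
@ConverterKeys({"cm"})
public final class LogMaskingConverter extends LogEventPatternConverter {
    private static final String NAME = "cm";
    // Assumed marker name; it must match the marker used at the logging call site.
    private static final Marker JSON_MARKER = MarkerManager.getMarker("JSON");
    // Hypothetical sensitive field names; matches "field" : "value" pairs with string values.
    private static final Pattern SENSITIVE_JSON = Pattern.compile(
            "(\"(?:ssn|password|cardNumber)\"\\s*:\\s*\")[^\"]*(\")");
    private static final String REPLACEMENT = "$1****$2";

    private LogMaskingConverter() {
        super(NAME, NAME);
    }

    /** Factory method that Log4j invokes when it finds %cm in a pattern. */
    public static LogMaskingConverter newInstance(final String[] options) {
        return new LogMaskingConverter();
    }

    @Override
    public void format(final LogEvent event, final StringBuilder toAppendTo) {
        final String message = event.getMessage().getFormattedMessage();
        final Marker marker = event.getMarker();
        if (marker != null && marker.isInstanceOf(JSON_MARKER)) {
            // Only marked statements pay the cost of the regex.
            toAppendTo.append(mask(message));
        } else {
            toAppendTo.append(message);
        }
    }

    /** Replaces the values of the listed fields, keeping the field names intact. */
    static String mask(final String message) {
        return SENSITIVE_JSON.matcher(message).replaceAll(REPLACEMENT);
    }
}
```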
Ok, let’s stop and analyze the important bits. The “ConverterKeys” value and the “NAME” field we pass to the LogEventPatternConverter define the conversion pattern that we will reference in our “log4j2.xml” config. Unless that pattern appears in the config, the masking never runs. I believe you cannot override the default “%m”, so we are defining our own custom pattern “cm”. In fact, we will use “cm” INSTEAD of the default “m” in our configuration.
Next, the constructor and the “newInstance()” method are required for our converter to be properly invoked by Log4j. The “format()” method holds the crux of our work. You can see that it takes the formatted message, and returns it unchanged if we do not have any Markers for the current logging statement. If we DO have markers (for example, our JSON one), then and only then will we attempt to mask the message.
I’ve implemented a simple JSON regex replacement for the mask method, but there are many different approaches you can take: you can hydrate the JSON and replace the values based on name/path, you can inspect an object to see if it’s annotated with a “DoNotMask” annotation, or you can even define simple regex values to replace (e.g. credit cards, SSNs). The implementation I provide is meant as a proof-of-concept example, and is not prod-ready. Also, if you DO decide to implement multiple strategies for different markers, it makes sense to move that logic into specific classes (I have included everything in one file for simplicity).
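To illustrate one of the alternatives above, here is a rough sketch of the "simple regex values" strategy, which masks by the shape of the value rather than by field name. The class name ValuePatternMasker and both patterns are assumptions of mine. The card pattern is deliberately loose (13 to 16 digits, optionally separated by spaces or dashes, no Luhn check), so expect false positives on other long numbers:

```java
import java.util.regex.Pattern;

/** Masks values that look like SSNs or card numbers, wherever they appear in a message. */
final class ValuePatternMasker {
    // US SSN written as 123-45-6789.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    // 13 to 16 digits, each optionally preceded by a single space or dash.
    private static final Pattern CARD = Pattern.compile("\\b\\d(?:[ -]?\\d){12,15}\\b");

    private ValuePatternMasker() {
        // Static helper only.
    }

    static String mask(final String message) {
        final String withoutSsn = SSN.matcher(message).replaceAll("***-**-****");
        return CARD.matcher(withoutSsn).replaceAll("****");
    }
}
```

Because this approach does not care about the content type, it could be hooked to any marker, at the cost of scanning the whole message for every pattern.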
As a simple demonstration of this class, let’s also include the tests:
At this point however, we are still not ready to use our class, as Log4j does not know to look for it. For this, we need to update the log4j2.xml file:
The key parts here are to update the “packages” attribute of the “Configuration” node to include the package (or a parent package) of your LogEventPatternConverter, and to use “cm” rather than “m” in the pattern. If your logs should be masked but are instead prefixed by a “c”, then Log4j has not picked up your converter, and you should make sure that the names are correct, and that the package is included in the “Configuration” node!
So hopefully now we have everything hooked up so that our log statements can be masked. In order to take advantage of our converter, we need to log our statements with the appropriate Marker:
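A small sketch of what that looks like at the call site, assuming the Log4j 2 API and a marker named "JSON" that matches the one the converter checks for. PaymentService and its method are made-up names for illustration:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;

public final class PaymentService {
    private static final Logger LOGGER = LogManager.getLogger(PaymentService.class);
    // Looked up by name; the same instance is returned wherever "JSON" is requested.
    private static final Marker JSON = MarkerManager.getMarker("JSON");

    public void submit(final String payloadJson) {
        // The marker is the first argument, so the converter sees it on the LogEvent.
        LOGGER.info(JSON, "Submitting payment request: {}", payloadJson);
        // No marker here, so this statement skips the masking step entirely.
        LOGGER.info("Payment request submitted");
    }
}
```

Note that the masking runs on the fully formatted message, so the parameter values substituted for “{}” are covered as well.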
If all went well, you should now see your sensitive data being replaced with your mask. As a final note, if you are using Spring Boot, by default Log4j is configured BEFORE Spring Boot components and @Value fields are initialized, so if you put your fields-to-mask into a properties file, it may take some extra configuration to make sure Log4j picks them up.