Converting a single-threaded application to a multi-threaded one is often a tricky proposition. There is more to it than just the challenge of refactoring the code to work concurrently. Tricky little unexpected details from the deep, dark depths start to bubble up. Would you believe an innocuous library like C3P0 can cause major performance problems?

I was recently given the opportunity to enhance the performance of a program written in Groovy used to synchronize data from one system to another. The initial execution of this program was to process 28 million records and subsequent runs were estimated to process 4 million records. Projections based on the state of the program at the time indicated the initial load would take 15 days and the subsequent runs would take 2 days. This obviously needed improvement.

The first major improvement was to implement multi-threading, which the Groovy GPars library makes very easy. The projected execution time dropped as threads were added. This was great progress: the estimate fell to 10 days with only a small amount of refactoring to split up the data for concurrent processing. Not exactly mind-blowing performance gains, but things were moving in the right direction.
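To make the "split up the data" step concrete, here is a minimal sketch of the general idea in plain Java. It is not the original Groovy/GPars code: it uses the JDK's ExecutorService as a stand-in, and the class name ChunkedSync, the chunk size, and the syncChunk method are hypothetical placeholders for the real read-transform-write work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSync {

    // Split the record IDs into consecutive chunks of at most chunkSize.
    public static List<List<Integer>> partition(List<Integer> ids, int chunkSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    // Hypothetical stand-in for the real work on one chunk.
    // Here it only reports how many records it was handed.
    public static int syncChunk(List<Integer> chunk) {
        return chunk.size();
    }

    // Run every chunk on a fixed-size thread pool and return the
    // total number of records processed.
    public static int syncAll(List<Integer> ids, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Integer> chunk : partition(ids, chunkSize)) {
                Callable<Integer> task = () -> syncChunk(chunk);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(i);
        }
        // 16 threads, matching the point where scaling stalled in the post.
        System.out.println(syncAll(ids, 100, 16));
    }
}
```

Keeping the thread count as a parameter makes it easy to rerun the same workload at 8, 16, or 32 threads and compare timings.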

Something was amiss at this point, however. Ramping up past 16 threads did not increase performance, and in fact started to make performance worse. This was not expected.

Additional code tweaks were made to try to take advantage of batching up statements, caching some data, and other reorganization. This gained perhaps 2 more days for an estimated total of 8 days of processing. Still, the threading issue was causing problems. Why wouldn’t this program run faster with more threads? It just didn’t make sense.

There were enough connections in the connection pool. The database accepted all of those connections, so there wasn’t an issue of waiting for free connections. The program barely moved the CPU needle on the 8-core, 8 GB machine it was running on. The queries generated by Hibernate were sensible. Using VisualVM, I could see garbage collection activity was high, which is to be expected from a program of this type, but even that wasn’t holding anything back.

The program was initially written to be run on multiple machines. So, we set it up to run on two machines with 16 threads each. Execution time was cut in half! We tried it on three machines. Execution time was cut to about a third. So, it probably couldn’t be related to the database connections or free connections in the pool or even the database itself. I set it up so that I could run two instances of the program on one machine. Execution time was half. It couldn’t be the limitations of one machine, because again, the CPU barely lifted a finger.

There is a plugin available for VisualVM called Threads Inspector. So I opened up VisualVM again and set up a single run of 32 threads. I attached to the running instance of my program and clicked on the Threads tab to see if I could glean any information from it. The threads were waiting about 75% of the time. That seemed unusual. They were not blocking, they were waiting. Waiting on what? You can take thread dumps of individual threads, so while one of the threads was in a wait state, I dumped it. It was waiting on the connection pool library, C3P0. Here is part of the offending thread dump:

java.lang.Object.wait()
com.mchange.v2.c3p0.stmt.GooGooStatementCache.acquireStatement
com.mchange.v2.c3p0.stmt.GooGooStatementCache.checkoutStatement
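
The same kind of information can also be pulled from inside the JVM, which may help when attaching VisualVM is not an option. The following is a rough sketch using the standard java.lang.management API. The class name ThreadStateSnapshot and the "demo-waiter" thread are made up for illustration and were not part of the original program.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSnapshot {

    // Count live threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    public static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // Depth 0: only the state is needed here, not the stack frames.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 0)) {
            if (info != null) { // null if the thread exited in the meantime
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // List each waiting thread with up to maxFrames of its stack,
    // which shows what it is waiting in.
    public static String describeWaiting(int maxFrames) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), maxFrames)) {
            if (info == null) {
                continue;
            }
            Thread.State state = info.getThreadState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                out.append(info.getThreadName()).append(" (").append(state).append(")\n");
                for (StackTraceElement frame : info.getStackTrace()) {
                    out.append("    ").append(frame).append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // Demo thread that parks in Object.wait(), like the threads in the dump.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    // interrupted by main below; let the thread end
                }
            }
        }, "demo-waiter");
        waiter.setDaemon(true);
        waiter.start();
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println(countByState());
        System.out.print(describeWaiting(3));
        waiter.interrupt();
    }
}
```

A single snapshot is only one sample. Taking several over time gives a rough picture of how often threads sit in a wait state, similar to what the Threads tab shows.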

The dump showed the thread waiting to acquire a statement from C3P0's statement cache. C3P0 has two settings for that cache: maxStatements and maxStatementsPerConnection. maxStatements is a global limit across the entire pool, whereas maxStatementsPerConnection is a limit for each individual connection. If maxStatements is enabled, the cached statements are shared among all of the connections. As the number of threads increased, this shared cache was stretched thinner and thinner across the connections. I set maxStatements to 0, which removes the global limit, set maxStatementsPerConnection to 50, and reran the program.
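
For reference, this is roughly what that change looks like when the pool is configured through a c3p0.properties file on the classpath. The post does not say how the settings were actually supplied (they may well have gone through the Hibernate configuration), so treat the file-based approach as an assumption. Only the two values come from the run described above.

```properties
# c3p0.properties, placed at the top level of the classpath (assumed setup)

# 0 means no pool-wide limit on cached statements.
c3p0.maxStatements=0

# Each connection may cache up to 50 prepared statements of its own.
c3p0.maxStatementsPerConnection=50
```

With maxStatements at 0 and maxStatementsPerConnection non-zero, C3P0 still caches statements but enforces only the per-connection maximum.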

Thank the maker! Processing times immediately plummeted to an estimated two days with 32 threads and ramping up to 128 threads brought that number to about 16 hours. At this point, the database was becoming the bottleneck, but processing times came down to a much more manageable number.

Lessons learned? Configuration settings for a single-threaded application may not make sense for a multi-threaded application. But who would have thought something like the C3P0 library could cause so much grief?

About the Author

Brendon Anderson

Sr. Consultant

Brendon has over 15 years of software development experience at organizations large and small.  He craves learning new technologies and techniques and lives in and understands large enterprise application environments with complex software and hardware architectures.
