Skipping to the end of the story - the Eclipse Memory Analyzer is awesome!
We have a high-profile and very politically entrenched application which is plagued by many issues outside the control of the IT department. Recently a consultant with a long history with the application produced changes to support a mobile application. We were told that the changes were minimal - basically bolt-on pieces only. That did not turn out to be the case: the consultant's idea of a small bolt-on involved replacing some core Struts configuration, which affected the original core application as well as the new code. The timing could not have been much worse, since we ended up needing to upgrade to the Struts 2.5.15.3 library at the same time due to security issues. The mobile bolt-on had not been well tested, and neither had the combined application code. This was determined a bit later when IT went back and did some testing of the mobile branch code.
IT was then forced to fix a number of issues before the Struts upgrade could occur, while also merging the mobile release and producing a build for deployment. Due to a lack of adequate functional user resources assigned to the application, testing is typically inadequate. The build was forced into production and then the problems began. The symptoms were high heap utilization and increasingly high CPU utilization, followed by instability and finally out-of-memory (OOM) errors being logged, after which the application was basically brain dead.
After a couple of days of analysis, the root cause was still unclear. There was never a clear repeating pattern. We ended up implementing multiple daily application restarts to help clear the memory, but that was only mildly effective. Some items seemed odd while analyzing the application with jvisualvm, but the problem was "lost in the trees". Some initial attempts at analyzing the heap dump produced when the OOM error occurred failed, both because of the size of the heap dump and because of a lack of functionality to help identify the root cause.
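For anyone in a similar spot, heap utilization can also be sampled from inside the JVM rather than only watched in jvisualvm. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapUsageSketch` and the idea of logging the value periodically are my own assumptions, not something we had in place at the time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageSketch {

    // Returns the current heap utilization as a percentage of the heap limit.
    // getMax() may return -1 when no maximum is defined, so fall back to the
    // committed size in that case.
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 100.0 * heap.getUsed() / limit;
    }

    public static void main(String[] args) {
        // In a real application this would be written to a log on a schedule
        // (hypothetical usage), so a steady climb shows up between restarts.
        System.out.printf("Heap used: %.1f%%%n", heapUsedPercent());
    }
}
```

A number like this only shows that memory is climbing, not why, which is exactly where we were stuck.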
I had installed the Eclipse Memory Analyzer a while back with the intent of trying it out but never had the time to do so. At this point, it seemed like a good time to try. The first attempt was not successful, though, since I tried it on my "old" laptop which was running 32-bit Windows 7. Fortunately, I had received another laptop a week or so prior with Win 7 x64, which I was in the midst of setting up. After increasing the memory in Eclipse up to around 3200 MB, I started the import of the heap dump from our prod system. After around 45 minutes it was done processing and offered to show leak suspects. I gladly selected yes and found that it had one suspect which was taking up more than 300 MB of memory. It provided a stack trace which led me to one particular piece of functionality. The interesting part was that data in the stack trace indicated that the functionality had been accessed by a non-logged in user. After some communication, the consultant, functional staff and another developer on the application all indicated that the functionality should NOT be accessible to a non-logged in user, so further investigation followed. This led to identifying that a DB call was returning over 400k rows of data instead of under 100 rows, and the application was retaining all of it. It turned out that the consultant had an "OR" condition which explicitly returned all data to a non-logged in user (even though the functionality shouldn't be available when not logged in). After removing the extraneous OR and having the functional team test for any negative effects, the changes were put into production.
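To illustrate the kind of condition involved, here is a minimal in-memory sketch. It is not the consultant's actual query, which I am not reproducing here. The `Row` type, the `ownerId` field and the 1000-row sample table are all hypothetical stand-ins, with a null user id representing a non-logged in user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class OrConditionSketch {

    // Hypothetical row type; the real table and columns are not shown in this post.
    static final class Row {
        final int id;
        final String ownerId;

        Row(int id, String ownerId) {
            this.id = id;
            this.ownerId = ownerId;
        }
    }

    // Problem shape: the extra OR branch matches every row when userId is null.
    static List<Row> buggyFilter(List<Row> table, String userId) {
        return table.stream()
                .filter(r -> userId == null || r.ownerId.equals(userId))
                .collect(Collectors.toList());
    }

    // Without the OR branch, a non-logged in user gets nothing back.
    static List<Row> fixedFilter(List<Row> table, String userId) {
        return table.stream()
                .filter(r -> userId != null && r.ownerId.equals(userId))
                .collect(Collectors.toList());
    }

    // Builds 1000 rows spread evenly across ten hypothetical users.
    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            table.add(new Row(i, "user" + (i % 10)));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        System.out.println("buggy, not logged in: " + buggyFilter(table, null).size());
        System.out.println("buggy, user3:         " + buggyFilter(table, "user3").size());
        System.out.println("fixed, not logged in: " + fixedFilter(table, null).size());
        System.out.println("fixed, user3:         " + fixedFilter(table, "user3").size());
    }
}
```

With a logged in user both versions return the same small slice, which is why a condition like this can pass casual testing. Only the non-logged in case pulls back the whole table.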
The result of the fix was that the application became stable, using under 50% of the average memory it had been using when unstable, with CPU utilization averaging mostly in the single digits.
So roughly one hour of work with the Memory Analyzer helped solve a problem which was not anywhere near being solved after multiple days of effort.
Hats off to the people responsible for the Memory Analyzer - you have my gratitude!
Now if only we can figure out what the consultant did that is causing the non-logged in behavior.