Finding and fixing issues in a production system can be difficult. Usually, by the time a problem is visible, users are already complaining. Fixing these problems under the eye of management is no fun for anybody, especially when you don’t know where the problems may be.
You may or may not have access to the servers in question, and you may have to diagnose an issue involving multiple servers. Sometimes there’s a third party involved, such as a database administrator (DBA) or hosting company, for whom your problem is not a priority. Depending on how detailed your log files are, you might be able to search through them and find some hints. Your code may also rely on third-party JARs that do not log the level of detail you need.
How APM can help
It’s often possible to derive useful information from log files, network monitoring, database server monitoring, and the like. The problem there is that you’re trying to infer things about your code’s behaviour from the information that you’ve already decided to log. If you change your logging to add more information, it’s too late. The error has already happened.
Application Performance Management (APM) systems allow you to remotely instrument your code and log data to an external system continuously. This is advantageous for several reasons. Since this data collection and logging is happening in the background, you don’t need to think about logging metrics during software development. When you need information about the performance of your software in production, the information has already been gathered for you during the normal operation of the system. It has been gathered under real system load on the actual production environment, as opposed to data from a test system under simulated load. It also means that when an error occurs in production, such as a performance bottleneck, data about it has already been gathered and is already available.
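To make the idea of background instrumentation concrete, here is a minimal sketch in Java of timing calls and recording them as a side effect of normal operation. The class `TimingRecorder`, its methods, and the operation names are hypothetical and invented for illustration. This is not the API of any real APM product: a real agent typically instruments code without source changes and ships the data to an external system, whereas this sketch wraps calls by hand and keeps the numbers in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical illustration only; not the API of any real APM product.
final class TimingRecorder {
    // Per-operation call counts and cumulative elapsed time in nanoseconds.
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    // Runs the work and records how long it took, even if the work throws.
    <T> T timed(String operation, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            calls.computeIfAbsent(operation, k -> new LongAdder()).increment();
            nanos.computeIfAbsent(operation, k -> new LongAdder()).add(elapsed);
        }
    }

    // Number of recorded calls for an operation (0 if never seen).
    long callCount(String operation) {
        LongAdder counter = calls.get(operation);
        return counter == null ? 0L : counter.sum();
    }

    // Total recorded time for an operation in nanoseconds (0 if never seen).
    long totalNanos(String operation) {
        LongAdder total = nanos.get(operation);
        return total == null ? 0L : total.sum();
    }
}
```

A caller would wrap a unit of work, for example `recorder.timed("renderHome", () -> renderHomePage())`, where `renderHomePage` stands in for whatever your application does. Because the recording sits in a `finally` block, failed calls are measured too, which matters when the failure is the thing you are trying to diagnose.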
In addition to helping you diagnose problems, an APM system can provide more visibility into your code’s performance and usage patterns by providing metrics about which pages are accessed most often and how much time the server is taking to generate those pages. Once a page has been identified as needing improvement, an APM system can help you drill in and see where the server is spending the most time. This lets you prioritize your fixes.
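As a rough illustration of how such metrics can drive prioritization, the Java sketch below sums server time per page and ranks pages by the total. The class `PagePriorities`, the `Sample` type, and the page names are hypothetical, and ranking by total time (request count multiplied by time per request) is only one reasonable assumption about what "needs improvement" means. A real APM system would offer its own views of this data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not how any particular APM product ranks pages.
final class PagePriorities {
    // One observed request: which page was served and how long the server took.
    static final class Sample {
        final String page;
        final long millis;

        Sample(String page, long millis) {
            this.page = page;
            this.millis = millis;
        }
    }

    // Sums server time per page and returns page names, largest total first.
    // Ties are broken alphabetically so the result is deterministic.
    static List<String> rankByTotalTime(List<Sample> samples) {
        Map<String, Long> totals = new HashMap<>();
        for (Sample s : samples) {
            totals.merge(s.page, s.millis, Long::sum);
        }
        List<String> pages = new ArrayList<>(totals.keySet());
        pages.sort((a, b) -> {
            int byTime = Long.compare(totals.get(b), totals.get(a)); // descending
            return byTime != 0 ? byTime : a.compareTo(b);
        });
        return pages;
    }
}
```

Given samples of 120 ms and 150 ms for `/home`, 900 ms and 700 ms for `/search`, and 30 ms for `/about`, the ranking is `/search`, `/home`, `/about`. A page that is only moderately slow but requested very often can outrank a slow page that is rarely visited, which is the kind of trade-off these metrics make visible.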
For example, the page below shows how the performance of the New Relic dashboard application looks when viewed in New Relic itself, just to give you a feel for what APM can tell you. We can see usage spikes and how much time is being spent in application code versus database code. We can see the performance of front-end page rendering alongside the back-end application code. And we can see how our application performs in different countries around the globe.