Description of problem:

Debug logging generates too many messages for large installations. Since all alerting and condition processing happens on the server side, with more than 100 agents the log files become difficult to follow and relevant context is hard to maintain. As an example, with 138 agents and just nine simple alerts defined, each roll of the log files is around 50MB of text messages and covers only about 15 minutes of monitored time. With a more complicated monitoring setup or more aggressive alerting, log file processing becomes fairly tedious and grep calls can begin to slow down. Often you need to know the precise time window to focus on; when there are no telltale exceptions thrown and you don't yet know the problem, this can become tedious.

Are there alternative logging mechanisms (Tomcat, Apache, etc.) from which we can learn better ways to group or break out logging? Another alternative is to provide better tooling examples/samples for log file mining.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. N/A
2.
3.

Actual results:

Expected results:

Additional info:
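As a starting point for the "tooling for log file mining" idea, here is a minimal sketch of extracting only the lines inside a known time window. It assumes lines begin with a zero-padded `YYYY-MM-DD HH:MM:SS,mmm` timestamp (the typical log4j pattern); the sample log content and file names are made up for illustration.

```shell
#!/bin/sh
# Sample input standing in for a real server.log (hypothetical content):
cat > server.log <<'EOF'
2011-06-14 10:10:00,000 DEBUG AlertConditionCacheManager - before window
2011-06-14 10:20:00,000 DEBUG AlertConditionCacheManager - inside window
2011-06-14 10:40:00,000 DEBUG AlertConditionCacheManager - after window
EOF

start="2011-06-14 10:15"
end="2011-06-14 10:30"

# Lexicographic string comparison works here because the timestamp is
# zero-padded and ordered year-first, so string order == time order.
awk -v s="$start" -v e="$end" '{ ts = $1 " " $2 } ts >= s && ts <= e' \
    server.log > window.log

cat window.log
```

Grepping a pre-cut `window.log` instead of the full 50MB roll keeps repeated searches fast even when you have to iterate on the pattern.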
Actually writing all that log information also has an impact on raw server performance: creating the log messages can be expensive, and writing the log file involves disk I/O, which can become the bottleneck (and yes, I have seen that in the past with our internal unit tests and small file sizes for the size-based rolling appender).
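On the "break out logging" question, if the server logs through log4j, one sketch is to route the noisy subsystem to its own rolling file and wrap it in an AsyncAppender so disk writes happen on a background thread instead of blocking server threads. This is only an illustration: the category name `org.rhq.enterprise.server.alert` and the `${jboss.server.log.dir}` path are assumptions to be replaced with the actual noisy package and log directory.

```xml
<!-- log4j.xml fragment (log4j 1.x), illustrative only -->
<appender name="ALERT_FILE" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="${jboss.server.log.dir}/alert.log"/>
  <param name="MaxFileSize" value="50MB"/>
  <param name="MaxBackupIndex" value="10"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
  </layout>
</appender>

<!-- Buffer log events so file I/O does not block the calling thread -->
<appender name="ASYNC_ALERT" class="org.apache.log4j.AsyncAppender">
  <param name="BufferSize" value="1024"/>
  <appender-ref ref="ALERT_FILE"/>
</appender>

<!-- additivity="false" keeps alert DEBUG noise out of the main log -->
<category name="org.rhq.enterprise.server.alert" additivity="false">
  <priority value="DEBUG"/>
  <appender-ref ref="ASYNC_ALERT"/>
</category>
```

This would also address the grep problem indirectly, since alert-processing messages would no longer be interleaved with everything else.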