Having read the Installation:LoggingContextFilter and the Installation:Useful Server Logging Configurations wiki articles, I’m still coming up short on how to achieve what I want.
Context
I am asking from the context of a multi-server environment with several assystEnterprise servers, each running in its own Windows VM, behind a load-balanced URL facing our users and our technicians.
I am trying to address some issues with the %JBOSS_HOME%/standalone/log/server.log, where certain normal or foreseeable events caused by our users or IT staff produce massive numbers of entries.
Having the added noise of errors from normal and foreseeable behavior in the main log file adds extra complexity and increases the time spent noticing patterns and determining the cause of reported issues.
On the other hand, some error messages are not verbose enough to be of use as they lack critical information needed to investigate the issue, or for aggregation purposes to determine if additional training might be necessary for some of our support staff.
Overarching goal
This is the first step toward the larger goal of automatically monitoring our Assyst installation, as well as certain Assyst functionality, using a combination of SCOM, CheckMK, and Grafana. Any proposed solution would have to be compatible, or preferably synergistic, with this overarching goal.
Some examples:
Too much information
One example is when a user has been forwarded an email chain from a technician, and part of the email contains the AssystWEB link. Some users then attempt to open that URL, which fails because they do not have an assystUser, but the attempt also causes a set of error messages in the server log, such as:
… tAuthenticator] (default task-19869) getAccountUser 'prefix-xxxxxx' does not have an account.
oDate] eTime]
However, each such attempt causes a total of 12 errors and 4 warnings relating to authenticating and downloading the icon for that user. This results in ~1050 lines in the server.log for each user attempt.
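To get a feel for how often this happens and whether it is the same users each time, the lines could be counted per user outside of Assyst. Below is a minimal sketch of that idea; it relies only on the message text quoted above, and the function name and the sample lines are made up for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Pattern based on the message text quoted above; everything before
# "getAccountUser" on the line is ignored, so the log prefix does not matter.
NO_ACCOUNT = re.compile(r"getAccountUser '([^']+)' does not have an account")


def count_missing_accounts(lines):
    """Return a Counter mapping user name -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = NO_ACCOUNT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "(default task-1) getAccountUser 'prefix-aaaaaa' does not have an account.",
        "(default task-2) some unrelated message",
        "(default task-3) getAccountUser 'prefix-aaaaaa' does not have an account.",
    ]
    for user, hits in count_missing_accounts(sample).most_common():
        print(user, hits)
```

In practice the function could be fed an open file handle for server.log, since it only needs an iterable of lines.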
Not enough information:
When an assystUser performs a search, it is foreseeable that some of them will hit the cap of allowable results (as set in the system parameters). However, when the error is logged there is no information regarding who performed the search, what the search terms were, or what search parameters were selected.
Searches failing due to the system parameter limit are also mostly irrelevant to the stability of the system, and could be better presented in a log dedicated to errors caused by users/IT staff.
Having more information regarding what caused errors could help us aggregate these error messages, and determine if it’s a repeat user, or if it’s a broader issue pertaining to certain search parameters that have been selected, etc.
Questions, and request for input
- Do IFS or any customers using Assyst have experience to share, or examples of how this or similar issues have been approached?
- Are there examples of how, and which, log outputs should be written to the Windows Event Log to enable automatic system and functionality monitoring by SCOM or similar software?
- What resources are recommended for further insight into the possibilities and pitfalls in this area with regard to Assyst?
- Are there for instance examples of how to make a more granular logging setup where certain events are written to a set of separate logs, where a «main log» such as server.log simply states that something was recorded to that separate log?
- Am I approaching this situation from the wrong angle, and if so, what steps would you recommend?
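To make the «separate log plus a pointer in the main log» idea from the questions above more concrete, here is a rough sketch of the behaviour I have in mind, written as a post-processing step rather than as server configuration. It assumes each log record starts with a timestamp in the form `2024-01-31 12:00:00,123` and that continuation lines such as stack traces do not; the function names, the `user-errors.log` file name, and the `[log-split]` tag are all hypothetical.

```python
import re

# Assumption: a record starts with "yyyy-MM-dd HH:mm:ss,SSS "; lines that do
# not (for example stack trace frames) belong to the previous record.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def group_records(lines):
    """Group physical lines into records: a first line plus its continuations."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def split_log(lines, noise_patterns, side_name="user-errors.log"):
    """Return (main_lines, side_lines).

    Records whose first line matches any pattern go to side_lines in full;
    main_lines gets a single pointer line in their place.
    """
    main, side = [], []
    for record in group_records(lines):
        head = record[0]
        if any(p.search(head) for p in noise_patterns):
            side.extend(record)
            stamp = RECORD_START.match(head)
            prefix = stamp.group(0) if stamp else ""
            main.append(
                f"{prefix}INFO  [log-split] {len(record)} line(s) recorded in {side_name}"
            )
        else:
            main.extend(record)
    return main, side
```

Called with a pattern such as `re.compile(r"does not have an account")`, a record with a long stack trace would shrink to one line in the main output while staying intact in the side output. Whether the same effect can be achieved directly in the server's logging configuration is exactly what I am asking about.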