Power Servers running IBM i are remarkably stable systems, to such a degree that it's easy to overlook those rare occasions when something on the system has gone awry and requires operator attention. To catch these exceptions, many IBM i sites have third-party monitoring tools or home-grown software in place to notify system administrators that an issue has occurred and may require action.
For instance, if a job goes into MSGW status or an expected process is not running in a particular subsystem, a message can be sent to one or more members of the IT staff to draw attention to the problem.
If you're at a site that does not currently have such a system in place, you're in luck! Included with the latest Valence 5.1 build is a System Monitor utility that works in conjunction with Nitro iAdmin settings to send emails to appropriate parties. The emails include a link to launch iAdmin in either the Valence Mobile Portal App or in the Valence Portal on the desktop.
When a job goes into MSGW status, you can configure iAdmin to send notifications to everyone or to specific people depending on the message ID. You can also configure it to ignore certain messages, such as when a printer has run out of paper or needs a form loaded.
For global notifications, you can set up email addresses to receive alerts in Portal Admin > Settings. Scroll down to the "Nitro iAdmin" section and put one or more email addresses, separated by commas, in the "Email address to include on all MSGW notifications" field.
Specifying a global email recipient for message notifications is not mandatory, as you can define individual users to receive notifications within the iAdmin app. You can also designate particular messages to be ignored so that they never trigger an email notification.
You can also make user-specific exceptions, so that messages triggered by a certain user are ignored, or so that certain users are not notified of certain messages.
To control which messages are ignored, log into the Valence Portal, launch the Nitro iAdmin app and go to the Settings section. There are two panels in the section: on the left is a global list of messages to be ignored; on the right are settings specific to your user (meaning whichever user profile you used to log into the portal).
The records in the left panel are global in scope, controlling which messages should be ignored. You can edit a record by clicking on it, or add a new one by hitting the "plus" icon in the upper right. The edit/add form looks like this:
The fields on the ignore message panel break down as follows:
- MSG ID - The message ID to be ignored. If left blank then the ignore record applies to all IBM i messages, subject to constraints of the other fields specified below. The message ID can be a wild card value as well -- so CPA39* would ignore any messages that start with CPA39.
- User - The IBM i job user to be ignored. If specified, then the ignore record applies to any jobs submitted by the specified user, subject to any constraints specified in the other fields.
- Name - The job name to be ignored. If specified, then the ignore record only applies to the specified job.
- When logged in as - The Valence user for whom the message should be ignored. If specified, then that Valence user will not be notified of the MSGW condition when logged in, nor will they receive an email message. Any other users, however, will still be notified.
Referring to the message list depicted in Figure 2, there are three ignore conditions set up:
- When the MRP_DAILY job triggers a MSGW condition, user JIMBOB will not be notified. Any other users set up to receive notifications, or the global email recipient specified in Portal Admin > Settings, will receive the notification.
- Any MSGW condition caused by IBM i user SCOTT will not trigger a notification. In this scenario, Scott might be an IT developer who is doing frequent testing and causing a lot of MSGWs, which he presumably takes care of himself.
- Any CPA3394 message ("Load form XXXX") is ignored. Presumably the users changing forms on a printer would know how to deal with this, so no notification to IT of that particular MSGW condition is required.
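The way the four fields combine can be easier to see in code. The following is a minimal Python sketch of the matching rules as described above, not Valence's actual implementation: the `IgnoreRule` class, the `is_ignored` function and the sample job and user names other than MRP_DAILY, JIMBOB, SCOTT and CPA3394 are hypothetical. It assumes that a blank field matches anything, that every specified field must match for a record to apply, and that only the message ID supports a trailing `*` wildcard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IgnoreRule:
    """One ignore record; a blank field places no constraint (assumption)."""
    msg_id: str = ""        # blank = any message; trailing * = prefix wildcard
    job_user: str = ""      # IBM i job user
    job_name: str = ""      # IBM i job name
    valence_user: str = ""  # "When logged in as"; blank = every Valence user


def _matches(pattern, value, allow_wildcard=False):
    """Return True if value satisfies pattern (a blank pattern matches anything)."""
    if not pattern:
        return True
    if allow_wildcard and pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return pattern == value


def is_ignored(rules, msg_id, job_user, job_name, valence_user):
    """True if any rule suppresses this MSGW for the given Valence user."""
    return any(
        _matches(r.msg_id, msg_id, allow_wildcard=True)
        and _matches(r.job_user, job_user)
        and _matches(r.job_name, job_name)
        and _matches(r.valence_user, valence_user)
        for r in rules
    )


# The three ignore conditions from the example above.
RULES = [
    IgnoreRule(job_name="MRP_DAILY", valence_user="JIMBOB"),
    IgnoreRule(job_user="SCOTT"),
    IgnoreRule(msg_id="CPA3394"),
]
```

With these rules, a MSGW in MRP_DAILY is suppressed for JIMBOB but not for any other Valence user, while anything caused by SCOTT or any CPA3394 message is suppressed for everyone.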
The right panel in the Settings page of iAdmin includes a control for designating the logged-in user as an email recipient for MSGW notifications. Check the "Send email on MSGW" box to include the logged-in user as one of the recipients. The email address is pulled from the Valence user profile, which is maintained in Portal Admin > Users.
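To illustrate how the two sources of recipients fit together, here is a small Python sketch that merges the comma-separated global field with the users who have checked "Send email on MSGW". It is an assumption about how such a list could be assembled, not a description of Valence internals; the function name `msgw_recipients`, the `(email, opted_in)` pair layout and the de-duplication step are all hypothetical.

```python
def msgw_recipients(global_field, users):
    """Build the MSGW recipient list.

    global_field: the comma-separated text from Portal Admin > Settings.
    users: iterable of (email, opted_in) pairs, where opted_in reflects
           the "Send email on MSGW" checkbox for that Valence user.
    """
    candidates = [addr.strip() for addr in global_field.split(",")]
    candidates += [email for email, opted_in in users if opted_in]

    seen = set()
    recipients = []
    for addr in candidates:
        key = addr.lower()  # treat addresses as case-insensitive (assumption)
        if addr and key not in seen:
            seen.add(key)
            recipients.append(addr)
    return recipients
```

Leaving the global field blank simply yields the opted-in users, which mirrors the point that a global recipient is optional.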
In addition to knowing when jobs have gone into a MSGW state, it can also be helpful to know when one or more jobs may not be running at all. Subsystem monitoring makes it possible to set a min/max job threshold on a subsystem and be notified if the number of active jobs in the subsystem falls outside that range. This can be handy for jobs that are expected to be running all day, or for a certain period of the day.
The first step for creating notifications for subsystem exceptions is to enter one or more email addresses to receive the notification. This is done on a global basis in the Nitro iAdmin section of Portal Admin > Settings:
The next step in setting up a subsystem monitor is to indicate what the min/max job count on a subsystem should be. This is done through the Nitro iAdmin app, in the Subsystems section. Click the selection icon on the desired subsystem, then select "Monitor":
Next you will see the monitor settings for the selected subsystem, with the following configuration fields:
- Minimum job threshold - if the number of active jobs falls below this value, the subsystem will be considered to be in an exception state. A value of zero means there is no minimum.
- Maximum job threshold - if the number of active jobs rises above this value, the subsystem will be considered to be in an exception state. A value of 9999 means there is no maximum.
- Send email when threshold breached - if the subsystem falls into an exception state, the email recipient(s) specified in Portal Admin > Settings will receive an email alert.
- Interval between notifications - the interval in minutes between notifications of the exception condition. A setting of 60 means hourly notifications will be sent until the exception condition has been corrected.
- Timeframe - The list beneath the other settings controls the days of the week and the time ranges during which the exception condition is monitored. If no timeframes are entered then the exception condition is checked every day, around the clock.
In the example depicted in Figure 5, the QEDI subsystem is being monitored five days a week to verify there is at least one job running. In the event no job is running, an email alert will be sent to the John Doe email address specified in Figure 4.
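The monitor settings amount to a small decision procedure, sketched below in Python. This is an illustration built only from the field descriptions above, not the logic inside the System Monitor itself: the `SubsystemMonitor` and `Timeframe` classes and their method names are hypothetical, and the Monday-to-Friday, all-day timeframes used for QEDI are an assumption, since the exact time ranges in the example are not stated.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta

NO_MAXIMUM = 9999  # a maximum of 9999 means "no maximum"


@dataclass
class Timeframe:
    weekday: int  # 0 = Monday ... 6 = Sunday
    start: time
    end: time


@dataclass
class SubsystemMonitor:
    min_jobs: int = 0            # 0 means "no minimum"
    max_jobs: int = NO_MAXIMUM
    send_email: bool = True
    interval_minutes: int = 60
    timeframes: list = field(default_factory=list)

    def in_timeframe(self, now):
        """With no timeframes entered, monitoring runs around the clock."""
        if not self.timeframes:
            return True
        return any(
            tf.weekday == now.weekday() and tf.start <= now.time() <= tf.end
            for tf in self.timeframes
        )

    def in_exception(self, active_jobs, now):
        """True if the job count is outside the min/max range while monitored."""
        if not self.in_timeframe(now):
            return False
        below = active_jobs < self.min_jobs
        above = self.max_jobs < NO_MAXIMUM and active_jobs > self.max_jobs
        return below or above

    def should_notify(self, active_jobs, now, last_sent):
        """Send at once on the first breach, then once per interval."""
        if not (self.send_email and self.in_exception(active_jobs, now)):
            return False
        if last_sent is None:
            return True
        return now - last_sent >= timedelta(minutes=self.interval_minutes)


# QEDI example: at least one job, weekdays only (all-day ranges assumed).
qedi = SubsystemMonitor(
    min_jobs=1,
    timeframes=[Timeframe(day, time(0, 0), time(23, 59, 59)) for day in range(5)],
)
```

Under these assumptions, an empty QEDI subsystem on a weekday morning is an exception and triggers an email, while the same condition on a Saturday is not checked at all.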
Activating the System Monitor
With your MSGW and/or subsystem exceptions configured, the final step is to submit a batch job that calls program VVSYSMON, which handles the monitoring functions. This job should be submitted to a subsystem that is always running on your system, such as QCTL. With the appropriate Valence instance library in your library list, enter the following command:
SBMJOB CMD(CALL PGM(VVSYSMON)) JOB(VVSYSMON) JOBQ(QCTL)
You can stop the MSGW and subsystem monitoring process by simply putting the job on hold or ending it (i.e., using option 3 to hold or option 4 to end in WRKACTJOB). A held job can later be released with option 6 to resume monitoring.