System Sentinel

Basic Product Design

Sentinel provides integrated infrastructure and application monitoring, scalable to a global footprint. It is currently installed on tens of thousands of servers worldwide, seamlessly monitoring applications and infrastructure in large corporations whose businesses include financial services, investment banking, high frequency securities trading, media and publishing.

Architecture -- The Back End

A basic Sentinel installation consists of a pair of redundant Sentinel servers and a set of distributed, programmable agents, also known as Scouts, that reside on the machines to be monitored.

Sentinel Servers act as the central repository for both state information and snapshot data of the technology estate that is being monitored.

A single pair of Sentinel servers can monitor ten thousand or more machines, depending on the amount and type of message traffic they receive.

Scouts are deployed on every server in the installation where monitoring is to be performed. Scouts are configured via a simple but extensible rule-based programming language. Additionally, certain application-specific wrappers are provided whose function is to generate and distribute rules to sets of machines. This allows a single set of rules to be distributed and maintained for a set of similar machines.

Scouts wake up periodically, execute their rule base, and report both state information and snapshot data back to the Sentinel servers.
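
The wake, evaluate, report cycle can be pictured with a minimal Python sketch. This is not Sentinel's rule language or API: the names Rule, run_cycle and the report callback are hypothetical, and the snapshot is a plain dictionary standing in for whatever data a Scout actually collects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A named rule with a condition over the current snapshot (hypothetical)."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the rule fires


def run_cycle(rules: List[Rule], snapshot: Dict[str, float], report) -> List[str]:
    """Evaluate every rule once and report state plus snapshot data."""
    fired = [rule.name for rule in rules if rule.condition(snapshot)]
    # A real agent would send this to the servers; here it goes to a callback.
    report({"fired": fired, "snapshot": snapshot})
    return fired


rules = [
    Rule("cpu_high", lambda s: s["cpu_pct"] > 90),
    Rule("disk_low", lambda s: s["disk_free_mb"] < 200),
]
sent = []  # stands in for the connection to the servers
fired = run_cycle(rules, {"cpu_pct": 97.0, "disk_free_mb": 512.0}, sent.append)
```

In a running agent, run_cycle would be called on a timer, once per wake-up.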

Architecture -- The Front End

The front end of a Sentinel installation consists of three distinct pieces that allow users to interact with events and data that have been collected by the back end of a Sentinel installation.

The first of these is the Sentinel Dashboard. The dashboard connects to the Sentinel servers and allows users to easily create diagrammatic views of the monitored estate. The icons that make up a view can be made to change color to reflect the informational state of the Sentinel servers. There are no limits to the kinds of views that can be created or the conditional complexity to which elements respond. Icons themselves can act as portals into other views so that visualization of an entire technology estate can be navigated by simple mouse clicks.

The second piece is the Sentinel Notification engine. The notification engine allows actions such as SMS, email and ticketing to occur based upon the informational state of the Sentinel servers. Users, and groups of users, can be alerted to attend to remediation requirements based upon changes to the technology estate.

Finally, Sentinel keeps a historical record of all triggering events and snapshot data in an SQL database, which may be used for reporting, troubleshooting and capacity planning.

A Brief Guide to Built-in Methods

The following is a brief introduction to the basic monitoring methods that are available within a standard Sentinel installation. Because Sentinel is extensible through several different means, there is in reality no practical limit to the kinds of things that can be monitored. If the data exists and can be made available to Sentinel, it can be incorporated into the monitoring rule bases. Each rule executed by a Sentinel agent contains a conditional expression that determines if and when the rule will fire, in turn causing Dashboard icons to light, Notifications to be sent, and the Event itself to be logged. Following is a synopsis of the existing methods that can be incorporated into the condition clause of a Sentinel rule.

Processor Performance

Sentinel includes a large set of processor performance monitoring methods. CPU utilization is available both across an entire box and for individual processors within the box. Measurements can be taken at a granularity of one second, both for data collection and for event triggers. Some of the complementary methods within the Processor Performance group are CPU Count, Kernel CPU Usage, Percent Time in I/O Wait State, Percent Time in IRQ Requests, Percent Time in SOFTIRQ Requests, Percent Time in Guest Operating Systems and Percent Time in Host Operating System for virtual machines. Each of these is also available at a granularity of one second.

Process Monitoring

Sentinel exposes the entirety of the process table to its rules. Events can be generated from expressions that contain values from the following methods: Process Name, Process Id, Parent Process Id, Process Arguments, Process CPU Utilization, Virtual Process Memory Consumed, Process Resident Memory Consumed, Process Start Time, and Process User Id. These methods allow events to be generated based on the current status of the table. For example, a rule that triggers if a key process is not running would have a condition clause such as: when not exists (PROC = 'keyproc'). More complex condition clauses can be built up with boolean operations, as in: when not exists (PROC = 'keyproc') and (&time between 090000 and 160000), which will fire only when the process 'keyproc' is not running and the time is between 9:00am and 4:00pm.
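
The logic of the second condition clause can be restated as a short Python sketch. It is only an illustration of the boolean structure, not how a Sentinel agent evaluates rules: the function name is hypothetical, the process table is reduced to a set of names, and the time window is assumed to be inclusive at both ends.

```python
from datetime import time


def keyproc_rule_fires(process_names, now, name="keyproc",
                       start=time(9, 0), end=time(16, 0)):
    """Mirror 'not exists (PROC = name) and (&time between start and end)'."""
    missing = name not in process_names   # not exists (PROC = 'keyproc')
    in_window = start <= now <= end       # assumed inclusive 'between'
    return missing and in_window


# The process is absent during the window, so the rule fires.
fires = keyproc_rule_fires({"init", "sshd"}, time(10, 30))
```

Both parts must hold: a missing process outside the window, or a running process inside it, does not fire the rule.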

On Windows boxes, where the process table is not exposed in this manner, the equivalent metrics are available through Windows Management Instrumentation (WMI).

Disk

Several methods exist within Sentinel to monitor disk capacity and swap space. These methods can be used to determine what percentage of disk or swap space is used or available, and the absolute number of bytes used and remaining. For example, the condition clause: when (DISK('c:/') < 200) will trigger an event if there are fewer than 200 megabytes of space remaining on drive c:.
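
For comparison, the same free-space test can be sketched in Python with the standard library's shutil.disk_usage. The function names are hypothetical, and treating a megabyte as 1024 * 1024 bytes is an assumption; the source does not say which definition the DISK method uses.

```python
import shutil

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def free_megabytes(path):
    """Return the free space on the filesystem holding 'path', in megabytes."""
    return shutil.disk_usage(path).free / BYTES_PER_MB


def disk_rule_fires(path, threshold_mb=200):
    """Mirror 'when (DISK(path) < threshold_mb)'."""
    return free_megabytes(path) < threshold_mb


# Check the filesystem that holds the current directory.
low_on_space = disk_rule_fires(".")
```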

Network

Sentinel provides a suite of methods designed to test network interfaces. Each interface can be queried for Name, Speed, Duplex, Current State (Up or Down), Input and Output Errors, Input and Output Drops, and Transfer Rates both over the last check cycle and at one second intervals. For example, the condition clause: when (NETSTATE = 0) will trigger an event for each of the network interfaces that is currently down. Sentinel also supports the monitoring of Bonded Interfaces, in which multiple physical interfaces are assigned to a single logical interface.

TCP/UDP

Both UDP and TCP tables are available to various methods that can monitor host to host communication in a Sentinel installation. Sentinel can monitor both UDP and TCP queue depths to aid in communications performance monitoring, and can do so on a port by port basis. The TCP stack is polled once every second and both RECVQ and XMITQ are stored for further processing. For example, to test for a backlog on the receive queue for a socket bound to port 1234, one could write a condition clause such as: when (TCPLOCALPORT = 1234 and TCPWORSTRECVQ > 7000), which would fire if the size of the RECVQ on port 1234 exceeds 7000.
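
The following Python sketch illustrates one plausible reading of that clause. It assumes that TCPWORSTRECVQ means the largest RECVQ value seen among the one-second samples for a port; the source does not define it, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def worst_recvq_by_port(samples):
    """Reduce (local_port, recvq) samples to the largest RECVQ per port."""
    worst = defaultdict(int)
    for local_port, recvq in samples:
        worst[local_port] = max(worst[local_port], recvq)
    return dict(worst)


def tcp_rule_fires(samples, port=1234, threshold=7000):
    """Mirror 'when (TCPLOCALPORT = port and TCPWORSTRECVQ > threshold)'."""
    return worst_recvq_by_port(samples).get(port, 0) > threshold


# One (port, RECVQ) pair per one-second poll; the values are invented.
samples = [(1234, 120), (1234, 8192), (1234, 300), (8080, 9000)]
fires = tcp_rule_fires(samples)
```

Keeping the worst value rather than the latest one means a short spike between check cycles is not missed.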

Logfile Monitoring

Sentinel supplies a suite of methods that may be used to monitor the content of log files. There are three basic methodologies that allow log files of various content types to be monitored for actionable events. The first methodology ingests log file lines sequentially, using regular expression pattern matching to scan for key phrases that signal, in sequence, an actionable event and then the clearing of that event. For example, if one were monitoring a log file to which a line was added each time a network connection changed state, one might look for the phrase 'Eth0 Down' to generate a trap, and then for the phrase 'Eth0 Up' to signify the clearing of the trap.
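
The trap-then-clear sequence amounts to a two-state scan over the file. The Python sketch below shows the idea under stated assumptions: the function name, the event tuples and the sample log lines are all hypothetical, and Sentinel's own log monitoring methods are not shown.

```python
import re


def scan_trap_clear(lines, trap_pattern, clear_pattern):
    """Emit ('trap', n) on a trap match, then ('clear', n) on the next clear."""
    trap_re = re.compile(trap_pattern)
    clear_re = re.compile(clear_pattern)
    events = []
    trapped = False
    for number, line in enumerate(lines, start=1):
        if not trapped and trap_re.search(line):
            trapped = True
            events.append(("trap", number))
        elif trapped and clear_re.search(line):
            trapped = False
            events.append(("clear", number))
    return events


log = [
    "12:00:01 Eth0 Up",
    "12:03:17 Eth0 Down",
    "12:03:18 Eth0 Down",
    "12:05:42 Eth0 Up",
]
events = scan_trap_clear(log, r"Eth0 Down", r"Eth0 Up")
```

A repeated 'Eth0 Down' while the trap is already raised produces no second trap, and an 'Eth0 Up' with no trap outstanding is ignored.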

The second methodology also ingests Log File lines sequentially, but does not attempt to correlate them. Instead it looks for pattern matches and pattern exclusions to generate events for all lines that pass these tests.

The third methodology is similar to the second in that it also selects sets of passing lines, but instead of generating events on each one, it bins, counts and generates events when the type and count of event exceeds defined threshold levels. One classic use for this kind of approach is to monitor a log file that contains user log-in attempts. If a user tries to log-in too many times within a specified time frame, possibly signifying a security problem, this approach can be used to catch that behaviour.
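
The bin-and-count approach can be sketched in Python as a sliding window per user. Everything specific here is an assumption made for illustration: the log line format, the function name, the limit of three attempts and the 60 second window.

```python
import re
from collections import defaultdict, deque


def failed_login_alerts(entries, limit=3, window_seconds=60):
    """Alert when a user exceeds 'limit' failed log-ins within the window.

    'entries' is an iterable of (timestamp_seconds, line) in time order.
    """
    pattern = re.compile(r"login failed for (\w+)")  # hypothetical format
    recent = defaultdict(deque)  # user -> timestamps still inside the window
    alerts = []
    for timestamp, line in entries:
        match = pattern.search(line)
        if not match:
            continue  # line excluded from the count
        times = recent[match.group(1)]
        times.append(timestamp)
        # Drop attempts that have aged out of the window.
        while timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) > limit:
            alerts.append((match.group(1), timestamp, len(times)))
    return alerts


entries = [
    (0, "login failed for alice"),
    (10, "login ok for bob"),
    (20, "login failed for alice"),
    (30, "login failed for alice"),
    (40, "login failed for alice"),
    (200, "login failed for alice"),
]
alerts = failed_login_alerts(entries)
```

Here the fourth failure within 60 seconds raises an alert, while the isolated failure at 200 seconds does not.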

Script Functions

One of the means by which Sentinel's native functionality can be extended is the use of scripts. Sentinel has various methods that can be configured to interact with scripts that generate data in the form of either JSON or keyword:value pairs. The Sentinel agent has a rich query language to extract data from a JSON document. This method is completely extensible in the sense that any application that can present data in one of these formats can have it ingested by a Sentinel agent and used in the condition clause of a Sentinel rule. Furthermore, the scripts in question can be automatically pulled down from the Sentinel configuration server, allowing for easy mass distribution of the same methodology to like machines.
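
As a rough illustration of the consuming side, the Python sketch below accepts either output format from a script. The function names are hypothetical, and the dotted-path lookup is only a stand-in: the agent's actual JSON query language is not described in this document and is not reproduced here.

```python
import json


def parse_script_output(text):
    """Parse script output given as a JSON object or as keyword:value lines."""
    text = text.strip()
    if text.startswith("{"):
        return json.loads(text)
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not keyword:value pairs
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


def lookup(document, dotted_path):
    """Follow a dotted path such as 'queue.depth' through nested dicts."""
    node = document
    for part in dotted_path.split("."):
        node = node[part]
    return node


from_json = parse_script_output('{"queue": {"depth": 42}}')
from_pairs = parse_script_output("depth: 42\nstate: ok\n")
depth = lookup(from_json, "queue.depth")
```

Note that keyword:value input yields strings, so a numeric comparison would need an explicit conversion.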

Multicast Connectivity

As part of Sentinel's network connectivity testing methods, there is a set of functions that allow for the end to end testing of a multicast network. The basic idea is to install a specialized Sentinel executable as an echoing device in one part of the network and to have multicast packets bounced off this device and picked up across the network. Failure of the echoing indicates a loss of network connectivity.

Synthetic Transaction Functions

Sentinel comes with the ability to carry out user-defined transactions over a TCP connection. Rules can be configured to initiate sequences of logical handshakes between the agent's host machine and any machine reachable across the network to which it is connected. The success or failure of the ensuing data exchange can be used to generate actionable events. Additionally, a method to ping an arbitrary host name or IP address exists as a test for connectivity. Large groups of machines can also be pinged by Sentinel's dedicated ping agent.
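
A synthetic transaction of this kind can be sketched in Python as a list of send and expect steps run over a socket. The function names and the step format are hypothetical, and the sketch assumes each reply arrives in a single recv call, which a production check could not rely on.

```python
import socket


def run_steps(conn, steps):
    """Send each payload and require the reply to start with the expected bytes."""
    for payload, expected_prefix in steps:
        conn.sendall(payload)
        reply = conn.recv(4096)  # assumption: the whole reply arrives at once
        if not reply.startswith(expected_prefix):
            return False
    return True


def check_service(host, port, steps, timeout=2.0):
    """Open a TCP connection and run the handshake; False on any failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return run_steps(conn, steps)
    except OSError:  # refused, unreachable, timed out, and similar
        return False
```

A call such as check_service("app01", 1234, [(b"HELLO\n", b"OK")]) would return True only if the connection succeeds and the reply begins with OK; the host name, port and messages are placeholders.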

Agents

Agents (also known as Scouts) are lightweight executables that are distributed to machines in the monitored network. There are standard operating system specific agents for Windows, Linux and Solaris that will normally form the bulk of the agents in the network. Agents download their rule bases from the Configuration Server and then repeatedly cycle through the rules, generating traps and warnings as actionable events are encountered. These events are sent to the Sentinel Server, which then passes them on to interested listeners (Dashboards, Database, and Notification Engine). In addition to reporting events, agents can also be configured to report metrics data back to the Sentinel Server, where it can be used for purposes such as capacity planning and troubleshooting.

Extensions

In addition to extending the standard agents through script invocation, a plug-in API exists so that rule language methods may be extended. This is useful for particular monitoring needs that are not already covered by the methods available in the Sentinel rule language. This API supports both C++ and Java interfaces. A facility also exists for clients to create their own agents, which they often do when performance requirements or direct connection to critical processes warrant it, as in the case of a low-latency trading operation.

In addition to the standard OS specific agents, there are specialized agents that have been created by Sentinel to perform specific tasks. Some of them are outlined in the following paragraphs.

Correlation

Sentinel provides a specialized Correlation Agent that, like other agents, attaches itself to the Sentinel Server. Instead of interacting with the native operating system and applications of the server on which it is installed, it takes as input the current event state of the entire monitored network as represented by the Sentinel Server. In this way it can correlate events from any number of machines on the network irrespective of their distribution and ownership. For example, if there were a primary Web server in Australia serving pages, with back-ups in Mumbai and Miami, the Correlation Agent could be configured to understand the joint availability of the three machines. This very powerful concept is also used in Sentinel's Business Process Monitoring wrapper to track process dependencies across independent servers.
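
One way to picture joint availability is as a function over the per-host states held by the server. The Python sketch below is an assumption-laden illustration: the host names, the three-level result and the function name are hypothetical, and the Correlation Agent's real configuration is not shown.

```python
def service_state(host_up, primary, backups):
    """Combine per-host availability into a single service-level state."""
    if host_up.get(primary, False):
        return "ok"        # the primary is serving pages
    if any(host_up.get(backup, False) for backup in backups):
        return "degraded"  # primary down, at least one back-up available
    return "down"          # no machine available


# Hypothetical host names for the three sites in the example above.
hosts = {"web-australia": False, "web-mumbai": True, "web-miami": False}
state = service_state(hosts, "web-australia", ["web-mumbai", "web-miami"])
```

The point of correlating is that a single host failure becomes a service-level judgment: here the primary is down, but the service is only degraded.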

SNMP

Sentinel provides an SNMP scout that can poll SNMP-enabled devices and test the returned MIB variables in Sentinel rules to generate Sentinel events. It can also receive SNMP traps and generate Sentinel events from these. The SNMP scout can be run on any host that has IP connectivity to the SNMP-enabled device to be monitored. For an SNMP-enabled server, it can be run either directly on that server or on a central server that polls it. For black-box devices such as network hardware, the SNMP scout must be run on a separate server that polls the network device.

Ping

The ping scout can be used to do a simple ping test of a number of machines. This can be used to monitor network connectivity, or to monitor machines to which no other access is possible. The ping scout sends update messages both for itself (so that the Sentinel display can show whether the scout itself is running properly) and for any machine that it has been configured to ping.

Tibrv

The TIBRV agent can listen for multicast traffic on a given transport and subject, and generate events based upon its presence or absence. It can also expose message rates, subscription errors or publisher errors and generate events accordingly.

Plugins

The functionality of Sentinel agents can also be extended through the use of our plugin API, which allows native code to be included in the agent so that any sort of application data can be used in Sentinel rules.

Market Data and Middleware

Sentinel specializes in providing essential tools to monitor business critical Market Data and Middleware platforms. Today's market conditions demand that the integrity, reliability, and latency of these pivotal systems be a focus for key personnel. Sentinel provides peace of mind to front office traders, business managers, enterprise managers, operational teams, and risk officers that these critical systems are being monitored to the highest standard.

Thomson Reuters RMDS

Sentinel provides intelligent scouts to monitor the health of your Thomson Reuters RMDS platforms globally. The starting point is to verify that the key components are running via our built-in process monitoring. Built on top of this is log file interrogation, which allows you to alert on standard log events alongside ones that are pertinent to your installation. The more sophisticated stage involves our scout hooking into the Thomson Reuters RMDS monitoring components to extract key metrics. These can be compared against defined thresholds, both to alert on real-time issues and to predict trends within your system.

Benefits
1. Monitors key services from both external vendors and internal groups
2. Ensures health across remote bridged sites
3. Tracks update rates between source and sink applications
4. Reports on the health of user and system caches

Wombat

Sentinel has the ability to poll Wombat administrative messages to provide a representative state of your environment. This view of the current status can then be used to define conditional thresholds to ensure that you are promptly alerted to any key events.

Benefits
1. Monitors status of all feed handlers
2. Verifies current operational mode and state of handlers and notifies if changes occur
3. Reports on cache information

Database interrogation

Text to follow.