Abstract

Monitoring our services and hosts is important because we run production systems on them, e.g. the mail server, Drupal hosting and Git services. If one of these services is down for a while, the responsible admins get calls from everyone who keeps their mail and other data there, which is disruptive for users and admins alike.

Monitoring Services

We currently run two different monitoring services: collectd and nagios.

Both services have web interfaces. collectd runs a small client on every system that needs to be monitored and gathers data such as load, memory, CPU, interfaces, df, ... This data is processed by nagios and shown as a status (Service Up, Service Warning, Service Down, Service Unknown), and notification mails are sent.
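
The step from a collected value to a nagios status can be sketched as follows. This is only a minimal illustration, assuming the standard nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `classify` and the thresholds in the usage line are hypothetical and not taken from our configuration.

```python
import math
from typing import Optional

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value: Optional[float], warn: float, crit: float) -> int:
    """Map a metric where higher is worse (e.g. load) to a nagios status."""
    # A missing or NaN value means we have no data for this check.
    if value is None or math.isnan(value):
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


# Hypothetical thresholds: warn at a load of 5, critical at a load of 10.
status = classify(7.2, warn=5.0, crit=10.0)
```

With these assumed thresholds a load of 7.2 would be reported as Service Warning.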

Infrastructure

We run our own collectd server in the VZ Kronos on caeli. Because caeli has crashed too often in recent weeks, ReOx decided to buy a small VPS and run his own collectd and nagios on it. All host servers (e.g. caeli) run a collectd client and send their data to both collectd servers. All important systems are monitored from nagios with PING and PING6. HTTP, IMAP, SMTP and other services that are reachable from the outside world are also monitored by nagios. All other information is processed by collectd-nagios, a tool that presents data from collectd in a format nagios understands.
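
What a reachability check for an externally visible service does can be sketched in a few lines. This is not how nagios implements its checks (it uses its own check plugins); it is a simplified illustration, and the function name `tcp_check` and the example host are hypothetical. It only verifies that the TCP port accepts connections, not that the service behind it answers correctly.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries IPv4 and IPv6 addresses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A call such as `tcp_check("mail.domain.tld", 25)` would then correspond to a very basic SMTP reachability test.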

Notification

Notification only makes sense if the user (= admin) actually reads his mail and the mail server is not on a system that is itself monitored (otherwise no mail will be sent if that system crashes).

The basic concept is to use an external mail server such as Gmail, Chello or another provider. Most people have a backup address for this purpose. An alternative concept is to use the monitoring system as a very small mail server with POP3 or IMAP access to handle the mails. This means that the monitoring server itself is also a mail server.

In this case the notification mail is sent to a special mail address, e.g. alert@domain.tld, and is then distributed by the mail server to everyone's private mailbox or stored directly on the mail server.
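
A notification mail to such an alert address could be built as sketched below. The function name `build_alert`, the sender address and the subject layout are assumptions made for illustration; nagios normally generates these mails through its own notification commands.

```python
from email.message import EmailMessage


def build_alert(host: str, service: str, state: str,
                sender: str = "nagios@monitor.example",  # hypothetical sender
                recipient: str = "alert@domain.tld") -> EmailMessage:
    """Build a plain-text notification mail for one service state."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{state}] {service} on {host}"
    msg.set_content(f"Service {service} on host {host} is {state}.\n")
    return msg


mail = build_alert("caeli", "PING", "CRITICAL")
```

The message could then be handed to the local mail server with `smtplib.SMTP(...).send_message(mail)`, which distributes it to the individual recipients.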

Because Gmail is well integrated on Android phones, a private notification mailbox at Google's mail service is advised.

Access

Access to the web-based monitoring platform must be restricted. The information it shows is a potential risk, because an attacker could see our internal services with their versions, the network map and so on. Because an LDAP connection cannot be made, only a single user is planned to have access to the web platform.

To secure the connection, SSL should be used.

Emergency Concept

When it is really critical: inform the others directly (Jabber, SMS, etc.). When you can fix it: fix it! (e.g. restart the server if it crashed, restart the service if it hung)

See also: FeatureSpecs/EmergencyConcept

42Wiki: Monitoring (last edited 2011-08-02 11:16:31 by SebastianBachmann)