Monitoring

Several third-party options are available for monitoring the system, depending on your needs. Do not run other configuration management agents such as Puppet or Chef on the server! sipxcom already uses CFEngine (the sipxsupervisor service) for this purpose, and a second agent will likely interfere with it.

  • Sipxcom has built-in SNMP alarms for your convenience beneath Diagnostics - Alarms. Be sure to check those first.

  • The sipcodes.sh script can also be used to produce high-level SIP statistics ad hoc. If the proxy log is at INFO or DEBUG verbosity, the script also runs automatically during snapshot collection and reports its statistics as ./var/log/sipxpbx/sipcodes.log.

  • The SIP proxy service has an option to save proxy statistics to a JSON file, /var/log/sipxpbx/proxy_stats.json.

_images/system_services_proxy_stats.png
  • The /var/log/sipxpbx/proxy_stats.json file can be consumed by tools such as Grafana, ELK stacks, Graylog, or Splunk to create current or historical graphs.

  • Nagios, Munin, Cacti, and many others can also be used to monitor service status, server health, and so on.
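
For a quick ad hoc look at response codes without any of the tools above, SIP status lines can be tallied straight from a log file. The following is a minimal sketch and is not how sipcodes.sh itself works; it simply counts occurrences of the RFC 3261 status-line prefix "SIP/2.0 <code>". The function name count_sip_codes and the log file name used in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# count_sip_codes FILE
# Tally SIP response status codes found in FILE, one "code count" pair per line.
# Matches "SIP/2.0 " followed by three digits, which only appears in responses
# (requests end with "SIP/2.0", and Via headers use "SIP/2.0/UDP").
count_sip_codes() {
    grep -o 'SIP/2\.0 [1-6][0-9][0-9]' "$1" |
        awk '{ n[$2]++ } END { for (c in n) print c, n[c] }' |
        sort -n
}
```

Usage, assuming the proxy log is named sipXproxy.log on your system: count_sip_codes /var/log/sipxpbx/sipXproxy.log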

Nagios

This section provides an example Nagios Core configuration for monitoring sipxcom services.

Prerequisites

You’ll need a separate server running Nagios Core and administrative access to all the sipxcom servers. For this example I compiled Nagios Core from source using the default settings rather than installing an OS package, so file paths may differ if you installed via rpm or apt. I use the Nagios Remote Plugin Executor (NRPE) to aggregate the checks, but alternatives are available if NRPE doesn’t suit your needs. Most of the checks used come from the standard Nagios Plugins. If you want to expand on these, many more are available from the Nagios Exchange.

Overview

If compiled from source using the defaults, Nagios Core will install to /usr/local/nagios:

# tree --charset=ASCII -d nagios/
nagios/
|-- bin
|-- etc
|   `-- objects
|-- libexec
|-- sbin
|-- share
|   |-- contexthelp
|   |-- docs
|   |   `-- images
|   |-- images
|   |   `-- logos
|   |-- includes
|   |   `-- rss
|   |       `-- extlib
|   |-- js
|   |-- media
|   |-- ssi
|   `-- stylesheets
`-- var
    |-- archives
    |-- rw
    `-- spool
        `-- checkresults

The etc/objects directory is where your host configuration files are stored. I recommend organizing hosts into groups beneath the objects directory, then grouping similar services beneath that. For example, if you have a group of three sipxcom servers at example.org:

$ mkdir /usr/local/nagios/etc/objects/example.org
$ mkdir /usr/local/nagios/etc/objects/example.org/sipx
$ touch /usr/local/nagios/etc/objects/example.org/sipx/sipx1.cfg
$ touch /usr/local/nagios/etc/objects/example.org/sipx/sipx2.cfg
$ touch /usr/local/nagios/etc/objects/example.org/sipx/sipx3.cfg

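Structuring the files this way lets a system administrator quickly see who a host belongs to and what it does. It is especially helpful if you intend to run Nagios in a multi-tenant fashion. Nagios only reads object files that nagios.cfg points to, so reference the new directory there with a cfg_dir directive (or each file with cfg_file).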
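
With more than a handful of hosts, the same layout can be created in a loop. This is a small sketch rather than part of Nagios; the function name make_host_files is made up for illustration, and the base directory is passed in so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# make_host_files BASE DOMAIN HOST...
# Create BASE/DOMAIN/sipx/ and one empty HOST.cfg inside it per host.
make_host_files() {
    base=$1
    domain=$2
    shift 2
    # -p creates missing parent directories and is harmless if they exist.
    mkdir -p "$base/$domain/sipx" || return 1
    for host in "$@"; do
        touch "$base/$domain/sipx/$host.cfg" || return 1
    done
}
```

Usage for the three servers above: make_host_files /usr/local/nagios/etc/objects example.org sipx1 sipx2 sipx3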

Preparing a host for monitoring

Before stepping into the sipx1.cfg configuration on the Nagios server we’ll need to prepare the sipxcom server(s) for our checks. Nagios Remote Plugin Executor (NRPE) works as an aggregate point for multiple check scripts.

_images/nagios_nrpe.png

You’ll need to download and install both NRPE and the standard Nagios plugins on each host you intend to monitor. After installing these, you may wish to pause for a moment and review the check scripts now available under /usr/local/nagios/libexec. The NRPE configuration, /usr/local/nagios/etc/nrpe.cfg, was likely copied from the sample provided in the NRPE tarball. Review this file for any environment-specific changes you may need to make, such as partition locations:

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

The commands defined here must match the names called from the host configuration file. For example, checks for sipx1.example.org are defined on the Nagios server in /usr/local/nagios/etc/objects/example.org/sipx/sipx1.cfg:

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Users
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_users
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Swap
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_swap
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Load
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_load
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Boot Partition
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_boot
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Root Partition
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_root
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Zombie Processes
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_zombie_procs
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check Total Processes
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_total_procs
        }

The check_command line essentially means “connect with NRPE and run xxx”. Be sure that xxx is defined in /usr/local/nagios/etc/nrpe.cfg on the host you are checking. For checks you want to execute from the Nagios server itself, make certain the commands are defined in /usr/local/nagios/etc/objects/commands.cfg on the Nagios server. For example, I defined the SSL certificate check in commands.cfg on my Nagios server:

define command {
        command_name    check_ssl_certificate
        command_line    $USER1$/check_ssl_certificate -H $HOSTADDRESS$ -c 3 -w 7
       }

In /usr/local/nagios/etc/objects/example.org/sipx/sipx1.cfg, however, the service is defined without the check_nrpe prefix, so it executes from the Nagios server rather than on the sipx1.example.org host:

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SSL Certificate Expiration
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_ssl_certificate
        }

Sipxcom services

Below are additional examples for sipx1.example.org that pertain to sipXcom/sipx services. These would be defined in /usr/local/nagios/etc/objects/example.org/sipx/sipx1.cfg on our Nagios server:

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             NTP
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_ntp_time!0.5!1
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check SIP Registration
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sip_registration
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SSH
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_ssh!-p 22
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             TFTP
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_tftp
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             FTP
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_ftp!-H sipx1.example.org -p 21
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check sipx Web UI
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_ui
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check XMPP
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_jabber
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             TCP SIP SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_tcp_sip_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             UDP SIP SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_udp_sip_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             TCP SIPS SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_tcp_sips_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SIP TLS SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sip_tls_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SIP RR SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sip_rr_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SIP MWI SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sip_mwi_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             XMPP client SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_xmpp_client_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             XMPP server SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_xmpp_server_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             XMPP conference server SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_xmpp_conf_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             TCP Voicemail SRV
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_tcp_vm_srv
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check SIPXCONFIG
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sipxconfig
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check SIPXCDR
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_sipxcdr
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             Check MySQL homer.db
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_homer
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             MongoDB Connection Check
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_mongo_connect
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             MongoDB Long running ops
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_mongo_lag
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             MongoDB Operations Count
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_nrpe!check_mongo_ops
        }

define service{
        use                             generic-service
        host_name                       sipx1.example.org
        service_description             SSL Certificate Expiration
        contact_groups                  admins
        notifications_enabled           1
        check_command                   check_ssl_certificate
        }

Every command called through check_nrpe must be defined on sipx1.example.org in /usr/local/nagios/etc/nrpe.cfg, for example:

# system checks
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/vg_root-lv_root
command[check_boot]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/vda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 200 -c 250
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 80% -c 50%
command[check_memory]=/usr/local/nagios/libexec/check_memory.pl

# sipx service checks
command[check_sipxconfig]=/usr/local/nagios/libexec/check_postgres.pl -db SIPXCONFIG --action connection
command[check_sipxcdr]=/usr/local/nagios/libexec/check_postgres.pl -db SIPXCDR --action connection
command[check_ui]=/usr/local/nagios/libexec/check_http -w 5 -c 10 --ssl -H sipx1.example.org -u /sipxconfig/app
command[check_sip_registration]=/usr/local/nagios/libexec/check_registrations.sh
command[check_ntp_time]=/usr/local/nagios/libexec/check_ntp_time -H sipx1.example.org -w 0.5 -c 1
command[check_mongo_connect]=/usr/bin/python /usr/local/nagios/libexec/check_mongo -H sipx1.example.org -A connect
command[check_mongo_ops]=/usr/bin/python /usr/local/nagios/libexec/check_mongo -H sipx1.example.org -A count
command[check_mongo_lag]=/usr/bin/python /usr/local/nagios/libexec/check_mongo -H sipx1.example.org -A long

# dns checks
command[check_tcp_sip_srv]=/usr/local/nagios/libexec/check_dns -H _sip._tcp.example.org -s 127.0.0.1 -q SRV
command[check_udp_sip_srv]=/usr/local/nagios/libexec/check_dns -H _sip._udp.example.org -s 127.0.0.1 -q SRV
command[check_tcp_sips_srv]=/usr/local/nagios/libexec/check_dns -H _sips._tcp.example.org -s 127.0.0.1 -q SRV
command[check_sip_tls_srv]=/usr/local/nagios/libexec/check_dns -H _sip._tls.example.org -s 127.0.0.1 -q SRV
command[check_sip_mwi_srv]=/usr/local/nagios/libexec/check_dns -H _sip._tcp.mwi.example.org -s 127.0.0.1 -q SRV
command[check_sip_rr_srv]=/usr/local/nagios/libexec/check_dns -H _sip._tcp.rr.example.org -s 127.0.0.1 -q SRV
command[check_tcp_vm_srv]=/usr/local/nagios/libexec/check_dns -H _sip._tcp.vm.example.org -s 127.0.0.1 -q SRV
command[check_xmpp_server_srv]=/usr/local/nagios/libexec/check_dns -H _xmpp-server._tcp.example.org -s 127.0.0.1 -q SRV
command[check_xmpp_client_srv]=/usr/local/nagios/libexec/check_dns -H _xmpp-client._tcp.example.org -s 127.0.0.1 -q SRV
command[check_xmpp_conf_srv]=/usr/local/nagios/libexec/check_dns -H _xmpp-server._tcp.conference.example.org -s 127.0.0.1 -q SRV
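
Since each check_nrpe!name in a host file needs a matching command[name] line in that host's nrpe.cfg, a quick cross-check can catch typos and omissions before Nagios reports them as failed checks. This is a rough sketch, not a Nagios or NRPE tool; the function name missing_nrpe_commands is hypothetical, and it assumes you have a copy of the monitored host's nrpe.cfg alongside the host file.

```shell
#!/bin/sh
# missing_nrpe_commands HOST_CFG NRPE_CFG
# Print each NAME used as "check_nrpe!NAME" in HOST_CFG that has no
# uncommented "command[NAME]=" line in NRPE_CFG. Prints nothing if all exist.
missing_nrpe_commands() {
    # Extract the first argument after check_nrpe!, ignoring any further !args.
    sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([A-Za-z0-9_]*\).*/\1/p' "$1" |
        sort -u |
        while read -r name; do
            grep -q "^command\[$name]=" "$2" || echo "$name"
        done
}
```

Usage: missing_nrpe_commands sipx1.cfg nrpe.cfg. Separately, Nagios can validate its own object files with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg.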

Checks that execute on the Nagios server itself must be defined in /usr/local/nagios/etc/objects/commands.cfg on the Nagios server:

define command{
command_name check_tftp
command_line $USER1$/check_tftp --get $HOSTADDRESS$ 000000000000.cfg 7167
}

define command{
command_name check_jabber
command_line $USER1$/check_jabber -H $HOSTADDRESS$ --expect='xmlns="jabber:client" from="example.org"'
}

define command {
command_name check_ssl_certificate
command_line $USER1$/check_ssl_certificate -H $HOSTADDRESS$ -c 3 -w 7
}

3rd Party Checks

check_jabber is used for XMPP checks. check_mongo is used for MongoDB checks. check_postgres is used for PostgreSQL checks. For check_sip_registration I created a shell script that utilizes sipx-dbutil.
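
If you write your own check script, the contract with Nagios is small: print one status line (optionally followed by "|" and performance data) and exit 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The sketch below shows only that skeleton and is not the check_registrations.sh mentioned above. The function name check_min_count and the thresholds are illustrative assumptions, and how the registration count is obtained (for example via sipx-dbutil) is deliberately left out.

```shell
#!/bin/sh
# Exit codes defined by the Nagios plugin API.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_min_count COUNT WARN CRIT
# Alert when COUNT falls to or below a threshold (fewer registrations is worse).
check_min_count() {
    count=$1
    warn=$2
    crit=$3
    # Anything that is not a non-negative integer cannot be evaluated.
    case $count in
        '' | *[!0-9]*)
            echo "REGISTRATIONS UNKNOWN - could not read a count"
            return $STATE_UNKNOWN
            ;;
    esac
    if [ "$count" -le "$crit" ]; then
        echo "REGISTRATIONS CRITICAL - $count active | registrations=$count;$warn;$crit"
        return $STATE_CRITICAL
    elif [ "$count" -le "$warn" ]; then
        echo "REGISTRATIONS WARNING - $count active | registrations=$count;$warn;$crit"
        return $STATE_WARNING
    fi
    echo "REGISTRATIONS OK - $count active | registrations=$count;$warn;$crit"
    return $STATE_OK
}
```

In a real plugin the last lines would set count from your own query, call check_min_count "$count" 10 5, and then exit with its return code so NRPE passes the state back to Nagios.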

Additional Notes

  • You may find that some checks complain of a missing utils.pm. If so, check whether the script references the Nagios plugins directory. You may need to alter the path to /usr/local/nagios/libexec/.

  • Be sure to inspect any firewalls between your Nagios server and the sipXcom/sipx servers prior to running your checks. Some services such as ssh are restrictive by default in the sipXcom/sipx firewall.

  • It is possible to use sipsak to test against the SIP stack. Be aware, however, that by default the sipxcom SIP security feature will ban the source IP address of any client using the default sipsak User-Agent string.

  • Try not to cause unnecessary stress or bandwidth consumption on the server with your service checks. Once a day is probably often enough for some services, such as the SSL certificate check. Per service, the interval is set with the check_interval directive, which is counted in units of interval_length (60 seconds by default), so 1440 means once a day. See also the “External Command Check Interval” section at http://nagios.sourceforge.net/docs/3_0/configmain.html.

Graylog

Graylog is open source log management/aggregation software. A fully supported Commercial/Enterprise version also exists. For this example I am using the open source version on a Debian 10 server.

Installation on Debian 10

Starting from a fresh Debian 10 minimal installation:

apt-get update && apt-get upgrade -y
apt-get install apt-transport-https openjdk-11-jre-headless uuid-runtime pwgen dirmngr curl
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 4B7C549A058F8B6B
echo "deb http://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main" | tee /etc/apt/sources.list.d/mongodb-org-4.2.list
apt-get update && apt-get install mongodb-org -y
systemctl daemon-reload
systemctl enable mongod.service
systemctl restart mongod.service
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/oss-6.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-6.x.list
apt-get update && apt-get install elasticsearch-oss -y
echo "cluster.name: graylog" >> /etc/elasticsearch/elasticsearch.yml
echo "action.auto_create_index: false" >> /etc/elasticsearch/elasticsearch.yml
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl restart elasticsearch.service
wget https://packages.graylog2.org/repo/packages/graylog-3.1-repository_latest.deb
dpkg -i graylog-3.1-repository_latest.deb
apt-get update && apt-get install graylog-server -y

To use password as the admin password, edit /etc/graylog/server/server.conf and set the following. The root_password_sha2 value is the SHA-256 hash of the admin password, and password_secret should be a long random string of your own (the one below is only an example):

password_secret = naln41C22HRxw3hy9mJ8bipFWBo1aewKFgtXDXp22dNjNJNqEtid6uC0476zIfX5iQ3mZuRp9y7h3XcNY63inPo6vJy7FuLP
root_password_sha2 = 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
http_bind_address = 192.168.1.114:9000
http_publish_uri = http://192.168.1.114:9000

Then enable and start the service:

systemctl enable graylog-server.service
systemctl start graylog-server.service
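
The root_password_sha2 value shown above is the SHA-256 hash of the string password, so you will want to replace it with the hash of a password of your own. A minimal sketch for producing that hash with coreutils follows; the function name sha256_hex is just for illustration.

```shell
#!/bin/sh
# sha256_hex STRING
# Print the SHA-256 digest of STRING as lowercase hex.
# printf '%s' avoids hashing a trailing newline, which would change the digest.
sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Reproduces the example value used in server.conf above.
sha256_hex password
```

Running this prints 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8. Call sha256_hex with your real password and paste the result into server.conf. Keep in mind that a password typed as a command argument may end up in your shell history.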

The Graylog web UI should now be up at http://192.168.1.114:9000. Create a GELF UDP input using the default port 12201.
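
Before wiring up any log shippers, it can help to confirm the new input accepts messages by sending a single hand-built GELF datagram. The sketch below assumes the input takes uncompressed GELF 1.1 JSON over UDP, which needs only the version, host, and short_message fields. The function names are illustrative, and the payload builder assumes its arguments contain no double quotes or backslashes.

```shell
#!/bin/sh
# gelf_payload HOST MESSAGE
# Print a minimal GELF 1.1 JSON document (level 6 is syslog "informational").
gelf_payload() {
    printf '{"version":"1.1","host":"%s","short_message":"%s","level":6}' "$1" "$2"
}

# send_gelf SERVER PORT HOST MESSAGE
# Send one uncompressed GELF datagram. /dev/udp is a bash feature, so bash is
# invoked explicitly; UDP gives no delivery confirmation, so check the Graylog
# search page to see whether the message arrived.
send_gelf() {
    gelf_payload "$3" "$4" | bash -c 'cat > "/dev/udp/$0/$1"' "$1" "$2"
}
```

Usage: send_gelf 192.168.1.114 12201 sipx1.example.org "GELF input test", then search for the message in the Graylog web UI.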

Fluentd on Graylog server

Fluent Bit is an open source and multi-platform Log Processor and Forwarder which allows you to collect data/logs from different sources, unify and send them to multiple destinations. It’s fully compatible with Docker and Kubernetes environments. Fluent Bit is written in C and has a pluggable architecture supporting around 30 extensions.

For this example fluentd is running on the Graylog server. It is used to convert data received into the Graylog GELF format.

# fluentd on graylog
apt-get install sudo ntp ntpdate ntpstat ruby-gelf
curl -L https://toolbelt.treasuredata.com/sh/install-debian-buster-td-agent3.sh | sh
systemctl daemon-reload
systemctl enable td-agent
td-agent-gem install gelf
cd /etc/td-agent/plugin
wget https://raw.githubusercontent.com/emsearcy/fluent-plugin-gelf/master/lib/fluent/plugin/out_gelf.rb
cd ../

Append to /etc/td-agent/td-agent.conf:

<source>
    type syslog
    tag hostname_goes_here
</source>
<match **>
    type copy
    <store>
        type gelf
        host 0.0.0.0
        port 12201
        flush_interval 5s
    </store>
    <store>
        type stdout
    </store>
</match>

Restart the service and enable it at boot:

systemctl restart td-agent
systemctl enable td-agent

Fluentbit on the sipxcom server

For this example I am using fluentbit on the sipxcom server to ship logs to the fluentd instance of the Graylog server:

# fluentbit on sipx/uniteme centos7
cd /etc/yum.repos.d/
nano fluentbit.repo

Inside fluentbit.repo:

[fluentbit]
name = fluentbit
baseurl = http://packages.fluentbit.io/centos/7
gpgcheck=1
gpgkey=http://packages.fluentbit.io/fluentbit.key
enabled=1

Next update the packages:

yum update
yum install td-agent-bit -y
mv /etc/td-agent-bit/td-agent-bit.conf ~/td-agent-bit.conf.orig
nano /etc/td-agent-bit/td-agent-bit.conf

Inside td-agent-bit.conf:

[INPUT]
    Name cpu
    Tag  cpu.local
    Interval_Sec 1

[INPUT]
    Name mem
    Tag memory

[INPUT]
    Name disk
    Tag disk.local
    Interval_Sec 1

[INPUT]
    Name netif
    Tag netif.eth0
    Interval_Sec 1
    Interface eth0

[INPUT]
    Name health
    Tag health.proxy
    Host 192.168.2.14
    Port 5060
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name health
    Tag health.registrar
    Host 192.168.2.14
    Port 5070
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name health
    Tag health.bridge
    Host 192.168.2.14
    Port 5090
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name health
    Tag health.mongo
    Host 127.0.0.1
    Port 27017
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name health
    Tag health.pgsql
    Host 127.0.0.1
    Port 5432
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name health
    Tag health.dns
    Host 127.0.0.1
    Port 53
    Interval_Sec 60
    Alert true
    Add_Host true
    Add_Port true

[INPUT]
    Name tail
    Path /var/log/sipxpbx/proxy_stats.json
    Refresh_Interval 1
    Parser json

[OUTPUT]
    Name  forward
    Match *
    Host 192.168.1.114
    Port 24224

And finally restart the service:

service td-agent-bit restart

You should now see Graylog reporting activity on the GELF input.
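
If nothing shows up, a quick way to narrow things down is to test the same TCP ports the health inputs probe, plus the forward port on the Graylog server. This is a rough troubleshooting sketch, not part of Fluent Bit; the function names are made up, and it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.

```shell
#!/bin/sh
# tcp_probe HOST PORT
# Succeed if a TCP connection to HOST:PORT opens within 2 seconds.
tcp_probe() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report_ports HOST PORT...
# Print "HOST:PORT open" or "HOST:PORT closed" for each port given.
report_ports() {
    host=$1
    shift
    for port in "$@"; do
        if tcp_probe "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed"
        fi
    done
}
```

Usage from the sipxcom server: report_ports 127.0.0.1 27017 5432 53 for the local services, and report_ports 192.168.1.114 24224 to confirm the forward output can reach the Graylog server.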