How To Use Monit for LEMP Stack Monitoring on Ubuntu 14.04

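Introduction

Monit is a free, open source service monitoring application that can perform various event-based actions. It can send email notifications, restart a service or application, or take other responsive actions.

This tutorial builds on a basic LEMP stack (Linux, Nginx, MySQL, PHP). Monit will be added to monitor all services in the stack and alert the root user of any adverse conditions.

An optional external monitoring server can also be used to monitor a web application or other services remotely.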

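Prerequisites

Before we get started, you will first need to set up an Ubuntu 14.04 Droplet
You will need a standard user account with sudo privileges
This tutorial adds Monit to an existing LEMP stack. For a tutorial on how to create the initial LEMP stack, see How To Install Linux, nginx, MySQL, PHP (LEMP) stack on Ubuntu 14.04
Optional: If you want to monitor a remote website, DNS, or mail server, you should have a publicly accessible domain or IP address set up for that server (more information in Step 6)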

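Step 1 - Configure Email Delivery for Monit Notifications

System monitoring usually involves email notifications for alerts, so outgoing email must be working before Monit can send them. A typical Monit alert email looks similar to this: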

From: monit@example.com 
To: root@yourserver.com

Resource limit matched Service example.com

        Date:        Mon, 22 Dec 2014 03:04:06
        Action:      alert
        Host:        example.com
        Description: cpu user usage of 79.8% matches resource limit [cpu user usage>70.0%]

Your faithful employee,
Monit

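This tutorial sets up Monit to send you an email each time an alert is triggered.

Note: By default, Monit's notifications may go to your spam folder. Reverse DNS (known as a PTR record) must be configured correctly to give the best chance of successful delivery. Your Droplet's hostname must match its fully qualified domain name (FQDN); for example, both could be hostname.example.com. To edit the PTR record of a DigitalOcean Droplet, go to the DigitalOcean Control Panel, navigate to Settings, and select the Rename tab. Enter the new hostname and click Rename.

This guide assumes you do not have a preexisting mail transfer agent (MTA), so we will install Postfix. A local Postfix installation lets the system send notification emails to an external mail provider such as Gmail or Yahoo.

To begin installing Postfix as your MTA, first update the system's repository source list.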

sudo apt-get update

Then install the Postfix and GNU Mailutils packages from Ubuntu's repositories.

sudo apt-get install postfix mailutils

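Near the end of the installation, you will be prompted to select a server configuration type. Choose Internet Site.

When prompted for the System mail name, use your Droplet's fully qualified domain name (FQDN). Note: The system mail name can also be changed later in /etc/mailname.

Next, open the file /etc/aliases for editing. This guide uses Nano, but you can use whichever text editor you prefer.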

sudo nano /etc/aliases

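Here we will add a personal email address where we will receive Monit's notification emails. Monit will send its alerts to the root user on our LEMP server, and this alias forwards root's mail to that address.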

postmaster: root
root: myemail@gmail.com

If needed, multiple destinations can also be added:

root: username, itstaff@mycompany.com, otherperson@other.com

Save your changes and exit Nano. Then run the following command to update the aliases database:

sudo newaliases

A test message can be sent from your Droplet to check mail delivery. If the test message does not appear in your inbox at first, check your spam folder.

echo test | mail -s "test message from my VPS" root
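
If the test message never arrives, Postfix's log usually shows what happened to it. The helper below is a small sketch that tallies the delivery statuses (for example status=sent, status=deferred, or status=bounced) recorded in a Postfix log. The function name is made up for this example, and the default path assumes Ubuntu's standard /var/log/mail.log.

```shell
# Summarize Postfix delivery results found in a mail log.
# mail_status_summary is a hypothetical helper name, not a Postfix tool.
# $1 is optional; it defaults to the usual Ubuntu mail log location.
mail_status_summary() {
    log="${1:-/var/log/mail.log}"
    # Pull out every "status=<word>" token and count each distinct value.
    grep -o 'status=[a-z]*' "$log" | sort | uniq -c
}
```

Running mail_status_summary with no arguments reads the default log. Reading that file may require root privileges or membership in the adm group.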

Step 2 - Install and Configure Monit

Monit is also available in the Ubuntu package repositories. For a brief reference guide to Monit, see this tutorial.

Install Monit on your LEMP server:

sudo apt-get install monit

On Ubuntu 14.04, the Monit configuration files are located in /etc/monit/, and the main Monit configuration file is /etc/monit/monitrc.

Open monitrc for editing in Nano:

sudo nano /etc/monit/monitrc

Uncomment the following lines and change them to match what is shown below:

set mailserver localhost    #Use localhost for email alert delivery.

set mail-format {
      from: monit@$HOST
   subject: monit alert --  $EVENT $SERVICE
   message: $EVENT Service $SERVICE
                 Date:        $DATE
                 Action:      $ACTION
                 Host:        $HOST
                 Description: $DESCRIPTION

            Your faithful employee,
            Monit
}

set alert root@localhost not on { instance, action }   #Set email address to receive alerts. This guide uses root mail.

Still in the monitrc file, now uncomment the following lines and change example.com to match your server's domain or IP address.

check system example.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

We will also add this entry at the end of the file:

check filesystem rootfs with path /  #Alert if low on disk space.
    if space usage > 90% then alert

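Save your changes and exit Nano.

Step 3 - Configure Service Monitoring for LEMP Services in Monit

On Ubuntu 14.04, Monit configuration can be specified directly in the /etc/monit/monitrc file or in individual files under /etc/monit/conf.d/. In this tutorial, individual files will be created in the /etc/monit/conf.d/ directory.

First, we will give Monit a way to manage the services. For simplicity in this tutorial, we will place all process monitoring in a single file at /etc/monit/conf.d/lemp-services. With the following entries, Monit will watch Nginx, MySQL, and PHP-FPM and restart these services if they stop abnormally for any reason.

Create the working file with Nano: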

sudo nano /etc/monit/conf.d/lemp-services

Add the following entries for the services in our LEMP stack:

check process nginx with pidfile /var/run/nginx.pid
    group www-data
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
    
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
        
check process php5-fpm with pidfile /var/run/php5-fpm.pid
    start program = "/etc/init.d/php5-fpm start"
    stop program = "/etc/init.d/php5-fpm stop"

Then save your changes.
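
Monit can only track a process if the pidfile path in its check matches the file the service actually writes. The paths above are the ones used in this guide. The following helper, a hypothetical function written for this example, is one way to confirm that a pidfile exists and refers to a running process. It relies on the Linux /proc filesystem.

```shell
# Report whether a pidfile exists and names a live process.
# check_pidfile is an illustrative helper, not part of Monit.
check_pidfile() {
    pidfile="$1"
    if [ ! -r "$pidfile" ]; then
        echo "missing or unreadable: $pidfile"
        return 1
    fi
    pid="$(cat "$pidfile")"
    # On Linux, every running process has a directory under /proc.
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo "ok: $pidfile (pid $pid)"
    else
        echo "stale: $pidfile (pid $pid)"
        return 1
    fi
}
```

For example, check_pidfile /var/run/nginx.pid prints an "ok" line when Nginx is running. Some pidfiles may only be readable by root, so the function may need to be run from a root shell.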

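Step 4 - Add Actions to Restart Unhealthy LEMP Services

Now that Monit can manage the selected services, actions can be added to restart them as needed. For example, Monit can monitor TCP connections. If the server is no longer serving HTTP connections, Monit can restart PHP-FPM or Nginx to resolve the issue automatically.

To build on our existing configuration, we will now edit /etc/monit/conf.d/lemp-services further. The additions are the two if lines at the end of each entry below. They tell Monit to restart Nginx and PHP-FPM if HTTP connections are no longer available, to restart MySQL if its socket is unavailable, and to time out a service after 5 restarts within 5 cycles.

Note: Be sure to use your Droplet's domain or IP address wherever you see example.com in the first and third entries.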

check process nginx with pidfile /var/run/nginx.pid
    group www-data
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
    if failed host example.com port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout
    
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed unixsocket /var/run/mysqld/mysqld.sock then restart
    if 5 restarts within 5 cycles then timeout
    
check process php5-fpm with pidfile /var/run/php5-fpm.pid
    start program = "/etc/init.d/php5-fpm start"
    stop program = "/etc/init.d/php5-fpm stop"
    if failed host example.com port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout

Save your changes and close Nano. Then restart Monit to apply the configuration changes you have made so far.

sudo service monit restart
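
A syntax error in the control file or in a file under conf.d can keep Monit from starting, so it can help to validate the configuration before each restart. Monit's -t option runs a syntax check of the control file. The wrapper below is a sketch: the function name and the MONIT_CMD and RESTART_CMD variables are assumptions made for this example so that the commands can be swapped out for a dry run.

```shell
# Restart Monit only if the configuration passes a syntax check.
# MONIT_CMD and RESTART_CMD are hypothetical override variables; by
# default they run the same commands used elsewhere in this guide.
MONIT_CMD="${MONIT_CMD:-sudo monit}"
RESTART_CMD="${RESTART_CMD:-sudo service monit restart}"

safe_monit_restart() {
    # "monit -t" checks the control file syntax and exits non-zero on errors.
    if $MONIT_CMD -t; then
        $RESTART_CMD
    else
        echo "Monit configuration has errors; not restarting." >&2
        return 1
    fi
}
```

Calling safe_monit_restart after editing a configuration file then takes the place of the plain restart command.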

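Step 5 (Optional) - Monitor Logs for Errors and Keywords

Monit can also monitor logs for specific keywords and then perform an action or send an alert. This is helpful when a web application is having problems or when a team needs to be notified of a specific traceback or event in a log.

Below is an example Nginx log entry containing a timeout error that Monit can watch for and alert on: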

2014/12/22 11:03:54 [error] 21913#0: *202571 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 2600:3c01::f03c:91ff:fe6e:5a91, server: example.com, request: "GET /wp-admin/admin-ajax.php?action=wordfence_doScan&isFork=1&cronKey=40cb51ccsdfsf322fs35 HTTP/1.0", upstream: "fastcgi://unix:/var/run/example.com.sock", host: "example.com"

Building on our existing configuration, open your LEMP services configuration file in Nano again.

sudo nano /etc/monit/conf.d/lemp-services

Add the following entry. It sends a notification whenever a timeout occurs in Nginx's communication with PHP-FPM.

check file nginx-error with path /var/log/nginx/error.log
    if match "timed out" then alert
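
Monit's match test applies the pattern as a regular expression to each new log line, so it can help to try a pattern against a sample line first. The sketch below uses grep -E as a rough stand-in for Monit's matcher (an approximation, not Monit itself) and a shortened copy of the log entry shown above. The variable and function names are illustrative.

```shell
# Shortened copy of the sample Nginx error line from this step.
sample='2014/12/22 11:03:54 [error] 21913#0: *202571 upstream timed out (110: Connection timed out) while reading response header from upstream'

# Returns 0 when the extended regex in $1 matches the sample line.
pattern_matches() {
    printf '%s\n' "$sample" | grep -Eq -- "$1"
}

# An unanchored pattern finds the text anywhere in the line.
if pattern_matches 'timed out'; then unanchored=yes; else unanchored=no; fi

# A pattern anchored with ^ only matches at the start of the line,
# which here is the timestamp, so it does not match.
if pattern_matches '^timed out'; then anchored=yes; else anchored=no; fi

echo "unanchored: $unanchored, anchored: $anchored"
```

Because Nginx error lines begin with a timestamp, a pattern anchored to the start of the line would not fire on entries like this one.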

Save your changes and close Nano. Then restart Monit for the changes to take effect:

sudo service monit restart

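Step 6 (Optional) - Use Monit to Monitor Remote Websites and Other Services

In addition to local use, Monit can monitor a variety of external services and connections. In this example, we will use the local Monit instance we have already set up and add some new monitoring configurations for external services.

For out-of-band monitoring, it is best to run an external Monit system in an entirely different datacenter. If the web application is located in New York, a small external Monit server in San Francisco would be ideal.

Below are examples of external Monit checks that can be implemented on a second host running Monit. These examples would be placed in the external server's /etc/monit/conf.d/lemp-external file to check our LEMP stack remotely at remote-example.com.

Create this configuration file with Nano: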

sudo nano /etc/monit/conf.d/lemp-external

Monitor ICMP responses and HTTP and HTTPS connections:

# ICMP check
check host remote-example.com with address remote-example.com
    if failed icmp type echo
        for 5 times within 5 cycles
        then alert

# HTTP check
    if failed 
          port 80 protocol http 
       for 5 times within 5 cycles
       then alert        

# HTTPS check
    if failed 
          port 443 type tcpSSL protocol http 
       for 5 times within 5 cycles
       then alert

Monitor DNS:

check host ns1.example.com with address ns1.example.com
    if failed port 53 type udp protocol dns then alert

Monitor SMTP:

check host smtp.example.com with address smtp.example.com
    if failed port 25 type tcp protocol smtp then alert

Monitor a web application's healthcheck URL

For web applications, Monit can also make a specific request to a healthcheck URL. Below is an example for the site remote-example.com, whose healthcheck URL is https://remote-example.com/healthcheck.

check host remote-example.com with address remote-example.com
    if failed 
          port 443 type tcpSSL protocol http 
       request "/healthcheck"     
       for 5 times within 5 cycles
       then alert
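
Before relying on the check, it can be useful to confirm by hand that the healthcheck URL responds as expected. The function below is a minimal sketch using curl. Its name is invented for this example, and treating only HTTP 200 as healthy is an assumption of the sketch rather than a description of how Monit evaluates the response.

```shell
# Return 0 if the URL in $1 answers with HTTP 200 within 10 seconds.
# check_health is an illustrative helper, not a Monit command.
check_health() {
    url="$1"
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")"
    [ "$code" = "200" ]
}
```

For the example site, this could be run as: check_health https://remote-example.com/healthcheck && echo healthy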

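Step 7 - Manage Monit from the Command Line

Monit also provides a command-line utility. From there, simple commands can be used to check the overall monitoring status and perform useful tasks such as temporarily starting or stopping monitoring.

To run Monit status checks from the command line, Monit's web service must be enabled. To do so, open /etc/monit/monitrc for editing in Nano.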

sudo nano /etc/monit/monitrc

Uncomment the following lines to enable the web service locally:

set httpd port 2812 and
        use address localhost
        allow localhost

Save your changes and exit Nano. Then restart Monit:

sudo service monit restart

Monit's status can now be checked from the command line.

Here are the commands to temporarily disable and re-enable monitoring:

sudo monit unmonitor all

sudo monit monitor all

Step 8 - View Reports

Let's look at a report of all the checks we have been setting up.

sudo monit status

You will now see output for everything Monit is configured to check, including the local LEMP services and any external checks:

sudo monit status
The Monit daemon 5.6 uptime: 0m 

System 'example.com'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.00] [0.01] [0.05]
  cpu                               0.5%us 0.4%sy 0.0%wa
  memory usage                      115132 kB [22.9%]
  swap usage                        0 kB [0.0%]
  data collected                    Mon, 22 Dec 2014 16:50:42

Filesystem 'rootfs'
  status                            Accessible
  monitoring status                 Monitored
  permission                        755
  uid                               0
  gid                               0
  filesystem flags                  0x1000
  block size                        4096 B
  blocks total                      5127839 [20030.6 MB]
  blocks free for non superuser     4315564 [16857.7 MB] [84.2%]
  blocks free total                 4581803 [17897.7 MB] [89.4%]
  inodes total                      1310720
  inodes free                       1184340 [90.4%]
  data collected                    Mon, 22 Dec 2014 16:50:42

Process 'nginx'
  status                            Running
  monitoring status                 Monitored
  pid                               14373
  parent pid                        1
  uptime                            28m 
  children                          4
  memory kilobytes                  1364
  memory kilobytes total            9228
  memory percent                    0.2%
  memory percent total              1.8%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.018s to example.com:80 [HTTP via TCP]
  data collected                    Mon, 22 Dec 2014 16:50:42

Process 'mysql'
  status                            Running
  monitoring status                 Monitored
  pid                               12882
  parent pid                        1
  uptime                            32m 
  children                          0
  memory kilobytes                  44464
  memory kilobytes total            44464
  memory percent                    8.8%
  memory percent total              8.8%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  unix socket response time         0.000s to /var/run/mysqld/mysqld.sock [DEFAULT]
  data collected                    Mon, 22 Dec 2014 16:50:42

Process 'php5-fpm'
  status                            Running
  monitoring status                 Monitored
  pid                               17033
  parent pid                        1
  uptime                            0m 
  children                          2
  memory kilobytes                  13836
  memory kilobytes total            22772
  memory percent                    2.7%
  memory percent total              4.5%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.018s to example.com:80 [HTTP via TCP]
  data collected                    Mon, 22 Dec 2014 16:50:42

File 'nginx-error'
  status                            Accessible
  monitoring status                 Monitored
  permission                        644
  uid                               0
  gid                               0
  timestamp                         Mon, 22 Dec 2014 16:18:21
  size                              0 B
  data collected                    Mon, 22 Dec 2014 16:50:42

Remote Host 'example.com'
  status                            Online with all services
  monitoring status                 Monitored
  icmp response time                0.021s [Echo Request]
  port response time                0.107s to example.com:443 [HTTP via TCPSSL]
  port response time                0.062s to example.com:80 [HTTP via TCP]
  data collected                    Mon, 22 Dec 2014 16:50:42

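Use this data to check the health of your services and view useful statistics.

Troubleshooting

If any problems occur, first check Monit's log at /var/log/monit.log. This will give you more information about the nature of the problem.

Example error log entries: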

[UTC Dec 22 13:59:54] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 14:10:16] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 15:24:19] error    : 'example.com' failed protocol test [HTTP] at INET[example.com:80] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[UTC Dec 22 15:57:15] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 17:00:57] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 17:49:00] error    : 'example.com' failed, cannot open a connection to INET[example.com:443/API] via TCPSSL
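
On a busy server the log can be long, so a quick count of error-level entries gives a first impression before reading it in detail. The function below is a small sketch that assumes the log line layout shown above (a bracketed timestamp followed by the level). The function name is hypothetical.

```shell
# Count error-level entries in a Monit log file.
# count_monit_errors is an illustrative helper; $1 is optional and
# defaults to the log path used in this guide.
count_monit_errors() {
    log="${1:-/var/log/monit.log}"
    # Match lines of the form "[<timestamp>] error ...".
    grep -c '^\[[^]]*] error ' "$log"
}
```

Running count_monit_errors prints a single number. Piping the same grep pattern without -c into tail shows the most recent error entries instead.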

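Conclusion

After completing this guide, you should now have Monit configured to monitor a LEMP stack on Ubuntu 14.04. Monit is highly extensible and can be customized or expanded to monitor a wide range of services on both small and large networks.

Here are some additional links for Monit: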