Sunday, May 25, 2008

Installing LSI Logic RAID monitoring tools under the ESX service console

As I discussed in a recent post, I used a Dell PERC 5/i SAS controller in my ESX whitebox server. One of the nice features of this controller is that it is a rebranded LSI Logic controller (with a different board layout!), so it is supported by LSI Logic firmware and by the excellent monitoring tools that LSI offers.

Of course, it is important to keep track of your RAID array status, so I decided to install the MegaCLI monitoring software under the ESX Server 3.5 Service Console. Here's how I installed it and configured the monitoring on my system:
  • The MegaCLI software can be downloaded from the LSI Logic website. I used version 1.01.39 for Linux, which comes as an RPM file.

  • After uploading the RPM file to the service console, it was a matter of installing it using the "rpm" command:

    rpm -i -v MegaCli-1.01.39-0.i386.rpm

    This installs the "MegaCli" and "MegaCli64" commands in the /opt/MegaRAID/MegaCli/ directory of the service console.
That's it: MegaCLI is now ready to use. Some useful commands are:
  • /opt/MegaRAID/MegaCli/MegaCli -AdpAllInfo -aALL
    This lists the adapter information for all LSI Logic adapters found in your system.

  • /opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL
    This lists the logical drives for all LSI Logic adapters found in your system. The "State" should read "Optimal" for a fully operational array.

  • /opt/MegaRAID/MegaCli/MegaCli -PDList -aALL
    This lists all the physical drives for the adapters in your system; the "Firmware state" indicates whether the drive is online or not.
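The "State" check on the logical drives can be scripted as well. The following is a minimal sketch, not part of my own setup: it assumes that MegaCli -LDInfo prints one line starting with "State" per logical drive, such as "State: Optimal" (the padding around the colon may differ between MegaCli versions), and the function name check_ld_states is made up for illustration. It prints every non-optimal state and returns a non-zero exit status, also when no State line is found at all, so it fits naturally into an if statement.

```shell
#!/bin/sh
# Hypothetical helper: reads "MegaCli -LDInfo -LALL -aALL" output on stdin,
# prints each logical drive whose State is not "Optimal" and exits with 1 if
# any drive is not optimal or if no State line was found at all.
# Assumption: each logical drive is reported on a line starting with "State".
check_ld_states() {
    awk -F: '
        /^State/ {
            total += 1
            state = $2
            gsub(/^[ \t]+|[ \t]+$/, "", state)   # trim padding around the value
            if (state != "Optimal") {
                bad += 1
                printf("Logical drive #%d (in output order) is %s\n", total, state)
            }
        }
        END { if (bad > 0 || total == 0) exit 1; exit 0 }
    '
}

# Example on the service console:
#   /opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL | check_ld_states || echo "array not optimal"
```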
The next step is to automate the analysis of the drive status and to alert when things go bad. To do this, I added an hourly cron job that lists the physical drives and then analyzes the output of the MegaCLI command.
  • I created a file called "analysis.awk" in the /opt/MegaRAID/MegaCli directory with the following contents:

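    # This is a little AWK program that interprets MegaCLI output

    # "Device Id: N" starts a new drive record; remember the drive ID
    /Device Id/ { counter += 1; device[counter] = $3 }
    # "Firmware state: Online" holds the drive state
    /Firmware state/ { state_drive[counter] = $3 }
    # "Inquiry Data: ..." holds the drive's vendor and model information
    /Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
    # After the last line, print one status line per drive (with HTML line
    # breaks, since the result ends up in an e-mail)
    END {
    for (i=1; i<=counter; i+=1) printf ( "Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i]); }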

    You can test the program against the live MegaCli output by running the following command from the /opt/MegaRAID/MegaCli directory:

    ./MegaCli -PDList -aALL | awk -f analysis.awk

  • Then I created the cron job by placing a file called raidstatus in /etc/cron.hourly, with the following contents:

    #!/bin/sh

    /opt/MegaRAID/MegaCli/MegaCli -PDList -aALL | awk -f /opt/MegaRAID/MegaCli/analysis.awk > /tmp/megarc.raidstatus

    # Alert if any drive line does not report "Online"
    if grep -qv "status is: Online" /tmp/megarc.raidstatus
    then
    /usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "Warning: RAID status no longer optimal" -f esx@pretnet.local -m "`cat /tmp/megarc.raidstatus`" -r exchange.pretnet.local
    fi

    rm -f /tmp/megarc.raidstatus
    exit 0

    Don't forget to run chmod a+x /etc/cron.hourly/raidstatus to make the file executable; cron only runs executable files from /etc/cron.hourly.
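To check that analysis.awk parses the output as intended without waiting for a disk to fail, you can feed it a hand-written excerpt of MegaCli -PDList output. This is a sketch under assumptions: the sample only mimics the three kinds of lines the program looks at, the drive models and serial numbers are invented, and the awk rules are inlined (the same rules as analysis.awk) so the snippet runs anywhere with a POSIX shell. The function names sample_pdlist and analyse are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample of "MegaCli -PDList -aALL" output; models and serials
# are made up, only the field layout matters to the awk rules.
sample_pdlist() {
cat <<'EOF'
Adapter #0

Device Id: 0
Firmware state: Online
Inquiry Data: SEAGATE ST3250620NS 3BKH 5QE0AAAA

Device Id: 1
Firmware state: Failed
Inquiry Data: SEAGATE ST3250620NS 3BKH 5QE0BBBB
EOF
}

# Same rules as analysis.awk, inlined so this sketch is self-contained.
analyse() {
    awk '
        /Device Id/      { counter += 1; device[counter] = $3 }
        /Firmware state/ { state_drive[counter] = $3 }
        /Inquiry/        { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
        END {
            for (i = 1; i <= counter; i += 1)
                printf("Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i])
        }
    '
}

sample_pdlist | analyse
```

The second drive should show up with status "Failed", which is exactly the kind of line the cron job's grep check reacts to.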
In order to send an e-mail when things go wrong, I used the SMTP_Send Perl script smtp_send.pl that was discussed by Duncan Epping on his blog.
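One gap in the cron script is that an empty report (for example when MegaCli fails to run) produces no e-mail at all, because grep -v finds no non-matching line in an empty file. A possible refinement, which is my suggestion and not part of the setup described above, is to treat an empty or missing report as a problem too. The function name raid_needs_alert is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when an alert should be sent, i.e.
# when the report produced by analysis.awk is empty or missing, or contains
# any drive line that does not report "status is: Online".
raid_needs_alert() {
    report=$1
    # An empty or missing report usually means MegaCli did not run correctly.
    [ -s "$report" ] || return 0
    grep -qv "status is: Online" "$report"
}

# In /etc/cron.hourly/raidstatus the check would then become:
#   if raid_needs_alert /tmp/megarc.raidstatus
#   then
#       /usr/local/bin/smtp_send.pl (same arguments as before)
#   fi
```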

Thursday, May 22, 2008

Renaming a VirtualCenter 2.5 server

After running my VirtualCenter server on a standalone host for quite some time, I decided to join it to the domain that I am running on my ESX box (so that it takes part in the automated WSUS patching mechanism). This also seemed like a perfect opportunity to rename the server from W2K3-VC.pretnet.local to virtualcenter.pretnet.local. However, after the hostname change, the VMware VirtualCenter service would no longer start, logging Event ID 1000 in the event log.

Somehow, this didn't come as a surprise ;). This has been discussed before on the VMware forums, but I am posting it here because I did not immediately find a step-by-step walkthrough.

The problem was in fact twofold; the solution was rather simple:
  • Renaming SQL servers is a bad idea in general (so it appears). For my small, nonproduction environment, I use the SQL Server 2005 Express Edition that comes with the VirtualCenter installation. If you rename a machine that runs SQL Server, you need to update the server name stored in its system tables using a pair of stored procedures to make everything consistent again. I did this by installing "SQL Server Management Studio Express" and executing the following T-SQL statements:

    sp_dropserver 'W2K3-VC\SQLEXP_VIM'
    GO
    sp_addserver 'VIRTUALCENTER\SQLEXP_VIM', local
    GO
    sp_helpserver
    SELECT @@SERVERNAME, SERVERPROPERTY('ServerName')


    The first statement removes the old server instance (replace W2K3-VC with your old server name); the second statement adds the new server instance (replace VIRTUALCENTER with your new server name). The sp_helpserver and SELECT statements query the server names that SQL Server currently recognizes. You need to restart SQL Server (I simply rebooted) before the last two statements return the new instance name.

  • Secondly, the system ODBC data source (DSN) used by VirtualCenter had to be updated to point to the new SQL Server instance. I did this with the familiar "Data Sources (ODBC)" management console.
Afterwards, the VMware VirtualCenter Server service started just fine again.

Friday, May 2, 2008

Enabling Subject Alternative Name certificates

When requesting certificates from your freshly installed Certification Authority, it can come in handy to specify multiple DNS names for which the certificate should be valid. These additional names, under which the server is also reachable, are known as "subject alternative names" (SANs).

Unfortunately, this mechanism doesn't work out of the box with Windows CAs. On your CA, you first need to enable a policy setting that allows SAN attributes in certificate requests, and then restart Certificate Services. Open a command prompt and type each of the following commands on a single line:

certutil -setreg policy\EditFlags +EDITF_ATTRIBUTESUBJECTALTNAME2

net stop CertSvc & net start CertSvc

Afterwards, add a SAN:dns=...&dns=... attribute when requesting certificates (for example SAN:dns=virtualcenter.pretnet.local&dns=virtualcenter) to include multiple DNS names.
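If you request certificates with many names, typing the attribute by hand is error-prone. The sketch below builds the SAN:dns=...&dns=... string from a list of host names; the function name san_attribute is made up, and the names in the example are simply the ones from my lab. When you pass the result on a command line, for instance to certreq's -attrib option, keep it in double quotes: as the net stop/net start line above shows, cmd.exe treats an unquoted & as a command separator.

```shell
#!/bin/sh
# Hypothetical helper: joins DNS names into a Windows CA request attribute
# of the form SAN:dns=name1&dns=name2&...
san_attribute() {
    attr="SAN:"
    sep=""
    for name in "$@"; do
        attr="${attr}${sep}dns=${name}"
        sep="&"
    done
    printf '%s\n' "$attr"
}

san_attribute virtualcenter.pretnet.local virtualcenter
# prints: SAN:dns=virtualcenter.pretnet.local&dns=virtualcenter
```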