Tuesday 19 February 2019

Troubleshooting disk addition in multipath configuration in Linux


Introduction
Let me start by saying that the issue in this scenario was actually the result of human error. I would still like to share the setup and the steps I followed, since they helped me understand the problem and eventually resolve it.

The scenario
We received a new 64 GB LUN from our storage team. It had to be scanned on the server, added to the multipath configuration and then used within LVM.

Step 1: Detect new disk
I executed the scsi-rescan -r and rescan-scsi-bus.sh commands so that the OS would detect the disk. I then used a script to map the disk device name to its WWID. The script was basically a wrapper around the scsi_id command. For example:

[ssuri@example:~] $ sudo scsi_id -g -u -d /dev/sdamd
36005076801810718a800000000002a3b
[ssuri@example:~] $
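
A wrapper of this kind can be very small. The following is only a sketch of the idea, not the script I actually used: the function name list_wwids and the SCSI_ID variable are made up for illustration, and it assumes scsi_id is on the PATH (on some systems it lives under /lib/udev or /usr/lib/udev instead).

```shell
#!/bin/sh
# Hypothetical wrapper: print "<device> <wwid>" for every device given.
# SCSI_ID may be set to the full path of scsi_id if it is not on the PATH.
SCSI_ID=${SCSI_ID:-scsi_id}

list_wwids() {
    for dev in "$@"; do
        # -g: treat the device as whitelisted
        # -u: replace whitespace in the ID with underscores
        # -d: the device node to query
        # Devices for which scsi_id fails are skipped silently.
        wwid=$("$SCSI_ID" -g -u -d "$dev") || continue
        printf '%s %s\n' "$dev" "$wwid"
    done
}
```

Run as root, it could be called as list_wwids /dev/sdamd, or with several device names at once to see which paths share a WWID.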


Step 2: Add disk to udev and multipath
The disk was intended to be used for the /grid file system on a DB node, so I added the following lines to the /etc/udev/rules.d/99-dell-oracle-asm.rules file:

KERNEL=="dm-*", PROGRAM="scsi_id --page=0x83 --whitelisted --device=/dev/%k", RESULT=="36005076801810718a800000000002c12", OWNER="root", GROUP="root"
ENV{DM_NAME}=="grid_pv1", OWNER="root", GROUP="root"

I wanted the disk alias to be grid_pv1, so I added the corresponding entries to the /etc/multipath.conf file. I first added a line for the disk WWID in the blacklist_exceptions stanza.

wwid "36005076801810718a800000000002c12"

I then added the following entry for the grid_pv1 disk to the multipaths stanza.

        multipath {
                uid 0
                gid 0
                wwid "36005076801810718a800000000002c12"
                alias grid_pv1
                mode 0600
        }
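
Because the same WWID has to be entered in more than one place, one way to avoid retyping it is to generate the entry from the scsi_id output. This is a minimal sketch under my own assumptions: the function name make_multipath_stanza is hypothetical, and the uid, gid and mode values are simply copied from the entry above.

```shell
#!/bin/sh
# Hypothetical helper: print a multipath entry for a given WWID and alias.
# The WWID is lowercased first so that it is written in the same case in
# which scsi_id reports it.
make_multipath_stanza() {
    wwid=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    alias_name=$2
    printf '        multipath {\n'
    printf '                uid 0\n'
    printf '                gid 0\n'
    printf '                wwid "%s"\n' "$wwid"
    printf '                alias %s\n' "$alias_name"
    printf '                mode 0600\n'
    printf '        }\n'
}
```

For example, make_multipath_stanza 36005076801810718a800000000002c12 grid_pv1 prints an entry that can be pasted into the multipaths stanza after review.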

I then reloaded the required services with the following commands:

/sbin/service multipathd reload
/sbin/udevadm control --reload-rules
/sbin/udevadm trigger --type=subsystems --action=change
/sbin/udevadm trigger --type=devices --action=change


Step 3: Check that disk is visible in multipath configuration
After doing all the above, to my astonishment I could not see the disk in the multipath -ll output. I frantically rescanned the disk, removed and re-added the udev and multipath configuration, and reloaded the services numerous times. What finally caught my eye was that the disk was showing up as blacklisted in the multipath -v3 output.

 [ssuri@example:/var/ ] $ sudo multipath -v3 | grep -i 36005076801810718A800000000002c12
 Feb 16 05:56:30 | sdbgs: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgt: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgu: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgv: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgw: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgx: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgy: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted
 Feb 16 05:56:30 | sdbgz: uid = 36005076801810718a800000000002c12 (callout)
 Feb 16 05:56:30 | (null): (36005076801810718a800000000002c12) wwid blacklisted

In our multipath configuration we have a blanket wwid '*' blacklist rule, so a disk is only usable if its WWID matches an entry in blacklist_exceptions. On seeing the above output I finally realized my mistake: the WWID I had typed into the udev and multipath configuration files had the letters 'a' and 'c' capitalized (the snippets in Step 2 show the corrected, lowercase form). It therefore did not match the lowercase WWID reported by the disk, and the disk was still treated as blacklisted.
Once I realized this, I corrected the entries and restarted the services. Sure enough, the disk was now available in the multipath -ll output.
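
A quick check for this kind of mistake is to look for WWID entries that contain uppercase hex digits. The sketch below assumes the entries are written one per line in the form wwid "...", as in the snippets above; the function name find_uppercase_wwids is hypothetical, and wildcard entries such as wwid "*" are not matched.

```shell
#!/bin/sh
# Hypothetical helper: list the wwid lines in a config file whose value
# contains an uppercase hex digit (A-F). Matching lines are printed with
# their line numbers. The exit status is grep's: non-zero when no such
# line is found.
find_uppercase_wwids() {
    # LC_ALL=C keeps the [A-F] range strictly uppercase in every locale.
    LC_ALL=C grep -nE \
        '^[[:space:]]*wwid[[:space:]]+"?[0-9A-Fa-f]*[A-F][0-9A-Fa-f]*"?[[:space:]]*$' \
        "$1"
}
```

Running find_uppercase_wwids /etc/multipath.conf before reloading the services would have pointed straight at the offending line in my case. It does not cover the RESULT== value in the udev rules file, which would need its own pattern.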

[ssuri@example:~] $ sudo multipath -ll grid_pv1
grid_pv1 (36005076801810718a800000000002c12) dm-384 IBM,2145
size=64G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 4:0:2:194 sdep  129:16   active ready running
  |- 6:0:0:194 sdaev 67:944   active ready running
  |- 5:0:0:194 sdpv  131:336  active ready running
  |- 7:0:3:194 sdazl 69:1328  active ready running
  |- 4:0:3:194 sdgl  132:16   active ready running
  |- 6:0:1:194 sdagr 70:944   active ready running
  |- 5:0:2:194 sdto  65:608   active ready running
  `- 7:0:7:194 sdbgz 65:1648  active ready running
[ssuri@example:~] $


Conclusion
Although this issue was the result of a typing mistake, I hope this helps you diagnose multipath-related issues that you might be facing.
