A system engineer's notes: February 2017

Sunday, 26 February 2017

Script to calculate storage space allocated to a Solaris zone

In this article I'd like to share a quick script to calculate the total storage (raw disk devices plus LOFS mounts) allocated to a Solaris zone.

The script needs to be run from the global zone and its usage is:

./zone_storage_cal.sh <zone name>

Here is the script:

[ssuri@localhost:~] $ cat zone_storage_cal.sh
#!/usr/bin/bash

ZONE=$1

DISK_LIST=$(cat /etc/zones/${ZONE}.xml | grep "device match" | awk -F/ '{print $4}')

FORMAT_OUT="/export/home/`whoami`/format_out"

>$FORMAT_OUT

echo | sudo format >> $FORMAT_OUT

DISK_OUT="/export/home/`whoami`/disk_out"
>$DISK_OUT

for DISK_NAME in ${DISK_LIST}

do

DISK=$(cat $FORMAT_OUT | grep -w ${DISK_NAME%s0\"} | awk '{print $2}')
SIZE=$(cat $FORMAT_OUT | grep -w ${DISK_NAME%s0\"} | awk '{print $3}' | tr -d "<>" | awk -F- '{print $4}' | tr -d "GB")

echo "${DISK} ${SIZE}" >> $DISK_OUT

done

RAW_DISK=$(cat $DISK_OUT | awk '{sum+=$2} END {print sum}')

echo "total RAW disk device size is ${RAW_DISK} GB"

##LOFS mount size calculation##

LOFS_LIST=$(cat /etc/zones/${ZONE}.xml | grep lofs | egrep -v "/var|/usr" | awk '{print $2}' | awk -F= '{print $2}' | tr -d \")

TOTAL_LOFS_SIZE=0

for LOFS in ${LOFS_LIST}
do

SIZE=$(df -k | cat | egrep "${LOFS}" | awk '{print $2}')

let "TOTAL_LOFS_SIZE+=$SIZE"

done

TOTAL_LOFS_SIZE_GB=$( echo "scale=2; ${TOTAL_LOFS_SIZE}/1024/1024" | bc -l)

if [ ! -z "${TOTAL_LOFS_SIZE_GB}" ]
then
echo "Total size of LOFS mounts is ${TOTAL_LOFS_SIZE_GB} GB"
fi

##end LOFS calculation##

##print total size##

if [ ! -z "${TOTAL_LOFS_SIZE_GB}" ] && [ ! -z "${RAW_DISK}" ]
then
TOTAL_SIZE=$(echo "scale=2; ${TOTAL_LOFS_SIZE_GB} + ${RAW_DISK}" | bc -l)
echo "RAW disk and LOFS size combined is ${TOTAL_SIZE} GB"
fi

The script uses the zone's XML file to get the details of any raw disk devices exported to the zone and also gathers information about the loopback file systems mounted in the zone.
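To make the parsing step concrete, here is a minimal sketch that runs the same awk and tr filters over two sample lines. The sample lines, including the device and path names, are assumptions for illustration, modelled on how device and lofs entries usually look in a zone's XML file. Your file may differ.

```shell
# Hypothetical sample lines (assumed, not taken from a real zone)
DEVICE_LINE='<device match="/dev/rdsk/c1t0d0s0"/>'
LOFS_LINE='<filesystem special="/export/data" directory="/data" type="lofs"/>'

# Splitting on "/" makes field 4 the slice name plus a trailing double quote
DISK_NAME=$(echo "$DEVICE_LINE" | grep "device match" | awk -F/ '{print $4}')

# Strip the trailing s0" to get the bare disk name that format reports
DISK=${DISK_NAME%s0\"}

# The second whitespace separated field holds special="..."; keep only the value
LOFS=$(echo "$LOFS_LINE" | grep lofs | awk '{print $2}' | awk -F= '{print $2}' | tr -d \")

echo "disk: ${DISK}"
echo "lofs source: ${LOFS}"
```

Running the filters on one line at a time like this is a quick way to confirm the field numbers before pointing the script at a real zone.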

The addition part is done using let. I didn't have to use bc there because I opted not to use floating point data for the initial calculations.

In the final calculations I've used bc to provide a more accurate final result.
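The split between integer and floating point arithmetic can be sketched as follows. The sizes are made-up sample values, and the awk fallback is my own assumption for hosts where bc is not installed; it is not part of the script above.

```shell
# Sizes in KB as df -k would report them (made-up sample values)
SIZES_KB="1048576 2097152"

TOTAL_KB=0
for SIZE in ${SIZES_KB}
do
    # let does integer arithmetic only, which is enough for whole KB values
    let "TOTAL_KB+=SIZE"
done

# Convert KB to GB with two decimal places
if command -v bc >/dev/null 2>&1
then
    TOTAL_GB=$(echo "scale=2; ${TOTAL_KB}/1024/1024" | bc -l)
else
    # Assumed fallback when bc is missing (rounds instead of truncating)
    TOTAL_GB=$(awk -v kb="${TOTAL_KB}" 'BEGIN {printf "%.2f", kb/1024/1024}')
fi

echo "total: ${TOTAL_KB} KB = ${TOTAL_GB} GB"
```

Keeping the running total in KB and converting only once at the end avoids accumulating rounding errors across the loop.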

Fixing errors while starting networker service in Linux

I received a concern from backup admins that they were unable to start EMC networker service on a Linux VM.

I tried to start the service with the command below. The command ran, but there was no output and the service did not start.

/etc/init.d/networker start

I then checked the log file daemon.raw under the path /nsr/logs and found the below error logs:

nsrexecd RAP critical 162 Attributes '%s' and/or '%s' of the %s resource do not resolve to the machine's hostname '%s'. To correct the error, it may be necessary to delete the %s database
138906 1487648740 5 3 20 3141699360 60636 0 net-client.emrsn.org nsrexecd RAP critical 162 Attributes '%s' and/or '%s' of the %s resource do not resolve to the machine's hostname '%s'. To correct the error, it may be necessary to delete the %s database. 5 11 9 2335:name 11 17 51954:my hostname 11 11 89897:NSRLA 12 25 net-client.emrsn.org 11 11 89897:NSRLA
6919 1487648740 5 1 2 2885678848 60636 0 net-client.emrsn.org nsrexecd SYSTEM critical 52 Unable to register %ld version %ld on tcp. Aborting. 2 2 6 390436 2 1 1
90307 1487648740 5 5 0 3141699360 60636 0 net-client.emrsn.org nsrexecd NSR critical 43 Unable to start the authentication service. 0
138906 1487648790 5 3 20 2653454400 60673 0 net-client.emrsn.org nsrexecd RAP critical 162 Attributes '%s' and/or '%s' of the %s resource do not resolve to the machine's hostname '%s'. To correct the error, it may be necessary to delete the %s database. 5 11 9 2335:name 11 17 51954:my hostname 11 11 89897:NSRLA 12 25 net-client.emrsn.org 11 11 89897:NSRLA
6919 1487648790 5 1 2 2726954752 60673 0 net-client.emrsn.org nsrexecd SYSTEM critical 52 Unable to register %ld version %ld on tcp. Aborting. 2 2 6 390436 2 1 1
90307 1487648790 5 5 0 2653454400 60673 0 net-client.emrsn.org nsrexecd NSR critical 43 Unable to start the authentication service. 0
6919 1487649419 5 1 2 2456004352 63410 0 net-client.emrsn.org nsrexecd SYSTEM critical 52 Unable to register %ld version %ld on tcp. Aborting. 2 2 6 390436 2 1 1
90307 1487649419 5 5 0 2632775456 63410 0 net-client.emrsn.org nsrexecd NSR critical 43 Unable to start the authentication service. 0

After some investigation, I came up with the following steps that finally resolved all the above mentioned errors:

Step 1: Recreate the /nsr folder.

[root@net-client ~]# ls -ld /nsr
drwxr-xr-x 10 root root 4096 Feb 21 03:46 /nsr
[root@net-client ~]# mv /nsr /nsr.bkp
[root@net-client ~]# mkdir /nsr
[root@net-client ~]# ls -ld /nsr
drwxr-xr-x 2 root root 4096 Feb 21 04:51 /nsr
[root@net-client ~]#

Step 2: Make sure /etc/hosts file entries are correct.

The /etc/hosts file on the problematic server was missing the loopback entry, which is why the service was unable to start. I added the entry "127.0.0.1 localhost" and was able to start the service successfully.
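A check for the loopback entry could be sketched like this. The function name has_loopback_entry and the HOSTS_FILE variable are hypothetical names introduced for illustration.

```shell
# Return success if the given hosts file maps 127.0.0.1 to localhost
has_loopback_entry() {
    awk '$1 == "127.0.0.1" {
             for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1
         }
         END { exit(found ? 0 : 1) }' "$1"
}

HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
if has_loopback_entry "$HOSTS_FILE"
then
    echo "loopback entry present in $HOSTS_FILE"
else
    echo "loopback entry missing in $HOSTS_FILE"
fi
```

Comparing the first field exactly means a commented out line such as "#127.0.0.1 localhost" is not counted as a valid entry.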

[ssuri@net-client:/etc/init.d] $ sudo bash -x /etc/init.d/networker start
+ NSRRC=/nsr/nsrrc
+ NSR_ENVEXEC=/opt/nsr/admin/nsr_envexec
+ NETWORKERRC=/opt/nsr/admin/networkerrc
+ case $1 in
+ echo 'starting NetWorker daemons:'
+ '[' -f /usr/sbin/nsrexecd ']'
+ '[' -f /usr/sbin/NetWorker.clustersvr ']'
+ /opt/nsr/admin/nsr_envexec -u /nsr/nsrrc -s /opt/nsr/admin/networkerrc /usr/sbin/nsrexecd
+ /usr/bin/tee /dev/console
+ echo ' nsrexecd'
+ '[' -f /usr/sbin/lgtolmd ']'
+ '[' -f /usr/sbin/nsrd -a '!' -f /usr/sbin/NetWorker.clustersvr ']'
+ '[' -d /var/lock/subsys ']'
+ touch /var/lock/subsys/networker
[ssuri@net-client:/etc/init.d] $ ps -ef | grep -i nsr
root 65329 1 0 04:59 ? 00:00:00 /usr/sbin/nsrexecd

I'm sure that a lot more diagnostics can be done from the backup server side but since this was clearly a client side misconfiguration, a deep investigation wasn't needed.

Friday, 17 February 2017

Perl script to calculate total storage allocated to a Linux server

In a previous article I created a small shell script to calculate the total disk space allocated to a Linux server. I used the lsblk command to gather the disk sizes.

In this article I'd like to accomplish the same task via perl.

This is mainly to demonstrate how we can use perl arrays and regular expressions to accomplish a task which would require awk if we were using the bash shell.

So, here is the perl program I wrote:

#!/usr/bin/perl -w
#
use strict;
use Data::Dumper;

$|=1;

my @lsblk = `lsblk` ;
my $disk_total = 0;

foreach (@lsblk) {
chomp;

if ($_ =~ m|\w* #match 0 or more alphanumeric characters
\s* #match 0 or more spaces
disk #match the word disk
\s* # match 0 or more spaces after the word disk
|sigx)
{

my @out = split /\s+/, $_;
# skip the line unless the size is reported in GB, so $1 is always set
next unless $out[3] =~ /([\d.]+)G/i;
print "disk $out[0] is of size $1 GB \n";
$disk_total+=$1;
}

}

print "total disk space allocated is $disk_total GB \n";

The execution of the program gave the following output:

[root@alive ~]# ./sum.pl

disk sda is of size 20 GB

disk sdb is of size 2 GB

total disk space allocated is 22 GB

Something interesting in the program is the presence of comments and white spaces within the regular expression. This is accomplished by adding the x option at the end of the regular expression.

In m| |, m is the match operator and the two pipe characters enclose the regular expression. We needed to write m explicitly because we did not use the default regular expression delimiters, which are two forward slashes (/ /).

Among the remaining options:

s lets the dot (.) match newline characters, which allows matching across new lines
i allows case insensitive matches
g allows global matches.

Sunday, 12 February 2017

Inter-process communication with named pipes

As system administrators we frequently use pipes while working on the command line. Before getting to named pipes I'd like to start by describing unnamed pipes.
An unnamed pipe is the pipe (|) symbol we use to serve the output of one command as the input of another command. In system admin talk we commonly refer to the usage of pipes as "piping out the output of a command".
For example, if I need to get the number of files/directories in my current working directory I could type ls | wc -l.

Unnamed pipes are short lived as they exist only for the duration of the command execution.

When we use named pipes however, we actually create a file of type pipe with the mkfifo command.
For example mkfifo myfile.

When I run the ls -l command on my named pipe, the letter "p" at the very start of the output denotes that it is a pipe file, and the file is also shown in a different color from a regular file. When I run the ls -F command, a pipe (|) symbol appears after the file name, again indicating that the file is a named pipe.
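As a minimal sketch, the same checks can be scripted. The temporary directory is an assumption made here so the example does not leave files behind in your working directory.

```shell
# Create the named pipe in a scratch directory
WORKDIR=$(mktemp -d)
FIFO="${WORKDIR}/myfile"
mkfifo "${FIFO}"

# The -p test succeeds only for named pipes
if [ -p "${FIFO}" ]
then
    echo "${FIFO} is a named pipe"
fi

# The first character of the ls -l output is the file type
FILE_TYPE=$(ls -l "${FIFO}" | cut -c1)
echo "file type character: ${FILE_TYPE}"

rm -rf "${WORKDIR}"
```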

Now that we've created our named pipe, let's use it.

If I need to get a list of processes spawned by the user sa, I could type ps -ef | egrep "sa\s+" and pipe it to wc -l to get a count of the number of processes.
Instead of using an unnamed pipe, let's redirect the output of ps -ef | egrep "sa\s+" to our named pipe.

From the output you can ascertain that we did not get our prompt back after we directed the output of the command to the named pipe.
That is because we have yet to use the pipe as input to another process. Once we do that we'll get our prompt back.
So let's use our named pipe as standard input to the wc -l command in another terminal window and see what happens.

As soon as we hit enter after serving our named pipe as stdin to the wc -l command, we got our prompt back on the previous terminal window where we had redirected the output of our ps command to the named pipe. So the pipe serves as the link between the two processes such that output from one command/process is considered as the input to the second command/process thereby establishing inter-process communication.
Since our named pipe is not a regular file, it will not store the output of the ps command we redirected to it earlier and its size will be zero. So if we do a cat of the pipe file, nothing will be displayed on screen and we won't get our prompt back unless we hit ctrl+c.
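The two terminal windows can be imitated in a single script by putting the writer in the background. This is a sketch under that assumption, and it writes three fixed lines instead of ps output so that the result is predictable.

```shell
WORKDIR=$(mktemp -d)
FIFO="${WORKDIR}/myfile"
mkfifo "${FIFO}"

# Writer: runs in the background and blocks until a reader opens the pipe
printf 'one\ntwo\nthree\n' > "${FIFO}" &

# Reader: opening the pipe for reading releases the writer
COUNT=$(wc -l < "${FIFO}")

# Wait for the background writer to finish
wait

echo "lines read through the pipe: ${COUNT}"
rm -rf "${WORKDIR}"
```

Without the reader, the background writer would stay blocked, which is the same behaviour as the prompt not returning in the first terminal window.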

This article was an introduction to inter-process communication using named pipes and intended to serve as some food for thought. We can do more interesting things with named pipes.

Saturday, 11 February 2017

Accepting user input in Python

The ability to accept user input is a must in any programming language, and python is no different: it provides us with neat and easy ways to accept and process user input.

In this article I'll demonstrate the use of two functions one to gather string input and the other to gather integer input.

input():
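We use the input function to gather integer input from the user. In Python 2, which this article uses, input evaluates whatever the user types as a Python expression, so typing 27 yields an integer. To use it we simply call the input function with the text to be displayed on the screen. For example: input("display this on screen").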
Generally we'd assign the value obtained from the input function to a variable for future use.

raw_input():
We use the raw_input function to gather string input from the user. Its usage is exactly the same as that of the input function. For example, we type raw_input("text to be displayed"). The string "text to be displayed" is output to the terminal. The user will feed some input which will be stored in a variable for future use. (In Python 3, raw_input was renamed to input, which always returns a string.)

Given below is a simple sample script:

root@buntu:~/py_scripts# cat in.py
#!/usr/bin/python

###get input from user

name = raw_input("What is your name?? ")

age = input("what is your age? ")

print "so", name,
print "your name is", name, "and age is", age, "years"

The script will prompt the user for name and age. It will store the input provided by the user in variables name and age and will then print the result.

When we print variables along with other strings in a print statement, we keep the variables outside the double quotes and use commas as separators.

Here is a sample execution:

root@buntu:~/py_scripts# ./in.py

What is your name?? sahil

what is your age? 27

so sahil your name is sahil and age is 27 years

Thursday, 9 February 2017

File override protection with lockfile

I recently came across a requirement wherein I needed to ensure that only a single user could make permanent changes to a file at a given point of time and that changes made by other users trying to access the file concurrently should be discarded.

One method of doing this in vim is by turning "lock on" while you are editing the file. This tries to put an exclusive lock on the file. If another user opens up to edit the same file, vim presents them with a warning but lets the users edit the files if they type in "edit anyway". To find a way around it I did some research and came across the lockfile command whose usage I'll demonstrate in this article.

The lockfile command is provided by the procmail package. So, let's install that first.

[root@still ~]# yum install procmail
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirror.fibergrid.in
* extras: mirror.fibergrid.in
* updates: mirrors.vhost.vn
Resolving Dependencies
--> Running transaction check
---> Package procmail.x86_64 0:3.22-35.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================================================================================================================
Package Arch Version Repository Size
=========================================================================================================================================================================
Installing:
procmail x86_64 3.22-35.el7 base 171 k

Transaction Summary
=========================================================================================================================================================================
Install 1 Package

Total download size: 171 k
Installed size: 349 k
Is this ok [y/d/N]: y
Downloading packages:
procmail-3.22-35.el7.x86_64.rpm | 171 kB 00:00:05
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : procmail-3.22-35.el7.x86_64 1/1
Verifying : procmail-3.22-35.el7.x86_64 1/1

Installed:
procmail.x86_64 0:3.22-35.el7

Complete!

An in depth discussion of what procmail is and what it does isn't of relevance to this post but still I'll share a brief description of the package which is as follows:

Procmail can be used to create mail-servers, mailing lists, sort your incoming mail into separate folders/files, preprocess your mail, start any programs upon mail arrival (e.g. to generate different chimes on your workstation for different types of mail) or selectively forward certain incoming mail automatically to someone.

The lockfile command is a conditional semaphore creator. When we want to lock a file for a process, we create its lock file by running the lockfile command followed by the filename with the .lock extension. After that we can perform the required actions on the file. Once our task with the file is completed, we can remove the lock file via the rm command, which effectively unlocks the file.
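Inside a script, the lock and unlock cycle could look like the sketch below. The helper names acquire_lock and release_lock and the lock file path are hypothetical. The noclobber branch is a stand-in I've added for hosts without procmail and is not something lockfile itself provides.

```shell
LOCK="${TMPDIR:-/tmp}/filename.$$.lock"

acquire_lock() {
    if command -v lockfile >/dev/null 2>&1
    then
        # -r 0: do not retry, fail at once if the lock file already exists
        lockfile -r 0 "$1" 2>/dev/null
    else
        # Assumed fallback: with noclobber set, the redirection fails
        # if the file already exists
        ( set -o noclobber; : > "$1" ) 2>/dev/null
    fi
}

release_lock() {
    # lockfile creates the file read-only, so force the removal
    rm -f "$1"
}

if acquire_lock "$LOCK"
then
    echo "lock acquired, safe to work on the file"
    FIRST=ok
fi

if ! acquire_lock "$LOCK"
then
    echo "second attempt refused while the lock is held"
    SECOND=refused
fi

release_lock "$LOCK"
```

Note that the lock is advisory: it only protects the file from other processes that also try to create the same lock file before touching it.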

Here is a demonstration:

Create test file:

[sa@still ~]$ cat filename

a test file

Lock the file and edit again:

[sa@still ~]$ lockfile filename.lock

[sa@still ~]$ vim filename

Open another instance of vi and try to edit file. The following warning is displayed:

Click edit any way, make changes and save file.

Now write changes and save the original vi session. You see the following prompt:

Press yes and save the file.

View file contents & remove lock file:

[sa@still ~]$ cat filename

a test file

still testing

[sa@still ~]$ rm filename.lock

rm: remove write-protected regular file ‘filename.lock’? y

Note: only the changes which were made by the process that locked the file will be saved!

Wednesday, 8 February 2017

Getting disk size info from iostat in Solaris/Using lsblk to find total size of disks allocated in Linux

I know the title of the post is long. With that out of the way, let's get started.

Today I'll share a quick one liner to provide the size of attached disks in a neat format with iostat command in Solaris.

So, here's what the iostat -En command output looks like :

[ssuri@:~] $ sudo iostat -En
c21d0 Soft Errors: 0 Transport Errors: 0 Protocol Errors: 0
Vendor: SUN Product: VDSK Size: 137.67GB <137665445888 bytes>

c21d1 Soft Errors: 0 Transport Errors: 0 Protocol Errors: 0
Vendor: SUN Product: VDSK Size: 137.67GB <137665445888 bytes>

And this is the one liner to format it so that we get a two column output consisting of the disk name and disk size only.

sudo iostat -En | awk '{print $1, $6}' | tr -d "Errors:" | sed 's/Vend//g' | sed '$!N;$!N;s/\n/ /g'

Given below is the output of the above one liner:

c21d0 137.67GB

c21d1 137.67GB

c21d2 274.88GB

c21d11 34.36GB

c21d12 34.36GB

c21d13 34.36GB

c21d14 34.36GB

c21d15 34.36GB

c21d16 34.36GB

c21d17 34.36GB

c21d18 68.72GB

c21d19 274.88GB
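The same two column output can also be sketched with a single awk program, which avoids relying on tr -d deleting individual characters. The function name disk_sizes is a hypothetical name, and the sample input is copied from the iostat -En output shown earlier.

```shell
# Print "disk size" pairs from iostat -En style output read on stdin
disk_sizes() {
    awk '/Soft Errors:/ { name = $1 }
         { for (i = 1; i < NF; i++) if ($i == "Size:") print name, $(i + 1) }'
}

# On a live system you would run: sudo iostat -En | disk_sizes
SAMPLE='c21d0 Soft Errors: 0 Transport Errors: 0 Protocol Errors: 0
Vendor: SUN Product: VDSK Size: 137.67GB <137665445888 bytes>

c21d1 Soft Errors: 0 Transport Errors: 0 Protocol Errors: 0
Vendor: SUN Product: VDSK Size: 137.67GB <137665445888 bytes>'

RESULT=$(echo "$SAMPLE" | disk_sizes)
echo "$RESULT"
```

Because the program looks for the Size: label rather than a fixed field number, it should also cope with releases that print the size on a line of its own.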

Now coming to the Linux part.
I was recently tasked with getting information on total connected SAN storage excluding the root & swap disks for some Linux servers.
I could've used fdisk but I used lsblk instead.

Here is the default output of lsblk without any filters:

[ssuri@:~] $ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
sdb 8:16 0 32G 0 disk
`-grid_vg-grid_lv (dm-4) 253:4 0 32G 0 lvm /grid
sdc 8:32 0 24G 0 disk
`-kdump_vg-kdump_lv (dm-3) 253:3 0 24G 0 lvm
sdd 8:48 0 32G 0 disk
`-elmerd5db_vg-lvol1 (dm-2) 253:2 0 32G 0 lvm /elmerd5db/db
sde 8:64 0 32G 0 disk
`-sde1 8:65 0 32G 0 part
sdf 8:80 0 256G 0 disk
`-sdf1 8:81 0 256G 0 part
sdg 8:96 0 32G 0 disk
`-sdg1 8:97 0 32G 0 part
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 512M 0 part /boot
`-sda2 8:2 0 59.5G 0 part
|-os_vg-root_lv (dm-0) 253:0 0 16G 0 lvm /
|-os_vg-swap_01_lv (dm-1) 253:1 0 4G 0 lvm [SWAP]
|-os_vg-tmp_lv (dm-5) 253:5 0 2G 0 lvm /tmp
|-os_vg-var_lv (dm-6) 253:6 0 10G 0 lvm /var
`-os_vg-hpds_lv (dm-7) 253:7 0 12G 0 lvm /var/opt/perf/datafiles
sdi 8:128 0 32G 0 disk
`-sdi1 8:129 0 32G 0 part
sdh 8:112 0 32G 0 disk
`-sdh1 8:113 0 32G 0 part

I know that I'm using disk sda and sdc for root and swap space respectively.

So let's limit the output to disk devices and filter out disks /dev/sda and /dev/sdc.

[ssuri@:~] $ lsblk | egrep disk | grep -v "sd[ac]"

sdb 8:16 0 32G 0 disk

sdd 8:48 0 32G 0 disk

sde 8:64 0 32G 0 disk

sdf 8:80 0 256G 0 disk

sdg 8:96 0 32G 0 disk

sdi 8:128 0 32G 0 disk

sdh 8:112 0 32G 0 disk

Now, I'll use awk to calculate the total size of these disks:

[ssuri@:~] $ lsblk | egrep disk | grep -v "sd[ac]" | tr -d "G" | awk '{sum+=$4} END {print sum}'

448
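As a sketch, the whole filter chain can be folded into one awk program that checks the TYPE column and anchors the excluded disk names. The function name sum_disk_gb is hypothetical, and the sample input is a shortened copy of the lsblk output above.

```shell
# Sum the SIZE column (in GB) of whole disks, skipping disks whose
# name matches the awk regex passed as the first argument.
sum_disk_gb() {
    awk -v skip="$1" '$6 == "disk" && $1 !~ skip && $4 ~ /G$/ {
             sub(/G$/, "", $4); sum += $4
         }
         END { print sum + 0 }'
}

# On a live system you would run: lsblk | sum_disk_gb '^sd[ac]$'
SAMPLE='sdb 8:16 0 32G 0 disk
sdc 8:32 0 24G 0 disk
sdf 8:80 0 256G 0 disk
sda 8:0 0 60G 0 disk
|-sda1 8:1 0 512M 0 part /boot'

TOTAL=$(echo "$SAMPLE" | sum_disk_gb '^sd[ac]$')
echo "total: ${TOTAL} GB"
```

Anchoring the pattern with ^ and $ means a disk named sdaa would still be counted, which the plain grep -v "sd[ac]" would have dropped.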

A quick and easy time saver.

Script to check if File Systems are in read only mode

Today I'll share a quick script with which you can check if the file systems on your server have gone into read only mode. One way would be to touch a file and check but doing that on every file system on every server would involve a lot of time and overhead.
From some experimentation I've learned that if a file system goes into read only mode while a process is writing to it, the read only mode gets reflected in the /proc/mounts file.

Here is a quick script to check the status and send out an email in case of issues:

#!/bin/bash

##########################################
#date: 01/02/2017 #
#purpose: check Linux file systems are RW #
##########################################

# print the mount point and the first mount option (rw or ro) of every ext4 file system
cat /proc/mounts | grep ext4| awk '{print $2, $4}' | awk -F "," '{print $1}' | while read FS_NAME PERM
do
if [ "${PERM}" != "rw" ] ; then
echo -e "FS $FS_NAME is mounted with $PERM permissions. Please check \n" | mail -s "Read only FS on `hostname`" unixadmins@example.com
fi
done

You can integrate this script with a monitoring tool like HP OV or nagios.
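For a Nagios style check, the mail step could be swapped for an exit status, since Nagios plugins report 0 for OK and 2 for CRITICAL. The sketch below assumes that approach. The function names and the sample mounts file are hypothetical and only there to make the example repeatable.

```shell
# Print ext4 mount points whose first mount option is not rw.
# Reads /proc/mounts style lines on stdin.
readonly_ext4() {
    awk '$3 == "ext4" { split($4, opts, ","); if (opts[1] != "rw") print $2 }'
}

# Report in Nagios style: return 0 for OK, 2 for CRITICAL
check_mounts() {
    RO_LIST=$(readonly_ext4 < "$1")
    if [ -n "$RO_LIST" ]
    then
        echo "CRITICAL: read only file systems:" $RO_LIST
        return 2
    fi
    echo "OK: all ext4 file systems are mounted rw"
    return 0
}

# Sample data; on a live system pass /proc/mounts instead
SAMPLE_MOUNTS=$(mktemp)
cat > "$SAMPLE_MOUNTS" <<'EOF'
/dev/mapper/os_vg-root_lv / ext4 rw,relatime 0 0
/dev/sdb1 /data ext4 ro,relatime 0 0
EOF

STATUS=0
check_mounts "$SAMPLE_MOUNTS" || STATUS=$?
echo "exit status: ${STATUS}"
rm -f "$SAMPLE_MOUNTS"
```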
