Thursday, 16 November 2017

How to panic a guest domain in Solaris

I recently came across a Solaris 10 guest domain in a hung state.
I accessed its console from the primary domain but I was not able to see a login prompt or anything for that matter.

I was able to ping the server but unable to log in to it via ssh.

Hence we decided to reboot the guest domain but we wanted to make sure that a crash dump was generated which could be shared with Oracle support for further analysis.

We decided to induce a kernel panic in the guest domain to ensure the generation of a crash dump on system restart.

The command used to accomplish this is ldm panic-domain.

[sahil@primary-domain-p:~] $ sudo ldm panic-domain test-domain-g
[sahil@primary-domain-p:~] $ sudo ldm list test-domain-g
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
test-domain-g    active     -t----  5002    64    158G     100%  100%  157d 1h
[sahil@primary-domain-p:~] $ sudo console test-domain-g
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "test-domain-g" in group "test-domain-g" ....
Press ~? for control options ..
 6:21 100% done
100% done: 1523171 pages dumped, dump succeeded
rebooting...
Resetting...
NOTICE: Entering OpenBoot.
NOTICE: Fetching Guest MD from HV.
NOTICE: Starting additional cpus.
NOTICE: Initializing LDC services.
NOTICE: Probing PCI devices.
NOTICE: Finished PCI probing.


The ldm panic-domain command ensured that a crash dump was generated when the guest domain underwent a reboot.
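
To watch from the control domain for the guest to finish panicking and rebooting, the FLAGS column of ldm list can be checked in a script. What follows is only a minimal sketch and not part of the procedure I actually ran: domain_flags and in_transition are hypothetical helper names, and the sketch assumes the column layout shown in the listing above, where the second flag character is t while the domain is in transition and n once it is back to normal.

```shell
#!/bin/sh
# Print the FLAGS column for the named domain from `ldm list` output on stdin.
domain_flags() {
    awk -v dom="$1" '$1 == dom { print $3 }'
}

# Succeed when the second flag character is "t" (domain in transition).
in_transition() {
    case "$1" in
        ?t*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example, on the control domain (use nawk where the default awk lacks -v):
#   flags=$(ldm list test-domain-g | domain_flags test-domain-g)
#   in_transition "$flags" && echo "test-domain-g is still in transition"
```

Parsing the text from standard input keeps the helpers easy to try against a saved copy of the ldm list output before pointing them at a live system.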

I hope this quick tip was helpful.

Monday, 6 November 2017

Shutdown a zone stuck in down state

While working on a patching activity I came across an issue wherein the swap file system temporarily mounted for a zone did not get unmounted properly during the installpatchset phase.

swap                   295G     8K   295G     1%    /zones/lab-zone/lu


From the zoneadm list output, I observed that the zone to which the above file system belonged was stuck in the down state.

usport-lab-g# zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   7 lab-zone         down       /zones/lab-zone                native   shared

Further investigation revealed that the zoneadmd process for the zone was still active.

usport-lab-g# ps -ef | grep zoneadmd
    root  6962  6863   0 06:31:49 pts/1       0:00 grep zoneadmd
    root 21890     1   0 06:12:45 ?           0:01 zoneadmd -z lab-zone

I forcefully terminated this process with the kill command.

usport-lab-g# kill -9 21890
usport-lab-g# ps -ef | grep zoneadmd
    root  7025  6863   0 06:32:00 pts/1       0:00 grep zoneadmd

This fixed the problem and the zone was now in the installed state as I anticipated.

[ssuri@usport-lab-g:~] $ sudo zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - lab-zone         installed  /zones/lab-zone                native   shared
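
The lookup of the leftover zoneadmd process can also be scripted. This is a rough sketch under stated assumptions: zoneadmd_pid is a hypothetical helper name, it reads ps -ef output on standard input, and it expects the command line to have the form zoneadmd -z <zone> as in the listing above. Trying a plain kill before kill -9 is a general precaution that I am adding here, not something the steps above did.

```shell
#!/bin/sh
# Print the PID (second column) of the zoneadmd process serving the given zone.
# Reads `ps -ef` output on stdin.
zoneadmd_pid() {
    awk -v zone="$1" '
        {
            for (i = 1; i <= NF - 2; i++) {
                if (($i == "zoneadmd" || $i ~ /\/zoneadmd$/) &&
                    $(i + 1) == "-z" && $(i + 2) == zone) {
                    print $2
                    next
                }
            }
        }'
}

# Example (use nawk where the default awk lacks -v):
#   pid=$(ps -ef | zoneadmd_pid lab-zone)
#   [ -n "$pid" ] && kill "$pid"                                   # SIGTERM first
#   sleep 5
#   [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null && kill -9 "$pid"  # then force
```

Matching on the full "zoneadmd -z <zone>" sequence avoids picking up the grep command itself or the zoneadmd of a different zone.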


I hope this quick tip was helpful for you and I thank you for reading.

Monday, 16 October 2017

Installing openssh on Ubuntu 16.04 (with/without internet access)

I recently installed Ubuntu 16.04 server edition and found that there was no ssh access available. On checking from the VM console I found out that openssh was not installed. So in this article, I'll share the steps I followed to install openssh server on my machine.

Once the VM was up it was on the network with an IP address so it had access to the default repositories available on the internet. Simply type apt-get install ssh to install openssh-server and its dependent packages.



Once done, verify that the required service unit file is now available in the /etc/systemd/system directory.



Finally, start and enable the service.
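
For reference, the online steps can be collected into one small script. This is a sketch rather than a transcript of what I ran: the run and install_openssh_online names are hypothetical, the -y flag is my addition so that apt-get does not stop to ask for confirmation, and I am assuming the systemd unit is named ssh, as it is on a stock Ubuntu 16.04 install. Setting DRY_RUN prints the commands instead of executing them.

```shell
#!/bin/sh
# Print each command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install the ssh metapackage, then enable, start and check the service.
install_openssh_online() {
    run apt-get install -y ssh
    run systemctl enable ssh
    run systemctl start ssh
    run systemctl is-active ssh
}

# Example (as root):
#   DRY_RUN=1 install_openssh_online    # preview only
#   install_openssh_online              # actually run the steps
```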



That was easy enough, but if we're behind a corporate firewall without internet access, here's what we can do instead.

Mount the Ubuntu ISO on a temporary mount point such as /mnt, as I've done, and then change to the /mnt/pool/main/o/openssh directory.



From here you can install the package using dpkg -i openssh-server_7.2p2-4ubuntu1_amd64.deb.

You can check its dependencies and install them first. To check the dependencies, type dpkg -I openssh-server_7.2p2-4ubuntu1_amd64.deb
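
The dependency list printed by dpkg -I is a single comma-separated Depends line, which is awkward to read. The sketch below splits it into one package name per line. The deb_depends name is hypothetical, and the function assumes the usual "Depends: pkg (version), pkg | alternative" layout; where alternatives are offered it keeps only the first one.

```shell
#!/bin/sh
# Read `dpkg -I <package>.deb` output on stdin and print one dependency name per line.
deb_depends() {
    sed -n 's/^ *Depends: //p' |
        tr ',' '\n' |
        sed -e 's/^ *//' -e 's/ *(.*$//' -e 's/ *|.*$//'
}

# Example, from the directory on the mounted ISO:
#   dpkg -I openssh-server_7.2p2-4ubuntu1_amd64.deb | deb_depends
```

Each name in the output can then be looked up under /mnt/pool and installed with dpkg -i before the openssh-server package itself.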



I hope this article has been helpful and I thank you for reading through it.

Sunday, 15 October 2017

Avoid extra typing "grep -v grep"

We frequently use grep to filter and print the strings we are looking for in a file or in the output of a command.

We might be searching for a process in the output of the ps -ef command, only to find the grep command itself listed in the results.

For example, if I use grep to search for sshd processes in the 'ps -ef' output I get the following result:


[root@pbox6 ~]# ps -ef | grep ssh
root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd
root       2656   1823  1 10:22 ?        00:00:00 sshd: root@pts/1
root       2660   1823  0 10:22 ?        00:00:00 sshd: root [priv]
sshd       2661   2660  0 10:22 ?        00:00:00 sshd: root [net]
root       2683   2662  0 10:22 pts/1    00:00:00 grep ssh



This could be an issue if we intend to count the number of processes and use the subsequent result in a script.

An option to remove grep from the search result would be to pipe the output to "grep -v grep".

[root@pbox6 ~]# ps -ef | grep ssh | grep -v grep
root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd
root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1
root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty
root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server

To avoid the extra typing, we can instead enclose the first or last character of the search string in square brackets, which turns it into a one-character bracket expression (character class). The pattern [s]sh still matches the text ssh, but the grep command's own entry in the process list now reads grep [s]sh, which does not contain the literal text ssh, so grep no longer matches itself.

[root@pbox6 ~]# ps -ef | grep [s]sh
root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd
root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1
root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty
root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server
[root@pbox6 ~]#
[root@pbox6 ~]# ps -ef | grep ss[h]
root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd
root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1
root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty
root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server
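
The effect is easy to reproduce without a live process list. The sketch below feeds two made-up ps lines to grep -c; count_matches and the sample variables are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Count the lines on stdin that match a pattern; a stand-in for `ps -ef | grep -c PATTERN`.
count_matches() {
    grep -c -- "$1"
}

# Two sample ps lines each: a real sshd process and the grep command itself.
ps_with_plain='root  1823     1  0 10:20 ?      00:00:00 /usr/sbin/sshd
root  2683  2662  0 10:22 pts/1  00:00:00 grep ssh'
ps_with_class='root  1823     1  0 10:20 ?      00:00:00 /usr/sbin/sshd
root  2683  2662  0 10:22 pts/1  00:00:00 grep [s]sh'

printf '%s\n' "$ps_with_plain" | count_matches 'ssh'     # prints 2: grep counts itself
printf '%s\n' "$ps_with_class" | count_matches '[s]sh'   # prints 1: "[s]sh" contains no "ssh"
```

One caveat worth keeping in mind: it is safer to quote the pattern, as in ps -ef | grep '[s]sh', because an unquoted [s]sh is also a shell glob and would be expanded to ssh if a file with that name happened to exist in the current directory.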


I hope this quick tip has been helpful for you.

Saturday, 14 October 2017

Workaround for "Error getting private key" error while starting realvnc on Linux

Recently we received a user complaint that they were unable to access a server through vncviewer.
When I checked I found that the service was not running and when I attempted to start the service I got the below message:

[root@pbox bin]# /etc/init.d/vncserver start
Starting VNC server: 1:vncuser xhost:  unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

Error getting private key from /var/home/vncuser/.vnc/private.key: End of stream
Underlying X server release 609000, The X.Org Foundation

error opening security policy file /usr/X11R6/lib/X11/xserver/SecurityPolicy
Could not init font path element /usr/X11R6/lib/X11/fonts/misc/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/Type1/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/75dpi/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/100dpi/, removing from list!
FreeFontPath: FPE "/usr/share/vnc/fonts/" refcount is 2, should be 1; fixing.
2:vncuser xhost:  unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

Error getting private key from /var/home/vncuser/.vnc/private.key: End of stream
Underlying X server release 609000, The X.Org Foundation

error opening security policy file /usr/X11R6/lib/X11/xserver/SecurityPolicy
Could not init font path element /usr/X11R6/lib/X11/fonts/misc/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/Type1/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/75dpi/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/100dpi/, removing from list!
FreeFontPath: FPE "/usr/share/vnc/fonts/" refcount is 2, should be 1; fixing.
                                                           [  OK  ]


I wasn't able to query the status of the service either:

[root@pbox init.d]# service vncserver status
Xvnc dead but subsys locked

This error pointed me to the /var/lock/subsys/Xvnc file. I removed it and restarted the service, but that did not help.

Attempting to stop the service also failed on the first try.

[root@pbox ~]# service vncserver stop
Shutting down VNC server: 1:vncuser 2:vncuser              [FAILED]

Although it did work the second time I tried.

[root@pbox subsys]# service vncserver stop
Shutting down VNC server:                                  [  OK  ]



When I checked the /var/home/vncuser/.vnc/private.key I found it to be empty.

[root@pbox .vnc]# ls -l private.key
-rw------- 1 vncuser vncuser 0 Oct  5 04:12 private.key
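
A zero-length key file like this one is simple to detect from a script, for example before restarting the service. The following is a small sketch; is_empty_file is a hypothetical helper name, and the path in the example is the one from my server.

```shell
#!/bin/sh
# Succeed when the path is a regular file of zero length.
is_empty_file() {
    [ -f "$1" ] && [ ! -s "$1" ]
}

# Example:
#   key=/var/home/vncuser/.vnc/private.key
#   is_empty_file "$key" && echo "WARNING: $key is empty" >&2
```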

I restarted the vnc service multiple times and even installed realvnc again, but that did not work. According to the realvnc documentation I found, the private key should have been regenerated when the service restarted, or at least created when I reinstalled the software, but that did not happen.

I finally ended up copying the private key from a server on which realvnc was already running and started the service on the problematic server and it finally worked.

[root@pbox ~]# service vncserver restart
Shutting down VNC server: 1:vncuser 2:vncuser              [  OK  ]
Starting VNC server: 1:vncuser xhost:  unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

If a desktop environment fails to load for this virtual desktop, please see:
 http://www.realvnc.com/doclink/kb-345?version=5.3.2.19179
Running applications in /var/home/vncuser/.vnc/xstartup

VNC Server catchphrase: "Member barcode connect. Desire college gong."
             signature: 88-c7-cb-1a-2c-9b-90-31

Log file is /var/home/vncuser/.vnc/pbox.dev.test.org:1.log
New desktop is pbox.dev.test.org:1 (10.22.217.69:1)
2:vncuser xhost:  unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

If a desktop environment fails to load for this virtual desktop, please see:
 http://www.realvnc.com/doclink/kb-345?version=5.3.2.19179
Running applications in /var/home/vncuser/.vnc/xstartup

VNC Server catchphrase: "Member barcode connect. Desire college gong."
             signature: 88-c7-cb-1a-2c-9b-90-31

Log file is /var/home/vncuser/.vnc/pbox.dev.test.org:2.log
New desktop is pbox.dev.test.org:2 (10.22.217.69:2)
                                                           [  OK  ]
[root@pbox ~]#
[root@pbox ~]#
[root@pbox ~]# ps -ef | grep [v]nc
vncuser   49576      1  0 02:05 ?        00:00:00 /usr/bin/Xvnc-core :1 -auth /var/home/vncuser/.Xauthority -pn -geometry 800x600 -nolisten tcp
root      49577  49576  0 02:05 ?        00:00:00 /usr/bin/Xvnc-realvnc -rootHelper 816219 4
vncuser   49608      1  0 02:05 ?        00:00:00 /bin/sh /etc/vnc/xstartup
vncuser   49630  49608  0 02:05 ?        00:00:00 xterm -geometry 80x24+10+10 -ls
vncuser   49632  49608  0 02:05 ?        00:00:00 twm
vncuser   49654  49630  0 02:05 pts/2    00:00:00 -bash
vncuser   49667      1  0 02:05 ?        00:00:00 /usr/bin/Xvnc-core :2 -auth /var/home/vncuser/.Xauthority -pn -geometry 1600x1200 -nolisten tcp
root      49671  49667  0 02:05 ?        00:00:00 /usr/bin/Xvnc-realvnc -rootHelper 816219 4
vncuser   49711      1  0 02:05 ?        00:00:00 /bin/sh /etc/vnc/xstartup
vncuser   49731  49711  0 02:05 ?        00:00:00 xterm -geometry 80x24+10+10 -ls
vncuser   49732  49711  0 02:05 ?        00:00:00 twm
vncuser   49734  49731  0 02:05 pts/3    00:00:00 -bash
vncuser   49753  49576  0 02:05 ?        00:00:00 /usr/bin/vncserverui virtual 13
vncuser   49799  49753  0 02:05 ?        00:00:00 /usr/bin/vncserverui -statusicon 5
vncuser   49800  49667  0 02:05 ?        00:00:00 /usr/bin/vncserverui virtual 13
vncuser   49822  49800  0 02:05 ?        00:00:00 /usr/bin/vncserverui -statusicon 5


This is definitely not ideal, but it is a quick fix to keep things going.

Sunday, 8 October 2017

Fixing NTP sync issues in RHEL 6

In this article I'll be exploring two distinct issues I faced with ntp sync wherein the servers were not able to properly sync their time with ntp servers.

In the first scenario, we observed a high offset when we checked the ntpq -p output.

I did a couple of stop-start operations and then followed the troubleshooting procedures outlined in Red Hat KB articles 35640 and 64868. The output from those steps is shared below:

[ssuri@usporiainfrar00:~]' $ sudo ntpq
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp.nova.org   .GPS.            1 u   43   64  377    0.189  309.823 175.931
*mtntime.emrets. .GPS.            1 u   44   64  377   13.401  309.833 174.006
xusporz-infrac15 10.16.64.15      3 u   43   64  377    0.208  286.078 169.984
xgblonz-infrac07 10.16.64.15      3 u   39   64  377   91.365  287.473 167.180
ntpq> as

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 57822  941a   yes   yes  none candidate    sys_peer  1
  2 57823  961a   yes   yes  none  sys.peer    sys_peer  1
  3 57824  9124   yes   yes  none falsetick   reachable  2
  4 57825  9124   yes   yes  none falsetick   reachable  2
ntpq> rv 57822
associd=57822 status=941a conf, reach, sel_candidate, 1 event, sys_peer,
srcadr=ntp.nova.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.290,
refid=GPS, reftime=dd6328c0.a7684541  Wed, Sep 13 2017  3:47:12.653,
rec=dd6328c3.fc501541  Wed, Sep 13 2017  3:47:15.985, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=29, flash=00 ok,
keyid=0, offset=504.129, delay=0.171, dispersion=1.525, jitter=183.751,
xleave=0.028,
filtdelay=     0.19    0.17    0.19    0.22    0.21    0.19    0.20    0.21,
filtoffset=  553.93  504.13  454.51  405.18  356.99  309.82  250.70  191.10,
filtdisp=      0.00    0.96    1.94    2.93    3.92    4.91    5.88    6.87
ntpq> q

[ssuri@usporiainfrar00:~]' $ sudo ntpdate -u ntp.nova.org
13 Sep 03:51:50 ntpdate[87098]: step time server 10.16.64.124 offset 0.757074 sec
[ssuri@usporiainfrar00:~]' $ sudo ntpdate -d ntp.nova.org
13 Sep 03:51:59 ntpdate[87462]: ntpdate 4.2.6p5@1.2349-o' Tue May  3 15:12:51 UTC 2016 (1)
Looking for host ntp.nova.org and service ntp
host found : ntp.nova.org
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
server 10.16.64.124, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.02580, dispersion 0.00000
transmitted 4, in filter 4
reference time:    dd6329d6.e0896c6a  Wed, Sep 13 2017  3:51:50.877
originate timestamp: dd6329df.4e7f4739  Wed, Sep 13 2017  3:51:59.306
transmit timestamp:  dd6329df.4d0d781a  Wed, Sep 13 2017  3:51:59.300
filter delay:  0.02585  0.02580  0.02580  0.02585
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.005469 0.005474 0.005467 0.005453
         0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00000
offset 0.005474

13 Sep 03:51:59 ntpdate[87462]: adjust time server 10.16.64.124 offset 0.005474 sec


We also did some comparisons with the system hardware clock and system time set by ntp by running the following command snippet:

 s(){ printf "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";}; s; printf '`date`:\t '; date; printf '`sudo hwclock`: ';sudo hwclock; s; printf '`sudo ntpq -pn`:\n\n'; sudo ntpq -pn; s; printf '`sudo ntpq -c as`:\n'; sudo ntpq -c as; for id in $(sudo ntpq -c as | awk '/^ ./{print $2}'); do s; printf "\`sudo ntpq -c \"rv $id\"\`:\n\n"; sudo ntpq -c "rv $id"; done

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`date`:          Wed Sep 13 05:23:47 UTC 2017
`sudo hwclock`: Wed 13 Sep 2017 05:23:48 AM UTC  -0.343942 seconds

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -pn`:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.16.64.124    .GPS.            1 u   14   64  377    0.160  428.657 142.242
+10.20.64.124    .GPS.            1 u   40   64  377   13.456  416.054 136.215
+10.16.64.11     10.16.64.15      3 u   30   64  377    0.214  411.056 141.797
+10.24.64.11     10.16.64.15      3 u   28   64  377   91.317  419.916 141.874

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c as`:

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 33492  961a   yes   yes  none  sys.peer    sys_peer  1
  2 33493  943a   yes   yes  none candidate    sys_peer  3
  3 33494  9414   yes   yes  none candidate   reachable  1
  4 33495  9414   yes   yes  none candidate   reachable  1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33492"`:

associd=33492 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,
srcadr=ntp.emrsn.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.305,
refid=GPS, reftime=dd633f51.48ffa52c  Wed, Sep 13 2017  5:23:29.285,
rec=dd633f55.f6c4f89f  Wed, Sep 13 2017  5:23:33.963, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=50, flash=00 ok,
keyid=0, offset=428.657, delay=0.160, dispersion=0.980, jitter=142.242,
xleave=0.033,
filtdelay=     0.16    0.20    0.18    0.15    0.25    0.23    0.19    0.20,
filtoffset=  428.66  397.34  365.10  333.82  302.01  269.84  237.60  205.38,
filtdisp=      0.00    1.01    2.04    3.05    4.07    5.10    6.14    7.17

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33493"`:

associd=33493 status=943a conf, reach, sel_candidate, 3 events, sys_peer,
srcadr=mtntime.emrets.net, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.458,
refid=GPS, reftime=dd633f2b.eeaa5a6e  Wed, Sep 13 2017  5:22:51.932,
rec=dd633f3b.031f6da1  Wed, Sep 13 2017  5:23:07.012, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=21, flash=00 ok,
keyid=0, offset=416.054, delay=13.456, dispersion=0.947, jitter=136.215,
xleave=0.026,
filtdelay=    13.46   13.42   13.43   13.45   13.42   13.50   13.50   13.47,
filtoffset=  416.05  385.25  355.33  324.97  294.15  264.31  232.99  202.63,
filtdisp=      0.00    0.99    1.95    2.93    3.92    4.88    5.88    6.86

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33494"`:

associd=33494 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=usstlz-pinfdc15.emrsn.org, srcport=123, dstadr=10.16.216.184,
dstport=123, leap=00, stratum=3, precision=-6, rootdelay=62.500,
rootdisp=121.170, refid=10.16.64.15,
reftime=dd633ccb.357101ca  Wed, Sep 13 2017  5:12:43.208,
rec=dd633f45.f8454467  Wed, Sep 13 2017  5:23:17.969, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=32, flash=00 ok,
keyid=0, offset=411.056, delay=0.214, dispersion=16.513, jitter=141.797,
xleave=0.018,
filtdelay=     0.21    0.27    0.27    0.22    0.23    0.26    0.27    0.26,
filtoffset=  411.06  377.74  348.50  307.74  276.47  246.26  226.04  197.64,
filtdisp=     15.63   16.60   17.61   18.57   19.57   20.58   21.57   22.54

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33495"`:

associd=33495 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=gblonz-pinfdc07.emrsn.org, srcport=123, dstadr=10.16.216.184,
dstport=123, leap=00, stratum=3, precision=-6, rootdelay=109.375,
rootdisp=122.223, refid=10.16.64.15,
reftime=dd633c4d.a4ae07e5  Wed, Sep 13 2017  5:10:37.643,
rec=dd633f47.0cce5653  Wed, Sep 13 2017  5:23:19.050, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=33, flash=00 ok,
keyid=0, offset=419.916, delay=91.317, dispersion=16.524, jitter=141.874,
xleave=0.010,
filtdelay=    91.32   91.61   91.40   91.13   91.36   92.86   91.49   91.38,
filtoffset=  419.92  388.51  354.44  316.21  298.64  270.20  226.80  195.30,
filtdisp=     15.63   16.62   17.62   18.61   19.60   20.58   21.58   22.57


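We disabled slew mode for ntp by removing the -x flag from the OPTIONS parameter in the /etc/sysconfig/ntpd file. With -x, ntpd raises its step threshold from the default 128 ms to 600 s, so offsets like the ones above are only slewed away slowly instead of being corrected in one step. Before the change, the daemon was running with -x: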

# ps aux | grep -i ntpd
ntp 1082 0.0 0.0 26520 1980 ? Ss 03:09 0:00 ntpd -x -u ntp:ntp -p /var/run/ntpd.pid -g -4

 # grep -v "^#" /etc/sysconfig/ntpd 
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g -4" 
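
The edit itself is just the removal of the -x token from that line. For anyone who would rather script it than open an editor, here is a hedged sketch: strip_slew_option is a hypothetical helper name, and it assumes the OPTIONS="..." layout shown above, with -x appearing as a separate token.

```shell
#!/bin/sh
# Remove a standalone -x token from an OPTIONS="..." line read on stdin.
# Other lines, including commented-out ones, pass through unchanged.
strip_slew_option() {
    sed -e '/^OPTIONS=/ s/\(["[:space:]]\)-x[[:space:]][[:space:]]*/\1/' \
        -e '/^OPTIONS=/ s/[[:space:]][[:space:]]*-x"/"/'
}

# Example (review the result before replacing the real file):
#   strip_slew_option < /etc/sysconfig/ntpd > /tmp/ntpd.new
#   diff /etc/sysconfig/ntpd /tmp/ntpd.new
```

Writing to a temporary file and comparing it with diff first is a deliberate choice, since a mistake in this file would stop ntpd from starting with the intended options.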

After removing -x, we restarted the service and confirmed that ntpd was running without it:

# ps -ef | grep -E ^ntp 
ntp 59539 1 0 04:48 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g -4 


After this we did a stop and start of the service a few times and a manual update with the ntp server to finally fix the high offset problem.

sudo ntpdate -u ntp.nova.org 
13 Sep 02:20:49 ntpdate[19667]: step time server 10.16.64.124 offset -42.674761 sec 

Just a note that apart from the troubleshooting we did here to fix the high offset, this problem could also be caused by network latency so that is worth checking out.
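
To keep an eye on the offset after a fix like this, the ntpq -p output can be reduced to a single number. The sketch below is an illustration only: max_abs_offset is a hypothetical helper name, it assumes the ten-column peer table shown earlier (two header lines, offset in the second-to-last column), and the 128 ms threshold in the example is simply ntpd's default step threshold, not a value from our monitoring. ntpq reports the offset in milliseconds.

```shell
#!/bin/sh
# Print the largest absolute offset (in ms) among the peers in `ntpq -p` output on stdin.
max_abs_offset() {
    awk 'NR > 2 && NF >= 10 {
             v = $(NF - 1) + 0
             if (v < 0) v = -v
             if (v > max) max = v
         }
         END { print max + 0 }'
}

# Example:
#   off=$(ntpq -pn | max_abs_offset)
#   awk -v o="$off" 'BEGIN { exit !(o > 128) }' && echo "offset above 128 ms: $off"
```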


In the second scenario we were again facing high offsets on one of our Linux servers, but in this case the time displayed by the hardware clock was accurate.

We did a few stop, start and restart operations along with a manual update with the ntp server but with no success.

[ssuri@usporinfradr00:~] $ sudo ntpdate -u ntpls.nova.org
  7 Oct 09:13:38 ntpdate[14262]: step time server 10.20.64.124 offset -173.793480 sec
 [ssuri@usporinfradr00:~] $ sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 *ntp.nova.org   .GPS.            1 u 4971   64    1    0.087  -173811   0.000
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u 4971   64    1    0.183  -173819   0.000
  gblonz-infrac07 10.16.64.15      3 u 4971   64    1  101.189  -173806   0.000


The system hardware clock was accurate though.

[ssuri@usporinfradr00:~] $ sudo hwclock -r
 Sat 07 Oct 2017 09:18:34 AM UTC  -0.485089 seconds
 [ssuri@usporinfradr00:~] $ date
 Sat Oct  7 09:16:05 UTC 2017


We then tried to set the system time from the hardware clock by executing hwclock -s, but to our surprise the hardware clock now appeared to be out of sync as well.

[ssuri@usporinfradr00:~] $ sudo hwclock -s
 [ssuri@usporinfradr00:~] $ date
 Sat Oct  7 09:20:18 UTC 2017

[root@usporinfradr00 ~]# hwclock -r
 Sat 07 Oct 2017 09:20:38 AM UTC  -0.265782 seconds

We did a manual update with our ntp server again followed by a restart of the service.

[root@usporinfradr00 ~]# sudo ntpdate -u ntpls.nova.org
  7 Oct 09:18:05 ntpdate[88723]: step time server 10.20.64.124 offset -173.991557 sec
 [root@usporinfradr00 ~]# sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
  ntpserver.nova .GPS.            1 u 4971   64   37    0.076   82.388 123032.
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u 4971   64   37    0.161   70.839 123031.
  gblonz-infrac07 10.16.64.15      3 u 4971   64   37  101.189  -173806 150569.

 [root@usporinfradr00 ~]# service ntpd restart
 Shutting down ntpd: [  OK  ]

 Starting ntpd: [  OK  ]


This finally corrected the offset and brought the system time in sync with the hardware clock as well.

[root@usporinfradr00 ~]# date
 Sat Oct  7 09:18:56 UTC 2017
 [root@usporinfradr00 ~]#  hwclock -r
 Sat 07 Oct 2017 09:19:26 AM UTC  -0.031508 seconds
 [root@usporinfradr00 ~]# date
 Sat Oct  7 09:19:30 UTC 2017
 [root@usporinfradr00 ~]# sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 *ntpls.nova.org .GPS.            1 u   14   64    3    0.081    9.357  32.137
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u    9   64    3    0.170   10.491  32.264
  gblonz-infrac07 10.16.64.15      3 u   10   64    3  101.359   22.452  32.468
 [root@usporinfradr00 ~]# 
 [root@usporinfradr00 ~]# date
 Sat Oct  7 09:20:21 UTC 2017

About ping timeouts in Solaris and Linux

While writing a script for checking ping response from a couple of servers I ran into some issues while setting timeouts for the pings. I was setting a timeout of 2 or 3 seconds but the ping command was still taking much longer to time out for the unreachable hosts.

Finally I realised that this was because of the time spent on name resolution. The ping timeout came into effect only after name resolution had completed or the DNS query had timed out.

In this article I'll demonstrate what I mentioned above.

Solaris:
The default ping timeout is 20 seconds. We can set a custom timeout by specifying it in seconds in the ping command as: ping <host> <timeout>

root@sandbox:/# time ping google 2
ping: unknown host google

real    0m21.452s
user    0m0.001s
sys     0m0.002s

In the above example the ping should ideally have timed out in just 2 seconds, but it actually took almost 22 seconds because the name resolution had to time out first.

The workaround is to use IP addresses instead of names or specify a timeout in the /etc/resolv.conf file.

Here's an example of trying to ping a non-reachable IP address instead of hostname:

root@sandbox:/# time ping 1.2.3.4
no answer from 1.2.3.4

real    0m20.002s
user    0m0.002s
sys     0m0.008s
root@sandbox:/# time ping 1.2.3.4 2
no answer from 1.2.3.4

real    0m2.002s
user    0m0.001s
sys     0m0.003s


Linux:
The same name resolution delay is encountered while specifying a timeout with -w while working on Linux.

[root@pbox6 ~]# time ping  -w 1 google
ping: unknown host google

real    0m10.013s
user    0m0.001s
sys     0m0.001s

The ping should've timed out after 1 second but took 10 seconds instead.

The fix is the same as in the case of Solaris: either use IP addresses or specify a timeout for DNS resolution in the /etc/resolv.conf file.

[root@pbox6 ~]# time ping -w 1 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.

--- 1.2.3.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms


real    0m1.009s
user    0m0.000s
sys     0m0.007s
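
For the script I was writing, one way to apply this is to flag any target that is a name rather than an IP address, since only IP targets honour the ping timeout exactly. The following is a minimal sketch under stated assumptions: is_ipv4 and ping_with_timeout are hypothetical helper names, is_ipv4 only checks the dotted-quad shape (it does not validate that each octet is at most 255), -c 1 is my addition so that Linux sends a single probe, and the script is meant for a POSIX shell such as bash or ksh.

```shell
#!/bin/sh
# Succeed when the argument looks like a dotted-quad IPv4 address,
# that is, digits and exactly three dots with no empty parts.
is_ipv4() {
    case "$1" in
        *[!0-9.]* | *..* | .* | *. | *.*.*.*.*) return 1 ;;
        *.*.*.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ping a target with a timeout in seconds, using the syntax shown above for each OS.
ping_with_timeout() {
    is_ipv4 "$1" || echo "note: $1 is a name; resolver delays are not covered by the timeout" >&2
    case "$(uname -s)" in
        SunOS) ping "$1" "$2" ;;
        *)     ping -c 1 -w "$2" "$1" ;;
    esac
}

# Example:
#   ping_with_timeout 1.2.3.4 2
```

For the other fix, the resolver timeout is set with an options line in /etc/resolv.conf, for example "options timeout:1 attempts:1"; the values here are only an illustration and should be chosen to suit your DNS servers.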

Saturday, 7 October 2017

Installing oracle XE in CentOS 6

In this article I'll describe how we can install Oracle Database 11g Express Edition (XE) on CentOS 6. The download is a zip file named oracle-xe-11.2.0-1.0.x86_64.rpm.zip.

We get started by extracting the zip file and installing the rpm.

[root@walk XE]# ls
oracle-xe-11.2.0-1.0.x86_64.rpm.zip
[root@walk XE]# unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip
Archive:  oracle-xe-11.2.0-1.0.x86_64.rpm.zip
   creating: Disk1/
   creating: Disk1/upgrade/
  inflating: Disk1/upgrade/gen_inst.sql
   creating: Disk1/response/
  inflating: Disk1/response/xe.rsp
  inflating: Disk1/oracle-xe-11.2.0-1.0.x86_64.rpm
[root@walk XE]#

[root@walk Disk1]# rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm
Preparing...                ########################################### [100%]
   1:oracle-xe              ########################################### [100%]
Executing post-install steps...
You must run '/etc/init.d/oracle-xe configure' as the root user to configure the database.

Next, as directed we proceed to launch the database configuration wizard.

[root@walk ~]# /etc/init.d/oracle-xe configure

Oracle Database 11g Express Edition Configuration
-------------------------------------------------
This will configure on-boot properties of Oracle Database 11g Express
Edition.  The following questions will determine whether the database should
be starting upon system boot, the ports it will use, and the passwords that
will be used for database accounts.  Press <Enter> to accept the defaults.
Ctrl-C will abort.

Specify the HTTP port that will be used for Oracle Application Express [8080]:

Specify a port that will be used for the database listener [1521]:

Specify a password to be used for database accounts.  Note that the same
password will be used for SYS and SYSTEM.  Oracle recommends the use of
different passwords for each database account.  This can be done after
initial configuration:
Confirm the password:

Do you want Oracle Database 11g Express Edition to be started on boot (y/n) [y]:y

Starting Oracle Net Listener...Done
Configuring database...Done
Starting Oracle Database 11g Express Edition instance...Done
Installation completed successfully.


This takes care of creating the required oracle user and dba group along with creating and populating the /u01 directory with all the content that the DB will need.

This also installs the oracle-xe script in /etc/init.d so that the database can be controlled through init.

[root@walk ~]# ls -l /etc/init.d/oracle-xe
-rwxr-xr-x. 1 root root 19592 Aug 29  2011 /etc/init.d/oracle-xe

We can treat the DB instance as a service and view its status like any other init service:

[root@walk ~]# /etc/init.d/oracle-xe status

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 07-OCT-2017 11:18:15

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC_FOR_XE)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                07-OCT-2017 10:18:59
Uptime                    0 days 0 hr. 59 min. 16 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Default Service           XE
Listener Parameter File   /u01/app/oracle/product/11.2.0/xe/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/walk/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_XE)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=walk)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "XE" has 1 instance(s).
  Instance "XE", status READY, has 1 handler(s) for this service...
Service "XEXDB" has 1 instance(s).
  Instance "XE", status READY, has 1 handler(s) for this service...
The command completed successfully
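
If we want to check the listener status from a script rather than by eye, the output above can be parsed. The following is only a rough sketch: the function name service_ready is hypothetical, and it assumes the "Service"/"Instance" line layout shown in the output above.

```shell
# Hypothetical helper: reads listener status output on stdin and succeeds
# if the named service has at least one instance in READY state.
service_ready() {
    awk -v svc="$1" '
        # Remember which service block we are in, stripping the quotes
        $1 == "Service"  { current = $2; gsub(/"/, "", current) }
        # An Instance line under the wanted service that reports READY
        $1 == "Instance" && current == svc && /status READY/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (as root):
#   /etc/init.d/oracle-xe status | service_ready XE && echo "XE is registered and READY"
```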


Now we will switch to the oracle user, set up our environment and connect to the database via SQL*Plus.

-bash-4.1$ id -a oracle
uid=500(oracle) gid=500(dba) groups=500(dba)


-bash-4.1$ cat /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/xe
export ORACLE_SID=XE
export NLS_LANG=`$ORACLE_HOME/bin/nls_lang.sh`
export PATH=$ORACLE_HOME/bin:$PATH
-bash-4.1$ . /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh

As shown, the environment variables are defined in the script oracle_env.sh, and we need to source it for them to take effect. To avoid sourcing the script at each login, we could add these variables to the oracle user's .bash_profile file.

Now let's launch SQL*Plus.
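
One possible way to do that is sketched below. Instead of copying the individual variables, it appends a line that sources oracle_env.sh, which has the same effect. The helper name add_oracle_env is hypothetical, and the sketch assumes the oracle user's login shell is bash and reads ~/.bash_profile.

```shell
# Path of the environment script shipped with XE (taken from the output above)
ORACLE_ENV=/u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh

# Hypothetical helper: append a line sourcing oracle_env.sh to a profile
# file, unless that exact line is already present (safe to run repeatedly).
add_oracle_env() {
    profile=$1
    line=". $ORACLE_ENV"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Usage (as the oracle user):
#   add_oracle_env "$HOME/.bash_profile"
```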

-bash-4.1$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Sat Oct 7 10:27:42 2017

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

SQL>


Connecting to the instance via "/ as sysdba" is like logging in as the root user in Linux: it grants full administrative access over the instance.

Let's query the v$instance and v$database views to take a look at the instance status:

SQL> SELECT INSTANCE_NAME, STATUS, DATABASE_STATUS FROM V$INSTANCE;

INSTANCE_NAME    STATUS       DATABASE_STATUS
---------------- ------------ -----------------
XE               OPEN         ACTIVE


SQL> SELECT NAME,CREATED,LOG_MODE,OPEN_MODE FROM V$DATABASE;

NAME      CREATED            LOG_MODE     OPEN_MODE
--------- ------------------ ------------ --------------------
XE        07-OCT-17          NOARCHIVELOG READ WRITE


From a system admin's perspective, we tend to look for the pmon process to confirm whether a DB is running.

[root@walk ~]# ps -ef | grep oracle | grep pmon
oracle     4853      1  0 10:22 ?        00:00:00 xe_pmon_XE
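
The same check can be wrapped in a small function for use in monitoring scripts. This is just a sketch: the name pmon_running is hypothetical, and it assumes the process is named xe_pmon_<SID> as in the output above (other Oracle editions typically use ora_pmon_<SID>, so both are accepted).

```shell
# Hypothetical helper: reads `ps -ef` style output on stdin and succeeds
# if a pmon process for the given SID appears at the end of a line.
pmon_running() {
    sid=$1
    grep -Eq "(xe|ora)_pmon_${sid}\$"
}

# Usage:
#   ps -ef | pmon_running XE && echo "XE instance is running"
```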

Friday, 6 October 2017

Changing a Solaris 10 zone's ip type from shared to exclusive

Zones in Solaris 10 are configured with the shared IP type by default. In Solaris 11 the default IP type is exclusive, but that's a completely different story.

In Solaris 10, zones can have one of two IP types:

Shared-ip:
In this type of network setup, the zone shares a network interface or data link with the global zone. When the zone boots, a logical interface is created on top of the physical interface with the IP address we specify for the net resource in the zonecfg configuration. This logical interface exists for as long as the zone is running: it is removed when the zone halts and re-created at the next boot. As a result, the zone itself doesn't really control its networking stack.


Exclusive-ip:
In this setup, the zone is given dedicated control of a physical network interface. We set the IP address and default route from within the zone, not through the zone's configuration in zonecfg.
Here are some of the features available to the non-global zone with this method of zone networking:

  • DHCPv4 and IPv6 stateless address autoconfiguration
  • IP Filter, including network address translation (NAT) functionality
  • IP Network Multipathing (IPMP)
  • IP routing
  • ndd for setting TCP/UDP/SCTP as well as IP/ARP-level knobs
  • IP security (IPsec) 



Now let's get to the actual purpose of the article: converting a zone's network configuration from shared-ip to exclusive-ip.

So, here we have a zone configured with shared-ip networking:

root@sandbox:/# zonecfg -z auto-zone info
zonename: auto-zone
zonepath: /zones/auto-zone
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address: 192.168.87.144/24
        physical: e1000g0
        defrouter: 192.168.87.2


To modify the IP type, enter the zone configuration prompt by typing zonecfg -z <zone_name>, then type:

zonecfg:auto-zone> set ip-type=exclusive

I first tried to modify the existing net resource to make it suitable for exclusive-ip, but that didn't work.

zonecfg:auto-zone> select net address=192.168.87.144/24
zonecfg:auto-zone:net> info
net:
        address: 192.168.87.144/24
        physical: e1000g0
        defrouter: 192.168.87.2
zonecfg:auto-zone:net> remove defrouter 192.168.87.2
zonecfg:auto-zone:net> set physical=e1000g1

I couldn't get rid of the address property, so I removed the net resource and added it again.

zonecfg:auto-zone> remove net address=192.168.87.144/24
zonecfg:auto-zone> info ip-type
ip-type: exclusive


zonecfg:auto-zone> add net
zonecfg:auto-zone:net> set physical=e1000g1
zonecfg:auto-zone:net> end
zonecfg:auto-zone> info
zonename: auto-zone
zonepath: /zones/auto-zone
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address not specified
        physical: e1000g1
        defrouter not specified
zonecfg:auto-zone> verify
zonecfg:auto-zone> commit
zonecfg:auto-zone> exit
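
The interactive session above can also be replayed non-interactively, since zonecfg accepts a command file via -f. The sketch below only prints the same subcommands we typed; the function name gen_exclusive_ip_cmds and the file path in the usage note are hypothetical, so review the generated file before feeding it to zonecfg.

```shell
# Hypothetical helper: print the zonecfg subcommands used in the session
# above, for a given old shared-ip address and the NIC to dedicate.
gen_exclusive_ip_cmds() {
    old_address=$1   # address of the existing shared-ip net resource
    new_nic=$2       # NIC to hand over to the zone
    cat <<EOF
set ip-type=exclusive
remove net address=$old_address
add net
set physical=$new_nic
end
verify
commit
EOF
}

# Usage (in the global zone):
#   gen_exclusive_ip_cmds 192.168.87.144/24 e1000g1 > /tmp/auto-zone.cfg
#   zonecfg -z auto-zone -f /tmp/auto-zone.cfg
```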


To verify that the NIC e1000g1 is indeed exclusively assigned to the zone, we can use the following command:

root@sandbox:/# dladm show-linkprop
LINK         PROPERTY        VALUE          DEFAULT        POSSIBLE
e1000g0      zone            --             --             --
e1000g0      tagmode         vlanonly       vlanonly       vlanonly,normal
e1000g1      zone            auto-zone      --             --
e1000g1      tagmode         vlanonly       vlanonly       vlanonly,normal
e1000g2      zone            --             --             --
e1000g2      tagmode         vlanonly       vlanonly       vlanonly,normal
root@sandbox:/#
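
To pick out just the owning zone for one link from that table, the output can be filtered with awk. This is a minimal sketch assuming the column layout shown above (LINK, PROPERTY, VALUE, ...); the function name link_zone is hypothetical.

```shell
# Hypothetical helper: reads `dladm show-linkprop` output on stdin and
# prints the VALUE of the "zone" property for the given link.
# Prints "--" when the link is not assigned to any zone.
link_zone() {
    awk -v link="$1" '$1 == link && $2 == "zone" { print $3 }'
}

# Usage (in the global zone):
#   dladm show-linkprop | link_zone e1000g1
```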


Next, we log in to the zone and configure the IP address on the interface:

bash-3.00# ifconfig e1000g1 plumb
bash-3.00# ifconfig e1000g1 192.168.87.144 netmask 255.255.255.0 up
bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.87.144 netmask ffffff00 broadcast 192.168.87.255
        ether 0:c:29:59:30:ba
bash-3.00# route -p add default 192.168.87.2
add net default: gateway 192.168.87.2
add persistent net default: gateway 192.168.87.2

bash-3.00# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              192.168.87.2         UG        1          0
192.168.87.0         192.168.87.144       U         1          0 e1000g1
127.0.0.1            127.0.0.1            UH        5        126 lo0
bash-3.00#

Let's verify our setup by pinging the default gateway:

bash-3.00# ping 192.168.87.2
192.168.87.2 is alive

Everything appears to be in order.

Let's try to connect to the zone's IP from outside the zone.

[user.DESKTOP-4NUE93O] ➤ ssh 192.168.87.144
Warning: Permanently added '192.168.87.144' (RSA) to the list of known hosts.
user@192.168.87.144's password:

Looks good. Now let's make the IP address configuration persistent, then reboot the zone and verify.

bash-3.00# echo "192.168.87.144" > /etc/hostname.e1000g1
bash-3.00# cat /etc/hostname.e1000g1
192.168.87.144
bash-3.00# init 6
bash-3.00#
[Connection to zone 'auto-zone' pts/4 closed]
root@sandbox:/# zlogin auto-zone
[Connected to zone 'auto-zone' pts/4]
Last login: Fri Oct  6 22:11:27 on pts/4
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# bash
bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.87.144 netmask ffffff00 broadcast 192.168.87.255
        ether 0:c:29:59:30:ba
bash-3.00#
