A system engineer's notes: 2017

Thursday, 16 November 2017

How to panic a guest domain in Solaris

I recently came across a Solaris 10 guest domain in a hung state.
I accessed its console from the primary domain but I was not able to see a login prompt or anything for that matter.

I was able to ping the server but unable to log in to it via ssh.

Hence we decided to reboot the guest domain but we wanted to make sure that a crash dump was generated which could be shared with Oracle support for further analysis.

We decided to induce a kernel panic in the guest domain to ensure the generation of a crash dump on system restart.

The command used to accomplish this is ldm panic-domain.

[sahil@primary-domain-p:~] $ sudo ldm panic-domain test-domain-g
[sahil@primary-domain-p:~] $ sudo ldm list test-domain-g
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
test-domain-g active -t---- 5002 64 158G 100% 100% 157d 1h
[sahil@primary-domain-p:~] $ sudo console test-domain-g
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "test-domain-g" in group "test-domain-g" ....
Press ~? for control options ..
6:21 100% done
100% done: 1523171 pages dumped, dump succeeded
rebooting...
Resetting...
NOTICE: Entering OpenBoot.
NOTICE: Fetching Guest MD from HV.
NOTICE: Starting additional cpus.
NOTICE: Initializing LDC services.
NOTICE: Probing PCI devices.
NOTICE: Finished PCI probing.

The ldm panic-domain command ensured that a crash dump was generated when the guest domain underwent a reboot.

I hope this quick tip was helpful.

Monday, 6 November 2017

Shutdown a zone stuck in down state

While working on a patching activity I came across an issue wherein the swap file system temporarily mounted for a zone did not get unmounted properly during the installpatchset phase.

swap 295G 8K 295G 1% /zones/lab-zone/lu

From the zoneadm list output, I observed that the zone to which the above file system belonged was stuck in the down state.

usport-lab-g# zoneadm list -icv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
7 lab-zone down /zones/lab-zone native shared

Further investigation revealed that the zoneadmd process for the zone was still active.

usport-lab-g# ps -ef | grep zoneadmd
root 6962 6863 0 06:31:49 pts/1 0:00 grep zoneadmd
root 21890 1 0 06:12:45 ? 0:01 zoneadmd -z lab-zone

I forcefully terminated this process with the kill command.

usport-lab-g# kill -9 21890
usport-lab-g# ps -ef | grep zoneadmd
root 7025 6863 0 06:32:00 pts/1 0:00 grep zoneadmd

This fixed the problem and the zone was now in the installed state as I anticipated.

[ssuri@usport-lab-g:~] $ sudo zoneadm list -icv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
- lab-zone installed /zones/lab-zone native shared

I hope this quick tip was helpful for you and I thank you for reading.

Monday, 16 October 2017

Installing openssh on Ubuntu 16.04 (with/without internet access)

I recently installed Ubuntu 16.04 server edition and found that there was no ssh access available. On checking from the VM console I found out that openssh was not installed. So in this article, I'll share the steps I followed to install openssh server on my machine.

Once the VM was up, it was on the network with an IP address, so it had access to the default repositories available on the internet. In that case, simply type apt-get install ssh to install openssh-server and its dependent packages.

Once done, verify that the required service file is now available in the /etc/systemd/system directory.

Finally, start and enable the service.
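
A minimal sketch of the verification and service steps, assuming a systemd-based Ubuntu 16.04 system where the unit is named ssh and its alias link is sshd.service. The run helper is hypothetical and is not part of any package: it only prints each command unless RUN=1 is set, so the sequence can be read through before it is executed as root.

```shell
#!/bin/sh
# Hypothetical helper: print each command by default; run it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Confirm the package is installed and look for the unit alias link
# (the link name sshd.service is an assumption about the stock package).
run dpkg -s openssh-server
run ls -l /etc/systemd/system/sshd.service

# Start the service now, enable it at boot, then check its state.
run systemctl start ssh
run systemctl enable ssh
run systemctl status ssh
```

Saving this as a script and running it with RUN=1 under sudo executes the commands instead of printing them.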

That's fine and easy, but if we're behind a corporate firewall without internet access, let's see what we can do.

Mount the Ubuntu ISO on a temporary mount point like /mnt, as I've done, and then go to the /mnt/pool/main/o/openssh directory.

From here you can install the package using dpkg -i openssh-server_7.2p2-4ubuntu1_amd64.deb.

It's best to check its dependencies and install them first. To list the dependencies, type dpkg -I openssh-server_7.2p2-4ubuntu1_amd64.deb.

I hope this article has been helpful and I thank you for reading through it.

Sunday, 15 October 2017

Avoid extra typing "grep -v grep"

We frequently use grep to filter and print strings of characters that we are looking for in a file or in the output of a command.

We might be searching for a process in the output of the ps -ef command and end up with the grep command itself being displayed in the results.

For example, if I use grep to search for sshd processes in the 'ps -ef' output I get the following result:

[root@pbox6 ~]# ps -ef | grep ssh
root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd

root       2656   1823  1 10:22 ?        00:00:00 sshd: root@pts/1

root       2660   1823  0 10:22 ?        00:00:00 sshd: root [priv]

sshd       2661   2660  0 10:22 ?        00:00:00 sshd: root [net]

root       2683   2662  0 10:22 pts/1    00:00:00 grep ssh

This could be an issue if we intend to count the number of processes and use the subsequent result in a script.

An option to remove grep from the search result would be to pipe the output to "grep -v grep".


[root@pbox6 ~]# ps -ef | grep ssh | grep -v grep

root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd

root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1

root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty

root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server

But to avoid typing more than we need to, we can enclose one character of the search string, for example the first or the last, in square brackets. The brackets form a character class, so the pattern still matches ssh, but the grep command line itself now contains [s]sh rather than ssh and no longer shows up in the search result.



[root@pbox6 ~]# ps -ef | grep [s]sh

root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd

root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1

root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty

root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server

[root@pbox6 ~]#

[root@pbox6 ~]# ps -ef | grep ss[h]

root       1823      1  0 10:20 ?        00:00:00 /usr/sbin/sshd

root       2656   1823  0 10:22 ?        00:00:00 sshd: root@pts/1

root       2660   1823  0 10:22 ?        00:00:00 sshd: root@notty

root       2686   2660  0 10:22 ?        00:00:00 /usr/libexec/openssh/sftp-server
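
To see why this works without a live process table, here is a small sketch that runs both patterns over made-up ps lines. The sample text and the count_matches helper are invented for illustration. Quoting the pattern is my addition: unquoted, the shell could expand [s]sh to a file named ssh in the current directory before grep ever sees it.

```shell
#!/bin/sh
# Invented ps lines: one sshd process plus the grep command that searched for it.
plain='root 1823 1 0 10:20 ? 00:00:00 /usr/sbin/sshd
root 2683 2662 0 10:22 pts/1 00:00:00 grep ssh'

bracket='root 1823 1 0 10:20 ? 00:00:00 /usr/sbin/sshd
root 2683 2662 0 10:22 pts/1 00:00:00 grep [s]sh'

# Hypothetical helper: count the lines of text $2 that match pattern $1.
count_matches() {
    printf '%s\n' "$2" | grep -c "$1"
}

# The plain pattern also matches the grep command line itself.
count_matches 'ssh' "$plain"        # prints 2

# The regex [s]sh matches the text "ssh", but the grep command line holds
# the literal text "[s]sh", which has no "ssh" substring in it.
count_matches '[s]sh' "$bracket"    # prints 1
```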

I hope this quick tip has been helpful for you.

Saturday, 14 October 2017

Workaround for "Error getting private key" error while starting realvnc on Linux

Recently we received a user complaint that they were unable to access a server through vncviewer.
When I checked I found that the service was not running and when I attempted to start the service I got the below message:

[root@pbox bin]# /etc/init.d/vncserver start
Starting VNC server: 1:vncuser xhost: unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

Error getting private key from /var/home/vncuser/.vnc/private.key: End of stream
Underlying X server release 609000, The X.Org Foundation

error opening security policy file /usr/X11R6/lib/X11/xserver/SecurityPolicy
Could not init font path element /usr/X11R6/lib/X11/fonts/misc/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/Type1/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/75dpi/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/100dpi/, removing from list!
FreeFontPath: FPE "/usr/share/vnc/fonts/" refcount is 2, should be 1; fixing.
2:vncuser xhost: unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

Error getting private key from /var/home/vncuser/.vnc/private.key: End of stream
Underlying X server release 609000, The X.Org Foundation

error opening security policy file /usr/X11R6/lib/X11/xserver/SecurityPolicy
Could not init font path element /usr/X11R6/lib/X11/fonts/misc/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/TTF/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/Type1/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/75dpi/, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/100dpi/, removing from list!
FreeFontPath: FPE "/usr/share/vnc/fonts/" refcount is 2, should be 1; fixing.
[ OK ]

I wasn't able to query the status of the service either:

[root@pbox init.d]# service vncserver status
Xvnc dead but subsys locked

The "subsys locked" status pointed me to the /var/lock/subsys/Xvnc lock file. I removed it and restarted the service, but that did not help.

Stopping the service also failed on the first attempt.

[root@pbox ~]# service vncserver stop
Shutting down VNC server: 1:vncuser 2:vncuser [FAILED]

It did work the second time I tried, though.

[root@pbox subsys]# service vncserver stop
Shutting down VNC server: [ OK ]

When I checked the /var/home/vncuser/.vnc/private.key I found it to be empty.

[root@pbox .vnc]# ls -l private.key
-rw------- 1 vncuser vncuser 0 Oct 5 04:12 private.key

I restarted the vnc service multiple times and even installed realvnc again but that did not work. According to the documentation I found on realvnc, the private key should've been regenerated after a restart of the service or created at least when I re-installed the software but that did not happen.

I finally ended up copying the private key from a server on which realvnc was already running and started the service on the problematic server and it finally worked.

[root@pbox ~]# service vncserver restart
Shutting down VNC server: 1:vncuser 2:vncuser [ OK ]
Starting VNC server: 1:vncuser xhost: unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

If a desktop environment fails to load for this virtual desktop, please see:
http://www.realvnc.com/doclink/kb-345?version=5.3.2.19179
Running applications in /var/home/vncuser/.vnc/xstartup

VNC Server catchphrase: "Member barcode connect. Desire college gong."
signature: 88-c7-cb-1a-2c-9b-90-31

Log file is /var/home/vncuser/.vnc/pbox.dev.test.org:1.log
New desktop is pbox.dev.test.org:1 (10.22.217.69:1)
2:vncuser xhost: unable to open display ""
chmod: cannot access `/tmp/.Xauthority-vncuser': No such file or directory
VNC(R) Server 5.3.2 (r19179) x64 (Jun 6 2016 19:59:17)
Copyright (C) 2002-2016 RealVNC Ltd.
RealVNC and VNC are trademarks of RealVNC Ltd and are protected by trademark
registrations and/or pending trademark applications in the European Union,
United States of America and other jurisdictions.
Protected by UK patent 2481870; US patent 8760366.
See http://www.realvnc.com for information on VNC.
For third party acknowledgements see:
http://www.realvnc.com/products/vnc/documentation/5.3/acknowledgements.txt

If a desktop environment fails to load for this virtual desktop, please see:
http://www.realvnc.com/doclink/kb-345?version=5.3.2.19179
Running applications in /var/home/vncuser/.vnc/xstartup

VNC Server catchphrase: "Member barcode connect. Desire college gong."
signature: 88-c7-cb-1a-2c-9b-90-31

Log file is /var/home/vncuser/.vnc/pbox.dev.test.org:2.log
New desktop is pbox.dev.test.org:2 (10.22.217.69:2)
[ OK ]
[root@pbox ~]#
[root@pbox ~]#
[root@pbox ~]# ps -ef | grep [v]nc
vncuser 49576 1 0 02:05 ? 00:00:00 /usr/bin/Xvnc-core :1 -auth /var/home/vncuser/.Xauthority -pn -geometry 800x600 -nolisten tcp
root 49577 49576 0 02:05 ? 00:00:00 /usr/bin/Xvnc-realvnc -rootHelper 816219 4
vncuser 49608 1 0 02:05 ? 00:00:00 /bin/sh /etc/vnc/xstartup
vncuser 49630 49608 0 02:05 ? 00:00:00 xterm -geometry 80x24+10+10 -ls
vncuser 49632 49608 0 02:05 ? 00:00:00 twm
vncuser 49654 49630 0 02:05 pts/2 00:00:00 -bash
vncuser 49667 1 0 02:05 ? 00:00:00 /usr/bin/Xvnc-core :2 -auth /var/home/vncuser/.Xauthority -pn -geometry 1600x1200 -nolisten tcp
root 49671 49667 0 02:05 ? 00:00:00 /usr/bin/Xvnc-realvnc -rootHelper 816219 4
vncuser 49711 1 0 02:05 ? 00:00:00 /bin/sh /etc/vnc/xstartup
vncuser 49731 49711 0 02:05 ? 00:00:00 xterm -geometry 80x24+10+10 -ls
vncuser 49732 49711 0 02:05 ? 00:00:00 twm
vncuser 49734 49731 0 02:05 pts/3 00:00:00 -bash
vncuser 49753 49576 0 02:05 ? 00:00:00 /usr/bin/vncserverui virtual 13
vncuser 49799 49753 0 02:05 ? 00:00:00 /usr/bin/vncserverui -statusicon 5
vncuser 49800 49667 0 02:05 ? 00:00:00 /usr/bin/vncserverui virtual 13
vncuser 49822 49800 0 02:05 ? 00:00:00 /usr/bin/vncserverui -statusicon 5

This is definitely not ideal but a quick fix just to keep things going.

Sunday, 8 October 2017

Fixing NTP sync issues in RHEL 6

In this article I'll be exploring two distinct issues I faced with ntp sync wherein the servers were not able to properly sync their time with ntp servers.

We observed a high offset when we checked the ntpq -p output.

I did a couple of stop-start operations followed by the troubleshooting procedure outlined in Red Hat KB articles 35640 and 64868, whose logs I'm sharing below:

[ssuri@usporiainfrar00:~] $ sudo ntpq
ntpq> peers
remote refid st t when poll reach delay offset jitter
==============================================================================
+ntp.nova.org .GPS. 1 u 43 64 377 0.189 309.823 175.931
*mtntime.emrets. .GPS. 1 u 44 64 377 13.401 309.833 174.006
xusporz-infrac15 10.16.64.15 3 u 43 64 377 0.208 286.078 169.984
xgblonz-infrac07 10.16.64.15 3 u 39 64 377 91.365 287.473 167.180
ntpq> as

ind assid status conf reach auth condition last_event cnt
===========================================================
1 57822 941a yes yes none candidate sys_peer 1
2 57823 961a yes yes none sys.peer sys_peer 1
3 57824 9124 yes yes none falsetick reachable 2
4 57825 9124 yes yes none falsetick reachable 2
ntpq> rv 57822
associd=57822 status=941a conf, reach, sel_candidate, 1 event, sys_peer,
srcadr=ntp.nova.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.290,
refid=GPS, reftime=dd6328c0.a7684541 Wed, Sep 13 2017 3:47:12.653,
rec=dd6328c3.fc501541 Wed, Sep 13 2017 3:47:15.985, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=29, flash=00 ok,
keyid=0, offset=504.129, delay=0.171, dispersion=1.525, jitter=183.751,
xleave=0.028,
filtdelay= 0.19 0.17 0.19 0.22 0.21 0.19 0.20 0.21,
filtoffset= 553.93 504.13 454.51 405.18 356.99 309.82 250.70 191.10,
filtdisp= 0.00 0.96 1.94 2.93 3.92 4.91 5.88 6.87
ntpq> q

[ssuri@usporiainfrar00:~] $ sudo ntpdate -u ntp.nova.org
13 Sep 03:51:50 ntpdate[87098]: step time server 10.16.64.124 offset 0.757074 sec
[ssuri@usporiainfrar00:~] $ sudo ntpdate -d ntp.nova.org
13 Sep 03:51:59 ntpdate[87462]: ntpdate 4.2.6p5@1.2349-o Tue May 3 15:12:51 UTC 2016 (1)
Looking for host ntp.nova.org and service ntp
host found : ntp.nova.org
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
server 10.16.64.124, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.02580, dispersion 0.00000
transmitted 4, in filter 4
reference time: dd6329d6.e0896c6a Wed, Sep 13 2017 3:51:50.877
originate timestamp: dd6329df.4e7f4739 Wed, Sep 13 2017 3:51:59.306
transmit timestamp: dd6329df.4d0d781a Wed, Sep 13 2017 3:51:59.300
filter delay: 0.02585 0.02580 0.02580 0.02585
0.00000 0.00000 0.00000 0.00000
filter offset: 0.005469 0.005474 0.005467 0.005453
0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00000
offset 0.005474

13 Sep 03:51:59 ntpdate[87462]: adjust time server 10.16.64.124 offset 0.005474 sec

We also compared the hardware clock with the system time set by ntp by running the following command snippet:

s(){ printf "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";}; s; printf '`date`:\t '; date; printf '`sudo hwclock`: ';sudo hwclock; s; printf '`sudo ntpq -pn`:\n\n'; sudo ntpq -pn; s; printf '`sudo ntpq -c as`:\n'; sudo ntpq -c as; for id in $(sudo ntpq -c as | awk '/^ ./{print $2}'); do s; printf "\`sudo ntpq -c \"rv $id\"\`:\n\n"; sudo ntpq -c "rv $id"; done

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`date`: Wed Sep 13 05:23:47 UTC 2017
`sudo hwclock`: Wed 13 Sep 2017 05:23:48 AM UTC -0.343942 seconds

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -pn`:

remote refid st t when poll reach delay offset jitter
==============================================================================
*10.16.64.124 .GPS. 1 u 14 64 377 0.160 428.657 142.242
+10.20.64.124 .GPS. 1 u 40 64 377 13.456 416.054 136.215
+10.16.64.11 10.16.64.15 3 u 30 64 377 0.214 411.056 141.797
+10.24.64.11 10.16.64.15 3 u 28 64 377 91.317 419.916 141.874

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c as`:
ind assid status conf reach auth condition last_event cnt
===========================================================
1 33492 961a yes yes none sys.peer sys_peer 1
2 33493 943a yes yes none candidate sys_peer 3
3 33494 9414 yes yes none candidate reachable 1
4 33495 9414 yes yes none candidate reachable 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33492"`:

associd=33492 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,
srcadr=ntp.emrsn.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.305,
refid=GPS, reftime=dd633f51.48ffa52c Wed, Sep 13 2017 5:23:29.285,
rec=dd633f55.f6c4f89f Wed, Sep 13 2017 5:23:33.963, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=50, flash=00 ok,
keyid=0, offset=428.657, delay=0.160, dispersion=0.980, jitter=142.242,
xleave=0.033,
filtdelay= 0.16 0.20 0.18 0.15 0.25 0.23 0.19 0.20,
filtoffset= 428.66 397.34 365.10 333.82 302.01 269.84 237.60 205.38,
filtdisp= 0.00 1.01 2.04 3.05 4.07 5.10 6.14 7.17

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33493"`:

associd=33493 status=943a conf, reach, sel_candidate, 3 events, sys_peer,
srcadr=mtntime.emrets.net, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.458,
refid=GPS, reftime=dd633f2b.eeaa5a6e Wed, Sep 13 2017 5:22:51.932,
rec=dd633f3b.031f6da1 Wed, Sep 13 2017 5:23:07.012, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=21, flash=00 ok,
keyid=0, offset=416.054, delay=13.456, dispersion=0.947, jitter=136.215,
xleave=0.026,
filtdelay= 13.46 13.42 13.43 13.45 13.42 13.50 13.50 13.47,
filtoffset= 416.05 385.25 355.33 324.97 294.15 264.31 232.99 202.63,
filtdisp= 0.00 0.99 1.95 2.93 3.92 4.88 5.88 6.86

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33494"`:

associd=33494 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=usstlz-pinfdc15.emrsn.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=3, precision=-6, rootdelay=62.500, rootdisp=121.170,
refid=10.16.64.15, reftime=dd633ccb.357101ca Wed, Sep 13 2017 5:12:43.208,
rec=dd633f45.f8454467 Wed, Sep 13 2017 5:23:17.969, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=32, flash=00 ok,
keyid=0, offset=411.056, delay=0.214, dispersion=16.513, jitter=141.797,
xleave=0.018,
filtdelay= 0.21 0.27 0.27 0.22 0.23 0.26 0.27 0.26,
filtoffset= 411.06 377.74 348.50 307.74 276.47 246.26 226.04 197.64,
filtdisp= 15.63 16.60 17.61 18.57 19.57 20.58 21.57 22.54

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33495"`:

associd=33495 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=gblonz-pinfdc07.emrsn.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=3, precision=-6, rootdelay=109.375, rootdisp=122.223,
refid=10.16.64.15, reftime=dd633c4d.a4ae07e5 Wed, Sep 13 2017 5:10:37.643,
rec=dd633f47.0cce5653 Wed, Sep 13 2017 5:23:19.050, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=33, flash=00 ok,
keyid=0, offset=419.916, delay=91.317, dispersion=16.524, jitter=141.874,
xleave=0.010,
filtdelay= 91.32 91.61 91.40 91.13 91.36 92.86 91.49 91.38,
filtoffset= 419.92 388.51 354.44 316.21 298.64 270.20 226.80 195.30,
filtdisp= 15.63 16.62 17.62 18.61 19.60 20.58 21.58 22.57

We disabled slew mode for ntpd by removing the -x flag from the OPTIONS parameter in the /etc/sysconfig/ntpd file. Before the change, the daemon was running with -x:

# grep -i ntpd ps
ntp 1082 0.0 0.0 26520 1980 ? Ss 03:09 0:00 ntpd -x -u ntp:ntp -p /var/run/ntpd.pid -g -4

# grep -v "^#" /etc/sysconfig/ntpd
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g -4"

After that we restarted the service:

# ps -ef | grep -E ^ntp
ntp 59539 1 0 04:48 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g -4
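
The edit to /etc/sysconfig/ntpd amounts to dropping -x from the OPTIONS line. Here is a minimal sketch of that change, assuming -x is the first option on the line as in the grep output above. strip_slew_flag is a hypothetical helper name; it prints the modified content instead of editing the file in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print file $1 with a leading -x removed from OPTIONS.
# Lines that do not start with OPTIONS="-x are passed through unchanged.
strip_slew_flag() {
    sed 's/^\(OPTIONS="\)-x /\1/' "$1"
}
```

One way to use it would be strip_slew_flag /etc/sysconfig/ntpd > /tmp/ntpd.new, then compare the two files, copy the new one into place as root and restart ntpd.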

After this we did a stop and start of the service a few times and a manual update with the ntp server to finally fix the high offset problem.

sudo ntpdate -u ntp.nova.org
13 Sep 02:20:49 ntpdate[19667]: step time server 10.16.64.124 offset -42.674761 sec

Just a note that apart from the troubleshooting we did here to fix the high offset, this problem could also be caused by network latency so that is worth checking out.

In the second scenario we were again facing high offsets on one of our Linux servers, but in this case the time displayed by the hardware clock was accurate.

We did a few stop, start and restart operations along with a manual update with the ntp server but with no success.

[ssuri@usporinfradr00:~] $ sudo ntpdate -u ntpls.nova.org
7 Oct 09:13:38 ntpdate[14262]: step time server 10.20.64.124 offset -173.793480 sec
[ssuri@usporinfradr00:~] $ sudo ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*ntp.nova.org .GPS. 1 u 4971 64 1 0.087 -173811 0.000
10.16.64.124 .INIT. 16 u - 64 0 0.000 0.000 0.000
usporz-infrac21 10.16.64.15 3 u 4971 64 1 0.183 -173819 0.000
gblonz-infrac07 10.16.64.15 3 u 4971 64 1 101.189 -173806 0.000

The system hardware clock was accurate though.

[ssuri@usporinfradr00:~] $ sudo hwclock -r

Sat 07 Oct 2017 09:18:34 AM UTC -0.485089 seconds

[ssuri@usporinfradr00:~] $ date

Sat Oct 7 09:16:05 UTC 2017

We then tried to set the system time from the hardware clock by executing hwclock -s, but to our surprise the hardware clock was now out of sync as well.

[ssuri@usporinfradr00:~] $ sudo hwclock -s

[ssuri@usporinfradr00:~] $ date

Sat Oct 7 09:20:18 UTC 2017

[root@usporinfradr00 ~]# hwclock -r

Sat 07 Oct 2017 09:20:38 AM UTC -0.265782 seconds

We did a manual update with our ntp server again followed by a restart of the service.

[root@usporinfradr00 ~]# sudo ntpdate -u ntpls.nova.org
7 Oct 09:18:05 ntpdate[88723]: step time server 10.20.64.124 offset -173.991557 sec
[root@usporinfradr00 ~]# sudo ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
ntpserver.nova .GPS. 1 u 4971 64 37 0.076 82.388 123032.
10.16.64.124 .INIT. 16 u - 64 0 0.000 0.000 0.000
usporz-infrac21 10.16.64.15 3 u 4971 64 37 0.161 70.839 123031.
gblonz-infrac07 10.16.64.15 3 u 4971 64 37 101.189 -173806 150569.
[root@usporinfradr00 ~]# service ntpd restart
Shutting down ntpd: [ OK ]
Starting ntpd: [ OK ]

This finally corrected the offset and brought the system time in sync with the hardware clock as well.

[root@usporinfradr00 ~]# date

Sat Oct 7 09:18:56 UTC 2017

[root@usporinfradr00 ~]# hwclock -r

Sat 07 Oct 2017 09:19:26 AM UTC -0.031508 seconds

[root@usporinfradr00 ~]# date

Sat Oct 7 09:19:30 UTC 2017

[root@usporinfradr00 ~]# sudo ntpq -p

remote refid st t when poll reach delay offset jitter

==============================================================================

*ntpls.nova.org .GPS. 1 u 14 64 3 0.081 9.357 32.137

10.16.64.124 .INIT. 16 u - 64 0 0.000 0.000 0.000

usporz-infrac21 10.16.64.15 3 u 9 64 3 0.170 10.491 32.264

gblonz-infrac07 10.16.64.15 3 u 10 64 3 101.359 22.452 32.468

[root@usporinfradr00 ~]#

[root@usporinfradr00 ~]# date

Sat Oct 7 09:20:21 UTC 2017

About ping timeouts in Solaris and Linux

While writing a script for checking ping response from a couple of servers I ran into some issues while setting timeouts for the pings. I was setting a timeout of 2 or 3 seconds but the ping command was still taking much longer to time out for the unreachable hosts.

Finally I realised that this was because of the time spent on name resolution. The ping timeout came into effect only after the name resolution (the DNS query) had itself timed out.

In this article I'll demonstrate what I mentioned above.

Solaris:
The default ping timeout is 20 seconds. We can set a custom timeout by specifying it in seconds in the ping command as: ping <host> <timeout>

root@sandbox:/# time ping google 2
ping: unknown host google

real 0m21.452s
user 0m0.001s
sys 0m0.002s

In the above example the ping should ideally have timed out in just 2 seconds, but it actually took almost 22 seconds. The reason is the name resolution timeout, which runs before the ping timeout even starts.

The workaround is to use IP addresses instead of names or specify a timeout in the /etc/resolv.conf file.

Here's an example of trying to ping a non-reachable IP address instead of hostname:

root@sandbox:/# time ping 1.2.3.4

no answer from 1.2.3.4

real 0m20.002s

user 0m0.002s

sys 0m0.008s

root@sandbox:/# time ping 1.2.3.4 2

no answer from 1.2.3.4

real 0m2.002s

user 0m0.001s

sys 0m0.003s

Linux:

The same name resolution delay is encountered while specifying a timeout with -w while working on Linux.

[root@pbox6 ~]# time ping -w 1 google

ping: unknown host google

real 0m10.013s

user 0m0.001s

sys 0m0.001s

The ping should've timed out after 1 second but took 10 seconds instead.

The fix is the same as in the case of Solaris. Either use IP addresses or specify a timeout for DNS resolution in the /etc/resolv.conf file.

[root@pbox6 ~]# time ping -w 1 1.2.3.4

PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.

--- 1.2.3.4 ping statistics ---

2 packets transmitted, 0 received, 100% packet loss, time 1000ms

real 0m1.009s

user 0m0.000s

sys 0m0.007s
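
For the script that started all this, one way to keep DNS out of the picture is to ping only numeric addresses. Below is a minimal sketch for Linux, assuming the -c and -w flags used by the Linux ping shown above. is_ipv4 and check_host are hypothetical helper names, and the pattern only checks the dotted-quad shape, not that each octet is 255 or less.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 has the shape of an IPv4 address.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Hypothetical helper: ping numeric addresses only, so that no DNS lookup
# can stall the check; one probe with an overall deadline of 2 seconds.
check_host() {
    if ! is_ipv4 "$1"; then
        echo "$1 skipped (not an IP address)"
    elif ping -c 1 -w 2 "$1" >/dev/null 2>&1; then
        echo "$1 up"
    else
        echo "$1 down"
    fi
}
```

Calling check_host with a hostname returns immediately with the skipped message, while an unreachable address such as 1.2.3.4 is reported as down once the 2 second deadline passes.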

Saturday, 7 October 2017

Installing oracle XE in CentOS 6

In this article I'll describe how we can install Oracle Database 11g Express Edition (XE) on CentOS 6. The download is a zip file named oracle-xe-11.2.0-1.0.x86_64.rpm.zip.

We get started by extracting the zip file and installing the rpm.

[root@walk XE]# ls
oracle-xe-11.2.0-1.0.x86_64.rpm.zip
[root@walk XE]# unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip
Archive: oracle-xe-11.2.0-1.0.x86_64.rpm.zip
creating: Disk1/
creating: Disk1/upgrade/
inflating: Disk1/upgrade/gen_inst.sql
creating: Disk1/response/
inflating: Disk1/response/xe.rsp
inflating: Disk1/oracle-xe-11.2.0-1.0.x86_64.rpm
[root@walk XE]#

[root@walk Disk1]# rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm
Preparing... ########################################### [100%]
1:oracle-xe ########################################### [100%]
Executing post-install steps...
You must run '/etc/init.d/oracle-xe configure' as the root user to configure the database.

Next, as directed we proceed to launch the database configuration wizard.

[root@walk ~]# /etc/init.d/oracle-xe configure

Oracle Database 11g Express Edition Configuration
-------------------------------------------------
This will configure on-boot properties of Oracle Database 11g Express
Edition. The following questions will determine whether the database should
be starting upon system boot, the ports it will use, and the passwords that
will be used for database accounts. Press <Enter> to accept the defaults.
Ctrl-C will abort.

Specify the HTTP port that will be used for Oracle Application Express [8080]:

Specify a port that will be used for the database listener [1521]:

Specify a password to be used for database accounts. Note that the same
password will be used for SYS and SYSTEM. Oracle recommends the use of
different passwords for each database account. This can be done after
initial configuration:
Confirm the password:

Do you want Oracle Database 11g Express Edition to be started on boot (y/n) [y]:y

Starting Oracle Net Listener...Done
Configuring database...Done
Starting Oracle Database 11g Express Edition instance...Done
Installation completed successfully.

This takes care of creating the required oracle user and dba group along with creating and populating the /u01 directory with all the content that the DB will need.

This also installs the oracle-xe script in /etc/init.d to control the database through init.

[root@walk ~]# ls -l /etc/init.d/oracle-xe
-rwxr-xr-x. 1 root root 19592 Aug 29 2011 /etc/init.d/oracle-xe

We can treat the DB instance as a service and view its status like any other init service:

[root@walk ~]# /etc/init.d/oracle-xe status

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 07-OCT-2017 11:18:15

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC_FOR_XE)))

STATUS of the LISTENER

------------------------

Alias LISTENER

Version TNSLSNR for Linux: Version 11.2.0.2.0 - Production

Start Date 07-OCT-2017 10:18:59

Uptime 0 days 0 hr. 59 min. 16 sec

Trace Level off

Security ON: Local OS Authentication

SNMP OFF

Default Service XE

Listener Parameter File /u01/app/oracle/product/11.2.0/xe/network/admin/listener.ora

Listener Log File /u01/app/oracle/diag/tnslsnr/walk/listener/alert/log.xml

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_XE)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=walk)(PORT=1521)))

Services Summary...

Service "PLSExtProc" has 1 instance(s).

Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

Service "XE" has 1 instance(s).

Instance "XE", status READY, has 1 handler(s) for this service...

Service "XEXDB" has 1 instance(s).

Instance "XE", status READY, has 1 handler(s) for this service...

The command completed successfully

Now we will switch to the oracle user, set up our environment and connect to the database via SQL*Plus.

-bash-4.1$ id -a oracle

uid=500(oracle) gid=500(dba) groups=500(dba)

-bash-4.1$ cat /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/xe

export ORACLE_SID=XE

export NLS_LANG=`$ORACLE_HOME/bin/nls_lang.sh`

export PATH=$ORACLE_HOME/bin:$PATH

-bash-4.1$ . /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh

As shown, the environment variables are set in the script oracle_env.sh, and we need to source it for them to take effect. To avoid sourcing the script at each login, we could add these variables to the oracle user's .bash_profile file.
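
One way to do that, sketched below, is to source the script from the profile rather than copying the variables by hand (a slight variation on the idea above). add_oracle_env is a hypothetical helper name; the script path is the one shown in the listing.

```shell
# Append a line that sources oracle_env.sh to the given profile file,
# unless an identical line is already there (safe to run repeatedly).
# add_oracle_env is a hypothetical helper name.
add_oracle_env() {
    profile=$1
    env_script=/u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh
    line=". $env_script"
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Run as the oracle user, this would be `add_oracle_env ~/.bash_profile`. The variables then come from the script at every login.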

Now let's launch SQL*Plus.

-bash-4.1$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Sat Oct 7 10:27:42 2017

Connected to:

Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

SQL>

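Connecting to the instance via "/ as sysdba" uses operating system authentication: since the oracle user belongs to the dba group, no password is requested. The session gets the SYSDBA privilege, which means full administrative access over the instance, much like logging in as the root user in Linux.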

Let's query the v$instance and v$database views to take a look at the instance status:

SQL> SELECT INSTANCE_NAME, STATUS, DATABASE_STATUS FROM V$INSTANCE;

INSTANCE_NAME STATUS DATABASE_STATUS

---------------- ------------ -----------------

XE OPEN ACTIVE

SQL> SELECT NAME,CREATED,LOG_MODE,OPEN_MODE FROM V$DATABASE;

NAME CREATED LOG_MODE OPEN_MODE

--------- ------------------ ------------ --------------------

XE 07-OCT-17 NOARCHIVELOG READ WRITE
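
The same status query can be run without an interactive session, which is handy in scripts. The sketch below assumes the environment from oracle_env.sh is already loaded and that we are the oracle user; instance_status is a hypothetical function name.

```shell
# Print the instance status (for example OPEN) from a non-interactive
# SQL*Plus session. instance_status is a hypothetical helper name.
# Assumes ORACLE_HOME, ORACLE_SID and PATH are already set.
instance_status() {
    # -s runs SQL*Plus in silent mode (no banner, no prompts).
    # The quoted 'EOF' stops the shell from expanding V$INSTANCE.
    sqlplus -s / as sysdba <<'EOF' | awk 'NF { print $1; exit }'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT STATUS FROM V$INSTANCE;
EXIT
EOF
}
```

A script could then do something like `[ "$(instance_status)" = "OPEN" ] || echo "XE is not open"`.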

From a system admin's perspective, we tend to look for the pmon process to confirm whether a DB is running.

[root@walk ~]# ps -ef | grep oracle | grep pmon

oracle 4853 1 0 10:22 ? 00:00:00 xe_pmon_XE
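
For use in a script, the double grep can be replaced by an exact match on the last column of the ps output, which also keeps the grep process itself out of the result. This is a small sketch; has_process is a hypothetical helper name.

```shell
# Reads `ps -ef` output on stdin and succeeds if some process has
# exactly the given name in the last column.
# has_process is a hypothetical helper name.
has_process() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}
```

For the XE instance above this would be `ps -ef | has_process xe_pmon_XE && echo "XE is up"`.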

Friday, 6 October 2017

Changing a Solaris 10 zone's ip type from shared to exclusive

In Solaris 10, zones are configured with the shared IP type by default, whereas in Solaris 11 the default IP type is exclusive, but that's a completely different story.

In Solaris 10, a zone can have one of two IP types:

Shared-ip:
In this type of network setup the zone shares a network interface or data link with the global zone. When the zone boots, a logical interface is created on top of the physical interface with the IP address we specify for the net resource in the zonecfg configuration. This logical interface stays for as long as the zone is running, is removed when the zone halts, and is re-created at the next boot. In this way the zone itself doesn't really control its networking stack.

Exclusive-ip:
In this setup the zone is given dedicated control of a physical network interface. We set the IP address and default route from within the zone and not through the zone's configuration done via zonecfg.
Here are some of the features bestowed upon the non-global zone through this method of zone networking:

DHCPv4 and IPv6 stateless address autoconfiguration
IP Filter, including network address translation (NAT) functionality
IP Network Multipathing (IPMP)
IP routing
ndd for setting TCP/UDP/SCTP as well as IP/ARP-level knobs
IP security (IPsec)

Now, on to the actual purpose of the article: converting a zone's network configuration from shared-ip to exclusive-ip.

So, here we have a zone configured with shared-ip networking:

root@sandbox:/# zonecfg -z auto-zone info
zonename: auto-zone
zonepath: /zones/auto-zone
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
inherit-pkg-dir:
dir: /lib
inherit-pkg-dir:
dir: /platform
inherit-pkg-dir:
dir: /sbin
inherit-pkg-dir:
dir: /usr
net:
address: 192.168.87.144/24
physical: e1000g0
defrouter: 192.168.87.2

To modify the IP type, enter the configuration prompt by typing zonecfg -z <zone_name> and then type:

zonecfg:auto-zone> set ip-type=exclusive

I tried to modify the existing net resource to make it exclusive-ip but it didn't work.

zonecfg:auto-zone> select net address=192.168.87.144/24

zonecfg:auto-zone:net> info

net:

address: 192.168.87.144/24

physical: e1000g0

defrouter: 192.168.87.2

zonecfg:auto-zone:net> remove defrouter 192.168.87.2

zonecfg:auto-zone:net> set physical=e1000g1

I couldn't get rid of the address property, so I removed the net resource and added it again.

zonecfg:auto-zone> remove net address=192.168.87.144/24

zonecfg:auto-zone> info ip-type

ip-type: exclusive

zonecfg:auto-zone> add net

zonecfg:auto-zone:net> set physical=e1000g1

zonecfg:auto-zone:net> end

zonecfg:auto-zone> info

zonename: auto-zone

zonepath: /zones/auto-zone

brand: native

autoboot: false

bootargs:

pool:

limitpriv:

scheduling-class:

ip-type: exclusive

inherit-pkg-dir:

dir: /lib

inherit-pkg-dir:

dir: /platform

inherit-pkg-dir:

dir: /sbin

inherit-pkg-dir:

dir: /usr

net:

address not specified

physical: e1000g1

defrouter not specified

zonecfg:auto-zone> verify

zonecfg:auto-zone> commit

zonecfg:auto-zone> exit
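
The working sequence from the interactive session above could also be fed to zonecfg as a command file. The following is an untested sketch under that assumption: exclusive_ip_cmds is a hypothetical helper that only prints the subcommands, and the file name used afterwards is just an example.

```shell
# Print the zonecfg subcommands from the session above for a given
# old shared-ip address and the NIC to dedicate to the zone.
# exclusive_ip_cmds is a hypothetical helper name.
exclusive_ip_cmds() {
    old_addr=$1
    new_nic=$2
    cat <<EOF
set ip-type=exclusive
remove net address=$old_addr
add net
set physical=$new_nic
end
verify
commit
EOF
}
```

We could save the output with `exclusive_ip_cmds 192.168.87.144/24 e1000g1 > /tmp/auto-zone.cfg`, review the file, and then apply it with `zonecfg -z auto-zone -f /tmp/auto-zone.cfg`.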

To verify that the NIC e1000g1 is indeed exclusively assigned to the zone, we can use the following command:

root@sandbox:/# dladm show-linkprop

LINK PROPERTY VALUE DEFAULT POSSIBLE

e1000g0 zone -- -- --

e1000g0 tagmode vlanonly vlanonly vlanonly,normal

e1000g1 zone auto-zone -- --

e1000g1 tagmode vlanonly vlanonly vlanonly,normal

e1000g2 zone -- -- --

e1000g2 tagmode vlanonly vlanonly vlanonly,normal

root@sandbox:/#
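
To pick out just the zone assignment of one link from that listing, a small filter like the one below could be used. It is a sketch that assumes the column layout shown above; link_zone is a hypothetical helper name.

```shell
# Reads `dladm show-linkprop` output on stdin and prints the value of
# the "zone" property for the given link ("--" means unassigned).
# link_zone is a hypothetical helper name.
link_zone() {
    awk -v link="$1" '$1 == link && $2 == "zone" { print $3 }'
}
```

In the global zone this would be run as `dladm show-linkprop | link_zone e1000g1`.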

Next we log in to the zone and configure the IP address on the interface:

bash-3.00# ifconfig e1000g1 plumb

bash-3.00# ifconfig e1000g1 192.168.87.144 netmask 255.255.255.0 up

bash-3.00# ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

inet 192.168.87.144 netmask ffffff00 broadcast 192.168.87.255

ether 0:c:29:59:30:ba

bash-3.00# route -p add default 192.168.87.2

add net default: gateway 192.168.87.2

add persistent net default: gateway 192.168.87.2

bash-3.00# netstat -rn

Routing Table: IPv4

Destination Gateway Flags Ref Use Interface

-------------------- -------------------- ----- ----- ---------- ---------

default 192.168.87.2 UG 1 0

192.168.87.0 192.168.87.144 U 1 0 e1000g1

127.0.0.1 127.0.0.1 UH 5 126 lo0

bash-3.00#
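
If we want to read the default gateway back from the routing table in a script, a filter like this could do it. It is a sketch that assumes the netstat -rn layout shown above; default_gateway is a hypothetical helper name.

```shell
# Reads `netstat -rn` output on stdin and prints the gateway of the
# first default route, if there is one.
# default_gateway is a hypothetical helper name.
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}
```

Inside the zone (using bash) this could be combined with a ping: `gw=$(netstat -rn | default_gateway); ping "$gw"`.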

Let's verify the correctness of our setup by pinging the default router:

bash-3.00# ping 192.168.87.2

192.168.87.2 is alive

Everything appears to be in order.

Let's try to connect to the zone's IP from outside the zone.

[user.DESKTOP-4NUE93O] ➤ ssh 192.168.87.144

Warning: Permanently added '192.168.87.144' (RSA) to the list of known hosts.

user@192.168.87.144's password:

Looks good. Now let's make the IP address configuration persistent followed by a reboot and verification.

bash-3.00# echo "192.168.87.144" > /etc/hostname.e1000g1

bash-3.00# cat /etc/hostname.e1000g1

192.168.87.144

bash-3.00# init 6

bash-3.00#

[Connection to zone 'auto-zone' pts/4 closed]

root@sandbox:/# zlogin auto-zone

[Connected to zone 'auto-zone' pts/4]

Last login: Fri Oct 6 22:11:27 on pts/4

Sun Microsystems Inc. SunOS 5.10 Generic January 2005

# bash

bash-3.00# ifconfig -a

lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1

inet 127.0.0.1 netmask ff000000

e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2

inet 192.168.87.144 netmask ffffff00 broadcast 192.168.87.255

ether 0:c:29:59:30:ba

bash-3.00#
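
The persistence step can be sketched as a small function. Note the assumptions: persist_ip is a hypothetical helper, it expects bash, and the /etc/netmasks entry is an extra that I did not need in the run above because 192.168.87.0 already defaults to a 255.255.255.0 mask. It only matters for networks whose mask differs from the classful default.

```shell
# Write /etc/hostname.<nic> and, if missing, a matching /etc/netmasks
# entry below the given root directory (pass "" for the live system).
# persist_ip is a hypothetical helper name.
persist_ip() {
    root=$1 nic=$2 ip=$3 network=$4 mask=$5
    mkdir -p "$root/etc"
    # The interface file holds just the address, as in the session above.
    printf '%s\n' "$ip" > "$root/etc/hostname.$nic"
    touch "$root/etc/netmasks"
    # Add "network netmask" only if the network is not listed yet.
    if ! awk -v n="$network" '$1 == n { f = 1 } END { exit !f }' "$root/etc/netmasks"; then
        printf '%s\t%s\n' "$network" "$mask" >> "$root/etc/netmasks"
    fi
}
```

Inside the zone the call would look like `persist_ip "" e1000g1 192.168.87.144 192.168.87.0 255.255.255.0`. Trying it against a scratch directory first is a safe way to see what it writes.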
