Sunday 8 October 2017

Fixing NTP sync issues in RHEL 6

In this article I'll walk through two distinct NTP sync issues I faced, in which servers could not properly sync their time with their NTP servers.

We observed a high offset when we checked the ntpq -p output.

I did a couple of stop-start operations on the service, followed by the troubleshooting procedure outlined in Red Hat KB articles 35640 and 64868. The logs are shared below:

[ssuri@usporiainfrar00:~]' $ sudo ntpq
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp.nova.org   .GPS.            1 u   43   64  377    0.189  309.823 175.931
*mtntime.emrets. .GPS.            1 u   44   64  377   13.401  309.833 174.006
xusporz-infrac15 10.16.64.15      3 u   43   64  377    0.208  286.078 169.984
xgblonz-infrac07 10.16.64.15      3 u   39   64  377   91.365  287.473 167.180
ntpq> as

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 57822  941a   yes   yes  none candidate    sys_peer  1
  2 57823  961a   yes   yes  none  sys.peer    sys_peer  1
  3 57824  9124   yes   yes  none falsetick   reachable  2
  4 57825  9124   yes   yes  none falsetick   reachable  2
ntpq> rv 57822
associd=57822 status=941a conf, reach, sel_candidate, 1 event, sys_peer,
srcadr=ntp.nova.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.290,
refid=GPS, reftime=dd6328c0.a7684541  Wed, Sep 13 2017  3:47:12.653,
rec=dd6328c3.fc501541  Wed, Sep 13 2017  3:47:15.985, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=29, flash=00 ok,
keyid=0, offset=504.129, delay=0.171, dispersion=1.525, jitter=183.751,
xleave=0.028,
filtdelay=     0.19    0.17    0.19    0.22    0.21    0.19    0.20    0.21,
filtoffset=  553.93  504.13  454.51  405.18  356.99  309.82  250.70  191.10,
filtdisp=      0.00    0.96    1.94    2.93    3.92    4.91    5.88    6.87
ntpq> q

[ssuri@usporiainfrar00:~]' $ sudo ntpdate -u ntp.nova.org
13 Sep 03:51:50 ntpdate[87098]: step time server 10.16.64.124 offset 0.757074 sec
[ssuri@usporiainfrar00:~]' $ sudo ntpdate -d ntp.nova.org
13 Sep 03:51:59 ntpdate[87462]: ntpdate 4.2.6p5@1.2349-o' Tue May  3 15:12:51 UTC 2016 (1)
Looking for host ntp.nova.org and service ntp
host found : ntp.nova.org
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
transmit(10.16.64.124)
receive(10.16.64.124)
server 10.16.64.124, port 123
stratum 1, precision -20, leap 00, trust 000
refid [GPS], delay 0.02580, dispersion 0.00000
transmitted 4, in filter 4
reference time:    dd6329d6.e0896c6a  Wed, Sep 13 2017  3:51:50.877
originate timestamp: dd6329df.4e7f4739  Wed, Sep 13 2017  3:51:59.306
transmit timestamp:  dd6329df.4d0d781a  Wed, Sep 13 2017  3:51:59.300
filter delay:  0.02585  0.02580  0.02580  0.02585
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.005469 0.005474 0.005467 0.005453
         0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00000
offset 0.005474

13 Sep 03:51:59 ntpdate[87462]: adjust time server 10.16.64.124 offset 0.005474 sec
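
The high offsets in the peers listing above can also be picked out automatically. What follows is only a minimal sketch, not part of the original troubleshooting: high_offset_peers is a hypothetical helper name, and the 128 ms default is an assumption chosen because it matches ntpd's default step threshold. It reads `ntpq -p` or `ntpq -pn` output on stdin and prints every peer whose absolute offset (in ms) exceeds the limit.

```shell
# Hypothetical helper: list peers whose |offset| exceeds a limit in ms.
# Reads `ntpq -p` / `ntpq -pn` output on stdin.
# Usage (not run here): sudo ntpq -pn | high_offset_peers 128
high_offset_peers() {
  awk -v limit="${1:-128}" '
    # Peer rows have 10 columns; offset is the second-to-last one.
    # Header and separator lines fail the numeric check and are skipped.
    NF >= 10 && $(NF - 1) ~ /^-?[0-9.]+$/ {
      off = $(NF - 1) + 0
      if (off < 0) off = -off
      # $1 still carries the tally character (*, +, x, ...) if one is set.
      if (off > limit + 0) print $1, $(NF - 1)
    }'
}
```

The remote name is printed exactly as ntpq shows it, including the leading tally character, so a falseticker marked with x stays recognizable.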


We also compared the hardware clock with the system time set by ntp by running the following command snippet:

s(){ printf "\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";}
s; printf '`date`:\t '; date
printf '`sudo hwclock`: '; sudo hwclock
s; printf '`sudo ntpq -pn`:\n\n'; sudo ntpq -pn
s; printf '`sudo ntpq -c as`:\n'; sudo ntpq -c as
for id in $(sudo ntpq -c as | awk '/^ ./{print $2}'); do
  s; printf "\`sudo ntpq -c \"rv $id\"\`:\n\n"; sudo ntpq -c "rv $id"
done

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`date`:  Wed Sep 13 05:23:47 UTC 2017
`sudo hwclock`: Wed 13 Sep 2017 05:23:48 AM UTC  -0.343942 seconds

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -pn`:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.16.64.124    .GPS.            1 u   14   64  377    0.160  428.657 142.242
+10.20.64.124    .GPS.            1 u   40   64  377   13.456  416.054 136.215
+10.16.64.11     10.16.64.15      3 u   30   64  377    0.214  411.056 141.797
+10.24.64.11     10.16.64.15      3 u   28   64  377   91.317  419.916 141.874

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c as`:

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 33492  961a   yes   yes  none  sys.peer    sys_peer  1
  2 33493  943a   yes   yes  none candidate    sys_peer  3
  3 33494  9414   yes   yes  none candidate   reachable  1
  4 33495  9414   yes   yes  none candidate   reachable  1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33492"`:

associd=33492 status=961a conf, reach, sel_sys.peer, 1 event, sys_peer,
srcadr=ntp.emrsn.org, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.305,
refid=GPS, reftime=dd633f51.48ffa52c  Wed, Sep 13 2017  5:23:29.285,
rec=dd633f55.f6c4f89f  Wed, Sep 13 2017  5:23:33.963, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=50, flash=00 ok,
keyid=0, offset=428.657, delay=0.160, dispersion=0.980, jitter=142.242,
xleave=0.033,
filtdelay=     0.16    0.20    0.18    0.15    0.25    0.23    0.19    0.20,
filtoffset=  428.66  397.34  365.10  333.82  302.01  269.84  237.60  205.38,
filtdisp=      0.00    1.01    2.04    3.05    4.07    5.10    6.14    7.17

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33493"`:

associd=33493 status=943a conf, reach, sel_candidate, 3 events, sys_peer,
srcadr=mtntime.emrets.net, srcport=123, dstadr=10.16.216.184, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.458,
refid=GPS, reftime=dd633f2b.eeaa5a6e  Wed, Sep 13 2017  5:22:51.932,
rec=dd633f3b.031f6da1  Wed, Sep 13 2017  5:23:07.012, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=21, flash=00 ok,
keyid=0, offset=416.054, delay=13.456, dispersion=0.947, jitter=136.215,
xleave=0.026,
filtdelay=    13.46   13.42   13.43   13.45   13.42   13.50   13.50   13.47,
filtoffset=  416.05  385.25  355.33  324.97  294.15  264.31  232.99  202.63,
filtdisp=      0.00    0.99    1.95    2.93    3.92    4.88    5.88    6.86

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33494"`:

associd=33494 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=usstlz-pinfdc15.emrsn.org, srcport=123, dstadr=10.16.216.184,
dstport=123, leap=00, stratum=3, precision=-6, rootdelay=62.500,
rootdisp=121.170, refid=10.16.64.15,
reftime=dd633ccb.357101ca  Wed, Sep 13 2017  5:12:43.208,
rec=dd633f45.f8454467  Wed, Sep 13 2017  5:23:17.969, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=32, flash=00 ok,
keyid=0, offset=411.056, delay=0.214, dispersion=16.513, jitter=141.797,
xleave=0.018,
filtdelay=     0.21    0.27    0.27    0.22    0.23    0.26    0.27    0.26,
filtoffset=  411.06  377.74  348.50  307.74  276.47  246.26  226.04  197.64,
filtdisp=     15.63   16.60   17.61   18.57   19.57   20.58   21.57   22.54

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`sudo ntpq -c "rv 33495"`:

associd=33495 status=9414 conf, reach, sel_candidate, 1 event, reachable,
srcadr=gblonz-pinfdc07.emrsn.org, srcport=123, dstadr=10.16.216.184,
dstport=123, leap=00, stratum=3, precision=-6, rootdelay=109.375,
rootdisp=122.223, refid=10.16.64.15,
reftime=dd633c4d.a4ae07e5  Wed, Sep 13 2017  5:10:37.643,
rec=dd633f47.0cce5653  Wed, Sep 13 2017  5:23:19.050, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=33, flash=00 ok,
keyid=0, offset=419.916, delay=91.317, dispersion=16.524, jitter=141.874,
xleave=0.010,
filtdelay=    91.32   91.61   91.40   91.13   91.36   92.86   91.49   91.38,
filtoffset=  419.92  388.51  354.44  316.21  298.64  270.20  226.80  195.30,
filtdisp=     15.63   16.62   17.62   18.61   19.60   20.58   21.58   22.57
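
The filtoffset values in the rv output above are the most recent offset samples for a peer, newest first (filtdisp grows with the age of a sample). A rough way to see how quickly the offset changes from one poll to the next is to print the difference between consecutive samples. This is only a sketch: offset_deltas is a hypothetical helper name, and it assumes the usual `ntpq -c "rv <assid>"` layout in which filtoffset= sits on its own line.

```shell
# Hypothetical helper: print the change in offset (ms) between consecutive
# filtoffset samples, newest pair first. Reads `ntpq -c "rv <assid>"` on stdin.
# Usage (not run here): sudo ntpq -c "rv 33492" | offset_deltas
offset_deltas() {
  awk '/filtoffset=/ {
    sub(/.*filtoffset=/, "")   # keep only the sample values
    gsub(/,/, "")              # drop the trailing comma
    for (i = 1; i < NF; i++) printf "%.2f\n", $i - $(i + 1)
  }'
}
```

Deltas that stay positive and of similar size from poll to poll suggest the offset is still growing steadily rather than settling.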


We disabled slew mode for ntp by removing the -x flag from the OPTIONS parameter in the /etc/sysconfig/ntpd file. Before the change, ntpd was running with -x:

# ps aux | grep -i ntpd
ntp 1082 0.0 0.0 26520 1980 ? Ss 03:09 0:00 ntpd -x -u ntp:ntp -p /var/run/ntpd.pid -g -4

 # grep -v "^#" /etc/sysconfig/ntpd 
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g -4" 

After that we restarted the service:

# ps -ef | grep -E ^ntp 
ntp 59539 1 0 04:48 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g -4 
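
Dropping -x from OPTIONS, as the two ps listings above show, can be scripted. The following is a minimal sketch rather than the exact edit we made: strip_slew_flag is a hypothetical helper name, it only handles -x followed by another option (not -x as the last option before the closing quote), and it writes to stdout so the result can be reviewed before it replaces /etc/sysconfig/ntpd.

```shell
# Hypothetical helper: remove the -x (slew) flag from the OPTIONS line.
# Reads the sysconfig file on stdin and prints the modified text to stdout.
# Usage (not run here): strip_slew_flag < /etc/sysconfig/ntpd
strip_slew_flag() {
  # Only touch the OPTIONS= line; match -x preceded by whitespace or the
  # opening quote and followed by whitespace.
  sed '/^OPTIONS=/ s/\([[:space:]"]\)-x[[:space:]]\{1,\}/\1/'
}
```

After the file is updated, `service ntpd restart` picks up the new options.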


After this we stopped and started the service a few times and ran a manual update against the ntp server, which finally fixed the high offset problem.

sudo ntpdate -u ntp.nova.org 
13 Sep 02:20:49 ntpdate[19667]: step time server 10.16.64.124 offset -42.674761 sec 

Note that, apart from what we troubleshot here, a high offset can also be caused by network latency, so that is worth checking as well.


In the second scenario we again faced high offsets on one of our Linux servers, but in this case the time displayed by the hardware clock was accurate.

We did a few stop, start and restart operations along with a manual update against the ntp server, but with no success.

[ssuri@usporinfradr00:~] $ sudo ntpdate -u ntpls.nova.org
  7 Oct 09:13:38 ntpdate[14262]: step time server 10.20.64.124 offset -173.793480 sec
 [ssuri@usporinfradr00:~] $ sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 *ntp.nova.org   .GPS.            1 u 4971   64    1    0.087  -173811   0.000
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u 4971   64    1    0.183  -173819   0.000
  gblonz-infrac07 10.16.64.15      3 u 4971   64    1  101.189  -173806   0.000


The system hardware clock was accurate though.

[ssuri@usporinfradr00:~] $ sudo hwclock -r
 Sat 07 Oct 2017 09:18:34 AM UTC  -0.485089 seconds
 [ssuri@usporinfradr00:~] $ date
 Sat Oct  7 09:16:05 UTC 2017


We then tried to sync the system clock from the hardware clock by executing hwclock -s, but to our surprise the hardware clock was now out of sync as well.

[ssuri@usporinfradr00:~] $ sudo hwclock -s
 [ssuri@usporinfradr00:~] $ date
 Sat Oct  7 09:20:18 UTC 2017

[root@usporinfradr00 ~]# hwclock -r
 Sat 07 Oct 2017 09:20:38 AM UTC  -0.265782 seconds

We did a manual update with our ntp server again followed by a restart of the service.

[root@usporinfradr00 ~]# sudo ntpdate -u ntpls.nova.org
  7 Oct 09:18:05 ntpdate[88723]: step time server 10.20.64.124 offset -173.991557 sec
 [root@usporinfradr00 ~]# sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
  ntpserver.nova .GPS.            1 u 4971   64   37    0.076   82.388 123032.
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u 4971   64   37    0.161   70.839 123031.
  gblonz-infrac07 10.16.64.15      3 u 4971   64   37  101.189  -173806 150569.

 [root@usporinfradr00 ~]# sudo service ntpd restart
 Shutting down ntpd: [  OK  ]

 Starting ntpd: [  OK  ]


This finally corrected the offset and brought the system time in sync with the hardware clock as well.

[root@usporinfradr00 ~]# date
 Sat Oct  7 09:18:56 UTC 2017
 [root@usporinfradr00 ~]#  hwclock -r
 Sat 07 Oct 2017 09:19:26 AM UTC  -0.031508 seconds
 [root@usporinfradr00 ~]# date
 Sat Oct  7 09:19:30 UTC 2017
 [root@usporinfradr00 ~]# sudo ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 *ntpls.nova.org .GPS.            1 u   14   64    3    0.081    9.357  32.137
  10.16.64.124    .INIT.          16 u    -   64    0    0.000    0.000   0.000
  usporz-infrac21 10.16.64.15      3 u    9   64    3    0.170   10.491  32.264
  gblonz-infrac07 10.16.64.15      3 u   10   64    3  101.359   22.452  32.468
 [root@usporinfradr00 ~]# 
 [root@usporinfradr00 ~]# date
 Sat Oct  7 09:20:21 UTC 2017
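
The reach column in the ntpq -p listings above is an octal bitmask of the last eight polls: 377 means all eight were answered, while the 1 and 3 seen right after the restarts mean only one or two polls had completed so far. A small sketch for decoding it follows; reach_polls is a hypothetical helper name, not an ntp tool.

```shell
# Hypothetical helper: count how many of the last eight polls succeeded,
# given the octal value from the reach column of `ntpq -p`.
# Usage: reach_polls 377
reach_polls() (
  n=$(printf '%d' "0$1")   # leading 0 makes printf read the value as octal
  count=0
  while [ "$n" -gt 0 ]; do
    count=$((count + (n & 1)))   # add the lowest bit
    n=$((n >> 1))                # shift to the next poll
  done
  echo "$count"
)
```

A freshly restarted ntpd needs several poll intervals before reach climbs back to 377, so offsets read immediately after a restart are based on very few samples.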
