169529 – NTP rpm fails to install sometimes due to scriptlet error

Bug 169529 - NTP rpm fails to install sometimes due to scriptlet error

Summary: NTP rpm fails to install sometimes due to scriptlet error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	ntp
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Miroslav Lichvar
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2005-09-29 12:53 UTC by James Olin Oden
Modified:	2007-11-30 22:07 UTC (History)
CC List:	0 users
Fixed In Version:	RHBA-2007-0414
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-06-11 18:51:58 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0414	0	normal	SHIPPED_LIVE	ntp bug fix update	2007-06-07 21:18:50 UTC

Description James Olin Oden 2005-09-29 12:53:54 UTC

Description of problem:
I am not going into the error scenario, but am simply going to explain the 
problem.  The scriptlets in ntp do not explicitly return 0, but do have code 
that can return an error (rc > 0 in shell speak).   One of the scriptlets (%
post if I recall) do something like:

   if some condition true
   then
         do lots of stuff
   fi

The problem with this is its the last bit of code at the end of the scriptlet 
and since doesn't look like this:

   if some condition true
   then
         do lots of stuff
   fi
   return 0

In the case that the condition is not true if returns rc > 0, and it being the 
last command executed the shell returns rc > 0 to rpm.   Glorious package 
failure ensues.

This exact same scenario (with different commands) is true of %pre, %post, %
preun and %postun.

Comment 1 Petr Raszyk 2005-10-03 12:42:14 UTC

ntp.spec is interpreted by 'bash' . If you write your own bash-script,
for instance:

-------------------------------
#!/bin/sh
if test -f my_file.txt
   then
        echo File exists.
fi
return 0
-------------------------------

It does not work. The 'bash' fails and print an error message: 

./my_script: line 7: return: can only `return' from a function or sourced script

If you do not explicitly call 'exit 0'  (instead of 'return 0'),the return code
of the script is automatically 0.
The same case if you have a bash-function, and if you do not call explicitly
'return 0' in this function, the return code of this function is automatically
0.

There is a example (bash-script using a function):
-------------------------------
#!/bin/sh
my_bash_function()
   {
     if test -f my_file.txt
        then
            echo File exists.
            return 0;   # <--- NOTE: we do not call expicitly   return 0;
     fi
     # <--- NOTE: we do not call expicitly   return 0;
   }

#Here starts our 'bash-main' and we call our function twice.

my_bash_function
if test $? -ne 0
   then
        echo Return code form function is not 0
   else
        echo Return code form function is 0
fi

my_bash_function
if test $? -ne 0
   then
        echo Return code form function is not 0
   else
        echo Return code form function is 0
fi

exit 0
-------------------------------

Comment 2 Petr Raszyk 2005-10-03 12:47:12 UTC

if  "do lots of stuff" fails (exit > 0), ntp.spec 'should not continue'.

Comment 3 James Olin Oden 2005-10-03 13:55:09 UTC

My bad on the return rather than exit (which is what I meant).  Also, I am 
afraid I jumped to a conclusion because I have seen scriptlets that 
people write which don't explicitly exit with 0 when they are done, and end up 
with the return code of the last command.  Somehow I assumed a failed if would 
do this, but in reality it was not, which leads me to believe that the return 
we got was in the body of the "if" in %post of ntp.   Problem is that no error 
checking is occuring in there, and I was not the one who saw the error (just 
the output, something like scriptlet failed with no other output).

So, I am going to close this bug.  I will try to reproduce the original problem 
which must of had something to do with drift files in an upgrade context, and 
see if I can pin down which line was failing.

Wait, I just figured out what is going on.  At the end of the scriptlet before 
the "fi" you have a check to see if the daemon was running and if so you 
restart it.  Only its coded like:

   [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1

I just wrote script of the same form:

   if [ 1 = 1 ]
   then
      [ 1 = 2 ] && echo blah blah
   fi

Running this unlike my last example does cause the problem (i.e. I get a return 
code greater than zero.  

So the issue is that the ntp %post will fail in upgrade scenario where it 
detects that it needs to move the drift file, and ntp was not running.

Comment 4 Petr Raszyk 2005-10-03 15:22:01 UTC

You are right. Your example above (as script) terminates with != 0.
But if you write (after fi) the following:

echo blah blah > /dev/null

The exit code is 0. 
Please, can you put this in ntp.spec ? Can you try it in your
scenario ?
Whole script:
-----------------------
if [ 1 = 1 ]
then
   [ 1 = 2 ] && echo blah blah
fi

echo blah blah > /dev/null
-----------------------

Comment 5 James Olin Oden 2005-10-03 16:00:55 UTC

Here is the %post scriptlet in ntp:

   /sbin/chkconfig --add ntpd
   grep %{_sysconfdir}/ntp/drift %{_sysconfdir}/ntp.conf > /dev/null 2>&1
   olddrift=$?
   if [ "$1" -ge "1" -a $olddrift -eq 0 ]; then
     service ntpd status > /dev/null 2>&1
     wasrunning=$?
     # let ntp save the actual drift
     [ $wasrunning -eq 0 ] && service ntpd stop > /dev/null 2>&1
     # copy the driftfile to the new location
     [ -f %{_sysconfdir}/ntp/drift ] && cp %{_sysconfdir}/ntp/drift %
{_var}/lib/ntp/drift
     # change the path in the config file
     sed -e 's#%{_sysconfdir}/ntp/drift#%{_var}/lib/ntp/drift#g' %
{_sysconfdir}/ntp.conf > %{_sysconfdir}/ntp.conf.rpmupdate \
     && mv %{_sysconfdir}/ntp.conf.rpmupdate %{_sysconfdir}/ntp.conf
     # remove the temp file
     rm -f %{_sysconfdir}/ntp.conf.rpmupdate
     # start ntp if it was running previously
     [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1
   fi

It has no echo at the end that you are suggesting, thus if it sees it needs to 
make the change to the drift file, but the ntp service was not running the 
scriptlet will return 1 to rpm in psm.c in the function runScript(), and it 
will produce an error, thus snipping the erasure of the old ntp package, which 
is not what I believe was the intent of the person who wrote this scriptlet.  

Really if: 

     [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1

was changed to:

     [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1 || :

then the original intent would be achieved.   

I can see other problems with this scriptlet though like the following two:

     sed -e 's#%{_sysconfdir}/ntp/drift#%{_var}/lib/ntp/drift#g' %
{_sysconfdir}/ntp.conf > %{_sysconfdir}/ntp.conf.rpmupdate \
     && mv %{_sysconfdir}/ntp.conf.rpmupdate %{_sysconfdir}/ntp.conf
     # remove the temp file
     rm -f %{_sysconfdir}/ntp.conf.rpmupdate

Now this is not what this bug report is about, but if the sed edit fails for 
some reason, then the mv will not occur, but the rm will.  The person that 
wrote this script was trying to apply KISS to keep the script simple, and thus 
not much error checking has occured.  I apreciate that, but I don't want the
ntp package to fail in %post just because I have disabled ntp during an 
upgrade, which is what will happen with these last two lines:

     [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1
   fi

Furthermore, I know it actually does fail, I was just wrong originally with my 
diagnosis of why it failed (but not completely, because I suspected as I have 
seen many many times that a return code of the last command to run was being 
used as the scriptlets exit code, and indeed it was; I was only wrong about 
which actual command that was returning the error code).

Cheers...james

Comment 6 Petr Raszyk 2005-10-04 14:14:24 UTC

You are right. This is a bug in ntp* installation.
I will create a patch for this issue. I will use your
solution (it seeams better than 'echo > /dev/null' and
it costs not so much CPU).
Thanks for your investigation. I will use

 [ $wasrunning -eq 0 ] && service ntpd start > /dev/null 2>&1 || :

Comment 12 Red Hat Bugzilla 2007-06-11 18:51:58 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0414.html

Note You need to log in before you can comment on or make changes to this bug.