Bug 510636 - [Stratus 5.5 bug] %pre/%post scripts report failure for no apparent reason
Summary: [Stratus 5.5 bug] %pre/%post scripts report failure for no apparent reason
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: anaconda
Version: 5.4
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: 5.5
Assignee: Chris Lumens
QA Contact: Alexander Todorov
URL:
Whiteboard:
Depends On:
Blocks: 533941
TreeView+ depends on / blocked
 
Reported: 2009-07-09 23:12 UTC by Rich Johnson
Modified: 2011-08-31 12:57 UTC (History)
13 users (show)

Fixed In Version: anaconda-11.1.2.198-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 08:00:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2010:0194 0 normal SHIPPED_LIVE anaconda bug fix and enhancement update 2010-03-29 12:24:02 UTC

Description Rich Johnson 2009-07-09 23:12:26 UTC
Description of problem:
anaconda log reports:
ERROR   : Error code 1 encountered running a kickstart %pre/%post script

and leaves no indication wrt:
 - the actual failure
 - the script triggering the failure
 - the triggering line in the kickstart file.


Version-Release number of selected component (if applicable):
anaconda-11.1.2.183-1

Example script:
%post 
#
# Eject all known CDs
#
# Attempt to read a block from each drive:
#   failure indicates no media, so no need to eject
#   success indicates media present, so eject it
#
# Because read is a built-in, we do the redirection on the loop's output.
#
DRIVES=/dev/scd*

for dev in ${DRIVES};
do
    read -n 512 <${dev} scratch || continue
    eject ${dev}
done > /dev/null 2>&1;
%end

 
Actual results:
- anaconda log contains the message
  ERROR   : Error code 1 encountered running a kickstart %pre/%post script

- vt3 includes the additional info:
  /tmp/ks-script-eGn27J: line 17: fg: no job control

- the script is 16 lines long
- file /tmp/ks-script-eGn27J has been removed.
- the script contains only 16 lines
- all indicates are that the script executed successfully

Expected results:
- no error reported in this case.
- anaconda log includes all script output and all anaconda output, including stderr
- if an error does occur, then either:
  a) the script should be left in situ so the situation can be investigated from vt2.
  b) the line# in the kickstart file is reported

Additional information:
This symptom is observed for scripts which operated correctly when installing RHEL 5.2

Comment 1 Chris Lumens 2009-11-12 15:16:26 UTC
This seems reasonable, and should definitely be done on master as well as rhel5-branch.  I'll work on a patch and send a link when I've got something.

Note that line 17 is probably a blank line getting added to the end of the script when we write it out to its own file.  You can also add --logfile=<whatever> to your %pre script and have the real error message written there.  For instance:

%pre --logfile=/tmp/junk
blargle --whatever 2>&1
%end

# cat /tmp/junk
/tmp/ks-script-f6GhmE: line 1: blargle: command not found

From the code on the master branch at least, it appears we do write stdout/stderr to /dev/tty3 if you do not specify --logfile.  However since /tmp/anaconda.log is streamed to /dev/tty3 as well, the scriptlet output scrolls off very quickly and is not preserved anywhere.

So we've got most of what you're asking for already, it's just not working right.

Comment 2 Chris Lumens 2009-11-12 16:31:55 UTC
Here's a thread for talking about the patch to master:

    https://www.redhat.com/archives/anaconda-devel-list/2009-November/msg00253.html

Once we come to a decision there, I can work on porting to rhel5.

Comment 3 Andrius Benokraitis 2009-11-12 17:09:26 UTC
Chris - thanks for the feedback - please let me know if there is anything you need from Stratus in the meantime.

Comment 4 RHEL Program Management 2009-11-16 14:51:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Chris Ward 2009-11-18 09:12:50 UTC
@Status

We need to confirm that there is third-party commitment to 
test for the resolution of this request during the RHEL 5.5 
Beta Test Phase before we can approve it for acceptance 
into the release.

RHEL 5.5 Beta Test Phase is expected to begin around February
2010.

In order to avoid any unnecessary delays, please post a 
confirmation as soon as possible, including the contact 
information for testing engineers.

Any additional information about alternative testing variations we 
could use to reproduce this issue in-house would be appreciated.

Comment 6 Rich Johnson 2009-11-18 11:23:19 UTC
I am signed up to test this.  Tests will include:

- the case presented above
- forced internal failure.
- internal redirection to file with non-existent path.
- 4 script classes  %pre, %post, %pre --nochroot, %post --nochroot
- --logfile=<some file>
- --logfile=<some bad path>

I assume that "--logfile=<somefile>" and "--nochroot --logfile=/mnt/sysimage/<somefile>" 
 would be equivalent?

Comment 8 Chris Lumens 2009-11-18 14:52:10 UTC
> - --logfile=<some bad path>

This looks like it's going to result in a traceback, but then it always would have resulted in a traceback.  I'll work on getting something added to my patch to take care of this case, but I'm not sure what the proper behavior should be.

> I assume that "--logfile=<somefile>" and "--nochroot
> --logfile=/mnt/sysimage/<somefile>" 
>  would be equivalent?  

Yes, they should result in the log file being written to the same location.

Comment 11 Chris Ward 2010-02-11 10:11:34 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 12 Rich Johnson 2010-02-18 17:49:06 UTC
Test results:  The cause of the failure report is now apparent

Additional info:
 - most of the scriptlets are left intact in /tmp along with a log (unless the --logfile option's been used to rename the log) 
 - of my 8 scriptlets.  Both %pre and 3 of the 6 %post scripts remained.  It is unclear why the others were removed...N.B. the example given above is one of those that was removed.
 - each line a scriptlet writes to stdout is reported in /tmp/anaconda.log as an ERROR

 - I suspect there might be a bug in the parser used to extract the scriptlets.
 - The terminal %end from each %pre/post scriptlet is included in the scriptlet files.
 - interstitial commentary and white space (between %end and the following %post/%pre) are also appended to the scriptlet file.
 - The "ERROR   : Error code 1 encountered running a kickstart %pre/%post script" message is generated when the interpreter (/bin/sh) trips over that "%end"
 
Avoidances:
1.  eliminate %end - I don't like this option because it is recommended usage according to the online anaconda docs
2.  insert an explicit "exit 0" before each %end.

N.B.:
- anaconda blows up if a %end is used to terminate the %packages list--also a practice recommended by the online anaconda docs.

Comment 13 Chris Lumens 2010-02-18 18:12:22 UTC
>  - of my 8 scriptlets.  Both %pre and 3 of the 6 %post scripts remained.  It is
> unclear why the others were removed...N.B. the example given above is one of
> those that was removed.

This is strange.  There's no longer an os.unlink call after scripts have been run.  I'd be curious to know what the removed scripts have in common.

>  - I suspect there might be a bug in the parser used to extract the scriptlets.
>  - The terminal %end from each %pre/post scriptlet is included in the scriptlet
> files.
>  - interstitial commentary and white space (between %end and the following
> %post/%pre) are also appended to the scriptlet file.
>  - The "ERROR   : Error code 1 encountered running a kickstart %pre/%post
> script" message is generated when the interpreter (/bin/sh) trips over that
> "%end"
> 
> Avoidances:
> 1.  eliminate %end - I don't like this option because it is recommended usage
> according to the online anaconda docs
> 2.  insert an explicit "exit 0" before each %end.
> 
> N.B.:
> - anaconda blows up if a %end is used to terminate the %packages list--also a
> practice recommended by the online anaconda docs.    

The kickstart docs on the Fedora wiki (which is what I assume you mean by "online anaconda docs") are for the absolute latest version of kickstart syntax only.  %end was introduced after RHEL5, so you should remove it.  In RHEL5, scripts and the packages section have no terminating token - they are only terminated by the start of another section, or by the end of the file.  I suspect removing your %ends will clear up all these issues.

Comment 14 Rich Johnson 2010-02-18 20:13:33 UTC
(In reply to comment #13)
[...snip...]
> The kickstart docs on the Fedora wiki (which is what I assume you mean by
> "online anaconda docs") are for the absolute latest version of kickstart syntax
> only.  %end was introduced after RHEL5, so you should remove it.  In RHEL5,
> scripts and the packages section have no terminating token - they are only
> terminated by the start of another section, or by the end of the file.  I
> suspect removing your %ends will clear up all these issues.    

Ah, version skew.  One of the pitfalls of google.  

And yes, removing the %ends eliminates the failure reports.

Comment 15 Rich Johnson 2010-02-18 20:19:55 UTC
(In reply to comment #13)
> >  - of my 8 scriptlets.  Both %pre and 3 of the 6 %post scripts remained.  It is
> > unclear why the others were removed...N.B. the example given above is one of
> > those that was removed.
> 
> This is strange.  There's no longer an os.unlink call after scripts have been
> run.  I'd be curious to know what the removed scripts have in common.
> 
[...snip...]

Found them!--it's all about the chroot.

The %pre scripts and the three --nochroot'd %post scripts reside in /tmp.
The three chroot'd %post scripts reside in /mnt/sysimage/tmp.  Not quite what I expected, but in hindsight it makes sense.

Comment 16 Alexander Todorov 2010-02-22 18:04:31 UTC
Hi Rich,
it looks like your test results wore successful. Do you consider this issue fixed or not?

Thanks.

Comment 17 Rich Johnson 2010-02-22 18:17:49 UTC
Stratus considers this fixed.

Comment 18 Alexander Todorov 2010-02-23 09:43:11 UTC
VERIFIED, see comments #15 and #17

Comment 21 errata-xmlrpc 2010-03-30 08:00:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0194.html


Note You need to log in before you can comment on or make changes to this bug.