Bug 529070 - rpm packaging problems (cannot reinstall correctly)
Summary: rpm packaging problems (cannot reinstall correctly)
Status: MODIFIED
Alias: None
Product: Dogtag Certificate System
Classification: Retired
Component: CA
Version: 1.2
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Matthew Harmsen
QA Contact: Ben Levenson
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 553072 553074 553075 553076 553078 freeipa20
TreeView+ depends on / blocked
 
Reported: 2009-10-14 20:30 UTC by John Dennis
Modified: 2015-01-06 01:16 UTC (History)
5 users (show)

(edit)
Clone Of:
: 553072 553074 553075 553076 553078 (view as bug list)
(edit)
Last Closed:


Attachments (Terms of Use)
base diffs (163.88 KB, patch)
2009-12-14 21:14 UTC, Matthew Harmsen
no flags Details | Diff
dogtag diffs (5.53 KB, patch)
2009-12-14 21:15 UTC, Matthew Harmsen
no flags Details | Diff
base file inventory (682 bytes, text/plain)
2009-12-17 02:01 UTC, Matthew Harmsen
no flags Details
base diffs (172.88 KB, patch)
2009-12-17 02:02 UTC, Matthew Harmsen
no flags Details | Diff
dogtag file inventory (445 bytes, text/plain)
2009-12-17 02:03 UTC, Matthew Harmsen
no flags Details
dogtag diffs (12.66 KB, patch)
2009-12-17 02:03 UTC, Matthew Harmsen
no flags Details | Diff

Description John Dennis 2009-10-14 20:30:50 UTC
You cannot remove the pki RPM's with rpm -e, it leaves junk behind (aside from what the uninstall warns about) which prohibits a successful reinstall. Specifically the file /etc/inid.d/pki-ca is generated, it's not listed in the %files section of the rpm (even with %ghost) and is not removed when rpm -e is run. The existence of the /etc/init.d/pki-ca file is used during the %post stage as a condition to determine if the rest of the operations in /usr/bin/pkicreate should execute. If the init script is present (it will be if pki had ever been installed previously because it's never removed when the package is removed) then the pkicreate script prematurely exits leaving the system in an undefined and unworkable state.

My workaround fix was to manually remove /etc/init.d/pki-ca init script, e.g.:

rpm -e `rpm -qa pki-\*`
# If the following is not done then the subsequent yum install fails
rm /etc/init.d/pki-ca

and then do a

yum install pki-ca pki-silent

Obvious packaging problems:
---------------------------

An RPM must own (e.g. list in it's %files) every file it installs, otherwise you end up with orphaned files and the inability of rpm to validate the set of installed files. Orphaned files can create numerous other problems, one example is seen above.

Generating files during install is generally frowned upon, however it's sometimes necessary, in most circumstances those files must still be listed in the %files using a %ghost directive. If you don't do this the files are neither removed during an uninstall nor can they be validated after installation. There are rare cases where a %post generated file will need to be removed in the %preun stage.

Init scripts must be reviewable and verifiable, as such an init script must be contained in the package. rpm -V must be able to validate all init scripts. It is extremely doubtful a post install machine generated init script would ever get approved for Fedora or RHEL.

It is never permissible to modify the file permissions of an installed file from what it is in the %files section, especially to add execute permissions.

It is never permissible for a rpm install to prompt the user for input.

It is never permissible when installing an rpm to start any service as a consequence of installation.

Most of what occurs in /usr/share/pki/ca/setup/postinstall and /usr/bin/pkicreate should never occur during an rpm install.

Comment 1 Chandrasekar Kannan 2009-10-14 22:56:23 UTC
(In reply to comment #0)

> My workaround fix was to manually remove /etc/init.d/pki-ca init script, e.g.:
> 
> rpm -e `rpm -qa pki-\*`
> # If the following is not done then the subsequent yum install fails
> rm /etc/init.d/pki-ca
> 
> and then do a
> 
> yum install pki-ca pki-silent
> 

a better work around is documented here if you are interested.
http://www.redhat.com/docs/manuals/cert-system/8.0/cli/html/Create_and_Remove_Instance_Tools.html#Create_and_Remove_Instance_Tools-pkicreate

should be as simple as "export DONT_RUN_PKICREATE=1" before doing yum install pki-ca

Comment 2 John Dennis 2009-10-15 02:25:02 UTC
re comment #1

I'm not sure that solves the problem, all it does is move the problem. pkicreate still me be run successfully to completion at some point in time even if it's not part of the rpm install, there is no guarantee the instance was ever successfully created previously. The existence of files left over from the previous install causes the pkicreate script to abort prematurely without actually creating a viable instance.

Even if exporting the environment variable DONT_RUN_PKICREATE prior to an rpm install produced a working installation it's not how rpm installs and uninstalls are supposed to work.

Comment 3 Chandrasekar Kannan 2009-10-15 02:52:57 UTC
(In reply to comment #2)
> re comment #1
> 
> I'm not sure that solves the problem

Yep. That's why its called a "work around".

Comment 7 Matthew Harmsen 2009-12-14 21:14:28 UTC
Created attachment 378361 [details]
base diffs

Comment 8 Matthew Harmsen 2009-12-14 21:15:01 UTC
Created attachment 378362 [details]
dogtag diffs

Comment 9 Ade Lee 2009-12-15 20:52:12 UTC
Some initial comments on code review of pki-cad:

1. The Fedora init script page you specify says:

The start, stop, restart, force-reload, and status actions shall be supported by all init scripts; the reload and the try-restart actions are optional. Other init-script actions may be defined by the init script.  Looks like we need force-reload.

2.  I looked at the exit codes at various points in the script.  I may have missed some, but here are the cases I identified.

- pki-ca not installed - status 4 (unknown) , else 5 (not installed) 
  -- this is OK
- no instances (no registry) - status 4 else 5 (not installed) 
  -- this is OK
- script directory deleted - status 4 else 1
  -- this is OK
- user is not root - status 4 else 1  
  -- this appears to be incorrect -- should be 4 (insufficient privilege) for 
     start/stop etc. 
  -- Also, it appears one needs to be root to do a status.  Isn't this bad?
- excess arguments - according to code, returns 2 for all 
  -- This is presumably excess arguments on start/stop.
     It looks to me like this is not caught - excess arguments are simply (and 
     silently) ignored instead of returning 2.  That said, it might be nice to 
     take a list of instances to start/stop/status.
- invalid arguments - this (according to code) is supposed to return 2 
   -- this is the same as the case above. I don't believe that there are 
      invalid arguments.
- solaris not root -> status 4, else 1 (unknown) 
  -- this is OK
- unsupported os - status 4, else 1 (unknown) 
  -- this is OK
- invalid instance - status 4 else 1 (which is unknown)
  -- I wonder whether 5 is more appropriate for start/stop, just as in case of 
     no instances.
- not configured - status 4 else 6 
  -- as we discussed, I do not believe status should return 4 here - rather it 
     should return 0 or maybe an application specific code)
- server.xml not exist - status 4, else 1 (unknown) 
  -- this is OK
- invalid command -> usage ,  -status 4, else 1 (unknown)
  -- I think this is incorrect.  It should return 3 - unimplemented feature
- configured but not restarted -  status 4, else 1 (unknown)
  -- I'm going to argue for an application specific code here , rather than 
     unknown
- pidfile exists but is empty - all 1 
  -- I think 1 is ok for start/stop; 4 is better for status
- status definitions not found -  status 4, else 1 (unknown)
  --  this is OK
- instance stopped - status 7, else 3  
  -- (you actually ignore status return codes in start() - so this is OK
- dead , but pid file exists - all 1
  -- this is OK
- running and configured - 0 
  -- this is OK

many instances (start): 1 fails --> return this failure code
                more than one fails -> 1
                1 or more config (and no errors) -> 6
                no installed - 5
                only one instance -- return that instance return code

many instances (stop): 1 fails -> return that code
                       more than 1 fails : 1
                       no mention of config errors - can they occur?
                       no instances 5

many instances (status) 1 fails -> return that code
                        more than 1 fails: 4
                        no installed 4

The above behaviour for multiple instances is OK.

3. As I mentioned, it would be nice to be able to add a list of instances to start/ stop etc.

Comment 10 Matthew Harmsen 2009-12-16 00:30:20 UTC
To address issue (1) from comment #9 above:

default_error=0
command="$1"
pki_instance="$2"
case "${command}" in
    start|stop|restart|condrestart|force-restart|try-restart)
        # * 1 generic or unspecified error (current practice)
        default_error=1
        ;;
    reload)
        default_error=3
        ;;
    status)
        # * 4 program or service status is unknown
        default_error=4
        ;;
    *)
        # * 2 invalid or excess argument(s)
        default_error=2
esac
...
# See how we were called.
case "${command}" in
    start|stop|restart|status)
        ${command}
        exit $?
        ;;
    condrestart|force-restart|try-restart)
        [ ! -f ${lockfile} ] || restart
        exit $?
        ;;
    reload)
        exit ${default_error}
        ;;
    *)
        echo "Usage: ${PKI_INIT_SCRIPT} {start|stop|restart|condrestart|force-restart|try-restart|reload|status} [instance-name]"
        exit ${default_error}
esac

Comment 11 Matthew Harmsen 2009-12-16 02:35:47 UTC
To address issue (3) from comment #9 above:

After discussing this with nkinder, he confirmed that this feature is not offered by 389, and his feeling was that this would be a seldom used feature.

Consequently, I suggested that a bug could be created requesting this as a future feature enhancement.

Comment 13 Matthew Harmsen 2009-12-16 03:17:47 UTC
To address issue (2) from comment #9 above:

- user is not root - status 4 else 1  
  -- this appears to be incorrect -- should be 4 (insufficient privilege) for 
     start/stop etc.
  -- Also, it appears one needs to be root to do a status.  Isn't this bad?
Spoke with nkinder of 389 on this, and he stated that it is probably more or less optional --- however, I will try to see if I can allow "status" to be invoked as a non-root user. NOTE:  THIS WORK IS IN PROGRESS . . .


# Obtain the operating system upon which this script is being executed
OS=`uname -s`
ARCHITECTURE=""

# This script must be run as root (for all actions other than 'status')!
RV=0
if [ ${OS} = "Linux" ] ; then
        PKI_INIT_SCRIPT="/sbin/service ${PKI_PROCESS}"
        if [ "${command}" != "status" ]; then
                if [ `id -u` -ne 0 ] ; then
                        echo "Must be 'root' to execute '$0'!"
                        exit 4
                fi
        fi
        ARCHITECTURE=`uname -i`
elif [ ${OS} = "SunOS" ] ; then
        PKI_INIT_SCRIPT="/etc/init.d/${PKI_PROCESS}"
        if [ "${command}" != "status" ]; then
                if [ `/usr/xpg4/bin/id -u` -ne 0 ] ; then
                        echo "Must be 'root' to execute '$0'!"
                        exit 4
                fi
        fi
        ARCHITECTURE=`uname -p`
        if      [ "${ARCHITECTURE}" = "sparc" ] &&
                [ -d "/usr/lib/sparcv9/" ] ; then
                ARCHITECTURE="sparcv9"
        fi
else
        echo "Unsupported OS '${OS}'!"
        exit ${default_error}
fi

==============================================================================

- excess arguments - according to code, returns 2 for all 
  -- This is presumably excess arguments on start/stop.
     It looks to me like this is not caught - excess arguments are simply (and 
     silently) ignored instead of returning 2.  That said, it might be nice to 
     take a list of instances to start/stop/status.
As for the "list of instances", see Comment #11 above -- that being said, I did print out a list after the usage statement.

- invalid arguments - this (according to code) is supposed to return 2 
   -- this is the same as the case above. I don't believe that there are 
      invalid arguments.

- invalid instance - status 4 else 1 (which is unknown)
  -- I wonder whether 5 is more appropriate for start/stop, just as in case of 
     no instances.

- invalid command -> usage ,  -status 4, else 1 (unknown)
  -- I think this is incorrect.  It should return 3 - unimplemented feature


usage()
{
    echo -n "Usage: ${PKI_INIT_SCRIPT} "
    echo -n "{start"
    echo -n "|stop"
    echo -n "|restart"
    echo -n "|condrestart"
    echo -n "|force-restart"
    echo -n "|try-restart"
    echo -n "|reload"
    echo -n "|status} "
    echo -n "[instance-name]"
    echo
    echo
}

list_instances()
{
    echo
    for FILE in `/bin/ls -1 ${PKI_REGISTRY}/* 2>/dev/null`; do
        echo "    ${FILE}"
    done
    echo
}

# Check arguments
if [ $# -lt 1 ] ; then
    # * 3 unimplemented feature (for example, "reload")
    #     [insufficient arguments]
    echo "$0:  Insufficient arguments!"
    echo
    usage
    echo "where valid instance names include:"
    list_instances
    exit 3
elif [ ${default_error} -eq 2 ] ; then
    # * 2 invalid argument
    echo "$0:  Invalid arguments!"
    echo
    usage
    echo "where valid instance names include:"
    list_instances
    exit 2
elif [ $# -gt 2 ] ; then
    echo "$0:  Excess arguments!"
    echo
    usage
    echo "where valid instance names include:"
    list_instances
    if [ "${command}" != "status" ]; then
        # * 2 excess arguments
        exit 2
    else
        # * 4 program or service status is unknown
        exit 4
    fi
fi

# If an "instance" was supplied, check that it is a "valid" instance
if [ -n "${pki_instance}" ]; then
    if [ "${PKI_REGISTRY}/${pki_instance}" != "${PKI_REGISTRY_ENTRIES}" ]; then
        echo -n "${pki_instance} is an invalid '${PKI_TYPE}' instance"
        echo_failure
        echo
        if [ "${command}" != "status" ]; then
            # * 5 program is not installed
            exit 5
        else
            # * 4 program or service status is unknown
            exit 4
        fi
    fi
fi
...
# See how we were called.
case "${command}" in
    start|stop|restart|status)
        ${command}
        exit $?
        ;;
    condrestart|force-restart|try-restart)
        [ ! -f ${lockfile} ] || restart
        exit $?
        ;;
    reload)
        exit ${default_error}
        ;;
    *)
        # * 3 unimplemented feature (for example, "reload")
        #     [invalid command - should never be reached]
        echo
        usage
        echo "where valid instance names include:"
        list_instances
        exit 3
        ;;
esac

==============================================================================

- pidfile exists but is empty - all 1 
  -- I think 1 is ok for start/stop; 4 is better for status

display_instance_status()
{
    rv=0

    if [ -f ${pidfile} ] ; then
        pid=`cat ${pidfile}`
        if [ "${pid}" == "" ] ; then
            echo "${PKI_INSTANCE_ID} pid file exists but is empty"
            if [ "${command}" != "status" ]; then
                # * 1 generic or unspecified error (current practice)
                rv=1
            else
                # * 4 program or service status is unknown
                rv=4
            fi
            ...

==============================================================================

- not configured - status 4 else 6 
  -- as we discussed, I do not believe status should return 4 here - rather it 
     should return 0 or maybe an application specific code)

- configured but not restarted -  status 4, else 1 (unknown)
  -- I'm going to argue for an application specific code here , rather than 
     unknown

many instances (start): 1 fails --> return this failure code
                more than one fails -> 1
                1 or more config (and no errors) -> 6
                no installed - 5
                only one instance -- return that instance return code

In speaking with nkinder of 389 on this subject, he stated that he would steer clear of using "application specific codes" as these would not generally mean anything to anyone that would care about error code returns.

As for our handling of "configuration" errors, nkinder suggested testing "rebooting" the system to verify that other tools such as chkconfig used during the boot process would flag an "error" because of a non-zero return value (this needs to be tested and warrants some further discussion).  His belief is that "configuration" means something different to most other operating system applications.

==============================================================================

many instances (stop): 1 fails -> return that code
                       more than 1 fails : 1
                       no mention of config errors - can they occur?
                       no instances 5

In speaking with nkinder of 389, he agreed that ignoring "configuration" errors during a "stop" process seemed logically correct.

Comment 15 Ade Lee 2009-12-16 04:24:36 UTC
As we discussed, the code in pkicreate that creates an init script for a particular instance should probably be removed for linux.  Such a script - created in the location it is used, would never be permitted to run from an selinux point of view.  And its confusing for the user.

Otherwise, I'm OK with the rest of the changes in the spec files etc.

Comment 16 Matthew Harmsen 2009-12-16 18:25:36 UTC
In response to Comment #15, as discussed in IRC, nkinder of 389 has security policy that addresses this issue.

Rather than removing this, let's first try to make it work (potentially needing to add another executable under /usr/bin which is owned by the "pki-ca" RPM) since this has been provided in previous versions of the product.

Comment 17 Matthew Harmsen 2009-12-16 18:45:45 UTC
Update to Comment #13:

- user is not root - status 4 else 1
  -- this appears to be incorrect -- should be 4 (insufficient privilege) for 
     start/stop etc.
  -- Also, it appears one needs to be root to do a status.  Isn't this bad?

As stated earlier, nkinder of 389 stated that this was optional.  Also in
chats in IRC, I believe that this should not be changed at this time for the following reasons:

    * no requirement
    * adding functionality (ability for non-root user to obtain status)
      that was not provided in previous versions of the product
    * information currently returned by 'status' may be considered
      'sensitive' based upon the customer's policy
    * would probably need to create a 'pkistatus' action for the existing
      'status', and provide much less information with a new 'status'
      action, and it is unclear why this would be needed

However, I did separate and change the error reporting status of not being 'root':

# Obtain the operating system upon which this script is being executed
OS=`uname -s`
ARCHITECTURE=""

# This script must be run as root!
RV=0
if [ ${OS} = "Linux" ] ; then
    PKI_INIT_SCRIPT="/sbin/service ${PKI_PROCESS}"
    if [ `id -u` -ne 0 ] ; then
        echo "Must be 'root' to execute '$0'!"
        if [ "${command}" != "status" ]; then
            # * 4 user had insufficient privilege
            exit 4
        else
            # * 4 program or service status is unknown
            exit 4
        fi
    fi
    ARCHITECTURE=`uname -i`
elif [ ${OS} = "SunOS" ] ; then
    PKI_INIT_SCRIPT="/etc/init.d/${PKI_PROCESS}"
    if [ `/usr/xpg4/bin/id -u` -ne 0 ] ; then
        echo "Must be 'root' to execute '$0'!"
        if [ "${command}" != "status" ]; then
            # * 4 user had insufficient privilege
            exit 4
        else
            # * 4 program or service status is unknown
            exit 4
        fi
    fi
    ARCHITECTURE=`uname -p`
    if  [ "${ARCHITECTURE}" = "sparc" ] &&
        [ -d "/usr/lib/sparcv9/" ] ; then
        ARCHITECTURE="sparcv9"
    fi
else
    echo "Unsupported OS '${OS}'!"
    exit ${default_error}
fi

Comment 18 Matthew Harmsen 2009-12-17 02:01:59 UTC
Created attachment 378873 [details]
base file inventory

Comment 19 Matthew Harmsen 2009-12-17 02:02:29 UTC
Created attachment 378874 [details]
base diffs

Comment 20 Matthew Harmsen 2009-12-17 02:03:04 UTC
Created attachment 378875 [details]
dogtag file inventory

Comment 21 Matthew Harmsen 2009-12-17 02:03:39 UTC
Created attachment 378876 [details]
dogtag diffs

Comment 22 Matthew Harmsen 2009-12-17 02:04:07 UTC
Update to Comment #13:

- not configured - status 4 else 6 
  -- as we discussed, I do not believe status should return 4 here - rather it 
     should return 0 or maybe an application specific code)

- configured but not restarted -  status 4, else 1 (unknown)
  -- I'm going to argue for an application specific code here , rather than unknown

many instances (start): 1 fails --> return this failure code
                        more than one fails -> 1
                        1 or more config (and no errors) -> 6
                        no installed - 5
                        only one instance -- return that instance return code

As discussed on IRC, it is believed that "configuration" from init scripts refer more
to how other OS programs view this problem:

     * "unconfigured" instances from a PKI view point are still "running"
     * "unconfigured" instances from a non-PKI view point are considered
       to NOT be "running", and focus on the return of a non-zero error code

Consequently, it has been decided that we should return "0" for all of these cases:

    - not configured - status 0 else 0
    - configured but not restarted -  status 0, else 0
    - many instances (start): 1 or more config (and no errors) -> 0
    - many instances (status): configuration errors -> 0

NOTE:  In the case of more than one instance, we will now attempt to print
       a WARNING message about the number of unconfigured instances.

IMPORTANT:  I have attached fresh versions of the diffs (pay special attention to
            'pki-cad').  I also fixed the "usage" statement and return value in
            instant script created by pkicreate.

REMINDER:   Other than the existing changes, I have still not looked to see whether or
            not we need to update 'pki-silent' programs or templates to account for this
            new 'pki-cad' process.  We still need to test this by actually configuring
            an instance.

Comment 23 Ade Lee 2009-12-17 16:23:48 UTC
Very nice - particularly the changed text in the status (showing the number of instances to be configured).

pkisilent does not restart the instances, so it does not have to be changed.

I still think that allowing non-root to do status is valuable. Please open a bug for this and we can discuss at bug council.

One final point, when we do:

service pki-cad reload

the script returns with the correct return code, but there is no indication that this actually did nothing.  There should be some output to the user that this is unimplemented.

With this change, the code is approved.

alee +

Comment 24 Matthew Harmsen 2009-12-17 17:06:14 UTC
Added the "echo" line to 'pki-cad' reload option:

    reload)
        echo "The 'reload' action is an unimplemented feature."
        exit ${default_error}
        ;;

Filed Bugzilla Bug #548510 -  Allow non-root usage of 'status' command

Comment 25 Matthew Harmsen 2009-12-17 17:14:03 UTC
# cd pki/base
# svn status | grep -v ^$ | grep -v ^P | grep -v ^X | grep -v ^?
D       ca/setup/postinstall
M       ca/LICENSE
D       ca/shared/etc/init.d/httpd
A       ca/shared/etc/init.d/pki-cad
M       ca/build.xml
M       selinux/LICENSE
D       common/setup/postinstall
M       common/LICENSE
M       common/build.xml
M       silent/LICENSE
M       silent/templates/subca_silent.template
M       setup/pkiremove
M       setup/LICENSE
M       setup/pkicreate
M       setup/pkicommon
M       config/product.xml
M       ca/config/product.xml
M       selinux/config/product.xml
M       common/config/product.xml
M       silent/config/product.xml
M       setup/config/product.xml

# svn commit
Sending        base/ca/LICENSE
Sending        base/ca/build.xml
Deleting       base/ca/setup/postinstall
Deleting       base/ca/shared/etc/init.d/httpd
Adding         base/ca/shared/etc/init.d/pki-cad
Sending        base/common/LICENSE
Sending        base/common/build.xml
Deleting       base/common/setup/postinstall
Sending        base/config/product.xml
Sending        base/selinux/LICENSE
Sending        base/setup/LICENSE
Sending        base/setup/pkicommon
Sending        base/setup/pkicreate
Sending        base/setup/pkiremove
Sending        base/silent/LICENSE
Sending        base/silent/templates/subca_silent.template
Transmitting file data .............
Committed revision 892.

# cd pki/dogtag
# svn status | grep -v ^$ | grep -v ^P | grep -v ^X | grep -v ^?
M       ca/pki-ca.spec
M       selinux/pki-selinux.spec
M       scripts/build_ca
M       scripts/build_pki
M       common/pki-common.spec
M       silent/pki-silent.spec
M       setup/pki-setup.spec
M       ca/config/product.xml
M       selinux/config/product.xml
M       common/config/product.xml
M       silent/config/product.xml
M       setup/config/product.xml

# svn commit
Sending        dogtag/ca/pki-ca.spec
Sending        dogtag/common/pki-common.spec
Sending        dogtag/scripts/build_ca
Sending        dogtag/scripts/build_pki
Sending        dogtag/selinux/pki-selinux.spec
Sending        dogtag/setup/pki-setup.spec
Sending        dogtag/silent/pki-silent.spec
Transmitting file data .......
Committed revision 893.

Comment 26 Matthew Harmsen 2009-12-18 04:15:26 UTC
This feature has been fully documented at:

   * http://pki.fedoraproject.org/wiki/PKI_Registry


Note You need to log in before you can comment on or make changes to this bug.