Bug 1253432

Summary: Some upgrades are failing on RHEL 6 with an error about tomcat
Product: Red Hat Satellite
Reporter: Stephen Benjamin <stbenjam>
Component: Upgrades
Assignee: Stephen Benjamin <stbenjam>
Status: CLOSED ERRATA
QA Contact: Sachin Ghai <sghai>
Severity: high
Priority: unspecified
Version: 6.1.0
CC: bbuckingham, dgross, michael.orlov, mmccune, sghai, stbenjam, sthirugn
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
URL: http://projects.theforeman.org/issues/11353
Doc Type: Bug Fix
Last Closed: 2015-08-26 19:47:07 UTC
Type: Bug

Description Stephen Benjamin 2015-08-13 17:04:04 UTC
Description of problem:
Stopping tomcat6: waiting for processes 2787 to exit
killing 2787 which did not stop after 60 seconds           [WARNING]
                                                           [ OK  ]
Starting tomcat6:                                          [ OK  ]
Stopping httpd:                                            [ OK  ]
Starting httpd:                                            [ OK  ]
Starting foreman-tasks:                                    [ OK  ]
Some services failed: tomcat6

[ERROR 2015-08-12 12:22:53 main] Upgrade step restart_services failed. Check logs for more information.
[DEBUG 2015-08-12 12:22:53 main] Exit with status code: 1 (signal was 1)
[ERROR 2015-08-12 12:22:53 main] Repeating errors encountered during run:
[ERROR 2015-08-12 12:22:53 main] mongod is stopped
[ERROR 2015-08-12 12:22:53 main] <NilClass> nil
[ERROR 2015-08-12 12:22:53 main] httpd is stopped

Comment 1 Stephen Benjamin 2015-08-13 18:27:55 UTC
I am not able to reproduce this on my own systems, no matter what I do to tomcat. If anyone encounters this, it would be very helpful to see which step it's failing on and what state tomcat is in. This script gathers that data:

#!/bin/bash

# Time a plain stop and start of tomcat6, recording each exit code
time service tomcat6 stop
echo "Exit code: " $?

time service tomcat6 start
echo "Exit code: " $?

# maximum time to wait (in seconds)
WAIT_MAX=${WAIT_MAX:-30}
TOMCAT_PORT=${TOMCAT_PORT:-8443}
TOMCAT_SERV_PORT=${TOMCAT_SERV_PORT:-8005}
TOMCAT_TEST_URL=${TOMCAT_TEST_URL:-https://localhost:$TOMCAT_PORT/candlepin/status}

# Probe the Candlepin status URL with both wget and curl and report each exit code
wait_for_url() {
    date

    time /usr/bin/wget --timeout=1 --tries=$WAIT_MAX --retry-connrefused -qO- --no-check-certificate "$1"
    echo "wget exit code ${?}"

    time /usr/bin/curl -ks --retry $WAIT_MAX --retry-delay 1 "$1"
    # Save curl's status now; the echo commands below would overwrite $?
    CURL_RC=$?

    echo ""
    echo "curl exit code ${CURL_RC}"

    if [ "$CURL_RC" -ne 0 ]; then
        RETVAL=5
    fi
    return ${RETVAL:-0}
}

time wait_for_url $TOMCAT_TEST_URL
echo "wait_for_url exit code ${?}"

time service-wait tomcat6 restart
echo "service-wait exit code ${?}"
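When recording exit codes in a script like this, keep in mind that `$?` always holds the status of the most recently executed command, including a plain `echo`. Saving it to a variable immediately after the command of interest avoids reporting the wrong value. A minimal illustration (`report_status` is a hypothetical name used only for this example):

```shell
#!/bin/bash
# Hypothetical demo: run a command, then show why its status must be saved at once.
report_status() {
    "$@"
    local rc=$?                       # $? is expanded before 'local' runs, so rc gets the command's status
    echo ""                           # this echo resets $? to 0 ...
    echo "saved: $rc, current: $?"    # ... so $? here is the previous echo's status
}

# report_status false   # prints a blank line, then "saved: 1, current: 0"
```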

Comment 2 Dylan Gross 2015-08-13 18:48:43 UTC
The customer experiencing the issue (Case 01491540) ran the tomcat stop/start script from Comment 1. Results:

Stopping tomcat6: waiting for processes 28198 to exit
                                                           [  OK  ]

real	0m3.344s
user	0m0.311s
sys	0m0.049s
Exit code:  0
Starting tomcat6:                                          [  OK  ]

real	0m0.078s
user	0m0.018s
sys	0m0.017s
Exit code:  0
Thu Aug 13 14:38:31 EDT 2015

real	0m0.014s
user	0m0.012s
sys	0m0.002s
wget exit code 4

real	0m0.004s
user	0m0.000s
sys	0m0.002s

curl exit code 0

real	0m0.019s
user	0m0.012s
sys	0m0.004s
wait_for_url exit code 0
Stopping tomcat6: waiting for processes 28938 to exit
killing 28938 which did not stop after 60 seconds          [WARNING]
                                                           [  OK  ]
Starting tomcat6:                                          [  OK  ]

real	1m1.994s
user	0m0.496s
sys	0m0.390s
service-wait exit code 5
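In the run above, wget exited with status 4. The wget manual documents the meaning of each exit status, so a small lookup helper can make this kind of diagnostic output easier to read. This is a hypothetical helper, not part of service-wait:

```shell
#!/bin/bash
# Hypothetical helper: translate a wget exit status into the meaning
# listed in the "EXIT STATUS" section of the wget manual.
describe_wget_exit() {
    case "$1" in
        0) echo "No problems occurred" ;;
        1) echo "Generic error code" ;;
        2) echo "Parse error" ;;
        3) echo "File I/O error" ;;
        4) echo "Network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "Username/password authentication failure" ;;
        7) echo "Protocol errors" ;;
        8) echo "Server issued an error response" ;;
        *) echo "Unknown exit status: $1" ;;
    esac
}
```

For example, running `describe_wget_exit $?` right after the wget call would print "Network failure" for the run shown above.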

Comment 4 Michael Orlov 2015-08-14 12:54:51 UTC
As a workaround I commented out the tomcat6 call in /usr/sbin/service-wait.
That allows me to upgrade the Satellite.

Comment 6 Stephen Benjamin 2015-08-14 14:17:26 UTC
Created redmine issue http://projects.theforeman.org/issues/11353 from this bug

Comment 7 Stephen Benjamin 2015-08-14 14:53:14 UTC
Thanks to those who supplied the output of that script; it helped me understand what was going wrong.

There's a brief window where tomcat is listening on port 8443 but not yet responding with HTTP 200 to /candlepin/status. Sometimes we end up calling wget in that window, and it exits immediately with a failure.

In this case, wget does not honor --tries.

It's generally reproducible if you run the following as one command; you'll see that wget does NOT retry:

service tomcat6 stop; service tomcat6 start; /usr/bin/wget --timeout=1 --tries=30 --retry-connrefused -qO- --no-check-certificate https://localhost:8443/candlepin/status; echo $?
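One way to avoid depending on how wget applies --tries is to run the retry loop in the script itself and treat anything other than an HTTP 200 as "not ready yet". The following is a minimal sketch assuming bash and curl; `retry_until` and `url_returns_200` are hypothetical helper names, and this is not necessarily the change that was made upstream for Foreman issue 11353:

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or the attempt budget
# is used up. Every kind of failure (refused connection, TLS error, non-200)
# simply counts as another attempt.
retry_until() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if ((attempt < max_attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical check: succeed only when the URL answers with HTTP 200.
# -k skips certificate verification, as the diagnostic script does;
# the 5-second per-request limit is an arbitrary assumption.
url_returns_200() {
    local code
    code=$(curl -ks -o /dev/null -w '%{http_code}' --max-time 5 "$1")
    [ "$code" = "200" ]
}
```

For example, `retry_until 30 1 url_returns_200 https://localhost:8443/candlepin/status` would make up to 30 attempts, one second apart. Because the status code is checked on every attempt, the window where tomcat accepts connections on 8443 but does not yet answer 200 only costs an extra attempt instead of ending the wait.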

Comment 8 Bryan Kearney 2015-08-14 17:53:24 UTC
Upstream bug assigned to stbenjam

Comment 9 Bryan Kearney 2015-08-14 17:53:25 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/11353 has been closed

Comment 13 Sachin Ghai 2015-08-19 11:35:44 UTC
OK, I was trying to verify this BZ. I installed Satellite 6.0.8, populated some content (along with capsule/provisioning configuration), and upgraded the server with snap 17.

The upgrade completed successfully.

[root@cloud-qe-9 yum.repos.d]# katello-installer --upgrade
Upgrading...
Upgrade Step: stop_services...
Upgrade Step: start_mongo...
Upgrade Step: migrate_pulp...
Upgrade Step: start_httpd...
Upgrade Step: migrate_candlepin...
Upgrade Step: migrate_foreman...
Upgrade Step: Running installer...
Installing             Done                                               [100%] [..................................................................]
  The full log is at /var/log/katello-installer/katello-installer.log
Upgrade Step: restart_services...
Upgrade Step: db_seed...
Upgrade Step: errata_import (this may take a while) ...
Upgrade Step: update_gpg_urls (this may take a while) ...
Upgrade Step: update_repository_metadata (this may take a while) ...
Katello upgrade completed!

Comment 14 Sachin Ghai 2015-08-19 11:36:22 UTC
Is there any other way to confirm that the original issue is really fixed in the new snap?

I can see the changes in `/usr/share/katello/script/service-wait`, but I would still like to double-check. Thanks.
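Since the failure depends on timing, one informal way to gain more confidence is to repeat the restart many times and count how often it fails. A minimal sketch of a hypothetical helper (not part of Katello):

```shell
#!/bin/bash
# Hypothetical stress loop: run a command N times and print how many runs failed.
count_failures() {
    local runs=$1 failures=0 i
    shift
    for ((i = 0; i < runs; i++)); do
        # Send the command's own output to stderr so stdout carries only the count
        if ! "$@" >&2; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}
```

For example, `count_failures 20 service-wait tomcat6 restart` would restart tomcat 20 times through service-wait and report the number of non-zero exits. A count of zero does not prove the race is gone, but with the reproducer in Comment 7 failing regularly before the fix, it is a reasonable sanity check.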

Comment 17 Sachin Ghai 2015-08-19 12:05:36 UTC
Based on comments 13 and 15, moving this to VERIFIED. Thanks.

Comment 19 errata-xmlrpc 2015-08-26 19:47:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1688