Description of problem:

The OSP11 -> OSP12 overcloud upgrade gets stuck after starting the pre-deployment validation workflow, waiting indefinitely on a Zaqar queue:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates $THT -e $THT/environments/low-memory-usage.yaml -e $THT/environments/docker.yaml -e $THT/environments/major-upgrade-composable-steps-docker.yaml -e $THT/environments/containers-upgrade-repos.yaml -e $THT/environments/docker-osp12.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 17ba8909-34a5-4738-a169-47eafa3c7b53
Waiting for messages on queue 'e6be1d56-3d22-45d2-9e5e-e991c1e786b4' with no timeout.

(undercloud) [stack@undercloud-0 ~]$ date
Mon May 29 10:58:10 EDT 2017

(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time | project                          |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| 61252786-3df4-4fa8-b91b-868a5af148ad | overcloud  | CREATE_COMPLETE | 2017-05-29T08:38:37Z | None         | 853b4c20691043b58ab0b5fa344cf23d |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+

(undercloud) [stack@undercloud-0 ~]$ ps -aux | grep overcloud
stack  3471  0.1  0.3 359000 62840 pts/0  S+  10:49  0:00 /usr/bin/python2 /usr/bin/openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/containers-upgrade-repos.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml
stack  3908  0.0  0.0 112664   972 pts/1  R+  10:58  0:00 grep --color=auto overcloud

Version-Release number of selected component (if applicable):
OSP11 -> OSP12

How reproducible:
Always

Steps to Reproduce:

1) Deploy an OSP11 undercloud and overcloud.

2) Perform a minor update of the undercloud node:

sudo rhos-release 11 -r 7.4
sudo systemctl stop 'openstack-*' 'neutron-*' httpd
sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
openstack undercloud upgrade
sudo reboot

3) Upgrade the OSP11 undercloud to OSP12:

sudo rhos-release 12-director
sudo systemctl stop openstack-*
sudo systemctl stop neutron-*
sudo systemctl stop httpd

Apply the workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458

sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
openstack undercloud upgrade

# Overcloud upgrade
source stackrc
export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers`
sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml
sudo chown stack:stack /home/stack/overcloud_containers.yaml
sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml
sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker
sudo service docker restart
source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml
sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml
sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml
sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml

# use master repos
cat > ~/containers-upgrade-repos.yaml <<EOEF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rpm -ivh rhos-release-latest.noarch.rpm || true
    rhos-release 12-director
    yum clean all
EOEF

export THT=/usr/share/openstack-tripleo-heat-templates

# migrate to containers
openstack overcloud deploy \
  --libvirt-type qemu \
  --ntp-server clock.redhat.com \
  --control-scale 1 \
  --templates $THT \
  -e $THT/environments/low-memory-usage.yaml \
  -e $THT/environments/docker.yaml \
  -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
  -e ~/containers-upgrade-repos.yaml \
  -e ~/docker-osp12.yaml

Actual results:
The overcloud upgrade gets stuck.

Expected results:
The overcloud is successfully upgraded to the next major version.

Additional info:
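In case it helps anyone reproducing this: the commands below are only how I checked where the deploy was hanging on the undercloud, not part of the reproduction steps (the mistral log path is assumed from a default instack-undercloud install).

# as the stack user on the undercloud
source ~/stackrc
openstack stack list                          # Heat shows no update in progress (overcloud still CREATE_COMPLETE)
sudo tail -f /var/log/mistral/executor.log    # watch for Mistral/Zaqar errors while the client waits on the queue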
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
In the Mistral executor logs I found the following error:

2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 112, in run_action
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     result = action.run()
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 159, in run
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     (self.__class__.__name__, self.client_method_name, e_str)
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: <class 'zaqarclient.transport.errors.InternalServerError'>: Error response from Zaqar. Code: 500. Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <html><head>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <title>500 Internal Server Error</title>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </head><body>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <h1>Internal Server Error</h1>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>The server encountered an internal error or
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor misconfiguration and was unable to complete
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor your request.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>Please contact the server administrator at
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor [no address given] to inform them of the time this error occurred,
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor and the actions you performed just before this error.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>More information about this error may be available
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor in the server error log.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </body></html>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor .
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor

It seems that Zaqar is returning an error every time Mistral attempts to access it.
So from the apache logs:

[Wed Jun 07 09:56:16.069984 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Target WSGI script '/var/www/cgi-bin/zaqar/zaqar-server' cannot be loaded as Python module.
[Wed Jun 07 09:56:16.070035 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Exception occurred processing WSGI script '/var/www/cgi-bin/zaqar/zaqar-server'.
[Wed Jun 07 09:56:16.070054 2017] [:error] [pid 1906] [remote 192.168.24.1:144] Traceback (most recent call last):
[Wed Jun 07 09:56:16.070068 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/var/www/cgi-bin/zaqar/zaqar-server", line 41, in <module>
[Wed Jun 07 09:56:16.070091 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     log.register_options(conf)
[Wed Jun 07 09:56:16.070098 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 253, in register_options
[Wed Jun 07 09:56:16.070110 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     conf.register_cli_opts(_options.common_cli_opts)
[Wed Jun 07 09:56:16.070115 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2318, in __inner
[Wed Jun 07 09:56:16.070124 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     result = f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070129 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2540, in register_cli_opts
[Wed Jun 07 09:56:16.070136 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     self.register_cli_opt(opt, group, clear_cache=False)
[Wed Jun 07 09:56:16.070140 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2322, in __inner
[Wed Jun 07 09:56:16.070147 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     return f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070151 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2532, in register_cli_opt
[Wed Jun 07 09:56:16.070185 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Wed Jun 07 09:56:16.070201 2017] [:error] [pid 1906] [remote 192.168.24.1:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option

Restarting apache seems to fix it, but I don't know why.
After the issue reappeared, it turned out an additional step was required: moving /var/log/zaqar/zaqar.log out of the way. I haven't been able to determine why yet; perhaps a permissions error?
Workaround: restart the httpd service and delete /var/log/zaqar/zaqar.log.
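For reference, the exact sequence I used on the undercloud (my own ordering, deleting the log first so httpd recreates it):

sudo rm -f /var/log/zaqar/zaqar.log
sudo systemctl restart httpd
ls -lZ /var/log/zaqar/zaqar.log   # the recreated file gets the httpd_log_t context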
Any chance this is related to https://review.openstack.org/#/c/472618
Looks like an SELinux issue:

[root@undercloud-0 stack]# grep zaqar /var/log/audit/audit.log
type=AVC msg=audit(1499809225.125:6424): avc: denied { open } for pid=23130 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1499809391.677:6514): avc: denied { open } for pid=23563 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log
After removing the file and restarting httpd:

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:httpd_log_t:s0 /var/log/zaqar/zaqar.log
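For completeness, a way to double-check that it is the SELinux policy (and not plain file permissions) blocking httpd, using the standard audit tooling rather than grepping audit.log by hand:

# show recent AVC denials coming from httpd
sudo ausearch -m avc -c httpd -ts recent
# explain why the access was denied
sudo ausearch -m avc -c httpd -ts recent | audit2allow -w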
Lon, can you help confirm this is an SELinux bug?
Created attachment 1304472 [details]
audit.log

After undercloud upgrade:

[root@undercloud-0 stack]# rpm -qa | grep openstack-selinux
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Attaching /var/log/audit/audit.log from after the undercloud upgrade.
*** Bug 1477611 has been marked as a duplicate of this bug. ***
We need to set /var/log/zaqar/zaqar.log to a different file context and then call 'restorecon' after deployment.
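A minimal sketch of what that could look like on the undercloud, assuming httpd_log_t is the right target context (that is the context the file ends up with once httpd recreates it, per the comment above); the actual openstack-selinux fix may label it differently:

sudo semanage fcontext -a -t httpd_log_t "/var/log/zaqar(/.*)?"
sudo restorecon -Rv /var/log/zaqar
ls -lZ /var/log/zaqar/zaqar.log   # should now show httpd_log_t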
https://github.com/redhat-openstack/openstack-selinux/commit/f7297a77903fcb8963c7796e49d357cb7cce2b28

This still needs a restorecon for it to operate correctly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462