Bugzilla will be upgraded to version 5.0. The upgrade date is tentatively scheduled for 2 December 2018, pending final testing and feedback.
Bug 1456562 - Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud deploy command [NEEDINFO]
Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud depl...
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux (Show other bugs)
12.0 (Pike)
x86_64 Linux
high Severity urgent
: ga
: 12.0 (Pike)
Assigned To: Lon Hohberger
Udi Shkalim
: Triaged
: 1477611 (view as bug list)
Depends On:
Blocks: 1478004
  Show dependency treegraph
 
Reported: 2017-05-29 11:09 EDT by Artem Hrechanychenko
Modified: 2018-02-05 14:07 EST (History)
19 users (show)

See Also:
Fixed In Version: openstack-selinux-0.8.8-0.20170804200925.ad96ed3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1478004 (view as bug list)
Environment:
Last Closed: 2017-12-13 16:29:42 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
dprince: needinfo? (shardy)


Attachments (Terms of Use)
audit.log (7.75 MB, text/plain)
2017-07-25 17:14 EDT, Marius Cornea
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3462 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-15 20:43:25 EST

  None (edit)
Description Artem Hrechanychenko 2017-05-29 11:09:19 EDT
Description of problem:
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com     --control-scale 1     --templates $THT     -e $THT/environments/low-memory-usage.yaml     -e $THT/environments/docker.yaml     -e $THT/environments/major-upgrade-composable-steps-docker.yaml     -e $THT/environments/containers-upgrade-repos.yaml     -e $THT/environments/docker-osp12.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 17ba8909-34a5-4738-a169-47eafa3c7b53
Waiting for messages on queue 'e6be1d56-3d22-45d2-9e5e-e991c1e786b4' with no timeout.

(undercloud) [stack@undercloud-0 ~]$ date
Mon May 29 10:58:10 EDT 2017
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
^[[A+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time | project                          |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| 61252786-3df4-4fa8-b91b-868a5af148ad | overcloud  | CREATE_COMPLETE | 2017-05-29T08:38:37Z | None         | 853b4c20691043b58ab0b5fa344cf23d |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ ps -aux |grep overcloud
stack     3471  0.1  0.3 359000 62840 pts/0    S+   10:49   0:00 /usr/bin/python2 /usr/bin/openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/containers-upgrade-repos.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml
stack     3908  0.0  0.0 112664   972 pts/1    R+   10:58   0:00 grep --color=auto overcloud


Version-Release number of selected component (if applicable):
OSP11->OSP12

How reproducible:
alway

Steps to Reproduce:
1) Deploy OSP11 Undercloud & Overcloud
2) perform minor update of undercloud node
>sudo rhos-release 11 -r 7.4
>sudo systemctl stop 'openstack-*' 'neutron-*' httpd
>sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade
>sudo reboot

3) upgrade osp11 undrcloud to osp12

>sudo rhos-release 12-director
>sudo systemctl stop openstack-*
>sudo systemctl stop neutron-*
>sudo systemctl stop httpd

Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458

>sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade

#Overcloud upgrade 
source stackrc

export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers`

 sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml
    sudo chown stack:stack /home/stack/overcloud_containers.yaml
    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml
    sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker
    sudo service docker restart
    source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml 
    
sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml
    sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml

    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml


# use master repos
cat > ~/containers-upgrade-repos.yaml <<EOEF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rpm -ivh rhos-release-latest.noarch.rpm || true
    rhos-release 12-director
    yum clean all
EOEF

export THT=/usr/share/openstack-tripleo-heat-templates

# migrate to containers
openstack overcloud deploy \
    --libvirt-type qemu \
    --ntp-server clock.redhat.com \
    --control-scale 1 \
    --templates $THT \
    -e $THT/environments/low-memory-usage.yaml \
    -e $THT/environments/docker.yaml \
    -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
    -e ~/containers-upgrade-repos.yaml \
    -e ~/docker-osp12.yaml

Actual results:
overcloud upgrade stuck

Expected results:

overcloud successfully upgraded to next major version
Additional info:
Comment 1 Red Hat Bugzilla Rules Engine 2017-05-29 11:09:24 EDT
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Comment 2 Red Hat Bugzilla Rules Engine 2017-05-29 11:14:07 EDT
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Comment 3 Dougal Matthews 2017-06-07 08:54:05 EDT
In the Mistral executor logs I found the following error:

2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 112, in run_action
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     result = action.run()
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 159, in run
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     (self.__class__.__name__, self.client_method_name, e_str)
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: <class 'zaqarclient.transport.errors.InternalServerError'>: Error response from Zaqar. Code: 500. Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <html><head>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <title>500 Internal Server Error</title>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </head><body>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <h1>Internal Server Error</h1>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>The server encountered an internal error or
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor misconfiguration and was unable to complete
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor your request.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>Please contact the server administrator at 
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  [no address given] to inform them of the time this error occurred,
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  and the actions you performed just before this error.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>More information about this error may be available
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor in the server error log.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </body></html>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor .
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor 



It seems that Zaqar is returning an error every time Mistral attempts to access it.
Comment 4 Thomas Hervé 2017-06-07 10:35:35 EDT
So from the apache logs:

[Wed Jun 07 09:56:16.069984 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Target WSGI script '/var/www/cgi-bin/zaqar/zaqar-server' cannot be loaded as Python module.
[Wed Jun 07 09:56:16.070035 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Exception occurred processing WSGI script '/var/www/cgi-bin/zaqar/zaqar-server'.
[Wed Jun 07 09:56:16.070054 2017] [:error] [pid 1906] [remote 192.168.24.1:144] Traceback (most recent call last):
[Wed Jun 07 09:56:16.070068 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/var/www/cgi-bin/zaqar/zaqar-server", line 41, in <module>
[Wed Jun 07 09:56:16.070091 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     log.register_options(conf)
[Wed Jun 07 09:56:16.070098 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 253, in register_options
[Wed Jun 07 09:56:16.070110 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     conf.register_cli_opts(_options.common_cli_opts)
[Wed Jun 07 09:56:16.070115 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2318, in __inner
[Wed Jun 07 09:56:16.070124 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     result = f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070129 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2540, in register_cli_opts
[Wed Jun 07 09:56:16.070136 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     self.register_cli_opt(opt, group, clear_cache=False)
[Wed Jun 07 09:56:16.070140 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2322, in __inner
[Wed Jun 07 09:56:16.070147 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     return f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070151 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2532, in register_cli_opt
[Wed Jun 07 09:56:16.070185 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Wed Jun 07 09:56:16.070201 2017] [:error] [pid 1906] [remote 192.168.24.1:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option


Restarting apache seems to fix it, but I don't know why.
Comment 5 Thomas Hervé 2017-06-08 15:50:25 EDT
After the issue reappeared, it seemed that an additional step was required: move /var/log/zaqar/zaqar.log. I wasn't able to know why yet, but maybe permissions error?
Comment 6 Artem Hrechanychenko 2017-06-19 11:50:30 EDT
w/a restart httpd service and delete /var/log/zaqar/zaqar.log
Comment 7 Dan Prince 2017-06-19 11:53:40 EDT
Any chance this is related to https://review.openstack.org/#/c/472618
Comment 8 Marius Cornea 2017-07-11 17:46:54 EDT
Looks like an SELinux issue:

 [root@undercloud-0 stack]# grep zaqar /var/log/audit/audit.log                                                                                                                                                                                
type=AVC msg=audit(1499809225.125:6424): avc:  denied  { open } for  pid=23130 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1499809391.677:6514): avc:  denied  { open } for  pid=23563 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file


 [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log 
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log
Comment 9 Marius Cornea 2017-07-11 17:49:36 EDT
After removing the file and restarting httpd:

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:httpd_log_t:s0 /var/log/zaqar/zaqar.log
Comment 10 Mike Orazi 2017-07-25 10:49:15 EDT
Lon,

Can you help confirm this is an selinux bug?
Comment 11 Marius Cornea 2017-07-25 17:14 EDT
Created attachment 1304472 [details]
audit.log

After undercloud upgrade:

[root@undercloud-0 stack]# rpm -qa | grep openstack-selinux
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch

[root@undercloud-0 stack]#  ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Attaching /var/log/audit/audit.log after undercloud upgrade.
Comment 12 Marius Cornea 2017-08-02 10:06:43 EDT
*** Bug 1477611 has been marked as a duplicate of this bug. ***
Comment 13 Lon Hohberger 2017-08-02 11:18:52 EDT
We need to set /var/log/zaqar/zaqar.log to a different file context and then call 'restorecon' after deployment.
Comment 14 Lon Hohberger 2017-08-02 11:55:03 EDT
https://github.com/redhat-openstack/openstack-selinux/commit/f7297a77903fcb8963c7796e49d357cb7cce2b28

This still needs a restorecon for it to operate correctly.
Comment 21 errata-xmlrpc 2017-12-13 16:29:42 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

Note You need to log in before you can comment on or make changes to this bug.