Bug 1456562 - Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud deploy command
Summary: Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud depl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ga
: 12.0 (Pike)
Assignee: Lon Hohberger
QA Contact: Udi Shkalim
URL:
Whiteboard:
: 1477611 (view as bug list)
Depends On:
Blocks: 1478004
TreeView+ depends on / blocked
 
Reported: 2017-05-29 15:09 UTC by Artem Hrechanychenko
Modified: 2023-09-14 03:58 UTC (History)
19 users (show)

Fixed In Version: openstack-selinux-0.8.8-0.20170804200925.ad96ed3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1478004 (view as bug list)
Environment:
Last Closed: 2017-12-13 21:29:42 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
audit.log (7.75 MB, text/plain)
2017-07-25 21:14 UTC, Marius Cornea
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Artem Hrechanychenko 2017-05-29 15:09:19 UTC
Description of problem:
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com     --control-scale 1     --templates $THT     -e $THT/environments/low-memory-usage.yaml     -e $THT/environments/docker.yaml     -e $THT/environments/major-upgrade-composable-steps-docker.yaml     -e $THT/environments/containers-upgrade-repos.yaml     -e $THT/environments/docker-osp12.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 17ba8909-34a5-4738-a169-47eafa3c7b53
Waiting for messages on queue 'e6be1d56-3d22-45d2-9e5e-e991c1e786b4' with no timeout.

(undercloud) [stack@undercloud-0 ~]$ date
Mon May 29 10:58:10 EDT 2017
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
^[[A+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time | project                          |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| 61252786-3df4-4fa8-b91b-868a5af148ad | overcloud  | CREATE_COMPLETE | 2017-05-29T08:38:37Z | None         | 853b4c20691043b58ab0b5fa344cf23d |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ ps -aux |grep overcloud
stack     3471  0.1  0.3 359000 62840 pts/0    S+   10:49   0:00 /usr/bin/python2 /usr/bin/openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/containers-upgrade-repos.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml
stack     3908  0.0  0.0 112664   972 pts/1    R+   10:58   0:00 grep --color=auto overcloud


Version-Release number of selected component (if applicable):
OSP11->OSP12

How reproducible:
alway

Steps to Reproduce:
1) Deploy OSP11 Undercloud & Overcloud
2) perform minor update of undercloud node
>sudo rhos-release 11 -r 7.4
>sudo systemctl stop 'openstack-*' 'neutron-*' httpd
>sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade
>sudo reboot

3) upgrade osp11 undrcloud to osp12

>sudo rhos-release 12-director
>sudo systemctl stop openstack-*
>sudo systemctl stop neutron-*
>sudo systemctl stop httpd

Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458

>sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade

#Overcloud upgrade 
source stackrc

export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers`

 sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml
    sudo chown stack:stack /home/stack/overcloud_containers.yaml
    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml
    sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker
    sudo service docker restart
    source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml 
    
sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml
    sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml

    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml


# use master repos
cat > ~/containers-upgrade-repos.yaml <<EOEF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rpm -ivh rhos-release-latest.noarch.rpm || true
    rhos-release 12-director
    yum clean all
EOEF

export THT=/usr/share/openstack-tripleo-heat-templates

# migrate to containers
openstack overcloud deploy \
    --libvirt-type qemu \
    --ntp-server clock.redhat.com \
    --control-scale 1 \
    --templates $THT \
    -e $THT/environments/low-memory-usage.yaml \
    -e $THT/environments/docker.yaml \
    -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
    -e ~/containers-upgrade-repos.yaml \
    -e ~/docker-osp12.yaml

Actual results:
overcloud upgrade stuck

Expected results:

overcloud successfully upgraded to next major version
Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-29 15:09:24 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Red Hat Bugzilla Rules Engine 2017-05-29 15:14:07 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 3 Dougal Matthews 2017-06-07 12:54:05 UTC
In the Mistral executor logs I found the following error:

2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 112, in run_action
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     result = action.run()
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 159, in run
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     (self.__class__.__name__, self.client_method_name, e_str)
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: <class 'zaqarclient.transport.errors.InternalServerError'>: Error response from Zaqar. Code: 500. Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <html><head>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <title>500 Internal Server Error</title>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </head><body>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <h1>Internal Server Error</h1>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>The server encountered an internal error or
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor misconfiguration and was unable to complete
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor your request.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>Please contact the server administrator at 
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  [no address given] to inform them of the time this error occurred,
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  and the actions you performed just before this error.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>More information about this error may be available
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor in the server error log.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </body></html>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor .
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor 



It seems that Zaqar is returning an error every time Mistral attempts to access it.

Comment 4 Thomas Hervé 2017-06-07 14:35:35 UTC
So from the apache logs:

[Wed Jun 07 09:56:16.069984 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Target WSGI script '/var/www/cgi-bin/zaqar/zaqar-server' cannot be loaded as Python module.
[Wed Jun 07 09:56:16.070035 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Exception occurred processing WSGI script '/var/www/cgi-bin/zaqar/zaqar-server'.
[Wed Jun 07 09:56:16.070054 2017] [:error] [pid 1906] [remote 192.168.24.1:144] Traceback (most recent call last):
[Wed Jun 07 09:56:16.070068 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/var/www/cgi-bin/zaqar/zaqar-server", line 41, in <module>
[Wed Jun 07 09:56:16.070091 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     log.register_options(conf)
[Wed Jun 07 09:56:16.070098 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 253, in register_options
[Wed Jun 07 09:56:16.070110 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     conf.register_cli_opts(_options.common_cli_opts)
[Wed Jun 07 09:56:16.070115 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2318, in __inner
[Wed Jun 07 09:56:16.070124 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     result = f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070129 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2540, in register_cli_opts
[Wed Jun 07 09:56:16.070136 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     self.register_cli_opt(opt, group, clear_cache=False)
[Wed Jun 07 09:56:16.070140 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2322, in __inner
[Wed Jun 07 09:56:16.070147 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     return f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070151 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2532, in register_cli_opt
[Wed Jun 07 09:56:16.070185 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Wed Jun 07 09:56:16.070201 2017] [:error] [pid 1906] [remote 192.168.24.1:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option


Restarting apache seems to fix it, but I don't know why.

Comment 5 Thomas Hervé 2017-06-08 19:50:25 UTC
After the issue reappeared, it seemed that an additional step was required: move /var/log/zaqar/zaqar.log. I wasn't able to know why yet, but maybe permissions error?

Comment 6 Artem Hrechanychenko 2017-06-19 15:50:30 UTC
w/a restart httpd service and delete /var/log/zaqar/zaqar.log

Comment 7 Dan Prince 2017-06-19 15:53:40 UTC
Any chance this is related to https://review.openstack.org/#/c/472618

Comment 8 Marius Cornea 2017-07-11 21:46:54 UTC
Looks like an SELinux issue:

 [root@undercloud-0 stack]# grep zaqar /var/log/audit/audit.log                                                                                                                                                                                
type=AVC msg=audit(1499809225.125:6424): avc:  denied  { open } for  pid=23130 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1499809391.677:6514): avc:  denied  { open } for  pid=23563 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file


 [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log 
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Comment 9 Marius Cornea 2017-07-11 21:49:36 UTC
After removing the file and restarting httpd:

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:httpd_log_t:s0 /var/log/zaqar/zaqar.log

Comment 10 Mike Orazi 2017-07-25 14:49:15 UTC
Lon,

Can you help confirm this is an selinux bug?

Comment 11 Marius Cornea 2017-07-25 21:14:42 UTC
Created attachment 1304472 [details]
audit.log

After undercloud upgrade:

[root@undercloud-0 stack]# rpm -qa | grep openstack-selinux
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch

[root@undercloud-0 stack]#  ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Attaching /var/log/audit/audit.log after undercloud upgrade.

Comment 12 Marius Cornea 2017-08-02 14:06:43 UTC
*** Bug 1477611 has been marked as a duplicate of this bug. ***

Comment 13 Lon Hohberger 2017-08-02 15:18:52 UTC
We need to set /var/log/zaqar/zaqar.log to a different file context and then call 'restorecon' after deployment.

Comment 14 Lon Hohberger 2017-08-02 15:55:03 UTC
https://github.com/redhat-openstack/openstack-selinux/commit/f7297a77903fcb8963c7796e49d357cb7cce2b28

This still needs a restorecon for it to operate correctly.

Comment 21 errata-xmlrpc 2017-12-13 21:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

Comment 23 Red Hat Bugzilla 2023-09-14 03:58:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days


Note You need to log in before you can comment on or make changes to this bug.