Bug 1509584
| Summary: | civetweb binding ip address not honored [WAS: OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker on environments with radosgw enabled fails: FAILED!] |
|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage |
| Component: | Container |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | high |
| Version: | 2.4 |
| Target Milestone: | rc |
| Target Release: | 2.5 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Marius Cornea <mcornea> |
| Assignee: | Guillaume Abrioux <gabrioux> |
| QA Contact: | Marius Cornea <mcornea> |
| Docs Contact: | Aron Gunn <agunn> |
| CC: | agunn, anharris, ceph-qe-bugs, dang, dbecker, dcadzow, gabrioux, gcharot, gfidente, gkadam, gmeno, goneri, hchen, hnallurv, jefbrown, jim.curtis, kdreyer, mburns, mcornea, mflusche, morazi, nalmond, pgrist, pprakash, rhel-osp-director-maint, scohen, shan, tserlin, yrabl |
| Fixed In Version: | rhceph:ceph-2-rhel-7-docker-candidate-55538-20180119024102 |
| Doc Type: | Bug Fix |
| Clones: | 1536074 (view as bug list) |
| Last Closed: | 2018-02-21 20:38:32 UTC |
| Type: | Bug |
| Bug Depends On: | 1498183 |
| Bug Blocks: | 1536401 |
| Attachments: | ceph-install-workflow.log |

Doc Text:

.The Ceph Object Gateway successfully starts after upgrading Red Hat OpenStack Platform 11 to 12
Previously, when upgrading Red Hat OpenStack Platform 11 to 12, the Ceph Object Gateway would fail to start because port 8080 was already in use by `haproxy`. With this release, you can specify the IP address and port bindings for the Ceph Object Gateway. As a result, the Ceph Object Gateway starts properly.
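The fix exposes the gateway's bind address and port as configuration. A minimal sketch of the ceph-ansible variables involved (radosgw_interface, radosgw_address, and radosgw_address_block are the names from the validation error quoted in the description below; radosgw_civetweb_port is an assumed variable name from ceph-ansible 3.x, and the address value is illustrative):

    # group_vars sketch for ceph-ansible -- values are illustrative
    radosgw_address: 172.17.3.20     # bind civetweb to this IP instead of 0.0.0.0
    radosgw_civetweb_port: 8080      # the port haproxy proxies to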
Created attachment 1347910 [details]
ceph-install-workflow.log
I think the issue is with the list of environment files passed on upgrade. Specifically, this:

    -e $THT/environments/ceph-radosgw.yaml \

should be:

    -e $THT/environments/ceph-ansible/ceph-rgw.yaml \

The same applies to the MDS service: the old environment file at environments/services/ceph-mds.yaml deploys using puppet-ceph; the new environment file to use is environments/ceph-ansible/ceph-mds.yaml.

Should we turn this into an upgrade docs bug?

(In reply to Giulio Fidente from comment #2)
> I think the issue is with the list of environment files passed on upgrade.
> Specifically this:
>
> -e $THT/environments/ceph-radosgw.yaml \
>
> should be
>
> -e $THT/environments/ceph-ansible/ceph-rgw.yaml \
>
> Same is for the MDS service [...]
> Should we turn this into an upgrade docs bug?

Sorry, I missed the environment files. I'm going to try using the ceph-ansible environments and see how it goes.

After switching the environment files to the ceph-ansible ones, the upgrade completed OK, but several issues showed up:
1. radosgw services are still running under systemd:
[root@overcloud-controller-0 heat-admin]# systemctl list-units -a | grep rados
ceph-radosgw.service loaded active running Ceph rados gateway
ceph-radosgw.service loaded activating auto-restart Ceph RGW
system-ceph\x2dradosgw.slice loaded active active system-ceph\x2dradosgw.slice
ceph-radosgw.target loaded active active ceph target allowing to start/stop all ceph-radosgw@.service instances at once
[root@overcloud-controller-0 heat-admin]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-11-06 18:39:43 UTC; 20h ago
Main PID: 72610 (radosgw)
CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
└─72610 /usr/bin/radosgw -f --cluster ceph --name client.radosgw.gateway --setuser ceph --setgroup ceph
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[root@overcloud-controller-0 heat-admin]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph RGW
Loaded: loaded (/etc/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2017-11-07 15:11:12 UTC; 8s ago
Process: 137550 ExecStopPost=/usr/bin/docker stop ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Process: 137339 ExecStart=/usr/bin/docker run --rm --net=host --memory=1g --cpu-quota=100000 -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e RGW_CIVETWEB_IP=10.0.0.145 -v /etc/localtime:/etc/localtime:r
o -e CEPH_DAEMON=RGW -e CLUSTER=ceph -e RGW_CIVETWEB_PORT=8080 --name=ceph-rgw-overcloud-controller-0 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest (code=exited, status=5)
Process: 137331 ExecStartPre=/usr/bin/docker rm ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Process: 137323 ExecStartPre=/usr/bin/docker stop ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Main PID: 137339 (code=exited, status=5)
Nov 07 15:11:12 overcloud-controller-0 systemd[1]: Unit ceph-radosgw.service entered failed state.
Nov 07 15:11:12 overcloud-controller-0 systemd[1]: ceph-radosgw.service failed.
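The two `systemctl status` outputs above show why the container fails: the old puppet-ceph unit (loaded from /usr/lib/systemd/system) is still running radosgw on port 8080, so the containerized unit's `docker run` exits with status 5. A possible manual cleanup, sketched from the unit names above (an assumption, not a verified upgrade step):

    # stop and disable the leftover puppet-ceph radosgw unit so it cannot
    # hold port 8080 when the containerized ceph-radosgw@ unit starts
    systemctl stop ceph-radosgw.service
    systemctl disable ceph-radosgw.service
    systemctl daemon-reload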
2. There is no radosgw container running after the upgrade completes:
[root@overcloud-controller-0 heat-admin]# docker ps | grep ceph
c4b3874e93ed docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-mon-overcloud-controller-0
3. After rebooting a controller node the radosgw container starts but haproxy container fails to start:
[root@overcloud-controller-2 heat-admin]# docker ps | grep ceph
9b76d42c4927 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-rgw-overcloud-controller-2
e3b570004295 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-mon-overcloud-controller-2
Docker container set: haproxy-bundle [docker-registry.engineering.redhat.com/rhosp12/openstack-haproxy-docker:pcmklatest]
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-1
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Stopped
Failed Actions:
* haproxy-bundle-docker-2_start_0 on overcloud-controller-2 'unknown error' (1): call=89, status=complete, exitreason='Newly created docker container exited after start',
last-rc-change='Mon Nov 6 21:51:59 2017', queued=0ms, exec=9021ms
The radosgw service binds on all addresses:
[root@overcloud-controller-2 heat-admin]# ps axu | grep radosgw
ceph 10068 0.1 0.2 3800048 33436 ? Ssl Nov06 2:01 /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.overcloud-controller-2 -k /var/lib/ceph/radosgw/overcloud-controller-2/keyring --rgw-socket-path= --rgw-zonegroup= --rgw-zone= --rgw-frontends=civetweb port=8080
root 140381 0.0 0.0 112664 972 pts/0 S+ 15:14 0:00 grep --color=auto radosgw
[root@overcloud-controller-2 heat-admin]# netstat -tupan | grep radosgw
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 10068/radosgw
tcp 0 0 10.0.0.153:38624 10.0.0.149:6800 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:35116 10.0.0.155:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:34206 10.0.0.149:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:59340 10.0.0.142:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:52438 10.0.0.153:6789 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:60062 10.0.0.155:6800 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:55256 10.0.0.142:6800 ESTABLISHED 10068/radosgw
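The `ps` output above shows why: the daemon was started with `--rgw-frontends=civetweb port=8080` and no bind address, so civetweb listens on 0.0.0.0 and collides with haproxy on the VIP. For comparison, the two forms of the civetweb frontends option in ceph.conf (the address is taken from the output above):

    # binds to all interfaces (what the upgraded node is running):
    rgw frontends = civetweb port=8080

    # binds to a single address, leaving the VIP free for haproxy:
    rgw frontends = civetweb port=10.0.0.153:8080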
Yes, it is fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1498183

Providing the QA ack.

A deployment with rgw failed. The rgw container fails to start because port 8080 is already in use:

    civetweb: 0x55abc6c5adc0: set_ports_option: cannot bind to 172.17.3.20:8080: 98 (Address already in use)

The verification failed with the same error with rhceph:ceph-2-rhel-7-docker-candidate-43803-20180119213048.

Hi Yogev, from what I've seen, rhceph:ceph-2-rhel-7-docker-candidate-43803-20180119213048 contains the fix for this issue. Any chance we can access the environment where you tried to deploy this image?

Tested it with the latest image and it passed.

*** Bug 1539192 has been marked as a duplicate of this bug. ***

Hi Aron,

Actually, this BZ was filed against the Ceph Storage product / Container component, but the actual solution here is more a clarification of the procedure for upgrading OSP11 -> OSP12. I don't mind filling in the Doc Text field, but I'm not sure what the impact will be on this BZ.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0341

*** Bug 1536074 has been marked as a duplicate of this bug. ***
Description of problem:

OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker on environments with radosgw enabled fails:

    FAILED! => {"changed": false, "failed": true, "msg": "you must set radosgw_interface, radosgw_address or radosgw_address_block"}

Version-Release number of selected component (if applicable):

    ceph-ansible-3.0.8-1.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:

1. Deploy OSP11 with radosgw enabled:

    source ~/stackrc
    export THT=/usr/share/openstack-tripleo-heat-templates/
    openstack overcloud deploy --templates $THT \
    -e $THT/environments/network-isolation.yaml \
    -e $THT/environments/network-management.yaml \
    -e $THT/environments/storage-environment.yaml \
    -e $THT/environments/ceph-radosgw.yaml \
    -e $THT/environments/tls-endpoints-public-ip.yaml \
    -e ~/openstack_deployment/environments/nodes.yaml \
    -e ~/openstack_deployment/environments/network-environment.yaml \
    -e ~/openstack_deployment/environments/disk-layout.yaml \
    -e ~/openstack_deployment/environments/public_vip.yaml \
    -e ~/openstack_deployment/environments/enable-tls.yaml \
    -e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
    -e ~/openstack_deployment/environments/neutron-settings.yaml

2. Upgrade to OSP12:

    source ~/stackrc
    export THT=/usr/share/openstack-tripleo-heat-templates/
    openstack overcloud deploy --templates $THT \
    -e $THT/environments/network-isolation.yaml \
    -e $THT/environments/network-management.yaml \
    -e $THT/environments/ceph-ansible/ceph-ansible.yaml \
    -e $THT/environments/ceph-radosgw.yaml \
    -e $THT/environments/tls-endpoints-public-ip.yaml \
    -e ~/openstack_deployment/environments/nodes.yaml \
    -e ~/openstack_deployment/environments/network-environment.yaml \
    -e ~/openstack_deployment/environments/disk-layout.yaml \
    -e ~/openstack_deployment/environments/public_vip.yaml \
    -e ~/openstack_deployment/environments/enable-tls.yaml \
    -e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
    -e ~/openstack_deployment/environments/neutron-settings.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
    -e /home/stack/ceph-ansible-env.yaml \
    -e /home/stack/docker-osp12.yaml

Actual results:

Upgrade fails:

    [root@undercloud-0 stack]# tail /var/log/mistral/ceph-install-workflow.log
    2017-11-04 17:33:23,336 p=27936 u=mistral | TASK [ceph-docker-common : make sure radosgw_interface, radosgw_address or radosgw_address_block is defined] ***
    2017-11-04 17:33:23,395 p=27936 u=mistral | fatal: [192.168.0.24]: FAILED! => {"changed": false, "failed": true, "msg": "you must set radosgw_interface, radosgw_address or radosgw_address_block"}
    2017-11-04 17:33:23,396 p=27936 u=mistral | PLAY RECAP *********************************************************************
    2017-11-04 17:33:23,396 p=27936 u=mistral | 192.168.0.13 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,396 p=27936 u=mistral | 192.168.0.17 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.19 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.20 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.23 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.24 : ok=24 changed=3 unreachable=0 failed=1
    2017-11-04 17:33:23,397 p=27936 u=mistral | localhost : ok=0 changed=0 unreachable=0 failed=0

Expected results:

Upgrade completes fine.
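For reference, the failing ceph-ansible check wants one of the three variables it names. In TripleO these can be passed through in an environment file; a minimal sketch, assuming the CephAnsibleExtraConfig pass-through parameter from the OSP12 ceph-ansible integration (the network value is illustrative):

    parameter_defaults:
      CephAnsibleExtraConfig:
        radosgw_address_block: 172.17.3.0/24   # or radosgw_interface / radosgw_address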
Additional info:

    [stack@undercloud-0 ~]$ cat /home/stack/ceph-ansible-env.yaml
    parameter_defaults:
      CephAnsibleDisksConfig:
        devices:
          - '/dev/vdb'
          - '/dev/vdc'
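To check that the fixed image honors the bind address, the docker invocation from the failed systemd unit above can be rerun by hand; a sketch, reusing the environment variables and IP from this report (the image reference is taken from the Fixed In Version field, and the exact registry path may differ):

    # run the fixed image with an explicit civetweb bind address
    docker run --rm --net=host \
      -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph \
      -e CEPH_DAEMON=RGW -e CLUSTER=ceph \
      -e RGW_CIVETWEB_IP=10.0.0.145 -e RGW_CIVETWEB_PORT=8080 \
      rhceph:ceph-2-rhel-7-docker-candidate-55538-20180119024102

    # the listener should now be bound to the given IP, not 0.0.0.0:
    netstat -tlpn | grep 8080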