Bug 1509584
| Summary: | civetweb binding ip address not honored [WAS: OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker on environments with radosgw enabled fails: FAILED!] |
|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage |
| Component: | Container |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | high |
| Version: | 2.4 |
| Target Milestone: | rc |
| Target Release: | 2.5 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Marius Cornea <mcornea> |
| Assignee: | Guillaume Abrioux <gabrioux> |
| QA Contact: | Marius Cornea <mcornea> |
| Docs Contact: | Aron Gunn <agunn> |
| CC: | agunn, anharris, ceph-qe-bugs, dang, dbecker, dcadzow, gabrioux, gcharot, gfidente, gkadam, gmeno, goneri, hchen, hnallurv, jefbrown, jim.curtis, kdreyer, mburns, mcornea, mflusche, morazi, nalmond, pgrist, pprakash, rhel-osp-director-maint, scohen, shan, tserlin, yrabl |
| Fixed In Version: | rhceph:ceph-2-rhel-7-docker-candidate-55538-20180119024102 |
| Doc Type: | Bug Fix |
| Clones: | 1536074 (view as bug list) |
| Last Closed: | 2018-02-21 20:38:32 UTC |
| Type: | Bug |
| Bug Depends On: | 1498183 |
| Bug Blocks: | 1536401 |
| Attachments: | ceph-install-workflow.log |

Doc Text:

.The Ceph Object Gateway successfully starts after upgrading Red Hat OpenStack Platform 11 to 12
Previously, when upgrading Red Hat OpenStack Platform 11 to 12, the Ceph Object Gateway would fail to start because port 8080 was already in use by `haproxy`. With this release, you can specify the IP address and port bindings for the Ceph Object Gateway. As a result, the Ceph Object Gateway starts properly.
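The fix exposes the gateway's bind address and port as configuration. A minimal sketch of the ceph-ansible variables involved (radosgw_interface, radosgw_address, and radosgw_address_block are the names from the validation error quoted in the description below; radosgw_civetweb_port is an assumed variable name from ceph-ansible 3.x, and the address value is illustrative):

    # group_vars sketch for ceph-ansible -- values are illustrative
    radosgw_address: 172.17.3.20     # bind civetweb to this IP instead of 0.0.0.0
    radosgw_civetweb_port: 8080      # the port haproxy proxies to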
Created attachment 1347910 [details]
ceph-install-workflow.log
I think the issue is with the list of environment files passed on upgrade. Specifically, this:

    -e $THT/environments/ceph-radosgw.yaml \

should be:

    -e $THT/environments/ceph-ansible/ceph-rgw.yaml \

The same applies to the MDS service: the old environment file at environments/services/ceph-mds.yaml deploys using puppet-ceph; the new environment file to use is environments/ceph-ansible/ceph-mds.yaml.

Should we turn this into an upgrade docs bug?

(In reply to Giulio Fidente from comment #2)
> I think the issue is with the list of environment files passed on upgrade.
> Specifically this:
>
> -e $THT/environments/ceph-radosgw.yaml \
>
> should be
>
> -e $THT/environments/ceph-ansible/ceph-rgw.yaml \
>
> Same is for the MDS service [...]
> Should we turn this into an upgrade docs bug?

Sorry, I missed the environment files. I'm going to try using the ceph-ansible environments and see how it goes.

After switching the environment files to the ceph-ansible ones, the upgrade completed OK, but several issues showed up:
1. radosgw services are still running under systemd:
[root@overcloud-controller-0 heat-admin]# systemctl list-units -a | grep rados
ceph-radosgw.service loaded active running Ceph rados gateway
ceph-radosgw.service loaded activating auto-restart Ceph RGW
system-ceph\x2dradosgw.slice loaded active active system-ceph\x2dradosgw.slice
ceph-radosgw.target loaded active active ceph target allowing to start/stop all ceph-radosgw@.service instances at once
[root@overcloud-controller-0 heat-admin]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-11-06 18:39:43 UTC; 20h ago
Main PID: 72610 (radosgw)
CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
└─72610 /usr/bin/radosgw -f --cluster ceph --name client.radosgw.gateway --setuser ceph --setgroup ceph
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[root@overcloud-controller-0 heat-admin]# systemctl status ceph-radosgw.service
● ceph-radosgw.service - Ceph RGW
Loaded: loaded (/etc/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2017-11-07 15:11:12 UTC; 8s ago
Process: 137550 ExecStopPost=/usr/bin/docker stop ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Process: 137339 ExecStart=/usr/bin/docker run --rm --net=host --memory=1g --cpu-quota=100000 -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e RGW_CIVETWEB_IP=10.0.0.145 -v /etc/localtime:/etc/localtime:r
o -e CEPH_DAEMON=RGW -e CLUSTER=ceph -e RGW_CIVETWEB_PORT=8080 --name=ceph-rgw-overcloud-controller-0 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest (code=exited, status=5)
Process: 137331 ExecStartPre=/usr/bin/docker rm ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Process: 137323 ExecStartPre=/usr/bin/docker stop ceph-rgw-overcloud-controller-0 (code=exited, status=1/FAILURE)
Main PID: 137339 (code=exited, status=5)
Nov 07 15:11:12 overcloud-controller-0 systemd[1]: Unit ceph-radosgw.service entered failed state.
Nov 07 15:11:12 overcloud-controller-0 systemd[1]: ceph-radosgw.service failed.
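The two `systemctl status` outputs above show why the container fails: the old puppet-ceph unit (loaded from /usr/lib/systemd/system) is still running radosgw on port 8080, so the containerized unit's `docker run` exits with status 5. A possible manual cleanup, sketched from the unit names above (an assumption, not a verified upgrade step):

    # stop and disable the leftover puppet-ceph radosgw unit so it cannot
    # hold port 8080 when the containerized ceph-radosgw@ unit starts
    systemctl stop ceph-radosgw.service
    systemctl disable ceph-radosgw.service
    systemctl daemon-reload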
2. There is no radosgw container running after the upgrade completes:
[root@overcloud-controller-0 heat-admin]# docker ps | grep ceph
c4b3874e93ed docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-mon-overcloud-controller-0
3. After rebooting a controller node the radosgw container starts but haproxy container fails to start:
[root@overcloud-controller-2 heat-admin]# docker ps | grep ceph
9b76d42c4927 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-rgw-overcloud-controller-2
e3b570004295 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest "/entrypoint.sh" 17 hours ago Up 17 hours ceph-mon-overcloud-controller-2
Docker container set: haproxy-bundle [docker-registry.engineering.redhat.com/rhosp12/openstack-haproxy-docker:pcmklatest]
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-1
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Stopped
Failed Actions:
* haproxy-bundle-docker-2_start_0 on overcloud-controller-2 'unknown error' (1): call=89, status=complete, exitreason='Newly created docker container exited after start',
last-rc-change='Mon Nov 6 21:51:59 2017', queued=0ms, exec=9021ms
The radosgw service binds on all addresses:
[root@overcloud-controller-2 heat-admin]# ps axu | grep radosgw
ceph 10068 0.1 0.2 3800048 33436 ? Ssl Nov06 2:01 /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.overcloud-controller-2 -k /var/lib/ceph/radosgw/overcloud-controller-2/keyring --rgw-socket-path= --rgw-zonegroup= --rgw-zone= --rgw-frontends=civetweb port=8080
root 140381 0.0 0.0 112664 972 pts/0 S+ 15:14 0:00 grep --color=auto radosgw
[root@overcloud-controller-2 heat-admin]# netstat -tupan | grep radosgw
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 10068/radosgw
tcp 0 0 10.0.0.153:38624 10.0.0.149:6800 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:35116 10.0.0.155:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:34206 10.0.0.149:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:59340 10.0.0.142:6802 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:52438 10.0.0.153:6789 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:60062 10.0.0.155:6800 ESTABLISHED 10068/radosgw
tcp 0 0 10.0.0.153:55256 10.0.0.142:6800 ESTABLISHED 10068/radosgw
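The `ps` output above shows why: the daemon was started with `--rgw-frontends=civetweb port=8080` and no bind address, so civetweb listens on 0.0.0.0 and collides with haproxy on the VIP. For comparison, the two forms of the civetweb frontends option in ceph.conf (the address is taken from the output above):

    # binds to all interfaces (what the upgraded node is running):
    rgw frontends = civetweb port=8080

    # binds to a single address, leaving the VIP free for haproxy:
    rgw frontends = civetweb port=10.0.0.153:8080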
Yes, it is fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1498183

Providing the QA ack.

A deployment with rgw failed. The rgw container fails to start because port 8080 is already in use:

    civetweb: 0x55abc6c5adc0: set_ports_option: cannot bind to 172.17.3.20:8080: 98 (Address already in use)

The verification failed with the same error with rhceph:ceph-2-rhel-7-docker-candidate-43803-20180119213048.

Hi Yogev, from what I've seen, rhceph:ceph-2-rhel-7-docker-candidate-43803-20180119213048 contains the fix for this issue. Any chance we can access the environment where you tried to deploy this image?

Tested it with the latest image and it passed.

*** Bug 1539192 has been marked as a duplicate of this bug. ***

Hi Aron,

Actually, this BZ was filed against the Ceph Storage product / Container component, but the actual solution here is more a clarification of the procedure for upgrading OSP11 -> OSP12. I don't mind filling in the Doc Text field, but I'm not sure what the impact will be on this BZ.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0341

*** Bug 1536074 has been marked as a duplicate of this bug. ***
Description of problem:

OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker on environments with radosgw enabled fails:

    FAILED! => {"changed": false, "failed": true, "msg": "you must set radosgw_interface, radosgw_address or radosgw_address_block"}

Version-Release number of selected component (if applicable):

    ceph-ansible-3.0.8-1.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:

1. Deploy OSP11 with radosgw enabled:

    source ~/stackrc
    export THT=/usr/share/openstack-tripleo-heat-templates/
    openstack overcloud deploy --templates $THT \
    -e $THT/environments/network-isolation.yaml \
    -e $THT/environments/network-management.yaml \
    -e $THT/environments/storage-environment.yaml \
    -e $THT/environments/ceph-radosgw.yaml \
    -e $THT/environments/tls-endpoints-public-ip.yaml \
    -e ~/openstack_deployment/environments/nodes.yaml \
    -e ~/openstack_deployment/environments/network-environment.yaml \
    -e ~/openstack_deployment/environments/disk-layout.yaml \
    -e ~/openstack_deployment/environments/public_vip.yaml \
    -e ~/openstack_deployment/environments/enable-tls.yaml \
    -e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
    -e ~/openstack_deployment/environments/neutron-settings.yaml

2. Upgrade to OSP12:

    source ~/stackrc
    export THT=/usr/share/openstack-tripleo-heat-templates/
    openstack overcloud deploy --templates $THT \
    -e $THT/environments/network-isolation.yaml \
    -e $THT/environments/network-management.yaml \
    -e $THT/environments/ceph-ansible/ceph-ansible.yaml \
    -e $THT/environments/ceph-radosgw.yaml \
    -e $THT/environments/tls-endpoints-public-ip.yaml \
    -e ~/openstack_deployment/environments/nodes.yaml \
    -e ~/openstack_deployment/environments/network-environment.yaml \
    -e ~/openstack_deployment/environments/disk-layout.yaml \
    -e ~/openstack_deployment/environments/public_vip.yaml \
    -e ~/openstack_deployment/environments/enable-tls.yaml \
    -e ~/openstack_deployment/environments/inject-trust-anchor.yaml \
    -e ~/openstack_deployment/environments/neutron-settings.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
    -e /home/stack/ceph-ansible-env.yaml \
    -e /home/stack/docker-osp12.yaml

Actual results:

Upgrade fails:

    [root@undercloud-0 stack]# tail /var/log/mistral/ceph-install-workflow.log
    2017-11-04 17:33:23,336 p=27936 u=mistral | TASK [ceph-docker-common : make sure radosgw_interface, radosgw_address or radosgw_address_block is defined] ***
    2017-11-04 17:33:23,395 p=27936 u=mistral | fatal: [192.168.0.24]: FAILED! => {"changed": false, "failed": true, "msg": "you must set radosgw_interface, radosgw_address or radosgw_address_block"}
    2017-11-04 17:33:23,396 p=27936 u=mistral | PLAY RECAP *********************************************************************
    2017-11-04 17:33:23,396 p=27936 u=mistral | 192.168.0.13 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,396 p=27936 u=mistral | 192.168.0.17 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.19 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.20 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.23 : ok=4 changed=0 unreachable=0 failed=0
    2017-11-04 17:33:23,397 p=27936 u=mistral | 192.168.0.24 : ok=24 changed=3 unreachable=0 failed=1
    2017-11-04 17:33:23,397 p=27936 u=mistral | localhost : ok=0 changed=0 unreachable=0 failed=0

Expected results:

Upgrade completes fine.
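For reference, the failing ceph-ansible check wants one of the three variables it names. In TripleO these can be passed through in an environment file; a minimal sketch, assuming the CephAnsibleExtraConfig pass-through parameter from the OSP12 ceph-ansible integration (the network value is illustrative):

    parameter_defaults:
      CephAnsibleExtraConfig:
        radosgw_address_block: 172.17.3.0/24   # or radosgw_interface / radosgw_address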
Additional info:

    [stack@undercloud-0 ~]$ cat /home/stack/ceph-ansible-env.yaml
    parameter_defaults:
      CephAnsibleDisksConfig:
        devices:
          - '/dev/vdb'
          - '/dev/vdc'
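To check that the fixed image honors the bind address, the docker invocation from the failed systemd unit above can be rerun by hand; a sketch, reusing the environment variables and IP from this report (the image reference is taken from the Fixed In Version field, and the exact registry path may differ):

    # run the fixed image with an explicit civetweb bind address
    docker run --rm --net=host \
      -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph \
      -e CEPH_DAEMON=RGW -e CLUSTER=ceph \
      -e RGW_CIVETWEB_IP=10.0.0.145 -e RGW_CIVETWEB_PORT=8080 \
      rhceph:ceph-2-rhel-7-docker-candidate-55538-20180119024102

    # the listener should now be bound to the given IP, not 0.0.0.0:
    netstat -tlpn | grep 8080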