Bug 1347323

Summary: [Upgrade] radosgw upgrade from 1.3.2 to 2.0 fails on RHEL
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: RGW
Assignee: Marcus Watts <mwatts>
Status: CLOSED ERRATA
QA Contact: Tejas <tchandra>
Severity: high
Priority: unspecified
Version: 2.0
CC: cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, kurs, mbenjamin, owasserm, sweil, tchandra
Target Milestone: rc
Target Release: 2.0
Hardware: Unspecified
OS: Linux
Last Closed: 2016-08-23 19:41:58 UTC
Type: Bug
Bug Blocks: 1343229    

Description Tejas 2016-06-16 13:57:41 UTC
Description of problem:
A radosgw client upgrade from 1.3.2 to 2.0 fails:
we are unable to start the radosgw service on the upgraded node.

Version-Release number of selected component (if applicable):
ceph version 10.2.2-2.el7cp (f1f313912893a3ecab6afbdc5690054dde9789fb)


How reproducible:
Always

Steps to Reproduce:
1. Create a 1.3.2 Ceph cluster with RGW configured.
2. Upgrade the cluster from 1.3.2 to 2.0 (a sketch of the RGW-node upgrade commands is shown below).
3. The RGW client upgrade fails.
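
For reference, the upgrade step on the RGW node amounts to a package update followed by a service restart. A minimal sketch, assuming a yum-based install and the instance name used in this report (rgw.magna031); repository setup is omitted:

# on the RGW node, with the 2.0 repos enabled
yum update -y ceph-radosgw
systemctl daemon-reload
systemctl restart ceph-radosgw@rgw.magna031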

Expected results:
The client upgrade is expected to pass.

Additional info:

the 2.0 cluster is up and running:
[root@magna052 ~]# ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 5.39996 root default                                        
-2 1.79999     host magna058                                   
 0 0.89999         osd.0          up  1.00000          1.00000 
 1 0.89999         osd.1          up  1.00000          1.00000 
-3 1.79999     host magna077                                   
 2 0.89999         osd.2          up  1.00000          1.00000 
 3 0.89999         osd.3          up  1.00000          1.00000 
-4 1.79999     host magna080                                   
 4 0.89999         osd.4          up  1.00000          1.00000 
 5 0.89999         osd.5          up  1.00000          1.00000 

[root@magna052 ~]# ceph -s
    cluster 6787d37b-a1b0-40f1-9e53-6774bda1e1ab
     health HEALTH_OK
     monmap e1: 3 mons at {magna046=10.8.128.46:6789/0,magna052=10.8.128.52:6789/0,magna058=10.8.128.58:6789/0}
            election epoch 20, quorum 0,1,2 magna046,magna052,magna058
     osdmap e89: 6 osds: 6 up, 6 in
      pgmap v1394: 272 pgs, 12 pools, 5763 MB data, 1555 objects
            18790 MB used, 5509 GB / 5527 GB avail
                 272 active+clean


Just after the rgw upgrade:
[root@magna031 ~]# systemctl status ceph*
● ceph-radosgw.service - LSB: radosgw RESTful rados gateway
   Loaded: loaded (/etc/rc.d/init.d/ceph-radosgw)
   Active: active (exited) since Thu 2016-06-16 11:37:28 UTC; 56min ago
     Docs: man:systemd-sysv-generator(8)

Jun 16 11:37:28 magna031 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
Jun 16 11:37:28 magna031 ceph-radosgw[17551]: Starting client.rgw.magna031...
Jun 16 11:37:28 magna031 ceph-radosgw[17551]: Running as unit run-17574.service.
Jun 16 11:37:28 magna031 systemd[1]: Started LSB: radosgw RESTful rados gateway.
Warning: ceph-radosgw.service changed on disk. Run 'systemctl daemon-reload' to reload units.

● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2016-06-16 12:33:57 UTC; 4s ago
  Process: 19565 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.%i --setuser ceph --setgroup ceph (code=exited, status=5)
 Main PID: 19565 (code=exited, status=5)

Jun 16 12:33:57 magna031 systemd[1]: Unit ceph-radosgw.service entered failed state.
Jun 16 12:33:57 magna031 systemd[1]: ceph-radosgw.service failed.
Jun 16 12:33:57 magna031 systemd[1]: ceph-radosgw.service holdoff time over, scheduling restart.
Jun 16 12:33:57 magna031 systemd[1]: start request repeated too quickly for ceph-radosgw.service
Jun 16 12:33:57 magna031 systemd[1]: Failed to start Ceph rados gateway.
Jun 16 12:33:57 magna031 systemd[1]: Unit ceph-radosgw.service entered failed state.
Jun 16 12:33:57 magna031 systemd[1]: ceph-radosgw.service failed.
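
Note that once a unit hits its start-limit, systemd refuses further start requests for the rate-limit window even after the root cause is fixed. A minimal sketch of clearing that state (the exact unit name may be the bare service or the @-instance, depending on how it was started):

systemctl reset-failed ceph-radosgw@rgw.magna031
systemctl start ceph-radosgw@rgw.magna031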



The log file shows:


2016-06-16 12:33:55.876869 7f4c02ec99c0  0 RGWZoneParams::create(): error creating default zone params: (17) File exists
2016-06-16 12:33:56.020736 7f4c02ec99c0  0 starting handler: civetweb
2016-06-16 12:33:56.020800 7f4c02ec99c0  0 civetweb: 0x7f4c03127be0: set_ports_option: cannot bind to 80: 98 (Address already in use)
2016-06-16 12:33:56.020822 7f4c02ec99c0 -1 ERROR: failed run
2016-06-16 12:33:56.328878 7fd7a6c019c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
2016-06-16 12:33:56.328922 7fd7a6c019c0  0 ceph version 10.2.2-2.el7cp (f1f313912893a3ecab6afbdc5690054dde9789fb), process radosgw, pid 19491
2016-06-16 12:33:56.351162 7fd7a6c019c0 -1 asok(0x7fd7a7cbae00) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.rgw.magna031.asok': (17) File exists
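
Both failures in the log can be confirmed directly on the node before touching any configuration. A minimal sketch using standard tools (the socket path is taken from the log line above):

# who is holding port 80?
ss -tlnp | grep ':80'
# is a stale admin socket left over from the previous instance?
ls -l /var/run/ceph/ceph-client.rgw.magna031.asok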

The issue seems to be that civetweb is unable to bind to port 80: the pre-upgrade radosgw process started by the old LSB init script (PID 17581 in the ps output below) is still running and holding the port. The moment we commented out these lines in ceph.conf:

#[client.rgw.magna031]
#rgw_frontends = "civetweb port=80"

the service started (with rgw_frontends unset, radosgw presumably falls back to civetweb's default port, 7480, so it no longer conflicts):
[root@magna031 ~]# ps -ef | grep rados
root     17581     1  0 11:37 ?        00:00:22 /bin/radosgw -n client.rgw.magna031
ceph     21297     1 11 13:56 ?        00:00:00 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna031 --setuser ceph --setgroup ceph
root     21484 16252  0 13:56 pts/1    00:00:00 grep --color=auto rados
[root@magna031 ~]# systemctl status ceph*
● ceph-radosgw.service - LSB: radosgw RESTful rados gateway
   Loaded: loaded (/etc/rc.d/init.d/ceph-radosgw)
   Active: active (exited) since Thu 2016-06-16 11:37:28 UTC; 2h 19min ago
     Docs: man:systemd-sysv-generator(8)

Jun 16 11:37:28 magna031 systemd[1]: Starting LSB: radosgw RESTful rados gateway...
Jun 16 11:37:28 magna031 ceph-radosgw[17551]: Starting client.rgw.magna031...
Jun 16 11:37:28 magna031 ceph-radosgw[17551]: Running as unit run-17574.service.
Jun 16 11:37:28 magna031 systemd[1]: Started LSB: radosgw RESTful rados gateway.
Warning: ceph-radosgw.service changed on disk. Run 'systemctl daemon-reload' to reload units.

● ceph-radosgw.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-06-16 13:56:57 UTC; 16s ago
 Main PID: 21297 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
           └─21297 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna031 --setuser ceph --setgroup ceph

Jun 16 13:56:57 magna031 systemd[1]: Started Ceph rados gateway.
Jun 16 13:56:57 magna031 systemd[1]: Starting Ceph rados gateway...
Jun 16 13:56:57 magna031 radosgw[21297]: 2016-06-16 13:56:57.064652 7ff2d0faa9c0 -1 asok(0x7ff2d1ce3e00) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX...7) File exists
Hint: Some lines were ellipsized, use -l to show in full.
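
The "changed on disk" warning in the status output points at the remaining cleanup: after the upgrade, the legacy LSB-managed instance should be retired in favor of the systemd template unit. A minimal sketch of that cleanup, assuming the instance name from this node; not an official upgrade procedure:

systemctl daemon-reload            # pick up the changed unit files
systemctl stop ceph-radosgw        # stop the legacy LSB service
systemctl disable ceph-radosgw     # keep the init script from starting at boot
systemctl enable ceph-radosgw@rgw.magna031
systemctl start ceph-radosgw@rgw.magna031

Depending on how the old init script tracked its child, the leftover /bin/radosgw process (PID 17581 above) may also need to be stopped by hand.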

Comment 10 Tejas 2016-06-29 13:21:13 UTC
Verified in ceph:
ceph version 10.2.2-9.el7cp

Moving to Verified state.

Comment 12 errata-xmlrpc 2016-08-23 19:41:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html