Created attachment 1176390 [details]
rgw log

Description of problem:
After a Ceph upgrade from 1.3.2 to 2.0, the RGW setup has changed, and I am unable to do IO with the older RGW setup.

Version-Release number of selected component (if applicable):
ceph version 10.2.2-15.el7cp (60cd52496ca02bdde9c2f4191e617f75166d87b6)

How reproducible:
Always

Steps to Reproduce:
1. Create an S3 user and do some IO on a Ceph 1.3.2 cluster.
2. Upgrade the Ceph cluster to 2.0.
3. The older setup stops working after the upgrade.

Additional info:
I have attached the log files, which have debug_rgw=20 and debug_ms=5 set.
RGW node: magna080
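For reference, a minimal sketch of step 1; the uid, display name, bucket/object names, and the choice of s3cmd as the client are illustrative assumptions, not the exact commands used in the test:

# Create an S3 (RGW) user on the 1.3.2 cluster; note the generated access/secret keys
radosgw-admin user create --uid=testuser --display-name="Test User"

# Simple IO against the gateway with s3cmd (endpoint and keys configured beforehand)
s3cmd mb s3://testbucket
s3cmd put big.txt s3://testbucket/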
Created attachment 1176399 [details] rgw log with debug_rgw=20 debug_ms=5
Can you try running:

radosgw-admin zone modify --master --rgw-zone=default

and see if you can now create buckets successfully?
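A minimal sketch of the suggested check; the verification step and the field to look at are assumptions based on the jewel-era output, not part of the suggestion above:

# Mark the default zone as master, as suggested above
radosgw-admin zone modify --master --rgw-zone=default

# Check the result; the zonegroup's master_zone field is expected to point at
# the default zone afterwards (field name assumed from the jewel-era JSON output)
radosgw-admin zonegroup get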
upstream fix: https://github.com/ceph/ceph/pull/10205
Tejas provided the QA ack; resetting needinfo.
Orit,

The issue is fixed for sure, but the behaviour seems strange. We do a reboot after the upgrade, and the radosgw process comes up automatically after the reboot:

[root@magna080 ~]# ps -ef | grep ceph
ceph 1445 1 0 14:26 ? 00:00:00 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna080 --setuser ceph --setgroup ceph
ceph 3162 1 1 14:26 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph 4034 1 1 14:26 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
ceph 4843 1 1 14:26 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

[root@magna080 ~]# netstat -ntlp | grep :7480
tcp 0 0 0.0.0.0:7480 0.0.0.0:* LISTEN 1445/radosgw

But the IO still fails. The moment I restart the radosgw process, the IO works as expected. Any idea why an additional process restart is needed?

Thanks,
Tejas
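For reference, a minimal sketch of the manual restart that makes IO work again; the systemd unit name ceph-radosgw@rgw.<hostname> is an assumption based on the RHCS 2.0 packaging convention:

# Restart the radosgw instance that came up automatically at boot
systemctl restart ceph-radosgw@rgw.magna080

# Confirm the gateway is running and listening on the default civetweb port again
systemctl status ceph-radosgw@rgw.magna080
netstat -ntlp | grep :7480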
Can you provide the radosgw logs?
It could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1352396
I have changed the owner to ceph:ceph, unlike https://bugzilla.redhat.com/show_bug.cgi?id=1352396. Attaching the radosgw log; I had enabled the debug params before doing the process restart.

Thanks,
Tejas
Created attachment 1180207 [details] Rados gw log
Moving this to assigned state based on comment 20. Please see comment 24 for logs.
It looks like there is a problem connecting to the monitor:

10.8.128.80:0/2525835698 submit_message mon_subscribe({osdmap=73}) v2 remote, 10.8.128.58:6789/0, failed lossy con, dropping message 0x7f7f480102d0
2016-07-15 14:26:27.010541 7f7f610e7700 0 monclient: hunting for new mon

This is a different issue from the one described; could you open a new BZ?
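A couple of generic checks (not taken from the log) that could confirm whether the RGW node can actually reach the monitor:

# Basic cluster reachability from the RGW node (uses the admin keyring by default)
ceph -s

# Raw TCP reachability of the monitor that the log excerpt above shows failing
nc -zv 10.8.128.58 6789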
Orit, does that mon contact failure have anything to do with RGW anymore?
Could it be that the OSDs were not restarted after the upgrade?
Matt will arrange a meeting with Orit and the QE team.
Looking later in the log, I see we created a new bucket, bigbucket, and added an object, big.txt, at 14:34:17. Maybe the IO was started too soon? Maybe you need to increase the timeout for the I/O ops? But as I said before, this is not the same issue; it should be a new BZ.
Created attachment 1182552 [details] multipart upload script
Hi,

QE worked with Orit to reproduce this on a live setup. This is what we found in the meeting:
1. Created a 1.3.2 Ceph cluster with RGW on a separate node, with IO in progress.
2. Stopped the RGW process, upgraded RGW, and rebooted the node.
3. The RGW process is running after the node comes up.
4. Bucket creation fails.
5. Restarted the RGW service.
6. Bucket creation works.

The RGW logs from today's testing are too big to be copied here. Please take a local copy of the log from here:
root@magna080:/var/log/ceph/ceph-client.rgw.magna080.log

Thanks,
Tejas
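For anyone taking a copy of that log, a minimal sketch (the local destination path is arbitrary):

# Copy the full RGW log off the gateway node for offline analysis
scp root@magna080:/var/log/ceph/ceph-client.rgw.magna080.log .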
(In reply to Tejas from comment #36)
> Hi,
>
> QE worked with Orit to reproduce this on a live setup. This is what we found
> in the meeting:
> 1. Created a 1.3.2 Ceph cluster with RGW on a separate node, with IO in
> progress.
> 2. Stopped the RGW process, upgraded RGW, and rebooted the node.
> 3. The RGW process is running after the node comes up.
> 4. Bucket creation fails.
> 5. Restarted the RGW service.
> 6. Bucket creation works.
>
> The RGW logs from today's testing are too big to be copied here.
> Please take a local copy of the log from here:
> root@magna080:/var/log/ceph/ceph-client.rgw.magna080.log
>
> Thanks,
> Tejas

Thanks, I copied it to my computer.
Looks good, Orit
Hi,

After upgrading the cluster to 2.0, I was able to create a new bucket and run I/O, so I am moving this bug to the VERIFIED state.

Regards,
Vasishta
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2815.html