Bug 1352888
Summary: [Upgrade]: on Ceph upgrade from 1.3.2 to 2.0 the RGW default zone setup is not working

Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: RGW
Assignee: Orit Wasserman <owasserm>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: high
Version: 2.0
CC: cbodley, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, sweil, tchandra, tserlin, uboppana, vashastr
Target Milestone: rc
Target Release: 2.1
Hardware: Unspecified
OS: Linux
Fixed In Version: RHEL: ceph-10.2.3-2.el7cp; Ubuntu: ceph_10.2.3-3redhat1xenial
Doc Type: Bug Fix
Doc Text:
.Bucket creation no longer fails after upgrading Red Hat Ceph Storage 1.3 to 2.0
Previously, after upgrading a Ceph Object Gateway node from Red Hat Ceph Storage 1.3 to 2.0, an attempt to create a bucket failed. This bug has been fixed, and bucket creation no longer fails in this case.
Last Closed: 2016-11-22 19:28:17 UTC
Type: Bug
Bug Blocks: 1322504, 1383917
Created attachment 1176399 [details]
rgw log with debug_rgw=20 debug_ms=5
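For reference, a sketch of how those debug levels would typically be enabled in ceph.conf on the gateway node before restarting it to capture such a log (the section name is inferred from the client.rgw.magna080 instance seen in the process listing below):

    [client.rgw.magna080]
    # verbose RGW and messenger logging, matching the attachment title above
    debug rgw = 20
    debug ms = 5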
can you try running:
radosgw-admin zone modify --master --rgw-zone=default
and see if you can now create the buckets successfully?

upstream fix: https://github.com/ceph/ceph/pull/10205

Tejas provided the qa ack. resetting need_info.

Orit,
The issue is fixed for sure, but the behaviour seems strange. We do a reboot after the upgrade, and the radosgw process comes up automatically after the reboot:

[root@magna080 ~]# ps -ef | grep ceph
ceph  1445  1  0 14:26 ?  00:00:00 /usr/bin/radosgw -f --cluster ceph --name client.rgw.magna080 --setuser ceph --setgroup ceph
ceph  3162  1  1 14:26 ?  00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph  4034  1  1 14:26 ?  00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
ceph  4843  1  1 14:26 ?  00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

[root@magna080 ~]# netstat -ntlp | grep :7480
tcp  0  0 0.0.0.0:7480  0.0.0.0:*  LISTEN  1445/radosgw

But the IO still fails. The moment I do a process restart of radosgw, the IO happens as expected. Any idea why we need an additional process restart?
Thanks,
Tejas

can you provide radosgw logs?
It could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1352396

I have changed the owner to ceph:ceph, unlike https://bugzilla.redhat.com/show_bug.cgi?id=1352396

Attaching the radosgw log. Had enabled the debug params before doing a process restart.
Thanks,
Tejas

Created attachment 1180207 [details]
Rados gw log
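Pulling the suggested workaround together, a minimal sketch of the sequence as it would be run on the RGW node, based on the comments above (the systemd unit name is an assumption for a systemd-managed RHCS 2.0 node; the instance name rgw.magna080 comes from the process listing):

    # mark the existing default zone as master so the gateway can resolve it
    radosgw-admin zone modify --master --rgw-zone=default

    # restart the gateway so it picks up the updated zone configuration
    # (per the comments, a restart was still needed even after the reboot)
    systemctl restart ceph-radosgw@rgw.magna080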
Moving this to assigned state based on comment 20. Please see comment 24 for logs.

It looks like there is a problem connecting to the monitor:

10.8.128.80:0/2525835698 submit_message mon_subscribe({osdmap=73}) v2 remote, 10.8.128.58:6789/0, failed lossy con, dropping message 0x7f7f480102d0
2016-07-15 14:26:27.010541 7f7f610e7700  0 monclient: hunting for new mon

This is a different issue than the one described, could you open a new BZ?

Orit, does that mon contact failure have anything to do with RGW anymore? Could it be that the osds were not restarted after the upgrade?

Matt will arrange a meeting with Orit and the QE team.

After looking later in the log I see we created a new bucket bigbucket and added an object big.txt at 14:34:17. Maybe the IO was started too soon? Maybe you need to increase the timeout for the I/O ops? But like I said before, this is not the same issue; this should be a new BZ.

Created attachment 1182552 [details]
multipart upload script
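The attached script itself is not reproduced in this bug. As a rough stand-in for the kind of I/O being driven, a minimal multipart upload against the gateway could look like the following (the bucket and object names mirror the bigbucket/big.txt seen in the log; the use of the AWS CLI, the endpoint, and the file size are illustrative assumptions):

    # create a large enough object that the client splits it into multipart parts
    dd if=/dev/urandom of=big.txt bs=1M count=64

    # point an S3 client (with the RGW user's keys configured) at the gateway on port 7480
    aws --endpoint-url http://magna080:7480 s3 mb s3://bigbucket
    aws --endpoint-url http://magna080:7480 s3 cp big.txt s3://bigbucket/big.txt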
hi,

QE worked with Orit to reproduce this on a live setup. This is what we found from the meeting:
1. Created a 1.3.2 Ceph cluster with RGW on a separate node, with IO in progress.
2. Stop the RGW process, upgrade RGW, and reboot the node.
3. The RGW process is running after the node comes up.
4. Bucket creation is failing.
5. Restart the RGW service.
6. Bucket creation works.

The rgw logs from today's testing are too big to be copied here. Please take a local copy of the log from here:
root@magna080://var/log/ceph/ceph-client.rgw.magna080.log

Thanks,
Tejas

(In reply to Tejas from comment #36)
> Please take a local copy of the log from here:
> root@magna080://var/log/ceph/ceph-client.rgw.magna080.log

Thanks, I copied it to my computer.
Looks good,
Orit

Hi,

After upgrading the cluster to 2.0, I was able to create a new bucket and run I/O, so I am moving this bug to the verified state.

Regards,
Vasishta

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html
Created attachment 1176390 [details]
rgw log

Description of problem:
On a Ceph upgrade from 1.3.2 to 2.0, the RGW setup has changed. I am unable to do IO with the older RGW setup.

Version-Release number of selected component (if applicable):
ceph version 10.2.2-15.el7cp (60cd52496ca02bdde9c2f4191e617f75166d87b6)

How reproducible:
Always

Steps to Reproduce:
1. Create an S3 user and do some IO on a Ceph 1.3.2 cluster (a sketch of this step follows below).
2. Upgrade the Ceph cluster to 2.0.
3. The older setup is not working after the upgrade.

Additional info:
I have attached the log files, which have debug_rgw=20 debug_ms=5.
RGW node: magna080
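For reference, a minimal sketch of step 1 above, creating the S3 user and driving some I/O against the 1.3.2 gateway (the uid, display name, bucket name, and the choice of the AWS CLI are illustrative assumptions; any S3 client works):

    # create a test S3 user on the gateway (uid and display name are made up for this sketch)
    radosgw-admin user create --uid=upgradetest --display-name="Upgrade Test User"

    # use the generated access/secret keys with an S3 client to write some objects
    aws --endpoint-url http://magna080:7480 s3 mb s3://pre-upgrade-bucket
    aws --endpoint-url http://magna080:7480 s3 cp /etc/hosts s3://pre-upgrade-bucket/hosts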