Bug 1572607
| Field | Value |
|---|---|
| Summary | Post FFWD upgrade, cinder volume creates using a Ceph back end were failing |
| Product | Red Hat OpenStack |
| Component | openstack-cinder |
| Version | 13.0 (Queens) |
| Status | CLOSED INSUFFICIENT_DATA |
| Reporter | Darin Sorrentino <dsorrent> |
| Assignee | Cinder Bugs List <cinder-bugs> |
| QA Contact | Avi Avraham <aavraham> |
| Docs Contact | Kim Nylander <knylande> |
| Severity | unspecified |
| Priority | unspecified |
| CC | abishop, dsorrent, geguileo, rrubins, scohen, srevivo, tshefi |
| Target Milestone | --- |
| Target Release | --- |
| Flags | dsorrent: needinfo- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Last Closed | 2018-05-30 13:56:54 UTC |
Description (Darin Sorrentino, 2018-04-27 12:02:20 UTC)
Please provide the cinder logs (especially cinder-volume) with DEBUG enabled.

This could very well be a problem with the time on the controller nodes; they must be more or less in sync, otherwise the schedulers can't tell when a service is up. As Alan mentioned, the logs will allow us to tell whether the volume service was actually running correctly. If it was running correctly, the issue is NTP related and has nothing to do with Cinder.

Ignore my comment on the time. From Cinder's perspective the issue has nothing to do with it, because we can see that when the service is down there is only an 8-second difference between the controller nodes. Since this could be a Cinder or Ceph issue, we really need the logs.

Created attachment 1430071 [details]
Cinder logs controller-0
Created attachment 1430072 [details]
Cinder logs controller-1
Created attachment 1430073 [details]
Cinder logs controller-2
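For reference, the DEBUG logging requested above is controlled by the `debug` option in `cinder.conf`. A minimal sketch follows; the containerized path shown in the comment is an assumption based on typical TripleO/OSP 13 layouts and may differ on your deployment:

```ini
# Non-containerized location: /etc/cinder/cinder.conf
# On containerized OSP 13 deployments the puppet-generated copy usually lives
# under /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
# (path is an assumption; verify on your nodes).
[DEFAULT]
debug = True
```

The cinder services (api, scheduler, volume) need to be restarted for the change to take effect.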
(In reply to Alan Bishop from comment #1)
> Please provide the cinder logs (especially cinder-volume), with DEBUG
> enabled.

The logs I have provided do not have DEBUG enabled. After synchronizing the time and restarting the Cinder services, I no longer have the issue. Do you still want me to enable DEBUG, restart the services, and create volumes even though it appears to be working?

I encountered a similar issue to Darin's, also related to an FFU (OSP 10 -> OSP 13) upgrade. I have a 6-node overcloud (3 Controller + 3 ComputeHCI nodes). Prior to the upgrade, I had 3 boot-from-volume instances running. During the ceph-upgrade portion of the FFU, I lost connectivity to those VMs; I was able to recover them by adding ceph_mgr iptables rules and rebooting the nodes. At this point I have containerized Ceph 3 and OSP 13 on my overcloud nodes. `openstack volume service list` shows the cinder-volume hostgroup@tripleo_ceph service as enabled and up, and I can create Ceph volumes directly with `rbd -p volumes create ...` just fine, but any attempt to create a Cinder volume gets stuck, apparently at cinder-scheduler. I have debug enabled, so I can provide any logs you wish to see. Here is a snippet from cinder-scheduler.log: http://pastebin.test.redhat.com/585345

Gorka and I appreciate the efforts being made to provide the debug data we need to troubleshoot the problem. What we need is a full set of cinder logs (all cinder services) with DEBUG enabled during the time the problem occurs; this means DEBUG will need to be enabled prior to performing the FFU. The reason we need this is that there are several possible explanations for the symptoms you are seeing:
- The cinder-volume service is down, possibly because the RBD driver thinks the Ceph cluster is unavailable/down.
- The cinder-volume service appears to be down, but only because of clock skew between the cinder services.
- The cinder-scheduler service is misbehaving (see bug #1560919 for another example).

Need logs per my previous comment to make progress on this.

Closing as insufficient data; feel free to re-open if the problem still occurs with fresh logs.

Sean
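As a rough illustration of the clock-skew explanation above: the spread between controller clocks can be measured by collecting epoch seconds from each node and comparing them. This is a sketch, not a supported tool; the hostnames (`controller-0` etc.) and the `heat-admin` SSH user are assumptions, and the `ssh` call is replaced by a local `date` so the snippet runs standalone:

```shell
# Diagnostic commands referenced in this thread (run against the real cloud):
#   openstack volume service list          # is cinder-volume reported up?
#   rbd -p volumes create test --size 128  # can Ceph itself create volumes?
#
# Clock-skew check: gather epoch seconds from each controller and report the
# maximum spread. In a real deployment the inner command would be, e.g.:
#   ssh heat-admin@$h date +%s
times=""
for h in controller-0 controller-1 controller-2; do
  times="$times $(date +%s)"   # local stand-in for: ssh heat-admin@$h date +%s
done
echo "$times" | awk '{min=$1; max=$1; for (i=2; i<=NF; i++) {if ($i<min) min=$i; if ($i>max) max=$i} print "skew:", max-min, "seconds"}'
```

A skew of more than a few seconds between controllers is enough for the scheduler to consider a cinder-volume service down, which matches the 8-second difference mentioned earlier in the thread.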