| Summary: | [OSP 8.0 Bug]: Reboot errors unmounting NFS backed Cinder backends | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Dave Cain <dcain> |
| Component: | rhosp-director | Assignee: | Paul Grist <pgrist> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tzach Shefi <tshefi> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 (Liberty) | CC: | byount, ctatman, dbecker, eharney, jcoufal, lkuchlan, lyarwood, mburns, morazi, pgrist, rhel-osp-director-maint, scohen, sreichar, tbarron, tim.darnell, tshefi, william.nguyen |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-29 18:56:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Dave Cain
2016-03-15 02:13:16 UTC
Created attachment 1136347: Console output from node having problems unmounting remote NFS storage used for Cinder

Created attachment 1136348: Syslog output from host starting reboot until it rebooted 30 minutes later
This seems to belong to the Storage DFG, but I'm not sure. Can somebody from the group at least help investigate where the issue is?

There's a lot of info here, but this was on the director 8 beta. It just came over to storage, so I'm adding a number of people to take a look. I'm not sure whether this is still an issue or whether it was reproduced after the beta.

Tzach, do we have any recent NFS reboot tests to check if this is still an issue? If not, is this something someone can try when NFS testing is being done?

I moved it off the 10 list to be reviewed for the next release, but please bring it back if it's still an issue.

Paul, we don't have any NFS reboot testing that I know about. I'm bringing up an NFS-based system to help Omri's upgrade testing. I'll use that system to check this bug and report here. Keeping the needinfo flag for tracking.

I've just tested this on a RHOS10 OSPD (2016-11-29.1) with an NFS backend for Glance and Cinder. I created a bootable volume from Cirros, plus another empty volume, and booted three instances:

From the bootable volume
From an image, with a Cinder volume attached
From an image with no Cinder volume, as a reference

The instances were spread across both compute nodes. On the undercloud, I sourced stackrc and issued:

nova reboot controller-0 (where the volume service was running)
nova reboot compute-0
nova reboot compute-1

All three worked; the reset command rebooted them very quickly, in a matter of seconds. After the procedure, all three servers were active and running. The instances were all in the shutdown state, which is the default expected behavior after a compute node reboot. So this looks like a non-issue on RHOS10. Paul, should I check older versions (9, 8, 7)?

Tzach, thanks for the verification and testing on OSP10! Given that this issue originated on the director 8 beta, I think verifying on the current release is sufficient to close it out. Setting the state to mark this as fixed in the current release, OSP10 rc.
We are experiencing this same behavior (the Glance NFS connection does not unmount when a 'reboot' command is issued from the shell, then systemd shuts the machine down 30 minutes later). We are using the OSP10 GA release. We did get similar behavior to comment 8 when issuing a 'nova reboot' of the node, so we see consistency there.

However, it seems that issuing a 'nova reboot' of a node does not attempt a clean shutdown of the underlying operating system on the node, it issues a hard power of the node which could leave artifacts.

Is this expected behavior when using 'nova reboot'? I (and I'm sure others) would like to see a graceful shutdown of the controller/compute node when issuing a reboot command without having to wait for 30 minutes.

I'm not directly familiar with the underlying operation, but it almost sounds like something is timing out on a clean shutdown, or that it leads to a hard power reset. I will look for some nova help here, but you may want to ask your expected-behavior question on the openstack-dev [nova] mailing list to get more info.

(In reply to tim.darnell from comment #10)
> However, it seems that issuing a 'nova reboot' of a node does not attempt a
> clean shutdown of the underlying operating system on the node, it issues a
> hard power of the node which could leave artifacts.
>
> Is this expected behavior when using 'nova reboot'? I (and I'm sure others)
> would like to see a graceful shutdown of the controller/compute node when
> issuing a reboot command without having to wait for 30 minutes.

As this is an Ironic reboot, it always attempts a hard reboot in Newton. The ability for it to perform soft reboots only landed last week in Ocata and will not be backported to OSP 10:

Ironic: Add soft reboot support to ironic driver
https://review.openstack.org/#/c/403745/
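
For anyone trying to reproduce the slow shutdown, it may help to record which NFS shares a node still has mounted just before issuing 'reboot'. The following is only a minimal sketch and is not taken from this bug. It assumes the standard /proc/mounts field order (source, mount point, filesystem type). The names `nfs_mounts`, `read_nfs_mounts` and `SAMPLE`, and all addresses and paths in the sample data, are hypothetical.

```python
# Sketch only: list NFS entries from /proc/mounts-style text.
# SAMPLE is made-up illustration data, not output captured from this bug.
SAMPLE = """\
/dev/sda2 / xfs rw,relatime 0 0
192.0.2.10:/export/cinder /mnt/cinder nfs4 rw,relatime,vers=4.0 0 0
192.0.2.10:/export/glance /mnt/glance nfs rw,relatime,vers=3 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""


def nfs_mounts(mounts_text):
    """Return (source, mount_point, fs_type) for each NFS line in the text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type = fields[:3]
        if fs_type in ("nfs", "nfs4"):
            found.append((source, mount_point, fs_type))
    return found


def read_nfs_mounts(path="/proc/mounts"):
    """Apply nfs_mounts() to the live mount table on a Linux host."""
    with open(path) as handle:
        return nfs_mounts(handle.read())


if __name__ == "__main__":
    # Demonstrate on the sample data; call read_nfs_mounts() on a real node.
    for source, mount_point, fs_type in nfs_mounts(SAMPLE):
        print(f"{fs_type}\t{source}\t{mount_point}")
```

Running `read_nfs_mounts()` on a controller or compute node before the reboot, and comparing the result with the unmount errors on the console, could show which share is the one that fails to unmount.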