Bug 1241644 - After full reboot of the host running the compute nodes, cinder-volume stays with error: Unable to update stats, LVMVolumeDriver -3.0.0
Status: CLOSED WONTFIX
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: James Slagle
QA Contact: Shai Revivo
Keywords: Triaged, ZStream
Duplicates: 1242936 1245545
Reported: 2015-07-09 13:04 EDT by Udi Shkalim
Modified: 2018-04-22 07:52 EDT
CC List: 14 users

Doc Type: Known Issue
Doc Text:
When openstack-cinder-volume uses an LVM backend and the Overcloud nodes reboot, the file-backed loopback device is not recreated. As a workaround, manually recreate the loopback device:

    $ sudo losetup /dev/loop2 /var/lib/cinder/cinder-volumes

Then restart openstack-cinder-volume. Note that openstack-cinder-volume runs on only one node at a time in a high-availability cluster of Overcloud Controller nodes; however, the loopback device should exist on all nodes.
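The workaround splits into a step every controller needs (attaching the loopback device) and a step only the node currently running the service needs (the restart). A minimal Python sketch of that split, which only builds the command list and executes nothing. The node names, the active_node argument and the default `systemctl restart` command are assumptions for illustration, not part of the documented workaround; on a Pacemaker-managed controller cluster the restart would presumably go through Pacemaker instead.

```python
"""Per-node command plan for the known-issue workaround (dry run only)."""

LOOP_DEVICE = "/dev/loop2"                       # from the Doc Text
BACKING_FILE = "/var/lib/cinder/cinder-volumes"  # from the Doc Text
SERVICE = "openstack-cinder-volume"


def workaround_plan(nodes, active_node,
                    restart_cmd=f"sudo systemctl restart {SERVICE}"):
    """Map each controller node to the commands it needs after a reboot.

    restart_cmd is an assumption: in a Pacemaker-managed cluster the
    service would likely be restarted through Pacemaker instead.
    """
    if active_node not in nodes:
        raise ValueError(f"{active_node!r} is not one of the given nodes")
    plan = {}
    for node in nodes:
        # The Doc Text says the device should exist on all nodes
        # (presumably so the service can move to any of them).
        cmds = [f"sudo losetup {LOOP_DEVICE} {BACKING_FILE}"]
        if node == active_node:
            # Only the node currently running the service is restarted.
            cmds.append(restart_cmd)
        plan[node] = cmds
    return plan


if __name__ == "__main__":
    # Hypothetical three-controller cluster: prints the plan, runs nothing.
    controllers = [f"overcloud-controller-{i}" for i in range(3)]
    for node, cmds in workaround_plan(controllers, controllers[0]).items():
        print(node)
        for cmd in cmds:
            print("   ", cmd)
```

Keeping the plan as data rather than executing it makes it easy to review before running anything as root on a production controller.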
Last Closed: 2016-10-14 14:01:05 EDT
Type: Bug
tshefi: automate_bug-


Attachments
/var/log/ dir (8.64 MB, application/x-bzip), 2015-07-09 13:04 EDT, Udi Shkalim
cinder volume log (352.40 KB, text/plain), 2015-07-13 18:21 EDT, James Slagle

Description Udi Shkalim 2015-07-09 13:04:28 EDT
Created attachment 1050371
/var/log/ dir

Description of problem:
After a full reboot of the host running the virtual environment, instances are in SHUTOFF state and can't be started or deleted.

Version-Release number of selected component (if applicable):
RHEL-OSP director puddle 7.0 RC - 2015-07-02.1

openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-6.el7ost.noarch
openstack-ceilometer-api-2015.1.0-6.el7ost.noarch
openstack-nova-console-2015.1.0-14.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
openstack-nova-compute-2015.1.0-14.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-nova-scheduler-2015.1.0-14.el7ost.noarch
openstack-neutron-lbaas-2015.1.0-5.el7ost.noarch
openstack-selinux-0.6.35-1.el7ost.noarch
openstack-nova-common-2015.1.0-14.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-6.el7ost.noarch
openstack-ceilometer-compute-2015.1.0-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-14.el7ost.noarch
openstack-cinder-2015.1.0-3.el7ost.noarch
openstack-neutron-metering-agent-2015.1.0-10.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-puppet-modules-2015.1.7-5.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-common-2015.1.0-10.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-ceilometer-common-2015.1.0-6.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-6.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-neutron-ml2-2015.1.0-10.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-14.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-10.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-neutron-2015.1.0-10.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-14.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-ceilometer-central-2015.1.0-6.el7ost.noarch
openstack-nova-cert-2015.1.0-14.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy Overcloud and launch instances 
2. Reboot the baremetal host
3. Try to start or delete instances

Actual results:
Instance state is stuck in "powering-on" / "deleting"

Expected results:
Instances are in ACTIVE state and can be deleted


Additional info:
Log directory (with debug logging enabled) attached
Comment 3 Udi Shkalim 2015-07-12 14:03:52 EDT
It seems that pacemaker failed to load after reboot:
[heat-admin@overcloud-controller-0 ~]$ systemctl status pcs
pcs.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

In the log dir you can find the pacemaker.log with all the errors.
Comment 4 James Slagle 2015-07-13 09:08:50 EDT
Attempting to reproduce this locally.
Comment 5 James Slagle 2015-07-13 18:04:16 EDT
Using a virt env, I deployed an overcloud (1 controller, 1 compute). Then I rebooted the virt host without gracefully shutting down the undercloud (instack VM) or either of the overcloud nodes.

The only issue I saw when I brought everything back up manually (virsh start instack, then powering on the 2 overcloud nodes) was that cinder-volume had an error:

2015-07-13 17:57:14.324 7682 WARNING cinder.volume.manager [req-a5f5bfb2-9bac-49ae-89cf-58ad2834fbf2 - - - - -] Unable to update stats, LVMVolumeDriver -3.0.0 (config name tripleo_iscsi) driver is uninitialized.
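The warning above is the signature to look for when checking whether a node hit this issue. A small Python sketch that scans log lines for it; the regular expression is written against this single sample line, so it assumes the message format is the same in other Cinder releases.

```python
import re

# Shape of the cinder.volume.manager warning quoted in this comment:
#   "Unable to update stats, <driver> -<version> (config name <backend>)
#    driver is uninitialized."
UNINITIALIZED_RE = re.compile(
    r"Unable to update stats, (?P<driver>\S+) -(?P<version>\S+) "
    r"\(config name (?P<backend>[^)]+)\) driver is uninitialized"
)


def uninitialized_backends(lines):
    """Return (driver, version, backend) tuples for matching warning lines."""
    hits = []
    for line in lines:
        match = UNINITIALIZED_RE.search(line)
        if match:
            hits.append((match["driver"], match["version"], match["backend"]))
    return hits
```

For the line above it returns [("LVMVolumeDriver", "3.0.0", "tripleo_iscsi")]; note that the "-3.0.0" in the message is just formatting around the driver version. Pointing it at the cinder-volume log (for example an open file handle on /var/log/cinder/volume.log, an assumed default path) gives a quick per-node check.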

It turns out this is because /etc/puppet/modules/cinder/manifests/setup_test_volume.pp is what's responsible for creating the file-backed loopback device at /var/lib/cinder/cinder-volumes, but Puppet doesn't run on reboot, only on an os-collect-config metadata change.

So it's not clear to me how this is supposed to work. Is this considered a bug in puppet-cinder? Shouldn't it make this mount permanent somehow, such as by writing it to /etc/fstab, so it's persisted?

Furthermore, you can't even just run a simple puppet apply against setup_test_volume.pp, because the exec resources in that manifest are refreshonly => true and only fire when notified, so you would first have to manually delete /var/lib/cinder/cinder-volumes.
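Why re-running the manifest does nothing can be shown with a toy model of a Puppet notify (~>) chain. This Python sketch assumes, as the comment implies, that only the first exec (creating the backing file) has its own guard, and that the losetup/pvcreate/vgcreate execs are refreshonly, so each one runs only if its predecessor just ran. The command strings are illustrative and are not copied from setup_test_volume.pp.

```python
from dataclasses import dataclass


@dataclass
class Exec:
    command: str
    refreshonly: bool = False
    guard_satisfied: bool = False  # stands in for an `unless`/`creates` check


def simulate_notify_chain(chain):
    """Return the commands a linear notify (~>) chain of execs would run.

    Toy model: a normal exec runs unless its guard is satisfied; a
    refreshonly exec runs only when the previous resource ran and so sent
    it a refresh event.
    """
    ran = []
    previous_ran = False
    for res in chain:
        runs = previous_ran if res.refreshonly else not res.guard_satisfied
        if runs:
            ran.append(res.command)
        previous_ran = runs
    return ran


def setup_test_volume_chain(backing_file_exists,
                            loop_device="/dev/loop2",
                            backing_file="/var/lib/cinder/cinder-volumes",
                            vg_name="cinder-volumes"):
    """Hypothetical chain mirroring the steps discussed in this bug."""
    return [
        Exec(f"create {backing_file}", guard_satisfied=backing_file_exists),
        Exec(f"losetup {loop_device} {backing_file}", refreshonly=True),
        Exec(f"pvcreate {loop_device}", refreshonly=True),
        Exec(f"vgcreate {vg_name} {loop_device}", refreshonly=True),
    ]
```

On first deployment, simulate_notify_chain(setup_test_volume_chain(False)) runs all four steps. After a reboot the backing file still exists, the guarded first step is skipped, nothing is notified, and the result is an empty list, which is why deleting the file is the only way to make the chain fire again.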
Comment 6 James Slagle 2015-07-13 18:05:50 EDT
Dan/Emilien, any thoughts about what the expected behavior should be?
Comment 7 Emilien Macchi 2015-07-13 18:11:36 EDT
My first thought won't help you, but setup_test_volume.pp is really not intended for production; it's only meant to test whether you can create a volume.

Please provide more logs (enable DEBUG, VERBOSE) for cinder-volume. Also please provide iscsiadm logs and everything related to iSCSI. I suspect a service is not starting properly, or is starting in the wrong order.

It might be Puppet related, yes. I would be happy to help with more logs.
Comment 8 James Slagle 2015-07-13 18:17:56 EDT
(In reply to Emilien Macchi from comment #7)
> My first thought won't help you but setup_test_volume.pp is really not
> intended for production, but just to test if you can create a volume.
> 
> Please provide more logs (enable DEBUG, VERBOSE) for cinder volume. Also
> please provide iscsoadm logs and everything related to iSCSI. I feel like a
> service is not starting well, or in the wrong order.
> 
> It might be Puppet related, yes. I would be happy to help with more logs.

I'll attach the cinder-volume log, but the problem is quite obviously that the LVM PV and VG do not even exist after the reboot, so the tripleo_iscsi Cinder backend won't even activate.

I understand setup_test_volume isn't intended for production. But if we're going to use it at all, even for POCs, it needs to support reboots.

What iSCSI logs do you want?
Comment 9 James Slagle 2015-07-13 18:19:13 EDT
(In reply to James Slagle from comment #8)
> i understand setup_test_volume  is intended for production. but if we're

*isn't intended for production
Comment 10 James Slagle 2015-07-13 18:21:17 EDT
Created attachment 1051541
cinder volume log
Comment 11 James Slagle 2015-07-13 18:22:11 EDT
Attached the cinder-volume log. You can see the aforementioned error about being unable to initialize the backend. After that, I manually ran the commands to set up the LVM volume group and was able to get the backend initialized.
Comment 12 Emilien Macchi 2015-07-13 18:44:11 EDT
It's more or less a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=971145

Even if we provide a Puppet patch, the losetup is not persistent, so you'd have to run Puppet at every boot. I'd rather investigate something ugly but working: patch rc.local to set up the loopback device at boot.
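The rc.local approach boils down to adding one losetup line, once. A minimal Python sketch, assuming the device and file names from the Doc Text; the marker comment is hypothetical. On RHEL 7, /etc/rc.d/rc.local only runs at boot if it is executable, so the sketch also sets the execute bits.

```python
import stat
from pathlib import Path

# Hypothetical marker line, only there to explain the entry to a reader.
MARKER = "# Re-attach the Cinder test volume loopback device"


def ensure_rc_local_entry(rc_local: Path,
                          loop_device: str = "/dev/loop2",
                          backing_file: str = "/var/lib/cinder/cinder-volumes") -> bool:
    """Add a losetup line to rc_local once; return True if the content changed."""
    entry = f"losetup {loop_device} {backing_file}"
    text = rc_local.read_text() if rc_local.exists() else "#!/bin/bash\n"
    changed = False
    if entry not in text.splitlines():
        if not text.endswith("\n"):
            text += "\n"
        rc_local.write_text(f"{text}{MARKER}\n{entry}\n")
        changed = True
    # The boot-time rc-local hook skips the script unless it is executable.
    mode = rc_local.stat().st_mode
    rc_local.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return changed
```

Calling it on every Puppet run is harmless because the second call finds the entry and returns False. Two caveats the sketch does not handle: if an existing script ends with `exit 0` (common on other distributions) the line would need to go before it, and rc.local runs late in boot, so cinder-volume may already have started and failed by then and still need a restart.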
Comment 14 James Slagle 2015-07-14 13:59:36 EDT
*** Bug 1242936 has been marked as a duplicate of this bug. ***
Comment 15 Emilien Macchi 2015-07-14 15:51:22 EDT
Let me briefly summarize the problem and the solution I would like to propose.

We are using setup_test_volume.pp in the product to create the loop device on which Cinder volumes are created.
It uses losetup [1], which is not persistent. Even if you put the mount point in /etc/fstab, that won't create the loopback device at boot.
You have to re-run the losetup command (possibly with the -f option).
That's why cinder-volume fails to start: it relies on a loop device that is not recreated at boot.

I think we have two solutions:

* Manage the losetup call in /etc/rc.local to make sure the command runs at boot.
* Run Puppet again at boot so the manifest is reapplied (this may require some changes, because it needs to be idempotent).

I prefer solution #1, but I can help implement either one; please let me know.

[1] https://github.com/openstack/puppet-cinder/blob/master/manifests/setup_test_volume.pp#L42-L45
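Solution #2 hinges on idempotency. A minimal Python sketch of an idempotent boot-time check, assuming the node state has already been probed (for example with `losetup -j <file>` for the attachment and `vgs cinder-volumes` for the volume group; the probing itself is left out). The key point it encodes: after a reboot the VG is invisible only because nothing is attached, so re-running pvcreate/vgcreate there would be wrong; only a brand-new backing file gets initialized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    backing_file_exists: bool
    attached_device: Optional[str]  # loop device bound to the file, or None
    vg_exists: bool                 # whether the volume group is visible


def converge(state: NodeState,
             loop_device: str = "/dev/loop2",
             backing_file: str = "/var/lib/cinder/cinder-volumes",
             vg_name: str = "cinder-volumes") -> list:
    """Return only the steps needed to reach the desired state.

    Returns an empty list when everything is already in place, so it is
    safe to run on every boot.
    """
    steps = []
    fresh_file = not state.backing_file_exists
    if fresh_file:
        steps.append(f"create {backing_file}")
    device = state.attached_device
    if device is None:
        steps.append(f"losetup {loop_device} {backing_file}")
        device = loop_device
    # A VG that is missing while nothing is attached is expected after a
    # reboot: its metadata is still in the file. Only initialize LVM on a
    # new file, or on an attached file that genuinely has no VG.
    if fresh_file or (state.attached_device is not None and not state.vg_exists):
        steps += [f"pvcreate {device}", f"vgcreate {vg_name} {device}"]
    return steps
```

For the post-reboot state described in this bug, converge(NodeState(True, None, False)) yields just the losetup step; on a healthy node it yields nothing.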
Comment 16 Mike Burns 2015-07-22 07:22:32 EDT
*** Bug 1245545 has been marked as a duplicate of this bug. ***
Comment 17 James Slagle 2015-10-14 15:06:16 EDT
(In reply to Emilien Macchi from comment #15)
> * Manage the lsofsetup in /etc/rc.local to make sure we run the command at
> boot.

I think the above is the right thing to do. Could we do this in puppet-cinder? It seems to me that's the right place for it. Isn't it expected that changes applied by Puppet persist across reboots?

Otherwise, we could add something to the puppet manifests in tripleo-heat-templates, but that creates tight coupling between those manifests and the puppet-cinder implementation in setup_test_volume.pp. We'd basically have to reimplement most of that.
Comment 18 Emilien Macchi 2015-10-14 16:46:11 EDT
setup_test_volume.pp is a hack to create a loopback device and create Cinder volumes on it.
IMHO setup_test_volume.pp should not even exist.

I would suggest we add something in puppet-tripleo that takes care of this configuration and also makes sure it's persistent across reboots, using Puppet if needed. We would have to use a template or write directly to the file.

I'm not in favor of having this code in puppet-cinder because it's a hack, but if you think that's the right place, we can submit the code there.
Comment 19 Christian Horn 2016-01-28 05:11:53 EST
bz1300721 might be a duplicate; can someone check?
Comment 20 Mike Burns 2016-04-07 16:43:53 EDT
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Comment 22 Marian Krcmarik 2016-08-18 06:00:46 EDT
Ping: maybe we can try the fix that was present in Packstack?
https://review.openstack.org/#/c/25997/
Comment 25 James Slagle 2016-09-21 08:52:24 EDT
Do we know why customers are deploying with this configuration? Is it just because it's the default and they don't know to change it, or are they actually trying to use it in production?
Comment 27 Paul Grist 2016-10-14 14:01:05 EDT
Closing this one out. LVM is not a supported backend, so even if we see errors (and this was originally reported in OSP 7), we would rather see those result in a manual fix or in the realization that LVM is not for production use (which is documented). We will add more to the other bug, and we are also looking at post-OSP 10 plans for highlighting more clearly that LVM is unsupported, as well as other options.
