Red Hat Bugzilla – Bug 1241644
After full reboot of the host running the compute nodes, cinder-volume stays with error: Unable to update stats, LVMVolumeDriver -3.0.0
Last modified: 2018-04-22 07:52:58 EDT
Created attachment 1050371 [details] /var/log/ dir

Description of problem:
After a full reboot of the host running the virtual environment, instances are in SHUTOFF state and can't be started or deleted.

Version-Release number of selected component (if applicable):
RHEL-OSP director puddle 7.0 RC - 2015-07-02.1
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-6.el7ost.noarch
openstack-ceilometer-api-2015.1.0-6.el7ost.noarch
openstack-nova-console-2015.1.0-14.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
openstack-nova-compute-2015.1.0-14.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-nova-scheduler-2015.1.0-14.el7ost.noarch
openstack-neutron-lbaas-2015.1.0-5.el7ost.noarch
openstack-selinux-0.6.35-1.el7ost.noarch
openstack-nova-common-2015.1.0-14.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-6.el7ost.noarch
openstack-ceilometer-compute-2015.1.0-6.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-14.el7ost.noarch
openstack-cinder-2015.1.0-3.el7ost.noarch
openstack-neutron-metering-agent-2015.1.0-10.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-puppet-modules-2015.1.7-5.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-common-2015.1.0-10.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-ceilometer-common-2015.1.0-6.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-6.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-neutron-ml2-2015.1.0-10.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-14.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-10.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-neutron-2015.1.0-10.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-14.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-ceilometer-central-2015.1.0-6.el7ost.noarch
openstack-nova-cert-2015.1.0-14.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud and launch instances
2. Reboot the baremetal host
3. Try to start or delete the instances

Actual results:
Instance state is stuck in "powering-on" / "deleting"

Expected results:
Instances are in Active state and can be deleted

Additional info:
Log directory with debug output attached
It seems that pacemaker failed to load after the reboot:

[heat-admin@overcloud-controller-0 ~]$ systemctl status pcs
pcs.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

In the log dir you can find pacemaker.log with all the errors.
attempting to reproduce this locally
using a virt env, i deployed an overcloud (1 controller, 1 compute). then i rebooted the virt host without gracefully shutting down either the undercloud (instack vm) or either of the overcloud nodes. the only issue i saw when i brought everything back up manually (virsh start instack, then power on the 2 overcloud nodes) was that cinder-volume had an error:

2015-07-13 17:57:14.324 7682 WARNING cinder.volume.manager [req-a5f5bfb2-9bac-49ae-89cf-58ad2834fbf2 - - - - -] Unable to update stats, LVMVolumeDriver -3.0.0 (config name tripleo_iscsi) driver is uninitialized.

turns out this is because /etc/puppet/modules/cinder/manifests/setup_test_volume.pp is what's responsible for creating the file-backed loopback device at /var/lib/cinder/cinder-volumes. except that puppet doesn't run on reboot, only on an os-collect-config metadata change. so it's not clear to me how this is supposed to work.

is this considered a bug in puppet-cinder? shouldn't it make this setup permanent somehow, such as writing it to /etc/fstab so it's persisted? furthermore, you can't even just run a simple puppet apply against setup_test_volume.pp, because the exec resources in that manifest are refreshonly => true (you would have to manually delete /var/lib/cinder/cinder-volumes).
Dan/Emilien, any thoughts about what the expected behavior should be?
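For anyone triaging similar reports, a minimal sketch of spotting this condition: the snippet below greps for the uninitialized-driver warning, using the log line quoted above written to a temp file so it is self-contained. On a real controller you would point it at the cinder volume log instead; the packaged default path /var/log/cinder/volume.log is an assumption here, not taken from this report.

```shell
# Detect the uninitialized-driver warning. The sample line comes from the
# report above; replace the temp file with /var/log/cinder/volume.log
# (assumed default path) on an actual controller.
log=$(mktemp)
echo '2015-07-13 17:57:14.324 7682 WARNING cinder.volume.manager [req-a5f5bfb2-9bac-49ae-89cf-58ad2834fbf2 - - - - -] Unable to update stats, LVMVolumeDriver -3.0.0 (config name tripleo_iscsi) driver is uninitialized.' > "$log"
grep -c 'driver is uninitialized' "$log"   # prints the number of matching lines
rm -f "$log"
```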
My first thought won't help you, but setup_test_volume.pp is really not intended for production, just to test if you can create a volume. Please provide more logs (enable DEBUG, VERBOSE) for cinder volume. Also please provide iscsiadm logs and everything related to iSCSI. I feel like a service is not starting well, or is starting in the wrong order. It might be Puppet related, yes. I would be happy to help with more logs.
(In reply to Emilien Macchi from comment #7)
> My first thought won't help you but setup_test_volume.pp is really not
> intended for production, but just to test if you can create a volume.
>
> Please provide more logs (enable DEBUG, VERBOSE) for cinder volume. Also
> please provide iscsiadm logs and everything related to iSCSI. I feel like a
> service is not starting well, or in the wrong order.
>
> It might be Puppet related, yes. I would be happy to help with more logs.

i'll attach the cinder volume log, but this is quite obviously the problem b/c the lvm pv and vg do not even exist after the reboot, so the tripleo_iscsi cinder backend won't even activate.

i understand setup_test_volume is intended for production. but if we're going to use it at all, even for POC's, it needs to support reboots.

what iscsi logs do you want?
(In reply to James Slagle from comment #8)
> (In reply to Emilien Macchi from comment #7)
> > My first thought won't help you but setup_test_volume.pp is really not
> > intended for production, but just to test if you can create a volume.
> >
> > Please provide more logs (enable DEBUG, VERBOSE) for cinder volume. Also
> > please provide iscsiadm logs and everything related to iSCSI. I feel like a
> > service is not starting well, or in the wrong order.
> >
> > It might be Puppet related, yes. I would be happy to help with more logs.
>
> i'll attach the cinder volume log, but this is quite obviously the problem
> b/c the lvm pv and vg do not even exist after the reboot, so the
> tripleo_iscsi cinder backend won't even activate.
>
> i understand setup_test_volume is intended for production. but if we're

*isn't intended for production
Created attachment 1051541 [details] cinder volume log
attached cinder volume log. you can see the aforementioned error about being unable to initialize the backend. after that I manually ran the commands to set up the lvm volume group and was able to get the backend initialized.
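For reference, the manual steps described above might look roughly like the following. The backing-file path, the volume group name, and the service name are assumptions based on the defaults discussed in this thread, so treat this as a sketch of a privileged recovery fragment rather than a verified procedure; it must run as root on the controller and cannot be exercised outside such a host.

```shell
# Hypothetical post-reboot recovery, assuming the backing file at
# /var/lib/cinder/cinder-volumes survived the reboot and the volume
# group is named cinder-volumes (both assumed, not confirmed above).
losetup -f /var/lib/cinder/cinder-volumes   # re-attach the file to a free loop device
vgscan                                      # rescan so LVM sees the PV/VG again
vgchange -a y cinder-volumes                # activate the volume group
systemctl restart openstack-cinder-volume   # let the driver re-initialize
```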
It's kind of a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=971145

Even if we provide a Puppet patch, the losetup is not persistent, so you'd have to run Puppet at every boot. I'd rather investigate something ugly that works: patch rc.local and set up the loopback there.
*** Bug 1242936 has been marked as a duplicate of this bug. ***
So let me summarize the problem a bit and propose a solution.

We are using setup_test_volume.pp in the product to create the loop device where Cinder volumes will be mounted. It actually uses losetup [1], which is not persistent. Even if you have the mount point in /etc/fstab, it won't create the loopback device at boot; you have to re-run the losetup command again (possibly with the -f option). That's why Cinder Volume fails to start: we use a loop device that is not created at boot.

We have two solutions, I think:

* Manage the losetup call in /etc/rc.local to make sure we run the command at boot.
* Run Puppet again at boot to make sure the Puppet script is run (this would maybe require some changes, because it needs to be idempotent).

I prefer solution #1, but I can help with either fix; please let me know.

[1] https://github.com/openstack/puppet-cinder/blob/master/manifests/setup_test_volume.pp#L42-L45
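The rc.local approach discussed in this comment could be sketched as a fragment appended to /etc/rc.local; the backing-file path and volume group name are assumptions based on puppet-cinder's defaults, not verified against this deployment, so this is a boot-configuration sketch rather than a tested fix.

```shell
# Sketch of an /etc/rc.local addition: re-create the loop device and
# reactivate the LVM volume group at boot, before cinder-volume starts.
# Path and VG name are assumed defaults.
losetup -f /var/lib/cinder/cinder-volumes
vgchange -a y cinder-volumes
```

Note that on RHEL 7, /etc/rc.d/rc.local must be made executable (chmod +x /etc/rc.d/rc.local) for rc-local.service to run it at boot.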
*** Bug 1245545 has been marked as a duplicate of this bug. ***
(In reply to Emilien Macchi from comment #15)
> So let me summarize a bit the problem and a solution proposal I would like
> to give here.
>
> We are using setup_test_volume.pp in the product to create the loop device
> where Cinder volumes will be mounted.
> It actually uses losetup [1] which is not persistent. Even if you have the
> mount point in /etc/fstab, it won't create the loopback device at boot.
> You have to re-run the losetup command again (possibly with the -f option).
> That's why Cinder Volume fails to start, because we use a loop device that
> is not created at boot.
>
> We have two solutions I think:
>
> * Manage the losetup in /etc/rc.local to make sure we run the command at
> boot.

I think the above is the right thing to do. Could we do this in puppet-cinder? It seems to me that's the right place for it. Isn't it expected that changes applied by puppet are consistent across reboots?

Otherwise, we could add something to the puppet manifests in tripleo-heat-templates, but that creates tight coupling between those manifests and the puppet-cinder implementation in setup_test_volume.pp. We'd basically have to reimplement most of that.
setup_test_volume.pp is a hack to create a loopback device and create cinder volumes on it. IMHO setup_test_volume.pp should not even exist.

I would suggest we add something in puppet-tripleo that takes care of this configuration and also makes sure it's persistent across reboots, using Puppet if needed. We would have to use a template or write the file directly.

I'm not in favor of having this code in puppet-cinder, because it is a hack, but if you think that's the right place we can submit the code there.
bz1300721 might be a duplicate; can someone check?
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Ping. Maybe we can try the fix that was present in packstack? https://review.openstack.org/#/c/25997/
do we know why customers are deploying with this configuration? Is it just because it's the default and they don't know to change it, or are they actually trying to use it in production?
Closing this one out. LVM is not a supported backend, so even if we see errors (and this was originally reported in OSP 7), we would rather see those result in a manual fix, or in the realization that LVM is not for production use (which is documented). We will add more detail to the other bug, and we are also looking at post-OSP 10 plans for highlighting the unsupported status more prominently, along with other options.