Created attachment 1239941 [details]
Cinder and pacemaker logs from the controller running Cinder.

Description of problem:
After rebooting the physical host with the virt UC, OC, and computes, creating Cinder volumes fails; volume status is "error" for any new volume I create. This bug might be more Cinder than OSPD.

The Cinder volume log reports:

2017-01-12 13:46:45.448 15732 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-12 13:46:49.839 15732 WARNING cinder.volume.manager [req-17e81eda-c5b0-40ff-b853-9b65e4dba8ef - - - - -] Update driver status failed: (config name tripleo_iscsi) is uninitialized.
2017-01-12 13:46:55.453 15732 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".

Version-Release number of selected component (if applicable):
RHEL 7.3, RHOS9 -> 9 -p 2017-01-05.2
openstack-tripleo-0.0.8-0.2.d81bd6dgit.el7ost.noarch
python-tripleoclient-2.0.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-2.0.0-5.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-42.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-7.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-42.el7ost.noarch
openstack-tripleo-common-2.0.0-8.el7ost.noarch

Controller Cinder versions:
python-cinderclient-1.6.0-1.el7ost.noarch
openstack-cinder-8.1.1-4.el7ost.noarch
python-cinder-8.1.1-4.el7ost.noarch

How reproducible:
Unsure, first time I hit this.

Steps to Reproduce:
1. Infrared-installed virt deployment with 3 controller + 2 compute nodes.
2. Cinder create volume from image worked fine.
3. Rebooted the whole physical host, including all VMs on top (UC, OC, computes).
4. Now I can't create a volume, either empty or from an image:

+--------------------------------------+-----------+------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name       | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------------+------+-------------+----------+-------------+
| 6e57e673-5478-49b0-a8b8-8ed803fcaf1a | available | cirros_vol | 1    | -           | true     |             |
| 9898c6ed-b2e1-4062-ae32-d4cb656f717f | error     | -          | 1    | -           | false    |             |
+--------------------------------------+-----------+------------+------+-------------+----------+-------------+

Actual results:
Failed to create Cinder volumes post reboot.

Expected results:
Recovery from reboot should return everything to working order, including Cinder volume creation.

Additional info:
Not sure, but this might be related to / a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1340589
Doesn't block 1273226, which has since been verified by Mike.
How reproducible: -> every time.

I just rebooted another physical host with a RHOS9 virt env. Before rebooting I was able to create Cinder volumes; after rebooting the physical host, I hit the same error mentioned in the bug description.

Cinder volume.log shows:

2017-01-22 10:21:30.945 13705 WARNING cinder.volume.manager [req-54508272-7c4a-45cc-a734-1b3713a0bff8 - - - - -] Update driver status failed: (config name tripleo_iscsi) is uninitialized.
2017-01-22 10:21:40.096 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:21:50.104 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:22:00.107 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:22:10.119 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:22:20.127 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:22:30.131 13705 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-01-22 10:22:30.946 13705 DEBUG oslo_service.periodic_task [req-54508272-7c4a-45cc-a734-1b3713a0bff8 - - - - -] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
Same issue also happens on RHOS10.

Version:
python-cinderclient-1.9.0-4.el7ost.noarch
puppet-cinder-9.4.1-3.el7ost.noarch
openstack-cinder-9.1.1-3.el7ost.noarch
python-cinder-9.1.1-3.el7ost.noarch
Move to Cinder and target OSP12 initially.
Paul, let's wait for Eric's insight on this to be sure. My gut feeling is that it isn't a Cinder problem per se, but rather a possible OSPD issue, or some other subsystem/service. I've just created two Cinder volumes on a packstack OSP9 deployment, one before and one after rebooting the server, without hitting this issue. It seems this only hits OSPD deployments, which is why I opened the issue against OSPD rather than Cinder. It's also possible it only happens on virt OSPD systems.
If I'm not mistaken, the issue is that the tests are using the LVM driver with the volumes backed by a loopback device, and after reboot the loopback device no longer exists, so the Volume Group cannot be found. Without the VG the driver cannot initialize, and Cinder detects the driver error and stops sending heartbeats to avoid receiving volume creation requests; hence the service appears "down".

This is not a Cinder issue, since Cinder is not responsible for making the LVM setup persistent; the LVM VG configured in cinder.conf should always be there.

Here's the error from the logs:

2017-01-12 09:28:37.650 15732 ERROR cinder.volume.manager Stderr: u'File descriptor 10 (/dev/urandom) leaked on vgs invocation. Parent PID 15734: /usr/bin/python2\n  Volume group "cinder-volumes" not found\n  Cannot process volume group cinder-volumes\n'
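For illustration, here is roughly how a loopback-backed VG like this gets set up at deploy time and why it breaks after a reboot. This is a minimal sketch; the exact device name and size are assumptions, not taken from the deployment scripts:

# At deploy time, something equivalent to this runs once:
truncate -s 10G /var/lib/cinder/cinder-volumes     # backing file, persists across reboots
losetup /dev/loop0 /var/lib/cinder/cinder-volumes  # loop device, does NOT persist
pvcreate /dev/loop0                                # initialize the loop device as a PV
vgcreate cinder-volumes /dev/loop0                 # the VG sits on the loop device

# After a reboot the backing file still exists, but no loop device points
# at it, so LVM cannot find the PV and the driver's setup check fails:
losetup -a             # no output -> loop device is gone
vgs cinder-volumes     # Volume group "cinder-volumes" not found (exit code 5)

Re-attaching the file with "losetup -f /var/lib/cinder/cinder-volumes" makes the VG visible again.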
See related bug in Director: https://bugzilla.redhat.com/show_bug.cgi?id=1271266
Fix has been merged upstream.
I just tried this on puppet-cinder-11.2.0-0.20170628010853.abddb13.el7ost.noarch, on a virt OSPD. Cinder (LVM) create worked. Rebooted the host, waited a few minutes. Cinder create still fails (it should be OK):

/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager [req-ba0316dd-a403-4ecf-9657-47d02a2e87f5 - - - - -] Failed to initialize driver.: ProcessExecutionError: Unexpected error while running command.
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager Traceback (most recent call last):
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 443, in init_host
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     self.driver.check_for_setup_error()
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 301, in check_for_setup_error
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     self.configuration.lvm_suppress_fd_warnings))
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 105, in __init__
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     if self._vg_exists() is False:
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 138, in _vg_exists
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     run_as_root=True)
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 49, in _execute
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     result = self.__execute(*args, **kwargs)
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 123, in execute
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 400, in execute
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager     cmd=sanitized_cmd)
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o name cinder-volumes
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager Exit code: 5
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager Stdout: u''
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager Stderr: u'File descriptor 12 (/dev/urandom) leaked on vgs invocation. Parent PID 11927: /usr/bin/python2\n  Volume group "cinder-volumes" not found\n  Cannot process volume group cinder-volumes\n'
/var/log/cinder/volume.log:2017-07-04 14:05:31.049 11917 ERROR cinder.volume.manager
/var/log/cinder/volume.log:2017-07-04 14:06:31.483 11917 ERROR cinder.utils [req-ba0316dd-a403-4ecf-9657-47d02a2e87f5 - - - - -] Volume driver LVMVolumeDriver not initialized
/var/log/cinder/volume.log:2017-07-04 14:06:31.484 11917 ERROR cinder.volume.manager [req-ba0316dd-a403-4ecf-9657-47d02a2e87f5 - - - - -] Cannot complete RPC initialization because driver isn't initialized properly.: DriverNotInitialized: Volume driver not ready.
/var/log/cinder/volume.log:2017-07-04 14:06:41.487 11917 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
/var/log/cinder/volume.log:2017-07-04 14:06:51.492 11917 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
/var/log/cinder/volume.log:2017-07-04 14:07:01.496 11917 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
/var/log/cinder/volume.log:2017-07-04 14:07:11.499 11917 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
I verified that the tests Tzach performed did not confirm the fix, and I reproduced the results on my own system. The fix is not working and needs to be revisited.
I have a case with a customer experiencing the same behavior on Red Hat OpenStack 10. I'll add more information as soon as I get sosreports. For now this is all I have: when the customer restarts an Overcloud node, they get errors and can't create Cinder volumes.

Log:

2017-07-13 00:56:56.626 11298 WARNING cinder.volume.manager [req-2aeddf89-8d2a-474c-8ba7-9c40bff04e52 - - - - -] Update driver status failed: (config name tripleo_iscsi) is uninitialized.
2017-07-13 00:56:59.986 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-07-13 00:57:09.996 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-07-13 00:57:20.002 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-07-13 00:57:30.012 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-07-13 00:57:40.023 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2017-07-13 00:57:50.028 11298 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
Siggy, here's how you can get the customer back online. On the controller node:

1) Execute "sudo losetup -f /var/lib/cinder/cinder-volumes"
2) Wait a couple of seconds, and run "vgdisplay" to verify the "cinder-volumes" LVM group exists.
3) Restart the cinder-volume service (see the full command sequence below).
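For reference, the complete sequence would look something like this. This is a sketch: the restart command assumes a systemd-managed service (as in the workaround a few comments below); on a pacemaker-managed deployment the resource name is an assumption and may differ:

sudo losetup -f /var/lib/cinder/cinder-volumes   # re-attach the backing file to a free loop device
sudo vgdisplay cinder-volumes                    # confirm the VG is visible again
sudo systemctl restart openstack-cinder-volume   # restart so the driver re-initializes
# or, on a pacemaker-managed deployment (resource name assumed):
# sudo pcs resource restart openstack-cinder-volume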
Hi, I have the same experience on a virtual-machine, director-deployed OSP10 environment. I just added an rc.local entry to solve this:

[root@overcloud-controller-0 rc.d]# pvs
  PV         VG             Fmt  Attr PSize   PFree
  /dev/loop0 cinder-volumes lvm2 a--   10.04g 10.04g
  /dev/loop1 cinder-volumes lvm2 a--  100.00g 70.00g
[root@overcloud-controller-0 rc.d]# vgs
  VG             #PV #LV #SN Attr   VSize   VFree
  cinder-volumes   2   1   0 wz--n- 110.03g 80.03g
[root@overcloud-controller-0 heat-admin]# ls -al /dev/loop*
brw-rw----. 1 root disk  7,   3 Aug 30 02:23 /dev/loop3
brw-rw----. 1 root disk  7,   4 Aug 30 02:23 /dev/loop4
crw-rw----. 1 root disk 10, 237 Aug 30 02:17 /dev/loop-control
[root@overcloud-controller-0 heat-admin]# cat /etc/rc.local
#!/bin/bash
losetup -fv /var/lib/cinder/cinder-volumes
losetup -fv /var/lib/cinder/cinder-volumes1
[root@overcloud-controller-0 heat-admin]# chmod +x /etc/rc.d/rc.local
Reproduced using latest master.

Workaround, from the controller:

sudo losetup -f /var/lib/cinder/cinder-volumes
sudo vgdisplay
sudo service openstack-cinder-volume restart
It happens to me in RHOSP10:

[stack@telefonica-undercloud10 ~]$ openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| nova      | available   |
| nova      | available   |
| nova      | available   |
| nova      | available   |
+-----------+-------------+
[stack@telefonica-undercloud10 ~]$ openstack volume create --image 869c8e06-3848-48a3-a6a3-274c17f4c16c --size 8 --availability-zone nova my-new-volume
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2018-07-09T15:14:58.790872           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | 3fd2db5a-274b-4dec-8952-7e39cacc0def |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | my-new-volume                        |
| properties          |                                      |
| replication_status  | disabled                             |
| size                | 8                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | None                                 |
| updated_at          | None                                 |
| user_id             | 0d6ceffde01d4f3dad7fd00474d25e52     |
+---------------------+--------------------------------------+
[stack@telefonica-undercloud10 ~]$ openstack volume list
+--------------------------------------+---------------+--------+------+-------------+
| ID                                   | Display Name  | Status | Size | Attached to |
+--------------------------------------+---------------+--------+------+-------------+
| 3fd2db5a-274b-4dec-8952-7e39cacc0def | my-new-volume | error  |    8 |             |
+--------------------------------------+---------------+--------+------+-------------+

The workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1412661#c15 resolves the situation and lets me create a volume:

[stack@telefonica-undercloud10 ~]$ openstack volume create --image 869c8e06-3848-48a3-a6a3-274c17f4c16c --size 8 --availability-zone nova my-new-volume1
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| consistencygroup_id | None                                 |
| created_at          | 2018-07-09T15:26:48.900167           |
| description         | None                                 |
| encrypted           | False                                |
| id                  | 58537856-dd59-4161-9285-962c6bb2defb |
| migration_status    | None                                 |
| multiattach         | False                                |
| name                | my-new-volume1                       |
| properties          |                                      |
| replication_status  | disabled                             |
| size                | 8                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| type                | None                                 |
| updated_at          | None                                 |
| user_id             | 0d6ceffde01d4f3dad7fd00474d25e52     |
+---------------------+--------------------------------------+
[stack@telefonica-undercloud10 ~]$ openstack volume list
+--------------------------------------+----------------+-----------+------+-------------+
| ID                                   | Display Name   | Status    | Size | Attached to |
+--------------------------------------+----------------+-----------+------+-------------+
| 58537856-dd59-4161-9285-962c6bb2defb | my-new-volume1 | available |    8 |             |
| 3fd2db5a-274b-4dec-8952-7e39cacc0def | my-new-volume  | error     |    8 |             |
+--------------------------------------+----------------+-----------+------+-------------+

RPMs of the RHOSP10 deployment:

[root@overcloud-controller-0 ~]# rpm -qa | grep cinder
puppet-cinder-9.5.0-3.el7ost.noarch
python-cinderclient-1.9.0-6.el7ost.noarch
openstack-cinder-9.1.4-12.el7ost.noarch
python-cinder-9.1.4-12.el7ost.noarch
Reproduced with OSP13. Is there any plan to fix it, or a way to avoid this in production?
This may be fixed when time permits, but has not been a priority because Red Hat does not support the Cinder iSCSI / LVM backend in production.
Published a KCS article just in case: https://access.redhat.com/solution/3524681
Fresh attempt to fix this in THT (puppet-cinder is no longer a viable option now that services run in containers).
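For context, the general shape of such a fix is to recreate the loop device at boot, before cinder-volume starts. A minimal sketch of what that could look like as a systemd unit follows; this is illustrative only, the unit name, path, and ordering are assumptions and not the actual merged patch:

cat > /etc/systemd/system/cinder-lvm-losetup.service <<'EOF'
[Unit]
Description=Restore the Cinder LVM loopback device after boot
After=local-fs.target

[Service]
Type=oneshot
# Attach the persistent backing file to the first free loop device
ExecStart=/sbin/losetup -f /var/lib/cinder/cinder-volumes
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable cinder-lvm-losetup.service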
Patch has merged on upstream master.
The upstream patch will not be ready for inclusion in z3.
Verified on:
openstack-tripleo-heat-templates-8.0.7-15.el7ost.noarch

Created an LVM-backed volume (191af544-..), available state. Rebooted the controller. Created a second LVM volume (b053e8d7-..), also available. Both are available, with no need to "restore" LVM to an active state after the controller reboot, as was required before the fix. Looking good to verify.

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 191af544-0e8c-4666-9252-81934c311e82 | available | -    | 1    | -           | false    |             |
| b053e8d7-5ab2-4a48-89d6-e3869e14e569 | available | -    | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0068