Bug 1340589 - rhel-osp-director: after rebooting the overcloud with a cinder node, unable to create a cinder volume; the target service is down on the cinder node
Summary: rhel-osp-director: after rebooting the overcloud with cinder node, not able ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: async
Target Release: 8.0 (Liberty)
Assignee: Angus Thomas
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-28 05:19 UTC by Alexander Chuzhoy
Modified: 2017-02-28 22:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-28 22:07:19 UTC
Target Upstream Version:
Embargoed:



Description Alexander Chuzhoy 2016-05-28 05:19:07 UTC
rhel-osp-director: after rebooting the overcloud with a cinder node, unable to create a cinder volume. The target service is down on the cinder node.


Environment:
instack-undercloud-2.2.7-7.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-13.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-13.el7ost.noarch
openstack-puppet-modules-7.0.19-1.el7ost.noarch


Steps to reproduce:
1. Deploy 7.3 with 1 cinder node
deployment command:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 0 --swift-storage-scale 0 --block-storage-scale 1 --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre --neutron-network-vlan-ranges datacentre:118:143 --neutron-bridge-mappings datacentre:br-ex  --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e ~/ssl-heat-templates/environments/enable-tls.yaml -e ~/ssl-heat-templates/environments/inject-trust-anchor.yaml

2. Upgrade to 8.0
3. reboot the entire setup
4. try to create a cinder volume upon boot


Result:
The created volume is in error state.
The following appears in /var/log/cinder/cinder-manage.log on a controller:
2016-05-28 05:01:38.734 19292 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2016-05-28 05:01:48.744 19292 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2016-05-28 05:01:58.753 19292 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".
2016-05-28 05:02:08.757 19292 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down".

Checking the cinder node:
The target service is down and won't start:

-- Logs begin at Sat 2016-05-28 00:25:53 UTC, end at Sat 2016-05-28 05:10:02 UTC. --                                                    
May 28 00:56:03 overcloud-blockstorage-0.localdomain systemd[1]: Starting Restore LIO kernel target configuration...                    
May 28 00:56:03 overcloud-blockstorage-0.localdomain target[15233]: No saved config file at /etc/target/saveconfig.json, ok, exiting    
May 28 00:56:03 overcloud-blockstorage-0.localdomain systemd[1]: Started Restore LIO kernel target configuration.                       
-- Reboot --                                                                                                                            
May 28 02:49:51 overcloud-blockstorage-0.localdomain systemd[1]: Starting Restore LIO kernel target configuration...                    
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: Traceback (most recent call last):                                   
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/bin/targetctl", line 82, in <module>                      
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: main()                                                               
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/bin/targetctl", line 79, in main                          
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: funcs[sys.argv[1]](savefile)                                         
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/bin/targetctl", line 47, in restore                       
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: errors = RTSRoot().restore_from_file(restore_file=from_file)         
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 267, in restore_from_file
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: config = json.loads(f.read())                                                            
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads                         
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: return _default_decoder.decode(s)                                                        
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: raise ValueError("No JSON object could be decoded")
May 28 02:49:55 overcloud-blockstorage-0.localdomain target[1395]: ValueError: No JSON object could be decoded
May 28 02:49:55 overcloud-blockstorage-0.localdomain systemd[1]: target.service: main process exited, code=exited, status=1/FAILURE
May 28 02:49:55 overcloud-blockstorage-0.localdomain systemd[1]: Failed to start Restore LIO kernel target configuration.
May 28 02:49:55 overcloud-blockstorage-0.localdomain systemd[1]: Unit target.service entered failed state.
May 28 02:49:55 overcloud-blockstorage-0.localdomain systemd[1]: target.service failed.
May 28 05:07:30 overcloud-blockstorage-0.localdomain systemd[1]: Starting Restore LIO kernel target configuration...
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: Traceback (most recent call last):
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/bin/targetctl", line 82, in <module>
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: main()
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/bin/targetctl", line 79, in main
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: funcs[sys.argv[1]](savefile)
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/bin/targetctl", line 47, in restore
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: errors = RTSRoot().restore_from_file(restore_file=from_file)
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 267, in restore_from_file
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: config = json.loads(f.read())
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: return _default_decoder.decode(s)
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: raise ValueError("No JSON object could be decoded")
May 28 05:07:30 overcloud-blockstorage-0.localdomain target[11294]: ValueError: No JSON object could be decoded
May 28 05:07:30 overcloud-blockstorage-0.localdomain systemd[1]: target.service: main process exited, code=exited, status=1/FAILURE
May 28 05:07:30 overcloud-blockstorage-0.localdomain systemd[1]: Failed to start Restore LIO kernel target configuration.
May 28 05:07:30 overcloud-blockstorage-0.localdomain systemd[1]: Unit target.service entered failed state.
May 28 05:07:30 overcloud-blockstorage-0.localdomain systemd[1]: target.service failed.
May 28 05:09:51 overcloud-blockstorage-0.localdomain systemd[1]: Starting Restore LIO kernel target configuration...
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: Traceback (most recent call last):
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/bin/targetctl", line 82, in <module>
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: main()
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/bin/targetctl", line 79, in main
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: funcs[sys.argv[1]](savefile)
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/bin/targetctl", line 47, in restore
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: errors = RTSRoot().restore_from_file(restore_file=from_file)
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 267, in restore_from_file
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: config = json.loads(f.read())
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: return _default_decoder.decode(s)
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: raise ValueError("No JSON object could be decoded")
May 28 05:09:51 overcloud-blockstorage-0.localdomain target[11324]: ValueError: No JSON object could be decoded
May 28 05:09:51 overcloud-blockstorage-0.localdomain systemd[1]: target.service: main process exited, code=exited, status=1/FAILURE
May 28 05:09:51 overcloud-blockstorage-0.localdomain systemd[1]: Failed to start Restore LIO kernel target configuration.
May 28 05:09:51 overcloud-blockstorage-0.localdomain systemd[1]: Unit target.service entered failed state.
May 28 05:09:51 overcloud-blockstorage-0.localdomain systemd[1]: target.service failed.


I ran targetcli, made no modifications, and exited.
Only then was I able to start the target service.
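
A minimal diagnostic sketch for the traceback above: it checks whether the saved LIO configuration parses as JSON, which is the step that fails in restore_from_file. The helper name check_saveconfig is hypothetical and is not part of targetcli or rtslib; the default path is the one shown in the journal output. An empty file is only one possible cause of the ValueError and was not confirmed in this report.

```python
import json
import os


def check_saveconfig(path="/etc/target/saveconfig.json"):
    """Classify the saved LIO config as 'missing', 'empty', 'invalid' or 'ok'.

    Hypothetical helper for diagnosis only. It reads the file and never
    modifies it.
    """
    if not os.path.exists(path):
        return "missing"
    with open(path) as f:
        data = f.read()
    if not data.strip():
        # json.loads() raises ValueError on empty input.
        return "empty"
    try:
        json.loads(data)
    except ValueError:
        # On Python 3, json.JSONDecodeError is a subclass of ValueError.
        return "invalid"
    return "ok"
```

Calling check_saveconfig() on the affected node would show whether the file is absent, empty, or present but not parseable. It does not explain why running targetcli made the service startable again.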

I am still unable to create new volumes.
The following appears in /var/log/cinder/volume.log on the cinder node:

2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager [req-23461e1b-bf77-428c-85c7-d88878c122ce - - - - -] Failed to initialize driver.
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager Traceback (most recent call last):
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 368, in init_host
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     self.driver.check_for_setup_error()
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     return f(*args, **kwargs)
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 269, in check_for_setup_error
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     lvm_conf=lvm_conf_file)
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 86, in __init__
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     if self._vg_exists() is False:
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 123, in _vg_exists
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     run_as_root=True)
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 155, in execute
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 275, in execute
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager     cmd=sanitized_cmd)
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o name cinder-volumes
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager Exit code: 5
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager Stdout: u''
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager Stderr: u'  Volume group "cinder-volumes" not found\n  Cannot process volume group cinder-volumes\n'
2016-05-28 05:16:56.558 11464 ERROR cinder.volume.manager
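
The failing check in the traceback can be reproduced outside cinder. The sketch below runs the same vgs command shown in the log and reports whether the volume group is missing. The function names probe_vg and vg_missing are hypothetical; the command line and the stderr text are taken from the log above. It assumes it is run as root on the node.

```python
import subprocess


def vg_missing(exit_code, stderr, vg_name="cinder-volumes"):
    """Return True if a vgs result says the volume group was not found."""
    needle = 'Volume group "%s" not found' % vg_name
    return exit_code != 0 and needle in stderr


def probe_vg(vg_name="cinder-volumes"):
    """Run the vgs command from the log and return (exit_code, stderr)."""
    proc = subprocess.Popen(
        ["vgs", "--noheadings", "-o", "name", vg_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    _, err = proc.communicate()
    return proc.returncode, err
```

For example, vg_missing(*probe_vg()) would be expected to return True on a node in the state reported here.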



Expected result:
Able to create cinder volumes with no issues.

Comment 3 Alexander Chuzhoy 2016-05-28 15:00:19 UTC
The issue also reproduces when deploying without a cinder node.

 Deployment command: openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 0 --swift-storage-scale 0 --block-storage-scale 0 --neutron-tunnel-types vxlan,gre --neutron-network-type vxlan,gre --neutron-network-vlan-ranges datacentre:118:143 --neutron-bridge-mappings datacentre:br-ex  --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e ~/ssl-heat-templates/environments/enable-tls.yaml -e ~/ssl-heat-templates/environments/inject-trust-anchor.yaml



This time the target service failed on the controllers.

May 28 14:56:06 overcloud-controller-0 cinder-volume: 2016-05-28 14:56:06.076 19727 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_iscsi is reporting problems, not sending heartbeat. Service will appear "down





Do we need to include the storage-environment.yaml without ceph?

Comment 4 Omri Hochman 2016-06-16 13:31:01 UTC
Reproduced on a clean deployment of ospd-9 on bare metal
(it was not an update/upgrade):

environment:
-------------
openstack-cinder-8.0.0-4.el7ost.noarch
python-cinderclient-1.6.0-1.el7ost.noarch
python-cinder-8.0.0-4.el7ost.noarch
openstack-heat-engine-6.0.0-4.el7ost.noarch
openstack-heat-api-6.0.0-4.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-9.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-9.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-common-6.0.0-4.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-api-cfn-6.0.0-4.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-9.el7ost.noarch
python-heatclient-1.2.0-1.el7ost.noarch


scenario: 
----------
(1) deploy setup with ceph nodes using ospd9  
(2) reboot undercloud + overcloud 
(3) attempt to create cinder-volume and attach to instance 

results: 
--------
cinder list --> shows volume with ERROR
 
/var/log/cinder/volume.log : 
------------------------------
2016-06-16 02:34:39.511 15069 INFO cinder.volume.manager [req-63f7580d-434a-4843-a6c5-6069a68f638d - - - - -] Determined volume DB was empty at startup.
2016-06-16 02:34:39.835 15069 INFO cinder.volume.manager [req-63f7580d-434a-4843-a6c5-6069a68f638d - - - - -] Image-volume cache disabled for host hostgroup@tripleo_iscsi.
2016-06-16 02:34:39.838 15069 INFO oslo_service.service [req-63f7580d-434a-4843-a6c5-6069a68f638d - - - - -] Starting 1 workers
2016-06-16 02:34:39.844 15249 INFO cinder.service [-] Starting cinder-volume node (version 8.0.0)
2016-06-16 02:34:39.846 15249 INFO cinder.volume.manager [req-e00abecd-4556-456a-8d08-eddff08a3398 - - - - -] Starting volume driver LVMVolumeDriver (3.0.0)
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager [req-e00abecd-4556-456a-8d08-eddff08a3398 - - - - -] Failed to initialize driver.
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager Traceback (most recent call last):
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 426, in init_host
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     self.driver.check_for_setup_error()
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 283, in check_for_setup_error
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     lvm_conf=lvm_conf_file)
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 95, in __init__
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     if self._vg_exists() is False:
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 128, in _vg_exists
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     run_as_root=True)
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 148, in execute
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     return processutils.execute(*cmd, **kwargs)
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 371, in execute
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager     cmd=sanitized_cmd)
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o name cinder-volumes
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager Exit code: 5
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager Stdout: u''
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager Stderr: u'File descriptor 10 (/dev/urandom) leaked on vgs invocation. Parent PID 15253: /usr/bin/python2\n  Volume group "cinder-volumes" not found\n  Cannot process volume group cinder-volumes\n'
2016-06-16 02:34:40.060 15249 ERROR cinder.volume.manager 
2016-06-16 02:34:40.177 15249 INFO cinder.volume.manager [req-e00abecd-4556-456a-8d08-eddff08a3398 - - - - -] Initializing RPC dependent components of volume

Comment 5 Omri Hochman 2016-06-16 15:54:49 UTC
Further investigation showed that the deployment command on my setup (with Ceph) was missing the argument for storage-environment.yaml. Without it, an LV is created for cinder-volume, and that LV is known not to be remounted after a reboot (AFAIK this use case is more for POCs).


Removing the blocker flags and lowering the BZ priority.

Comment 6 James Slagle 2017-02-28 22:07:19 UTC
The cinder-volumes LVM volume group is not persisted across reboots. There are no plans to fix this, as no one should be using cinder with the LVM driver backed by a loopback device anyway, nor is that supported.
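
For test setups only, a hedged sketch of the manual steps that are typically used to bring a loopback-backed volume group back after a reboot: re-attach the backing file to a loop device, then activate the volume group. The function name reattach_commands is hypothetical, and the location of the backing file is not stated in this report, so it must be supplied by the caller. This is an illustration under those assumptions, not a supported procedure.

```python
def reattach_commands(backing_file, vg_name="cinder-volumes"):
    """Build the commands to re-attach a loopback file and activate its VG.

    backing_file is an assumption: this report does not say where the
    loopback image lives. The commands must be run as root.
    """
    return [
        # Attach the file to the first free loop device and print its name.
        ["losetup", "--find", "--show", backing_file],
        # Activate the logical volumes in the volume group.
        ["vgchange", "--activate", "y", vg_name],
    ]
```

Each returned list can be passed to subprocess.check_call() in order. The loop device does not survive the next reboot, so the steps would have to be repeated each time.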

