Bug 1298264

Summary: Cinder volumes stuck in downloading state
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: python-oslo-messagingAssignee: Eric Harney <eharney>
Status: CLOSED ERRATA QA Contact: lkuchlan <lkuchlan>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 (Juno)CC: apevec, dcain, dmaley, eharney, fwissing, jeckersb, jmelvin, jobernar, lhh, mabaakou, oblaut, rich, yeylon
Target Milestone: asyncKeywords: ZStream
Target Release: 6.0 (Juno)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-oslo-messaging-1.4.1-8.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1301634 (view as bug list) Environment:
Last Closed: 2016-02-18 17:44:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jeremy 2016-01-13 16:00:21 UTC
Description of problem:
[root@phllcu401 ~(cloud_admin_v2)]# cinder show af405553-e0cf-4f3c-a8b0-dc772422ed40
+---------------------------------------+--------------------------------------------------------------------------------------------+
|                Property               |                                           Value                                            |
+---------------------------------------+--------------------------------------------------------------------------------------------+
|              attachments              |                                             []                                             |
|           availability_zone           |                                  SungardAS-IntegrationLab                                  |
|                bootable               |                                           false                                            |
|               created_at              |                                 2016-01-13T06:53:25.000000                                 |
|          display_description          |                                                                                            |
|              display_name             |                                                                                            |
|               encrypted               |                                           False                                            |
|                   id                  |                            af405553-e0cf-4f3c-a8b0-dc772422ed40                            |
|                metadata               |                                             {}                                             |
|         os-vol-host-attr:host         | phllcu402@netapp#phllnasnet1000-storage.edisonqa.sgns.net:/phllnasnet1000v_vol8_uat_cinder |
|     os-vol-mig-status-attr:migstat    |                                            None                                            |
|     os-vol-mig-status-attr:name_id    |                                            None                                            |
|      os-vol-tenant-attr:tenant_id     |                              396f6c55aba34d8fade57c21ae00ed5d                              |
|   os-volume-replication:driver_data   |                                            None                                            |
| os-volume-replication:extended_status |                                            None                                            |
|                  size                 |                                             80                                             |
|              snapshot_id              |                                            None                                            |
|              source_volid             |                                            None                                            |
|                 status                |                                        downloading                                         |
|              volume_type              |                                           netapp                                           |
+---------------------------------------+--------------------------------------------------------------------------------------------+
 
 
 
21672: ERROR cinder.volume.drivers.netapp.nfs [req-818755ac-04de-4447-a5be-56034df0cfbd 2fa2cef88f9742448fd8180054eeade5 396f6c55aba34d8fade57c21ae00ed5d - - -] Error in workflow copy from cache. Unexpected error while running command.
Command: None
Exit code: -
Stdout: u"Unexpected error while running command.\nCommand: /usr/bin/na_copyoffload_64 10.140.231.6 10.140.231.6 /phllnasnet1000v_vol9_uat_cinder/img-cache-0890828d-41f9-4268-8834-dffe250a4ab4 /phllnasnet1000v_vol8_uat_cinder/volume-af405553-e0cf-4f3c-a8b0-dc772422ed40\nExit code: 24\nStdout: u'Program exiting with return code 10008.\\n'\nStderr: u''"
Stderr: None.
21672: TRACE cinder.volume.drivers.netapp.nfs Traceback (most recent call last):
21672: TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/nfs.py", line 1226, in _copy_from_cache
21672: TRACE cinder.volume.drivers.netapp.nfs     check_exit_code=0)
21672: TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 142, in execute
21672: TRACE cinder.volume.drivers.netapp.nfs     return processutils.execute(*cmd, **kwargs)
21672: TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/openstack/common/processutils.py", line 200, in execute
21672: TRACE cinder.volume.drivers.netapp.nfs     cmd=sanitized_cmd)
21672: TRACE cinder.volume.drivers.netapp.nfs ProcessExecutionError: Unexpected error while running command.
21672: TRACE cinder.volume.drivers.netapp.nfs Command: /usr/bin/na_copyoffload_64 10.140.231.6 10.140.231.6 /phllnasnet1000v_vol9_uat_cinder/img-cache-0890828d-41f9-4268-8834-dffe250a4ab4 /phllnasnet1000v_vol8_uat_cinder/volume-af405553-e0cf-4f3c-a8b0-dc772422ed40
21672: TRACE cinder.volume.drivers.netapp.nfs Exit code: 24
21672: TRACE cinder.volume.drivers.netapp.nfs Stdout: u'Program exiting with return code 10008.\n'
21672: TRACE cinder.volume.drivers.netapp.nfs Stderr: u''
21672: TRACE cinder.volume.drivers.netapp.nfs



Version-Release number of selected component (if applicable):
openstack-cinder-2014.2.3-3.el7ost.noarch 
python-cinderclient-1.1.1-1.el7ost.noarch  

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Restart of cinder services, or server reboot solve the problem temporarily, but it seems to occur again fairly quiclky.

Comment 2 Jon Bernard 2016-01-13 16:37:59 UTC
Huge number of rabbit missing exchanges, I think your messaging configuration is suspect.  I didn't see any proper cinder exceptions there beyond the copyoffload, but that will fallback to an unoptimized operation.

Comment 9 Jeremy 2016-01-18 14:40:31 UTC
Hello,
We are still having issues with volumes stuck in downloading state after patching rabbitmq. Below is the error we are seeing:

volume.log:Stdout: u"Unexpected error while running command.\nCommand: /usr/bin/na_copyoffload_64 10.140.231.6 10.140.231.6 /phllnasnet1000v_vol9_uat_cinder/img-cache-d5b5ff16-e609-4aed-938c-7bda3921fc64 /phllnasnet1000v_vol8_uat_cinder/volume-07f0cd02-6cf4-4b83-b370-5f8d05b4f0cf\nExit code: 24\nStdout: u'Program exiting with return code 10008.\\n'\nStderr: u''"
volume.log:14172: TRACE cinder.volume.drivers.netapp.nfs Stdout: u'Program exiting with return code 10008.\n'

Comment 10 Mehdi ABAAKOUK 2016-01-18 15:22:09 UTC
Do you still seen issue with oslo.messaging/rabbit ? Or only the cinder one is present ?

Comment 11 Freddy Wissing 2016-01-18 15:24:23 UTC
They have reported that the oslo.messaging/rabbit issue no longer manifests, but the original symptom of it (the cinder volumes stuck in downloading state) is still there.

Comment 14 Rich Boyce 2016-01-21 11:57:36 UTC
We are experiencing this issue in our environment as well. Here is the relevant part of /var/log/cinder/volume.log:

2016-01-21 11:17:32.702 65499 ERROR cinder.volume.drivers.netapp.nfs [req-2030a5c0-258f-455b-b312-431448f1d6e2 a1c9ba1fc9964a0f845aef41a22de7b5 2dc4661120364eaba4744eb0348a81b3 - - -] Error 
in workflow copy from cache. Unexpected error while running command.
Command: None
Exit code: -
Stdout: u"Unexpected error while running command.\nCommand: /etc/cinder/na_copyoffload_64 10.35.4.12 10.35.4.12 /vol/hh_extcl03_nfsprod01_vol01/img-cache-8f74dd15-46ae-4ce2-b33a-75459ff8869a /vol/hh_extcl03_nfsprod01_vol03/volume-1f75a6ee-6de8-43fb-9ac8-3aceb990bf3d\nExit code: 24\nStdout: u'Program exiting with return code 10008.\\n'\nStderr: u''"
Stderr: None.
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs Traceback (most recent call last):
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/nfs.py", line 1226, in _copy_from_cache
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs     check_exit_code=0)
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 142, in execute
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs     return processutils.execute(*cmd, **kwargs)
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs   File "/usr/lib/python2.7/site-packages/cinder/openstack/common/processutils.py", line 200, in execute
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs     cmd=sanitized_cmd)
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs ProcessExecutionError: Unexpected error while running command.
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs Command: /etc/cinder/na_copyoffload_64 10.35.4.12 10.35.4.12 /vol/hh_extcl03_nfsprod01_vol01/img-cache-8f74dd15-46ae-4ce2-b33a-75459ff8869a /vol/hh_extcl03_nfsprod01_vol03/volume-1f75a6ee-6de8-43fb-9ac8-3aceb990bf3d
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs Exit code: 24
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs Stdout: u'Program exiting with return code 10008.\n'
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs Stderr: u''
2016-01-21 11:17:32.702 65499 TRACE cinder.volume.drivers.netapp.nfs 

Here's the resulting cinder volume:

[root@ctrl002 mnt(openstack_admin)]# cinder show 1f75a6ee-6de8-43fb-9ac8-3aceb990bf3d
+---------------------------------------+----------------------------------------------------------+
|                Property               |                          Value                           |
+---------------------------------------+----------------------------------------------------------+
|              attachments              |                            []                            |
|           availability_zone           |                           nova                           |
|                bootable               |                          false                           |
|               created_at              |                2016-01-21T11:15:53.000000                |
|          display_description          |                                                          |
|              display_name             |                                                          |
|               encrypted               |                          False                           |
|                   id                  |           1f75a6ee-6de8-43fb-9ac8-3aceb990bf3d           |
|                metadata               |                            {}                            |
|         os-vol-host-attr:host         | ha-controller#10.35.4.12:/vol/hh_extcl03_nfsprod01_vol03 |
|     os-vol-mig-status-attr:migstat    |                           None                           |
|     os-vol-mig-status-attr:name_id    |                           None                           |
|      os-vol-tenant-attr:tenant_id     |             2dc4661120364eaba4744eb0348a81b3             |
|   os-volume-replication:driver_data   |                           None                           |
| os-volume-replication:extended_status |                           None                           |
|                  size                 |                            20                            |
|              snapshot_id              |                           None                           |
|              source_volid             |                           None                           |
|                 status                |                       downloading                        |
|              volume_type              |                           None                           |
+---------------------------------------+----------------------------------------------------------+


And when running the /etc/cinder/na_copyoffload_64 command manually:

[root@ctrl002 mnt]# /etc/cinder/na_copyoffload_64 10.35.4.12 10.35.4.12 /vol/hh_extcl03_nfsprod01_vol01/img-cache-8f74dd15-46ae-4ce2-b33a-75459ff8869a /vol/hh_extcl03_nfsprod01_vol03/rbtest210116
Program exiting with return code 10008.

We are not seeing any rabbitmq issues as far as I can tell - what logs should I check for that?

Comment 16 Lon Hohberger 2016-01-28 21:44:36 UTC
Is this bug a cinder issue or oslo-messaging issue?  If it's "both", please clone to cinder, as we have a fix for oslo-messaging.

Comment 17 lkuchlan 2016-02-01 08:00:47 UTC
The latest rhos6 does not contain the petch python-oslo-messaging-1.4.1-8.el7ost

Tested using:
python-oslo-messaging-1.4.1-7.el7ost.noarch

Comment 19 lkuchlan 2016-02-04 17:05:49 UTC
Tested using:
python-oslo-messaging-1.4.1-8.el7ost.noarch
python-cinder-2014.2.3-11.el7ost.noarch
python-cinderclient-1.1.1-1.el7ost.noarch
openstack-cinder-2014.2.3-11.el7ost.noarch

Verification instructions:
1. Configure cinder and glance for using Netapp NFS driver as back-end
2. Create a volume from image

Results:

[root@cougar01 ~]# mount | grep 'glance\|cinder'
10.35.64.11:/vol/vol_rhos_auto_nfs on /var/lib/cinder/mnt/b412ab27c39458e94f0027cd1aff3877 type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.35.64.11,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.35.64.11)
10.35.64.11:/vol/vol_rhos_auto_nfs on /var/lib/glance/images type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.35.64.11,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.35.64.11)

[root@cougar01 ~(keystone_admin)]# cinder create 1 --image cbe8b84d-8269-4126-8216-286fd42b454e
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2016-02-04T15:27:44.371045      |
| display_description |                 None                 |
|     display_name    |                 None                 |
|      encrypted      |                False                 |
|          id         | 2615ab3b-a649-4d2f-b4e9-0d880d7f8d79 |
|       image_id      | cbe8b84d-8269-4126-8216-286fd42b454e |
|       metadata      |                  {}                  |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+

[root@cougar01 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 2615ab3b-a649-4d2f-b4e9-0d880d7f8d79 | available |     None     |  1   |     None    |   true   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

Comment 21 errata-xmlrpc 2016-02-18 17:44:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0267.html

Comment 22 Red Hat Bugzilla 2023-09-14 03:16:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days