Description of problem:
Trying to build a Cinder volume from an image. The image is about 17 GB; the volume we are trying to create is 100 GB. Intermittently, the request fails.

Version-Release number of selected component (if applicable):

How reproducible:
75%

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
There is no LB. There are three controllers managed by Pacemaker. Glance has a VIP configured, and I believe it is managed by HAProxy. The Glance logs show:

2016-01-25 11:33:06.863 22647 INFO glance.wsgi.server [d552da60-551f-47b8-9295-ddbba514a047 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 406, in handle_one_response
    write(''.join(towrite))
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 354, in write
    _writelines(towrite)
  File "/usr/lib64/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
    total_sent += fd.send(data[total_sent:], flags)
error: [Errno 32] Broken pipe

The same request issued again will succeed.
nova volume-list

+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+
| ID                                   | Status    | Display Name          | Size | Volume Type | Attached to                          |
+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+
| 664c62b8-ec7a-4431-9978-7eef71e31c14 | error     | ceph_win2012_vol      | 100  | ceph        |                                      |
| 82ea6e36-4139-4e5e-8daf-4b10ad5a942e | in-use    |                       | 100  | nfs         | 30db6f61-68fb-4221-ace7-0b1106c58456 |
| 00b543bf-d1d2-4ff2-ad38-db4a36a46e09 | available | vol-windows-2012-08   | 100  | nfs         |                                      |
| 2e3c5c10-2a0f-44d0-8b3b-571a84068c50 | available | vol-windows-2012-07   | 100  | nfs         |                                      |
| 6b631cf3-c113-4d17-b009-1e8b71569b4b | available | vol-windows-2012-06   | 100  | nfs         |                                      |
| eb48f28a-0051-488b-946a-30d0664b6f2e | error     | vol-windows-2012-05   | 100  | nfs         |                                      |
| 91d008cb-7475-4741-b3f1-9f3b0d8e336b | available | vol-windows-2012-04   | 100  | nfs         |                                      |
| 9666e2ad-6369-4ee2-87bc-7fa309f8dc15 | in-use    | bvpkura_win_02        | 100  | nfs         | f68562cd-69f8-4fa0-ba21-dc53b103b65a |
| 7c2f50d9-4b41-4ef5-aec4-9922459aa8f8 | available | test-nfs-win-vol02    | 100  | nfs         |                                      |
| 88366c98-0ffd-4190-87e5-b5c59eac3551 | available | test-ceph-linux-vol01 | 40   | ceph        |                                      |
| a2527631-1945-4c10-80e8-45bfc562d38f | in-use    | bvpkwin01_nfs         | 100  | nfs         | d9021577-4f10-451f-a72b-424187ad6d8c |
| 81c0094f-6f4e-43c1-8d08-956e893a8465 | in-use    | bvpklinux01_nfs       | 25   | nfs         | 6c2f1b1f-0b33-4389-8bd4-d4f82ddf473f |
+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+

The following is the failed one in the list:

| eb48f28a-0051-488b-946a-30d0664b6f2e | error     | vol-windows-2012-05   | 100  | nfs         |

[root@xlabostkctrl1 scripts(pkura_lab_hfd)]# glance image-list
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
| ID                                   | Name                    | Disk Format | Container Format | Size        | Status |
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
| c993bb22-57c3-46ba-884e-4cb8621f817b | WIN2012-OS-IMG          | qcow2       | bare             | 18008834048 | active |
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
@Jeremy The logs are not in the collab-shell. Could you upload them there? Also, the traceback in the description is incomplete; could you paste the full traceback?
I found the logs in the case. Broken pipes normally happen when the connection drops on the other end. Looking at the logs, it seems that this environment has serious network issues, as there are *many* broken pipes in the nova/cinder logs. It also loses its connection to MySQL. I'd recommend debugging that first.
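As background, EPIPE (errno 32, "Broken pipe") is what a writer sees when the peer has already closed the connection; that is exactly the condition eventlet's WSGI server hit while streaming the image. A minimal Python 3 sketch (illustration only, not OpenStack code) that reproduces the error:

```python
import errno
import socket

# A connected pair of sockets stands in for the Glance server (a)
# and its client (b).
a, b = socket.socketpair()
b.close()  # the peer goes away, like a client dropping the connection

try:
    # Writing to a socket whose peer is gone raises EPIPE.
    a.sendall(b"image bytes the client will never read")
except BrokenPipeError as e:
    assert e.errno == errno.EPIPE  # errno 32, "Broken pipe"
finally:
    a.close()
```

The practical implication is the one noted above: the error reports the symptom on the writing side, so the cause (client timeout, proxy, or network drop) has to be found on the other end.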
I'm seeing this in /var/log/cinder/volume.log:

2016-01-27 14:52:35.853 16078 ERROR cinder.volume.flows.manager.create_volume [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Failed to copy image f5618dd7-cdfb-4b04-80c2-26cc75033860 to volume: 23206ea2-615c-419b-af03-42a40e195d67, error: [Errno 32] Corrupted image. Checksum was f8840d9261d652b8366b62c8302231f4 expected 8cc30337d32a01957c48b9d91e419866
2016-01-27 14:52:35.856 16078 DEBUG cinder.volume.flows.common [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Updating volume: 23206ea2-615c-419b-af03-42a40e195d67 with {'status': 'error'} due to: ??? error_out_volume /usr/lib/python2.7/site-packages/cinder/volume/flows/common.py:87
2016-01-27 14:52:35.893 16078 DEBUG cinder.openstack.common.periodic_task [-] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/cinder/openstack/common/periodic_task.py:178
2016-01-27 14:52:35.893 16078 DEBUG cinder.manager [-] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:128
2016-01-27 14:52:35.894 16078 WARNING cinder.openstack.common.loopingcall [-] task run outlasted interval by 32.45048 sec
2016-01-27 14:52:35.895 16078 ERROR cinder.volume.flows.manager.create_volume [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Volume 23206ea2-615c-419b-af03-42a40e195d67: create failed
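For context, the mismatch above means the bytes Cinder received differ from the checksum Glance recorded at upload time, and a truncated transfer changes the digest. A minimal sketch (a hypothetical helper, not the actual Cinder code path) of how a streamed transfer is checksummed:

```python
import hashlib

def stream_md5(chunks):
    """Compute an MD5 digest and byte count over an iterable of byte
    chunks, the way an image download is checksummed in flight."""
    h = hashlib.md5()
    total = 0
    for chunk in chunks:
        h.update(chunk)
        total += len(chunk)
    return h.hexdigest(), total

# A transfer cut short by a broken pipe yields a different digest
# than the full image -- matching the "Checksum was X expected Y" error.
full = [b"a" * 1024, b"b" * 1024]
digest_full, size_full = stream_md5(full)
digest_partial, size_partial = stream_md5(full[:1])
assert digest_full != digest_partial
assert size_partial < size_full
```

This is consistent with the partial-download theory: a dropped connection during the Glance-to-Cinder copy would produce exactly this checksum failure.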
The data integrity check most probably fails because the image is only partially downloaded, but just to be on the safe side let's verify that the image in Glance is not corrupted. According to the logs they use a Glance file store, so it is relatively easy to confirm: just find the image file and run md5sum on it. The logs support the partial-download theory; it looks like the client terminated the connection at some point, either due to the networking issue or maybe because of a timeout somewhere. Does the Glance API run behind HAProxy? If it does, can we get the HAProxy logs?
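The check suggested above can be sketched as follows: compute the MD5 of the stored image file and compare it with the `checksum` field reported by `glance image-show <image-id>`. This assumes the default filesystem store location /var/lib/glance/images/<image-id>; adjust the path for this deployment.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file read in 1 MiB chunks, to compare against the
    checksum Glance recorded for the image at upload time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage on a controller (path is the default file store):
#   file_md5("/var/lib/glance/images/f5618dd7-cdfb-4b04-80c2-26cc75033860")
# If this matches the image's recorded checksum, the stored image is
# intact and the corruption happened in transit.
```

Plain `md5sum` on the same file gives the same answer; the Python version is just self-contained and chunked so an 18 GB image is not read into memory at once.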
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days