Bug 1302032 - Large image is downloaded from glance during the volume creation and fails [NEEDINFO]
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-glance
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 5.0 (RHEL 7)
Assigned To: Flavio Percoco
QA Contact: nlevinki
Keywords: ZStream
Reported: 2016-01-26 10:10 EST by Jeremy
Modified: 2016-04-26 14:59 EDT
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2016-02-22 01:11:03 EST
Type: Bug
Flags: sgotliv: needinfo? (jmelvin)


Attachments: None
Description Jeremy 2016-01-26 10:10:41 EST
Description of problem: Trying to build a Cinder volume from an image. The image is about 17 GB.
The volume we are trying to create is 100 GB.

Intermittently, the request fails


Version-Release number of selected component (if applicable):


How reproducible:
75%

Steps to Reproduce:
1. Upload a large (~17 GB) image to Glance.
2. Create a 100 GB Cinder volume from that image.
3. Repeat the request several times; it fails intermittently.
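A command along these lines reproduces it (a sketch; the image ID and volume type are taken from the listings later in this bug):

# Create a 100 GB volume from the 17 GB Windows image
cinder create --image-id c993bb22-57c3-46ba-884e-4cb8621f817b --display-name vol-windows-2012-05 --volume-type nfs 100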

Actual results:
The volume intermittently ends up in the "error" state, and the Glance API log shows a broken pipe during the image download.

Expected results:
The volume is created from the image successfully every time.

Additional info:
There is no external load balancer; there are three controllers managed by Pacemaker.
Glance has a VIP configured, and I believe it is managed by HAProxy.
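To confirm the topology, something like the following should show which controller currently holds the VIP and how HAProxy fronts the Glance API (a sketch; resource names vary per deployment and the config path is the usual default):

# On each controller: see which node currently holds the VIP
pcs status | grep -i vip

# Inspect the HAProxy frontend/backend for the Glance API (port 9292)
grep -B 2 -A 8 9292 /etc/haproxy/haproxy.cfg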



Intermittently, the request fails. The glance logs show:
2016-01-25 11:33:06.863 22647 INFO glance.wsgi.server [d552da60-551f-47b8-9295-ddbba514a047 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 406, in handle_one_response
    write(''.join(towrite))
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 354, in write
    _writelines(towrite)
  File "/usr/lib64/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 309, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
    total_sent += fd.send(data[total_sent:], flags)
error: [Errno 32] Broken pipe

The same request issued again will succeed.
Comment 2 Jeremy 2016-01-26 10:33:11 EST
nova volume-list
+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+
| ID                                   | Status    | Display Name          | Size | Volume Type | Attached to                          |
+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+
| 664c62b8-ec7a-4431-9978-7eef71e31c14 | error     | ceph_win2012_vol      | 100  | ceph        |                                      |
| 82ea6e36-4139-4e5e-8daf-4b10ad5a942e | in-use    |                       | 100  | nfs         | 30db6f61-68fb-4221-ace7-0b1106c58456 |
| 00b543bf-d1d2-4ff2-ad38-db4a36a46e09 | available | vol-windows-2012-08   | 100  | nfs         |                                      |
| 2e3c5c10-2a0f-44d0-8b3b-571a84068c50 | available | vol-windows-2012-07   | 100  | nfs         |                                      |
| 6b631cf3-c113-4d17-b009-1e8b71569b4b | available | vol-windows-2012-06   | 100  | nfs         |                                      |
| eb48f28a-0051-488b-946a-30d0664b6f2e | error     | vol-windows-2012-05   | 100  | nfs         |                                      |
| 91d008cb-7475-4741-b3f1-9f3b0d8e336b | available | vol-windows-2012-04   | 100  | nfs         |                                      |
| 9666e2ad-6369-4ee2-87bc-7fa309f8dc15 | in-use    | bvpkura_win_02        | 100  | nfs         | f68562cd-69f8-4fa0-ba21-dc53b103b65a |
| 7c2f50d9-4b41-4ef5-aec4-9922459aa8f8 | available | test-nfs-win-vol02    | 100  | nfs         |                                      |
| 88366c98-0ffd-4190-87e5-b5c59eac3551 | available | test-ceph-linux-vol01 | 40   | ceph        |                                      |
| a2527631-1945-4c10-80e8-45bfc562d38f | in-use    | bvpkwin01_nfs         | 100  | nfs         | d9021577-4f10-451f-a72b-424187ad6d8c |
| 81c0094f-6f4e-43c1-8d08-956e893a8465 | in-use    | bvpklinux01_nfs       | 25   | nfs         | 6c2f1b1f-0b33-4389-8bd4-d4f82ddf473f |
+--------------------------------------+-----------+-----------------------+------+-------------+--------------------------------------+

The following is the failed one in the list:
eb48f28a-0051-488b-946a-30d0664b6f2e | error     | vol-windows-2012-05   | 100  | nfs

[root@xlabostkctrl1 scripts(pkura_lab_hfd)]# glance image-list
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
| ID                                   | Name                    | Disk Format | Container Format | Size        | Status |
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
| c993bb22-57c3-46ba-884e-4cb8621f817b | WIN2012-OS-IMG          | qcow2       | bare             | 18008834048 | active |
+--------------------------------------+-------------------------+-------------+------------------+-------------+--------+
Comment 3 Flavio Percoco 2016-01-27 07:37:09 EST
@Jeremy

The logs are not in the collab-shell. Could you upload them there?

Also, the traceback in the description is incomplete; could you paste the full traceback?
Comment 4 Flavio Percoco 2016-01-27 08:50:41 EST
I found the logs in the case.

Broken pipes normally happen when the connection drops on the other end. Looking at the logs, it seems this environment has serious network issues, as there are *many* broken pipes in the nova/cinder logs. It also loses its connection to MySQL.

I'd recommend debugging that first.
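A quick way to gauge how widespread the drops are (a sketch; log paths assume the default locations on the controllers):

# Count broken-pipe occurrences per log file across the affected services
grep -c "Broken pipe" /var/log/nova/*.log /var/log/cinder/*.log /var/log/glance/*.log

# Look for dropped MySQL connections as well
grep -iE "gone away|lost connection" /var/log/cinder/*.log /var/log/nova/*.log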
Comment 5 Jeremy 2016-01-27 16:07:51 EST
I'm seeing this in /var/log/cinder/volume.log:

2016-01-27 14:52:35.853 16078 ERROR cinder.volume.flows.manager.create_volume [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Failed to copy image f5618dd7-cdfb-4b04-80c2-26cc75033860 to volume: 23206ea2-615c-419b-af03-42a40e195d67, error: [Errno 32] Corrupted image. Checksum was f8840d9261d652b8366b62c8302231f4 expected 8cc30337d32a01957c48b9d91e419866
2016-01-27 14:52:35.856 16078 DEBUG cinder.volume.flows.common [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Updating volume: 23206ea2-615c-419b-af03-42a40e195d67 with {'status': 'error'} due to: ??? error_out_volume /usr/lib/python2.7/site-packages/cinder/volume/flows/common.py:87
2016-01-27 14:52:35.893 16078 DEBUG cinder.openstack.common.periodic_task [-] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/cinder/openstack/common/periodic_task.py:178
2016-01-27 14:52:35.893 16078 DEBUG cinder.manager [-] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:128
2016-01-27 14:52:35.894 16078 WARNING cinder.openstack.common.loopingcall [-] task run outlasted interval by 32.45048 sec
2016-01-27 14:52:35.895 16078 ERROR cinder.volume.flows.manager.create_volume [req-83e7003f-8d6c-4bb1-ab29-00b6407877a8 e998e2dabe8e4d1eb2a6b778b4422c01 24341f87ac894f78970abef5231f1a67 - - -] Volume 23206ea2-615c-419b-af03-42a40e195d67: create failed
Comment 6 Sergey Gotliv 2016-01-28 03:55:16 EST
The data integrity check most probably fails because the image is only partially downloaded, but just to be on the safe side, let's verify that the image in Glance is not corrupted. According to the logs they use a Glance file store, so it's relatively easy to confirm: just find the image file and run md5sum on it.
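For example (a sketch; the image ID comes from the glance image-list output above, and the path assumes the default file store directory, /var/lib/glance/images):

# Checksum Glance recorded at upload time
glance image-show c993bb22-57c3-46ba-884e-4cb8621f817b | grep checksum

# md5 of the file actually on disk; the two values should match
md5sum /var/lib/glance/images/c993bb22-57c3-46ba-884e-4cb8621f817b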

The logs support the partial-download theory; it looks like the client terminated the connection at some point, either due to a networking issue or possibly a timeout somewhere. Does the Glance API run behind HAProxy? If it does, can we get the HAProxy logs?
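If HAProxy is in the path, its client/server timeouts are also worth checking: these are inactivity timeouts, and if a long 17 GB transfer stalls past them, HAProxy drops the connection mid-stream, which matches the broken-pipe symptom (a sketch; the config path is the default):

# Timeouts that bound long-running transfers; if a transfer stalls
# longer than these values, the connection is cut
grep -E "timeout[[:space:]]+(client|server)" /etc/haproxy/haproxy.cfg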
