1119226 – Thin provisioning disks broken on block storage when using pthreading 1.3

Summary: Thin provisioning disks broken on block storage when using pthreading 1.3

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	python-pthreading
Sub Component:
Version:	3.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.4.1
Assignee:	Yaniv Bronhaim
QA Contact:	Jiri Belka
Docs Contact:	Ruediger Landmann
URL:
Whiteboard:	infra
Depends On:	1117795
Blocks:

Reported:	2014-07-14 10:26 UTC by rhev-integ
Modified:	2016-02-10 19:38 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, when using pthreading 1.3, disk extend requests sent over the mailbox were not handled, which caused virtual machines to pause when their disk(s) became full. This happened because a locked() method was incorrectly handled by VDSM. Now, a patch adds the missing method, implementing it in the same way that Python implements it, and VDSM interacts correctly with pthreading 1.3.
Clone Of:	1117795
Environment:
Last Closed:	2014-07-29 14:19:29 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Attachments
Fake dd simulating errors when accessing the inbox (641 bytes, text/plain) 2014-07-22 12:33 UTC, Nir Soffer	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:0975	0	normal	SHIPPED_LIVE	python-pthreading bug fix and enhancement update	2014-07-29 18:18:01 UTC

Comment 1 Yaniv Bronhaim 2014-07-14 14:31:47 UTC

Fixed in:
https://github.com/oVirt/pthreading/commit/b42f0acba4ad5a8fb971733fedd295e7d075afbc

Comment 2 Jiri Belka 2014-07-18 13:16:05 UTC

Please provide exact steps for reproduction/verification. If that is not possible, would it be enough to check whether the Python code contains the hack with the Lock/_Lock class?

Comment 3 Jiri Belka 2014-07-22 08:13:05 UTC

Could you provide exact steps to reproduce this issue? I can't reproduce it with an iSCSI-based VM disk on 0.1.3-1: I wrote into the disk and it was extended without the VM pausing.

Comment 4 Nir Soffer 2014-07-22 12:32:27 UTC

This happens when an I/O error occurs while reading from the inbox LV
on the master domain. The current code logs an error and tries again,
whereas the old code checked whether a lock was locked, and because the
locked() method was not implemented by the lock in pthreading < 0.1.3-3,
the thread would exit.
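
The failure mode described above can be illustrated with a minimal sketch. The class names LockWithoutLocked and LockWithLocked are hypothetical stand-ins built on threading.Lock; they are not pthreading code. The locked() shown here assumes that "the same way that Python implements it" means probing with a non-blocking acquire, which may differ in detail from the actual patch.

```python
import threading


class LockWithoutLocked(object):
    """Hypothetical stand-in for a lock type that lacks locked()."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()


class LockWithLocked(LockWithoutLocked):
    """Same lock, with locked() added via a non-blocking acquire probe."""

    def locked(self):
        # If the lock can be taken without blocking, it was free:
        # give it back and report False. Otherwise someone holds it.
        if self.acquire(False):
            self.release()
            return False
        return True


def probe(lock):
    """Return what lock.locked() reports, or the name of the error raised."""
    try:
        return lock.locked()
    except AttributeError as e:
        return type(e).__name__


old = LockWithoutLocked()
new = LockWithLocked()
print(probe(old))   # AttributeError: the method does not exist
print(probe(new))   # False: nobody holds the lock
new.acquire()
print(probe(new))   # True: the lock is held
new.release()
```

In a worker thread that does not catch the AttributeError, the exception would end the thread, which is consistent with the mailbox thread exiting as described.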

So to reproduce, you have to cause an I/O error when accessing the inbox
LV on the master domain.

One option is to replace /usr/bin/dd with a script that simulates
errors. Check the dd.fake attachment for details.
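
The idea of such a wrapper can be sketched as follows. This is not the content of the dd.fake attachment, which is not reproduced in this report; it is a hypothetical illustration. It assumes that the inbox is read with an "if=" operand whose path ends in "/inbox", and that the real binary has been preserved as /usr/bin/dd.real. The names reads_inbox, main and REAL_DD are invented for the example.

```python
import os
import sys

# Assumption: path where the real dd binary was preserved.
REAL_DD = "/usr/bin/dd.real"


def reads_inbox(argv):
    """True if any dd operand names an input file ending in /inbox."""
    return any(a.startswith("if=") and a.endswith("/inbox")
               for a in argv[1:])


def main(argv, real_dd=REAL_DD):
    if reads_inbox(argv):
        # Simulate a failed read: report an error and exit non-zero.
        sys.stderr.write("dd: simulated I/O error reading inbox\n")
        return 1
    # Otherwise replace this process with the real dd, passing all
    # operands through unchanged.
    os.execv(real_dd, [real_dd] + list(argv[1:]))


# Placeholder paths, for illustration only.
print(reads_inbox(["dd", "if=/tmp/example/dom_md/inbox", "bs=512"]))
print(reads_inbox(["dd", "if=/tmp/example/dom_md/outbox", "bs=512"]))
```

To act as a dd replacement, the file would additionally need an interpreter line and a final sys.exit(main(sys.argv)) call; both are left out here so that the sketch can be run without touching a real dd.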

Comment 5 Nir Soffer 2014-07-22 12:33:14 UTC

Created attachment 919911
Fake dd simulating errors when accessing the inbox

Comment 6 Jiri Belka 2014-07-22 14:43:11 UTC

OK with 0.1.3-3. (With the older version, no messages about the mailbox LV I/O error were logged.)

Simulating an I/O error on the inbox LV:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
libvirtEventLoop::INFO::2014-07-22 14:38:13,548::vm::4574::vm.Vm::(_onIOError) vmId=`cb025f1d-e18f-44cc-96f8-8aab6c99de8a`::abnormal vm stop device virtio-disk0 error eother

The VM gets paused.

Once the simulated I/O error is stopped, the VM status changes back to Up.

Comment 7 Jiri Belka 2014-07-23 07:11:03 UTC

Verified based on comment #6.

Comment 8 Gal Amado 2014-07-24 12:25:21 UTC

It seems we need to sort out the installation instructions (given inline as a comment in dd.fake).

What worked for me was:
...
chmod +x dd.fake
cd /usr/bin
mv /bin/dd /usr/bin/dd.real
ln -sf /usr/bin/dd.fake /bin/dd
...

Comment 9 Jiri Belka 2014-07-24 12:32:57 UTC

You discovered a difference between Fedora and RHEL :)

Comment 10 Nir Soffer 2014-07-24 13:07:36 UTC

(In reply to Gal Amado from comment #8)
> seems like we need to sort out the Installation instruction (inline as
> comment on dd.fake)
> 
> What worked for me was :
> ...
> chmod +x dd.fake
> cd /usr/bin
> mv /bin/dd /usr/bin/dd.real

This leaves you for a moment without a dd program, so if vdsm tries to run dd at that point, it will fail.

This is why I used ln, which creates a hard link to dd, so you now have two dd
programs.

> ln -sf /usr/bin/dd.fake /bin/dd

And this line replaces one of the real dd programs with a symbolic link atomically.
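
The two-step idea (first keep the real program reachable under a second name, then swap in the fake without a gap) can be sketched as below. This is an illustration only: it runs on placeholder files in a scratch directory rather than on /usr/bin/dd, the function name swap_in_fake is hypothetical, and it creates the symlink under a temporary name and renames it over the target instead of invoking ln, since rename() replaces the destination in one step on POSIX systems.

```python
import os
import tempfile


def swap_in_fake(dd, real_copy, fake):
    """Keep the real program reachable, then atomically point dd at fake."""
    # Hard link: dd and real_copy now name the same file.
    os.link(dd, real_copy)
    # Build the symlink under a temporary name, then rename it over dd,
    # so there is no moment at which the dd path does not exist.
    tmp = dd + ".tmp"
    os.symlink(fake, tmp)
    os.rename(tmp, dd)


# Demonstration in a scratch directory with placeholder files.
workdir = tempfile.mkdtemp()
dd = os.path.join(workdir, "dd")
real_copy = os.path.join(workdir, "dd.real")
fake = os.path.join(workdir, "dd.fake")
for path, text in ((dd, "real"), (fake, "fake")):
    with open(path, "w") as f:
        f.write(text)

swap_in_fake(dd, real_copy, fake)
print(os.path.islink(dd))        # dd is now a symbolic link
print(open(real_copy).read())    # the hard link still holds the real content
print(open(dd).read())           # reading through dd reaches the fake
```

Because the hard link and the rename both operate within one directory here, the sketch avoids cross-filesystem limitations; on a real host the same constraint applies to the directory holding dd.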

Comment 12 errata-xmlrpc 2014-07-29 14:19:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0975.html
