Bug 1119226 - Thin provisioning disks broken on block storage when using pthreading 1.3
Summary: Thin provisioning disks broken on block storage when using pthreading 1.3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: python-pthreading
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.4.1
Assignee: Yaniv Bronhaim
QA Contact: Jiri Belka
Docs Contact: Ruediger Landmann
URL:
Whiteboard: infra
Depends On: 1117795
Blocks:
Reported: 2014-07-14 10:26 UTC by rhev-integ
Modified: 2016-02-10 19:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when using pthreading 1.3, disk extend requests sent over the mailbox were not handled, which caused virtual machines to pause when their disk(s) became full. This happened because a locked() method was incorrectly handled by VDSM. Now, a patch adds the missing method, implementing it in the same way that Python implements it, and VDSM interacts correctly with pthreading 1.3.
Clone Of: 1117795
Environment:
Last Closed: 2014-07-29 14:19:29 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
Fake dd simulating errors when accessing the inbox (641 bytes, text/plain)
2014-07-22 12:33 UTC, Nir Soffer


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2014:0975 | 0 | normal | SHIPPED_LIVE | python-pthreading bug fix and enhancement update | 2014-07-29 18:18:01 UTC

Comment 2 Jiri Belka 2014-07-18 13:16:05 UTC
Please provide exact steps for reproduction and verification. If that is not possible, would it be enough to check whether the Python code contains the hack with the Lock/_Lock class?

Comment 3 Jiri Belka 2014-07-22 08:13:05 UTC
Could you provide exact steps to reproduce this issue? I can't reproduce it with an iSCSI-based VM disk on 0.1.3-1: I wrote to the disk and it was extended without the VM pausing.

Comment 4 Nir Soffer 2014-07-22 12:32:27 UTC
This happens when there is an I/O error while reading from the inbox LV
on the master domain. The current code logs an error and tries again.
The old code checked whether a lock was locked, and because the lock in
pthreading < 0.1.3-3 did not implement the locked() method, the thread
would exit.
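
The failure mode described above can be illustrated with a minimal Python sketch. The names LockWithoutLocked, LockWithLocked, monitor and run_monitor are hypothetical and are not taken from pthreading or VDSM. The locked() body uses a non-blocking acquire/release probe, which is an assumption about the shape of the fix based on the Doc Text's "in the same way that Python implements it".

```python
import threading


class LockWithoutLocked(object):
    """Hypothetical stand-in for a lock class that lacks locked()."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()


class LockWithLocked(LockWithoutLocked):
    """Same lock with locked() added (assumed shape of the fix)."""

    def locked(self):
        # Probe: if a non-blocking acquire succeeds, the lock was free.
        if self.acquire(False):
            self.release()
            return False
        return True


def monitor(lock, results):
    """Hypothetical error path that asks the lock for its state."""
    try:
        results.append(lock.locked())
    except AttributeError as e:
        # Caught here only to report it; uncaught, it would end the thread.
        results.append(e)


def run_monitor(lock):
    """Run monitor() in a thread and return what it recorded."""
    results = []
    t = threading.Thread(target=monitor, args=(lock, results))
    t.start()
    t.join()
    return results[0]
```

Calling run_monitor(LockWithoutLocked()) returns the AttributeError that would otherwise terminate the thread, while run_monitor(LockWithLocked()) returns a plain boolean.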

So to reproduce, you have to cause an I/O error when accessing the inbox
LV on the master domain.

One option to do this is to replace /usr/bin/dd with a script simulating
errors. Check the dd.fake attachment for details.
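
The content of the dd.fake attachment is not reproduced in this report, so the following is only a sketch of what such a wrapper might look like, written in Python for illustration. The REAL_DD location, the TARGET substring used to recognise the inbox, and the function names are all assumptions.

```python
import os
import sys

REAL_DD = "/usr/bin/dd.real"  # assumption: where the real dd is kept
TARGET = "/dom_md/inbox"      # assumption: substring identifying the inbox


def touches_inbox(args):
    """Return True if a dd if=/of= operand refers to the inbox."""
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key in ("if", "of") and TARGET in value:
            return True
    return False


def main(argv):
    if touches_inbox(argv[1:]):
        # Simulate the failure instead of touching the device.
        sys.stderr.write("dd: simulated I/O error\n")
        return 1
    # Hand every other invocation to the real dd unchanged.
    os.execv(REAL_DD, [REAL_DD] + argv[1:])

# An executable script would end with: sys.exit(main(sys.argv))
```

Only invocations whose if= or of= operand contains the inbox path fail; all other dd calls on the host behave as before.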

Comment 5 Nir Soffer 2014-07-22 12:33:14 UTC
Created attachment 919911
Fake dd simulating errors when accessing the inbox

Comment 6 Jiri Belka 2014-07-22 14:43:11 UTC
OK with 0.1.3-3. (With the older version, no messages about the mailbox LV I/O error were logged.)

Simulating an I/O error on the inbox LV:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/b7ee9232-7844-4d06-9f66-d347cb9e0f66/mastersd/dom_md/inbox
libvirtEventLoop::INFO::2014-07-22 14:38:13,548::vm::4574::vm.Vm::(_onIOError) vmId=`cb025f1d-e18f-44cc-96f8-8aab6c99de8a`::abnormal vm stop device virtio-disk0 error eother

The VM gets paused.

Once the I/O error simulation is stopped, the VM status changes back to Up.

Comment 7 Jiri Belka 2014-07-23 07:11:03 UTC
Verified based on #6.

Comment 8 Gal Amado 2014-07-24 12:25:21 UTC
It seems we need to sort out the installation instructions (inline as a comment in dd.fake).

What worked for me was:
...
chmod +x dd.fake
cd /usr/bin
mv /bin/dd /usr/bin/dd.real
ln -sf /usr/bin/dd.fake /bin/dd
...

Comment 9 Jiri Belka 2014-07-24 12:32:57 UTC
You discovered a difference between Fedora and RHEL :)

Comment 10 Nir Soffer 2014-07-24 13:07:36 UTC
(In reply to Gal Amado from comment #8)
> seems like we need to sort out the Installation instruction (inline as
> comment on dd.fake)
> 
> What worked for me was :
> ...
> chmod +x dd.fake
> cd /usr/bin
> mv /bin/dd /usr/bin/dd.real

This leaves you for a moment without a dd program, so if vdsm tries to run dd at that point, it will fail.

This is why I was using ln, which creates a hard link to dd, so you now have
two dd programs.

> ln -sf /usr/bin/dd.fake /bin/dd

And this line atomically replaces one of the real dd programs with a symbolic link.
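
The two-step swap described above (hard link first, then replace the original name) could be sketched in Python as follows. The function name swap_in_fake and the temporary ".tmp" suffix are hypothetical. The sketch builds the symlink under a temporary name and renames it over the original, since rename is the operation POSIX guarantees to be atomic; whether a given ln implementation does the same internally is not assumed here.

```python
import os


def swap_in_fake(real, backup, fake):
    """Keep `real` reachable as `backup`, then point `real` at `fake`."""
    # Hard link: both names refer to the same program, so a dd is
    # available under some name for the whole duration of the swap.
    if not os.path.exists(backup):
        os.link(real, backup)
    # Create the symlink under a temporary name, then rename it over
    # the original; rename replaces the destination in one step.
    tmp = real + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(fake, tmp)
    os.replace(tmp, real)
```

A caller would pass the path of the real dd, the backup name (for example the dd.real name used above), and the path of the fake script; hard links require both names to be on the same filesystem.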

Comment 12 errata-xmlrpc 2014-07-29 14:19:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0975.html

