Bug 1117795

Summary: Thin provisioning disks broken on block storage when using pthreading 1.3
Product: Red Hat Enterprise Virtualization Manager
Reporter: Nir Soffer <nsoffer>
Component: python-pthreading
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED CURRENTRELEASE
QA Contact: Jiri Belka <jbelka>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.4.0
CC: amureini, bazulay, danken, dougsland, gklein, iheim, oourfali, scohen, tnisan, ybronhei, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: python-pthreading-0.1.3-3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1118429 1119025 1119226
Environment:
Last Closed: 2015-02-17 17:13:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1118429, 1119025, 1119226, 1142923, 1156165    

Description Nir Soffer 2014-07-09 12:07:59 UTC
Description of problem:

python-pthreading 0.1.3 introduced a change that is not compatible
with vdsm.

When vdsm is monitoring the mailbox and a read operation fails,
vdsm tries to log the error. However, the code first ensures that
a lock is released by calling the locked() method of the lock
object. Unfortunately, pthreading.Lock does not implement this
method, so a new exception is raised and the mailbox thread exits.
From this point on, disk extend requests sent over the mailbox are
not handled, and VMs pause when their disks become full.
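
The failure mode described above can be sketched in a few lines. This is
only an illustration, not the vdsm mailbox code: LockWithoutLocked,
monitor_once and failing_read are hypothetical names, and the wrapper
stands in for any lock object that offers acquire() and release() but
not locked().

```python
import threading


class LockWithoutLocked(object):
    """Hypothetical stand-in for a lock wrapper that omits locked()."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()


def monitor_once(lock, read_mailbox):
    """Hypothetical single pass of a mailbox monitor loop."""
    lock.acquire()
    try:
        read_mailbox()
    except IOError as e:
        # Error path: release the lock if it is held, then report.
        # If the lock object has no locked() method, this line raises
        # AttributeError, which escapes the handler and would end the
        # thread running the loop.
        if lock.locked():
            lock.release()
        return "logged: %s" % e
    lock.release()
    return "ok"


def failing_read():
    """Hypothetical read that always fails with an I/O error."""
    raise IOError("read failed")
```

With a standard threading.Lock() the error is reported and the loop can
continue; with LockWithoutLocked() the same call raises AttributeError
instead of reporting the read error.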

This issue has existed in pthreading since the first version: it never
implemented the locked() method, probably because the method is not well
documented. The error was hidden because the mailbox does not use
threading.Lock, which pthreading replaces with pthreading.Lock, but
thread.allocate_lock. In version 0.1.3, pthreading started to replace
thread.allocate_lock with pthreading.Lock as well, causing the mailbox
code to fail.
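
For a replacement lock to be usable wherever thread.allocate_lock was
used, it has to offer locked() in addition to acquire() and release().
The sketch below shows one way a wrapper can provide it, by probing with
a non-blocking acquire. WrappedLock is a hypothetical class built on
threading.Lock for illustration; it is not taken from the pthreading
source, and the actual fix may differ.

```python
import threading


class WrappedLock(object):
    """Hypothetical lock wrapper exposing acquire(), release(), locked()."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, blocking=True):
        return self._lock.acquire(blocking)

    def release(self):
        self._lock.release()

    def locked(self):
        # Probe the state with a non-blocking acquire. If it succeeds,
        # the lock was free, so release it again and report False.
        if self.acquire(False):
            self.release()
            return False
        return True
```

The value returned by locked() is only a snapshot: another thread may
acquire or release the lock immediately afterwards.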

Version-Release number of selected component (if applicable):
python-pthreading-0.1.3-1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Wait until there is an I/O error on the master domain and the mailbox
   monitor thread exits
2. Watch VMs pause

Workaround:

Downgrading python-pthreading to version python-pthreading-0.1.2-1.el6ev.noarch eliminates this issue.

Comment 1 Nir Soffer 2014-07-10 09:28:35 UTC
The vdsm patch eliminates this error in vdsm even with the broken pthreading version.

Not setting this to POST since the pthreading fix is required to support older vdsm versions anyway.

Comment 4 Nir Soffer 2014-07-10 14:18:03 UTC
The pthreading patch is already merged upstream, and we don't have a downstream branch for this project, so this can probably be MODIFIED now.

Comment 6 Douglas Schilling Landgraf 2014-07-10 16:52:24 UTC
(In reply to Nir Soffer from comment #4)
> The pthreading patch is already merged upstream, and we don't have a
> downstream branch for this project, so this can probably be MODIFIED now.

We need to build downstream. However, we need the triple acks first as Allon shared in comment#3.

Comment 7 Barak 2014-07-13 17:03:17 UTC
Yaniv,

1 - We need to change the vdsm dependency as well.
2 - We need a new build for vdsm.
3 - What about the 3.4.z flag? Is Nir's fix enough, or do we also need to
    backport the pthreading fix?

Comment 8 Yaniv Bronhaim 2014-07-13 17:09:00 UTC
1. We are waiting for the new pthreading to reach the stable repo; it is already submitted.
2. Yes, we need a new build for 3.4, and a build for the pthreading that is already in testing.
3. Nir's fix is enough for this case, but it is better to have the pthreading fix available. We are doing both as fast as possible.

Comment 12 Nir Soffer 2014-07-18 18:47:38 UTC
How is a python-pthreading bug fixed by vdsm?

Comment 13 Jiri Belka 2014-07-23 12:41:24 UTC
The Fixed In Version value seems wrong. Shouldn't it be python-pthreading? Please correct.

Comment 14 Jiri Belka 2014-07-23 12:59:23 UTC
ok, same version as for zstream BZ1119226.

Comment 16 Eyal Edri 2015-02-17 17:13:29 UTC
RHEV 3.5.0 was released. Closing.