Bug 711089

Summary: 3.1 - VDSM: when blocking connectivity to master SD while running task createVolume the host will reboot

Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.1
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Target Milestone: beta
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Reporter: Dafna Ron <dron>
Assignee: Federico Simoncelli <fsimonce>
QA Contact: Dafna Ron <dron>
Docs Contact:
CC: abaron, bazulay, danken, hateya, iheim, ykaul
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-25 10:20:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  logs (flags: none)
  vdsm log (flags: none)
  spm lock (flags: none)

Description Dafna Ron 2011-06-06 14:03:56 UTC
Created attachment 503241 [details]
logs

Description of problem:

If a task (such as createVolume) is running on the host and connectivity to the master SD is blocked, the host will reboot.

Version-Release number of selected component (if applicable):

tested on both:

vdsm-4.9-73.el6.x86_64
vdsm-4.9-74.el6.x86_64

How reproducible:

75%

Steps to Reproduce:
1. Create a new disk for a VM (this starts a createVolume task on the host).
2. While the task is running, block connectivity to the master SD from the host using iptables (see the sketch below).
  
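For step 2, a minimal sketch of one way to block the traffic, assuming an iSCSI-backed master SD; the target address, port and helper names are placeholders, not values from this setup:

#!/usr/bin/env python
# Hypothetical helper to block/unblock traffic to the master SD (run as root).
# STORAGE_IP and the iSCSI port are assumptions, not values from this bug.
import subprocess

STORAGE_IP = "10.35.64.25"   # placeholder master SD target address
ISCSI_PORT = "3260"          # default iSCSI port; an NFS domain would use 2049

RULE = ["OUTPUT", "-d", STORAGE_IP, "-p", "tcp", "--dport", ISCSI_PORT, "-j", "DROP"]

def block():
    """Drop all outgoing traffic from this host to the storage target."""
    subprocess.check_call(["iptables", "-A"] + RULE)

def unblock():
    """Remove the DROP rule once the test is done."""
    subprocess.check_call(["iptables", "-D"] + RULE)

if __name__ == "__main__":
    block()

Call unblock() afterwards to restore connectivity to the storage domain.
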
Actual results:

The host reboots.

Expected results:

Only the task should fail; the host should not reboot.

Additional info: logs are attached for both vdsm 73 and 74.

MainThread::INFO::2011-06-06 16:46:19,939::vdsm::71::vds::(run) I am the actual vdsm 4.9-74
MainThread::DEBUG::2011-06-06 16:46:20,130::lvm::379::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:46:20,131::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm pvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/10077Daffi-Test1|/dev/mapper/10077Daffi-Test2%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count' (cwd None)
MainThread::DEBUG::2011-06-06 16:47:48,941::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 21474770944: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 21474828288: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 0: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 4096: Input/output error\n  /dev/mapper/10077Daffi-Test1: read failed after 0 of 4096 at 21474770944: Input/output error\n  /dev/mapper/10077Daffi-Test1: read failed after 0 of 4
MainThread::WARNING::2011-06-06 16:49:39,118::vdsmDebugPlugin::16::DebugInterpreter::(__turnOnDebugPlugin) Starting Debug Interpreter. Tread lightly!
MainThread::INFO::2011-06-06 16:49:39,127::vdsm::71::vds::(run) I am the actual vdsm 4.9-74
MainThread::DEBUG::2011-06-06 16:49:39,635::lvm::379::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:49:39,636::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm pvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,204::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,210::lvm::402::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' released the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,211::lvm::412::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,212::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,469::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  No volume groups found\n'; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,470::lvm::439::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,471::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,722::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  No volume groups found\n'; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,901::resourceManager::353::ResourceManager::(registerNamespace) Registering namespace 'Storage'
MainThread::DEBUG::2011-06-06 16:49:40,902::threadPool::25::Misc.ThreadPool::(__init__) Enter - numThreads: 10.0, waitTimeout: 3, maxTasks: 500.0
MainThread::DEBUG::2011-06-06 16:49:40,910::spm::216::Storage.SPM::(__cleanupSPMLinks) cleaning links; ['/rhev/data-center/5af64842-af1f-4548-9098-0ea988cb133e/vms'] ['/rhev/data-center/5af64842-af1f-4548-9098-0ea988cb133e/tasks']
MainThread::DEBUG::2011-06-06 16:49:40,911::spm::207::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/6ab53ec2-e582-4a0f-9406-26c47c9a0266/master` is not mounted, skipping
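
For reference, the pvs call in the first log entry above can be replayed manually to see the same I/O errors once connectivity is blocked. This is only an illustrative reconstruction of that command line (device names taken from the log), not vdsm code:

# Illustrative only: replay the `lvm pvs` call from the log above to check
# whether the blocked devices still answer. Device names are taken from the log.
import subprocess

DEVICES = ["/dev/mapper/10077Daffi-Test1", "/dev/mapper/10077Daffi-Test2"]

# Same structure as the --config string in the log: accept only the listed
# devices, reject everything else.
config = (
    ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 '
    'write_cache_state=0 filter = [ "a%%%s%%", "r%%.*%%" ] } '
    ' global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } '
    ' backup { retain_min = 50 retain_days = 0 } ' % "|".join(DEVICES)
)

cmd = ["sudo", "-n", "/sbin/lvm", "pvs", "--config", config,
       "--noheadings", "--units", "b", "--nosuffix", "--separator", "|",
       "-o", "uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count"]

# Once the master SD is blocked, stderr shows the same "read failed ...
# Input/output error" messages seen in the log above.
print(subprocess.call(cmd))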

Comment 3 Saggi Mizrahi 2012-04-30 20:58:10 UTC
Please retest with upstream VDSM; VDSM should restart but prevent a host reboot.
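
A minimal sketch of the behaviour this comment asks for, assuming a simple supervision loop; it is illustrative only, not vdsm's actual respawn mechanism, and the daemon command line is a stand-in:

# Sketch: a supervisor restarts the crashed daemon process instead of
# fencing (rebooting) the host. DAEMON_CMD is a stand-in command line.
import subprocess
import time

DAEMON_CMD = ["sleep", "10"]   # stand-in for the real vdsm daemon command line

def supervise():
    while True:
        rc = subprocess.call(DAEMON_CMD)   # run the daemon until it exits
        print("daemon exited with rc=%d, restarting in 5s" % rc)
        time.sleep(5)
        # Note: nothing here ever calls `reboot -f`; only the daemon is restarted.

if __name__ == "__main__":
    supervise()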

Comment 5 RHEL Program Management 2012-05-05 04:14:44 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Saggi Mizrahi 2012-07-10 15:13:46 UTC
Dafna, can this be closed then?

Comment 10 Dafna Ron 2012-07-11 11:01:43 UTC
I tested today with vdsm-4.9.6-17.0.el6.x86_64 on RHEL 6.3.
If we close this bug we need to clone it downstream, since the issue is still reproduced: the host rebooted with reboot -f.

Attaching logs (vdsm and spm-lock).
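
The reboot -f reported here is consistent with a lease-protection watchdog fencing the host when the lease on the master SD cannot be renewed. The following is an illustrative sketch of that idea only, not the attached spm-lock script; the interval and failure threshold are assumptions:

# Illustrative sketch: a lease-renewal watchdog of this shape would explain
# the observed `reboot -f`. When the master SD is unreachable the lease cannot
# be renewed, and after a few failures the host fences itself so another host
# can safely take over the SPM role.
import subprocess
import time

RENEW_INTERVAL = 5   # seconds between renewal attempts (assumed)
MAX_FAILURES = 3     # consecutive failures tolerated before fencing (assumed)

def watchdog(renew_lease):
    """renew_lease() should touch the SPM lease on the master storage domain
    and raise IOError when the domain is unreachable."""
    failures = 0
    while True:
        try:
            renew_lease()
            failures = 0
        except IOError:
            failures += 1
            if failures >= MAX_FAILURES:
                # Fence the host; this corresponds to the `reboot -f` above.
                subprocess.call(["reboot", "-f"])
                return
        time.sleep(RENEW_INTERVAL)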

Comment 11 Dafna Ron 2012-07-11 11:33:01 UTC
Created attachment 597538 [details]
vdsm log

Comment 12 Dafna Ron 2012-07-11 11:34:45 UTC
Created attachment 597539 [details]
spm lock

Comment 21 Dafna Ron 2012-09-25 10:20:31 UTC
Tested with si18.1 (vdsm-4.9.6-34.0.el6_3.x86_64); the host did not reboot.
Closing bug.