Bug 711089 - 3.1 - VDSM: when blocking connectivity to master SD while running task createVolume the host will reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Assignee: Federico Simoncelli
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2011-06-06 14:03 UTC by Dafna Ron
Modified: 2014-07-01 11:56 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-25 10:20:31 UTC
Target Upstream Version:


Attachments
logs (1.11 MB, application/x-gzip)
2011-06-06 14:03 UTC, Dafna Ron
vdsm log (3.70 MB, application/octet-stream)
2012-07-11 11:33 UTC, Dafna Ron
spm lock (28.26 KB, application/octet-stream)
2012-07-11 11:34 UTC, Dafna Ron

Description Dafna Ron 2011-06-06 14:03:56 UTC
Created attachment 503241 [details]
logs

Description of problem:

If a task (such as createVolume) is running on the host and connectivity to the master storage domain (SD) is blocked, the host reboots.

Version-Release number of selected component (if applicable):

Tested on both:

vdsm-4.9-73.el6.x86_64
vdsm-4.9-74.el6.x86_64

How reproducible:

75%

Steps to Reproduce:
1. Create a new disk for a VM (this starts a createVolume task on the host).
2. While the task is running, block connectivity from the host to the master SD using iptables (a sketch follows below).
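
For reference, step 2 can be driven with an iptables DROP rule toward the storage server backing the master SD. A minimal reproduction sketch, assuming a single portal address; STORAGE_SERVER_IP is a hypothetical placeholder, not taken from this bug:

#!/usr/bin/env python
# Hypothetical reproduction helper: cut, and later restore, connectivity
# from this host to the storage server backing the master SD. Run as root.
import subprocess
import time

STORAGE_SERVER_IP = "192.0.2.10"  # placeholder; use the real portal address

def block():
    # Drop all outgoing packets to the storage server; in-flight I/O on the
    # master SD then fails with the Input/output errors seen in the vdsm log.
    subprocess.check_call(
        ["iptables", "-A", "OUTPUT", "-d", STORAGE_SERVER_IP, "-j", "DROP"])

def unblock():
    # Delete the rule added by block(), restoring connectivity.
    subprocess.check_call(
        ["iptables", "-D", "OUTPUT", "-d", STORAGE_SERVER_IP, "-j", "DROP"])

if __name__ == "__main__":
    block()
    try:
        time.sleep(300)  # long enough for storage monitoring to give up
    finally:
        unblock()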
  
Actual results:

The host reboots.

Expected results:

Only the task should fail; the host should not reboot.

Additional info: logs attached for both vdsm-4.9-73 and vdsm-4.9-74.

MainThread::INFO::2011-06-06 16:46:19,939::vdsm::71::vds::(run) I am the actual vdsm 4.9-74
MainThread::DEBUG::2011-06-06 16:46:20,130::lvm::379::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:46:20,131::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm pvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/10077Daffi-Test1|/dev/mapper/10077Daffi-Test2%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count' (cwd None)
MainThread::DEBUG::2011-06-06 16:47:48,941::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 21474770944: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 21474828288: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 0: Input/output error\n  /dev/mapper/10077Daffi-Test2: read failed after 0 of 4096 at 4096: Input/output error\n  /dev/mapper/10077Daffi-Test1: read failed after 0 of 4096 at 21474770944: Input/output error\n  /dev/mapper/10077Daffi-Test1: read failed after 0 of 4
MainThread::WARNING::2011-06-06 16:49:39,118::vdsmDebugPlugin::16::DebugInterpreter::(__turnOnDebugPlugin) Starting Debug Interpreter. Tread lightly!
MainThread::INFO::2011-06-06 16:49:39,127::vdsm::71::vds::(run) I am the actual vdsm 4.9-74
MainThread::DEBUG::2011-06-06 16:49:39,635::lvm::379::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:49:39,636::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm pvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,204::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,210::lvm::402::OperationMutex::(_reloadpvs) Operation 'lvm reload operation' released the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,211::lvm::412::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,212::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,469::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  No volume groups found\n'; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,470::lvm::439::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
MainThread::DEBUG::2011-06-06 16:49:40,471::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags' (cwd None)
MainThread::DEBUG::2011-06-06 16:49:40,722::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  No volume groups found\n'; <rc> = 0
MainThread::DEBUG::2011-06-06 16:49:40,901::resourceManager::353::ResourceManager::(registerNamespace) Registering namespace 'Storage'
MainThread::DEBUG::2011-06-06 16:49:40,902::threadPool::25::Misc.ThreadPool::(__init__) Enter - numThreads: 10.0, waitTimeout: 3, maxTasks: 500.0
MainThread::DEBUG::2011-06-06 16:49:40,910::spm::216::Storage.SPM::(__cleanupSPMLinks) cleaning links; ['/rhev/data-center/5af64842-af1f-4548-9098-0ea988cb133e/vms'] ['/rhev/data-center/5af64842-af1f-4548-9098-0ea988cb133e/tasks']
MainThread::DEBUG::2011-06-06 16:49:40,911::spm::207::Storage.SPM::(__cleanupMasterMount) master `/rhev/data-center/mnt/blockSD/6ab53ec2-e582-4a0f-9406-26c47c9a0266/master` is not mounted, skipping

Comment 3 Saggi Mizrahi 2012-04-30 20:58:10 UTC
Please retest with upstream VDSM;
VDSM should restart itself but prevent a host reboot.
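
To illustrate the distinction (this is not vdsm's actual respawn mechanism): a supervisor should restart the crashed daemon and never escalate to rebooting the host. A hypothetical sketch; the daemon path, back-off, and crash-loop threshold are assumptions:

# Hypothetical supervisor sketch: restart the daemon when it exits, instead
# of letting a protection mechanism reboot the whole host.
import subprocess
import time

def supervise(cmd, min_uptime=10.0):
    while True:
        started = time.time()
        rc = subprocess.call(cmd)  # blocks until the daemon exits
        if time.time() - started < min_uptime:
            # Crash loop: stop respawning and surface the error,
            # but still never reboot the host.
            raise RuntimeError("daemon exiting too fast (rc=%d)" % rc)
        time.sleep(1)  # brief back-off before restarting

if __name__ == "__main__":
    supervise(["/usr/share/vdsm/vdsm"])  # assumed daemon path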

Comment 5 RHEL Program Management 2012-05-05 04:14:44 UTC
Since RHEL 6.3 External Beta has begun and this bug remains
unresolved, it has been rejected, as it was not proposed as an
exception or a blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Saggi Mizrahi 2012-07-10 15:13:46 UTC
Dafna, can this be closed then?

Comment 10 Dafna Ron 2012-07-11 11:01:43 UTC
I tested today with vdsm-4.9.6-17.0.el6.x86_64 on RHEL 6.3.
If we close this bug we need to clone it downstream, since the issue still reproduces: the host rebooted with reboot -f.

Attaching logs (vdsm and spm-lock).

Comment 11 Dafna Ron 2012-07-11 11:33:01 UTC
Created attachment 597538 [details]
vdsm log

Comment 12 Dafna Ron 2012-07-11 11:34:45 UTC
Created attachment 597539 [details]
spm lock

Comment 21 Dafna Ron 2012-09-25 10:20:31 UTC
Tested with si18.1 (vdsm-4.9.6-34.0.el6_3.x86_64);
the host did not reboot.
Closing bug.

