Bug 854019 - we do not recover from access list issues
Summary: we do not recover from access list issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Ayal Baron
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-09-03 15:12 UTC by Dafna Ron
Modified: 2016-02-10 17:10 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-21 16:02:58 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
scohen: Triaged+


Attachments
logs (10.96 MB, application/x-gzip), 2012-09-03 15:12 UTC, Dafna Ron


Links
Red Hat Product Errata RHBA-2014:0040 (normal, SHIPPED_LIVE): vdsm bug fix and enhancement update. Last updated: 2014-01-21 20:26:21 UTC.

Description Dafna Ron 2012-09-03 15:12:08 UTC
Created attachment 609408 [details]
logs

Description of problem:

In a setup where the storage domains are built from LUNs on different storage servers (the domains were extended), I removed my hosts from one of the storage servers' access lists.
After about two hours I restored the access list and the hosts can see the storage again, yet we do not recover - the vdsm log still shows the domains as inaccessible.

After a vdsm restart the hosts cannot see some of the devices and the SPM becomes non-operational.

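For reference, one can confirm from the host that the LUNs are visible again even while vdsm keeps reporting the domains inaccessible; a minimal check, not part of the original report, assuming device-mapper-multipath as in this setup:

# Any path still down shows up as "failed" or "faulty" in the multipath topology
multipath -ll | grep -i 'failed\|faulty'
# LVM should list every PV of the storage domain VGs (no "unknown device" entries)
vgs -o +pv_name
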
Version-Release number of selected component (if applicable):

si16
vdsm-4.9.6-31.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create storage domains whose LUNs come from different storage servers.
2. Remove the hosts from one of the storage servers' access lists (see the sketch after this list).
3. Restore the access list after some time (about two hours in this report).
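
A hypothetical way to simulate step 2 from the host side, since the real change is made on the array's management interface and is array-specific; the portal address and port below are assumptions for an iSCSI setup, and blocking traffic only produces a similar path-loss effect, not an array-side rejection:

# Assumed iSCSI portal of the target storage server
PORTAL=10.35.0.1
# Simulate losing access to that storage server
iptables -I OUTPUT -d "$PORTAL" -p tcp --dport 3260 -j DROP
# ... wait for the paths to fail (about two hours in the original report), then restore:
iptables -D OUTPUT -d "$PORTAL" -p tcp --dport 3260 -j DROP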
  
Actual results:

We do not recover when the hosts are added back to the storage access list, although running vgs shows that the LUNs are visible.

Expected results:

We should recover once access to the storage is restored.

Additional info:

 VG                                   #PV #LV #SN Attr   VSize   VFree  
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 352.75g
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g
  vg0                                    1   3   0 wz--n- 136.24g      0 

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV                                           
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c       

 VG                                   #PV #LV #SN Attr   VSize   VFree   PV                               
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e2    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e7    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e8    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002ea    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032d    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032e    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580030f    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c5695800310    


Thread-9850::DEBUG::2012-09-03 17:25:24,204::vm::739::vm.Vm::(_lvExtend) vmId=`689e4e9b-19cf-44f1-b014-cdcf17687c4e`::b1f82e00-4647-4df0-8190-6ba4e811f5a5/f948e894-8631-4fdb-818a-4e9b618e90dc (hda): apparentsize 2048 req 3072
Dummy-119::DEBUG::2012-09-03 17:25:24,361::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Thread-50::ERROR::2012-09-03 17:25:24,408::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain bc7fde7a-4d43-4dd4-874a-bff5ca517bae monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('bc7fde7a-4d43-4dd4-874a-bff5ca517bae',)
Thread-48::ERROR::2012-09-03 17:25:24,410::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 0dc1433f-72e6-4b62-9845-dc022a191f4f monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('0dc1433f-72e6-4b62-9845-dc022a191f4f',)
Dummy-119::DEBUG::2012-09-03 17:25:24,510::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0778565 s, 13.2 MB/s\n'; <rc> = 0
Dummy-119::DEBUG::2012-09-03 17:25:24,511::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 1 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,515::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 2 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,525::__init__::1164::Storage.Misc.excCmd::(_log) 'dd of=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/outbox oflag=direct iflag=fullblock conv=notrunc count=1 bs=1024000' (cwd None)
Dummy-119::DEBUG::2012-09-03 17:25:24,629::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0705315 s, 14.5 MB/s\n'; <rc> = 0
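
For reference, vdsm's own view of the monitored domains can be queried directly; repoStats reports each monitored domain's validity. A minimal check, not part of the original log, assuming vdsClient is installed on the host:

# Ask vdsm which domains its monitors currently consider valid
vdsClient -s 0 repoStats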


After the vdsm restart the hosts cannot see devices on some of the domains and the SPM becomes non-operational:

vgs  -o+pv_name
 VG                                   #PV #LV #SN Attr   VSize   VFree   PV                                           
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/1Dafna-si16-011346574            
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800315                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800332                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800333                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/1Dafna-si16-021346574            
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e3                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e6                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e4                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e5                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800330                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800331                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device                               
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device                               
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ee                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ef                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f3                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f2                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f0                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f1                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800327                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800328                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574            
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e2                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e7                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e8                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002ea                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032d                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032e                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device                               
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device                               
  vg0                                    1   3   0 wz--n- 136.24g      0  /dev/sda2

Comment 1 Ayal Baron 2012-09-06 11:47:21 UTC
There are 7 devices listed as unknown in the output above, so clearly multipath has not recovered these paths yet, making the domains partial (only a subset of the disks is accessible).
Closing as dup of: 854140

*** This bug has been marked as a duplicate of bug 854140 ***
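
For reference, paths stuck in this state can usually be re-probed by hand; a minimal recovery sketch, not from the bug itself, assuming an iSCSI transport (the rescan line does not apply to FC):

# Rescan the transport so the kernel re-discovers the LUNs
iscsiadm -m session --rescan
# Reload the multipath maps so failed paths are re-checked
multipath -r
# Let LVM re-read the PVs; the "unknown device" entries should disappear
pvscan
vgs -o +pv_name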

Comment 7 RHEL Program Management 2012-12-14 07:52:46 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 Ayal Baron 2013-03-20 10:21:16 UTC
Haim, can you retest this as well? After the changes in getDeviceList it could be fixed.
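
For reference, getDeviceList can be exercised directly from the host during the retest; a minimal sketch, assuming vdsClient is installed:

# List the block devices vdsm currently sees, as the engine would
vdsClient -s 0 getDeviceList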

Comment 10 Aharon Canan 2013-08-01 14:39:58 UTC
We need to retest.
No extra info is needed here; removing the "needinfo" flag and taking it for verification.

Comment 11 Elad 2013-08-08 15:52:10 UTC
After mapping the hosts back to the LUNs on the storage server, vdsm is able to activate the domain again.


Verified on RHEVM3.3-IS8
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev.x86_64
rhevm-3.3.0-0.13.master.el6ev.noarch
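
For reference, the same check can be run from the host; a minimal sketch, assuming vdsClient is installed (the sdUUID below is one of the domains from this report):

# Confirm the previously inaccessible domain is reported as accessible again
vdsClient -s 0 getStorageDomainInfo bc7fde7a-4d43-4dd4-874a-bff5ca517bae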

Comment 12 Charlie 2013-11-28 00:27:25 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 13 errata-xmlrpc 2014-01-21 16:02:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html

