Bug 854019 - we do not recover from access list issues
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
unspecified
x86_64 Linux
high Severity high
: ---
: 3.3.0
Assigned To: Ayal Baron
Elad
storage
: Reopened
Depends On:
Blocks:
Reported: 2012-09-03 11:12 EDT by Dafna Ron
Modified: 2016-02-10 12:10 EST
10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 11:02:58 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
scohen: Triaged+


Attachments
logs (10.96 MB, application/x-gzip)
2012-09-03 11:12 EDT, Dafna Ron

Description Dafna Ron 2012-09-03 11:12:08 EDT
Created attachment 609408
logs

Description of problem:

In a setup where the domains are made up of LUNs from different storage servers (extended domains), I removed my hosts from the access list of one of the storage servers.
After about 2 hours I restored the access list. The hosts can see the storage again, yet we do not recover: the vdsm log still shows that the domains are inaccessible.

After a vdsm restart, some of the devices are not visible and the SPM becomes non-operational.

Version-Release number of selected component (if applicable):

si16
vdsm-4.9.6-31.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create domains that contain LUNs from different storage servers.
2. Remove the hosts from the access list of one of the storage servers.
3. After about 2 hours, restore the access list.
Actual results:

vdsm does not recover when the hosts are added back to the storage access list, although running vgs shows that the hosts can see the LUNs.

Expected results:

vdsm should recover and the domains should become accessible again once the hosts can see the LUNs.

Additional info:

  VG                                   #PV #LV #SN Attr   VSize   VFree  
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 352.75g
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g
  vg0                                    1   3   0 wz--n- 136.24g      0 

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV                                           
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c       

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV                               
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e2    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e7    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e8    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002ea    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032d    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032e    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580030f    
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c5695800310    


Thread-9850::DEBUG::2012-09-03 17:25:24,204::vm::739::vm.Vm::(_lvExtend) vmId=`689e4e9b-19cf-44f1-b014-cdcf17687c4e`::b1f82e00-4647-4df0-8190-6ba4e811f5a5/f948e894-8631-4fdb-818a-4e9b618e90dc (hda): apparentsize 2048 req 3072
Dummy-119::DEBUG::2012-09-03 17:25:24,361::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Thread-50::ERROR::2012-09-03 17:25:24,408::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain bc7fde7a-4d43-4dd4-874a-bff5ca517bae monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('bc7fde7a-4d43-4dd4-874a-bff5ca517bae',)
Thread-48::ERROR::2012-09-03 17:25:24,410::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 0dc1433f-72e6-4b62-9845-dc022a191f4f monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('0dc1433f-72e6-4b62-9845-dc022a191f4f',)
Dummy-119::DEBUG::2012-09-03 17:25:24,510::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0778565 s, 13.2 MB/s\n'; <rc> = 0
Dummy-119::DEBUG::2012-09-03 17:25:24,511::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 1 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,515::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 2 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,525::__init__::1164::Storage.Misc.excCmd::(_log) 'dd of=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/outbox oflag=direct iflag=fullblock conv=notrunc count=1 bs=1024000' (cwd None)
Dummy-119::DEBUG::2012-09-03 17:25:24,629::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0705315 s, 14.5 MB/s\n'; <rc> = 0
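The traceback above shows the domain monitor calling `selftest()`, which raises `StorageDomainAccessError` for the partially visible domains, while the report expects the domains to become valid again once the LUNs are back. The Python sketch below is not vdsm's `domainMonitor` code, and every name in it (`DomainMonitorSketch`, `FakeStorage`, `refresh`, `rescan`) is hypothetical. It models a monitor that re-runs a self-test on every cycle and, assuming (as Comment 1 suggests for multipath) that stale host-side path state is what blocks recovery, calls an optional refresh hook before re-testing a domain that failed its previous check.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class StorageDomainAccessError(Exception):
    """Local stand-in named after vdsm's exception (hypothetical copy)."""


@dataclass
class DomainMonitorSketch:
    """Hypothetical monitor that re-evaluates a domain on every cycle."""

    sd_uuid: str
    selftest: Callable[[], None]
    # Hypothetical hook that refreshes host-side device state (e.g. a rescan)
    # before re-testing a domain that failed its previous check.
    refresh: Optional[Callable[[], None]] = None
    valid: bool = True
    history: List[bool] = field(default_factory=list)

    def check_once(self) -> bool:
        # Only refresh when the last check failed; healthy domains skip it.
        if not self.valid and self.refresh is not None:
            self.refresh()
        try:
            self.selftest()
            self.valid = True
        except StorageDomainAccessError:
            self.valid = False
        self.history.append(self.valid)
        return self.valid

    def run(self, cycles: int, interval: float = 0.0) -> List[bool]:
        for _ in range(cycles):
            self.check_once()
            if interval > 0:
                time.sleep(interval)
        return self.history


class FakeStorage:
    """Toy model: access-list state on the array vs. the host's device view."""

    def __init__(self) -> None:
        self.acl_restored = False       # host re-added to the access list
        self.devices_rescanned = False  # host has refreshed its device view

    def selftest(self) -> None:
        if not (self.acl_restored and self.devices_rescanned):
            raise StorageDomainAccessError("bc7fde7a-4d43-4dd4-874a-bff5ca517bae")

    def rescan(self) -> None:
        # A rescan only helps once the array exposes the LUNs again.
        if self.acl_restored:
            self.devices_rescanned = True


if __name__ == "__main__":
    for label in ("with refresh", "without refresh"):
        storage = FakeStorage()
        hook = storage.rescan if label == "with refresh" else None
        mon = DomainMonitorSketch("bc7fde7a-4d43-4dd4-874a-bff5ca517bae",
                                  storage.selftest, hook)
        mon.check_once()             # host removed from the access list
        storage.acl_restored = True  # access list restored on the array
        mon.run(cycles=2)
        print(label, mon.history)
```

Run as a script, it prints `with refresh [False, True, True]` and `without refresh [False, False, False]`. In this toy model, a monitor that keeps re-testing a stale device view never recovers, which resembles the symptom in the description; whether a refresh step is the right fix for vdsm is an assumption of the sketch.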


After a vdsm restart, the hosts cannot see the devices of some of the domains and the SPM becomes non-operational:

vgs -o+pv_name
  VG                                   #PV #LV #SN Attr   VSize   VFree   PV                                           
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/1Dafna-si16-011346574            
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800315                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800332                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800333                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device                               
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/1Dafna-si16-021346574            
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e3                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e6                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e4                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e5                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800330                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800331                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device                               
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device                               
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ee                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ef                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f3                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f2                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f0                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f1                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800327                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800328                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574            
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e2                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e7                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e8                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002ea                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032d                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032e                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device                               
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device                               
  vg0                                    1   3   0 wz--n- 136.24g      0  /dev/sda2
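In this listing, the fourth character of the Attr column changes from `-` to `p` (`wz--n-` becomes `wz-pn-`) for exactly the three VGs that have `unknown device` rows. Per the vgs(8) man page, that position is the partial flag, set when one or more PVs of the VG are missing from the system. The Python sketch below decodes the six VG attribute characters and lists the partial VGs from pasted `vgs` text; the function names are hypothetical, and it splits on whitespace rather than using a machine-readable output format.

```python
from typing import Dict, List, Union

# Allocation-policy letters for the fifth Attr character, as listed in vgs(8).
_ALLOCATION = {"c": "contiguous", "l": "cling", "n": "normal", "a": "anywhere"}


def decode_vg_attr(attr: str) -> Dict[str, Union[bool, str]]:
    """Decode the six-character VG Attr field printed by `vgs`."""
    if len(attr) < 6:
        raise ValueError(f"unexpected VG attr string: {attr!r}")
    return {
        "writeable": attr[0] == "w",
        "resizeable": attr[1] == "z",
        "exported": attr[2] == "x",
        "partial": attr[3] == "p",  # one or more PVs are missing
        "allocation": _ALLOCATION.get(attr[4], attr[4]),
        "clustered": attr[5] == "c",
    }


def partial_vgs(vgs_output: str) -> List[str]:
    """Return the sorted names of VGs whose Attr field has the partial bit set.

    Assumes whitespace-separated `vgs` text whose first five columns are
    VG #PV #LV #SN Attr (true for both listings pasted in this report).
    """
    found = set()
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "VG":
            continue  # skip blank lines and header rows
        if decode_vg_attr(fields[4])["partial"]:
            found.add(fields[0])
    return sorted(found)
```

Fed the `vgs -o+pv_name` listing above, it would return the three VGs whose rows show `wz-pn-`, while the listings taken before the vdsm restart show no partial VGs.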
Comment 1 Ayal Baron 2012-09-06 07:47:21 EDT
There are 7 devices listed as unknown in the output above, so clearly multipath has not recovered these paths yet, which makes the domains partial (only a subset of the disks is accessible).
Closing as a duplicate of bug 854140.

*** This bug has been marked as a duplicate of bug 854140 ***
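The diagnosis in Comment 1 rests on counting the `unknown device` rows in the `vgs -o+pv_name` output: 3 for 0dc1433f-72e6-4b62-9845-dc022a191f4f, 2 for 7d1dd321-16a3-4557-99c2-7d524b60f33f and 2 for bc7fde7a-4d43-4dd4-874a-bff5ca517bae, 7 in total. The Python sketch below (the function name is hypothetical) repeats that count per VG and reports it next to the VG's declared `#PV` value. It assumes the column layout of that listing, where the PV name is the last column and `unknown device` spans two whitespace-separated tokens.

```python
from collections import defaultdict
from typing import Dict, Tuple


def missing_pvs_per_vg(vgs_output: str) -> Dict[str, Tuple[int, int]]:
    """Map each VG with missing PVs to (unknown_device_rows, declared_pv_count).

    Expects `vgs -o+pv_name` text: one row per (VG, PV) pair with the columns
    VG #PV #LV #SN Attr VSize VFree PV.
    """
    missing: Dict[str, int] = defaultdict(int)
    declared: Dict[str, int] = {}
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] == "VG":
            continue  # skip blank lines, the header and rows without a PV column
        vg = fields[0]
        declared[vg] = int(fields[1])
        pv_name = " ".join(fields[7:])  # "unknown device" spans two tokens
        if pv_name == "unknown device":
            missing[vg] += 1
    return {vg: (count, declared[vg]) for vg, count in missing.items()}


if __name__ == "__main__":
    sample = """\
  VG   #PV #LV #SN Attr   VSize   VFree   PV
  vgA    3   1   0 wz-pn- 10.00g  5.00g  /dev/mapper/lunA
  vgA    3   1   0 wz-pn- 10.00g  5.00g  unknown device
  vgB    1   1   0 wz--n-  5.00g  1.00g  /dev/sda2
"""
    print(missing_pvs_per_vg(sample))  # {'vgA': (1, 3)}
```

On the listing taken after the vdsm restart it would report (3, 7) for 0dc1433f, (2, 9) for 7d1dd321 and (2, 9) for bc7fde7a, matching the 7 unknown devices counted in Comment 1.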
Comment 7 RHEL Product and Program Management 2012-12-14 02:52:46 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 8 Ayal Baron 2013-03-20 06:21:16 EDT
Haim, can you retest this as well? After the changes in getDeviceList, it may be fixed.
Comment 10 Aharon Canan 2013-08-01 10:39:58 EDT
We need to retest.
No extra info is needed here; removing the "needinfo" flag and taking it for verification.
Comment 11 Elad 2013-08-08 11:52:10 EDT
After mapping hosts back to LUN on the storage server, vdsm is able to activate the domain again.


Verified on RHEVM3.3-IS8
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev.x86_64
rhevm-3.3.0-0.13.master.el6ev.noarch
Comment 12 Charlie 2013-11-27 19:27:25 EST
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.
Comment 13 errata-xmlrpc 2014-01-21 11:02:58 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
