Bug 807351 - [ovirt] [vdsm] NFS ISO\Export domain will not recover after failure if they enter (deleted) state
Summary: [ovirt] [vdsm] NFS ISO\Export domain will not recover after failure if they enter (deleted) state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Saggi Mizrahi
QA Contact: Jakub Libosvar
URL:
Whiteboard: storage
Depends On:
Blocks: 907253 907255
 
Reported: 2012-03-27 15:09 UTC by Haim
Modified: 2022-07-09 05:34 UTC
CC: 13 users

Fixed In Version: vdsm-4.9.6-11
Doc Type: Bug Fix
Doc Text:
Previously, when an ISO domain lost SPM connectivity, the connection to the ISO domain failed to be restored even though the mount was eligible. A patch to VDSM ensures that ISO domains are automatically recovered after their connectivity is restored.
Clone Of:
: 907253 907255
Environment:
Last Closed: 2012-12-04 18:56:48 UTC
Target Upstream Version:


Attachments


Links
System: Red Hat Product Errata
ID: RHSA-2012:1508
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update
Last Updated: 2012-12-04 23:48:05 UTC

Description Haim 2012-03-27 15:09:55 UTC
Description of problem:

- have an ISO domain running
- block the domain's connectivity from the SPM
- wait an hour or so
- reinstate the connection to the mount
- vdsm is still unable to recover domain connectivity, although the mount is eligible

Reason: vdsm uses '/proc/mounts' to know whether a mount is active or not. In my case, the operating system switched the problematic mount points to a '\040(deleted)' state, and vdsm uses the following piece of code (taken from vdsm/storage/mount.py +72):

 73 def getMountFromTarget(target):                                                                                                                                                                                  
 74     target = normpath(target)                                                                                                                                                                                    
 75     for rec in _iterMountRecords():                                                                                                                                                                              
 76         if rec.fs_file == target:                                                                                                                                                                                
 77             return Mount(rec.fs_spec, rec.fs_file) 

On line 76 we compare the mount-point string recorded in the file against the given target, whereas in my case

target != target + '\040(deleted)'

so the domain keeps being non-responsive.
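
For illustration, here is a minimal, standalone sketch of how a /proc/mounts lookup could tolerate the '\040(deleted)' suffix. This is only an assumption about one possible approach, not necessarily what http://gerrit.ovirt.org/3261 does, and the helper names (_iter_proc_mounts, get_mount_from_target) are hypothetical:

# In /proc/mounts the space before "(deleted)" is octal-escaped, so the
# stale mount point literally ends with the text "\040(deleted)".
DELETED_SUFFIX = "\\040(deleted)"

def _iter_proc_mounts(path="/proc/mounts"):
    # Yield (fs_spec, fs_file) pairs from a /proc/mounts-style file.
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                yield fields[0], fields[1]

def get_mount_from_target(target):
    # Return the (fs_spec, fs_file) record matching `target`, stripping
    # the "(deleted)" marker before comparing so that a stale NFS mount
    # point still resolves to its original path.
    for fs_spec, fs_file in _iter_proc_mounts():
        if fs_file.endswith(DELETED_SUFFIX):
            fs_file = fs_file[:-len(DELETED_SUFFIX)]
        if fs_file == target:
            return fs_spec, fs_file
    return None

Note that this sketch compares raw /proc/mounts fields and does not unescape other octal escapes (e.g. '\040' for a space inside a path), which real parsing code would also have to handle.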


logs from vdsm:

Thread-95::ERROR::2012-03-27 17:08:12,905::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `2612f0b2-8310-4d1f-adba-b708eb95f5cf` monitoring information                   
Traceback (most recent call last):                                                                                                                                                                                   
  File "/usr/share/vdsm/storage/domainMonitor.py", line 103, in _monitorDomain                                                                                                                                       
    domain.selftest()                                                                                                                                                                                                
  File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest                                                                                                                                                     
    raise se.StorageDomainFSNotMounted(self.mountpoint)                                                                                                                                                              
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_shared__iso__domain',)                                                            
Thread-92::ERROR::2012-03-27 17:08:14,927::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `a965892a-2dd9-4e50-ad88-9912d299746c` monitoring information                   
Traceback (most recent call last):                                                                                                                                                                                   
  File "/usr/share/vdsm/storage/domainMonitor.py", line 103, in _monitorDomain                                                                                                                                       
    domain.selftest()                                                                                                                                                                                                
  File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest                                                                                                                                                     
    raise se.StorageDomainFSNotMounted(self.mountpoint)                                                                                                                                                              
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_hateya_webadmin-export',)
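
To relate the traceback to the code quoted above, here is a hypothetical, much-simplified sketch (not vdsm's actual implementation) of the monitoring pattern: a periodic selftest that raises when the mount lookup fails, and a loop that logs the failure. It reuses the hypothetical get_mount_from_target() from the sketch above, and the exception class is only a stand-in for vdsm's StorageDomainFSNotMounted:

import logging
import threading

log = logging.getLogger("Storage.DomainMonitor")

class StorageDomainFSNotMounted(Exception):
    # Stand-in for vdsm's exception of the same name.
    pass

def selftest(mountpoint: str) -> None:
    # Raise if the mount point is not (or no longer) present in /proc/mounts.
    if get_mount_from_target(mountpoint) is None:
        raise StorageDomainFSNotMounted(mountpoint)

def monitor_domain(mountpoint: str, interval: float, stop_event: threading.Event) -> None:
    # Periodically run selftest and log failures; roughly the pattern
    # behind the _monitorDomain errors shown in the log snippet above.
    while not stop_event.wait(interval):
        try:
            selftest(mountpoint)
        except Exception:
            log.exception("Error while collecting domain monitoring information")

This is only meant to show where the string comparison breaks the monitoring loop; the fix that actually shipped (vdsm-4.9.6-11) takes the auto-recovery approach described in the Doc Text above.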

Comment 2 Dan Kenigsberg 2012-04-03 09:44:05 UTC
Saggi, please pet your BZs
http://gerrit.ovirt.org/3261

Comment 3 Dan Kenigsberg 2012-05-08 21:05:43 UTC
Haim, does this reproduce with a rhel-6.3 kernel?

Comment 4 Haim 2012-05-09 06:05:42 UTC
(In reply to comment #3)
> Haim, does this reproduce with a rhel-6.3 kernel?

I don't understand; Saggi sent a patch that fixes the bug, so why should I check whether it's reproducible on RHEL 6.3?

Comment 5 Dan Kenigsberg 2012-05-09 07:55:10 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Haim, does this reproduce with a rhel-6.3 kernel?
> 
> I don't understand; Saggi sent a patch that fixes the bug, so why should I
> check whether it's reproducible on RHEL 6.3?

Because I do not like Saggi's patch, and I have a feeling that neither he nor I really understand what state your "deleted" mountpoint is in. Maybe someone from the kernel team could help?

Comment 7 Jakub Libosvar 2012-05-15 08:38:37 UTC
I can't reproduce this with RHEL 6.3 and vdsm-4.9.6-10, therefore moving to Verified.

Comment 11 errata-xmlrpc 2012-12-04 18:56:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

