Bug 807351

Summary: [ovirt] [vdsm] NFS ISO\Export domain will not recover after failure if they enter (deleted) state
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Whiteboard: storage
Target Milestone: rc
Reporter: Haim <hateya>
Assignee: Saggi Mizrahi <smizrahi>
QA Contact: Jakub Libosvar <jlibosva>
CC: abaron, acathrow, bazulay, cpelland, danken, dyasny, iheim, jbiddle, mgoldboi, syeghiay, yeylon, ykaul, zdover
Fixed In Version: vdsm-4.9.6-11
Doc Type: Bug Fix
Doc Text: Previously, when an ISO domain lost SPM connectivity, connection to the ISO domain would fail to restore even though the mount was eligible. A patch to VDSM ensures that ISO domains are autorecovered after their connectivity is restored.
Cloned To: 907253, 907255
Bug Blocks: 907253, 907255
Last Closed: 2012-12-04 18:56:48 UTC

Description Haim 2012-03-27 15:09:55 UTC
Description of problem:

- have an ISO domain running
- block domain connectivity from the SPM host (see the sketch after this list for one way to script this)
- wait an hour or so
- reinstate the connection to the mount
- vdsm is still unable to recover domain connectivity, although the mount is eligible
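
For reference, a minimal sketch of one way the "block domain connectivity" step can be scripted on the SPM host; the iptables approach, the helper names and the NFS-over-TCP port are illustrative assumptions, not the exact method used here:

    import subprocess

    # Hypothetical helpers (not taken from this bug): drop outgoing NFS traffic
    # from the SPM host to the storage server, then delete the rule to
    # reinstate the connection to the mount.
    NFS_SERVER = "qanashead.qa.lab.tlv.redhat.com"  # storage server seen in the logs below

    def block_nfs(server=NFS_SERVER):
        subprocess.check_call(["iptables", "-A", "OUTPUT", "-d", server,
                               "-p", "tcp", "--dport", "2049", "-j", "DROP"])

    def unblock_nfs(server=NFS_SERVER):
        subprocess.check_call(["iptables", "-D", "OUTPUT", "-d", server,
                               "-p", "tcp", "--dport", "2049", "-j", "DROP"])

Blocking only TCP port 2049 assumes NFS over TCP; dropping all traffic to the storage server has the same effect.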

reason: vdsm uses '/proc/mounts' to determine whether a mount is active or not. In my case, the operating system marked the problematic mount points with a '\040(deleted)' suffix in /proc/mounts, and vdsm uses the following piece of code (taken from vdsm/storage/mount.py +72):

 73 def getMountFromTarget(target):
 74     target = normpath(target)
 75     for rec in _iterMountRecords():
 76         if rec.fs_file == target:
 77             return Mount(rec.fs_spec, rec.fs_file)

On line 76 we compare the mount point string recorded in the file against the given target. In my case the file records target+'\040(deleted)', which never equals target, so the domain keeps being reported as non-responsive.
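
For illustration, a minimal sketch of a comparison that tolerates the stale-mount annotation; the helper names and the suffix handling are illustrative assumptions, not the actual vdsm fix:

    # Sketch (not vdsm code): compare a /proc/mounts fs_file field against a
    # target mount point while tolerating the "\040(deleted)" suffix the kernel
    # appends to stale mounts. The raw field contains the literal characters
    # "\040" (an octal escape for a space), e.g.
    # "/rhev/data-center/mnt/srv:_iso\040(deleted)".

    DELETED_SUFFIX = " (deleted)"

    def _decodeMountPath(fs_file):
        # /proc/mounts escapes space, tab, newline and backslash as octal
        # sequences; only the space escape matters for this comparison.
        return fs_file.replace("\\040", " ")

    def isSameMountPoint(fs_file, target):
        path = _decodeMountPath(fs_file)
        if path.endswith(DELETED_SUFFIX):
            path = path[:-len(DELETED_SUFFIX)]
        return path == target

Matching the stale record instead of ignoring it lets the caller find the Mount object again, so the dead mount can be unmounted and remounted rather than being treated as simply absent.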


logs from vdsm:

Thread-95::ERROR::2012-03-27 17:08:12,905::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `2612f0b2-8310-4d1f-adba-b708eb95f5cf` monitoring information                   
Traceback (most recent call last):                                                                                                                                                                                   
  File "/usr/share/vdsm/storage/domainMonitor.py", line 103, in _monitorDomain                                                                                                                                       
    domain.selftest()                                                                                                                                                                                                
  File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest                                                                                                                                                     
    raise se.StorageDomainFSNotMounted(self.mountpoint)                                                                                                                                                              
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_shared__iso__domain',)                                                            
Thread-92::ERROR::2012-03-27 17:08:14,927::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `a965892a-2dd9-4e50-ad88-9912d299746c` monitoring information                   
Traceback (most recent call last):                                                                                                                                                                                   
  File "/usr/share/vdsm/storage/domainMonitor.py", line 103, in _monitorDomain                                                                                                                                       
    domain.selftest()                                                                                                                                                                                                
  File "/usr/share/vdsm/storage/nfsSD.py", line 127, in selftest                                                                                                                                                     
    raise se.StorageDomainFSNotMounted(self.mountpoint)                                                                                                                                                              
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_hateya_webadmin-export',)
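
For context, a self-contained illustration (an assumed shape, not the actual nfsSD.selftest source) of the check behind the tracebacks above: an exact-match lookup in /proc/mounts treats the renamed '<path>\040(deleted)' entry as not mounted, so StorageDomainFSNotMounted is raised.

    class StorageDomainFSNotMounted(Exception):
        pass

    def isMounted(target):
        # Exact string comparison against the fs_file field, as in mount.py
        # above; a stale entry "<target>\040(deleted)" never matches.
        with open("/proc/mounts") as f:
            for line in f:
                if line.split()[1] == target:
                    return True
        return False

    def selftest(mountpoint):
        if not isMounted(mountpoint):
            raise StorageDomainFSNotMounted(mountpoint)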

Comment 2 Dan Kenigsberg 2012-04-03 09:44:05 UTC
Saggi, please pet your BZs
http://gerrit.ovirt.org/3261

Comment 3 Dan Kenigsberg 2012-05-08 21:05:43 UTC
Haim, does this reproduce with a rhel-6.3 kernel?

Comment 4 Haim 2012-05-09 06:05:42 UTC
(In reply to comment #3)
> Haim, does this reproduce with a rhel-6.3 kernel?

I don't understand. Saggi sent a patch that fixes the bug, so why should I check whether it's reproducible on rhel 6.3?

Comment 5 Dan Kenigsberg 2012-05-09 07:55:10 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Haim, does this reproduce with a rhel-6.3 kernel?
> 
> I don't understand. Saggi sent a patch that fixes the bug, so why should I
> check whether it's reproducible on rhel 6.3?

Because I do not like Saggi's patch, and I have a feeling that neither he nor I really understand what state your "deleted" mountpoint is actually in. Maybe someone from the kernel team could help?

Comment 7 Jakub Libosvar 2012-05-15 08:38:37 UTC
I can't reproduce with rhel 6.3 and vdsm-4.9.6-10, therefore moving to Verified.

Comment 11 errata-xmlrpc 2012-12-04 18:56:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html