Bug 788559 - ISO domain is not umounted if network is dropped
Summary: ISO domain is not umounted if network is dropped
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-08 13:34 UTC by Jakub Libosvar
Modified: 2016-04-18 06:43 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-17 07:46:26 UTC
oVirt Team: ---


Attachments
logs (166.72 KB, application/x-gzip)
2012-02-08 13:34 UTC, Jakub Libosvar

Description Jakub Libosvar 2012-02-08 13:34:07 UTC
Created attachment 560258
logs

Description of problem:
I have one host, two iSCSI data domains (each on a different server), and one ISO domain that shares a storage server with the master storage domain. Using iptables, I dropped incoming packets to block the connection to the server exporting the ISO and master domains (this triggers bug 787673). After the connection is unblocked, the ISO domain status changes to Active, but the backend cannot obtain statistics about it (the same happens with the former master domain, but that is fixed as soon as some operation is performed on it).
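
For reference, the exact iptables rule used is not captured in the attached logs; a hypothetical equivalent of the blocking and unblocking steps on the host, with 10.34.63.202 being the server exporting the ISO and master domains, would be something like:

# hypothetical example only; the actual rule is not recorded in the logs
iptables -A INPUT -s 10.34.63.202 -j DROP    # block: drop incoming packets from the storage server
iptables -D INPUT -s 10.34.63.202 -j DROP    # unblock: remove the rule again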

MainThread::DEBUG::2012-02-08 13:18:22,602::mount::111::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/umount /rhev/data-center/mnt/blockSD/a777bffb-acce-4c42-a379-1a88a840ca14/master' (cwd None)
Thread-417::ERROR::2012-02-08 13:18:29,326::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `f1621233-13f4-4c6b-99e3-86fbd4dbdad2` monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 103, in _monitorDomain
    domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 131, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 295, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/processPool.py", line 53, in wrapper
    return self.runExternally(func, *args, **kwds)
  File "/usr/share/vdsm/storage/processPool.py", line 64, in runExternally
    return self._procPool.runExternally(*args, **kwargs)
  File "/usr/share/vdsm/storage/processPool.py", line 145, in runExternally
    raise Timeout("Operation Stuck")
Timeout: Operation Stuck

...

Thread-1494::DEBUG::2012-02-08 13:47:45,570::clientIF::261::Storage.Dispatcher.Protect::(wrapper) [10.34.63.26]
Thread-1494::DEBUG::2012-02-08 13:47:45,570::task::588::TaskManager.Task::(_updateState) Task=`32285a86-3126-4527-802f-719597d25824`::moving from state init -> state preparing
Thread-1494::INFO::2012-02-08 13:47:45,571::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection': '10.34.63.202:/home/iso/shared', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '117a86c3-be84-451a-b9a6-5680d3b617b6', 'port': ''}], options=None)
Thread-1494::INFO::2012-02-08 13:47:45,571::storage_connection::167::Storage.ServerConnection::(disconnect) Request to disconnect NFS storage server
Thread-1494::WARNING::2012-02-08 13:47:45,573::storage_connection::382::Storage.ServerConnection::(__disconnectNFSServer) Cannot remove mountpoint after umount()
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_connection.py", line 378, in __disconnectNFSServer
    os.rmdir(mnt.fs_file)
OSError: [Errno 16] Device or resource busy: '/rhev/data-center/mnt/10.34.63.202:_home_iso_shared'
Thread-1494::DEBUG::2012-02-08 13:47:45,574::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
Thread-1494::DEBUG::2012-02-08 13:47:45,574::misc::1019::SamplingMethod::(__call__) Got in to sampling method
Thread-1494::DEBUG::2012-02-08 13:47:45,574::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
Thread-1494::DEBUG::2012-02-08 13:47:45,575::misc::1019::SamplingMethod::(__call__) Got in to sampling method
Thread-1494::DEBUG::2012-02-08 13:47:45,575::iscsi::389::Storage.Misc.excCmd::(rescan) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
Thread-1494::DEBUG::2012-02-08 13:47:45,600::iscsi::389::Storage.Misc.excCmd::(rescan) SUCCESS: <err> = ''; <rc> = 0
Thread-1494::DEBUG::2012-02-08 13:47:45,601::misc::1027::SamplingMethod::(__call__) Returning last result



After a while the ISO domain goes to Inactive but remains mounted on the host. As soon as the ISO domain is activated by the user, activation fails because it is already mounted:

Thread-1567::DEBUG::2012-02-08 13:49:17,569::mount::111::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6 10.34.63.202:/home/iso/shared /tmp/tmp8sLigH' (cwd None)
Thread-1567::DEBUG::2012-02-08 13:49:17,638::mount::111::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/umount /tmp/tmp8sLigH' (cwd None)
Thread-1653::DEBUG::2012-02-08 13:50:56,232::clientIF::261::Storage.Dispatcher.Protect::(wrapper) [10.34.63.26]
Thread-1653::DEBUG::2012-02-08 13:50:56,232::task::588::TaskManager.Task::(_updateState) Task=`cacaae64-48ae-4c78-b831-f34521dcdbbf`::moving from state init -> state preparing
Thread-1653::INFO::2012-02-08 13:50:56,233::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'connection': '10.34.63.202:/home/iso/shared', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '117a86c3-be84-451a-b9a6-5680d3b617b6', 'port': ''}], options=None)
Thread-1653::INFO::2012-02-08 13:50:56,233::storage_connection::146::Storage.ServerConnection::(connect) Request to connect NFS storage server
Thread-1653::DEBUG::2012-02-08 13:50:56,237::mount::111::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6 10.34.63.202:/home/iso/shared /rhev/data-center/mnt/10.34.63.202:_home_iso_shared' (cwd None)
Thread-1653::ERROR::2012-02-08 13:50:56,347::storage_connection::255::Storage.ServerConnection::(__connectNFSServer) Error during storage connection
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_connection.py", line 253, in __connectNFSServer
    mnt.mount(getNfsOptions(con), mount.VFS_NFS, timeout=CON_TIMEOUT)
  File "/usr/share/vdsm/storage/mount.py", line 107, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 122, in _runcmd
    raise MountError(rc, ";".join(out + err))
MountError: (32, 'm;o;u;n;t;.;n;f;s;:; ;m;o;u;n;t;i;n;g; ;1;0;.;3;4;.;6;3;.;2;0;2;:;/;h;o;m;e;/;i;s;o;/;s;h;a;r;e;d; ;f;a;i;l;e;d;,; ;r;e;a;s;o;n; ;g;i;v;e;n; ;b;y; ;s;e;r;v;e;r;:;\n; ; ;N;o; ;s;u;c;h; ;f;i;l;e; ;o;r; ;d;i;r;e;c;t;o;r;y;\n')
Thread-1653::DEBUG::2012-02-08 13:50:56,353::lvm::460::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-1653::DEBUG::2012-02-08 13:50:56,353::lvm::462::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-1653::DEBUG::2012-02-08 13:50:56,354::lvm::472::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-1653::DEBUG::2012-02-08 13:50:56,354::lvm::474::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-1653::DEBUG::2012-02-08 13:50:56,355::lvm::493::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-1653::DEBUG::2012-02-08 13:50:56,355::lvm::495::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
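
(Note: the semicolon-separated MountError message above is the mount output joined character by character by the ";".join(out + err) call shown in the traceback; decoded, it reads: mount.nfs: mounting 10.34.63.202:/home/iso/shared failed, reason given by server: No such file or directory.)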


Version-Release number of selected component (if applicable):
vdsm-4.9.3.2-0.fc16.x86_64
ovirt-engine-3.0.0_0001-1.4.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Block connection to ISO domain
2. Unblock connection
3. Activate ISO
  
Actual results:
ISO domain activation fails.

Expected results:
ISO domain activation succeeds.

Additional info:

The ISO domain is still mounted on the host:
10.34.63.202:/home/iso/shared on /rhev/data-center/mnt/10.34.63.202:_home_iso_shared type nfs4 (rw,relatime,vers=4,rsize=524288,wsize=524288,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=10.34.63.208,minorversion=0,local_lock=none,addr=10.34.63.202)


It is accessible at the filesystem level:
[root@srh-03 ~]# ll /rhev/data-center/mnt/10.34.63.202\:_home_iso_shared/
total 4
drwxr-xr-x. 4 vdsm kvm 4096 Jul 27  2011 f1621233-13f4-4c6b-99e3-86fbd4dbdad2


but lsof returns a warning:
[root@srh-03 ~]# lsof | grep shared
lsof: WARNING: can't stat() nfs4 file system /rhev/data-center/mnt/10.34.63.202:_home_iso_shared (deleted)
      Output information may be incomplete.
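
For reference, the stale mount can be confirmed and, if necessary, cleared manually on the host; this is a hypothetical manual workaround, not something vdsm does on its own:

# confirm the stale entry is still present in the mount table
grep _home_iso_shared /proc/mounts
# lazily detach the filesystem even though the mount point is reported busy
umount -l /rhev/data-center/mnt/10.34.63.202:_home_iso_shared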

Comment 1 Jakub Libosvar 2012-02-08 14:12:15 UTC
This is probably not a vdsm bug; umount hangs and NFS does not time out after the specified 60 seconds. Will change the component soon.
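
(For reference, with the mount options seen in the logs, soft,nosharecache,timeo=600,retrans=6: timeo=600 is 600 deciseconds, i.e. 60 s per attempt, and a soft mount only reports an error after up to retrans=6 retransmissions, so the overall failure can take on the order of timeo x (retrans + 1), i.e. several minutes, rather than the 60 seconds suggested by timeo alone.)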

Comment 2 Jakub Libosvar 2012-02-08 15:46:30 UTC
Please ignore the first piece of the logs in the description; it is not relevant to this issue. The 'Operation Stuck' error occurs during spmStart:
Thread-255::INFO::2012-02-08 13:28:58,385::logUtils::37::dispatcher::(wrapper) Run and protect: spmStart(spUUID='bb97353c-fb19-4f46-bda4-46dc37f5c5d6', prevID=-1, prevLVER='-1', recoveryMode=0, scsiFencing='false', maxHostID=250, domVersion='2', options=None)

Comment 3 Dan Kenigsberg 2012-04-18 20:24:18 UTC
That's tough, but umount cannot do its thing (including flushing filesystem data to the server) with no networking. NFS considers this a feature. Is there really a Vdsm bug here?

Comment 4 Jakub Libosvar 2012-05-17 07:46:26 UTC
I cannot reproduce this anymore; closing as NOTABUG.

