Created attachment 601271 [details]
vdsm.log

Creation of a PosixFS/GlusterFS domain fails with the following exception. Mounting the same path with "mount -t glusterfs" works fine on the same host.

Thread-61564::ERROR::2012-07-30 16:31:01,614::task::853::TaskManager.Task::(_setError) Task=`8f2ec307-39ea-46f4-9d30-df386002fbc8`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2154, in createStorageDomain
    typeSpecificArg, storageType, domVersion)
  File "/usr/share/vdsm/storage/nfsSD.py", line 75, in create
    cls._preCreateValidation(sdUUID, mntPoint, remotePath, version)
  File "/usr/share/vdsm/storage/nfsSD.py", line 45, in _preCreateValidation
    raise se.StorageDomainFSNotMounted(domPath)
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/10.35.97.44:_rep1',)
(In reply to comment #0)
> Created attachment 601271 [details]
> vdsm.log
>
> Creation of a PosixFS/GlusterFS domain fails with the following exception.
> Mounting the same path with "mount -t glusterfs" works fine on the same host.

Did you check the content? Because at the moment this looks like something completely related to gluster:

# mount -t glusterfs 10.35.97.44:/rep1 /mnt/fsimonce
# ls -l /mnt/fsimonce
ls: cannot access /mnt/fsimonce/f3c35b5c-7ed8-4b09-ada9-d1f8f7acf58f: No such file or directory
total 0
?????????? ? ? ? ? ? f3c35b5c-7ed8-4b09-ada9-d1f8f7acf58f
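The "??????????" rows above mean readdir returns the entry name but stat on it fails, which points at the gluster side rather than at vdsm. A minimal diagnostic sketch of the same check, assuming only the Python standard library (the helper name find_broken_entries is hypothetical):

import errno
import os

def find_broken_entries(path):
    """List directory entries that readdir returns but lstat cannot reach,
    i.e. the programmatic equivalent of the '??????????' rows from ls -l."""
    broken = []
    for name in os.listdir(path):
        try:
            os.lstat(os.path.join(path, name))
        except OSError as e:
            broken.append((name, errno.errorcode.get(e.errno, str(e.errno))))
    return broken

# Example on the mount above:
# find_broken_entries('/mnt/fsimonce')
# -> [('f3c35b5c-7ed8-4b09-ada9-d1f8f7acf58f', 'ENOENT')]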
Thread-61560::INFO::2012-07-30 16:29:33,488::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '10.35.97.44:/rep1', 'mnt_options': 'vers=3', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '5e27c083-ed1e-4e30-8133-925ded4ded2a'}], options=None)
Thread-61560::DEBUG::2012-07-30 16:29:33,491::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /bin/mount -t glusterfs -o vers=3 10.35.97.44:/rep1 /rhev/data-center/mnt/10.35.97.44:_rep1' (cwd None)
Thread-61564::INFO::2012-07-30 16:29:38,838::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=6, sdUUID='b88f223e-af31-4262-bb6e-fe8e5b61e12c', domainName='posix3', typeSpecificArg='10.35.97.44:/rep1', domClass=1, domVersion='0', options=None)
Thread-61601::INFO::2012-07-30 16:30:38,889::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '10.35.97.44:/rep1', 'mnt_options': 'vers=3', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '00000000-0000-0000-0000-000000000000'}], options=None)
Thread-61564::ERROR::2012-07-30 16:31:01,614::task::853::TaskManager.Task::(_setError) Task=`8f2ec307-39ea-46f4-9d30-df386002fbc8`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2154, in createStorageDomain
    typeSpecificArg, storageType, domVersion)
  File "/usr/share/vdsm/storage/nfsSD.py", line 75, in create
    cls._preCreateValidation(sdUUID, mntPoint, remotePath, version)
  File "/usr/share/vdsm/storage/nfsSD.py", line 45, in _preCreateValidation
    raise se.StorageDomainFSNotMounted(domPath)
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/10.35.97.44:_rep1',)
As reported in comment 5, the engine is sending a disconnectStorageServer one minute after the createStorageDomain call, while the create is still in progress. Moving to backend.
Daniel, please provide engine log.
Created attachment 601901 [details]
engine.log

Attaching the engine log.
Federico, the creation of all storage domain types shares the same flow: connect -> create -> disconnect, yet only Gluster and POSIX fail.

Also, I commented out the disconnect code in my engine; it still failed with the same error.

Moreover, I noticed an LVM message here (repro on my system):

Thread-36611::DEBUG::2012-08-02 21:42:43,876::task::568::TaskManager.Task::(_updateState) Task=`aa32b38e-c775-4d4d-b7f5-5f44ab53cbc5`::moving from state init -> state preparing
Thread-36611::INFO::2012-08-02 21:42:43,876::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=6, sdUUID='0c0038ac-5858-4b61-8f1c-84ea142d8950', domainName='ppp', typeSpecificArg='multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix2', domClass=1, domVersion='3', options=None)
Thread-36611::DEBUG::2012-08-02 21:42:43,877::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-36611::DEBUG::2012-08-02 21:42:43,878::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 0c0038ac-5858-4b61-8f1c-84ea142d8950' (cwd None)
Thread-36611::DEBUG::2012-08-02 21:42:43,965::__init__::1164::Storage.Misc.excCmd::(_log) FAILED: <err> = ' Volume group "0c0038ac-5858-4b61-8f1c-84ea142d8950" not found\n'; <rc> = 5
Thread-36611::WARNING::2012-08-02 21:42:43,966::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "0c0038ac-5858-4b61-8f1c-84ea142d8950" not found']
Thread-36611::DEBUG::2012-08-02 21:42:43,966::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-36611::INFO::2012-08-02 21:42:43,975::nfsSD::64::Storage.StorageDomain::(create) sdUUID=0c0038ac-5858-4b61-8f1c-84ea142d8950 domainName=ppp remotePath=multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix2 domClass=1
Thread-36611::ERROR::2012-08-02 21:42:43,980::task::833::TaskManager.Task::(_setError) Task=`aa32b38e-c775-4d4d-b7f5-5f44ab53cbc5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2152, in createStorageDomain
    typeSpecificArg, storageType, domVersion)
  File "/usr/share/vdsm/storage/nfsSD.py", line 75, in create
    cls._preCreateValidation(sdUUID, mntPoint, remotePath, version)
  File "/usr/share/vdsm/storage/nfsSD.py", line 44, in _preCreateValidation
    raise se.StorageDomainFSNotMounted(domPath)

I think revisiting VDSM may be in order. If it's really an engine issue, we can take it up together next week.
(In reply to comment #11)
> Federico, the creation of all storage domain types shares the same flow:
> connect -> create -> disconnect, yet only Gluster and POSIX fail.
>
> Also, I commented out the disconnect code in my engine; it still failed
> with the same error.

That is weird, could you attach the full logs?

> Moreover, I noticed an LVM message here (repro on my system):
>
> 21:42:43,878::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n
> /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"]
> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
> filter = [ \\"r%.*%\\" ] } global { locking_type=1
> prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50
> retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o
> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,
> vg_mda_size,vg_mda_free 0c0038ac-5858-4b61-8f1c-84ea142d8950' (cwd None)
> Thread-36611::DEBUG::2012-08-02
> 21:42:43,965::__init__::1164::Storage.Misc.excCmd::(_log) FAILED: <err> = '
> Volume group "0c0038ac-5858-4b61-8f1c-84ea142d8950" not found\n'; <rc> = 5
> Thread-36611::WARNING::2012-08-02
> 21:42:43,966::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
> Volume group "0c0038ac-5858-4b61-8f1c-84ea142d8950" not found']

That is correct; it is supposed to fail. We check that no other domain with the same UUID already exists on the storage.

> I think revisiting VDSM may be in order.
> If it's really an engine issue, we can take it up together next week.

Ok.
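To make the "supposed to fail" point concrete, here is a minimal sketch of that uniqueness pre-check, assuming only the standard library; vg_uuid_is_free is a hypothetical helper name, not the actual vdsm function:

import subprocess

def vg_uuid_is_free(sd_uuid):
    """Return True when no volume group named sd_uuid exists, i.e. the
    candidate storage domain UUID is not already in use on block storage."""
    proc = subprocess.run(
        ["sudo", "-n", "lvm", "vgs", "--noheadings", sd_uuid],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # 'Volume group "..." not found' makes lvm exit non-zero (rc=5 in the
    # log above); for domain creation that failure is the success path.
    return proc.returncode != 0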
The domain is not mounted because the generated mount command fails with an unknown filesystem type (see below); this is a vdsm bug.

In addition: the engine continuing the create storage domain flow after an error in connectStorageServer is wrong, and all the following operations will fail. This will not be resolved for now.

Thread-144::INFO::2012-08-05 10:17:36,779::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd', 'iqn': '', 'portal': '', 'user': '', 'vfs_type': 'posixfs', 'password': '******', 'id': '266711d4-14c2-404b-8499-5e1417492728'}], options=None)
Thread-144::DEBUG::2012-08-05 10:17:36,786::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /bin/mount -t posixfs multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd /rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_amureini_posix-sd' (cwd None)
Thread-144::ERROR::2012-08-05 10:17:36,808::hsm::1948::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1945, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 179, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 190, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 206, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ";mount: unknown filesystem type 'posixfs'\n")
The current code does not support the value 'posixfs' for the 'vfs_type' key.
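Whichever layer is at fault (see the follow-up below), the bad value could be rejected before /bin/mount is ever invoked. A minimal sketch, assuming a Linux host and a hand-maintained allow-list for FUSE-based types; is_known_vfs_type is a hypothetical helper, not vdsm API:

def is_known_vfs_type(vfs_type):
    """Reject vfs_type values such as 'posixfs' up front instead of letting
    /bin/mount fail later with rc=32 ('unknown filesystem type')."""
    # FUSE-backed types do not appear in /proc/filesystems; the allow-list
    # below is an assumption for illustration, not an exhaustive set.
    fuse_types = {"glusterfs"}
    with open("/proc/filesystems") as f:
        kernel_types = {line.split()[-1] for line in f if line.strip()}
    return vfs_type in kernel_types or vfs_type in fuse_types

# is_known_vfs_type('posixfs')   -> False (a category, not a filesystem type)
# is_known_vfs_type('glusterfs') -> True (via the allow-list)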
Created attachment 602368 [details]
vdsm.log referred to in comment #13

Adding the reproduction referred to in comment #13.
The failed flow errors:
* connectStorageServer fails.
* createStorageDomain fails.
* disconnectStorageServer fails.
* monitor stop fails due to a failed sanlock release.

Thread-144::INFO::2012-08-05 10:17:36,779::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd', 'iqn': '', 'portal': '', 'user': '', 'vfs_type': 'posixfs', 'password': '******', 'id': '266711d4-14c2-404b-8499-5e1417492728'}], options=None)
Thread-144::DEBUG::2012-08-05 10:17:36,786::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /bin/mount -t posixfs multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd /rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_amureini_posix-sd' (cwd None)
Thread-144::ERROR::2012-08-05 10:17:36,808::hsm::1948::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1945, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 179, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 190, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 206, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ";mount: unknown filesystem type 'posixfs'\n")
Thread-144::INFO::2012-08-05 10:17:36,811::logUtils::39::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 477, 'id': '266711d4-14c2-404b-8499-5e1417492728'}]}
Thread-145::INFO::2012-08-05 10:17:37,008::logUtils::37::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=6, sdUUID='a06f358b-1fb8-4a1d-971d-05e41ccf035e', domainName='posix-storage-domain', typeSpecificArg='multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd', domClass=1, domVersion='3', options=None)
Thread-145::INFO::2012-08-05 10:17:37,800::nfsSD::64::Storage.StorageDomain::(create) sdUUID=a06f358b-1fb8-4a1d-971d-05e41ccf035e domainName=posix-storage-domain remotePath=multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd domClass=1
Thread-145::ERROR::2012-08-05 10:17:37,807::task::833::TaskManager.Task::(_setError) Task=`024ea012-33dc-4e9d-b186-dfa0f4b76e68`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2152, in createStorageDomain
    typeSpecificArg, storageType, domVersion)
  File "/usr/share/vdsm/storage/nfsSD.py", line 75, in create
    cls._preCreateValidation(sdUUID, mntPoint, remotePath, version)
  File "/usr/share/vdsm/storage/nfsSD.py", line 44, in _preCreateValidation
    raise se.StorageDomainFSNotMounted(domPath)
StorageDomainFSNotMounted: Storage domain remote path not mounted: ('/rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_amureini_posix-sd',)
Thread-164::INFO::2012-08-05 10:17:37,981::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': 'multipass.eng.lab.tlv.redhat.com:/export/images/rnd/amureini/posix-sd', 'iqn': '', 'portal': '', 'user': '', 'vfs_type': 'posixfs', 'password': '******', 'id': '00000000-0000-0000-0000-000000000000'}], options=None)
Thread-164::DEBUG::2012-08-05 10:17:37,981::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /bin/umount -f -l /rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_amureini_posix-sd' (cwd None)
Thread-164::ERROR::2012-08-05 10:17:37,998::hsm::2022::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2019, in disconnectStorageServer
    conObj.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 185, in disconnect
    self._mount.umount(True, True)
  File "/usr/share/vdsm/storage/mount.py", line 218, in umount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 206, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (1, ';umount: /rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_amureini_posix-sd: not mounted\n')
Thread-164::INFO::2012-08-05 10:17:38,738::logUtils::39::dispatcher::(wrapper) Run and protect: disconnectStorageServer, Return response: {'statuslist': [{'status': 477, 'id': '00000000-0000-0000-0000-000000000000'}]}
Thread-35::DEBUG::2012-08-05 10:17:57,221::domainMonitor::191::Storage.DomainMonitor::(_monitorDomain) Unable to release the host id 1 for the domain 1e49d867-4438-4484-a1ca-f6ab7013a319
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 188, in _monitorDomain
    domain.releaseHostId(hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 426, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/safelease.py", line 190, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: ('1e49d867-4438-4484-a1ca-f6ab7013a319', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))
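For completeness, the check that raises StorageDomainFSNotMounted boils down to "is this path a mount point right now?". A minimal sketch of that test against /proc/mounts, assuming Linux; is_mounted is a hypothetical helper, not the vdsm mount module API:

def is_mounted(mount_point):
    """Return True if mount_point appears as a mount target in /proc/mounts."""
    with open("/proc/mounts") as f:
        for line in f:
            fields = line.split()
            # Field 2 is the mount point; spaces in paths are escaped as \040.
            if len(fields) >= 2 and fields[1].replace("\\040", " ") == mount_point:
                return True
    return False

# Because the 'mount -t posixfs' above never succeeded, is_mounted(...) is
# False here, so both _preCreateValidation and the later umount fail as logged.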
(In reply to comment #13)
> The domain is not mounted because the generated mount command fails with an
> unknown filesystem type (see below); this is a vdsm bug.

That is because you used posixfs (nonexistent) as the vfs_type. It's not a vdsm bug.

(In reply to comment #14)
> The current code does not support the value 'posixfs' for the 'vfs_type' key.

posixfs doesn't exist as a filesystem type (it's a category of filesystems). If you look at the original vdsm log you see that the vfs_type should be glusterfs:

Thread-61005::INFO::2012-07-30 16:11:15,202::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=6, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '10.35.97.44:/rep1', 'mnt_options': 'vers=3', 'portal': '', 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': '5e27c083-ed1e-4e30-8133-925ded4ded2a'}], options=None)

Everything else referring to comment #13 and its log looks irrelevant to glusterfs and this bug.

(In reply to comment #16)
> The failed flow errors:
> * connectStorageServer fails.
> * createStorageDomain fails.
> * disconnectStorageServer fails.
> * monitor stop fails due to a failed sanlock release.

The monitor stop doesn't fail; it just logs the exception.
(In reply to comment #5)
> Thread-61564::INFO::2012-07-30
> 16:29:38,838::logUtils::37::dispatcher::(wrapper) Run and protect:
> createStorageDomain(storageType=6,
> sdUUID='b88f223e-af31-4262-bb6e-fe8e5b61e12c', domainName='posix3',
> typeSpecificArg='10.35.97.44:/rep1', domClass=1, domVersion='0',
> options=None)
>
> Thread-61601::INFO::2012-07-30
> 16:30:38,889::logUtils::37::dispatcher::(wrapper) Run and protect:
> disconnectStorageServer(domType=6,
> spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '',
> 'connection': '10.35.97.44:/rep1', 'mnt_options': 'vers=3', 'portal': '',
> 'user': '', 'iqn': '', 'vfs_type': 'glusterfs', 'password': '******', 'id':
> '00000000-0000-0000-0000-000000000000'}], options=None)

I think there might be something stuck on the VDSM side: one minute passed after createStorageDomain was issued (before the disconnectStorageServer). I suppose this is something related to Daniel's client/server interaction, because createStorageDomain worked fine for both me and Allon.
Since this seems to be an environment issue and not a generic VDSM/engine issue, can the beta1 blocker be cleared?
(In reply to comment #19)
> Since this seems to be an environment issue and not a generic VDSM/engine
> issue, can the beta1 blocker be cleared?

Still setting pm_ack; we need to investigate whether a best practice or guidelines can be produced out of this. Agree with the removal of the beta blocker at this stage.
Works fine as of si13.1 with VDSM build 4.9.6-26.