I'm running oVirt from http://people.centos.org/hughesjr/ovirt31/x86_64. Everything seems to run fine, except for the PosixFS part in vdsm (I'm using clvmd, cman and gfs2, not gluster):

- The code doesn't take device-mapper mounts into account. If you specify the mount as "/dev/mapper/....", it gets mounted OK, but vdsm then searches /proc/mounts, where the device of course appears as "/dev/dm-...". For now I changed /usr/share/vdsm/storage/mount.py at line 115 to use _iterateMtab, like this:

def _iterMountRecords():
    # was: for rec in _iterKnownMounts():
    for rec in _iterateMtab():

- After this, the filesystem gets mounted OK, but then another error occurs: fileSD.py looks up the real path (for the mount?) via a call to getRealPath(), which is currently implemented by the subclass localFsSD.py using os.readlink. Since the argument is a mount point and not a symlink, this fails. I changed localFsSD.py at line 102 like this:

    def getRealPath(self):
        return self.mountpoint
        # was: return os.readlink(self.mountpoint)

Using these 2 changes, my defined storage stays up and I can start using it. This might not be the correct fix, but it works for now ...
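For context, the readlink failure in the second change can be reproduced in isolation: os.readlink() only works on symlinks and raises OSError with EINVAL when given a plain directory such as a mount point. The helper below is only a minimal sketch of a more tolerant lookup (the name get_real_path and the standalone form are illustrative, not vdsm's), not the upstream fix referenced in the next comment.

import errno
import os

def get_real_path(mountpoint):
    # Hypothetical, more tolerant variant of localFsSD.getRealPath():
    # fall back to the mount point itself when it is not a symlink.
    try:
        return os.readlink(mountpoint)
    except OSError as e:
        # EINVAL: the path exists but is a regular directory (a mount
        # point), not a symlink, so there is nothing to dereference.
        if e.errno == errno.EINVAL:
            return mountpoint
        raise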
The second part was fixed upstream: http://gerrit.ovirt.org/#/c/10024/. The iteration of mounts still needs to be reviewed.
*** Bug 906963 has been marked as a duplicate of this bug. ***
I'm using oVirt 3.3.4 from the ovirt.org website on CentOS 6.5. I also use gfs2 over clvm with cman and I have the first problem as well.

I think that in the long term, and from a portability point of view, reading mtab is not a very good idea. It can be out of sync if something goes really wrong, and some distros even symlink it to /proc/mounts (aren't the systemd devs recommending this, btw?). I solved it with:

# diff -Naru vdsm/storage/mount.py.orig vdsm/storage/mount.py
--- vdsm/storage/mount.py.orig	2014-03-07 16:21:03.211859184 +0000
+++ vdsm/storage/mount.py	2014-03-07 16:21:20.718544406 +0000
@@ -262,7 +262,8 @@
     def getRecord(self):
         for record in _iterMountRecords():
-            if (record.fs_spec == self.fs_spec and
+            realSpec = os.path.realpath(self.fs_spec)
+            if (record.fs_spec == realSpec and
                 record.fs_file == self.fs_file):
                 return record

Kind regards
Enrico
Just updated to oVirt 3.4.0 and the issue is still there. Without the patch from comment #4 I'm not able to reinitialize the storage after the update. Applied the patch, restarted vdsmd and everything worked again.
(In reply to Enrico Tagliavini from comment #5)
> Just updated to oVirt 3.4.0 and the issue is still there. Without the patch
> from comment #4 I'm not able to reinitialize the storage after the update.
> Applied the patch, restarted vdsmd and everything worked again.

Thanks Enrico! Would you like to send a patch? See http://gerrit.ovirt.org and http://www.ovirt.org/Working_with_oVirt_Gerrit
Hi Nir,

sorry for the very long delay... I'm completely taken up by other stuff :(. I'll try to send a patch to your Gerrit as soon as I can sort my life out a bit. In the meantime I found a bug in the current patch, so I'm putting the new one here for reference.

I discovered the problem with my patch as soon as I tried to add an NFS export domain to the cluster. It fails because os.path.realpath does fancy stuff if the target doesn't exist. The following takes care of the problem:

[root@fionn-virt1 share]# diff -Naru vdsm/storage/mount.py.orig vdsm/storage/mount.py
--- vdsm/storage/mount.py.orig	2014-03-07 16:21:03.211859184 +0000
+++ vdsm/storage/mount.py	2014-04-17 11:14:09.255332056 +0100
@@ -262,7 +262,12 @@
     def getRecord(self):
         for record in _iterMountRecords():
-            if (record.fs_spec == self.fs_spec and
+            realSpec = self.fs_spec
+            # if this is a symlink we need the real path
+            # to check against /proc/mounts
+            if os.path.lexists(realSpec):
+                realSpec = os.path.realpath(self.fs_spec)
+            if (record.fs_spec == realSpec and
                 record.fs_file == self.fs_file):
                 return record
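For anyone wondering what the "fancy stuff" is: os.path.realpath() never fails on a missing path, it just normalizes whatever string it gets, so a (made-up) NFS spec like "server:/export" is turned into an absolute path under the current working directory and can never match a /proc/mounts entry. A quick illustration, assuming a host where no file named "server:/export" exists in the current directory:

import os

# An NFS spec is not a local path at all; realpath() still "resolves" it
# by prefixing the current working directory, e.g. /root/server:/export.
print(os.path.realpath("server:/export"))

# A device-mapper symlink that does exist resolves to its dm-N node
# (e.g. /dev/dm-13 on the reporter's host, see comment 11).
print(os.path.realpath("/dev/mapper/vg_is5000-lv_virtstorage"))

# os.path.lexists() is the guard used in the patch above: it is False
# for the NFS spec, so the spec is left untouched in that case.
print(os.path.lexists("server:/export"))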
Hi Nir,

I finally found the time to send a patch: http://gerrit.ovirt.org/#/c/27321/

Kind regards
Enrico
Enrico,

Is it possible to reproduce this with an NFS or PosixFS storage domain? Can you add a detailed description of how to reproduce the issue, either by configuring a storage domain or manually from the shell? We will need this to verify your patch.
Hi Nir,

in my case it is very easy to reproduce the issue. NFS is unrelated; I mentioned it because my first patch was breaking NFS support.

To reproduce, use CentOS 6.5 fully updated (so I guess it is the same on RHEL), create your PosixFS storage over LVM and try to add it to oVirt using the conventional path /dev/mapper/$VG-$LV, which is a symlink to /dev/dm-$N. Without the patch it should fail, since in /proc/mounts the device shows up as /dev/dm-$N, which doesn't match /dev/mapper/$VG-$LV.

Also be aware that I think my patch breaks newer systems where /etc/mtab is a symlink to /proc/mounts. On my Gentoo system running systemd, with said symlink, I have no reference to /dev/dm-$N in /proc/mounts. The solution is to normalize both the record spec path and the one read from /proc/mounts.
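A minimal sketch of that "normalize both sides" idea, with hypothetical helper names (this is not the patch under review in Gerrit): resolve both the configured spec and the spec reported by /proc/mounts before comparing, so it no longer matters whether the kernel recorded /dev/mapper/$VG-$LV or /dev/dm-$N, or whether /etc/mtab is just a symlink to /proc/mounts.

import os

def _normalize_spec(spec):
    # Only resolve specs that are real filesystem paths; NFS specs such
    # as "server:/export" do not exist locally and must stay untouched.
    if os.path.lexists(spec):
        return os.path.realpath(spec)
    return spec

def specs_match(configured_spec, proc_mounts_spec):
    # Compare the normalized forms so both /dev/mapper/$VG-$LV and
    # /dev/dm-$N end up as the same string.
    return _normalize_spec(configured_spec) == _normalize_spec(proc_mounts_spec)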
Hi,

some updates and discoveries worth reporting here as well. This was found while trying to build a unit test for this case; see also http://gerrit.ovirt.org/#/c/27321/

When mounting gfs2 on a single node and NOT from an LVM volume group it seems to just work as expected, i.e. the /dev/mapper/name symlink is used (I used kpartx to get a /dev/mapper/ symlink). So I can't reproduce it in a unit test that creates a disk image, but on the real system I get:

[root@fionn-virt1 ~]# mount | grep gfs2
/dev/mapper/vg_is5000-lv_virtstorage on /gfs2/virtstorage type gfs2 (rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier)
/dev/mapper/vg_is5000-lv_virtiso on /gfs2/virtiso type gfs2 (rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier)
/dev/mapper/vg_is5000-lv_virtstorage on /rhev/data-center/mnt/_dev_mapper_vg__is5000-lv__virtstorage type gfs2 (rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier)
/dev/mapper/vg_is5000-lv_virtiso on /rhev/data-center/mnt/_dev_mapper_vg__is5000-lv__virtiso type gfs2 (rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier)

[root@fionn-virt1 ~]# grep gfs2 /proc/mounts
/dev/dm-13 /gfs2/virtstorage gfs2 rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier 0 0
/dev/dm-14 /gfs2/virtiso gfs2 rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier 0 0
/dev/dm-13 /rhev/data-center/mnt/_dev_mapper_vg__is5000-lv__virtstorage gfs2 rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier 0 0
/dev/dm-14 /rhev/data-center/mnt/_dev_mapper_vg__is5000-lv__virtiso gfs2 rw,seclabel,noatime,nodiratime,hostdata=jid=0,nobarrier 0 0

The filesystems are mounted twice: once by the gfs2 service, and a second time by oVirt. I used the /dev/mapper/name form for both. I checked the source code of mount.gfs2 for something like readlink(), but found nothing, so I have no idea who is resolving that symlink. That said, the two forms are equivalent and I still think oVirt should check for that and handle it correctly.
Ehm, sorry, I take that back: I can reproduce this only with gfs2, but clvm is not needed. Sorry for the confusion.

Enrico
Nir,
Can you add steps to reproduce?
(In reply to Elad from comment #13)
> Nir,
> Can you add steps to reproduce?

See comment 10, I don't have any additional info.
(In reply to Nir Soffer from comment #14)
> (In reply to Elad from comment #13)
> > Nir,
> > Can you add steps to reproduce?
>
> See comment 10, I don't have any additional info.

Enrico, perhaps you can assist Elad with steps to reproduce so Elad can verify the fix?
(In reply to Allon Mureinik from comment #15)
> (In reply to Nir Soffer from comment #14)
> > (In reply to Elad from comment #13)
> > > Nir,
> > > Can you add steps to reproduce?
> >
> > See comment 10, I don't have any additional info.
>
> Enrico, perhaps you can assist Elad with steps to reproduce so Elad can
> verify the fix?

Sure, but there is nothing more to add to the previous comments unless you have some more specific question. I'll summarize again.

The issue happens when using PosixFS with a GFS2 file system over LVM as the backend. GFS2 is crucial to reproduce the bug; other file systems will not trigger it. LVM is needed to have a symlink to the block device. I advise against using loopback devices: oVirt has some special code doing additional checks for loopback devices, which is useful for testing, but real-world scenarios do not use loopback.

To reproduce I used CentOS 6.5 fully updated (so I guess it is the same on RHEL): create your PosixFS storage over LVM and try to add it to oVirt using the conventional path /dev/mapper/$VG-$LV, which is a symlink to /dev/dm-$N. With oVirt 3.3 and 3.4, which are missing the needed fix, it fails because in /proc/mounts the device is /dev/dm-$N, which doesn't match /dev/mapper/$VG-$LV. For some odd reason this happens only with GFS2, to my knowledge.

Thank you
With "For some odd reason this happens only with GFS2 to my knowledge." I mean the fact that the symlink is dereferenced before being written to /proc/mounts. When using ext4 file system the symlink path is writte to /proc/mounts as in /dev/mapper/vg_tp440s-root / ext4 rw,seclabel,relatime,data=ordered 0 0 # ll /dev/mapper/vg_tp440s-root lrwxrwxrwx. 1 root root 7 Aug 28 08:51 /dev/mapper/vg_tp440s-root -> ../dm-1
Hi Enrico,

As I understand it, this bug was fixed by your patch for your environment. Therefore, we would like to close it as UPSTREAM once you confirm that the patch you've provided actually fixes the issue.

Thanks
Hi Elad,

the bug was fixed by my patch. However, I would like to point out that I was not able to test the final version of the fix on the system that exposed it in the first place: since reporting the first version of the fix I have changed jobs and thus lost access to that system. I tried my best to reproduce the issue in a VM in my spare time, but that was not very realistic. That said, the code looks good to me and the issue can be closed.

Thank you for your work on this.

Kind regards
Enrico
closed upstream is the wrong resolution, as this _is_ upstream (ovirt).
(In reply to Sven Kieske from comment #20)
> closed upstream is the wrong resolution, as this _is_ upstream (ovirt).

Right, thanks for noticing! Changed to CURRENTRELEASE (as this patch is confirmed to be part of v4.16.0).