Bug 474833
| Summary: | LVM on iSCSI do not work upon restart | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Miroslav Suchý <msuchy> |
| Component: | iscsi-initiator-utils | Assignee: | Chris Leech <cleech> |
| Status: | CLOSED WONTFIX | QA Contact: | Martin Jenner <mjenner> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.2 | CC: | arnaud.gomes, coughlan, cphillip, john, mchristi, michael_varun, oliver.hookins, prajnoha, rackeby, ricardo.arguello, rwahyudi, vvasilev |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-10-15 19:23:54 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Miroslav Suchý
2008-12-05 15:26:50 UTC
If I am understanding you right, there should be some code in the netfs script which should activate LVM volumes after the network is up and iscsi has run. In /etc/init.d/netfs, the start function contains:

```
if [ -x /sbin/lvm.static ]; then
    if /sbin/lvm.static vgscan > /dev/null 2>&1 ; then
        action $"Setting up Logical Volume Management:" /sbin/lvm.static vgchange -a y
    fi
fi
```

which I thought was supposed to do what you are asking for.

The solution is to add the "_netdev" option to your fstab entry for the iSCSI mount.

ad comment #1: yes, the code is there and it *should* work. But it does not. If you want, I can provide you access to a reproducer machine; please contact me on the #satellite-devel IRC channel.

ad comment #2: it is not a mount point. It is a logical volume, and Xen guests access it, so fstab is out of the game.

(In reply to comment #3)
> ad comment #1: yes the code is there and it *should* work. But it is not. If
> you want I can provide you access to reproducer machine. Please contact me on
> #satellite-devel irc.

Can you send an email to mchristi with the box info? I will take a look at it so we can get this fixed in 5.4.

I have seen the same problem on EL6.

(In reply to comment #7)
> I have seen the same problem on el6.

Is it just iscsi on LVM, or iscsi on multipath on LVM? And are you putting a filesystem on top of LVM, or using the LVM device directly? If using a filesystem on top of LVM, please send your fstab.

It is just iscsi on LVM. No multipathing, and there is no filesystem on top. The aim is to encrypt the iscsi device, but I was having problems with the device appearing after reboot, which is why I ended up looking at this bug.
/etc/fstab:

```
#
# /etc/fstab
# Created by anaconda on Wed May 4 11:18:33 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=8d9e032a-1d72-49f1-ae93-b9e4fee21c4d /         ext4    defaults        1 1
UUID=9ef123e4-4201-42f1-a4f6-86bf4d3997d7 /boot     ext2    defaults        1 2
UUID=4cc61a80-0bd6-4d40-99ec-f8cd52e616a1 swap      swap    defaults        0 0
tmpfs                                     /dev/shm  tmpfs   defaults        0 0
devpts                                    /dev/pts  devpts  gid=5,mode=620  0 0
sysfs                                     /sys      sysfs   defaults        0 0
proc                                      /proc     proc    defaults        0 0
/dev/sdb1                                 /mnt      ext4    _netdev         0 0
```

I see the same issue when using an LVM-over-iSCSI volume (no multipath involved) as a virtual drive for a Xen guest (as in disk = [ "phy:/dev/vg_ulysses-prod/ulysses,xvda,w" ]). The "vgchange" block in /etc/init.d/netfs does not run unless there is at least one filesystem with the _netdev option in fstab, so the Xen guest will not start automatically at boot time.

ccing LVM developer.

Peter, how do you want to handle this? In RHEL 6, is there some magic LVM script that prevents this (some sort of async device handling), and can it be backported? We have this problem with fcoe and iscsi, and any time a driver is loaded after LVM.

(In reply to comment #11)
> ccing LVM developer.
>
> Peter, how do you want to handle this? In rhel 6 is there some magic lvm script
> that prevents this (some sort of async device handling) and can be backported?

A workaround I know of is one that uses custom udev rules to call the LVM activation code ("vgchange") when a new device appears (detected as a PV via ENV{ID_FS_TYPE}=="LVM2_member"), as documented in the technical notes of bug #621375. But this workaround poses quite a significant performance issue with a high volume/device count, since every event triggers LVM scanning. Of course, the underlying problem here is that there is no direct vgchange call after special device activation (like iscsi or fcoe).
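For illustration, a udev-rule workaround of the kind described above could look roughly like this; the rule file name and exact rule text are assumptions made here, not the documented version (see the technical notes of bug #621375 for that):

```
# /etc/udev/rules.d/90-lvm-activate.rules  -- illustrative sketch only
# Whenever a block device carrying an LVM PV signature appears, try to
# activate any volume groups that are now complete. Note this is heavy
# on systems with many devices, since each event triggers an LVM scan.
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="LVM2_member", \
    RUN+="/sbin/lvm vgchange -a y"
```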
For this to work, we would need to add an additional initscript that would be called at the right time. But we already have vgchange calls at the points marked with a star (in order of execution):

```
* 1. dracut
* 2. rc.sysinit script
  3. multipathd init script
  4. iscsid init script
  5. iscsi init script
* 6. clvmd init script
* 7. netfs init script
```

But clvmd is specific to cluster environments, and the netfs script activates only volumes with a record in fstab (with the "_netdev" option used; and even then the volume is activated only if there is a filesystem on it, as already noted in the comments above).

To solve this situation in an easy and straightforward way, we would need to add *another* script with an LVM activation call after all devices (including special devices) are set up. But considering that there is a plan (at least if everything goes well) for a new metadata daemon to be included in RHEL 6.3 that will collect and cache metadata based on incoming udev events (as devices appear), together with the possibility of automatic LVM activation (which will happen once the volume group is complete and has all PVs in place), I lean towards waiting for this as the official solution to this problem.

Unfortunately, we do not support udev in RHEL 5 LVM2/device-mapper, and since this metadata daemon would be based on udev events, I don't think we'll backport it to RHEL 5. So the only viable solution I can see for RHEL 5 is adding another init script for late VG/LV activation.

We are experiencing a similar issue, but the trigger is a network connectivity problem. Every time we experience a network connectivity issue, all LVM disks become read-only.
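A minimal sketch of the extra "late activation" init script proposed in the thread, assuming RHEL 5 SysV conventions. The script name, chkconfig priorities, and the LVM_CMD indirection (which exists only so the sketch can be dry-run without root) are assumptions made here, not part of any shipped fix:

```shell
#!/bin/sh
# netfs-lvm-late (hypothetical name): activate LVM volume groups that
# live on network storage, after the iscsi/fcoe init scripts have run.
#
# chkconfig: 345 14 88
# description: late LVM volume group activation for network storage

# LVM_CMD can be overridden for dry runs; a real host would use lvm.static.
LVM_CMD=${LVM_CMD:-/sbin/lvm.static}

start() {
    # Rescan so PVs that appeared after rc.sysinit (iSCSI/FCoE LUNs)
    # are known, then activate every VG that is now complete.
    "$LVM_CMD" vgscan > /dev/null 2>&1
    "$LVM_CMD" vgchange -a y
}

case "${1:-}" in
    start) start ;;
    stop)  : ;;   # nothing to do on stop; halt scripts deactivate VGs
esac
```

Ordering it after the iscsi script (here via the assumed start priority 14 versus iscsi's later slot in the boot sequence would need checking on a real system) is the whole point: the vgchange runs only once the network block devices exist.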
These are the error logs from when we lost our network:

```
Mar 24 07:39:11 cookie kernel: bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
Mar 24 07:39:20 cookie kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 5305590931, last ping 5305595931, now 5305600931
Mar 24 07:39:20 cookie kernel: connection1:0: detected conn error (1011)
Mar 24 07:39:21 cookie iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Mar 24 07:41:20 cookie kernel: session1: session recovery timed out after 120 secs
Mar 24 07:41:20 cookie kernel: sd 4:0:0:0: SCSI error: return code = 0x000f0000
...
Mar 24 07:41:20 cookie kernel: end_request: I/O error, dev sdb, sector 419430785
Mar 24 07:41:20 cookie kernel: Buffer I/O error on device dm-1, logical block 0
Mar 24 07:41:20 cookie kernel: lost page write due to I/O error on dm-1
Mar 24 07:41:20 cookie kernel: ext3_abort called.
Mar 24 07:41:20 cookie kernel: EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
Mar 24 07:41:20 cookie kernel: Remounting filesystem read-only
...
```
Once the network is up and running again, iSCSI re-establishes the connection just fine, but the disks are still mounted read-only and LVM misbehaves:

```
Mar 24 08:03:05 cookie kernel: bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Mar 24 08:04:23 cookie iscsid: connection1:0 is operational after recovery (270 attempts)
```

Running lvdisplay after this produced:

```
/dev/vgd/vgd-tmp: read failed after 0 of 4096 at 644245028864: Input/output error
/dev/vgd/vgd-tmp: read failed after 0 of 4096 at 644245086208: Input/output error
/dev/vgd/vgd-tmp: read failed after 0 of 4096 at 0: Input/output error
/dev/vgd/vgd-tmp: read failed after 0 of 4096 at 4096: Input/output error
```

To fix this issue we have to refresh the volume group and remount the disks:

```
vgchange --refresh vgd
mount -o remount /var/lib/mysql
```

Other information:

fstab entry:

```
/dev/vgd/vgd-tmp /tmp ext3 acl,_netdev 0 0
```

Running "service netfs status" displays the mount point:

```
service netfs status
Configured network block devices:
/dev/vgd/vgd-tmp
Active network block devices:
/tmp
```

If RHEL 5 will not get udevd, what is the long-term solution for the problem I just described above?

(In reply to comment #13)
> We are experiencing similar issue, but the trigger is network connectivity
> issue.
> Every time we experienced network connectivity issue, all LVM disk becomes read
> only mode.
>
> This is the error logs when we lost our network :
> -------------------------------------------------------
> Mar 24 07:39:11 cookie kernel: bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
> Mar 24 07:39:20 cookie kernel: connection1:0: ping timeout of 5 secs expired,
> recv timeout 5, last rx 5305590931, last ping 5305595931, now 5305600931
> Mar 24 07:39:20 cookie kernel: connection1:0: detected conn error (1011)
> Mar 24 07:39:21 cookie iscsid: Kernel reported iSCSI connection 1:0 error
> (1011) state (3)
> Mar 24 07:41:20 cookie kernel: session1: session recovery timed out after 120
> secs

This is a completely different issue; it is working as expected. If a connection problem lasts longer than node.session.timeo.replacement_timeout seconds, the iscsi layer will fail I/O to the upper layers. Upper layers like filesystems will handle this in whatever way they feel is best. When they put the device in read-only mode, you then have to remount once the problem is resolved; if LVM is used, you also have to refresh it. Please do not respond about this issue in this bugzilla. Make a new one for iscsi-initiator-utils if you want and we can discuss it more there. Or just email me and we can talk more.

(In reply to comment #12)
> special devices) are set up. But considering the fact there's a plan (at
> least if everything goes well) for a new metadata daemon to be included in
> RHEL 6.3 that will collect and cache metadata based on incoming udev events
> (as devices appear), together with the possibility of automatic LVM
> activation (that will happen when the volume group is complete and has all
> PVs in place), I'll lean to wait for this as an official solution to this
> problem.

Just a note that current RHEL 6.4 does include support for lvmetad (changed from technical preview in 6.3) and also support for LVM volume group/logical volume autoactivation based on incoming udev events (see also related bug #817866 and bug #).
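The refresh-and-remount recovery described in the comments above can be sketched as a small helper. The VG name and mount point are the reporter's; the LVM_CMD/MOUNT_CMD indirection is an assumption added here purely so the sketch can be dry-run without root:

```shell
#!/bin/sh
# After an iSCSI session recovers, refresh the VG so LVM rereads the
# now-reachable devices, then remount the affected filesystem.
VG=${VG:-vgd}
MNT=${MNT:-/var/lib/mysql}
LVM_CMD=${LVM_CMD:-/sbin/lvm}     # override with 'true' for a dry run
MOUNT_CMD=${MOUNT_CMD:-mount}

recover() {
    # Both steps are needed: --refresh fixes the device-mapper tables,
    # remount clears the filesystem's read-only state.
    "$LVM_CMD" vgchange --refresh "$VG" &&
    "$MOUNT_CMD" -o remount "$MNT"
}
```

This only automates the manual fix; it does not address the underlying replacement_timeout behaviour, which (as noted above) is working as designed.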
However, LVM does not support udev in RHEL 5, and so it does not support lvmetad and autoactivation either. This solution applies only to 6.4 and upwards.

(In reply to comment #12)

Just posting this as a suggestion for RHEL 5:

```
/tmp /tmp none _netdev,bind 0 0
```

This is an attempt at a do-nothing fstab entry with the side effect of triggering the "vgchange -ay" in netfs.

Same problem on a few servers that I manage: LVM on iSCSI disks, but with multipath. The workaround in comment 16 works for me.

Yes, the workaround in comment 16 has been working for me too.

No additional minor releases are planned for Production Phase 2 of Red Hat Enterprise Linux 5, and therefore Red Hat is closing this bugzilla, as it does not meet the inclusion criteria stated in: https://access.redhat.com/site/support/policy/updates/errata/#Production_2_Phase

I also faced the same issue on redhat-release-server-6Server-6.5.0.1.el6.x86_64: LVM on iscsi, mounted with the _netdev option in /etc/fstab, and the iscsid start script is called last (S99iscsid). The filesystem does not get mounted; when the netfs service is called at the end of boot, it cannot see the physical disks and does not activate the VG, so the server goes into maintenance mode. I need to do an iscsi target login every time to make the disks visible, and then manually activate the volume group.

You need to enable lvmetad in lvm.conf and make sure that lvm2-lvmetad is running.
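For reference, a sketch of the lvmetad enablement mentioned above (RHEL 6.4 and later); check your release's documentation for the exact steps:

```
# /etc/lvm/lvm.conf
global {
    use_lvmetad = 1
}
```

followed by starting the daemon and enabling it at boot, e.g. `service lvm2-lvmetad start` and `chkconfig lvm2-lvmetad on`.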
Thanks a lot for the response. Could you please explain in a bit more detail what lvmetad has to do with iscsi logins? The LVM is not getting activated because the physical devices are not being detected; that is what I infer. It would be great if you could let us know in detail how lvmetad would be able to perform an iscsi login. I appreciate your support on this.

The lvmetad daemon is the "LVM metadata daemon" that acts as an in-memory cache of LVM metadata gathered from devices as they appear in the system: whenever a new block device appears and it has a PV label on it, it is automatically scanned via a udev rule. This updates the lvmetad daemon with the LVM metadata found. Once the VG is complete (meaning all the PVs making up the VG are present in the system), the VG is activated. The lvmetad daemon is required for this LVM event-based autoactivation to work.

Also, for LVM autoactivation to work, your underlying devices (in this case the iscsi devices) need to be present in the system, of course. If they are not, that is a different issue you need to resolve first. If that is the case, Chris is the right person to help here (the one who has this bug assigned).

(This is a bug for RHEL 5; if you experience any problems with RHEL 6, you should open a bug against the RHEL 6 iscsi-initiator-utils component instead.)