Bug 743740 - fstab iscsi mount point no longer mounts in F15 after systemd update.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: iscsi-initiator-utils
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Mike Christie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-10-05 21:13 UTC by Ron Gonzalez
Modified: 2013-08-01 03:50 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-08-01 03:50:53 UTC
Type: ---
Embargoed:


Attachments
don't return succes if network is not up (859 bytes, application/octet-stream)
2011-10-14 07:13 UTC, Mike Christie
New File: /etc/sysconfig/open-iscsi In which to list Volume Groups to Mount Automatically for iscsi initiator on boot. (18 bytes, application/octet-stream)
2011-10-17 22:22 UTC, Ron Gonzalez
no flags Details
/etc/init.d/iscsi patched with LVM Groups automount. (4.47 KB, application/octet-stream)
2011-10-17 22:23 UTC, Ron Gonzalez
/etc/init.d/iscsi auto mount LV's on Iscsi target PV's (4.57 KB, application/octet-stream)
2011-10-17 23:39 UTC, Ron Gonzalez

Description Ron Gonzalez 2011-10-05 21:13:54 UTC
Description of problem:

Volume Group on ISCSI device no longer mounts on system boot after F15 systemd.

Version-Release number of selected component (if applicable):


How reproducible:

F14 system would automount devices on boot.

I have the following configuration:

chkconfig --list 
network         0:off   1:off   2:off   3:off   4:off   5:off   6:off
iscsid          0:off   1:off   2:on    3:on    4:on    5:on    6:off
iscsi           0:off   1:off   2:on    3:on    4:on    5:on    6:off

Installed Packages 
iscsi-initiator-utils.x86_64 : 6.2.0.872-12.fc15  

#pvdisplay
 --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               Media
  PV Size               2.73 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              715399
  Free PE               0
  Allocated PE          715399
  PV UUID               0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI

#lvdisplay
  --- Logical volume ---
  LV Name                /dev/Media/Media
  VG Name                Media
  LV UUID                CXHKOD-BXoY-SKlG-98Cm-Gp2Z-7x3b-GqBjuX
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.73 TiB
  Current LE             715399
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:5

#vgdisplay

  --- Volume group ---
  VG Name               Media
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                256
  Cur LV                1
  Open LV               1
  Max PV                256
  Cur PV                1
  Act PV                1
  VG Size               2.73 TiB
  PE Size               4.00 MiB
  Total PE              715399
  Alloc PE / Size       715399 / 2.73 TiB
  Free  PE / Size       0 / 0   
  VG UUID               yBRBne-3w0R-kU5M-CrSz-1F4u-Ilve-KkFrKJ


#blkid

/dev/mapper/Media-Media: UUID="cec10bea-9cca-46ac-9a1f-7126c0987dfb" TYPE="ext4" 
/dev/sdc: UUID="0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI" TYPE="LVM2_member" 


#device entry in #/etc/fstab

UUID=0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI     /home/iscsi_media       ext4    _netdev         0       0
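
Note that the blkid output above lists this UUID with TYPE="LVM2_member", i.e. it belongs to the physical volume /dev/sdc and not to the ext4 filesystem on the logical volume, which may be part of why this entry does not mount. A minimal sketch of a check for that, assuming blkid prints lines in the format shown above (uuid_type is a hypothetical helper name, not a blkid or LVM command):

```shell
# Print the TYPE that blkid-style output (read from stdin) reports for a UUID.
# uuid_type is a hypothetical helper for illustration only.
uuid_type() {
    sed -n "s/.*UUID=\"$1\".* TYPE=\"\([^\"]*\)\".*/\1/p"
}
```

Used as `blkid | uuid_type 0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI`, the sample output above would yield LVM2_member, while the UUID of /dev/mapper/Media-Media would yield ext4.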


Steps to Reproduce:
1. Create a Volume Group that uses an iscsi target as a physical volume.
2. the physical volume maps to /dev/sdc
3. the logical volume is mapped to /dev/Media/Media
4. set nodes to autostart.
  
Actual results:
Volumes are not mounted on system boot with F15 SystemD

Expected results:
Volumes to be mounted.

Additional info:

The way I correct this and mount my volumes is to run the following sequence of commands:

# iscsiadm -m node -l
# vgscan
  Found volume group "raid5" using metadata type lvm2
# lvscan
  inactive          '/dev/raid5/test' [10.00 GB] inherit
# vgchange -ay
  1 logical volume(s) in volume group "raid5" now active
# lvscan
  ACTIVE            '/dev/raid5/test' [10.00 GB] inherit
# mount -t auto /dev/Media/Media /home/iscsi_media/
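
The manual sequence above can be wrapped in one function. This is only a sketch: the function names, the DRY_RUN switch, and the choice to activate a single named volume group are assumptions added for illustration; the commands themselves are the ones listed above.

```shell
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch
# added for this sketch so it can be tried without touching storage).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Log in to all configured targets, activate one volume group, mount one LV.
# Stops at the first step that fails.
mount_iscsi_lv() {
    vg=$1 lv=$2 mountpoint=$3
    run iscsiadm -m node -l &&
    run vgscan &&
    run vgchange -ay "$vg" &&
    run mount -t auto "/dev/$vg/$lv" "$mountpoint"
}
```

For the setup described here the call would be `mount_iscsi_lv Media Media /home/iscsi_media`, run as root.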

Comment 1 Ron Gonzalez 2011-10-05 23:55:29 UTC
I am able to trace this problem back to the fact that iscsid is running, but the iscsi service is not starting.

When I run the commands (fresh from booting)

iscsiadm -m session
there are no sessions running.

#service iscsi start 

iscsi starts:

Now I can see that sessions are established:
iscsiadm -m session -P 1          
Target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0
        Current Portal: 192.168.1.235:3260,1
        Persistent Portal: 192.168.1.235:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.domain:01.d01cc8
                Iface IPaddress: 192.168.1.215
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
Target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db
        Current Portal: 192.168.1.235:3260,1
        Persistent Portal: 192.168.1.235:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.domain:01.d01cc8
                Iface IPaddress: 192.168.1.215
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE



Also the command:

#service iscsi status returns no output.

here is what systemd systemctl reports for the mount point:

root@talon:/etc/init.d/ > systemctl status home-iscsi_media.mount
home-iscsi_media.mount - /home/iscsi_media
          Loaded: loaded
          Active: inactive (dead)
           Where: /home/iscsi_media
            What: /dev/disk/by-uuid/0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI
          CGroup: name=systemd:/system/home-iscsi_media.mount

Comment 2 Ron Gonzalez 2011-10-06 00:22:44 UTC
I've gotten somewhat further by verifying that iscsi is running on system start.

Now when I run iscsiadm -m session -P 1 I do see sessions on bootup.

However,

when I run lvscan, the Logical Volume is still unavailable.
I have to run vgchange -ay to get it to a mountable state.
How and why was this done automatically in F14 and F15 pre-systemd?


[root@talon lcstyle]# lvscan
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdc not /dev/sdb
  inactive          '/dev/Media/Media' [2.73 TiB] inherit
  ACTIVE            '/dev/vg_talon/usr_LogVol02' [39.66 GiB] inherit
  ACTIVE            '/dev/vg_talon/secure_LogVol03' [19.53 GiB] inherit
  ACTIVE            '/dev/vg_talon/home_LogVol01' [175.78 GiB] inherit
  ACTIVE            '/dev/vg_talon/root_LogVol00' [40.00 GiB] inherit


Here is what happens when you try to systemctl start the job without this being available (I timed it)


#time systemctl start home-iscsi_media.mount
A dependency job failed. See system logs for details.

real    1m17.450s
user    0m0.002s
sys     0m0.006s

As you can see it takes over a minute to return.

Now I run

#vgchange -ay
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdc not /dev/sdb
  1 logical volume(s) in volume group "Media" now active
  /dev/mapper/Media-Media not set up by udev: Falling back to direct node creation.
  The link /dev/Media/Media should had been created by udev but it was not found. Falling back to direct link creation.
  4 logical volume(s) in volume group "vg_talon" now active

Now I run:

#time systemctl start home-iscsi_media.mount

real    0m0.642s
user    0m0.002s
sys     0m0.024s

and it returns immediately and my mount point is all setup.
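
To see at a glance which logical volumes still need vgchange -ay, the lvscan output can be filtered. A small sketch, assuming the lvscan line format shown above (inactive_lvs is a hypothetical helper name):

```shell
# Read lvscan output on stdin and print the paths of logical volumes whose
# first column is "inactive". Other lines (warnings, ACTIVE volumes) are ignored.
inactive_lvs() {
    awk '$1 == "inactive" { print $2 }' | tr -d "'"
}
```

Used as `lvscan 2>/dev/null | inactive_lvs`; with the output pasted above it would list /dev/Media/Media only.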

Comment 3 Ron Gonzalez 2011-10-06 00:46:41 UTC
A similar problem is explored here:

http://www.linuxquestions.org/questions/linux-server-73/logical-volumes-not-available-on-reboot-724093/

However the solution there is to add vgchange -ay to /etc/inittab.


I believe this problem is related to udev running 64-lvm.rules before this device is online!

look in /boot/initramfs-2.6.40.4-5.fc15.x86_64.img

the image contains /etc/udev/rules.d/64-lvm.rules

which contains:
# hacky rules to try to activate lvm when we get new block devs...
#
# Copyright 2008, Red Hat, Inc.
# Jeremy Katz <katzj>
SUBSYSTEM!="block", GOTO="lvm_end"
ACTION!="add|change", GOTO="lvm_end"
KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="lvm_end"

RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"

LABEL="lvm_end"


Apparently boot order here is important: iscsi must be on and running so that when lvm_scan runs it finds the Logical Volume.

Comment 4 Ron Gonzalez 2011-10-06 01:35:43 UTC
FYI for iscsi to run on startup, I had to enable network service in chkconfig.



chkconfig --list network

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

network         0:off   1:off   2:off   3:on    4:off   5:on    6:off


It runs, but I have to run vgscan, then run vgchange -ay, then run mount.

If you have comment=systemd.automount in fstab for this filesystem, trying to mount it manually using 

mount -t auto /dev/Media/Media /home/iscsi_media will freeze your command prompt.

right now my fstab entry looks like this:

/dev/Media/Media        /home/iscsi_media       ext4    _netdev 0 0

Comment 5 Ron Gonzalez 2011-10-06 02:12:11 UTC
I fixed it,

my final settings are:

[root@talon etc]# chkconfig --list network

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

network         0:off   1:off   2:off   3:on    4:off   5:on    6:off
[root@talon etc]# 


/etc/fstab:
/dev/Media/Media        /home/iscsi_media       ext4    _netdev 0 0

I don't know what the problem was, these were my original settings.

I made the following changes to rh_status in /etc/init.d/iscsi

rh_status() {
    echo "Running Status"
    if [ -f $lockfile ]; then echo "Not Running!" || return 3
    fi
    declare -a iparams=( $(iscsiadm -m session 2>/dev/null | egrep "tcp|iser|bnx2i|be2iscsi|cxgb3i") )
    echo $iparams
    if [[ -z "${iparams[*]}" ]]; then
        echo "no sessions"
        return 2
    fi
    echo Running

    return 0
}
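
As posted, the lock-file test in this function prints "Not Running!" when the lock file exists and never reaches return 3, because echo succeeds. A sketch of what the debugging version was presumably meant to do is below. The function name rh_status_sketch and the default lock-file path are assumptions; the return codes and the session filter follow the function above.

```shell
# Assumed default; the init script defines its own $lockfile.
lockfile=${lockfile:-/var/lock/subsys/iscsi}

rh_status_sketch() {
    # 3: no lock file, so the service is considered stopped
    if [ ! -f "$lockfile" ]; then
        echo "Not running"
        return 3
    fi
    # 2: lock file present but no sessions on any known transport
    sessions=$(iscsiadm -m session 2>/dev/null |
        grep -E "tcp|iser|bnx2i|be2iscsi|cxgb3i" || true)
    if [ -z "$sessions" ]; then
        echo "No sessions"
        return 2
    fi
    # 0: at least one session is established
    echo "Running"
    return 0
}
```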

Comment 6 Ron Gonzalez 2011-10-06 02:15:28 UTC
systemctl on boot now reports:

UNIT                      LOAD   ACTIVE SUB       JOB DESCRIPTION
home-iscsi_media.mount    loaded active mounted   /home/iscsi_media


I would like to continue to investigate this issue. Even though it is working, I believe it will still be broken if I turn the network service back to off for runlevels 3 and 5.

Comment 7 Ron Gonzalez 2011-10-06 02:59:11 UTC
I see the patch in fix https://bugzilla.redhat.com/show_bug.cgi?id=692230

it should be
-    [ ! -f /var/lock/subsys/network -a ! -f /var/lock/subsys/NetworkManager ] && exit 0
+    status network || status NetworkManager || exit 0

Still Later another patch:

-    [ ! -f /var/lock/subsys/network -a ! -f /var/lock/subsys/NetworkManager ] && exit 0
+    [ ! -f /var/lock/subsys/network ] && ! status NetworkManager >/dev/null 2>&1 && exit 0


What I have in /etc/init.d/iscsi


start() {
    [ -x $exec ] || exit 5
    [ -f $config ] || exit 6

    # if the network isn't up yet exit cleanly, NetworkManager will call us
    # again when the network is up
    [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0


Why does my init.d/iscsi not have this patch if:

iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64 : iSCSI daemon and utility programs
Repo        : installed
Matched from:
Other       : Provides-match: /etc/init.d/iscsi


I just backed up /etc/init.d/iscsi  (mv iscsi to bkup_iscsi) and ran

#yum reinstall iscsi-initiator-utils.

What I get in /etc/init.d/iscsi from this package is still:

start() {
    [ -x $exec ] || exit 5
    [ -f $config ] || exit 6

    # if the network isn't up yet exit cleanly, NetworkManager will call us
    # again when the network is up
    [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0

    # if no nodes are setup to startup automatically exit cleanly
    grep -qrs "node.startup = automatic" /var/lib/iscsi/nodes
    [ $? -eq 0 ] || exit 0



Then I ran:
#yum changelog 5 iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64

Listing 5 changelogs

==================== Installed Packages ====================
iscsi-initiator-utils-6.2.0.872-12.fc15. installed
* Sat Apr 30 08:00:00 2011 Hans de Goede <hdegoede> - 6.2.0.872-12
- Change iscsi init scripts to check for networking being actually up, rather
  then for NetworkManager being started (#692230)

* Tue Apr 26 08:00:00 2011 Hans de Goede <hdegoede> - 6.2.0.872-11
- Fix iscsid autostarting when upgrading from an older version
  (add iscsid.startup key to iscsid.conf on upgrade)
- Fix printing of [ OK ] when successfully stopping iscsid
- systemd related fixes:
 - Add Should-Start/Stop tgtd to iscsi init script to fix (re)boot from
   hanging when using locally hosted targets
 - %ghost /var/lock/iscsi and contents (#656605)

* Mon Apr 25 08:00:00 2011 Mike Christie <mchristi> 6.2.0.872-10
- Fix iscsi init scripts check for networking being up (#692230)

* Wed Feb  9 07:00:00 2011 Fedora Release Engineering <rel-eng.org> - 6.2.0.872-9
- Rebuilt for https://fedoraproject.org/wiki/Fedora_15_Mass_Rebuild

* Wed Jul 21 08:00:00 2010 David Malcolm <dmalcolm> - 6.2.0.872-8
- Rebuilt for https://fedoraproject.org/wiki/Features/Python_2.7/MassRebuild

changelog stats. 1 pkg, 1 source pkg, 5 changelogs

So I went investigating:

[root@talon packages]# pwd
/var/cache/yum/x86_64/15/fedora/packages
[root@talon packages]# ls -al iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64.rpm 
-rw-r--r--. 1 root root 331704 Apr 30 20:51 iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64.rpm

#mkdir iscsirpm
#cp iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64.rpm iscsirpm/
#cd iscsirpm

[root@talon iscsirpm]# rpm2cpio iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64.rpm | cpio -idmv


./etc/NetworkManager/dispatcher.d/04-iscsi
./etc/iscsi
./etc/iscsi/iscsid.conf
./etc/rc.d/init.d/iscsi
./etc/rc.d/init.d/iscsid
./sbin/iscsi-iname
./sbin/iscsiadm
./sbin/iscsid
./sbin/iscsistart
./usr/lib64/libiscsi.so.0
./usr/lib64/python2.7/site-packages/libiscsimodule.so
./usr/share/doc/iscsi-initiator-utils-6.2.0.872
./usr/share/doc/iscsi-initiator-utils-6.2.0.872/README
./usr/share/man/man8/iscsi-iname.8.gz
./usr/share/man/man8/iscsiadm.8.gz
./usr/share/man/man8/iscsid.8.gz
./usr/share/man/man8/iscsistart.8.gz
./var/lib/iscsi
./var/lib/iscsi/ifaces
./var/lib/iscsi/isns
./var/lib/iscsi/nodes
./var/lib/iscsi/send_targets
./var/lib/iscsi/slp
./var/lib/iscsi/static
3693 blocks
[root@talon iscsirpm]# cd etc/rc.d/init.d/
[root@talon init.d]# pwd
/var/cache/yum/x86_64/15/fedora/packages/iscsirpm/etc/rc.d/init.d

[root@talon init.d]# grep "isn't" iscsi --context=5

start() {
    [ -x $exec ] || exit 5
    [ -f $config ] || exit 6

    # if the network isn't up yet exit cleanly, NetworkManager will call us
    # again when the network is up
    [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0

    # if no nodes are setup to startup automatically exit cleanly
    grep -qrs "node.startup = automatic" /var/lib/iscsi/nodes

Comment 8 Ron Gonzalez 2011-10-06 03:05:20 UTC

[root@talon init.d]# rpm -qi iscsi-initiator-utils
Name        : iscsi-initiator-utils
Version     : 6.2.0.872
Release     : 12.fc15
Architecture: x86_64
Install Date: Wed 05 Oct 2011 10:28:53 PM EDT
Group       : System Environment/Daemons
Size        : 1887005
License     : GPLv2+
Signature   : RSA/SHA256, Sat 30 Apr 2011 02:00:12 PM EDT, Key ID b4ebf579069c8460
Source RPM  : iscsi-initiator-utils-6.2.0.872-12.fc15.src.rpm
Build Date  : Sat 30 Apr 2011 04:35:52 AM EDT
Build Host  : x86-07.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://www.open-iscsi.org
Summary     : iSCSI daemon and utility programs
Description :
The iscsi package provides the server daemon for the iSCSI protocol,
as well as the utility programs used to manage it. iSCSI is a protocol
for distributed disk access using SCSI commands sent over Internet
Protocol networks.



#repoquery iscsi-initiator-utils --location
http://mirrors.servercentral.net/fedora/releases/15/Everything/x86_64/os/Packages/iscsi-initiator-utils-6.2.0.872-12.fc15.i686.rpm
http://mirrors.servercentral.net/fedora/releases/15/Everything/x86_64/os/Packages/iscsi-initiator-utils-6.2.0.872-12.fc15.x86_64.rpm

Comment 9 Ron Gonzalez 2011-10-06 07:27:53 UTC
Never mind, I see that the right patch is already in place.

    # if the network isn't up yet exit cleanly, NetworkManager will call us
    # again when the network is up
    [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0


This means I haven't really figured out what the root cause was for the behavior I was experiencing.

Comment 10 Hans de Goede 2011-10-06 07:38:33 UTC
(In reply to comment #1)
> I am able to trace this problem back to the fact that iscsid is running, but
> the iscsi service is not starting.
> 
> When I run the commands (fresh from booting)
> 
> iscsiadm -m session
> there are no sessions running.
> 
> #service iscsi start 
> 
> iscsi starts:
> 
> Now I can see that sessions are established:
> iscsiadm -m session -P 1          
> Target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0
>         Current Portal: 192.168.1.235:3260,1
>         Persistent Portal: 192.168.1.235:3260,1

<snip>

Right, so the iscsi initscript works fine, it just is not getting called!

> Also the command:
> 
> #service iscsi status returns no output.

That is expected behavior, it only returns output on problems (and a non-zero exit status).

(In reply to comment #7)
> What I have in /etc/init.d/iscsi
> 
> 
> start() {
>     [ -x $exec ] || exit 5
>     [ -f $config ] || exit 6
> 
>     # if the network isn't up yet exit cleanly, NetworkManager will call us
>     # again when the network is up
>     [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0
> 
> 
> Why does my init.d/iscsi not have this patch if:

It does have the final / correct version of the patch, the:
     [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0

Line is correct, it makes iscsi start exit cleanly if the network service
has not run yet.

There are 2 scenarios here:
1) You're using the classic network scripts rather than NetworkManager. In that
case the network service will get started before the iscsi service, so
/var/lock/subsys/network will exist, the exit 0 won't get triggered, and
the iscsi start will continue with logging into the iscsi targets.

2) You're using NetworkManager and not the classic network scripts. In this
case /var/lock/subsys/network will not exist, and when service iscsi start gets
run during boot nm-online will likely also return false, resulting in
the iscsi start doing nothing. Later on, when NetworkManager actually brings
up the network, it will execute /etc/NetworkManager/dispatcher.d/04-iscsi,
which calls iscsi start again. Now the nm-online check will succeed, so the
exit 0 won't trigger and the script will log in to your iscsi targets.

If you still doubt that this line:
     [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 0

is correct, compare it with the one in /etc/init.d/netfs
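
The two scenarios can be expressed as a single predicate. The sketch below is not taken from the init script: the function name network_ready and the NETWORK_LOCK and NM_CHECK variables are assumptions, introduced so the logic can be exercised without a real network stack. The defaults mirror the line quoted above.

```shell
# Defaults mirror the init script line; both can be overridden for testing.
NETWORK_LOCK=${NETWORK_LOCK:-/var/lock/subsys/network}
NM_CHECK=${NM_CHECK:-nm-online -x}

network_ready() {
    # Scenario 1: the classic network service has run and left its lock file.
    [ -f "$NETWORK_LOCK" ] && return 0
    # Scenario 2: no lock file, so ask NetworkManager whether it is online.
    # NM_CHECK is deliberately unquoted so "nm-online -x" splits into words.
    $NM_CHECK >/dev/null 2>&1
}
```

With this helper the guard in start() would read `network_ready || exit 0`, which is equivalent to the original one-liner.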

Comment 11 Ron Gonzalez 2011-10-06 14:42:34 UTC
Thank you for your response.

I was confused by the patches proposed in the other bug report.

I have indeed verified the operation of the nm-online -x call, it returns 0.

This is now working as it should, I am going to endeavour to return the network service to its off state and reboot:

network         0:off   1:off   2:off   3:off   4:off   5:off   6:off

I am not sure what happened here, or why iscsi wasn't running, the proper entries for iscsi are found in /etc/rc.d/init.d and accordingly /etc/init.d.

Also the rc3.d and rc5.d both have the Sxxiscsi links to init.d/iscsi.

A very odd thing indeed.  I'll consider this as an exercise in gaining a deeper understanding of iscsi and systemd init.  I will report my results shortly.

Comment 12 Ron Gonzalez 2011-10-06 15:09:17 UTC
Like I said, this doesn't work correctly if I disable the network service.

here is the output

[root@talon log]# systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
          Loaded: loaded (/etc/rc.d/init.d/iscsi)
          Active: failed since Thu, 06 Oct 2011 10:51:25 -0400; 12min ago
         Process: 1073 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited, status=1/FAILURE)
          CGroup: name=systemd:/system/iscsi.service

[root@talon init.d]# systemctl status iscsid.service
iscsid.service - LSB: Starts and stops login iSCSI daemon.
          Loaded: loaded (/etc/rc.d/init.d/iscsid)
          Active: active (running) since Thu, 06 Oct 2011 10:51:24 -0400; 15min ago
         Process: 932 ExecStart=/etc/rc.d/init.d/iscsid start (code=exited, status=0/SUCCESS)
        Main PID: 1028 (iscsid)
          CGroup: name=systemd:/system/iscsid.service
                  ├ 1027 iscsid
                  └ 1028 iscsid

[root@talon log]# systemctl status network.service
network.service - LSB: Bring up/down networking
          Loaded: loaded (/etc/rc.d/init.d/network)
          Active: inactive (dead)
          CGroup: name=systemd:/system/network.service

[root@talon log]# systemctl status NetworkManager.service
NetworkManager.service - Network Manager
          Loaded: loaded (/lib/systemd/system/NetworkManager.service)
          Active: active (running) since Thu, 06 Oct 2011 10:51:25 -0400; 6min ago
        Main PID: 989 (NetworkManager)
          CGroup: name=systemd:/system/NetworkManager.service
                  ├  989 /usr/sbin/NetworkManager --no-daemon
                  └ 1092 /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-eth0.pid -lf /var/lib/dhclient/dhclient-5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03-eth0.lea...



Here is the output from /var/log/messages during startup:
http://pastebin.com/NWZLsNQk

Here are relevant log message excerpts:
[root@talon log]# grep "Oct  6" messages |grep "iscsi"
Oct  6 10:51:22 talon kernel: [   22.753267] iscsi: registered transport (tcp)
Oct  6 10:51:22 talon kernel: [   22.823878] iscsi: registered transport (iser)
Oct  6 10:51:22 talon kernel: [   23.256725] iscsi: registered transport (cxgb3i)
Oct  6 10:51:22 talon kernel: [   23.499049] iscsi: registered transport (bnx2i)
Oct  6 10:51:23 talon kernel: [   23.778571] iscsi: registered transport (be2iscsi)
Oct  6 10:51:23 talon iscsid: iSCSI logger with pid=1027 started!
Oct  6 10:51:23 talon kernel: [   23.935001] iscsid (1028): /proc/1028/oom_adj is deprecated, please use /proc/1028/oom_score_adj instead.
Oct  6 10:51:24 talon iscsid: transport class version 2.0-870. iscsid version 2.0-872
Oct  6 10:51:24 talon iscsid: iSCSI daemon with pid=1028 started!
Oct  6 10:51:25 talon iscsid: cannot make a connection to 192.168.1.235:3260 (-1,101)
Oct  6 10:51:25 talon iscsid: cannot make a connection to 192.168.1.235:3260 (-1,101)
Oct  6 10:51:25 talon systemd[1]: iscsi.service: control process exited, code=exited status=1
Oct  6 10:51:25 talon systemd[1]: Unit iscsi.service entered failed state.
Oct  6 10:52:33 talon systemd[1]: Job home-iscsi_media.mount/start failed with result 'dependency'.
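
In the excerpt above, the second number in "(-1,101)" looks like an errno value; on Linux, errno 101 is ENETUNREACH ("Network is unreachable"), which would fit a login attempt made before the network is up. A small sketch for pulling these failures out of the log, assuming the message format shown above (connect_errors is a hypothetical helper name):

```shell
# Read syslog lines on stdin and print "portal errno=N" for every iscsid
# connection failure of the form "cannot make a connection to HOST:PORT (-1,N)".
connect_errors() {
    sed -n 's/.*cannot make a connection to \([^ ]*\) (-1,\([0-9]*\)).*/\1 errno=\2/p'
}
```

Used as `grep iscsid /var/log/messages | connect_errors`.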



Question, is it possible that /etc/init.d/iscsid is missing the NetworkManager startup test?

Comment 13 Ron Gonzalez 2011-10-06 15:20:00 UTC
I disabled iscsid from running at startup,

chkconfig --level 12345 iscsid off

Now all is well again.



[root@talon log]# systemctl status iscsid.service
iscsid.service - LSB: Starts and stops login iSCSI daemon.
          Loaded: loaded (/etc/rc.d/init.d/iscsid)
          Active: inactive (dead)
          CGroup: name=systemd:/system/iscsid.service

[root@talon log]# systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
          Loaded: loaded (/etc/rc.d/init.d/iscsi)
          Active: active (running) since Thu, 06 Oct 2011 11:15:15 -0400; 4min 18s ago
         Process: 1077 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/iscsi.service
                  ├ 1180 iscsid
                  └ 1181 iscsid


[root@talon log]# chkconfig --list iscsi

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

iscsi           0:off   1:off   2:off   3:on    4:off   5:on    6:off
[root@talon log]# chkconfig --list iscsid

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

iscsid          0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@talon log]# chkconfig --list network

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

network         0:off   1:off   2:off   3:off   4:off   5:off   6:off

Comment 14 Ron Gonzalez 2011-10-06 15:40:14 UTC
Apparently the problem lies with iscsid starting first.

It doesn't test for network, and once it runs and fails apparently it prevents iscsi from starting up correctly?

Comment 15 Mike Christie 2011-10-13 19:44:59 UTC
Hey,

So running

service iscsi start

manually after the system has booted works for you? What network settings is that with or does it always work? Running that command by hand does not seem to work for me with any network settings.


And for comment #13 are you saying if you disable iscsid it all just works for you (when you reboot everything starts and mounts)?

Comment 16 Ron Gonzalez 2011-10-13 19:56:10 UTC
Hi Mike, 

Thanks for your response.  Allow me to retest later tonight, but based on my previous testing:

1.  If iscsid is set to start on boot, it prevents iscsi, which loads later (S07 versus S13), from starting.

You can see the difference here:

[root@talon log]# systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
          Loaded: loaded (/etc/rc.d/init.d/iscsi)
          Active: failed since Thu, 06 Oct 2011 10:51:25 -0400; 12min ago
         Process: 1073 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited, status=1/FAILURE)
          CGroup: name=systemd:/system/iscsi.service

[root@talon init.d]# systemctl status iscsid.service
iscsid.service - LSB: Starts and stops login iSCSI daemon.
          Loaded: loaded (/etc/rc.d/init.d/iscsid)
          Active: active (running) since Thu, 06 Oct 2011 10:51:24 -0400; 15min ago
         Process: 932 ExecStart=/etc/rc.d/init.d/iscsid start (code=exited, status=0/SUCCESS)
        Main PID: 1028 (iscsid)
          CGroup: name=systemd:/system/iscsid.service
                  ├ 1027 iscsid
                  └ 1028 iscsid



Versus when iscsid is disabled here:

[root@talon log]# systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
          Loaded: loaded (/etc/rc.d/init.d/iscsi)
          Active: active (running) since Thu, 06 Oct 2011 11:15:15 -0400; 4min 18s ago
         Process: 1077 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/iscsi.service
                  ├ 1180 iscsid
                  └ 1181 iscsid


Apparently the iscsi startup scripts were modified and updated to wait for network, whilst I could not find the same modifications being made to iscsid startup scripts, resulting in the following pattern when iscsid is set to start:


Here are relevant log message excerpts:
[root@talon log]# grep "Oct  6" messages |grep "iscsi"
Oct  6 10:51:22 talon kernel: [   22.753267] iscsi: registered transport (tcp)
Oct  6 10:51:22 talon kernel: [   22.823878] iscsi: registered transport (iser)
Oct  6 10:51:22 talon kernel: [   23.256725] iscsi: registered transport (cxgb3i)
Oct  6 10:51:22 talon kernel: [   23.499049] iscsi: registered transport (bnx2i)
Oct  6 10:51:23 talon kernel: [   23.778571] iscsi: registered transport (be2iscsi)
Oct  6 10:51:23 talon iscsid: iSCSI logger with pid=1027 started!
Oct  6 10:51:23 talon kernel: [   23.935001] iscsid (1028): /proc/1028/oom_adj is deprecated, please use /proc/1028/oom_score_adj instead.
Oct  6 10:51:24 talon iscsid: transport class version 2.0-870. iscsid version 2.0-872
Oct  6 10:51:24 talon iscsid: iSCSI daemon with pid=1028 started!
Oct  6 10:51:25 talon iscsid: cannot make a connection to 192.168.1.235:3260 (-1,101)
Oct  6 10:51:25 talon iscsid: cannot make a connection to 192.168.1.235:3260 (-1,101)
Oct  6 10:51:25 talon systemd[1]: iscsi.service: control process exited, code=exited status=1
Oct  6 10:51:25 talon systemd[1]: Unit iscsi.service entered failed state.
Oct  6 10:52:33 talon systemd[1]: Job home-iscsi_media.mount/start failed with result 'dependency'.

Comment 17 Mike Christie 2011-10-13 20:05:12 UTC
(In reply to comment #16)
> Hi Mike, 
> 
> Thanks for your response.  Allow me to retest later tonight, but based on my
> previous testing:

Don't waste your time retesting.

Comment 18 Mike Christie 2011-10-14 07:13:19 UTC
Created attachment 528159 [details]
don't return succes if network is not up

Hey,

So it looks like the iscsi service fails because the login failed in your log. This is expected because it looks like the network is not up. What should be happening is that if you are not using the network service and are just using NetworkManager, then when that gets the network going the iscsi script should be called and it should log in OK. That last part does not seem to be working though.

The problem seems to be that if the iscsi script is called and the network is down that network check you saw in the init script returns 0. Then if you or the network manager code were to run the iscsi script again like

service iscsi start

then that might end up doing nothing. Something in the systemctl code seems to check if the iscsi script had returned 0/success before and if run again will not do anything.

However, if you/NM were to do

service iscsi restart

it would work.


Or if you change the exit value from 0 to 1 in /etc/init.d/iscsi:

    [ ! -f /var/lock/subsys/network ] && ! nm-online -x >/dev/null 2>&1 && exit 1

it looks like it works again. When NM starts up the network and does iscsi start then that will actually run the iscsi script again.


Some of your other runs seem to work because we hit a different failure case in the iscsi script where a non-zero return code is returned. In that case systemctl will allow the iscsi script to be run later like when NM calls us.

Could you try the attached patch? To apply do:

#cd /etc/init.d/
#cp iscsi ~/
#patch -p1 -i /path/to/patch/iscsi-init-dont-return-zero.patch

Set your init script on/off settings back to normal. Then reboot your box and see if we login ok.

Comment 19 Mike Christie 2011-10-14 07:16:04 UTC
(In reply to comment #18)
> 
> However, if you/NM were to do
> 
> service iscsi restart
> 
> it would work.
> 

Actually in some of your test cases that will not work. I think for the case where the iscsi init script returned non-zero, restart will not run and in that case iscsi start would work like you saw. The patch I attached should work for all cases though, so please try that out.

Comment 20 Ron Gonzalez 2011-10-14 14:46:18 UTC
I would be glad to try the patch.

However, I must point out that everything works correctly when iscsid is off (as opposed to having the iscsi and iscsid services both enabled).

I am conscious of the fact that iscsi will fail initially with Network Manager enabled, and that a hook is present that will call iscsi back when the network comes up.

The login that failed in my log is the iscsid service, and not iscsi. As you can see from the systemctl status output, when iscsid starts and fails, iscsi cannot start when it tries later. When iscsid is disabled, on the other hand, iscsi starts and actually launches iscsid itself, like so:

[root@talon log]# systemctl status iscsi.service
iscsi.service - LSB: Starts and stops login and scanning of iSCSI devices.
          Loaded: loaded (/etc/rc.d/init.d/iscsi)
          Active: active (running) since Thu, 06 Oct 2011 11:15:15 -0400; 4min 18s ago
         Process: 1077 ExecStart=/etc/rc.d/init.d/iscsi start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/iscsi.service
                  ├ 1180 iscsid
                  └ 1181 iscsid

[root@talon log]# systemctl status iscsid.service
iscsid.service - LSB: Starts and stops login iSCSI daemon.
          Loaded: loaded (/etc/rc.d/init.d/iscsid)
          Active: inactive (dead)
          CGroup: name=systemd:/system/iscsid.service


Your post isn't exactly clear in explaining this, and so I humbly ask for a clarification on your post #18.

Thanks,

Ron

Comment 21 Mike Christie 2011-10-14 21:36:08 UTC
Just to be clear, I am not addressing the lvm issue yet. I am just working on getting the iscsi sessions to come up correctly and automatically.

Or maybe we are hitting different bugs.


(In reply to comment #20)
> I would be glad to try the patch, 
> 
> However I must point out that everything works correctly when iscsid (as
> opposed to the iscsi and iscsid services both enabled) is off.

The basic problem is that the iscsi script reports success even though it has not logged into anything when the network is not up, and systemctl will not let you start a service twice if the first start was successful (the second start is a no-op). In the past we did the same thing for LSB init scripts by checking the /var/lock/subsys files. So if the iscsi service is run during startup before the network is up, we return success. Later, when the network is up and the iscsi script is asked to start again, no login happens because the script is not actually run again.

iscsid is just the iscsi daemon that does the login operation. It sends iscsi login PDUs and handles things like CHAP authentication. It does not initiate any logins in normal mode (in discoveryd mode it does, but it does not look like you are using that, are you?).

iscsi basically asks iscsid to log in to specific targets.

So if you are not using discoveryd mode, iscsid can be up when the network is not, and that should not be a problem, because we are not actually using the network at that time. We do not use it until the iscsi service is started and begins asking iscsid to log in.

For discoveryd support we need to add network checks to the iscsid script and add network manager support.


> 
> I am conscious of the fact that iscsi will fail with Network Manager enabled
> initially, and that a hook is present that will call iscsi back when the
> network comes up.

In my previous comment I was saying that the hook is not working, because of the mistake where the service reports a successful start even though the network is down. When I changed the return value to 1 for the network check in the iscsi script, it started working for me with the default chkconfig settings.

It is hard for me to hit the race, so I simulated it by stopping the iscsi service, ifdowning the iface, starting the iscsi service, then ifuping the iface. The NM stuff should be calling the iscsi service and logging in, but the login does not happen because the iscsi script is not run the second time. systemctl just returns success right away.
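
The simulation described above could be scripted roughly as follows. This is a sketch under stated assumptions: simulate_network_race, run and DRY_RUN are hypothetical names, and the interface name is passed as an argument because the thread does not say which interface was used.

```shell
#!/bin/sh
# Sketch only. simulate_network_race, run and DRY_RUN are hypothetical names.
run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

simulate_network_race() {
    iface="$1"
    run service iscsi stop     # log out of any existing sessions
    run ifdown "$iface"        # take the network away
    run service iscsi start    # script sees no network; the unpatched check exits 0
    run ifup "$iface"          # the NetworkManager hook should now trigger a login
}

# Example (prints the four commands without running them):
#   DRY_RUN=1; export DRY_RUN; simulate_network_race eth0
```

Checking for sessions afterwards, for example with iscsiadm -m session, would show whether the second start actually logged in.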

> 
> The login that failed in my log is the iscsid service, and not iscsi.  As you


Which log snippet are you referring to? We might be talking about different ones or different things. I mean something like this:

Oct  6 10:51:25 talon iscsid: cannot make a connection to 192.168.1.235:3260 (-1,101)

is from iscsid, but it is really iscsi that is going to fail, because the login will fail. At that point iscsid should have already started ok, because the iscsi service needs it before it starts.

Also in the logging sometimes the kernel or iscsid will use the "iscsi" prefix when it should be using iscsid or something like kernel iscsi, so it gets messy to follow what is coming from where.


> can see from the systemctl status output, when iscsid starts and fails, later

I do not see that it fails. I just see that it is not started.

> 
> [root@talon log]# systemctl status iscsid.service
> iscsid.service - LSB: Starts and stops login iSCSI daemon.
>           Loaded: loaded (/etc/rc.d/init.d/iscsid)
>           Active: inactive (dead)
>           CGroup: name=systemd:/system/iscsid.service
> 

One reason I can think of for why this works is that the iscsi service runs iscsiadm, and iscsiadm will detect if iscsid is not running. If it is not, then it will do a "service iscsid force-start". This changes the timing, so by the time iscsiadm gets iscsid running the network could be up.


Also, even with the NetworkManager case fixed, I think you might still have errors with the lvm setup, so I am just addressing one item at a time.

Comment 22 Mike Christie 2011-10-14 21:41:47 UTC
(In reply to comment #21)
> > can see from the systemctl status output, when iscsid starts and fails, later
> 
> I do not see that it fails. I just see that it is not started.

Oops. Ignore that comment above referring to the bottom snippet. I misread your comment. But I do not see any systemctl output that shows iscsid failing. I just see the ones where you do not have it setup to run so it is inactive/dead.



> 
> > 
> > [root@talon log]# systemctl status iscsid.service
> > iscsid.service - LSB: Starts and stops login iSCSI daemon.
> >           Loaded: loaded (/etc/rc.d/init.d/iscsid)
> >           Active: inactive (dead)
> >           CGroup: name=systemd:/system/iscsid.service
> > 
>

Comment 23 Ron Gonzalez 2011-10-14 22:24:30 UTC
Mike,

I believe you have a clear understanding of this, thank you for explaining. I will try your patch as soon as possible.  What conditions are you looking for me to test with?

Should I change iscsid to on for runlevel 5, leave iscsi and network manager as is?

Comment 24 Mike Christie 2011-10-16 18:10:50 UTC
(In reply to comment #23)
> Mike,
> 
> I believe you have a clear understanding of this, thank you for explaining. I
> will try your patch as soon as possible.  What conditions are you looking for
> me to test with?

Try with your initial default setup you were using.

> 
> Should I change iscsid to on for runlevel 5, leave iscsi and network manager as
> is?

Yeah.


Oh yeah, I was looking at some of your log output, and it looks like in one run you might have been hitting another bug. It looks like you might also need to set node.session.initial_login_retry_max higher. To do this:

0. Logout of iscsi targets if logged in

service iscsi stop

1. open /etc/iscsi/iscsid.conf
2. Search for and set

node.session.initial_login_retry_max = 30

That will make sure new records get that value.

3. Update existing records:

iscsiadm -m node -o update -n node.session.initial_login_retry_max -v 30

4. Reboot box to see if that helps too.

You might need the patch and update the settings, because it looks like in one run you passed the iscsi init script NM check, but in the logs it looked like we still were not able to connect. This happens when the actual network interface is up, but the network layer is not completely setup (for example it might be doing some spanning tree stuff and so it needs some extra time before it will allow us to fully use the connection).
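
The config edit in the steps above can be sketched as a small helper. This is only an illustration: set_retry_max is a hypothetical function name, and it assumes GNU sed (for -i) and an iscsid.conf-style "key = value" file.

```shell
#!/bin/sh
# Sketch only. set_retry_max is a hypothetical helper, not part of
# iscsi-initiator-utils. Assumes GNU sed for in-place editing.
set_retry_max() {
    conf="$1"
    value="$2"
    key="node.session.initial_login_retry_max"

    if grep -q "^[#[:space:]]*$key[[:space:]]*=" "$conf"; then
        # Rewrite the existing (possibly commented-out) setting in place.
        sed -i "s|^[#[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$conf"
    else
        # No such setting yet: append it.
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# The full sequence from the steps above would then look like:
#   service iscsi stop
#   set_retry_max /etc/iscsi/iscsid.conf 30
#   iscsiadm -m node -o update -n node.session.initial_login_retry_max -v 30
```

Editing iscsid.conf only affects node records created afterwards, which is why the iscsiadm update of existing records is still needed.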

Comment 25 Ron Gonzalez 2011-10-17 15:24:18 UTC
Hi Mike,

I did not have any success with the aforementioned changes.

Here is what I did:

1. enabled iscsid
2. changed /etc/init.d/iscsi to reflect patch changes (exit 1)
3. changed /etc/iscsi/iscsid.conf to reflect retry max 30
4. updated existing records using iscsiadm -v 30 command
5. rebooted

iscsi starts but vgscan displays volume as inactive.

I still need to run vgchange -ay and manually mount /home/iscsi_media.


Oct 17 11:02:25 talon systemd[1]: Job remote-fs.target/start failed with result 'dependency'.
Oct 17 11:02:25 talon systemd[1]: Job home-iscsi_media.mount/start failed with result 'dependency'.


I got frustrated with this, so I removed NetworkManager with yum. I don't need NetworkManager, and quite honestly I am not happy with systemd at all. If I could remove it I would.

I ended up modifying /etc/init.d/iscsi like so:
[ ! -f /var/lock/subsys/network ] && exit 0

This is still broken for me: my iscsi mount refuses to start. Apparently I would need to add vgchange -ay manually to rc.local, which is a total hack.

Please see 708574.

Thanks for all of your help.

Comment 26 Ron Gonzalez 2011-10-17 15:37:12 UTC
I added vgchange -ay to rc.local and all works now.


[root@talon log]# chkconfig --list iscsid

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

iscsid          0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@talon log]# chkconfig --list iscsi

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

iscsi           0:off   1:off   2:off   3:off   4:off   5:on    6:off


/etc/init.d/iscsi  (modifications)

    # if the network isn't up yet exit cleanly, NetworkManager will call us
    # again when the network is up
    [ ! -f /var/lock/subsys/network ] && exit 0



[root@talon log]# chkconfig --list NetworkManager
error reading information on service NetworkManager: No such file or directory

[root@talon log]# chkconfig --list network
network         0:off   1:off   2:off   3:on    4:on    5:on    6:off





OLD /var/log/messages
**********************************Not Working*******************************
Oct 17 10:59:53 talon iscsid: Connection2:0 to [target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db, portal: 192.168.1.235,3260] through [iface: default] is shutdown.
Oct 17 10:59:53 talon iscsid: Connection1:0 to [target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0, portal: 192.168.1.235,3260] through [iface: default] is shutdown.
Oct 17 10:59:59 talon iscsid: Could not set session3 priority. READ/WRITE throughout and latency could be affected.
Oct 17 10:59:59 talon iscsid: Could not set session4 priority. READ/WRITE throughout and latency could be affected.
Oct 17 10:59:59 talon iscsid: Connection3:0 to [target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 10:59:59 talon iscsid: Connection4:0 to [target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:01:19 talon kernel: [   32.433027] iscsi: registered transport (tcp)
Oct 17 11:01:19 talon kernel: [   32.461850] iscsi: registered transport (iser)
Oct 17 11:01:20 talon kernel: [   32.870297] iscsi: registered transport (cxgb3i)
Oct 17 11:01:20 talon kernel: [   32.917414] iscsi: registered transport (bnx2i)
Oct 17 11:01:20 talon kernel: [   32.994884] iscsi: registered transport (be2iscsi)
Oct 17 11:01:20 talon iscsid: iSCSI logger with pid=1498 started!
Oct 17 11:01:20 talon kernel: [   33.069743] iscsid (1499): /proc/1499/oom_adj is deprecated, please use /proc/1499/oom_score_adj instead.
Oct 17 11:01:21 talon iscsid: transport class version 2.0-870. iscsid version 2.0-872
Oct 17 11:01:21 talon iscsid: iSCSI daemon with pid=1499 started!
Oct 17 11:01:22 talon iscsid: Could not set session1 priority. READ/WRITE throughout and latency could be affected.
Oct 17 11:01:22 talon iscsid: Could not set session2 priority. READ/WRITE throughout and latency could be affected.
Oct 17 11:01:22 talon iscsid: Connection1:0 to [target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:01:22 talon iscsid: Connection2:0 to [target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:02:25 talon systemd[1]: Job home-iscsi_media.mount/start failed with result 'dependency'.






**********************************WORKING***********************************
Oct 17 11:26:34 talon kernel: [   29.435237] iscsi: registered transport (tcp)
Oct 17 11:26:34 talon kernel: [   29.470161] iscsi: registered transport (iser)
Oct 17 11:26:34 talon kernel: [   29.563028] iscsi: registered transport (cxgb3i)
Oct 17 11:26:34 talon kernel: [   29.590903] iscsi: registered transport (bnx2i)
Oct 17 11:26:35 talon kernel: [   29.919418] iscsi: registered transport (be2iscsi)
Oct 17 11:26:35 talon iscsid: iSCSI logger with pid=1555 started!
Oct 17 11:26:35 talon kernel: [   30.078296] iscsid (1557): /proc/1557/oom_adj is deprecated, please use /proc/1557/oom_score_adj instead.
Oct 17 11:26:36 talon iscsid: transport class version 2.0-870. iscsid version 2.0-872
Oct 17 11:26:36 talon iscsid: iSCSI daemon with pid=1557 started!
Oct 17 11:26:37 talon iscsid: Could not set session1 priority. READ/WRITE throughout and latency could be affected.
Oct 17 11:26:37 talon iscsid: Could not set session2 priority. READ/WRITE throughout and latency could be affected.
Oct 17 11:26:37 talon iscsid: Connection1:0 to [target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:26:37 talon iscsid: Connection2:0 to [target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db, portal: 192.168.1.235,3260] through [iface: default] is operational now

Comment 27 Ron Gonzalez 2011-10-17 16:26:21 UTC
Hi Mike I have discovered this:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498616

Comment 28 Ron Gonzalez 2011-10-17 16:32:18 UTC
Please note that in the above link they also advocate umounting any lvms coming from pv's on iscsi targets.

Comment 29 Ron Gonzalez 2011-10-17 22:22:09 UTC
Created attachment 528664 [details]
New File:
/etc/sysconfig/open-iscsi
In which to list Volume Groups to Mount Automatically for iscsi initiator on boot.

Comment 30 Ron Gonzalez 2011-10-17 22:23:15 UTC
Created attachment 528665 [details]
/etc/init.d/iscsi patched with LVM Groups automount.

Comment 31 Ron Gonzalez 2011-10-17 22:24:15 UTC
Please see the two attachments to see how I resolved this problem. 

Why didn't these changes in debian make it back into Fedora?

Open-Iscsi is after all open-iscsi?

Thanks again,

Ron

Comment 32 Ron Gonzalez 2011-10-17 22:24:48 UTC
Comment on attachment 528665 [details]
/etc/init.d/iscsi patched with LVM Groups automount.

iscsi patched with LVM Groups automount and Umount on shutdown.

Comment 33 Ron Gonzalez 2011-10-17 23:39:41 UTC
Created attachment 528674 [details]
/etc/init.d/iscsi auto mount LV's on Iscsi target PV's

this is updated to attempt to mount and dismount within the script.

Comment 34 Mike Christie 2011-10-18 01:37:56 UTC
Yeah, so remember in comment #21 where I was saying I wanted to tackle the problem of iscsi starting up correctly :(

We had hoped that using systemd would be some magic solution to the NM case where part of the startup is not done in a normal synchronized order. So when you use just the network init script the iscsi script will wait for the network to be up. Then lvm and mounts are done for network drives by the netfs script. If you use NM then NM will call iscsi start, but I do not think scripts like /etc/init.d/netfs are also run.

The reason the debian changes are not in fedora is because I do not think debian developers ever sent them upstream (I maintain iscsi upstream and do not remember ever seeing them send the patches). But, there is also a wider problem. If say you were doing iscsi + multipath + lvm then before you do the lvm scan you need to make sure multipath is done setting things up. Or same for RAID or crypto devices using iscsi or whatever else dm/md devices you can put on top of iscsi. So the hope was instead of adding all those calls to the iscsi script we hoped systemd would handle it. Before systemd we relied on things like the netfs and other scripts being called.

Let me look into whether properly and fully integrating iscsi into systemd will help. If not, then we might just have to add lvm/dm/md calls.

Thanks for all your work on this.

Comment 35 Ron Gonzalez 2011-10-18 01:58:28 UTC
Thanks Mike,

I have added back Network Manager and removed the vgchange -ay in rc.local. 
Your patch as it stands is also applied, as you can see from the iscsi service script attached.

So this means that the (your) iscsi patch works, I just didn't realize it originally.

With the changes I've made to the iscsi service script, and including the /etc/sysconfig/open-iscsi file (which consists of mostly ideas taken from the debian changes [link provided]), I now have restored Network Manager and systemd to expected functionality, and my iscsi mounts are available on startup.
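
For readers without access to the attachments, the general shape of such a change might look like the sketch below. It is not the attached script: activate_iscsi_vgs, LVMGROUPS and VGCHANGE are hypothetical names chosen for illustration, and the real contents of the attached files are not shown in this thread.

```shell
#!/bin/sh
# Sketch only. activate_iscsi_vgs, LVMGROUPS and VGCHANGE are hypothetical
# names; the attached init script and sysconfig file may differ.
activate_iscsi_vgs() {
    vgchange_cmd="${VGCHANGE:-vgchange}"
    rc=0
    # $1 is a whitespace-separated list of volume group names, e.g. "Media".
    for vg in $1; do
        # -ay makes every logical volume in the group available.
        "$vgchange_cmd" -ay "$vg" || rc=1
    done
    return "$rc"
}

# After a successful iSCSI login, an init script could do something like:
#   . /etc/sysconfig/open-iscsi        # assumed to define LVMGROUPS="Media"
#   activate_iscsi_vgs "$LVMGROUPS"
```

This does nothing about the ordering concern raised in comment 34 (multipath, RAID or crypto layers between iSCSI and LVM), so it only fits simple setups where the PV sits directly on the iSCSI disk.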

However, I would like to figure out how to modify the iscsi service script to remove the iscsi mount when I stop the iscsi service. Right now the problem I have is that unmounting the mount point for the LV yields weird errors (like the ones shown below) when I do service iscsi stop.

Also, if you look at the lvdisplay output you will notice that the device has been incremented to /dev/sdd, whereas normally it uses /dev/sdc. I am not an expert on this by any stretch of the imagination, so I am loath to use any of the remove commands (pv, lv, vg, etc.) to remove any duplicates.

Any guidance on this would be appreciated.

[root@talon iscsi_media]# lvscan
  /dev/Media/Media: read failed after 0 of 4096 at 3000600821760: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 3000600879104: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 0: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 4096: Input/output error
  ACTIVE            '/dev/vg_talon/usr_LogVol02' [39.66 GiB] inherit
  ACTIVE            '/dev/vg_talon/secure_LogVol03' [19.53 GiB] inherit
  ACTIVE            '/dev/vg_talon/home_LogVol01' [175.78 GiB] inherit
  ACTIVE            '/dev/vg_talon/root_LogVol00' [40.00 GiB] inherit


[root@talon home]# lvdisplay
  /dev/Media/Media: read failed after 0 of 4096 at 3000600821760: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 3000600879104: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 0: Input/output error
  /dev/Media/Media: read failed after 0 of 4096 at 4096: Input/output error
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdd not /dev/sdb
  --- Logical volume ---
  LV Name                /dev/Media/Media
  VG Name                Media
  LV UUID                CXHKOD-BXoY-SKlG-98Cm-Gp2Z-7x3b-GqBjuX
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.73 TiB
  Current LE             715399
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:5

Comment 36 Ron Gonzalez 2011-10-18 02:05:47 UTC
If you need me to do additional testing related to your comment #21, please let me know, but I think the original issue is now cured. If you refer back to my log details in comment #26, you see this:

OLD /var/log/messages
**********************************Not Working*******************************
Oct 17 11:01:22 talon iscsid: Connection1:0 to [target: iqn.2006-01.com.openfiler:tsn.d2781967b5e0, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:01:22 talon iscsid: Connection2:0 to [target: iqn.2006-01.com.openfiler:tsn.a31ab453e7db, portal: 192.168.1.235,3260] through [iface: default] is operational now
Oct 17 11:02:25 talon systemd[1]: Job home-iscsi_media.mount/start failed with result 'dependency'.


This means that the iscsi media mount was failing because vgchange -ay wasn't yet in the iscsi init script, while the targets do seem to have been logged in. I want to remind you that I still have iscsid turned off (there is really no need for me to enable it, unless you want me to continue testing with it enabled).

I am willing to perform any tests necessary to assist, and thank you for your assistance as well.

Comment 37 Ron Gonzalez 2012-01-30 23:16:14 UTC
Hi Mike,

F16 still has the same behavior: my volume groups don't mount.

We need to scan for the presence of LVs, VGs, or whatever on iscsi nodes on bootup.

Do you have any final recommendations for me?  A recommended method of dealing with systemd filesystem mounts?

I am sure there are many others who would benefit from a potential official workaround.


Thanks,
Ron

Comment 38 Mike Christie 2012-02-04 09:13:48 UTC
Hey,

Checking to see what the LVM people have to say. I thought/think we were/are adding something that handles lvm devices asynchronously similar to how dm-multipath devices are assembled.

Comment 39 Ron Gonzalez 2012-07-19 21:08:31 UTC
Here is the device entry in my fstab

/dev/Media/Media        /home/iscsi_media               ext4    _netdev                 0 0

Lvdisplay
[root@talon etc]# lvdisplay
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdc not /dev/sdb
  --- Logical volume ---
  LV Path                /dev/Media/Media
  LV Name                Media
  VG Name                Media
  LV UUID                CXHKOD-BXoY-SKlG-98Cm-Gp2Z-7x3b-GqBjuX
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              NOT available
  LV Size                2.73 TiB
  Current LE             715399
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

pvdisplay

[root@talon etc]# pvdisplay
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdc not /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               Media
  PV Size               2.73 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              715399
  Free PE               0
  Allocated PE          715399
  PV UUID               0Sr9Gk-3S33-xrlG-ukU9-J9lj-lBzW-2lGaMI

Here is what seems to be the problem:

[root@talon etc]# lvscan
  Found duplicate PV 0Sr9Gk3S33xrlGukU9J9ljlBzW2lGaMI: using /dev/sdc not /dev/sdb
  inactive          '/dev/Media/Media' [2.73 TiB] inherit

It is inactive on boot. Why do I need to run vgchange -ay all the time to make the volume active?

I think this should be taken care of automagically, no?

Comment 40 Peter Rajnoha 2012-10-22 10:24:45 UTC
(In reply to comment #38)
> Checking to see what the LVM people have to say. I thought/think we were/are
> adding something that handles lvm devices asynchronously similar to how
> dm-multipath devices are assembled.

We have lvmetad together with event-based activation support upstream now and in current Fedora rawhide (F19). This should support activating volume groups or separate logical volumes based on activation/auto_activation_volume_list in lvm.conf (by default, all complete VGs are activated). Currently, this is not supported for incomplete or cluster VGs. You need to have lvmetad enabled to make use of this (global/use_lvmetad=1 lvm.conf setting).

Comment 41 Ron Gonzalez 2012-10-22 13:46:38 UTC
(In reply to comment #40)
> (In reply to comment #38)
> > Checking to see what the LVM people have to say. I thought/think we were/are
> > adding something that handles lvm devices asynchronously similar to how
> > dm-multipath devices are assembled.
> 
> We have lvmetad together with event-based activation support upstream now
> and in current Fedora rawhide (F19). This should support activating volume
> groups or separate logical volumes based on
> activation/auto_activation_volume_list in lvm.conf (by default, all complete
> VGs are activated). Currently, this is not supported for incomplete or
> cluster VGs. You need to have lvmetad enabled to make use of this
> (global/use_lvmetad=1 lvm.conf setting).

Can you provide us with a clearer understanding here:
A. you mention F19, so this means that this capability to automount volume groups on iscsi devices was not previously available?
B. Is my question A above related to the move to systemd? I thought my VG was mounting just fine before the transition.
C. you mention "complete VG's" and incomplete VG's, can you provide documentation or an example of what you are referring to here?
D. LVMETAD is this new in F19 or is this available in earlier releases?

Comment 42 Peter Rajnoha 2013-01-02 09:48:26 UTC
(In reply to comment #41)
> Can you provide us with a clearer understanding here:
> A. you mention F19, so this means that this capability to automount volume
> groups on iscsi devices was not previously available?

Before lvmetad (and accompanying LVM autoactivation feature that goes with lvmetad), the volume groups were activated only by direct "vgchange -ay" command call (found in /etc/rc.sysinit, /etc/init.d/netfs and /etc/init.d/clvmd init script in non-systemd environment and in fedora-storage-init(-late).service and clvmd.service in systemd environment).

> B. Is my question A above related to the move to SystemD, I thought my VG
> was mounting just fine before the transition.

It all depends on ordering of scripts/services during boot process. In older SysV init boot, the sequence is firm and serially ordered:

 1. initscripts' rc.sysinit (with the 1st vgchange -ay call)
 2. iscsi init script
 3. initscripts' netfs script (with the 2nd and *conditional* vgchange -ay call - condition based on whether the _netdev option is used in fstab)
 4. lvm's clvmd script (with the 3rd vgchange -ay call)

So there are 2 direct LVM activations done after iscsi init. Anyway, this is not entirely correct as you can set up your iscsi devices even after the boot is complete, during normal system run. In this case, you would need to call an extra vgchange -ay to activate the LVM volumes. The autoactivation feature removes a need for this extra manual step and makes the activation much more robust!
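
The conditional activation in step 3 above can be illustrated with a short sketch. This is not the code from /etc/init.d/netfs: has_netdev_mounts is a hypothetical helper that only shows the kind of fstab check such a condition implies.

```shell
#!/bin/sh
# Sketch only. has_netdev_mounts is a hypothetical helper, not the code from
# /etc/init.d/netfs.
has_netdev_mounts() {
    fstab="${1:-/etc/fstab}"
    # Skip comment lines; field 4 of an fstab entry holds the mount options.
    awk '!/^[ \t]*#/ && $4 ~ /(^|,)_netdev(,|$)/ { found = 1 }
         END { exit (found ? 0 : 1) }' "$fstab"
}

# Conditional activation in the style of step 3 above:
#   has_netdev_mounts && vgchange -ay
```

With an fstab entry such as the /dev/Media/Media line from comment 39, the check succeeds and the extra vgchange -ay would run after the iscsi init script has logged in.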

However, this is a bit different in systemd, which adds much more parallelism to the boot process. The initscripts package provided fedora-storage-init.service and fedora-storage-init-late.service for systemd to call vgchange -ay (as a replacement for the rc.sysinit part of the LVM activation). The clvmd.service is called from systemd in SysV compatibility mode, as clvmd does not have its own systemd unit yet, only the older SysV init script (we'll port this one soon). The netfs script is gone in the systemd environment.

The LVM autoactivation is able to work with SysV init scripts (in RHEL) as well as with systemd (in F18+, enabled by default in F19+).

> C. you mention "complete VG's" and incomplete VG's, can you provide
> documentation or an example of what you are referring to here?

You can find more info in the lvm man page (man 8 lvm) under the "-P | --partial" description. Simply put, partial mode means that not all PVs making up the VG are necessary to activate the VG...

> D. LVMETAD is this new in F19 or is this available in earlier releases?

lvmetad is also available in F18, but it is not enabled by default there as it is in F19+/rawhide. To enable lvmetad, you need to set use_lvmetad=1 in /etc/lvm/lvm.conf. In a SysV environment the lvm2-lvmetad init script must also be enabled; in a systemd environment you don't need to enable lvm2-lvmetad.service, as it is socket-activated.
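
A quick way to see whether the setting is switched on is to look for it in the LVM configuration file. The sketch below is an illustration only: lvmetad_enabled is a hypothetical helper, it assumes the usual /etc/lvm/lvm.conf location, and it does a plain text match rather than parsing the file the way LVM does.

```shell
#!/bin/sh
# Sketch only. lvmetad_enabled is a hypothetical helper that does a simple
# text match; it does not parse lvm.conf the way the LVM tools do.
lvmetad_enabled() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Match an uncommented "use_lvmetad = 1" line (the global section setting).
    grep -Eq '^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*1([^0-9]|$)' "$conf"
}

# Example:
#   lvmetad_enabled && echo "lvmetad autoactivation is switched on"
```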

Comment 43 Peter Rajnoha 2013-01-02 09:54:10 UTC
The LVM autoactivation with the help of lvmetad activates the volumes as soon as all the PVs for the VG are in place - this is all event-based as we're listening to udev events for newly appeared PVs. This is not supported without lvmetad as lvmetad is needed to store the state about collected PVs and it makes the decision whether we're able to activate LVM volumes (besides this feature, lvmetad also caches metadata so there is less direct access to devices).

Comment 44 Fedora End Of Life 2013-07-04 00:18:24 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 45 Fedora End Of Life 2013-08-01 03:50:58 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

