Created attachment 898086 [details]
iscsid strace showing failure

Description of problem:
Can't start iscsid in a docker container

Version-Release number of selected component (if applicable):
# rpm -q docker-io
docker-io-0.11.1-3.fc20.x86_64

How reproducible:
always

Steps to Reproduce:
1. use the published fedora image: docker pull fedora
2. start the container: docker run -t -i fedora /bin/bash
3. install iscsi-initiator-utils
4. try to start iscsid:

bash-4.2# iscsid -f
iscsid: can not bind NETLINK_ISCSI socket

strace also attached
To give a little more context on what I'm doing: I'm trying to run OpenStack nova compute, configured to use the nova-baremetal driver, inside a container. When nova-baremetal provisions a machine, it acts as an iSCSI initiator and logs into a target that has been created on the machine being provisioned, then dd's the requested image onto the disk. So, as I understand it, iscsid must be running inside the same container where iscsiadm is run. The same thing has also been tried with lxc, hitting what I expect is the same issue: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855
Was SELinux involved? If you put the machine into permissive mode, does it work? Or try this as a privileged image. It might be something we are doing to lock the system down.
What are the permissions on /opt/hello?

ls -ld /opt
ls -ld /opt/hello
SELinux is already in permissive mode on the Docker host. I did try in a privileged container, and I get something slightly different: iscsid -f just hangs forever on the command line. An strace (attached) shows it polling forever on a fd; I had to ctrl-c it in both cases.
Created attachment 900105 [details]
iscsid strace showing failure from a privileged container
I think comment 3 was for another bug, maybe? Anyway, if not:

bash-4.2# ls -ld /opt
drwxr-xr-x. 2 root root 4096 Aug 7 2013 /opt
bash-4.2# ls -ld /opt/hello
ls: cannot access /opt/hello: No such file or directory
Ah... actually, I suspect iscsid running forever in the foreground may indicate it *is* working. Sorry, I wasn't thinking about the fact that -f tells it to run in the foreground. I will see if I can actually connect to a target from the privileged container and report back.
I'm using a privileged container running sshd as the process (so that I can log in with a couple of different shells). I had to add this to the Dockerfile for the container, otherwise iscsid won't start:

VOLUME ["/var/lock/iscsi"]

I start the container with:

docker run --privileged -ti --name initiator -p 8022:22 -d iscsi-initiator

Then I ssh in and start iscsid:

iscsid -d 8 -f

That appears to start fine. In another ssh session, I can discover the target (the target is actually on the container host):

[root@bcf697cb8673 ~]# iscsiadm -m discovery -t st -p 192.168.122.1
192.168.122.1:3260,1 iqn.2013-07.com.example.storage.ssd1

But when I try to log in to the target, iscsid exits (or crashes, hard to tell):

[root@bcf697cb8673 ~]# iscsiadm -m node --targetname iqn.2013-07.com.example.storage.ssd1 --portal 192.168.122.1 --login
Logging in to [iface: default, target: iqn.2013-07.com.example.storage.ssd1, portal: 192.168.122.1,3260] (multiple)
iscsiadm: got read error (0/0), daemon died?
iscsiadm: Could not login to [iface: default, target: iqn.2013-07.com.example.storage.ssd1, portal: 192.168.122.1,3260].
iscsiadm: initiator reported error (18 - could not communicate to iscsid)
iscsiadm: Could not log into all portals

From the ssh session where iscsid is running, I see:

iscsid: mgmt_ipc_write_rsp: rsp to fd 5
iscsid: poll result 1
iscsid: mgmt_ipc_write_rsp: rsp to fd 5
iscsid: poll result 1
iscsid: in read_transports
iscsid: Adding new transport tcp
iscsid: Matched transport tcp
iscsid: sysfs_attr_get_value: open '/class/iscsi_transport/tcp'/'handle'
iscsid: sysfs_attr_get_value: new uncached attribute '/sys/class/iscsi_transport/tcp/handle'
iscsid: sysfs_attr_get_value: add to cache '/sys/class/iscsi_transport/tcp/handle'
iscsid: sysfs_attr_get_value: cache '/sys/class/iscsi_transport/tcp/handle' with attribute value '18446744072107593760'
iscsid: sysfs_attr_get_value: open '/class/iscsi_transport/tcp'/'caps'
iscsid: sysfs_attr_get_value: new uncached attribute '/sys/class/iscsi_transport/tcp/caps'
iscsid: sysfs_attr_get_value: add to cache '/sys/class/iscsi_transport/tcp/caps'
iscsid: sysfs_attr_get_value: cache '/sys/class/iscsi_transport/tcp/caps' with attribute value '0x39'
iscsid: Allocted session 0x7f38ceb4f9b0
iscsid: no authentication configured...
iscsid: resolved 192.168.122.1 to 192.168.122.1
iscsid: setting iface default, dev , set ip , hw , transport tcp.
iscsid: get ev context 0x7f38ceb5c470
iscsid: set TCP recv window size to 524288, actually got 425984
iscsid: set TCP send window size to 524288, actually got 425984
iscsid: connecting to 192.168.122.1:3260
iscsid: sched conn context 0x7f38ceb5c470 event 2, tmo 0
iscsid: thread 0x7f38ceb5c470 schedule: delay 0 state 3
iscsid: Setting login timer 0x7f38ceb578e0 timeout 15
iscsid: thread 0x7f38ceb578e0 schedule: delay 60 state 3
iscsid: exec thread 7f38ceb5c470 callback
iscsid: put ev context 0x7f38ceb5c470
iscsid: connected local port 37259 to 192.168.122.1:3260
iscsid: in kcreate_session
iscsid: in __kipc_call
iscsid: in kwritev
iscsid: sendmsg: bug? ctrl_fd 4

Maybe the lines with sysfs_attr_get_value are indicative of something that's still needed from /sys?

These exact same discovery and login commands work fine when run from a libvirt VM connecting to the same target.
On my container host, I do have the correct iscsi kernel modules loaded, and I also see them in the container.

On the host:

[root@teletran-1 docker]# lsmod | grep iscsi
iscsi_tcp              18333  0
libiscsi_tcp           24176  1 iscsi_tcp
libiscsi               54750  2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi   97405  4 iscsi_tcp,libiscsi

In the container:

[root@bcf697cb8673 ~]# lsmod | grep iscsi
iscsi_tcp              18333  0
libiscsi_tcp           24176  1 iscsi_tcp
libiscsi               54750  2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi   97405  4 iscsi_tcp,libiscsi

I'll attach an strace of the iscsid process that's exiting, if that helps. I can also attach an strace of an iscsid process that works from a libvirt VM, if you think that would be helpful to compare.
Created attachment 900353 [details]
iscsid strace showing failure from a privileged container

strace of iscsid generated with:

strace -f -o iscsid.strace iscsid -d 12 -f

The iscsid process exits when you try to log in to an iSCSI target from the container.
Note that the output I'm now seeing seems to match very closely what was reported in https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855 when the same thing was tried with lxc-tools instead of Docker.
So this looks to be more specific to iscsid and namespacing than to Docker. I think you should open a bug with them and see if they can help.
It looks like the iSCSI netlink control interface isn't namespace aware, and the kernel side of the iSCSI initiator rejects messages that don't come from the default network namespace. I suppose it might make sense to track active sessions per netns.
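To illustrate the namespace issue, here's a minimal standalone probe (my own sketch, not open-iscsi code) that opens a NETLINK_ISCSI socket and sends an empty message to the kernel. Run on the host in the initial network namespace, the sendto() is delivered (the kernel just treats the empty message as an unknown request); run inside a container with its own network namespace, it typically fails outright because there is no kernel-side NETLINK_ISCSI listener in that namespace, which lines up with the sendmsg failure in the iscsid debug output above.

/* netlink_iscsi_probe.c - hedged sketch, not part of open-iscsi.
 * Build: gcc -o netlink_iscsi_probe netlink_iscsi_probe.c
 * Run as root, once on the host and once inside the container. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef NETLINK_ISCSI
#define NETLINK_ISCSI 8
#endif

int main(void)
{
    /* Fails if scsi_transport_iscsi is not loaded (protocol unregistered). */
    int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ISCSI);
    if (fd < 0) {
        perror("socket(NETLINK_ISCSI)");
        return 1;
    }

    /* iscsid reports "can not bind NETLINK_ISCSI socket" when its bind
     * (which, unlike here, also subscribes to multicast groups) fails. */
    struct sockaddr_nl src = { .nl_family = AF_NETLINK, .nl_pid = getpid() };
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        perror("bind");
        return 1;
    }

    /* An empty netlink header addressed to the kernel (portid 0). */
    struct nlmsghdr nlh = { .nlmsg_len = NLMSG_LENGTH(0), .nlmsg_pid = getpid() };
    struct sockaddr_nl dst = { .nl_family = AF_NETLINK };
    ssize_t n = sendto(fd, &nlh, nlh.nlmsg_len, 0,
                       (struct sockaddr *)&dst, sizeof(dst));
    if (n < 0)
        /* Expected in a non-initial netns: no kernel listener there,
         * so the send typically fails (e.g. ECONNREFUSED). */
        printf("sendto failed: %s\n", strerror(errno));
    else
        printf("sendto succeeded (%zd bytes)\n", n);

    close(fd);
    return 0;
}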
*** Bug 1102911 has been marked as a duplicate of this bug. ***
FYI, I was playing with running iscsid in a superprivileged container: https://github.com/rvykydal/dockerfile-iscsid/tree/master/rhel7
I've spent some more time looking at running iscsid in a network namespace container, and there are a number of kernel issues that need to be worked out.

The kernel side of the iSCSI netlink family only listens in the initial network namespace (just like some other storage-related netlink families). It's easy enough to add per-namespace kernel sockets; it's a bit more work to associate network namespaces with iSCSI objects in order to route async event notifications to the right place.

iSCSI makes heavy use of sysfs as well, and many of the iSCSI sysfs devices will need network namespace tags for filtered views of sysfs. I think that makes sense for iscsi_host, iscsi_session, and iscsi_connection. Certainly not for iscsi_transport. Possibly for iscsi_endpoint and iscsi_iface.

Without growing some way to assign an iscsi_host to a net namespace, like is done for network devices, this will probably work for dynamically generated hosts (iscsi_tcp) but not for offload hardware.

I can imagine use cases where it might be nice to have multiple iscsid instances in their own containers, each managing its own set of iSCSI sessions.

I've got some work started, but it's not ready for review and testing just yet.

Without that, I don't see how we can get to multiple working iscsid processes, or even a single iscsid running without --net=host.
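For anyone curious what the "per-namespace kernel sockets" piece might look like, here's a rough sketch (my guess at the shape of it, not the actual in-progress patches) using the kernel's pernet_operations infrastructure. In reality this would live in scsi_transport_iscsi.c, which today creates a single NETLINK_ISCSI socket bound to init_net; the harder parts mentioned above (routing async events, sysfs netns tagging, assigning an iscsi_host to a netns) aren't covered here.

/* Hypothetical sketch only - not the actual in-progress patches. */
#include <linux/module.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
#include <net/netlink.h>

static unsigned int iscsi_nl_net_id __read_mostly;

struct iscsi_nl_net {
        struct sock *nls;       /* per-netns NETLINK_ISCSI kernel socket */
};

static void iscsi_if_rx_sketch(struct sk_buff *skb)
{
        /* The existing message handler would go here; it would need to
         * resolve sessions/hosts within sock_net(skb->sk) instead of
         * assuming everything lives in init_net. */
}

static int __net_init iscsi_nl_net_init(struct net *net)
{
        struct iscsi_nl_net *in = net_generic(net, iscsi_nl_net_id);
        struct netlink_kernel_cfg cfg = {
                .groups = 1,
                .input  = iscsi_if_rx_sketch,
        };

        /* One kernel socket per network namespace, instead of the single
         * socket in init_net that scsi_transport_iscsi creates today. */
        in->nls = netlink_kernel_create(net, NETLINK_ISCSI, &cfg);
        return in->nls ? 0 : -ENOMEM;
}

static void __net_exit iscsi_nl_net_exit(struct net *net)
{
        struct iscsi_nl_net *in = net_generic(net, iscsi_nl_net_id);

        netlink_kernel_release(in->nls);
}

static struct pernet_operations iscsi_nl_net_ops = {
        .init = iscsi_nl_net_init,
        .exit = iscsi_nl_net_exit,
        .id   = &iscsi_nl_net_id,
        .size = sizeof(struct iscsi_nl_net),
};

static int __init iscsi_nl_sketch_init(void)
{
        /* Called for every existing and future network namespace. */
        return register_pernet_subsys(&iscsi_nl_net_ops);
}

static void __exit iscsi_nl_sketch_exit(void)
{
        unregister_pernet_subsys(&iscsi_nl_net_ops);
}

module_init(iscsi_nl_sketch_init);
module_exit(iscsi_nl_sketch_exit);
MODULE_LICENSE("GPL");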
This message is a reminder that Fedora 20 is nearing its end of life. Approximately four weeks from now, Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue; we are sorry that we were not able to fix it before Fedora 20 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
(In reply to Chris Leech from comment #15)
> I've spent some more time looking at running iscsid in a network namespace
> container, and there are a number of kernel issues that need to be worked out.
> 
> The kernel side of the iSCSI netlink family only listens in the initial
> network namespace (just like some other storage-related netlink families).
> It's easy enough to add per-namespace kernel sockets; it's a bit more work to
> associate network namespaces with iSCSI objects in order to route async event
> notifications to the right place.
> 
> iSCSI makes heavy use of sysfs as well, and many of the iSCSI sysfs devices
> will need network namespace tags for filtered views of sysfs. I think that
> makes sense for iscsi_host, iscsi_session, and iscsi_connection. Certainly
> not for iscsi_transport. Possibly for iscsi_endpoint and iscsi_iface.
> 
> Without growing some way to assign an iscsi_host to a net namespace, like is
> done for network devices, this will probably work for dynamically generated
> hosts (iscsi_tcp) but not for offload hardware.
> 
> I can imagine use cases where it might be nice to have multiple iscsid
> instances in their own containers, each managing its own set of iSCSI sessions.
> 
> I've got some work started, but it's not ready for review and testing just yet.
> 
> Without that, I don't see how we can get to multiple working iscsid
> processes, or even a single iscsid running without --net=host.

Do you have a working patch? I wanted to test this out to check whether it works.
(In reply to Vaibhav Khanduja from comment #17)
> Do you have a working patch? I wanted to test this out to check whether it works.

In-progress patches sent via email.
(In reply to Chris Leech from comment #18)
> (In reply to Vaibhav Khanduja from comment #17)
> > Do you have a working patch? I wanted to test this out to check whether it works.
> 
> In-progress patches sent via email.

Any progress on this patch?
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug.

Thank you for reporting this bug, and we are sorry it could not be fixed.
Is this something we should continue to care about?
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle. Changing version to '25'.
(In reply to Chris Leech from comment #18)
> (In reply to Vaibhav Khanduja from comment #17)
> > Do you have a working patch? I wanted to test this out to check whether it works.
> 
> In-progress patches sent via email.

Hi Chris,

Do you have these patches ready?

Thanks.
This message is a reminder that Fedora 25 is nearing its end of life. Approximately four weeks from now, Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue; we are sorry that we were not able to fix it before Fedora 25 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug.

Thank you for reporting this bug, and we are sorry it could not be fixed.
Is this issue fixed in any of the recent releases?