Bug 1100000 - can't run iscsid in a docker container
Summary: can't run iscsid in a docker container
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: iscsi
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Chris Leech
QA Contact: Brock Organ
URL:
Whiteboard:
Duplicates: 1102911 (view as bug list)
Depends On:
Blocks: 1102911 1416129 1997250 1997817
 
Reported: 2014-05-21 18:39 UTC by James Slagle
Modified: 2021-08-26 08:47 UTC
CC: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1102911 1416129 (view as bug list)
Environment:
Last Closed: 2017-12-12 10:17:29 UTC
Type: Bug
Embargoed:


Attachments
iscsid strace showing failure (5.63 KB, text/plain)
2014-05-21 18:39 UTC, James Slagle
no flags
iscsid strace showing failure from a privileged container (9.41 KB, text/plain)
2014-05-28 19:02 UTC, James Slagle
no flags
iscsid strace showing failure from a privileged container (24.65 KB, text/plain)
2014-05-29 13:37 UTC, James Slagle
no flags

Description James Slagle 2014-05-21 18:39:06 UTC
Created attachment 898086 [details]
iscsid strace showing failure

Description of problem:
Can't start iscsid in a docker container

Version-Release number of selected component (if applicable):

# rpm -q docker-io
docker-io-0.11.1-3.fc20.x86_64


How reproducible:
always

Steps to Reproduce:
1. use the published fedora image, docker pull fedora
2. start the container, docker run -t -i fedora /bin/bash
3. install iscsi-initiator-utils
4. try to start iscsid:

bash-4.2# iscsid -f
iscsid: can not bind NETLINK_ISCSI socket

strace also attached
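(For reference, the bind that fails can be approximated with a short Python sketch. This is a simplification: NETLINK_ISCSI is protocol 8 from <linux/netlink.h>, and the netlink multicast groups iscsid actually subscribes to are omitted.)

```python
import socket

NETLINK_ISCSI = 8  # protocol number from <linux/netlink.h>

def try_netlink_iscsi_bind():
    """Attempt to open and bind a NETLINK_ISCSI socket, roughly what
    iscsid does at startup.  Returns "ok" or an error description."""
    if not hasattr(socket, "AF_NETLINK"):
        return "AF_NETLINK unavailable (not Linux)"
    try:
        s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ISCSI)
    except OSError as e:
        return f"socket: {e.strerror}"
    try:
        s.bind((0, 0))  # (pid, groups); 0 lets the kernel assign the pid
    except OSError as e:
        return f"bind: {e.strerror}"
    finally:
        s.close()
    return "ok"

print(try_netlink_iscsi_bind())
```

Inside the container this reports the bind (or socket) failure; on a host with the iscsi modules loaded it prints "ok".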

Comment 1 James Slagle 2014-05-21 18:45:57 UTC
To give a little more context into what I'm doing, I'm trying to run OpenStack nova compute configured to use the nova-baremetal driver inside a container.

When nova-baremetal provisions a machine, it acts as an iSCSI initiator and logs into a target that has been created on the machine being provisioned. It then dd's the requested image onto the disk.

Therefore, as I understand it, iscsid must be running inside the container where you are also running iscsiadm.

This same thing has also been tried in lxc, with what I expect is the same issue:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855

Comment 2 Daniel Walsh 2014-05-28 16:29:35 UTC
Was SELinux involved?  If you put the machine into permissive mode, does it work?  Or try this with a privileged image.  It might be something that we are doing to lock the system down.

Comment 3 Daniel Walsh 2014-05-28 16:33:18 UTC
What are the permissions on /opt/hello

ls -ld /opt 
ls -ld /opt/hello

Comment 4 James Slagle 2014-05-28 19:01:53 UTC
SELinux is already in permissive mode on the Docker host.

I did try in a privileged container, and I get something slightly different: iscsid -f just hangs forever on the command line.

An strace (attached) shows it polling forever on an fd; I had to Ctrl-C it in both cases.

Comment 5 James Slagle 2014-05-28 19:02:26 UTC
Created attachment 900105 [details]
iscsid strace showing failure from a privileged container

Comment 6 James Slagle 2014-05-28 19:03:16 UTC
I think comment 3 was meant for another bug, maybe?  Anyway, if not:

bash-4.2# ls -ld /opt 
drwxr-xr-x. 2 root root 4096 Aug  7  2013 /opt
bash-4.2# ls -ld /opt/hello
ls: cannot access /opt/hello: No such file or directory

Comment 7 James Slagle 2014-05-28 19:08:58 UTC
Ah, actually, I suspect iscsid running forever in the foreground may indicate it *is* working.  Sorry, I wasn't thinking about the fact that -f tells it to run in the foreground.

I will see if I can actually connect to a target from the privileged container and report back.

Comment 8 James Slagle 2014-05-29 13:34:32 UTC
I'm using a privileged container running sshd as the process (so that I can log in with a couple of different shells).

I had to add this to my Dockerfile for the container, otherwise iscsid won't start:
VOLUME ["/var/lock/iscsi"]

I start the container with:
docker run --privileged -ti --name initiator -p 8022:22 -d iscsi-initiator

Then I ssh in and start iscsid:
iscsid -d 8 -f
That appears to start fine.

Then, from another ssh session, I can discover the target (the target is actually on the container host):
[root@bcf697cb8673 ~]# iscsiadm -m discovery -t st -p 192.168.122.1
192.168.122.1:3260,1 iqn.2013-07.com.example.storage.ssd1

But when I try to log in to the target, iscsid exits (or crashes, it's hard to tell):
[root@bcf697cb8673 ~]# iscsiadm -m node --targetname iqn.2013-07.com.example.storage.ssd1 --portal 192.168.122.1 --login
Logging in to [iface: default, target: iqn.2013-07.com.example.storage.ssd1, portal: 192.168.122.1,3260] (multiple)
iscsiadm: got read error (0/0), daemon died?
iscsiadm: Could not login to [iface: default, target: iqn.2013-07.com.example.storage.ssd1, portal: 192.168.122.1,3260].
iscsiadm: initiator reported error (18 - could not communicate to iscsid)
iscsiadm: Could not log into all portals

from the ssh session where iscsid is running i see:
iscsid: mgmt_ipc_write_rsp: rsp to fd 5
iscsid: poll result 1
iscsid: mgmt_ipc_write_rsp: rsp to fd 5
iscsid: poll result 1
iscsid: in read_transports
iscsid: Adding new transport tcp
iscsid: Matched transport tcp

iscsid: sysfs_attr_get_value: open '/class/iscsi_transport/tcp'/'handle'

iscsid: sysfs_attr_get_value: new uncached attribute '/sys/class/iscsi_transport/tcp/handle'

iscsid: sysfs_attr_get_value: add to cache '/sys/class/iscsi_transport/tcp/handle'

iscsid: sysfs_attr_get_value: cache '/sys/class/iscsi_transport/tcp/handle' with attribute value '18446744072107593760'

iscsid: sysfs_attr_get_value: open '/class/iscsi_transport/tcp'/'caps'

iscsid: sysfs_attr_get_value: new uncached attribute '/sys/class/iscsi_transport/tcp/caps'

iscsid: sysfs_attr_get_value: add to cache '/sys/class/iscsi_transport/tcp/caps'

iscsid: sysfs_attr_get_value: cache '/sys/class/iscsi_transport/tcp/caps' with attribute value '0x39'

iscsid: Allocted session 0x7f38ceb4f9b0
iscsid: no authentication configured...
iscsid: resolved 192.168.122.1 to 192.168.122.1
iscsid: setting iface default, dev , set ip , hw , transport tcp.

iscsid: get ev context 0x7f38ceb5c470
iscsid: set TCP recv window size to 524288, actually got 425984
iscsid: set TCP send window size to 524288, actually got 425984
iscsid: connecting to 192.168.122.1:3260
iscsid: sched conn context 0x7f38ceb5c470 event 2, tmo 0
iscsid: thread 0x7f38ceb5c470 schedule: delay 0 state 3
iscsid: Setting login timer 0x7f38ceb578e0 timeout 15
iscsid: thread 0x7f38ceb578e0 schedule: delay 60 state 3
iscsid: exec thread 7f38ceb5c470 callback
iscsid: put ev context 0x7f38ceb5c470
iscsid: connected local port 37259 to 192.168.122.1:3260
iscsid: in kcreate_session
iscsid: in __kipc_call
iscsid: in kwritev
iscsid: sendmsg: bug? ctrl_fd 4


Maybe the lines with sysfs_attr_get_value indicate something that's still needed from /sys?

These exact same discovery and login commands work fine running from a libvirt vm connecting to the same target.

On my container host I do have the correct iscsi kernel modules loaded, and I also see them in the container.
On the host:
[root@teletran-1 docker]# lsmod | grep iscsi
iscsi_tcp              18333  0 
libiscsi_tcp           24176  1 iscsi_tcp
libiscsi               54750  2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi    97405  4 iscsi_tcp,libiscsi

In the container:
[root@bcf697cb8673 ~]# lsmod | grep iscsi
iscsi_tcp              18333  0 
libiscsi_tcp           24176  1 iscsi_tcp
libiscsi               54750  2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi    97405  4 iscsi_tcp,libiscsi


I'll attach an strace of the iscsid process that's exiting, in case that helps.
I can also attach an strace of an iscsid process working from a libvirt VM, if you think that would be helpful to compare.

Comment 9 James Slagle 2014-05-29 13:37:12 UTC
Created attachment 900353 [details]
iscsid strace showing failure from a privileged container

strace of iscsid generated with:
strace -f -o iscsid.strace iscsid -d 12 -f

the iscsid process exits when you try to login to an iscsi target from the container.

Comment 10 James Slagle 2014-05-29 14:08:33 UTC
Note that the output I'm now seeing seems to match very closely what was reported in https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1226855 when the same thing was tried with lxc-tools instead of Docker.

Comment 11 Daniel Walsh 2014-05-29 19:32:31 UTC
So this looks to be more specific to iscsid and namespacing than to Docker.  I think you should open a bug with them and see if they can help.

Comment 12 Chris Leech 2014-06-04 23:44:37 UTC
It looks like the iSCSI netlink control interface isn't namespace aware, and the kernel side of the iSCSI initiator rejects messages that don't come from the default network namespace.  I suppose it might make sense to track active sessions per netns.
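(The constraint described here can be checked from the host side with a small sketch; the function names below are made up for illustration. A process passes the kernel's check only if it is in the initial network namespace, which on the host is the one PID 1 lives in.)

```python
import os

def netns_of(pid="self"):
    """Return the network-namespace identifier (e.g. 'net:[4026531956]')
    for a process, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/ns/net")
    except OSError:
        return None

def shares_host_netns(pid):
    """True if `pid` runs in the same network namespace as PID 1.
    Run this on the host: inside a container, PID 1 is the container's
    own init, so the comparison is only meaningful from the host side."""
    a, b = netns_of(pid), netns_of(1)
    return a is not None and a == b
```

For a container started without --net=host, shares_host_netns() of its processes is False, which matches the kernel rejecting its NETLINK_ISCSI messages.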

Comment 13 James Slagle 2014-06-05 11:51:06 UTC
*** Bug 1102911 has been marked as a duplicate of this bug. ***

Comment 14 Radek Vykydal 2015-03-30 12:50:41 UTC
FYI I was playing with running iscsid in superprivileged container:
https://github.com/rvykydal/dockerfile-iscsid/tree/master/rhel7

Comment 15 Chris Leech 2015-04-23 17:30:38 UTC
I've spent some more time looking at running iscsid in a network namespace container, and there are a number of kernel issues that need to be worked out.

The kernel side of the iSCSI netlink family only listens in the initial network namespace (just like some other storage related netlink families).  It's easy enough to add per-namespace kernel sockets, a bit more work to associate network namespaces to iSCSI objects in order to route async event notifications to the right place.

iSCSI makes heavy use of sysfs as well, and many of the iSCSI sysfs devices will need network namespace tags for filtered views of sysfs.  I think that makes sense for iscsi_host, iscsi_session, and iscsi_connection.  Certainly not for iscsi_transport.  Possibly for iscsi_endpoint and iscsi_iface.

Without growing some way to assign an iscsi_host to a net namespace like is done for network devices, this will probably work for dynamically generated hosts (iscsi_tcp) but not for offload hardware.

I can imagine use cases where it might be nice to have multiple iscsid instances in their own containers managing their own set of iSCSI sessions.

I've got some work started, but it's not ready for review and testing just yet.

Without that, I don't see how we can get to multiple working iscsid processes, or even a single iscsid running without --net=host.
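(The sysfs classes discussed above can be enumerated with a short sketch; this assumes a mounted /sys, and the per-namespace filtering it alludes to is proposed kernel behaviour, not something current kernels do.)

```python
from pathlib import Path

ISCSI_CLASSES = (
    "iscsi_host", "iscsi_session", "iscsi_connection",
    "iscsi_transport", "iscsi_endpoint", "iscsi_iface",
)

def visible_iscsi_classes(sysfs="/sys/class"):
    """Report which iSCSI sysfs classes exist and what entries each
    exposes.  With netns tagging, the view of the tagged classes
    (host/session/connection) would be filtered per namespace instead
    of being global as it is today."""
    out = {}
    for name in ISCSI_CLASSES:
        p = Path(sysfs) / name
        if not p.is_dir():
            continue
        try:
            out[name] = sorted(e.name for e in p.iterdir())
        except OSError:
            out[name] = []
    return out
```

On a host with the modules from comment 8 loaded, this shows at least iscsi_transport with a 'tcp' entry; on a machine without them, it returns an empty dict.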

Comment 16 Fedora End Of Life 2015-05-29 11:55:11 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Vaibhav Khanduja 2015-07-09 20:29:41 UTC
(In reply to Chris Leech from comment #15)
> I've spent some more time looking at running iscsid in a network namespace
> container, and there are a number of kernel issues that need to be worked
> out.
> [...]
> Without that, I don't see how we can get to multiple working iscsid
> processes or even a single iscsid running without --net=host

Do you have a working patch? I would like to test it out.

Comment 18 Chris Leech 2015-07-13 18:09:21 UTC
(In reply to Vaibhav Khanduja from comment #17)
> Do you have a working patch? I would like to test it out.

In progress patches sent via email.

Comment 19 paguayo 2016-04-11 18:39:46 UTC
(In reply to Chris Leech from comment #18)
> (In reply to Vaibhav Khanduja from comment #17)
> > Do you have a working patch? I would like to test it out.
> 
> In progress patches sent via email.

Any progress on this patch?

Comment 20 Fedora End Of Life 2016-07-19 11:32:46 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 21 Daniel Walsh 2016-07-19 14:18:03 UTC
Is this something we should continue to care about?

Comment 22 Jan Kurik 2016-07-26 04:08:27 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 23 Ayyanar 2017-02-09 09:47:45 UTC
(In reply to Chris Leech from comment #18)
> (In reply to Vaibhav Khanduja from comment #17)
> > Do you have a working patch? I would like to test it out.
> 
> In progress patches sent via email.

Hi Chris, Do you have these patches ready?

Thanks.

Comment 24 Fedora End Of Life 2017-11-16 18:53:12 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 25 Fedora End Of Life 2017-12-12 10:17:29 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 26 Vasanth 2018-08-21 09:55:23 UTC
Is this issue fixed in any of the recent releases?

