Bug 917538

Summary: device mapper multipath fails to create 1024 mpaths on s390x
Product: Red Hat Enterprise Linux 7
Reporter: Bruno Goncalves <bgoncalv>
Component: device-mapper-multipath
Assignee: Peter Rajnoha <prajnoha>
Status: CLOSED CURRENTRELEASE
QA Contact: Bruno Goncalves <bgoncalv>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.0
CC: agk, bmarzins, harald, heinzm, msnitzer, prajnoha, sauchter, zkabelac
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: s390x
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 13:21:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Bruno Goncalves 2013-03-04 10:10:24 UTC
Description of problem:
Trying to log in to 1024 LUNs produces the following messages:
Mar  4 05:06:04 ibm-z10-32 systemd-udevd[2315]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfp' [8027]
Mar  4 05:06:06 ibm-z10-32 systemd-udevd[1949]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfm' [8017]
Mar  4 05:06:06 ibm-z10-32 systemd-udevd[1970]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgb' [8029]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[2164]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgg' [8016]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1903]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbft' [8024]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1885]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbex' [8018]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[2315]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfp' [8027]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1887]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbef' [8019]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1970]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgb' [8029]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1903]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbft' [8024]
Mar  4 05:06:07 ibm-z10-32 systemd-udevd[1970]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgb' [8029]
Mar  4 05:06:08 ibm-z10-32 systemd-udevd[1933]: timeout: killing '/sbin/multipath -c /dev/sdbdw' [8023]
Mar  4 05:06:10 ibm-z10-32 systemd-udevd[1933]: timeout: killing '/sbin/multipath -c /dev/sdbdw' [8023]
Mar  4 05:06:12 ibm-z10-32 systemd-udevd[2315]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfp' [8027]
Mar  4 05:06:13 ibm-z10-32 systemd-udevd[1903]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbft' [8024]
Mar  4 05:06:15 ibm-z10-32 systemd-udevd[1970]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgb' [8029]
Mar  4 05:06:17 ibm-z10-32 systemd-udevd[1971]: timeout: killing '/sbin/multipath -c /dev/sdbea' [8011]
Mar  4 05:06:18 ibm-z10-32 systemd-udevd[1885]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbex' [8018]
Mar  4 05:06:21 ibm-z10-32 systemd-udevd[1949]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfm' [8017]
Mar  4 05:06:23 ibm-z10-32 systemd-udevd[1887]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbef' [8019]

The system probably gets very busy, and no other command seems to respond.

When device-mapper-multipath is removed, all the LUNs log in properly.
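
To see which udev helpers are being killed most often, the timeout messages above can be tallied by command. The following is only a sketch: the function name count_udev_timeouts is made up for illustration, and it assumes the log lines keep the "timeout: killing '<command>'" layout shown in this report.

```shell
# Tally systemd-udevd "timeout: killing" messages by helper command.
# Reads syslog-style lines on stdin and prints "<count> <command>",
# most frequent first. count_udev_timeouts is a hypothetical helper name.
count_udev_timeouts() {
    # Split on single quotes so that $2 is the killed command line,
    # then keep only its first word (the helper binary).
    awk -F"'" '/timeout: killing / { split($2, a, " "); n[a[1]]++ }
               END { for (c in n) print n[c], c }' | sort -rn
}
```

For example, it could be fed from the journal or from a saved log file: `journalctl -b | count_udev_timeouts` or `count_udev_timeouts < /var/log/messages`.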

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.9-42.el7.s390x

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL 7
2. Enable iscsid to support 1024 LUNs
service iscsid stop
Stopping iscsid (via systemctl):  [  OK  ]

modprobe iscsi_tcp max_lun=1024

echo 1024 > /sys/module/scsi_mod/parameters/max_report_luns

3. Discover the target with 1024 LUNs
iscsiadm -m discovery -I default -p <target portal> -t st
  
4. Log in to the target
iscsiadm -m node -l

Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.41.222,3260] (multiple)
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.43.127,3260] (multiple)
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.41.222,3260] successful.
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.151753773, portal: 10.16.43.127,3260] successful.

Actual results:
systemd seems to try to remove the devices due to the timeouts, and the server is not able to run any other command.

Expected results:
1024 mpath devices, with 2 paths in each.
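
One way to check this expectation is to walk sysfs and count the multipath maps and the paths under each. This is a sketch under assumptions, not part of the original report: it assumes multipath maps can be recognised by a device-mapper UUID starting with "mpath-" in /sys/block/dm-N/dm/uuid, and that each path shows up as an entry in /sys/block/dm-N/slaves. The function name check_mpath_paths and its arguments are hypothetical.

```shell
# Count multipath maps and report any whose path count differs from the
# expected value. check_mpath_paths is a hypothetical helper name.
#   $1 = expected paths per map (default 2)
#   $2 = sysfs block directory (default /sys/block; overridable for testing)
# The body runs in a subshell so its variables do not leak to the caller.
check_mpath_paths() (
    expected=${1:-2}
    sysblock=${2:-/sys/block}
    maps=0
    bad=0
    for dev in "$sysblock"/dm-*; do
        [ -r "$dev/dm/uuid" ] || continue
        # Assumption: multipath maps carry a UUID with the "mpath-" prefix.
        case "$(cat "$dev/dm/uuid")" in
            mpath-*) ;;
            *) continue ;;
        esac
        maps=$((maps + 1))
        paths=$(ls "$dev/slaves" 2>/dev/null | wc -l)
        paths=$((paths))   # normalise any padding from wc
        if [ "$paths" -ne "$expected" ]; then
            bad=$((bad + 1))
            echo "$(basename "$dev"): $paths paths (expected $expected)"
        fi
    done
    echo "maps=$maps bad=$bad"
)
```

On a system that meets the expected result, `check_mpath_paths 2` should end with the summary line `maps=1024 bad=0`.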

Additional info:
The following messages appear when installing device-mapper-multipath:
Mar  4 04:51:03 ibm-z10-32 systemd[1]: [/usr/lib/systemd/system/beah-srv.service:4] Failed to add dependency on beah-beaker-backend, ignoring: Invalid argument
Mar  4 04:51:03 ibm-z10-32 systemd[1]: [/usr/lib/systemd/system/beah-srv.service:4] Failed to add dependency on beah-fwd-backend, ignoring: Invalid argument
Mar  4 04:51:03 ibm-z10-32 yum[15464]: Installed: device-mapper-multipath-0.4.9-42.el7.s390x

Comment 2 Bruno Goncalves 2013-03-04 10:37:48 UTC
This call trace also appeared from time to time:

Mar  4 05:40:33 ibm-z10-32 systemd-udevd[1922]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbfh' [8026]
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685839] INFO: task kworker/0:5:2077 blocked for more than 120 seconds.
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685880] kworker/0:5     D 00000000005f5d32     0  2077      2 0x00000200
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685923]        0000000002f28500 0000000037f79880 0000000002f28570 0000000037f79880 
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685923]        0000000000174b5a 000000001c47f930 000000001c47f958 0000000037f79880 
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685923]        0000000002f28570 000000000096a500 000000000096a500 000000000096a500 
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685923]        000000001c7c48a8 00000000008b9e80 0000000002f28500 0000000037f79838 
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.685923]        00000000006058b8 00000000005f7a56 000000001c47f998 000000001c47faf8 
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686051] Call Trace:
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686057] ([<00000000005f7a56>] __schedule+0x56a/0xab8)
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686074]  [<00000000005f5d32>] schedule_timeout+0x22a/0x2ac
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686084]  [<00000000005f7204>] wait_for_common+0x114/0x190
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686095]  [<0000000000158dde>] kthread_create_on_node+0xb2/0x14c
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686111]  [<000000000014d170>] create_worker+0x12c/0x288
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686124]  [<000000000014fcd4>] manage_workers+0x1c4/0x358
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686219]  [<0000000000150c86>] worker_thread+0x41e/0x460
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686221]  [<0000000000158b46>] kthread+0xda/0xe4
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686224]  [<00000000005f98ce>] kernel_thread_starter+0x6/0xc
Mar  4 05:40:33 ibm-z10-32 kernel: [ 2880.686227]  [<00000000005f98c8>] kernel_thread_starter+0x0/0xc
Mar  4 05:40:34 ibm-z10-32 systemd-udevd[1971]: timeout: killing '/sbin/multipath -c /dev/sdbea' [8011]
Mar  4 05:40:35 ibm-z10-32 systemd-udevd[1970]: timeout: killing 'scsi_id --export --whitelisted -d /dev/sdbgb' [8029]

Comment 3 Peter Rajnoha 2013-03-04 10:42:12 UTC
This looks like another instance of bug #885978, but with scsi_id instead of blkid as it's seen in the other bug report...

Comment 4 Harald Hoyer 2013-03-22 09:34:50 UTC
(In reply to comment #3)
> This looks like another instance of bug #885978, but with scsi_id instead of
> blkid as it's seen in the other bug report...

Well, it seems that device-mapper-multipath is the culprit:

(In reply to comment #0)
> When device-mapper-multipath is removed all the LUNs login properly.

Comment 5 Harald Hoyer 2013-03-25 16:55:38 UTC
This patch might help in systemd-udevd > 198
http://cgit.freedesktop.org/systemd/systemd/commit/?id=8cc3f8c0bcd23bb68166cb197a4c541d7621b19c

Comment 6 Peter Rajnoha 2013-05-06 07:49:21 UTC
Is this still reproducible with systemd > 198?

Comment 7 Bruno Goncalves 2013-05-07 11:35:52 UTC
It seems multipathd is not working properly with the latest version:

May  7 11:34:31 ibm-z10-24 systemd[1]: Stopping Device-Mapper Multipath Device Controller...
May  7 11:34:31 ibm-z10-24 multipathd: --------shut down-------
May  7 11:34:31 ibm-z10-24 systemd[1]: Starting Device-Mapper Multipath Device Controller...
May  7 11:34:31 ibm-z10-24 systemd[1]: PID file /var/run/multipathd.pid not readable (yet?) after start.
May  7 11:34:31 ibm-z10-24 systemd[1]: Started Device-Mapper Multipath Device Controller.
May  7 11:34:31 ibm-z10-24 multipathd: DM multipath kernel driver not loaded
May  7 11:34:31 ibm-z10-24 multipathd: path checkers start up

[root@ibm-z10-24 ~]# multipath -l
May 07 11:34:45 | DM multipath kernel driver not loaded
May 07 11:34:45 | DM multipath kernel driver not loaded

[root@ibm-z10-24 ~]# cat /var/run/multipathd.pid
2102

[root@ibm-z10-24 ~]# ps -ef | grep 2102
root      2102     1  0 11:34 ?        00:00:00 /sbin/multipathd


rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-49.el7.s390x

rpm -q systemd
systemd-202-3.el7.s390x

Comment 8 Bruno Goncalves 2013-05-07 11:46:33 UTC
Loading the kernel module manually solves this problem.

modprobe dm-multipath

Comment 9 Bruno Goncalves 2013-05-07 11:52:36 UTC
The original issue no longer reproduces with:


rpm -q device-mapper-multipath
device-mapper-multipath-0.4.9-49.el7.s390x

rpm -q systemd
systemd-202-3.el7.s390x

However, should I open a new BZ for the kernel module not being loaded automatically?

Comment 10 Ben Marzinski 2013-05-08 18:59:02 UTC
(In reply to comment #9)
> The original issue is not reproduced any more on
> 
> 
> rpm -q device-mapper-multipath
> device-mapper-multipath-0.4.9-49.el7.s390x
> 
> rpm -q systemd
> systemd-202-3.el7.s390x
> 
> Although, should I open a new BZ for the kernel module not being loaded
> automatically?

Sure. The module issue is multipath's fault.  It checks the version and fails if it's not loaded.  However, the kernel module does autoload when you try to create a multipath device. Or, it should.

With the dm-multipath module unloaded, can you try

# service multipathd start
# multipath -l

multipathd doesn't fail out if the driver isn't loaded, and as soon as it tries to create a multipath device, the module should get loaded correctly.

If that doesn't work, then there's a kernel issue. Otherwise, multipath just needs to load the kernel module when it's run.
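
Until multipath loads the module itself, a guard along these lines could be used before running it. This is a minimal sketch: the function name dm_multipath_loaded is hypothetical, and it assumes the driver is built as a loadable module that is listed as dm_multipath in /proc/modules (a built-in driver would not appear there).

```shell
# Return 0 if dm_multipath is listed in a /proc/modules-style file.
# dm_multipath_loaded is a hypothetical helper name.
#   $1 = file to inspect (default /proc/modules; overridable for testing)
dm_multipath_loaded() {
    # Anchor at line start so that modules which merely list dm_multipath
    # as a user or dependency do not match.
    grep -q '^dm_multipath ' "${1:-/proc/modules}"
}
```

Usage, as root: `dm_multipath_loaded || modprobe dm-multipath`, followed by `multipath -l`.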

Comment 11 Bruno Goncalves 2013-05-09 06:44:05 UTC
(In reply to comment #10)

> With the dm-multipath module unloaded, can you try
> 
> # service multipathd start
> # multipath -l
> 
> multipathd doesn't fail out if the driver isn't loaded, and as soon as it
> tries to create a multipath device, the module should get loaded correctly.
> 
> If that doesn't work, then there's a kernel issue. Otherwise, multipath just
> needs to load the kernel module when it's run.

That was the problem: I ran multipath -l after "service multipathd restart", as the server was configured to start the multipathd service on boot.

So it seems there is a kernel issue.

Comment 12 Bruno Goncalves 2013-05-09 07:13:15 UTC
I think this BZ can be closed, as the original issue has been fixed.

I've just opened a new BZ#961218 to address the dm-multipath module issue.

Comment 13 Ludek Smid 2014-06-13 13:21:48 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.