Bug 1673978

Summary: ReaR unable to create the Rescue ISO when having a lot of multipath devices
Product: Red Hat Enterprise Linux 8
Reporter: Pavel Cahyna <pcahyna>
Component: rear
Assignee: Pavel Cahyna <pcahyna>
Status: CLOSED ERRATA
QA Contact: David Jež <djez>
Severity: medium
Priority: urgent
Version: 8.0
CC: djez, fkrska, jwboyer, ovasik, pcahyna, qe-baseos-apps, rmetrich, toneata
Target Milestone: rc
Keywords: Patch, Regression, Reproducer, ZStream
Target Release: 8.0
Hardware: All
OS: Linux
Fixed In Version: rear-2.4-8.el8
Doc Type: If docs needed, set a value
Clone Of: 1672938
: 1691303
Last Closed: 2019-11-05 21:04:10 UTC
Type: Bug
Bug Depends On: 1672938, 1681544    
Bug Blocks: 1691303, 1701002    

Comment 1 Pavel Cahyna 2019-02-12 14:35:09 UTC
We are having problems with the reproducer on RHEL 8 due to bz1675071. By the way, <rmetrich>, this may affect your customer as well if the problem also occurs with other SCSI drivers (I don't know that yet). Basically, the problem is that on systems with such a large number of SCSI devices the kernel consumes about a gigabyte of memory.

Comment 2 Renaud Métrich 2019-02-12 14:40:35 UTC
We could try with real SCSI devices, for example lots of iSCSI LUNs.

Comment 3 Pavel Cahyna 2019-02-12 14:41:39 UTC
That has been my idea as well, but I need to work on other stuff now...

Comment 4 Renaud Métrich 2019-02-12 14:46:11 UTC
Not promising anything, but I will try to set something up.

Comment 5 Pavel Cahyna 2019-02-12 14:48:21 UTC
That would be great. My idea was to set up one machine with lots of scsi_debug LUNs (with enough RAM, using the workaround or running RHEL 7 so as not to suffer from the bug), export them via an iSCSI target, and let another machine connect to it using an iSCSI initiator.
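A minimal sketch of that setup, under assumptions that are not taken from this bug: the scsi_debug disks already exist on the server (the modprobe line in the comments is only an example; choose counts that fit in RAM), and the helper names list_scsi_debug_devs, gen_backstores and gen_target, as well as the example IQNs, are hypothetical. Like iscsi.sh in Comment 6, the functions only print targetcli commands, so the output can be reviewed before it is piped into targetcli. Each additional target exporting the same backstores should give the initiator one more path per LUN.

```bash
#!/bin/bash
# Sketch only: export existing block devices (e.g. scsi_debug disks) over iSCSI
# by printing targetcli commands, in the style of iscsi.sh.
# Assumes the disks already exist, e.g. after something like:
#   modprobe scsi_debug add_host=1 num_tgts=1 max_luns=<N> dev_size_mb=1

# list_scsi_debug_devs [SYS_BLOCK_DIR]: print the /dev path of every sdX disk
# whose SCSI model string is "scsi_debug" (looked up under /sys/block).
list_scsi_debug_devs() {
    local sysblock=${1:-/sys/block} d model
    for d in "$sysblock"/sd*; do
        [ -r "$d/device/model" ] || continue
        read -r model < "$d/device/model"   # read also trims the space padding
        if [ "$model" = scsi_debug ]; then
            echo "/dev/${d##*/}"
        fi
    done
}

# gen_backstores DEVICE...: one block backstore "sdebug<N>" per device.
gen_backstores() {
    local n=0 dev
    for dev in "$@"; do
        n=$((n + 1))
        echo "backstores/block create name=sdebug${n} dev=${dev}"
    done
}

# gen_target TARGET_IQN CLIENT_IQN COUNT: a target that admits the client
# initiator and exports backstores sdebug1..sdebugCOUNT as LUNs.
gen_target() {
    local tgt=$1 client=$2 count=$3 i
    echo "iscsi/ create ${tgt}"
    echo "iscsi/${tgt}/tpg1/acls create ${client}"
    for i in $(seq 1 "$count"); do
        echo "iscsi/${tgt}/tpg1/luns create /backstores/block/sdebug${i}"
    done
}

# Example (as root): 3 targets exporting the same LUNs, i.e. 3 paths per LUN.
# CLIENT_IQN must match /etc/iscsi/initiatorname.iscsi on the client.
#   devs=$(list_scsi_debug_devs)
#   n=$(echo $devs | wc -w)
#   { gen_backstores $devs
#     for t in 1 2 3; do
#       gen_target iqn.2019-02.com.example:tgt$t iqn.2019-02.com.example:client "$n"
#     done; } | targetcli
```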

Comment 6 Renaud Métrich 2019-02-13 13:22:49 UTC
I cannot reproduce the good old slowness with iSCSI, but we can still see a difference.

Server setup:
------------

# yum -y install targetcli

# cat iscsi.sh

#!/bin/bash

# The fileio backstores below keep their image files in this directory
mkdir -p /root/iscsi

(
# 100 file-backed LUNs of 1 MiB each
for vol in $(seq 1 100); do
	echo "backstores/fileio create file_or_dev=/root/iscsi/vol${vol}.img size=1M name=vol${vol}"
done

# 4 targets; each one admits the client initiator and exports all 100 LUNs
for tgt in $(seq 1 4); do
	echo "iscsi/ create iqn.2019-02.com.bz1673978:tgt${tgt}"
	echo "iscsi/iqn.2019-02.com.bz1673978:tgt${tgt}/tpg1/acls create iqn.2019-02.com.bz1673978:client"
	for vol in $(seq 1 100); do
		echo "iscsi/iqn.2019-02.com.bz1673978:tgt${tgt}/tpg1/luns create /backstores/fileio/vol${vol}"
	done
done
) | targetcli

# chmod +x iscsi.sh; ./iscsi.sh

--> creates 100 LUNs and 4 targets, each target exporting all 100 LUNs

# systemctl start target


Client setup:
------------

# yum -y install iscsi-initiator-utils device-mapper-multipath

# echo "InitiatorName=iqn.2019-02.com.bz1673978:client" > /etc/iscsi/initiatorname.iscsi
# iscsiadm -m discovery -t st -p <server_ipaddr>
# for i in $(seq 1 4); do iscsiadm -m node -T iqn.2019-02.com.bz1673978:tgt$i -p <server_ipaddr> -l; done



WITHOUT THE FIX:

# time rear mkrescue

real	4m14.741s
user	1m56.594s
sys	1m57.890s


WITH THE FIX:

# time rear mkrescue

real	1m38.588s
user	0m50.723s
sys	0m31.869s
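To put a number on the difference, the bash time values can be converted to seconds and divided. A small sketch, with made-up helper names to_seconds and speedup: for the real times above it gives 254.741 s versus 98.588 s, a speedup of about 2.58x. These are single runs, so the ratio is only indicative.

```bash
#!/bin/bash
# to_seconds VALUE: convert a bash "time" value such as 4m14.741s to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")        # p[1] = minutes, p[2] = seconds followed by "s"
        sub(/s$/, "", p[2])     # strip the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# speedup OLD NEW: how many times faster NEW is than OLD, to two decimals.
speedup() {
    local old new
    old=$(to_seconds "$1")
    new=$(to_seconds "$2")
    awk -v a="$old" -v b="$new" 'BEGIN { printf "%.2f\n", a / b }'
}

# Example: speedup 4m14.741s 1m38.588s   (real time without / with the fix)
```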

Comment 7 Pavel Cahyna 2019-02-13 14:14:14 UTC
I suspect you are not actually using multipath. How many /dev/dm-* devices do you have? Use 

multipath -l 

to verify.

In my test case, I am enabling multipath with

mpathconf --enable --with_multipathd y

I don't think installing device-mapper-multipath alone is enough.
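Besides multipath -l, a crude cross-check is to count device nodes. The sketch below (count_devs is a hypothetical helper) counts /dev/dm-* nodes and whole /dev/sdX disks, skipping partitions. It assumes most dm-* nodes come from multipath maps, although LVM volumes and other device-mapper devices are counted as well. With the setup from Comment 6 and multipath really active, one would expect on the order of 400 sdX paths and at least 100 dm devices (one map per LUN).

```bash
#!/bin/bash
# count_devs [DEV_DIR]: count device-mapper nodes (dm-*) and whole SCSI disks
# (sdX without partition numbers) under a /dev-like directory.
count_devs() {
    local devdir=${1:-/dev} f dm=0 sd=0
    for f in "$devdir"/dm-*; do
        if [ -e "$f" ]; then dm=$((dm + 1)); fi   # skip the unmatched glob
    done
    for f in "$devdir"/sd*[a-z]; do               # sda, sdaa, ... but not sda1
        if [ -e "$f" ]; then sd=$((sd + 1)); fi
    done
    echo "dm=$dm sd=$sd"
}

# Example: count_devs      # on the multipath client
```

For a more precise count, dmsetup ls --target multipath lists only the multipath maps.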

However, the question regarding iSCSI was mainly about the memory consumption of such a large number of iSCSI devices at the initiator: whether you also see memory consumption on the order of gigabytes, as we do with the scsi_debug driver.
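One way to check for this memory consumption on the initiator is to compare /proc/meminfo before and after logging in to all targets. A sketch with a made-up helper name, kernel_mem; treating Slab and SUnreclaim as the place where the per-device memory shows up is an assumption, which is why MemAvailable is printed as well.

```bash
#!/bin/bash
# kernel_mem [MEMINFO]: print MemAvailable, Slab and SUnreclaim in MiB from a
# meminfo-format file (values there are in kB). Defaults to /proc/meminfo.
kernel_mem() {
    awk '$1 == "MemAvailable:" || $1 == "Slab:" || $1 == "SUnreclaim:" {
        printf "%s %d MiB\n", $1, $2 / 1024
    }' "${1:-/proc/meminfo}"
}

# Example:
#   kernel_mem > before.txt
#   (log in to all targets with iscsiadm)
#   kernel_mem > after.txt
#   diff before.txt after.txt
```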

Comment 8 Renaud Métrich 2019-02-13 14:17:40 UTC
Multipath is enabled on my system (I used a VM that already had multipath disks).
And I don't see any memory impact.

Comment 9 Pavel Cahyna 2019-02-13 14:50:02 UTC
> creates 100 luns and 4 targets

The number of paths per LUN is equal to the number of targets, right? i.e. 4 paths per LUN.

I suspect that 100 LUNs are not enough to see the problem; IIUC the old code has quadratic complexity in the number of devices. So my test, which uses 1200 devices in total (400 LUNs x 3 paths per LUN), should be about 9x slower than your case, if yours uses 400 devices (100 LUNs x 4 paths).
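The 9x estimate is the ratio of the device counts squared: (1200 / 400)^2 = 3^2 = 9, whereas a linear algorithm would only be 3x slower. The helper below (predicted_ratio is a hypothetical name) makes that arithmetic explicit; the exponent is an input, not a measured property of the old ReaR code.

```bash
#!/bin/bash
# predicted_ratio N_SMALL N_LARGE [POWER]: expected run-time ratio between two
# device counts if the cost grows as n^POWER (2 = quadratic, the default;
# 1 = linear).
predicted_ratio() {
    local small=$1 large=$2 power=${3:-2}
    awk -v a="$small" -v b="$large" -v p="$power" \
        'BEGIN { printf "%.1f\n", (b / a) ^ p }'
}

# Example: predicted_ratio 400 1200     # quadratic
#          predicted_ratio 400 1200 1   # linear
```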

> And I don't see any memory impact.

Is that on RHEL8?

Comment 32 errata-xmlrpc 2019-11-05 21:04:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3413