Bug 1561162

Summary: [RHEL7.5] Extreme performance impact caused by raid resync
Product: Red Hat Enterprise Linux 7 Reporter: John Pittman <jpittman>
Component: kernel Assignee: Nigel Croxon <ncroxon>
kernel sub component: Multiple Devices (MD) QA Contact: guazhang <guazhang>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: alex.wang, bhu, brdeoliv, bubrown, cmarthal, deepak.prasad, dhoward, guazhang, heinzm, jbrassow, jfeeney, jkrysl, jmagrini, jpittman, jshortt, jstodola, loberman, mbanas, mpoole, ncroxon, pasik, salmy, tumeya, xni
Version: 7.4 Keywords: Regression, ZStream
Target Milestone: rc   
Target Release: 7.6   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-3.10.0-951.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1710345 Environment:
Last Closed: 2018-10-30 08:52:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1532680, 1710345    
Attachments:
Description       Flags
Proposed Patch    none
Proposed Patch    none

Description John Pittman 2018-03-27 19:05:31 UTC
Description of problem:

This issue is a continuation of bz1455679.  Through continued research, it was found that the proper fix here is to revert the patch that causes the issue, ac8fa4196d20 ("md: allow resync to go faster when there is competing IO.").

Version-Release number of selected component (if applicable):

kernel-3.10.0-693.17.1.el7

How reproducible:

Issue a large amount of I/O during raid resync.  The easiest way to reproduce is to install to an md device that is resyncing and watch the partition creation and filesystem formatting time.
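
While reproducing, it can help to confirm that the array really is resyncing during the install. A minimal sketch, assuming the md sysfs attributes sync_action, sync_speed and sync_completed are available under /sys/block/<array>/md/. The array name md0 and the helper names are hypothetical and do not come from this report.

```python
from pathlib import Path
from typing import Dict, Optional

# md sysfs attribute files read by this sketch (under /sys/block/<array>/md/).
SYNC_ATTRS = ("sync_action", "sync_speed", "sync_completed")


def read_md_sync_state(md_dir: Path) -> Dict[str, Optional[str]]:
    """Return the raw contents of the sync attributes, or None if unreadable."""
    state: Dict[str, Optional[str]] = {}
    for name in SYNC_ATTRS:
        try:
            state[name] = (md_dir / name).read_text().strip()
        except OSError:
            # Attribute missing or not readable (for example, no such array).
            state[name] = None
    return state


def is_resyncing(state: Dict[str, Optional[str]]) -> bool:
    """True when sync_action reports an active resync."""
    return state.get("sync_action") == "resync"


if __name__ == "__main__":
    # "md0" is a hypothetical array name; substitute the array under test.
    state = read_md_sync_state(Path("/sys/block/md0/md"))
    print(state, "resyncing" if is_resyncing(state) else "not resyncing")
```

Polling this while the installer creates partitions and filesystems shows whether the slow phase overlaps with an active resync.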

Prior to patch revert:

19:58:39,249 INFO anaconda: Creating disklabel on /dev/sdb
....snip
20:23:30,488 INFO anaconda: Created swap on /dev/md/3
~25 minutes

After patch revert:

16:45:17,990 INFO anaconda: Creating disklabel on /dev/sdb
....snip
16:45:36,935 INFO anaconda: Created swap on /dev/md/3
~20 seconds
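
The approximate durations above can be checked directly from the anaconda timestamps. A small sketch, assuming each pair of log lines comes from a single install run with at most one midnight rollover. The function name is illustrative.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for anaconda-style 'HH:MM:SS,mmm' timestamps."""
    fmt = "%H:%M:%S,%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Assumption: if the end time is earlier, the run crossed midnight once.
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


# Timestamps copied from the log excerpts in this report.
before = elapsed("19:58:39,249", "20:23:30,488")
after = elapsed("16:45:17,990", "16:45:36,935")

print(f"before revert: {before.total_seconds():.3f} s")  # 1491.239 s
print(f"after revert:  {after.total_seconds():.3f} s")   # 18.945 s
print(f"slowdown:      {before / after:.1f}x")           # 78.7x
```

That is 24 minutes 51 seconds against roughly 19 seconds for the same partitioning and formatting steps.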

Expected results:

No detectable impact on I/O while raid resync is running.

Additional info:

Upstream discussion at https://marc.info/?l=linux-raid&m=152120839121813&w=2

Comment 6 John Pittman 2018-03-31 00:38:08 UTC
Second attempt to get the fix accepted upstream:
https://marc.info/?l=linux-raid&m=152235148327824&w=2

If they don't accept it, maybe we can get something temporary into the RH stream.

Comment 10 guazhang@redhat.com 2018-05-29 02:09:08 UTC
Hello

What performance do you expect while the raid is resyncing?
Or what is the baseline time for the installation?

Comment 36 guazhang@redhat.com 2018-08-29 03:26:53 UTC
Hello


# copy the test kernel to the directory /var/www/html/guazhang/repo/test_kernel
# createrepo -u -o -d /var/www/html/guazhang/repo/
#  setenforce 0
#  lorax --product="RHEL" --version=7.6 --release=7.6 --source=http://download.lab.bos.redhat.com/rel-eng/RHEL-7.6-20180810.0/compose/Server/x86_64/os/ --source=http://download.lab.bos.redhat.com/rel-eng/RHEL-7.6-20180810.0/compose/Server-optional/x86_64/os/ --source http://pnate-control-01.lab.bos.redhat.com/guazhang/repo  --variant=Server --nomacboot --buildarch=x86_64 --volid=RHEL-7.6_Server.x86_64 ./updated_kernel




ks.cfg

lang en_US.UTF-8
keyboard us
url --url="http://pnate-control-01.lab.bos.redhat.com/guazhang/repo"
firewall --disabled
firstboot --disable
rootpw redhat
timezone Europe/Prague
reboot
clearpart --all
timezone America/New_York
ignoredisk --only-use=sda
bootloader --append="loglevel=5 crashkernel=auto" --location=mbr
zerombr
clearpart --all --initlabel
part raid.225 --ondisk=sda --size=512
part raid.231 --ondisk=sdb --size=512
part raid.261 --ondisk=sda --size=12288
part raid.255 --ondisk=sdb --size=12288
part raid.249 --ondisk=sda --size=12288
part raid.267 --ondisk=sdb --size=12288
part raid.237 --ondisk=sda --size=205824
part raid.243 --ondisk=sdb --size=205824
raid pv.285 --device=2 --fstype="lvmpv" --level=RAID1 raid.237 raid.243
raid / --device=1 --fstype="ext4" --level=RAID1 --mkfsoptions="-E nodiscard" raid.249 raid.255
raid /boot --device=0 --fstype="ext4" --level=RAID1 --mkfsoptions="-E nodiscard" raid.225 raid.231
raid swap --device=3 --fstype="swap" --level=RAID1 --mkfsoptions="-E nodiscard" raid.261 raid.267
volgroup vg00 --pesize=65536 pv.285
logvol /var/log  --fstype="ext4" --grow --size=1 --mkfsoptions="-E nodiscard" --name=log --vgname=vg00
logvol /usr  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=usr --vgname=vg00
logvol /tmp  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=tmp --vgname=vg00
logvol /opt  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=opt --vgname=vg00
logvol /var  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=var --vgname=vg00
logvol /home  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=home --vgname=vg00
logvol /opt/mgtservices  --fstype="ext4" --size=30000 --mkfsoptions="-E nodiscard" --name=mgtservices --vgname=vg00


I get the following errors:
      
Traceback (most recent call last): 
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 227, in run 
    threading.Thread.run(self, *args, **kwargs) 
Please make your choice from above:   File "/usr/lib64/python2.7/threading.py", line 765, in run 
    self.__target(*self.__args, **self.__kwargs) 
  File "/usr/lib64/python2.7/site-packages/pyanaconda/ui/tui/spokes/software.py", line 238, in checkSoftwareSelection 
    self.payload.checkSoftwareSelection() 
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1418, in checkSoftwareSelection 
    self._applyYumSelections() 
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1360, in _applyYumSelections 
    self._selectYumGroup("core") 
  File "/usr/lib64/python2.7/site-packages/pyanaconda/packaging/yumpayload.py", line 1247, in _selectYumGroup 
    raise NoSuchGroup(groupid, required=required) 
pyanaconda.packaging.NoSuchGroup: core 
      
beaker job 
https://beaker.engineering.redhat.com/jobs/2728229

I have just tested the ISO on the beaker server, but it seems the ISO build failed. Could someone have a look at the errors?

thanks
Guazhang
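
A possible cause of the NoSuchGroup error, offered as an assumption and not confirmed in this report: the repository given on the kickstart url line may lack group (comps) metadata, because the createrepo command above was run without a groups file. A minimal sketch that checks a repodata/repomd.xml document for a group entry. The sample XML is illustrative and is not taken from the actual repository.

```python
import xml.etree.ElementTree as ET

# XML namespace used by yum repomd.xml files.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def has_group_metadata(repomd_xml: str) -> bool:
    """True if repomd.xml lists a 'group' or 'group_gz' data entry."""
    root = ET.fromstring(repomd_xml)
    types = {data.get("type") for data in root.iter(f"{REPO_NS}data")}
    return bool(types & {"group", "group_gz"})


# Illustrative repomd.xml without comps data (not from the real repo).
SAMPLE_WITHOUT_GROUPS = """
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="filelists"><location href="repodata/filelists.xml.gz"/></data>
</repomd>
"""

if __name__ == "__main__":
    print("group metadata present:", has_group_metadata(SAMPLE_WITHOUT_GROUPS))
```

If the entry is missing, regenerating the metadata with a groups file (createrepo -g with a comps file) is the usual way to add it.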

Comment 37 guazhang@redhat.com 2018-08-29 10:51:53 UTC
Hello

I want to know how to replace the default kernel in DVD.iso with the test kernel, then boot with the test kernel for testing.

Could someone provide detailed commands or a guide?

thanks
Guazhang

Comment 38 Nigel Croxon 2018-08-29 13:07:53 UTC
Created attachment 1479489 [details]
Proposed Patch

Comment 40 Nigel Croxon 2018-08-29 14:28:32 UTC
Created attachment 1479511 [details]
Proposed Patch

Comment 41 Bruno Meneguele 2018-09-07 15:16:45 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 43 Bruno Meneguele 2018-09-10 14:27:30 UTC
Patch(es) available on kernel-3.10.0-945.el7

Comment 45 Nigel Croxon 2018-09-11 18:23:21 UTC
Nacking Heinz's patch for https://bugzilla.redhat.com/show_bug.cgi?id=1627563

I am resubmitting this patch, that broke LVM.

-Nigel

Comment 46 Nigel Croxon 2018-09-11 19:48:26 UTC
Re-submitting the patch to rh kernel list now.
And moving this bz back to POST.

-Nigel

Comment 48 Jonathan Earl Brassow 2018-09-12 13:33:37 UTC
Wait.  How did this bug go from POST -> ON_QA?  And why hasn't the 'Fixed In Version' changed?

I'm moving this back to POST.

Comment 49 Heinz Mauelshagen 2018-09-12 13:37:06 UTC
*** Bug 1627563 has been marked as a duplicate of this bug. ***

Comment 52 XiaoNi 2018-09-17 10:35:30 UTC
*** Bug 1628499 has been marked as a duplicate of this bug. ***

Comment 53 Bruno Meneguele 2018-09-17 22:50:51 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 55 guazhang@redhat.com 2018-09-19 08:00:19 UTC
Hello

Could someone check whether the bug can move to "ON_QA" status so that QE can verify it with the fixed kernel?

thanks
Guazhang

Comment 58 Bruno Meneguele 2018-09-19 12:55:11 UTC
Patch(es) available on kernel-3.10.0-951.el7

Comment 61 guazhang@redhat.com 2018-10-08 01:29:23 UTC
Hello

The bug has passed testing; moving to VERIFIED.

Comment 63 errata-xmlrpc 2018-10-30 08:52:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083