648885 – LVM2 on Multipath SAN LUN : bio too big device dm-5 results in file corruption

Bug 648885 - LVM2 on Multipath SAN LUN : bio too big device dm-5 results in file corruption

Summary: LVM2 on Multipath SAN LUN : bio too big device dm-5 results in file corruption

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	LVM and device-mapper development team
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	743405 760237
TreeView+	depends on / blocked

Reported:	2010-11-02 12:50 UTC by Andre ten Bohmer
Modified:	2018-11-29 19:48 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-10-24 16:22:55 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Andre ten Bohmer 2010-11-02 12:50:08 UTC

Description of problem:
Adding a new lun resulted in file corruption on already mounted and active lun's on an Oracle production server.

Version-Release number of selected component (if applicable):
OS : Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Kernel : 2.6.18-194.11.4.el5
Hardware : HP ProLiant BL460c G1
HBA : QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
Filesystem : ext3


How reproducible:
Not tested (yet)

Steps to Reproduce:
]# multipath -ll
- rescan hba's to detect new presented lun
]# echo "- - -" > /sys/class/scsi_host/host0/scan
]# echo "- - -" > /sys/class/scsi_host/host1/scan
]# multipath -ll
- adding alias for new lun
]# vi /etc/multipath.conf
]# multipath -r
- bio log events start occuring:
2010-09-30 08:39:14     0       4       10.x.x.x       bio too big device dm-5 (176 > 64)     kernel:
2010-09-30 08:39:14     0       4       10.x.x.x       bio too big device dm-5 (72 > 64)      kernel:
2010-09-30 08:39:14     3       3       10.x.x.x       dm-6: devmap already registered        multipathd:
2010-09-30 08:39:14     3       5       10.x.x.x       dm-6: add map (uevent)         multipathd:
2010-09-30 08:39:14     3       3       10.x.x.x       dm-5: devmap already registered        multipathd:
2010-09-30 08:39:14     3       5       10.x.x.x       dm-5: add map (uevent)         multipathd:
2010-09-30 08:39:14     3       3       10.x.x.x       dm-4: devmap already registered        multipathd:
2010-09-30 08:39:14     3       5       10.x.x.x       dm-4: add map (uevent)         multipathd:
2010-09-30 08:39:14     0       4       10.x.x.x       bio too big device dm-0 (1024 > 64)    kernel:
2010-09-30 08:39:14     0       4       10.x.x.x       bio too big device dm-0 (1024 > 64)    kernel:

  
Actual results:
File corruption on already active and mounted lun's

Expected results:
Solid operation and no file corruption. This procedure has been followed many times on other servers without problems. Is this procedure oke, or are we very unlucky to run into a bug which seldom occurs?

Additional info:
Server is a boot from SAN setup with several big data lun's presented. bio events where reported for each lun including the OS lun (dm-0) but file corruption seemed to have occurred only on dm-5, one of the Oracle data lun's. Oracle uses 3 big ext3 filesystems without ASM.

Info on the dm-5 device:
/dev/dm-5 hosts LVM2 volumegroup vg_data004, size 1.95T (ext3)
LVM2 pv - vg - lv are all on this lun

Multipath info on dm-5 :
data004 (3600508b40010889b000090000c650000) dm-5 HP,HSV210
[size=2.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=100][active]
 \_ 0:0:3:13 sdl 8:176  [active][ready]
 \_ 1:0:3:13 sdx 65:112 [active][ready]
\_ round-robin 0 [prio=20][enabled]
 \_ 0:0:2:13 sdf 8:80   [active][ready]
 \_ 1:0:2:13 sdr 65:16  [active][ready]

Comment 1 Parag Nemade 2011-02-01 07:16:13 UTC

How can above bug be related to m17n-db?

Comment 2 Parag Nemade 2011-02-01 09:09:43 UTC

Moving this to kernel people in case they can help here.

Comment 3 Andre ten Bohmer 2011-02-03 12:39:52 UTC

The same server yesterday (Februari 2nd), now running Red Hat Enterprise Linux Server release 5.6 Kernel 2.6.18-238.el5 x86_64

Via Qlogic SanSurfer software (Remote Agent for Linux Version 1.00.1280-19) a HP technical consultant modified the following parameter:

SanSurfer : "HBA -> Port -> Parameters -> Advanced HBA Parameters -> Execution Throttle" changed from 16 to 32 "-> Save"

This server has one HBA with 2 ports. After modifying the first port, the following showed up in /var/log/messages:


Feb  2 15:34:52 scomp1037 kernel: qla2xxx 0000:10:00.0: Performing ISP error recovery - ha= ffff81082d5bc4f8.
Feb  2 15:34:53 scomp1037 kernel:  rport-0:0-0: blocked FC remote port time out: saving binding
Feb  2 15:34:53 scomp1037 kernel:  rport-0:0-1: blocked FC remote port time out: saving binding
Feb  2 15:34:53 scomp1037 kernel:  rport-0:0-2: blocked FC remote port time out: saving binding
Feb  2 15:34:53 scomp1037 kernel:  rport-0:0-3: blocked FC remote port time out: saving binding
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:1: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdh, sector 11776429
Feb  2 15:34:53 scomp1037 kernel: device-mapper: multipath: Failing path 8:112.
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:1: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdh, sector 11825493
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:1: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdh, sector 11923829
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:1: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 multipathd: dm-2: add map (uevent) 
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdh, sector 21410149
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:1: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdh, sector 31895325
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:10: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdj, sector 1002495848
Feb  2 15:34:53 scomp1037 kernel: device-mapper: multipath: Failing path 8:144.
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:10: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdj, sector 6730900
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 1431896872
Feb  2 15:34:53 scomp1037 kernel: device-mapper: multipath: Failing path 8:160.
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042291448
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042291704
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042292184
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042292504
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042292792
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2042294040
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2599415176
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2599425888
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 2599576024
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:11: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdk, sector 3061938608
Feb  2 15:34:53 scomp1037 kernel: sd 0:0:3:12: SCSI error: return code = 0x00010000
Feb  2 15:34:53 scomp1037 kernel: end_request: I/O error, dev sdl, sector 1403427096
Feb  2 15:34:53 scomp1037 kernel: device-mapper: multipath: Failing path 8:176.
Feb  2 15:34:53 scomp1037 kernel: qla2xxx 0000:10:00.0: LOOP UP detected (4 Gbps).
Feb  2 15:34:58 scomp1037 multipathd: dm-2: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: 8:112: mark as failed 
Feb  2 15:34:58 scomp1037 multipathd: system: remaining active paths: 3 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: 8:144: mark as failed 
Feb  2 15:34:58 scomp1037 multipathd: datanew1: remaining active paths: 3 
Feb  2 15:34:58 scomp1037 multipathd: 8:160: mark as failed 
Feb  2 15:34:58 scomp1037 multipathd: data002: remaining active paths: 3 
Feb  2 15:34:58 scomp1037 multipathd: 8:176: mark as failed 
Feb  2 15:34:58 scomp1037 multipathd: data003: remaining active paths: 3 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: sdh: tur checker reports path is up 
Feb  2 15:34:58 scomp1037 multipathd: 8:112: reinstated 
Feb  2 15:34:58 scomp1037 multipathd: system: remaining active paths: 4 
Feb  2 15:34:58 scomp1037 multipathd: /sbin/mpath_prio_alua exitted with 5 
Feb  2 15:34:58 scomp1037 multipathd: error calling out /sbin/mpath_prio_alua /dev/sda 
Feb  2 15:34:58 scomp1037 multipathd: system: load table [0 52428800 multipath 0 0 3 2 round-robin 0 1 1 8:0 100 round-robin 0 2 1 8:112 100 65:80 100 round-robin 0 
Feb  2 15:34:58 scomp1037 multipathd: sdj: tur checker reports path is up 
Feb  2 15:34:58 scomp1037 multipathd: 8:144: reinstated 
Feb  2 15:34:58 scomp1037 multipathd: datanew1: remaining active paths: 4 
Feb  2 15:34:58 scomp1037 multipathd: /sbin/mpath_prio_alua exitted with 5 
Feb  2 15:34:58 scomp1037 multipathd: error calling out /sbin/mpath_prio_alua /dev/sdc 
Feb  2 15:34:58 scomp1037 multipathd: datanew1: load table [0 1048576000 multipath 0 0 3 2 round-robin 0 1 1 8:32 100 round-robin 0 2 1 8:144 100 65:112 100 round-ro
Feb  2 15:34:58 scomp1037 multipathd: sdk: tur checker reports path is up 
Feb  2 15:34:58 scomp1037 multipathd: 8:160: reinstated 
Feb  2 15:34:58 scomp1037 multipathd: data002: remaining active paths: 4 
Feb  2 15:34:58 scomp1037 multipathd: /sbin/mpath_prio_alua exitted with 5 
Feb  2 15:34:58 scomp1037 multipathd: error calling out /sbin/mpath_prio_alua /dev/sdd 
Feb  2 15:34:58 scomp1037 multipathd: data002: load table [0 4194304000 multipath 0 0 3 2 round-robin 0 1 1 8:48 100 round-robin 0 2 1 8:160 100 65:128 100 round-rob
Feb  2 15:34:58 scomp1037 multipathd: sdl: tur checker reports path is up 
Feb  2 15:34:58 scomp1037 multipathd: 8:176: reinstated 
Feb  2 15:34:58 scomp1037 multipathd: data003: remaining active paths: 4 
Feb  2 15:34:58 scomp1037 multipathd: /sbin/mpath_prio_alua exitted with 5 
Feb  2 15:34:58 scomp1037 multipathd: error calling out /sbin/mpath_prio_alua /dev/sde 
Feb  2 15:34:58 scomp1037 multipathd: data003: load table [0 4194304000 multipath 0 0 3 2 round-robin 0 1 1 8:64 100 round-robin 0 2 1 8:176 100 65:144 100 round-rob
Feb  2 15:34:58 scomp1037 multipathd: dm-2: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-2: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-2: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-2: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-4: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-5: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: devmap already registered 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: add map (uevent) 
Feb  2 15:34:58 scomp1037 multipathd: dm-6: devmap already registered 

- This line is from a Oracle log
2011-02-02 15:34:59        IO errors.... instance terminated

Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (518 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (256 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (256 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (88 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (256 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (192 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (224 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (96 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (256 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-6 (256 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (88 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (208 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (128 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (88 > 64)
Feb  2 15:34:59 scomp1037 kernel: bio too big device dm-4 (96 > 64)

A server reboot was necessary.

Comment 4 Andre ten Bohmer 2011-02-09 10:35:06 UTC

"2011-02-03 07:39:52 EST" is reproduce-able , w've cloned the system disk, presented it on an other server (same type of blade ) , fully updated the system via RHN :

kernel :2.6.18-238.1.1.el5

]# rpm -qfi /etc/init.d/multipathd
Name        : device-mapper-multipath      Relocations: (not relocatable)
Version     : 0.4.7                             Vendor: Red Hat, Inc.
Release     : 42.el5                        Build Date: Tue 07 Dec 2010 07:44:21 PM CET


]# modinfo qla2xxx
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
version:        8.03.01.05.05.06-k
license:        GPL
description:    QLogic Fibre Channel HBA Driver

Same procedure to modify "Execution Throttle" and BIO messages appeared.

Next step is to install HP qlogic modules and see if this solves this problem.

Comment 5 Andre ten Bohmer 2011-03-15 16:08:15 UTC

Maybe it's related to Oracle itself. As soon as Oracle performs an IO request (not sure yet if it's only write), the bio messages start to appear. Creating, deleting files on the console seems not to trigger any errors on the same partition/lun.

What is also interesting, is that Oracle seems to ignore setting the max_sectors_kb Linux kernel parameter. We use a script to adjust this parameter in order to be able to reduce IO load on the HP EVA's SAN systems. A HP san/IO expert noticed that even when we shrink the parameter, the OS honors this but Oracle still bypasses the OS sending it's own blocksize ?!

]# cat modparam.sh
#!/bin/bash
# default 512
maxsize=32
case "$1" in
"-b" | "-B" )
	echo "Setting max_sectors_kb to $maxsize ..."
	for i in `ls -a -d /sys/block/sd*` ; do echo "$i/queue/max_sectors_kb" ; echo $maxsize > $i/queue/max_sectors_kb ; done
	;;
* )
	for i in `ls -a -d /sys/block/sd*` ; do echo "$i/queue/scheduler"; cat $i/queue/scheduler ; done
	for i in `ls -a -d /sys/block/sd*` ; do echo "$i/queue/max_sectors_kb"; cat $i/queue/max_sectors_kb ; done
	;;
esac

Comment 6 Andre ten Bohmer 2011-06-01 15:44:28 UTC

Today again a bio error:
Jun  1 17:00:26 scomp1037 multipathd: system: load table [0 52428800 multipath 0 0 2 1 round-robin 0 2 1 8:112 100 65:80 100 round-robin 0 2 1 8:0 100 8:224 100] 
Jun  1 17:00:26 scomp1037 multipathd: dm-2: add map (uevent) 
Jun  1 17:00:26 scomp1037 multipathd: dm-2: devmap already registered 
Jun  1 17:01:16 scomp1037 multipathd: data004: load table [0 4194304000 multipath 0 0 2 1 round-robin 0 2 1 65:160 100 8:192 100 round-robin 0 2 1 8:80 100 65:48 100
Jun  1 17:01:16 scomp1037 multipathd: samba: load table [0 1048576000 multipath 0 0 2 1 round-robin 0 2 1 8:128 100 65:96 100 round-robin 0 2 1 8:16 100 8:240 100] 
Jun  1 17:01:16 scomp1037 multipathd: data002: load table [0 4194304000 multipath 0 0 2 1 round-robin 0 2 1 8:160 100 65:128 100 round-robin 0 2 1 8:48 100 65:16 100
Jun  1 17:01:16 scomp1037 multipathd: dm-0: add map (uevent) 
Jun  1 17:01:16 scomp1037 multipathd: dm-0: devmap already registered 
Jun  1 17:01:16 scomp1037 multipathd: dm-3: add map (uevent) 
Jun  1 17:01:16 scomp1037 multipathd: dm-3: devmap already registered 
Jun  1 17:01:16 scomp1037 multipathd: dm-5: add map (uevent) 
Jun  1 17:01:16 scomp1037 multipathd: dm-5: devmap already registered 
Jun  1 17:05:35 scomp1037 kernel: bio too big device dm-5 (288 > 256)

]# uname -a
Linux scomp1037.wurnet.nl 2.6.18-238.9.1.el5 #1 SMP Fri Mar 18 12:42:39 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Could priority be raised to high please?

Comment 8 James Hofmeister 2011-08-04 22:53:40 UTC

Another case of this seen on production server with ext3 file system used by Oracle.

2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
HP ProLiant DL380 G6
QLogic Fibre Channel HBA Driver: 8.03.01.05.05.06-k
device-mapper-multipath-0.4.7-42.el5_6.1.x86_64
Ext3 file system built on lvm volume group.

Jul 21 08:01:13 host kernel: bio too big device dm-9 (957 > 256)
Jul 21 08:01:14 host kernel: bio too big device dm-10 (1017 > 256)
<< MANY LOGGED >>

Comment 9 ajeesh.c 2011-09-02 04:34:34 UTC

I am seeing this error on our rhel5u6 .

Sep  1 17:40:13 host  kernel: bio too big device dm-8 (416 > 256)
Sep  1 17:40:13 host kernel: bio too big device dm-8 (512 > 256)
Sep  1 17:40:13 host  kernel: bio too big device dm-8 (512 > 256)
Sep  1 17:40:13 host kernel: bio too big device dm-8 (416 > 256)
Sep  1 17:40:13 host  kernel: bio too big device dm-8 (512 > 256)

Aug 31 21:43:31 host multipathd: /sbin/mpath_prio_alua exitted with 1
Aug 31 21:43:31 host multipathd: error calling out /sbin/mpath_prio_alua /dev/sdu
Aug 31 21:43:31 host multipathd: dm-12: devmap already registered
Aug 31 20:11:33 host multipathd: /sbin/mpath_prio_alua exitted with 1
Aug 31 20:11:33 host multipathd: error calling out /sbin/mpath_prio_alua /dev/sdq
Aug 31 20:11:33 host multipathd: dm-8: devmap already registered


Kernel 2.6.18-238.9.1.el5

ext3 file system with LVM striping .

Mainly seeing this error up on Read.

We are also setting the max_sectors_kb 128 as per EVA best practice .


Storage eva6400 with full 8 path and a total of 10 LUN .

Could you please advice on the resolution of this error?

Comment 10 Andre ten Bohmer 2011-09-02 09:42:36 UTC

Well maybe time someone from HP contacts RedHat direct about this issue please? I've filed this bug almost a year ago without any response from RedHat so far.

Comment 11 Alasdair Kergon 2011-09-02 11:30:11 UTC

If the underlying device limits are being made more restrictive due to something you do externally, linux has no mechanism to propagate those changes up the device stack automatically and you must update them yourself.  Best not to try making such changes while I/O is flowing through the stack.  If you can't manage to set the lower limits manually (through /sys perhaps) in advance of the external change, then for the dm layers of the stack (including multipath), reload the tables and suspend them, make the external change, then resume them.

Comment 12 Alasdair Kergon 2011-09-02 11:33:43 UTC

(In device-mapper, a 'resume' that follows a 'reload' will reset the limits based on the current state of the devices directly referenced in the table.)

Comment 13 Andre ten Bohmer 2011-09-02 12:25:44 UTC

(In reply to comment #11)
> If the underlying device limits are being made more restrictive due to
> something you do externally, linux has no mechanism to propagate those changes
> up the device stack automatically and you must update them yourself.  Best not
> to try making such changes while I/O is flowing through the stack.  If you
> can't manage to set the lower limits manually (through /sys perhaps) in advance
> of the external change, then for the dm layers of the stack (including
> multipath), reload the tables and suspend them, make the external change, then
> resume them.

Thanks for you time! We do not make any changes externally like on the EVA or such. We use  /sys/block/sd* max_sectors_kb to change this parameter (see my previous post 2011-03-15 12:08:15 EDT) , but probably this is to low level or high level with regards to multipath to be propagated correct?

Comment 14 Alasdair Kergon 2011-09-02 12:39:00 UTC

If you change it at one level of the stack that change does NOT propagate automatically - a dm multipath layer above will only notice the change if you follow it with a dm reload+resume at that layer.

You'll need to set it at *every* level of the stack the same way and to avoid problems with I/O in flight you should make the changes top-down not bottom up.

Comment 15 Alasdair Kergon 2011-09-02 12:47:39 UTC

BTW I intend to close this bugzilla soon as WONT_FIX.  In the long term (years rather than months), I hope we can find a way to make changes propagate automatically, but that's a matter for upstream work and it is not easy (or it would already have been done).

In the mean time, there are manual (or scriptable) workarounds as I indicated above provided you know in advance that the change is about to occur.

Comment 16 Alasdair Kergon 2012-11-15 11:33:14 UTC

A question we could investigate: whether we could have an option to recalculate the limits on (some?) resumes that don't follow a reload.

Note You need to log in before you can comment on or make changes to this bug.