Bug 189442 - [Symantec RHEL4.5 bug] AVT ping-pong at boot time leads to long boot of the host (40min+)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Heinz Mauelshagen
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 176344
 
Reported: 2006-04-19 23:46 UTC by Shailendra Hebsur
Modified: 2019-01-31 13:54 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-09-12 15:34:49 UTC
Target Upstream Version:
Embargoed:


Attachments
'messages' file from the RHEL4u2 box (360.69 KB, text/plain)
2006-04-19 23:46 UTC, Shailendra Hebsur

Description Shailendra Hebsur 2006-04-19 23:46:47 UTC
Description of problem:

What is AVT in the case of an Engenio storage array?

AVT (Auto Volume Transfer) is a controller firmware feature that helps manage 
each volume in a storage array. When used with a multi-path driver, AVT helps 
ensure that an I/O data path is always available for the volumes in the 
storage array.

Procedure to duplicate the issue: 

1) It is a 1x1 setup with one host connected to one array through a Fabric
2) Install Veritas Storage Foundation 4.1
3) Apply the following DMP settings on the array:

set controller [a] HostNVSRAMByte[6,0x24]=0x01,0x01; // enable AVT
set controller [a] HostNVSRAMByte[6,0x27]=0x18,0x18; // AVT_EXCLUSION_EXTENT
set controller [b] HostNVSRAMByte[6,0x24]=0x01,0x01; // enable AVT
set controller [b] HostNVSRAMByte[6,0x27]=0x18,0x18; // AVT_EXCLUSION_EXTENT

4) Map 32 LUNs from the array to the host
5) Reboot the host
6) The host takes more than 40 minutes to boot back up
7) The long boot occurs because LVM issues READs to all discovered SCSI 
devices at sector/LBA 2088832 (0x1FDF80), which triggers AVT. When AVT is 
triggered, it moves the LUN to the path the READ arrived on

For instance, 

a) The host has a dual-channel (QLogic/Emulex) Fibre HBA
b) The first port of the Fibre HBA is connected to ControllerA and the second 
port is connected to ControllerB through the Fabric
c) ControllerA owns LUN0/sdb (preferred path A) and ControllerB owns LUN1/sdc 
(preferred path B). The host has an internal SCSI hard disk, so sda is 
assigned to it
d) During boot, LVM issues READs to sdb and sdc. Since the READ command to sdc 
travels down path A, it triggers AVT, and AVT moves LUN1 over to path A in 
order to return good status to the initiator
e) The same thing happens when LVM issues a READ to sdb on path B: AVT is 
triggered again
f) This thrashing during boot delays startup by 40+ minutes, and the delay 
scales with the number of LUNs mapped to the host: the more LUNs, the longer 
the boot time (a minimal model of this ping-pong is sketched below)
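
To make the thrashing concrete, here is a minimal Python sketch of the 
ping-pong; it is purely illustrative and models, rather than reproduces, the 
firmware's AVT behavior:

# Illustrative model of AVT ping-pong (not the actual firmware logic).
# Each LUN starts on its preferred controller; a READ arriving on the
# other controller's path transfers ownership before completing.

owner = {"LUN0": "A", "LUN1": "B"}   # preferred ownership
transfers = 0

def read(lun, path):
    """Issue a READ to `lun` via controller path `path` with AVT enabled."""
    global transfers
    if owner[lun] != path:
        owner[lun] = path            # AVT moves the volume to this path
        transfers += 1

# Boot-time scan: LVM probes each LUN through both HBA ports (paths A and B).
for lun in ("LUN0", "LUN1"):
    for path in ("A", "B"):
        read(lun, path)

print(f"ownership transfers during one scan: {transfers}")
# Each transfer stalls I/O; with 32 LUNs and repeated probes the delays
# accumulate into the 40+ minute boot observed here.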


8) The Engenio storage has an "AVT Exclusion Extent" feature which specifies 
that READs issued by the host/initiator to the first and last 8192 sectors of 
a disk do not trigger AVT
9) The array is configured with a volume size of 1 GB, i.e. 2097152 512-byte 
sectors
10) With a 1 GB volume, the excluded ranges are therefore sectors < 8192 at 
the beginning of the disk and sectors >= 2088960 (2097152 - 8192) at the end 
of the disk
11) Since the host is issuing READs at sector 2088832 (0x1FDF80), which falls 
outside both excluded ranges, it triggers AVT, which in turn leads to the long 
host boot time (see the sketch after this list)
12) The above LBA offset was also verified from the FC trace
13) The issue occurs on RHEL4u2
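
The arithmetic in items 9) through 11) can be checked directly. A minimal 
sketch, assuming 512-byte sectors; the second LBA tested, 0x1FFF80, appears in 
the log excerpt further below and falls inside the tail exclusion extent:

# AVT exclusion extent check for a 1 GB volume (512-byte sectors).
SECTOR_SIZE = 512
VOLUME_SECTORS = 1 * 1024**3 // SECTOR_SIZE   # 2097152 sectors
EXCLUSION = 8192                              # first/last 8192 sectors excluded

def triggers_avt(lba):
    """True if a READ at `lba` falls outside both exclusion extents."""
    return EXCLUSION <= lba < VOLUME_SECTORS - EXCLUSION

print(hex(0x1FDF80), triggers_avt(0x1FDF80))  # True: outside both extents
print(hex(0x1FFF80), triggers_avt(0x1FFF80))  # False: in the last 8192 sectors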



Version-Release number of selected component (if applicable):

1) RHEL4u2 is being used in the configuration
2) QLogic HBA driver version: 8.01.03


How reproducible:

-- Always

Steps to Reproduce:

1. Connect a host (Dell PowerEdge 2650 or any other) to an Engenio storage 
array through a Fabric (a Fibre switch in the path is fine)
2. Install Veritas Storage Foundation 4.1 with any Maintenance Packs (the 
issue should still occur even without Veritas installed)
3. Apply the following DMP settings on the array:

set controller [a] HostNVSRAMByte[6,0x1a]=0x01,0x01; // return 12 bytes of WWN in inquiry
set controller [a] HostNVSRAMByte[6,0x23]=0x01,0x01; // enable "report preferred path"
set controller [a] HostNVSRAMByte[6,0x24]=0x01,0x01; // enable AVT
set controller [a] HostNVSRAMByte[6,0x25]=0x80,0x80; // enable DMP support
set controller [a] HostNVSRAMByte[6,0x27]=0x18,0x18; // AVT_EXCLUSION_EXTENT
set controller [b] HostNVSRAMByte[6,0x1a]=0x01,0x01; // return 12 bytes of WWN in inquiry
set controller [b] HostNVSRAMByte[6,0x23]=0x01,0x01; // enable "report preferred path"
set controller [b] HostNVSRAMByte[6,0x24]=0x01,0x01; // enable AVT
set controller [b] HostNVSRAMByte[6,0x25]=0x80,0x80; // enable DMP support
set controller [b] HostNVSRAMByte[6,0x27]=0x18,0x18; // AVT_EXCLUSION_EXTENT

4. Map 32 LUNs from the array to the host
5. Reboot the host
6. The host takes more than 40 minutes to boot back up
  
Actual results:

-- LVM triggers AVT, which causes LUN thrashing and in turn delays the host's 
boot

Expected results:

-- A small fix could avoid issuing READs to the SCSI devices coming from the 
Engenio (LSI) storage array, which identifies itself with vendor string "LSI" 
(one possible mitigation is sketched after the listing below)

For an Engenio storage array, the output from '/proc/scsi/scsi':

Host: scsi2 Channel: 00 Id: 10 Lun: 16
  Vendor: LSI      Model: INF-01-00        Rev: 9617
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 17
  Vendor: LSI      Model: INF-01-00        Rev: 9617
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 18
  Vendor: LSI      Model: INF-01-00        Rev: 9617
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 19
  Vendor: LSI      Model: INF-01-00        Rev: 9617
  Type:   Direct-Access                    ANSI SCSI revision: 03
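
Since the devices identify themselves with vendor string "LSI", one possible 
mitigation (a suggestion on my part, not a committed fix) would be to reject 
them from LVM scanning via the "filter" setting in /etc/lvm/lvm.conf. The 
sketch below derives candidate reject patterns from /proc/scsi/scsi; the 
/dev/sdX name mapping is a simplifying assumption, and a real system should 
confirm the mapping through sysfs:

# Sketch: find the LSI/Engenio entries in /proc/scsi/scsi and print
# candidate reject patterns for the devices { filter = [...] } setting
# in /etc/lvm/lvm.conf.
# Assumes SCSI disks enumerate as /dev/sda, /dev/sdb, ... in discovery
# order, which is a simplification.

import string

entries = open("/proc/scsi/scsi").read().split("Host:")[1:]
lsi_indexes = [i for i, entry in enumerate(entries) if "Vendor: LSI" in entry]
names = ["/dev/sd" + string.ascii_lowercase[i] for i in lsi_indexes if i < 26]

print(", ".join('"r|%s|"' % name for name in names))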

Additional info:

The messages below are from the '/var/log/messages' file. An internally 
developed failover (RDAC) driver was used during debugging to see which OS 
modules are issuing READs:

Apr 19 22:36:56 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x0:TranLen:0x80
Apr 19 22:36:56 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x1fff80:TranLen:0x8
Apr 19 22:36:56 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x0:TranLen:0x8
Apr 19 22:36:56 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x1fdf80:TranLen:0x8
Apr 19 22:36:57 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x0:TranLen:0x8
Apr 19 22:36:57 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x1fdf80:TranLen:0x8
Apr 19 22:36:57 deer kernel: disk:sdb process: lvm opcode READ_10 LBA:0x0:TranLen:0x8
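
For reference, a minimal sketch that scans a 'messages' file for the READ_10 
entries quoted above and flags the LBAs falling outside the two exclusion 
extents of a 1 GB volume, i.e. the READs that trigger AVT (it assumes each 
log entry occupies a single line in the file):

import re

VOLUME_SECTORS, EXCLUSION = 2097152, 8192   # 1 GB volume, 8192-sector extents
pattern = re.compile(r"disk:(\S+) process: (\S+) opcode READ_10\s+LBA:0x([0-9a-fA-F]+)")

for line in open("messages"):
    match = pattern.search(line)
    if match:
        disk, proc, lba = match.group(1), match.group(2), int(match.group(3), 16)
        if EXCLUSION <= lba < VOLUME_SECTORS - EXCLUSION:
            print("AVT trigger: %s by %s at LBA 0x%x" % (disk, proc, lba))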

I am also attaching the 'messages' file to this bugzilla; the lines to look 
at in the 'messages' file are as below, 

1. line#3358 and onwards
2. line#3553, a command being issued from lvm.static
3. line#3590, a command being issued from 'mount' 

The OS modules issuing these READs:

1. lvm
2. lvm.static
3. mount

Comment 1 Shailendra Hebsur 2006-04-19 23:46:48 UTC
Created attachment 128017 [details]
'messages' file from the RHEL4u2 box

Comment 9 RHEL Program Management 2006-08-18 16:14:25 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

