Bug 158365

Summary:      Breaking ethernet bonding causes node to reset.
Product:      [Retired] Red Hat Cluster Suite
Component:    clumanager
Version:      3
Hardware:     i686
OS:           Linux
Status:       CLOSED NOTABUG
Severity:     medium
Priority:     medium
Keywords:     FutureFeature
Doc Type:     Enhancement
Reporter:     David Milburn <dmilburn>
Assignee:     Lon Hohberger <lhh>
QA Contact:   Cluster QE <mspqa-list>
CC:           cluster-maint, tao
Last Closed:  2006-02-02 17:22:34 UTC

Description David Milburn 2005-05-20 21:51:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
Breaking the ethernet connection on node A, which is running the service,
causes node B to reset. This happens with no power switch configured and the
watchdog disabled; the customer can reproduce it every time.

Version-Release number of selected component (if applicable):
clumanager-1.2.22-2

How reproducible:
Always

Steps to Reproduce:
1. Unplug cable on eth0
2. Unplug cable on eth1
3. Observe the node on the cluster side perform a soft reset.


Actual Results:  The node on the cluster side does not shut down gracefully;
it soft resets and then reboots.

Expected Results:  The node on the cluster side should shut down gracefully and reboot.

Additional info:

Both nodes are identical:

IBM x335 series, dual-processor Xeon 2.8 GHz with hyper-threading, 2 GB of memory, and 2x36 GB RAID 1 disks.

Comment 1 Lon Hohberger 2005-05-23 14:46:43 UTC
Based on the description, one of two things is happening:

(1) Unplugging the ethernet cables from a bonded interface behaves differently
from using a non-bonded interface, and the following code is getting run after
all paths are lost:

        /*
         * Reboot if we didn't send a heartbeat in interval*TKO_COUNT
         */
        if (!debug && __cmp_tv(&maxtime, &diff) == 1) {
                clulog(LOG_EMERG, "Failed to send a heartbeat within "
                       "failover time - REBOOTING\n");
                sync();
                reboot(RB_AUTOBOOT);
        }

The membership daemon doesn't know about fencing, so it can't make a judgment
based on that, which brings us to the other possibility:

(2) There are no power switches configured, so the quorum daemon (which handles
fencing) is extremely paranoid.  You *can't* gracefully shut down in this case.
After the failover time, if we try to shut down, the other node will be trying
to mount the file systems we still have mounted, resulting in file system (and
probably data) corruption.  A reboot-as-fast-as-possible doesn't guarantee data
integrity, but it's certainly better than a slow shutdown.


It's not a bug either way.  However:

(a) Behavior (2) won't change unless power switches are installed and
configured; data integrity trumps the nicety of a clean shutdown.

(b) We could alter the behavior of (1) to just log a nasty message at EMERG log
level and not reboot; a sketch follows below.  If there are no power switches
configured, the behavior won't visibly change at all, since behavior (2) will
still force a reboot.

Comment 2 Lon Hohberger 2005-05-23 14:55:23 UTC
Ah, after rereading this and parts of the related issue, there is more to it
than this.

Comment 4 Lon Hohberger 2006-02-02 17:22:00 UTC
Upon further investigation (this is a really old issue), this isn't related to
bonding at all.

dmesg from one of the sysreports:

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.10-RH1
        <Adaptec 29320A Ultra320 SCSI adapter>
        aic7901: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs

blk: queue c4da7418, I/O limit 4095Mb (mask 0xffffffff)
(scsi1:A:5:0): refuses WIDE negotiation.  Using 8bit transfers
(scsi1:A:5:0): refuses synchronous negotiation. Using asynchronous transfers
(scsi1:A:8): 160.000MB/s transfers (80.000MHz DT|IU|QAS, 16bit)
(scsi1:A:9): 160.000MB/s transfers (80.000MHz DT|IU|QAS, 16bit)
  Vendor: SUN       Model: StorEdge 3120  D  Rev: 1159
  Type:   Processor                          ANSI SCSI revision: 02
blk: queue c4da7218, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: FUJITSU   Model: MAP3735N SUN72G   Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 04
blk: queue c4da7618, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: FUJITSU   Model: MAP3735N SUN72G   Rev: 0401
  Type:   Direct-Access                      ANSI SCSI revision: 04
blk: queue c4da7818, I/O limit 4095Mb (mask 0xffffffff)
scsi1:A:8:0: Tagged Queuing enabled.  Depth 32
scsi1:A:9:0: Tagged Queuing enabled.  Depth 32
Attached scsi disk sdb at scsi1, channel 0, id 8, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 9, lun 0
SCSI device sdb: 143374738 512-byte hdwr sectors (73408 MB)
 sdb: sdb1 sdb2 sdb3
==================================
/etc/sysconfig/rawdevices from the same node:

# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw1 /dev/sdb1
/dev/raw/raw2 /dev/sdb2
==================================

A couple of facts:

* The StorEdge 3120 is a SCSI JBOD, with zero RAID capabilities.  Here's more
information about that array:

    http://www.sun.com/storage/workgroup/3000/3100/3120scsi/

* The Fujitsu MAP3735N is a 73GB hard disk drive.  You can buy them here:

    http://froogle.google.com/froogle?q=MAP3735+FUJITSU

What does this mean?  The dmesg output shows that the controllers see the
disks individually and that they are not in any sort of RAID configuration.
The observed behavior (a bus reset followed by errors) is common in
multi-initiator parallel SCSI configurations.

A quick look at the RHCS documentation confirms that a multi-initiator
parallel SCSI configuration will not work:

    http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/ch-hardware.html

"Testing has shown that it is difficult, if not impossible, to configure
reliable multi-initiator parallel SCSI configurations at data rates above
80MB/sec using standard SCSI adapters. Further tests have shown that these
configurations cannot support online repair because the bus does not work
reliably when the HBA terminators are disabled, and external terminators are
used. For these reasons, multi-initiator SCSI configurations using standard
adapters are not supported. Either single-initiator SCSI bus adapters (connected
to multi-ported storage) or Fibre Channel adapters are required."

The customer should disable SCSI bus resets on the host bus adapters; this is
usually done in the BIOS for the HBA.  This *might* prevent the reboot from
occurring in the future (if it does, services will remain available!), but is
not guaranteed because some SCSI device drivers perform SCSI bus resets while
initializing.  (I do not know if the AIC7xxx driver performs a SCSI bus reset or
not.)
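
If the HBA BIOS does not expose such a setting, some drivers accept a boot
option to suppress their initial bus reset; the older 2.4-era aic7xxx driver
documented a 'no_reset' option, for example.  Whether the aic79xx driver in
use here has an equivalent is an assumption to verify against the driver
documentation for the running kernel:

    # kernel line in /etc/grub.conf (hypothetical example)
    kernel /vmlinuz-2.4.21-EL ro root=/dev/sda2 aic7xxx=no_reset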

The cluster cannot shut down gracefully if access to shared storage is
unavailable, mostly because unmounting will not work.  Unmounting a file
system requires updating certain metadata in the superblock, which cannot be
done if the storage is unreachable (see the sketch below).
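
A minimal illustration of that failure mode (not clumanager code; the mount
point name is hypothetical, and the exact errno depends on the driver, but
EIO is typical when the backing device is gone):

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
            /* The superblock write-back cannot reach the device, so
             * umount(2) fails instead of detaching the file system. */
            if (umount("/mnt/shared") != 0)
                    perror("umount /mnt/shared");
            return 0;
    }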

You can make clumanager take different actions on shared storage access
failures using the cludb command, but this is neither widely tested nor
supported (the ability is there primarily for testing purposes, not production
use).  See the man page for 'cludb' for more details.  Good luck!