Bug 2103866 - Host in cluster rebooting upon array side controller failover
Summary: Host in cluster rebooting upon array side controller failover
Keywords:
Status: CLOSED DUPLICATE of bug 2103867
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-05 06:49 UTC by Govind Kulkarni
Modified: 2022-07-07 07:08 UTC (History)
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-07 07:08:26 UTC
Target Upstream Version:
Embargoed:


Attachments
syslog message (6.00 MB, text/plain)
2022-07-05 06:49 UTC, Govind Kulkarni


Links
Red Hat Issue Tracker RHELPLAN-126975 (last updated 2022-07-05 06:53:31 UTC)

Description Govind Kulkarni 2022-07-05 06:49:05 UTC
Created attachment 1894611 [details]
syslog message

Description of problem:

RHEL 7.8 hosts configured for HA active/passive in an NFS cluster.
Both hosts, host1 and host2, mounted the NFS share volume and started I/O.
A controller failover was then triggered on the storage array side, which led to an I/O drop and a host reboot.

How reproducible:
Always

Steps to Reproduce:
1. Configure hosts in an NFS cluster
2. Start I/O
3. Trigger a controller failover on the Nimble array
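For context, a minimal sketch of what step 1 might look like with the RHEL 7 pcs tooling. The cluster name, node hostnames, device path, mount point, and VIP below are placeholders for illustration; the actual resource configuration is not included in this report:

```
# Hypothetical active/passive NFS cluster setup (RHEL 7 pcs syntax).
# All names and addresses are assumed, not taken from this bug report.
pcs cluster auth host1 host2 -u hacluster
pcs cluster setup --name nfs_cluster host1 host2
pcs cluster start --all

# Active/passive NFS server as a single resource group:
# filesystem, NFS daemon, and floating IP fail over together.
pcs resource create nfs_fs Filesystem device=/dev/mapper/nfs_lv \
    directory=/nfsshare fstype=xfs --group nfsgroup
pcs resource create nfs_daemon nfsserver \
    nfs_shared_infodir=/nfsshare/nfsinfo --group nfsgroup
pcs resource create nfs_vip IPaddr2 ip=192.0.2.10 cidr_netmask=24 \
    --group nfsgroup
```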

Actual results:
I/O drops and the host reboots

Expected results:
I/O should continue to run without disruption.

Additional info:
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 27 14:10:55 iwf-dl360-17 pengine[3605]: notice: Scheduling shutdown of node iwf-dl360-18
Mar 27 14:10:55 iwf-dl360-17 pengine[3605]: notice: * Shutdown iwf-dl360-18
Mar 27 14:10:55 iwf-dl360-17 pengine[3605]: notice: Calculated transition 6, saving inputs in /var/lib/pacemaker/pengine/pe-input-287.bz2
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: Transition 6 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-287.bz2): Complete
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: do_shutdown of peer iwf-dl360-18 is complete
Mar 27 14:10:55 iwf-dl360-17 attrd[3604]: notice: Node iwf-dl360-18 state is now lost
Mar 27 14:10:55 iwf-dl360-17 attrd[3604]: notice: Removing all iwf-dl360-18 attributes for peer loss
Mar 27 14:10:55 iwf-dl360-17 attrd[3604]: notice: Purged 1 peer with id=2 and/or uname=iwf-dl360-18 from the membership cache
Mar 27 14:10:55 iwf-dl360-17 stonith-ng[3602]: notice: Node iwf-dl360-18 state is now lost
Mar 27 14:10:55 iwf-dl360-17 stonith-ng[3602]: notice: Purged 1 peer with id=2 and/or uname=iwf-dl360-18 from the membership cache
Mar 27 14:10:55 iwf-dl360-17 cib[3601]: notice: Node iwf-dl360-18 state is now lost
Mar 27 14:10:55 iwf-dl360-17 cib[3601]: notice: Purged 1 peer with id=2 and/or uname=iwf-dl360-18 from the membership cache

Mar 27 14:10:55 iwf-dl360-17 corosync[3174]: [TOTEM ] A new membership (10.201.14.83:547) was formed. Members left: 2
Mar 27 14:10:55 iwf-dl360-17 corosync[3174]: [CPG ] downlist left_list: 1 received
Mar 27 14:10:55 iwf-dl360-17 corosync[3174]: [QUORUM] Members[1]: 1
Mar 27 14:10:55 iwf-dl360-17 corosync[3174]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: Node iwf-dl360-18 state is now lost
Mar 27 14:10:55 iwf-dl360-17 pacemakerd[3591]: notice: Node iwf-dl360-18 state is now lost
Mar 27 14:10:55 iwf-dl360-17 crmd[3606]: notice: do_shutdown of peer iwf-dl360-18 is complete
Mar 27 14:11:04 iwf-dl360-17 systemd: Reloading.
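The corosync lines above show the two-node membership shrinking to a single member (`Members[1]: 1`) while the surviving node keeps quorum. How a two-node cluster behaves at this point depends on its votequorum settings; a sketch of the relevant corosync.conf quorum section (values assumed, the actual configuration is not attached to this report):

```
# Hypothetical quorum section for a two-node corosync cluster;
# not taken from the affected hosts.
quorum {
    provider: corosync_votequorum
    two_node: 1        # keep quorum when one of the two nodes is lost
    wait_for_all: 1    # implied by two_node: require both nodes at first start
}
```

Without `two_node` (or a quorum device), losing the peer would drop the survivor below quorum, and depending on fencing and no-quorum policy that can end in resource stop or node reboot.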

Comment 3 Jan Friesse 2022-07-07 07:08:26 UTC

*** This bug has been marked as a duplicate of bug 2103867 ***

