Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user-management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a small "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where each "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 1761916

Summary: pvscan stuck with pcs cluster configuration
Product: Red Hat Enterprise Linux 8
Reporter: vincent chen <vincent.chen1>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED DUPLICATE
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.1
CC: agk, cluster-maint, fdinitto, heinzm, jbrassow, prajnoha, teigland, zkabelac
Target Milestone: rc
Target Release: 8.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-15 16:13:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: e2e-l4-10024 sosreport (flags: none)

Description vincent chen 2019-10-15 14:59:18 UTC
Created attachment 1626024 [details]
e2e-l4-10024 sosreport

Description of problem:
The pcs cluster often sees a clusterlv0 monitor error, and then node e2e-l4-10026 is fenced. At this time, pvscan/lvscan are stuck.


[root@e2e-l4-10024 ~]# pcs status
Cluster name: RHCS
Stack: corosync
Current DC: e2e-l4-10024 (version 2.0.2-3.el8-744a30d655) - partition with quorum
Last updated: Tue Oct 15 09:44:50 2019
Last change: Tue Oct 15 03:14:24 2019 by root via cibadmin on e2e-l4-10024

2 nodes configured
13 resources configured

Online: [ e2e-l4-10024 e2e-l4-10026 ]

Full list of resources:

 emc_fence      (stonith:fence_scsi):   Started e2e-l4-10024
 Clone Set: dlm-clone [dlm]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: lvmlockd-clone [lvmlockd]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: xio_gfs-clone [xio_gfs]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: alua_0_vg_1570690865-clone [alua_0_vg_1570690865]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]

Failed Resource Actions:
* clusterlv0_monitor_0 on e2e-l4-10026 'unknown error' (1): call=20, status=Timed Out, exitreason='',
    last-rc-change='Tue Oct 15 09:42:33 2019', queued=0ms, exec=90001ms
* clusterlv1_monitor_0 on e2e-l4-10026 'unknown error' (1): call=30, status=Timed Out, exitreason='',
    last-rc-change='Tue Oct 15 09:42:33 2019', queued=0ms, exec=90001ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


pvscan -vvv stuck

[root@e2e-l4-10026 ~]# pvscan -vvv
        Parsing: pvscan -vvv
        Recognised command pvscan_display (id 113 / enum 94).
        Sysfs filter initialised.
        Internal filter initialised.
        LVM type filter initialised.
        Usable device filter initialised (scan_lvs 0).
        mpath filter initialised.
        Partitioned filter initialised.
        signature filter initialised.
        MD filter initialised.
        Composite filter initialised.
        Persistent filter initialised.
      devices/allow_mixed_block_sizes not found in config: defaulting to 0
      devices/hints not found in config: defaulting to all
      metadata/record_lvs_history not found in config: defaulting to 0
        DEGRADED MODE. Incomplete RAID LVs will be processed.
        Processing command: pvscan -vvv
        Command pid: 41298
        System ID: e2e-l4-10026
        O_DIRECT will be used
      global/locking_type not found in config: defaulting to 1
        File locking settings: readonly:0 sysinit:0 ignorelockingfailure:0 global/metadata_read_only:0 global/wait_for_locks:1.
      devices/md_component_checks not found in config: defaulting to auto
        Using md_component_checks auto use_full_md_check 0
        /run/lvm/lvmlockd.socket: Opening daemon socket to lvmlockd for protocol lvmlockd version 1.
        Sending daemon lvmlockd: hello
        Successfully connected to lvmlockd on fd 3.
      report/output_format not found in config: defaulting to basic
      log/report_command_log not found in config: defaulting to 0
        Processing each PV
      Locking /run/lock/lvm/P_global RB
        _do_flock /run/lock/lvm/P_global:aux WB
        _undo_flock /run/lock/lvm/P_global:aux
        _do_flock /run/lock/lvm/P_global RB
        lockd global mode sh                     < --- stuck  here 
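The trace above ends at the request for the shared ("sh") global lock from lvmlockd, so pvscan is blocked waiting for a lock grant that never arrives. A minimal sketch of pulling the stuck step out of a captured trace (the `last_step` helper is an illustration for this report, not part of LVM):

```python
def last_step(trace: str) -> str:
    """Return the final non-empty line of a verbose LVM trace,
    i.e. the last operation the command reached before hanging."""
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

# Tail of the pvscan -vvv output captured on e2e-l4-10026:
trace = """\
Locking /run/lock/lvm/P_global RB
  _do_flock /run/lock/lvm/P_global:aux WB
  _undo_flock /run/lock/lvm/P_global:aux
  _do_flock /run/lock/lvm/P_global RB
  lockd global mode sh
"""

print(last_step(trace))  # -> lockd global mode sh
```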


 

Version-Release number of selected component (if applicable):
lvm2-2.03.05-4.el8.x86_64
dlm-4.0.9-3.el8.x86_64


How reproducible:


Steps to Reproduce:
1. Set up the cluster.
2. Wait some time; "pcs status" shows the monitor error.
3. Check pvscan; it is stuck.
4. Reboot e2e-l4-10026; this resolves the problem temporarily, but after some time the issue happens again.
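The monitor failures in step 2 can be spotted by parsing the "Failed Resource Actions" section of captured `pcs status` output. A rough sketch (the `parse_failed_actions` helper is an illustration, not a pcs API):

```python
import re

def parse_failed_actions(status: str):
    """Return (resource_action, node, status) tuples from the
    'Failed Resource Actions' lines of pcs status text."""
    pattern = re.compile(
        r"\* (\S+) on (\S+) '[^']*' \(\d+\): call=\d+, status=(\w+(?: \w+)*)"
    )
    return pattern.findall(status)

# Sample taken from the pcs status output in this report:
status = """\
Failed Resource Actions:
* clusterlv0_monitor_0 on e2e-l4-10026 'unknown error' (1): call=20, status=Timed Out, exitreason='',
* clusterlv1_monitor_0 on e2e-l4-10026 'unknown error' (1): call=30, status=Timed Out, exitreason='',
"""

for action, node, st in parse_failed_actions(status):
    print(action, node, st)
```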

Actual results:



Expected results:

pvs works well and the cluster does not report monitor errors


Additional info:

Comment 1 vincent chen 2019-10-15 15:43:17 UTC
I tried to collect an sosreport on e2e-l4-10026. Many plugins timed out:

Please enter the case id that you are generating this report for []:

 Setting up archive ...
 Setting up plugins ...
 Running plugins. Please wait ...

  Starting 27/111 filesys         [Running: block cgroups dracut filesys]


 Plugin block timed out

  Starting 28/111 firewalld       [Running: cgroups dracut filesys firewalld]
 Plugin cgroups timed out

  Starting 33/111 gssproxy        [Running: dracut filesys grub2 gssproxy]
 Plugin dracut timed out

  Starting 44/111 kernel          [Running: filesys grub2 hardware kernel]
 Plugin filesys timed out

  Starting 55/111 lvm2            [Running: grub2 hardware kernel lvm2]


 Plugin grub2 timed out

  Starting 56/111 md              [Running: hardware kernel lvm2 md]
 Plugin hardware timed out

  Starting 60/111 multipath       [Running: kernel lvm2 memory multipath]
 Plugin kernel timed out

  Starting 75/111 pci             [Running: lvm2 networking pam pci]
 Plugin lvm2 timed out

  Starting 111/111 yum             [Running: processor system vdo yum]

Comment 2 David Teigland 2019-10-15 16:13:10 UTC
This looks like a duplicate of bug 1730455 which is a bug in the LVM-activate agent.

*** This bug has been marked as a duplicate of bug 1730455 ***

Comment 3 vincent chen 2019-10-16 03:06:33 UTC
David,
Bug 1730455 states that the fix is in resource-agents-4.1.1-33.el8.x86_64.
I can't find resource-agents-4.1.1-33.el8.x86_64 in the Red Hat subscription repos. Where can I get this package?

[root@e2e-l4-10026 ~]# dnf repolist
Updating Subscription Management repositories.
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (Debug RPMs)         33 kB/s |  27 kB     00:00
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (RPMs)              472 kB/s | 512 kB     00:01
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (Source RPMs)        27 kB/s | 3.8 kB     00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta (RPMs)                       16 kB/s | 4.2 kB     00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (Source RPMs)                   27 kB/s | 3.8 kB     00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (RPMs)                          28 kB/s | 4.0 kB     00:00
repo id                                             repo name                                                status
rhel-8-for-x86_64-appstream-beta-rpms               Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta ( 4,795
rhel-8-for-x86_64-baseos-beta-rpms                  Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (RPM 1,662
rhel-8-for-x86_64-baseos-beta-source-rpms           Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (Sou   515
rhel-8-for-x86_64-highavailability-beta-debug-rpms  Red Hat Enterprise Linux 8 for x86_64 - High Availabilit    32
rhel-8-for-x86_64-highavailability-beta-rpms        Red Hat Enterprise Linux 8 for x86_64 - High Availabilit    60
rhel-8-for-x86_64-highavailability-beta-source-rpms Red Hat Enterprise Linux 8 for x86_64 - High Availabilit    17


[root@e2e-l4-10026 ~]#  yum list resource-agents --showduplicates
Updating Subscription Management repositories.
Last metadata expiration check: 0:00:06 ago on Tue 15 Oct 2019 11:03:05 PM EDT.
Installed Packages
resource-agents.x86_64               4.1.1-27.el8               @RHEL-8-highavailability-partners
Available Packages
resource-agents.src                  4.1.1-27.el8               rhel-8-for-x86_64-highavailability-beta-source-rpms
resource-agents.x86_64               4.1.1-27.el8               rhel-8-for-x86_64-highavailability-beta-rpms
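The repos above only offer resource-agents-4.1.1-27.el8, while the fix landed in release 33. A crude comparison of the numeric release field (a sketch for this specific NVR layout; real RPM version comparison, e.g. `rpmdev-vercmp`, handles far more cases):

```python
def release_number(nvr: str) -> int:
    """Extract the leading integer of the release field from a
    name-version-release string like 'resource-agents-4.1.1-27.el8'."""
    release = nvr.rsplit("-", 1)[1]        # "27.el8"
    return int(release.split(".", 1)[0])   # 27

installed = "resource-agents-4.1.1-27.el8"
fixed = "resource-agents-4.1.1-33.el8"
print(release_number(installed) < release_number(fixed))  # -> True: fix not yet available here
```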

Comment 4 David Teigland 2019-10-16 14:16:46 UTC
I have very little knowledge about the process by which fixes become available.  I'm guessing you'd need to contact the support group to figure out what the problem is.

Comment 5 vincent chen 2019-10-16 14:24:34 UTC
Thanks, David. I will check with the support group about it.