Bug 1761916
| Summary: | pvscan stuck with pcs cluster configuration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | vincent chen <vincent.chen1> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.1 | CC: | agk, cluster-maint, fdinitto, heinzm, jbrassow, prajnoha, teigland, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | 8.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-15 16:13:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
I tried to collect an sosreport on e2e-l4-10026. There are many plugin timeouts:

```
Please enter the case id that you are generating this report for []:
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...
  Starting 27/111  filesys    [Running: block cgroups dracut filesys]
  Plugin block timed out
  Starting 28/111  firewalld  [Running: cgroups dracut filesys firewalld]
  Plugin cgroups timed out
  Starting 33/111  gssproxy   [Running: dracut filesys grub2 gssproxy]
  Plugin dracut timed out
  Starting 44/111  kernel     [Running: filesys grub2 hardware kernel]
  Plugin filesys timed out
  Starting 55/111  lvm2       [Running: grub2 hardware kernel lvm2]
  Plugin grub2 timed out
  Starting 56/111  md         [Running: hardware kernel lvm2 md]
  Plugin hardware timed out
  Starting 60/111  multipath  [Running: kernel lvm2 memory multipath]
  Plugin kernel timed out
  Starting 75/111  pci        [Running: lvm2 networking pam pci]
  Plugin lvm2 timed out
  Starting 111/111 yum        [Running: processor system vdo yum]
```

This looks like a duplicate of bug 1730455, which is a bug in the LVM-activate agent.

*** This bug has been marked as a duplicate of bug 1730455 ***

David, bug 1730455 states that the fix is in resource-agents-4.1.1-33.el8.x86_64, but I cannot find resource-agents-4.1.1-33.el8.x86_64 in the Red Hat subscription repos. Where can I get this build?

```
[root@e2e-l4-10026 ~]# dnf repolist
Updating Subscription Management repositories.
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (Debug RPMs)    33 kB/s |  27 kB  00:00
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (RPMs)         472 kB/s | 512 kB  00:01
Red Hat Enterprise Linux 8 for x86_64 - High Availability Beta (Source RPMs)   27 kB/s | 3.8 kB  00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta (RPMs)                  16 kB/s | 4.2 kB  00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (Source RPMs)              27 kB/s | 3.8 kB  00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (RPMs)                     28 kB/s | 4.0 kB  00:00
repo id                                              repo name                                                 status
rhel-8-for-x86_64-appstream-beta-rpms                Red Hat Enterprise Linux 8 for x86_64 - AppStream Beta (   4,795
rhel-8-for-x86_64-baseos-beta-rpms                   Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (RPM   1,662
rhel-8-for-x86_64-baseos-beta-source-rpms            Red Hat Enterprise Linux 8 for x86_64 - BaseOS Beta (Sou     515
rhel-8-for-x86_64-highavailability-beta-debug-rpms   Red Hat Enterprise Linux 8 for x86_64 - High Availabilit      32
rhel-8-for-x86_64-highavailability-beta-rpms         Red Hat Enterprise Linux 8 for x86_64 - High Availabilit      60
rhel-8-for-x86_64-highavailability-beta-source-rpms  Red Hat Enterprise Linux 8 for x86_64 - High Availabilit      17

[root@e2e-l4-10026 ~]# yum list resource-agents --showduplicates
Updating Subscription Management repositories.
Last metadata expiration check: 0:00:06 ago on Tue 15 Oct 2019 11:03:05 PM EDT.
Installed Packages
resource-agents.x86_64   4.1.1-27.el8   @RHEL-8-highavailability-partners
Available Packages
resource-agents.src      4.1.1-27.el8   rhel-8-for-x86_64-highavailability-beta-source-rpms
resource-agents.x86_64   4.1.1-27.el8   rhel-8-for-x86_64-highavailability-beta-rpms
```

I have very little knowledge about the process by which fixes become available. I'm guessing you'd need to contact the support group to figure out what the problem is.

Thanks, David. I will check with the support group about it.
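A minimal sketch of how one might watch for the fixed build, assuming the fix ships through the GA (non-beta) High Availability repository once resource-agents-4.1.1-33.el8 is released; the repo id below is the standard GA counterpart of the beta repos listed above and is not taken from this report:

```bash
# Assumption: the fixed resource-agents build is published to the GA
# High Availability repo, not the beta repos currently enabled on this host.
subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms

# List every available resource-agents build; look for 4.1.1-33.el8 or newer.
dnf --showduplicates list resource-agents

# Once the fixed build is visible, update to it.
dnf update resource-agents
```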
Created attachment 1626024 [details]
e2e-l4-10024 sosreport

Description of problem:
The pcs cluster often sees a clusterlv0 monitor error, after which node e2e-l4-10026 is fenced. At that point pvscan/lvscan are stuck.

```
[root@e2e-l4-10024 ~]# pcs status
Cluster name: RHCS
Stack: corosync
Current DC: e2e-l4-10024 (version 2.0.2-3.el8-744a30d655) - partition with quorum
Last updated: Tue Oct 15 09:44:50 2019
Last change: Tue Oct 15 03:14:24 2019 by root via cibadmin on e2e-l4-10024

2 nodes configured
13 resources configured

Online: [ e2e-l4-10024 e2e-l4-10026 ]

Full list of resources:

 emc_fence      (stonith:fence_scsi):   Started e2e-l4-10024
 Clone Set: dlm-clone [dlm]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: lvmlockd-clone [lvmlockd]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: xio_gfs-clone [xio_gfs]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]
 Clone Set: alua_0_vg_1570690865-clone [alua_0_vg_1570690865]
     Started: [ e2e-l4-10024 e2e-l4-10026 ]

Failed Resource Actions:
* clusterlv0_monitor_0 on e2e-l4-10026 'unknown error' (1): call=20, status=Timed Out, exitreason='',
    last-rc-change='Tue Oct 15 09:42:33 2019', queued=0ms, exec=90001ms
* clusterlv1_monitor_0 on e2e-l4-10026 'unknown error' (1): call=30, status=Timed Out, exitreason='',
    last-rc-change='Tue Oct 15 09:42:33 2019', queued=0ms, exec=90001ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

pvscan -vvv is stuck:

```
[root@e2e-l4-10026 ~]# pvscan -vvv
Parsing: pvscan -vvv
Recognised command pvscan_display (id 113 / enum 94).
Sysfs filter initialised.
Internal filter initialised.
LVM type filter initialised.
Usable device filter initialised (scan_lvs 0).
mpath filter initialised.
Partitioned filter initialised.
signature filter initialised.
MD filter initialised.
Composite filter initialised.
Persistent filter initialised.
devices/allow_mixed_block_sizes not found in config: defaulting to 0
devices/hints not found in config: defaulting to all
metadata/record_lvs_history not found in config: defaulting to 0
DEGRADED MODE. Incomplete RAID LVs will be processed.
Processing command: pvscan -vvv
Command pid: 41298
System ID: e2e-l4-10026
O_DIRECT will be used
global/locking_type not found in config: defaulting to 1
File locking settings: readonly:0 sysinit:0 ignorelockingfailure:0 global/metadata_read_only:0 global/wait_for_locks:1.
devices/md_component_checks not found in config: defaulting to auto
Using md_component_checks auto use_full_md_check 0
/run/lvm/lvmlockd.socket: Opening daemon socket to lvmlockd for protocol lvmlockd version 1.
Sending daemon lvmlockd: hello
Successfully connected to lvmlockd on fd 3.
report/output_format not found in config: defaulting to basic
log/report_command_log not found in config: defaulting to 0
Processing each PV
Locking /run/lock/lvm/P_global RB
_do_flock /run/lock/lvm/P_global:aux WB
_undo_flock /run/lock/lvm/P_global:aux
_do_flock /run/lock/lvm/P_global RB
lockd global mode sh    <--- stuck here
```

Version-Release number of selected component (if applicable):
lvm2-2.03.05-4.el8.x86_64
dlm-4.0.9-3.el8.x86_64

How reproducible:

Steps to Reproduce:
1. Set up the cluster.
2. Wait some time; "pcs status" shows the monitor error.
3. Check pvscan; it is stuck.
4. Rebooting e2e-l4-10026 resolves the problem temporarily; after some time the issue happens again.

Actual results:

Expected results:
pvs works normally and the cluster does not see monitor errors.

Additional info:
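As a side note for anyone debugging a similar hang (this is not part of the original report): when pvscan blocks right after "lockd global mode sh", the command is waiting on lvmlockd's shared global lock, so the state of lvmlockd and the DLM lockspaces is the natural thing to inspect. A minimal diagnostic sketch, assuming lvm2-lockd and dlm are installed and that lvmlockd uses the DLM lock manager, as in this cluster:

```bash
# Show the lockspaces and locks lvmlockd currently knows about.
lvmlockctl --info

# Dump lvmlockd's internal debug buffer; the stuck global-lock request
# should be visible here.
lvmlockctl --dump

# List DLM lockspaces; lvmlockd typically uses "lvm_global" for the global
# lock plus one "lvm_<vgname>" lockspace per shared VG.
dlm_tool ls

# Inspect lock state and waiters in the global lockspace (the lockspace
# name here is an assumption based on lvmlockd's usual naming).
dlm_tool lockdebug lvm_global
```

In this report the underlying cause was the LVM-activate agent issue tracked in bug 1730455, so commands like these only help confirm where the wait occurs, not fix it.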