Bug 1811373
Summary: | glusterfsd crashes healing disperse volumes on arm | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Fox <foxxz.net>
Component: | core | Assignee: | bugs <bugs>
Status: | CLOSED UPSTREAM | QA Contact: |
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7 | CC: | bugs, jahernan, pasik, srakonde
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | armv7l | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-03-12 12:22:37 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | Excerpts from several gluster logs (attachment 1668387) | |
Can you check if this patch [1] fixes the issue?

[1] https://review.gluster.org/c/glusterfs/+/23912

This bug is moved to https://github.com/gluster/glusterfs/issues/886, and will be tracked there from now on. Visit the GitHub issue URL for further details.

The patch provided did correct the issue. Thank you.

Closed. Info added.
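For anyone wanting to verify the referenced change locally, below is a hedged sketch of one way to fetch and build Gerrit change 23912 from source. The patchset suffix (/1) is a placeholder, and the autotools build steps are a general assumption rather than something stated in this report.

```sh
# Fetch Gerrit change 23912 (the patchset suffix /1 is a placeholder; use the
# latest patchset shown on https://review.gluster.org/c/glusterfs/+/23912).
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git fetch https://review.gluster.org/glusterfs refs/changes/12/23912/1
git checkout FETCH_HEAD

# Generic source build (assumed; adapt to your distribution's packaging workflow).
./autogen.sh
./configure
make -j"$(nproc)"
sudo make install
```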
Created attachment 1668387 [details]
Excerpts from several gluster logs

Description of problem:
The gluster brick process on an ARM node that needs healing crashes (almost always) seconds after it starts and connects to the other cluster members.

Tested under Ubuntu 18 with gluster v7 and v4 running on ODROID HC2 units, and under Raspbian with gluster v5 running on a Raspberry Pi 3.

Version-Release number of selected component (if applicable):
gluster 7.2, but the problem has also been reproduced on versions 4 and 5.

How reproducible:
Reliably reproducible.

Steps to Reproduce:
1. Create a disperse volume on a cluster with 3 or more members/bricks and enable healing.
2. Have a client mount the volume and begin writing files to it.
3. Reboot a cluster member during client operations.
4. The cluster member rejoins the cluster and attempts to heal.
5. glusterfsd (the brick process) on that member typically crashes seconds to minutes after startup; in rare cases it takes longer.

(A reproduction sketch in shell commands appears at the end of this report.)

Actual results:
gluster volume status shows the affected brick online briefly and then offline after it crashes. The self-heal daemon shows as online. The brick is never able to heal and rejoin the cluster.

Expected results:
The brick should come online and sync up.

Additional info:
The same test on x86 hardware does not exhibit the crash. I am willing to make this testbed available to developers to help debug this issue. It is a 12-node system comprised of ODROID HC2 units with a 4 TB drive attached to each unit.

Volume Name: bigdisp
Type: Disperse
Volume ID: 56fa5de3-36d5-45ec-9789-88d8aae02275
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 4) = 12
Transport-type: tcp
Bricks:
Brick1: gluster1:/exports/sda/brick1/bigdisp
Brick2: gluster2:/exports/sda/brick1/bigdisp
Brick3: gluster3:/exports/sda/brick1/bigdisp
Brick4: gluster4:/exports/sda/brick1/bigdisp
Brick5: gluster5:/exports/sda/brick1/bigdisp
Brick6: gluster6:/exports/sda/brick1/bigdisp
Brick7: gluster7:/exports/sda/brick1/bigdisp
Brick8: gluster8:/exports/sda/brick1/bigdisp
Brick9: gluster9:/exports/sda/brick1/bigdisp
Brick10: gluster10:/exports/sda/brick1/bigdisp
Brick11: gluster11:/exports/sda/brick1/bigdisp
Brick12: gluster12:/exports/sda/brick1/bigdisp
Options Reconfigured:
disperse.shd-max-threads: 4
client.event-threads: 8
cluster.disperse-self-heal-daemon: enable
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on

Status of volume: bigdisp
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/exports/sda/brick1/bigdisp   49152     0          Y       4632
Brick gluster2:/exports/sda/brick1/bigdisp   49152     0          Y       3115
Brick gluster3:/exports/sda/brick1/bigdisp   N/A       N/A        N       N/A
Brick gluster4:/exports/sda/brick1/bigdisp   49152     0          Y       2728
Brick gluster5:/exports/sda/brick1/bigdisp   49152     0          Y       3072
Brick gluster6:/exports/sda/brick1/bigdisp   49152     0          Y       2549
Brick gluster7:/exports/sda/brick1/bigdisp   49152     0          Y       16848
Brick gluster8:/exports/sda/brick1/bigdisp   49152     0          Y       16740
Brick gluster9:/exports/sda/brick1/bigdisp   49152     0          Y       2619
Brick gluster10:/exports/sda/brick1/bigdisp  49152     0          Y       2677
Brick gluster11:/exports/sda/brick1/bigdisp  49152     0          Y       3023
Brick gluster12:/exports/sda/brick1/bigdisp  49153     0          Y       2440
Self-heal Daemon on localhost                N/A       N/A        Y       4653
Self-heal Daemon on gluster3                 N/A       N/A        Y       7620
Self-heal Daemon on gluster10                N/A       N/A        Y       2698
Self-heal Daemon on gluster7                 N/A       N/A        Y       16869
Self-heal Daemon on gluster8                 N/A       N/A        Y       16761
Self-heal Daemon on gluster12                N/A       N/A        Y       2461
Self-heal Daemon on gluster9                 N/A       N/A        Y       2640
Self-heal Daemon on gluster2                 N/A       N/A        Y       3136
Self-heal Daemon on gluster5                 N/A       N/A        Y       3093
Self-heal Daemon on gluster4                 N/A       N/A        Y       2749
Self-heal Daemon on gluster6                 N/A       N/A        Y       2570
Self-heal Daemon on gluster11                N/A       N/A        Y       3044

Task Status of Volume bigdisp
------------------------------------------------------------------------------
There are no active volume tasks
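For convenience, here is a minimal reproduction sketch in shell commands pieced together from the steps and volume configuration above. The brick layout and volume options are taken from the report; the client mount point (/mnt/bigdisp), the dd write loop, and the choice of gluster3 as the rebooted node are illustrative assumptions, not part of the original report.

```sh
# Reproduction sketch. Assumptions: hostnames and brick paths as in the volume
# info above; /mnt/bigdisp, the dd loop, and rebooting gluster3 are illustrative.

# 1. Create and start the disperse volume (8 data + 4 redundancy bricks = 12)
gluster volume create bigdisp disperse 12 redundancy 4 \
    gluster{1..12}:/exports/sda/brick1/bigdisp
gluster volume set bigdisp cluster.disperse-self-heal-daemon enable
gluster volume set bigdisp disperse.shd-max-threads 4
gluster volume start bigdisp

# 2. On a client, mount the volume and keep writing files
mount -t glusterfs gluster1:/bigdisp /mnt/bigdisp
for i in $(seq 1 1000); do
    dd if=/dev/urandom of=/mnt/bigdisp/file"$i" bs=1M count=100
done

# 3. While the client is still writing, reboot one cluster member
ssh gluster3 reboot

# 4./5. After the node rejoins and healing starts, watch the brick state;
#       the affected brick goes offline again when its glusterfsd crashes
gluster volume status bigdisp
gluster volume heal bigdisp info
```

The disperse 12 redundancy 4 layout matches the 1 x (8 + 4) = 12 brick count shown in the volume info; the remaining options from the report can be applied the same way with gluster volume set.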