Bug 1414519
Summary: | Glusterd fails to start: rpc frame timeouts | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Luis E. Cerezo <lcerezo>
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj>
Status: | CLOSED EOL | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 3.8 | CC: | amukherj, bugs, joe, lcerezo
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-11-07 10:37:57 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Luis E. Cerezo
2017-01-18 17:40:35 UTC
Created attachment 1242236
logfile (sanitized domain name)

Log file in debug mode; it expands to ~50 MB.
These rpc timeouts occur on all servers.

Hi Joe,

Yeah, we are seeing these on all the servers. 24007 is open on all hosts though.

[lucho@localhost HCI_scripts]$ ansible chi-virt-infra-hosts -m shell -a 'tcping -t 10 chi-virt-103-7-gluster.REDACTED.com 24007' -uroot
chi-virt-103-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-103-7-gluster.REDACTED.com port 24007 open.

chi-virt-102-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-103-7-gluster.REDACTED.com port 24007 open.

chi-virt-101-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-103-7-gluster.REDACTED.com port 24007 open.

[lucho@localhost HCI_scripts]$ ansible chi-virt-infra-hosts -m shell -a 'tcping -t 10 chi-virt-102-7-gluster.REDACTED.com 24007' -uroot
chi-virt-103-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-102-7-gluster.REDACTED.com port 24007 open.

chi-virt-101-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-102-7-gluster.REDACTED.com port 24007 open.

chi-virt-102-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-102-7-gluster.REDACTED.com port 24007 open.

[lucho@localhost HCI_scripts]$ ansible chi-virt-infra-hosts -m shell -a 'tcping -t 10 chi-virt-101-7-gluster.REDACTED.com 24007' -uroot
chi-virt-102-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-101-7-gluster.REDACTED.com port 24007 open.

chi-virt-103-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-101-7-gluster.REDACTED.com port 24007 open.

chi-virt-101-7.REDACTED.com | SUCCESS | rc=0 >>
chi-virt-101-7-gluster.REDACTED.com port 24007 open.

[lucho@localhost HCI_scripts]$

Is there any additional information I can provide?

(In reply to Luis E. Cerezo from comment #1)
> Created attachment 1242236 [details]
> logfile (sanitized domain name)
>
> log file in debug. it expands to ~50Mb

The logfile attached is not readable. Could you please check and reattach the glusterd log file?

Here's a pastebin URL from the IRC chat (DEBUG REMOVED). The attachment is a gzip of the log file.
https://paste.fedoraproject.org/529909/47589871/

I'll upload the file again.
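
For anyone reproducing the port check from this thread without `tcping`, here is a minimal Python sketch of the same idea. The function names `port_is_open` and `check_peers` are hypothetical helpers made up for this illustration, and the defaults (port 24007, 10 second timeout) simply mirror the values used in the ansible command. Note that a successful TCP connect only shows the port is reachable; it says nothing about whether glusterd answers RPC requests, which fits the symptoms reported here.

```python
import socket


def port_is_open(host: str, port: int = 24007, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(hosts, port: int = 24007, timeout: float = 10.0) -> dict:
    """Map each host name to the result of port_is_open (hypothetical helper)."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

Calling `check_peers` with the three gluster hostnames from each node would give roughly the same picture as the ansible run: every entry should be `True` if 24007 is reachable.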
It's a gzip of etc-glusterfs-glusterd.vol.log from one host in debug mode. I can provide logs from the other nodes in this 3-node setup if you wish.

Created attachment 1243905
etc-glusterfs-glusterd.vol.log GZIP

sha512sum etc-glusterfs-glusterd.vol.log.gz
0d1dff013fb7e6a6ed3aeda60498c9565693c6b858b0f0579d02c48f0fb0874e5948e2620dcc54903708e3da9f2e7aabf868facaeb5bdab4fd1e35bd63dc12b1 etc-glusterfs-glusterd.vol.log.gz

I didn't find any evidence of glusterd not coming up in the log file you shared.

"Fails to start" is probably not a technically accurate statement, but from his perspective as a user, that's how he's interpreting the symptoms. The real problem seems to be the repeating timeouts he's getting on all servers:

"[2017-01-18 00:07:24.745691] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x8 sent = 2017-01-17 23:57:22.580694. timeout = 600 for 10.49.1.145:24007"

This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
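
When sifting through a large debug log for these messages, it can help to pull out the peer, the xid, and the time between "sent" and the bail-out. The sketch below is only an illustration: the regular expression is derived from the single `call_bail` line quoted in this report, so other GlusterFS versions may format the message differently, and `parse_call_bail` and `BailOut` are hypothetical names, not part of any GlusterFS tooling.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Timestamp shape as it appears in the quoted log line.
_TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"

# Pattern built from the one call_bail line in this report (an assumption
# about the general format, not a documented log schema).
_BAIL_RE = re.compile(
    rf"\[(?P<logged>{_TS})\] E \[rpc-clnt\.c:\d+:call_bail\].*?"
    rf"xid = (?P<xid>0x[0-9a-fA-F]+) sent = (?P<sent>{_TS})\. "
    rf"timeout = (?P<timeout>\d+) for (?P<peer>\S+)"
)


class BailOut(NamedTuple):
    peer: str          # address:port the frame was sent to
    xid: str           # RPC transaction id as logged
    timeout: int       # configured frame timeout in seconds
    elapsed: float     # seconds between "sent" and the log timestamp


def parse_call_bail(line: str) -> Optional[BailOut]:
    """Return a BailOut for a call_bail log line, or None if the line does not match."""
    m = _BAIL_RE.search(line)
    if m is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    logged = datetime.strptime(m.group("logged"), fmt)
    sent = datetime.strptime(m.group("sent"), fmt)
    return BailOut(
        peer=m.group("peer"),
        xid=m.group("xid"),
        timeout=int(m.group("timeout")),
        elapsed=(logged - sent).total_seconds(),
    )
```

Applied to the line quoted above, the elapsed time comes out at roughly 602 seconds, just over the logged `timeout = 600`, meaning the frame to 10.49.1.145:24007 went unanswered for the full timeout period.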