Bug 1630344
Summary: | Sometimes node-agent message socket file "message.sock" is missing | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | gowtham <gshanmug> |
Component: | web-admin-tendrl-node-agent | Assignee: | gowtham <gshanmug> |
Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.4 | CC: | fbalak, mbukatov, nthomas, rhs-bugs, sankarshan |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.4.z Batch Update 2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | tendrl-node-agent-1.6.3-11.el7rhgs | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-12-17 17:06:56 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
gowtham
2018-09-18 12:37:09 UTC
As per the current implementation, the message socket file is located at "/var/run/tendrl/message.sock". This file and its folder "tendrl" are created when node-agent starts. When node-agent is stopped, the folder "tendrl" and the file "message.sock" are deleted.

Sometimes the message.sock file is not created even though the folder "tendrl" is. This happens when a temporary network issue occurs while node-agent tries to connect to etcd.

I have reproduced this scenario in another way. I stopped the etcd service, so the node-agent service went down after a few retries. Then I started the node-agent service (start, not restart):

service tendrl-node-agent start

This started node-agent again. I then started the other tendrl services and saw node-agent continuously raising the traceback message about the message socket. I checked the message socket directory "/var/run/tendrl/", and the message.sock file was missing.

After the crash, "service tendrl-node-agent start" starts only the node-agent service. It does not run "service tendrl-node-agent.socket start", so the socket file is not created.

I have seen the same problem in a customer machine's log file: after the node-agent restart log message, I saw the message socket errors, and the log file also contained messages about a temporary etcd connection issue.

PR is under review: https://github.com/Tendrl/node-agent/pull/851

Steps to reproduce:
1. Check for the file /var/run/tendrl/message.sock.
2. Stop the etcd service.
3. After a few minutes, node-agent will go down.
4. Start the etcd service.
5. Run the command: service tendrl-node-agent start (don't use restart).
6. Check the directory /var/run/tendrl/ (the message.sock file won't be created).
7. Start the other tendrl services and check the log file.

The reason: normally, starting the tendrl-node-agent service also starts node-agent.sock, but in this case node-agent.sock is not started.

The same thing happened on the customer machine, with a small difference: there, node-agent itself started again after a temporary etcd connection issue.
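The file checks in steps 1 and 6 can be scripted. The following is only a minimal sketch and is not part of Tendrl: the function name message_socket_status and its return strings are hypothetical, and only the path /var/run/tendrl/message.sock is taken from this report. It distinguishes the failure described here (folder present, socket file absent) from the normal stopped state (folder absent).

```python
import os
import stat


def message_socket_status(path="/var/run/tendrl/message.sock"):
    """Classify the state of the message socket path.

    Hypothetical helper for illustration. Returns one of:
      "ok"             - path exists and is a Unix domain socket
      "not-a-socket"   - path exists but is some other file type
      "socket-missing" - parent folder exists, socket file does not
      "dir-missing"    - parent folder does not exist
    """
    parent = os.path.dirname(path)
    if not os.path.isdir(parent):
        return "dir-missing"
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "socket-missing"
    return "ok" if stat.S_ISSOCK(mode) else "not-a-socket"


if __name__ == "__main__":
    # With the default path, "socket-missing" corresponds to the
    # symptom reported in this bug.
    print(message_socket_status())
```

Running it once before step 2 and again after step 5 should show whether the socket file came back after node-agent was started.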
I can't reproduce the exact scenario, but mine is similar to it.

QE team will retest this based on the reproducer from comment 4.

I was able to reproduce this issue with the reproducer from comment 4 on the older version: /var/run/tendrl/message.sock was not created, and tendrl-monitoring-integration reported errors related to that. With the current version, the message.sock file is created correctly and tendrl-monitoring-integration works without errors related to this file (but there are now tracebacks described in BZ 1647393 and BZ 1647386).

--> VERIFIED

Older version:
tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-1.6.3-7.el7rhgs.noarch
tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
tendrl-ansible-1.6.3-8.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-ui-1.6.3-11.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch

Current version:
tendrl-monitoring-integration-1.6.3-15.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-1.6.3-8.el7rhgs.noarch
tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-15.el7rhgs.noarch
tendrl-ansible-1.6.3-9.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-node-agent-1.6.3-11.el7rhgs.noarch
tendrl-ui-1.6.3-12.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3829