| Summary: | gluster_shared_storage volume doesn't get mounted after disabling and enabling of shared storage. | ||
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Shashank Raj <sraj> |
| Component: | glusterd | Assignee: | Gaurav Kumar Garg <ggarg> |
| Status: | CLOSED NOTABUG | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | rhgs-3.1 | CC: | amukherj, ggarg, rhs-bugs, sashinde, sasundar, smohan, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-15 11:05:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Shashank Raj
2016-03-09 08:41:43 UTC
It seems that while the `cluster.enable-shared-storage` option was being disabled, the disconnect event did not happen; there appears to be a problem in the RPC layer. From the brick logs:

`[2016-03-08 00:36:19.055944] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick dhcp37-52.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick has disconnected from glusterd.`

`[2016-03-08 00:36:19.056289] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.70.36.49:65482`

`[2016-03-08 00:36:19.056316] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully`

Because the disconnect event did not happen, the brick's port entry could not be removed. Since this issue does not reproduce every time and occurs very rarely, I am marking its severity and priority as low.

The disconnect did happen, but `pmap_signout` did not. That is why `pmap_registry_remove()` was not called for port 49152 and `pmap.ports[49152].brickname` was not NULLed out. As a result, `pmap_registry_search` picked up the same port, assuming it was mapped to the currently running brick process, and returned it to the client, which caused the mount failure. We need to check two things:

1. Was a signout event initiated from the brick?
2. If a signout event was initiated, did it reach the program layer? If so, why couldn't it be processed?

It will probably all boil down to finding the reason for the following log messages:

`[2016-03-08 00:36:19.056289] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.70.36.49:65482`

`[2016-03-08 00:36:19.056316] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully`

The signout event did not happen: there is no log entry like `removing brick <brick name> on port <port number>` in the glusterd logs.
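The port-map failure described above can be illustrated with a minimal sketch. This is a simplified model, not glusterd's actual pmap code: the functions `pmap_signin`, `pmap_signout`, `pmap_search`, and `pmap_reset` and the 16-port range are hypothetical; only the base port 49152 and the brick path come from the logs in this report. The search returns the lowest port whose entry still names the brick, so an entry that a signout never cleared shadows the port the restarted brick actually registered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified port map; NOT glusterd's real pmap code. */
#define PMAP_BASE_PORT 49152 /* first brick port seen in the logs */
#define PMAP_NUM_PORTS 16    /* arbitrary small range for the sketch */

struct pmap_entry {
    char brickname[256]; /* empty string means the port is free */
};

static struct pmap_entry pmap_ports[PMAP_NUM_PORTS];

/* Clear the whole registry (used to run independent scenarios). */
static void pmap_reset(void)
{
    memset(pmap_ports, 0, sizeof(pmap_ports));
}

/* Signin: a brick process claims the first free port. Returns port or -1. */
static int pmap_signin(const char *brick)
{
    for (int i = 0; i < PMAP_NUM_PORTS; i++) {
        if (pmap_ports[i].brickname[0] == '\0') {
            snprintf(pmap_ports[i].brickname,
                     sizeof(pmap_ports[i].brickname), "%s", brick);
            return PMAP_BASE_PORT + i;
        }
    }
    return -1;
}

/* Signout: clear the entry so later searches cannot return this port. */
static void pmap_signout(int port)
{
    int i = port - PMAP_BASE_PORT;
    assert(i >= 0 && i < PMAP_NUM_PORTS);
    pmap_ports[i].brickname[0] = '\0';
}

/* Search: return the lowest port whose entry names this brick, or -1. */
static int pmap_search(const char *brick)
{
    for (int i = 0; i < PMAP_NUM_PORTS; i++) {
        if (pmap_ports[i].brickname[0] != '\0' &&
            strcmp(pmap_ports[i].brickname, brick) == 0)
            return PMAP_BASE_PORT + i;
    }
    return -1;
}
```

In this model, if the old brick process signs out before the restart, the new process gets 49152 again and `pmap_search` returns it. If the signout is lost, the restarted brick registers on 49153, but `pmap_search` still returns the stale 49152, matching the mount failure pattern described in this report.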
Also, when shared storage was disabled the first time, port 49152 was not freed, as the following log messages show:

`[2016-03-08 23:19:35.637523] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.70.36.49:65499`

`[2016-03-08 23:19:35.637571] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully`

In this log window, he tried to disable shared storage. Because of the RPC error above, glusterd could not execute `actor_fn()`, i.e. `__gluster_pmap_signout`.

*Since this issue reproduced only once and is no longer reproducible, I am closing it.* Feel free to reopen it if it reproduces again.
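The "RPC program not available" error can be sketched the same way. This is a hypothetical, simplified dispatcher, not the real rpcsvc code: the program table, `dispatch`, `demo_pmap_signout`, and program number 1000 are made up for illustration; only the request `1298437 330` comes from the logs. When no registered program matches the request's program number and version, the dispatcher logs an error and returns without calling any actor, so a signout handler that is never reached leaves its port entry in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified RPC dispatch; NOT the real rpcsvc code. */
typedef int (*actor_fn)(void);

struct rpc_program {
    unsigned prognum;
    unsigned progver;
    actor_fn actor; /* one actor per program keeps the sketch small */
};

static int signout_calls = 0;

/* Stand-in for a signout handler such as __gluster_pmap_signout. */
static int demo_pmap_signout(void)
{
    signout_calls++;
    return 0;
}

#define DEMO_PMAP_PROGNUM 1000u /* hypothetical program number */
#define DEMO_PMAP_PROGVER 1u    /* hypothetical program version */

static const struct rpc_program registered[] = {
    { DEMO_PMAP_PROGNUM, DEMO_PMAP_PROGVER, demo_pmap_signout },
};

/* Run the matching actor and return its result, or -1 if no program matches. */
static int dispatch(unsigned prognum, unsigned progver)
{
    for (size_t i = 0; i < sizeof(registered) / sizeof(registered[0]); i++) {
        if (registered[i].prognum == prognum &&
            registered[i].progver == progver) {
            assert(registered[i].actor != NULL);
            return registered[i].actor();
        }
    }
    /* Imitates the wording of the log line; the actor is never called. */
    fprintf(stderr, "RPC program not available (req %u %u)\n", prognum, progver);
    return -1;
}
```

Calling `dispatch(1298437u, 330u)` in this model prints the error and leaves `signout_calls` at 0, while a request for the registered (hypothetical) program runs the signout stand-in once.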