Bug 1210568
| Summary: | [GlusterFS 3.6.2] glusterd crashes after enabling management SSL/TLS and glusterd is restarted | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | ssamanta |
| Component: | glusterd | Assignee: | krishnaram Karthick <kramdoss> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.6.2 | CC: | amukherj, bugs, csaba, jdarcy, kaushal, mzywusko, rcyriac, sankarshan, smohan |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-08-01 04:42:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1211643 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
ssamanta
2015-04-10 05:50:05 UTC
Sobhan, I still don't understand what is wrong with your environment. I've never faced any crashes myself, and I know that other people who had SSL setup issues didn't face crashes either. I'll try to find time early next week to sit down with you; I'd like to observe exactly what you are doing. In what way are one server's certificates incorrect? The description seems to show certificates being created the same way everywhere, and that way looks valid.

The problem is that while concatenating the public keys to form the glusterfs.ca file, I missed one key (the client's public key) and copied that glusterfs.ca file to /etc/ssl on the server nodes. I tried to mount from the client and it failed. Then I realized I had missed one entry in the glusterfs.ca file on the server nodes. When I ran "gluster volume status" I saw that the brick was down on one node and there had been a brick crash. I understand that the SSL connection failed because incorrect SSL certificates were copied to the server nodes. In any case, glusterd should not crash; it should handle this gracefully.
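For reference, a minimal sketch of the concatenation step described above, assuming two servers and one client; the hostnames and per-node certificate filenames are hypothetical, while /etc/ssl/glusterfs.ca is the path the report itself uses:

```sh
# Collect the public certificate of every server *and* every client,
# then concatenate them into one CA bundle. server1, server2 and
# client1 are hypothetical hostnames.
cat server1.pem server2.pem client1.pem > glusterfs.ca

# The same complete bundle must land in /etc/ssl on every node;
# a bundle missing any peer's certificate makes TLS handshakes fail.
for node in server1 server2 client1; do
    scp glusterfs.ca root@"$node":/etc/ssl/glusterfs.ca
done
```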
Sobhan, it would be good if you could reproduce this, collect all the CLI steps, and create a document so that everyone is clear about which steps you followed in exactly what sequence. Post that document as an attachment to the BZ; that should reduce the confusion and ambiguity that typically crop up when you put something in words.
Also you said:
>How reproducible:
>Tried once
Not very encouraging :) IMHO, as a bug reporter you should report a bug only if it reproduces reliably, or at least state that it happened just once and is not happening again, which is an important data point for the developer.
Created attachment 1013873 [details]
Reproduction steps for glusterd crash
I reproduced the issue and noted exactly where glusterd crashes. Please see the attachment. I also still have the machines in the same state and am happy to share the test setup for further investigation. The following backtrace may be useful for debugging further.
(gdb) bt
#0 SSL_write (s=0x0, buf=0x5555557983a0, num=4) at ssl_lib.c:990
#1 0x00007fffe6854602 in ssl_do (buf=0x5555557983a0, len=4, func=0x7fffe6625a90 <SSL_write>, this=0x5555557e8690, this=0x5555557e8690)
at socket.c:281
#2 0x00007fffe685495e in __socket_rwv (this=this@entry=0x5555557e8690, vector=<optimized out>, count=<optimized out>,
pending_vector=pending_vector@entry=0x5555557984b0, pending_count=pending_count@entry=0x5555557984b8, bytes=bytes@entry=0x0, write=write@entry=1)
at socket.c:565
#3 0x00007fffe6854ff2 in __socket_writev (pending_count=<optimized out>, pending_vector=<optimized out>, count=<optimized out>,
vector=<optimized out>, this=0x5555557e8690) at socket.c:684
#4 __socket_ioq_churn_entry (this=this@entry=0x5555557e8690, entry=entry@entry=0x555555798390, direct=direct@entry=1) at socket.c:1082
#5 0x00007fffe68554a4 in socket_submit_request (this=0x5555557e8690, req=<optimized out>) at socket.c:3304
#6 0x00007ffff792f1a2 in rpc_clnt_submit (rpc=rpc@entry=0x5555557e45a0, prog=prog@entry=0x7fffed061f60 <glusterd_dump_prog>,
procnum=procnum@entry=1, cbkfn=cbkfn@entry=0x7fffecdb4970 <glusterd_peer_dump_version_cbk>, proghdr=proghdr@entry=0x7fffffffcc30,
proghdrcount=1, progpayload=progpayload@entry=0x0, progpayloadcount=progpayloadcount@entry=0, iobref=iobref@entry=0x5555557f85e0,
frame=frame@entry=0x7ffff5bb26b4, rsphdr=rsphdr@entry=0x0, rsphdr_count=rsphdr_count@entry=0, rsp_payload=rsp_payload@entry=0x0,
rsp_payload_count=rsp_payload_count@entry=0, rsp_iobref=rsp_iobref@entry=0x0) at rpc-clnt.c:1555
#7 0x00007fffecd7d8a4 in glusterd_submit_request_unlocked (rpc=rpc@entry=0x5555557e45a0, req=req@entry=0x7fffffffcd10,
frame=frame@entry=0x7ffff5bb26b4, prog=prog@entry=0x7fffed061f60 <glusterd_dump_prog>, procnum=procnum@entry=1, iobref=0x5555557f85e0,
iobref@entry=0x0, this=this@entry=0x55555579ca30, cbkfn=cbkfn@entry=0x7fffecdb4970 <glusterd_peer_dump_version_cbk>,
xdrproc=xdrproc@entry=0x7ffff7714380 <xdr_gf_dump_req>) at glusterd-utils.c:271
#8 0x00007fffecd7da1a in glusterd_submit_request (rpc=0x5555557e45a0, req=req@entry=0x7fffffffcd10, frame=0x7ffff5bb26b4,
prog=prog@entry=0x7fffed061f60 <glusterd_dump_prog>, procnum=procnum@entry=1, iobref=iobref@entry=0x0, this=this@entry=0x55555579ca30,
cbkfn=cbkfn@entry=0x7fffecdb4970 <glusterd_peer_dump_version_cbk>, xdrproc=0x7ffff7714380 <xdr_gf_dump_req>) at glusterd-utils.c:296
#9 0x00007fffecdb8b1e in glusterd_peer_dump_version (this=this@entry=0x55555579ca30, rpc=rpc@entry=0x5555557e45a0,
peerctx=peerctx@entry=0x5555557d6ac0) at glusterd-handshake.c:2008
#10 0x00007fffecd6b58e in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x5555557e45a0, mydata=mydata@entry=0x5555557d6ac0,
event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:4352
#11 0x00007fffecd6401c in glusterd_big_locked_notify (rpc=0x5555557e45a0, mydata=0x5555557d6ac0, event=RPC_CLNT_CONNECT, data=0x0,
notify_fn=0x7fffecd6b2a0 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:69
#12 0x00007ffff79303d0 in rpc_clnt_notify (trans=<optimized out>, mydata=0x5555557e45d0, event=<optimized out>, data=<optimized out>)
at rpc-clnt.c:923
#13 0x00007ffff792c2f3 in rpc_transport_notify (this=this@entry=0x5555557e8690, event=event@entry=RPC_TRANSPORT_CONNECT,
data=data@entry=0x5555557e8690) at rpc-transport.c:516
#14 0x00007fffe6855977 in socket_connect_finish (this=this@entry=0x5555557e8690) at socket.c:2301
#15 0x00007fffe685abff in socket_event_handler (fd=<optimized out>, idx=2, data=data@entry=0x5555557e8690, poll_in=1, poll_out=4, poll_err=16)
at socket.c:2331
#16 0x00007ffff7bb15f2 in event_dispatch_epoll_handler (i=<optimized out>, events=0x5555557dc2a0, event_pool=0x555555788d00) at event-epoll.c:384
#17 event_dispatch_epoll (event_pool=0x555555788d00) at event-epoll.c:445
#18 0x000055555555a012 in main (argc=3, argv=0x7fffffffe0d8) at glusterfsd.c:2043
(gdb)
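The crash point in frame #0 (SSL_write called with s=0x0, i.e. a NULL SSL object) can be re-examined non-interactively from the core dump; a sketch, where the core file path is hypothetical and depends on the kernel's core_pattern setting:

```sh
# Dump the backtrace of every thread from the glusterd core file.
# /usr/sbin/glusterd is the usual binary location; adjust the core
# path to wherever your system writes core dumps.
gdb --batch -ex "thread apply all bt" /usr/sbin/glusterd /var/core/core.glusterd.12345
```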
Created attachment 1014229 [details]
Same content as reproduction_steps_for_glusterd_crash.odt, converted to plaintext
Proper steps were not followed to enable SSL on the management path. Precisely, these are the steps to be followed (see the command sketch at the end of this report):

1) Stop all gluster processes on all nodes (glusterd, glusterfs, glusterfsd)
2) Create the file /var/lib/glusterd/secure-access on all nodes
3) Start the glusterd process

However, this bug will be kept open and verified once again after we have a fix for bug 1244415.

This is not a security bug, and we are not going to fix this in 3.6.x because of http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html. If the issue persists in the latest releases, please feel free to clone this bug.
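For reference, the three management-SSL steps above expressed as commands; a sketch that assumes systemd service management (adjust the stop/start commands to your distribution):

```sh
# Run on every node of the trusted storage pool, in this order.
systemctl stop glusterd             # stop the management daemon
pkill glusterfs; pkill glusterfsd   # stop any remaining client/brick processes

# Opting in to TLS on the management path is done via a marker file.
touch /var/lib/glusterd/secure-access

systemctl start glusterd            # restart glusterd, which respawns the bricks
```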