Bug 1641969 - Mounted Dir Gets Error in GlusterFS Storage Cluster with SSL/TLS Encryption When Doing add-brick and remove-brick Repeatedly
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 4.1
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-23 09:14 UTC by Terry Cui
Modified: 2019-08-06 11:28 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-08-06 11:28:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Terry Cui 2018-10-23 09:14:25 UTC
Description of problem:
The mounted dir returns an error in a GlusterFS storage cluster with SSL/TLS encryption when add-brick and remove-brick are run repeatedly.


Version-Release number of selected component (if applicable):
4.1.5 or older versions


How reproducible:
It could be reproduced easily.


Steps to Reproduce:
1. SSL is enabled for the GlusterFS replicated volume afr_vol, which is mounted on /mnt/gluster by the GlusterFS native client.
2. The commands "add-brick" and "remove-brick" are executed repeatedly.
3. At the same time, the mounted dir is read or written continuously for a while. Here I used the command "find /mnt/gluster" or "ls /mnt/gluster".
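
The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the reporter's actual procedure: the brick path prefix (server3:/bricks/afr_vol_round), the base replica count of 2, and the number of rounds are hypothetical and do not come from the report. It also interleaves the brick changes with "ls" on the mount instead of running them truly in parallel, and it uses a fresh brick directory per round to avoid reusing a directory that was already part of a volume. It defaults to a dry run that only prints the gluster commands.

```python
import subprocess

VOLUME = "afr_vol"        # volume name from the report
MOUNT = "/mnt/gluster"    # mount point from the report
BRICK_PREFIX = "server3:/bricks/afr_vol_round"  # hypothetical, not from the report
BASE_REPLICA = 2          # assumed replica count, not stated in the report


def add_brick_cmd(volume, brick, replica):
    """Build the CLI call that grows the replica set by one brick."""
    return ["gluster", "--mode=script", "volume", "add-brick",
            volume, "replica", str(replica), brick, "force"]


def remove_brick_cmd(volume, brick, replica):
    """Build the CLI call that shrinks the replica set again."""
    return ["gluster", "--mode=script", "volume", "remove-brick",
            volume, "replica", str(replica), brick, "force"]


def cycle_commands(volume, brick_prefix, base_replica, rounds):
    """Return the add/remove command pairs for the given number of rounds."""
    cmds = []
    for i in range(rounds):
        brick = f"{brick_prefix}{i}"  # fresh brick directory each round
        cmds.append(add_brick_cmd(volume, brick, base_replica + 1))
        cmds.append(remove_brick_cmd(volume, brick, base_replica))
    return cmds


def run(rounds=100, dry_run=True):
    """Run the cycle; return False as soon as the mount stops responding."""
    for cmd in cycle_commands(VOLUME, BRICK_PREFIX, BASE_REPLICA, rounds):
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=False)
        # Touch the mount after every brick change, as in step 3.
        result = subprocess.run(["ls", MOUNT], capture_output=True, text=True)
        if result.returncode != 0:
            print(result.stderr.strip())
            return False
    return True


if __name__ == "__main__":
    run(rounds=3, dry_run=True)
```

Calling run(dry_run=False) on a disposable test cluster would execute the commands for real; a False return value means "ls" on the mount failed, which is where the error described in this report would be expected to show up.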


Actual results:
After a while, the following error message appears:
find: ‘/mnt/gluster’: Transport endpoint is not connected


Expected results:
The mounted dir should always remain accessible while it is read or written.


Additional info:
However, everything works if SSL is disabled for the GlusterFS replicated volume afr_vol.

While the mounted dir cannot be accessed, the glusterfs process uses 100% CPU. The strace output of the glusterfs process is as below:

strace -f -p 6576


[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295) = 1 ([{fd=12, revents=POLLIN}])
[pid 14601] poll([{fd=12, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=11, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, 4294967295^CProcess 6576 detached
Process 6577 detached
Process 6578 detached
Process 6579 detached
Process 6580 detached
Process 6581 detached
Process 6584 detached
Process 6585 detached
Process 6596 detached
Process 6597 detached
Process 1578 detached
Process 1581 detached
Process 11623 detached
Process 14581 detached
Process 14601 detached
 <detached ...>
Process 15032 detached
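
In the excerpt above, every poll() call returns at once with POLLIN set on fd 12, and no read on that fd appears between the calls. One pattern that would produce such a trace, although the report does not confirm it as the cause here, is a level-triggered poll loop on a descriptor whose pending data is never consumed. The sketch below reproduces only that general pattern with an ordinary pipe; it does not use any GlusterFS or SSL code, and the function name is hypothetical.

```python
import os
import select


def count_immediate_wakeups(iterations=1000):
    """Poll a readable pipe repeatedly without ever draining it."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"x")  # one pending byte keeps read_fd readable

    poller = select.poll()
    poller.register(
        read_fd,
        select.POLLIN | select.POLLPRI | select.POLLERR
        | select.POLLHUP | select.POLLNVAL,
    )

    wakeups = 0
    for _ in range(iterations):
        # No timeout: this would block forever if nothing were ready.
        events = poller.poll(None)
        if events and events[0][1] & select.POLLIN:
            wakeups += 1
            # Deliberately no os.read(): the byte stays queued, so the
            # next poll() returns immediately again.

    os.close(read_fd)
    os.close(write_fd)
    return wakeups


if __name__ == "__main__":
    print(count_immediate_wakeups())
```

Because poll() is level-triggered, each iteration wakes up immediately, so a loop like this spins at full CPU for as long as the data stays unread.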

Comment 1 Terry Cui 2018-11-19 05:45:42 UTC
Is there someone who can help me with this issue?
Thanks a lot. :-)

Comment 2 Amar Tumballi 2019-07-16 04:00:49 UTC
Terry, Apologies for the delay.

While the issue may be valid, the use case of 'repeated' add-brick and remove-brick was not in the scope of Gluster's design. That is, Gluster is a software-defined storage solution, and we do provide scale-out and scale-down operations, but we designed them to be rare operations (e.g., once a year or once a quarter).

Comment 3 Amar Tumballi 2019-08-06 11:28:27 UTC
The use case of 'repeated' add-brick and remove-brick is something we are not focusing on right now. Marking it DEFERRED until it gets focus.

