Description of problem:
While executing USS enable/disable in a loop on one node and restarting glusterd on another node, glusterd failed to start.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-25.el7rhgs.x86_64

[root@dhcp46-132 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 Beta (Maipo)

How reproducible:
1/1

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume
2. Mount it on a client and start IO
3. Enable USS
4. Create two snapshots and activate them
5. Enable and disable USS in a loop on one node while restarting glusterd on the other node (see the sketch below)

Actual results:
glusterd failed to start on the node where the glusterd restart was executing.

Also seeing AVC denied messages:

type=AVC msg=audit(1495788895.100:403): avc: denied { unlink } for pid=27165 comm="glusterd" name="glusterd.socket" dev="tmpfs" ino=204648 scontext=system_u:system_r:glusterd_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=sock_file
type=AVC msg=audit(1495788927.092:405): avc: denied { unlink } for pid=27192 comm="glusterd" name="glusterd.socket" dev="tmpfs" ino=204648 scontext=system_u:system_r:glusterd_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=sock_file

Expected results:
glusterd shouldn't fail.

Additional info:
=======================
glusterd logs:

[2017-05-26 09:16:18.525243] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.8.4/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2017-05-26 09:16:18.525289] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2017-05-26 09:16:18.525305] W [rpcsvc.c:1646:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-05-26 09:16:18.525330] E [MSGID: 106243] [glusterd.c:1722:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-05-26 09:16:22.515271] I [MSGID: 106513] [glusterd-store.c:2101:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31100
[2017-05-26 09:16:22.515867] E [MSGID: 101032] [store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/vol0/info. [No such file or directory]
[2017-05-26 09:16:22.515896] E [MSGID: 106200] [glusterd-store.c:2653:glusterd_store_update_volinfo] 0-management: volinfo handle is NULL
[2017-05-26 09:16:22.515908] E [MSGID: 106207] [glusterd-store.c:2953:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for vol0 volume
[2017-05-26 09:16:22.515969] E [MSGID: 106201] [glusterd-store.c:3169:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: vol0
[2017-05-26 09:16:22.516007] E [MSGID: 101019] [xlator.c:486:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-05-26 09:16:22.516024] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-management: initializing translator failed
[2017-05-26 09:16:22.516037] E [MSGID: 101176] [graph.c:680:glusterfs_graph_activate] 0-graph: init failed
[2017-05-26 09:16:22.516829] W [glusterfsd.c:1291:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x55c7fc85f9dd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x55c7fc85f881] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55c7fc85ed6b] ) 0-: received signum (1), shutting down
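A minimal sketch of the loop in step 5, assuming the test volume is named vol0 (the name that appears in the logs above):

# Node 1: toggle USS in a loop
while true; do
    gluster volume set vol0 features.uss enable
    gluster volume set vol0 features.uss disable
done

# Node 2: restart glusterd repeatedly in parallel
while true; do
    systemctl restart glusterd
    sleep 5
done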
We have the same problem with GlusterFS 3.12 running under CentOS 7.4:

root@int-gluster-01:~ # cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

root@int-gluster-01:~ # rpm -qa | grep gluster
glusterfs-client-xlators-3.12.0-1.el7.x86_64
glusterfs-cli-3.12.0-1.el7.x86_64
glusterfs-3.12.0-1.el7.x86_64
glusterfs-fuse-3.12.0-1.el7.x86_64
glusterfs-api-3.12.0-1.el7.x86_64
glusterfs-server-3.12.0-1.el7.x86_64
gluster-nagios-addons-1.1.0-0.el7.centos.x86_64
gluster-nagios-common-1.1.0-0.el7.centos.noarch
glusterfs-libs-3.12.0-1.el7.x86_6

root@int-gluster-03:~ # systemctl start glusterd
Job for glusterd.service failed because the control process exited with error code. See "systemctl status glusterd.service" and "journalctl -xe" for details.

root@int-gluster-03:~ # tail /var/log/glusterfs/glusterd.log
[2017-09-14 04:30:33.223274] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2017-09-14 04:30:33.224212] W [glusterfsd.c:1347:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x561c65678dbd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x561c65678c63] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x561c6567818b] ) 0-: received signum (-1), shutting down
[2017-09-14 04:30:35.363754] E [MSGID: 101032] [store.c:437:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/askizzy_docker_dev_static/info. [No such file or directory]
[2017-09-14 04:30:35.363862] E [MSGID: 106200] [glusterd-store.c:2837:glusterd_store_update_volinfo] 0-management: volinfo handle is NULL
[2017-09-14 04:30:35.363890] E [MSGID: 106207] [glusterd-store.c:3140:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for askizzy_docker_dev_static volume
[2017-09-14 04:30:35.363925] E [MSGID: 106201] [glusterd-store.c:3357:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: askizzy_docker_dev_static
[2017-09-14 04:30:35.364026] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-09-14 04:30:35.364055] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
[2017-09-14 04:30:35.364078] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
[2017-09-14 04:30:35.365404] W [glusterfsd.c:1347:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x5564a8909dbd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x5564a8909c63] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x5564a890918b] ) 0-: received signum (-1), shutting down

I'm then noticing a lot of SELinux denials:

----
time->Thu Sep 14 14:30:19 2017
type=PROCTITLE msg=audit(1505363419.360:13201): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
type=SYSCALL msg=audit(1505363419.360:13201): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffc8241dfa0 a2=10 a3=22 items=0 ppid=24983 pid=24984 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1505363419.360:13201): avc: denied { name_bind } for pid=24984 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0
----
time->Thu Sep 14 14:30:32 2017
type=PROCTITLE msg=audit(1505363432.477:13204): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
type=SYSCALL msg=audit(1505363432.477:13204): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffddddb3720 a2=10 a3=22 items=0 ppid=25019 pid=25020 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1505363432.477:13204): avc: denied { name_bind } for pid=25020 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0
----
time->Thu Sep 14 14:30:34 2017
type=PROCTITLE msg=audit(1505363434.531:13206): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
type=SYSCALL msg=audit(1505363434.531:13206): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffc18923a80 a2=10 a3=22 items=0 ppid=25034 pid=25035 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(1505363434.531:13206): avc: denied { name_bind } for pid=25035 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0
----
(In reply to Sam McLeod from comment #6)

> We have the same problem with GlusterFS 3.12 running under CentOS 7.4:
>
> root@int-gluster-01:~ # cat /etc/redhat-release
> CentOS Linux release 7.4.1708 (Core)
>
> root@int-gluster-01:~ # rpm -qa | grep gluster
> glusterfs-client-xlators-3.12.0-1.el7.x86_64
> glusterfs-cli-3.12.0-1.el7.x86_64
> glusterfs-3.12.0-1.el7.x86_64
> glusterfs-fuse-3.12.0-1.el7.x86_64
> glusterfs-api-3.12.0-1.el7.x86_64
> glusterfs-server-3.12.0-1.el7.x86_64
> gluster-nagios-addons-1.1.0-0.el7.centos.x86_64
> gluster-nagios-common-1.1.0-0.el7.centos.noarch
> glusterfs-libs-3.12.0-1.el7.x86_6
>
> root@int-gluster-03:~ # systemctl start glusterd
> Job for glusterd.service failed because the control process exited with
> error code. See "systemctl status glusterd.service" and "journalctl -xe"
> for details.
>
> root@int-gluster-03:~ # tail /var/log/glusterfs/glusterd.log
> [2017-09-14 04:30:33.223274] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
> [2017-09-14 04:30:33.224212] W [glusterfsd.c:1347:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x561c65678dbd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x561c65678c63] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x561c6567818b] ) 0-: received signum (-1), shutting down
> [2017-09-14 04:30:35.363754] E [MSGID: 101032] [store.c:437:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/askizzy_docker_dev_static/info. [No such file or directory]

The above log is the culprit. The info file of the volume is apparently missing. Have you deleted this file by mistake? Technically, this file should never be nulled out by glusterd. As a workaround, you can copy the same file from another peer and restart the glusterd instance (see the sketch below).
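A sketch of that workaround, assuming int-gluster-01 still holds an intact copy of the info file and int-gluster-03 is the failing node (hostnames taken from comment #6; adjust names and paths for your setup):

# On the failing node (int-gluster-03): pull the volume's info file from a
# healthy peer, then restart glusterd so it can restore the volume store.
scp root@int-gluster-01:/var/lib/glusterd/vols/askizzy_docker_dev_static/info \
    /var/lib/glusterd/vols/askizzy_docker_dev_static/info
systemctl restart glusterd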
> [2017-09-14 04:30:35.363862] E [MSGID: 106200] [glusterd-store.c:2837:glusterd_store_update_volinfo] 0-management: volinfo handle is NULL
> [2017-09-14 04:30:35.363890] E [MSGID: 106207] [glusterd-store.c:3140:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for askizzy_docker_dev_static volume
> [2017-09-14 04:30:35.363925] E [MSGID: 106201] [glusterd-store.c:3357:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: askizzy_docker_dev_static
> [2017-09-14 04:30:35.364026] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2017-09-14 04:30:35.364055] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
> [2017-09-14 04:30:35.364078] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
> [2017-09-14 04:30:35.365404] W [glusterfsd.c:1347:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x5564a8909dbd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x5564a8909c63] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x5564a890918b] ) 0-: received signum (-1), shutting down
>
> I'm then noticing a lot of SELinux denials:
>
> ----
> time->Thu Sep 14 14:30:19 2017
> type=PROCTITLE msg=audit(1505363419.360:13201): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
> type=SYSCALL msg=audit(1505363419.360:13201): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffc8241dfa0 a2=10 a3=22 items=0 ppid=24983 pid=24984 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(1505363419.360:13201): avc: denied { name_bind } for pid=24984 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0

This is known. The issue seems to be caused by the following entry in /proc/sys/net/ipv4/ip_local_port_range:

32768 60999

Here the upper cap of the local port range is 60999; however, gluster maintains its own local port range for its portmap logic, which goes up to 65535. When the portmap table is rebuilt through pmap_registry_new() on a glusterd restart, bind() fails for port 61000.

The good news is that in glusterfs-3.12 we have introduced a way to tackle this: open the glusterd.vol file, find the configuration option called max-port, uncomment it, set it to 60999 (in your case), and restart glusterd.
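A minimal sketch of that change (the exact commented-out line shipped in your glusterd.vol may differ):

# Confirm the kernel's ephemeral port range; the second number is the cap:
cat /proc/sys/net/ipv4/ip_local_port_range
# 32768   60999

# In /etc/glusterfs/glusterd.vol, inside the "volume management" block,
# uncomment (or add) the max-port option so it matches that cap:
#     option max-port 60999

# Then restart glusterd so the portmap table is rebuilt within that range:
systemctl restart glusterd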
> ----
> time->Thu Sep 14 14:30:32 2017
> type=PROCTITLE msg=audit(1505363432.477:13204): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
> type=SYSCALL msg=audit(1505363432.477:13204): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffddddb3720 a2=10 a3=22 items=0 ppid=25019 pid=25020 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(1505363432.477:13204): avc: denied { name_bind } for pid=25020 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0
> ----
> time->Thu Sep 14 14:30:34 2017
> type=PROCTITLE msg=audit(1505363434.531:13206): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
> type=SYSCALL msg=audit(1505363434.531:13206): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffc18923a80 a2=10 a3=22 items=0 ppid=25034 pid=25035 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(1505363434.531:13206): avc: denied { name_bind } for pid=25035 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0
> ----
(In reply to Atin Mukherjee from comment #7)

Thanks for your response Atin,

> The above log is the culprit. The info file of the volume is apparently
> missing. Have you deleted this file by mistake? Technically, this file
> should never be nulled out by glusterd. As a workaround, you can copy the
> same file from another peer and restart the glusterd instance.

What we've experienced is that if Gluster is interrupted while applying volume settings (whether by leaking memory and hitting OOM, by a user cancelling the operation, or by a node being restarted while a background process was applying settings, e.g. automation), then Gluster appears to leave info files in a partial / corrupted state, which in turn seems to cause this error.

Once the info file is incorrect / corrupted, what is the easiest way to regenerate it? Ideally I'd like to make that an automated process, as there are a number of ways Gluster could be interrupted when managing the service at scale.

At present we only have around 120 volumes on the cluster, but another site will soon be commissioned with ~300 volumes, so the risk of corruption will likely increase if Gluster can leave these info files partially written.

> > [2017-09-14 04:30:35.363862] E [MSGID: 106200] [glusterd-store.c:2837:glusterd_store_update_volinfo] 0-management: volinfo handle is NULL
> > [2017-09-14 04:30:35.363890] E [MSGID: 106207] [glusterd-store.c:3140:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for askizzy_docker_dev_static volume
> > [2017-09-14 04:30:35.363925] E [MSGID: 106201] [glusterd-store.c:3357:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: askizzy_docker_dev_static
> > [2017-09-14 04:30:35.364026] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> > [2017-09-14 04:30:35.364055] E [MSGID: 101066] [graph.c:327:glusterfs_graph_init] 0-management: initializing translator failed
> > [2017-09-14 04:30:35.364078] E [MSGID: 101176] [graph.c:698:glusterfs_graph_activate] 0-graph: init failed
> > [2017-09-14 04:30:35.365404] W [glusterfsd.c:1347:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x5564a8909dbd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x5564a8909c63] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x5564a890918b] ) 0-: received signum (-1), shutting down
> >
> > I'm then noticing a lot of SELinux denials:
> >
> > ----
> > time->Thu Sep 14 14:30:19 2017
> > type=PROCTITLE msg=audit(1505363419.360:13201): proctitle=2F7573722F7362696E2F676C757374657264002D70002F7661722F72756E2F676C7573746572642E706964002D2D6C6F672D6C6576656C005741524E494E47
> > type=SYSCALL msg=audit(1505363419.360:13201): arch=c000003e syscall=49 success=no exit=-13 a0=f a1=7ffc8241dfa0 a2=10 a3=22 items=0 ppid=24983 pid=24984 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="glusterd" exe="/usr/sbin/glusterfsd" subj=system_u:system_r:glusterd_t:s0 key=(null)
> > type=AVC msg=audit(1505363419.360:13201): avc: denied { name_bind } for pid=24984 comm="glusterd" src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket permissive=0

> This is known.
> The issue seems to be caused by the following entry in
> /proc/sys/net/ipv4/ip_local_port_range:
>
> 32768 60999
>
> Here the upper cap of the local port range is 60999; however, gluster
> maintains its own local port range for its portmap logic, which goes up to
> 65535. When the portmap table is rebuilt through pmap_registry_new() on a
> glusterd restart, bind() fails for port 61000.

I suspected it was that, so I watched the port usage while Gluster was running and it never used more than a few ports - perhaps I wasn't monitoring them properly.

> The good news is that in glusterfs-3.12 we have introduced a way to tackle
> this: open the glusterd.vol file, find the configuration option called
> max-port, uncomment it, set it to 60999 (in your case), and restart
> glusterd.

That's really useful, thanks for pointing that out!
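To follow up on the automation question above: a rough sketch of one possible approach, assuming a known-good peer named int-gluster-01 and the standard /var/lib/glusterd layout (the helper and peer name are illustrative, not an official tool; verify the peer's copy is intact before relying on this at scale):

#!/bin/bash
# Illustrative helper: find volumes whose info file is missing or empty on
# this node, re-copy each from a known-good peer, then restart glusterd.
PEER=int-gluster-01    # assumed healthy peer
for voldir in /var/lib/glusterd/vols/*/; do
    info="${voldir}info"
    if [ ! -s "$info" ]; then
        echo "Repairing $info from $PEER"
        scp "root@${PEER}:${info}" "$info"
    fi
done
systemctl restart glusterd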