Bug 1702316 - Cannot upgrade 5.x volume to 6.1 because of unused 'crypt' and 'bd' xlators
Summary: Cannot upgrade 5.x volume to 6.1 because of unused 'crypt' and 'bd' xlators
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-23 13:37 UTC by Rob de Wit
Modified: 2019-10-07 10:09 UTC
CC: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-08 15:08:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
Github gluster/glusterfs issue 665 (closed): Provide noop xlators for 'crypt' and 'bd' (last updated 2020-09-29 11:01:02 UTC)

Description Rob de Wit 2019-04-23 13:37:09 UTC
Description of problem: After upgrading from 5.3 to 6.1, gluster refuses to start bricks that apparently reference the 'crypt' and 'bd' xlators. Neither xlator was specified when the volume was created and, according to 'gluster volume get VOLUME all', they are not in use.
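
A quick way to confirm that none of the crypt/bd related options are enabled on the volume (a minimal sketch; VOLNAME is a placeholder for the affected volume):

# List the encryption/bd options; in the dump below they all show their defaults (off / null).
gluster volume get VOLNAME all | grep -E 'encryption|bd-aio'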


Version-Release number of selected component (if applicable): 6.1



[2019-04-23 10:36:44.325141] I [MSGID: 100030] [glusterfsd.c:2849:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 6.1 (args: /usr/sbin/glusterd --pid-file=/run/glusterd.pid)
[2019-04-23 10:36:44.325505] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 31705
[2019-04-23 10:36:44.327314] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-04-23 10:36:44.327354] I [MSGID: 106479] [glusterd.c:1478:init] 0-management: Using /var/lib/glusterd as working directory
[2019-04-23 10:36:44.327363] I [MSGID: 106479] [glusterd.c:1484:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-04-23 10:36:44.330126] I [socket.c:931:__socket_server_bind] 0-socket.management: process started listening on port (36203)
[2019-04-23 10:36:44.330258] E [rpc-transport.c:297:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/6.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2019-04-23 10:36:44.330267] W [rpc-transport.c:301:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2019-04-23 10:36:44.330274] W [rpcsvc.c:1985:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-04-23 10:36:44.330281] E [MSGID: 106244] [glusterd.c:1785:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-04-23 10:36:44.331976] I [socket.c:902:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 13
[2019-04-23 10:36:46.805843] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 50000
[2019-04-23 10:36:46.878878] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 5104ed01-f959-4a82-bbd6-17d4dd177ec2
[2019-04-23 10:36:46.881463] E [mem-pool.c:351:__gf_free] (-->/usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so(+0x49190) [0x7fb0ecb64190] -->/usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so(+0x48f72) [0x7fb0ecb63f72] -->/usr/lib64/libglusterfs.so.0(__gf_free+0x21d) [0x7fb0f25091dd] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size
[2019-04-23 10:36:46.908134] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-04-23 10:36:46.910052] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-04-23 10:36:46.910135] W [MSGID: 106061] [glusterd-handler.c:3472:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-04-23 10:36:46.910167] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-04-23 10:36:46.911425] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2019-04-23 10:36:46.911405] W [MSGID: 106061] [glusterd-handler.c:3472:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-04-23 10:36:46.914845] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-04-23 10:36:47.265981] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: a6ff7d5b-1e8d-4cdc-97cf-4e03b89462a3, host: 10.10.0.25, port: 0
[2019-04-23 10:36:47.271481] I [glusterd-utils.c:6312:glusterd_brick_start] 0-management: starting a fresh brick process for brick /local.mnt/glfs/brick
[2019-04-23 10:36:47.273759] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-04-23 10:36:47.336220] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2019-04-23 10:36:47.336328] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-04-23 10:36:47.336383] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-04-23 10:36:47.336735] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2019-04-23 10:36:47.337733] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: glustershd already stopped
[2019-04-23 10:36:47.337755] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is stopped
[2019-04-23 10:36:47.337804] I [MSGID: 106567] [glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd service
[2019-04-23 10:36:48.340193] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2019-04-23 10:36:48.340446] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped
[2019-04-23 10:36:48.340482] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped
[2019-04-23 10:36:48.340525] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2019-04-23 10:36:48.340662] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2019-04-23 10:36:48.340686] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped
[2019-04-23 10:36:48.340721] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2019-04-23 10:36:48.340851] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped
[2019-04-23 10:36:48.340865] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped
[2019-04-23 10:36:48.340913] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-04-23 10:36:48.341005] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2019-04-23 10:36:48.342056] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: a6ff7d5b-1e8d-4cdc-97cf-4e03b89462a3
[2019-04-23 10:36:48.342125] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 88496e0c-298b-47ef-98a1-a884ca68d7d4, host: 10.10.0.208, port: 0
[2019-04-23 10:36:48.378690] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 88496e0c-298b-47ef-98a1-a884ca68d7d4
[2019-04-23 10:37:15.410095] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-04-23 10:37:15.410095] and [2019-04-23 10:37:15.410162]
[2019-04-23 10:37:15.417228] E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api
The message "E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api" repeated 7 times between [2019-04-23 10:37:15.417228] and [2019-04-23 10:37:15.417319]
[2019-04-23 10:37:15.449809] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/storage/bd.so: cannot open shared object file: No such file or directory
[2019-04-23 12:23:14.757482] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory
[2019-04-23 12:23:14.765810] E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api
[2019-04-23 12:23:14.801394] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/storage/bd.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-04-23 12:23:14.757482] and [2019-04-23 12:23:14.757578]
The message "E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api" repeated 7 times between [2019-04-23 12:23:14.765810] and [2019-04-23 12:23:14.765864]
[2019-04-23 12:29:45.957524] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-23 12:30:06.917403] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2019-04-23 12:38:25.514866] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory
[2019-04-23 12:38:25.522473] E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api
[2019-04-23 12:38:25.555952] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/storage/bd.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-04-23 12:38:25.514866] and [2019-04-23 12:38:25.514931]
The message "E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api" repeated 7 times between [2019-04-23 12:38:25.522473] and [2019-04-23 12:38:25.522545]
[2019-04-23 12:52:00.569988] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7504) [0x7fb0f1310504] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xd5) [0x409f45] -->/usr/sbin/glusterd(cleanup_and_exit+0x57) [0x409db7] ) 0-: received signum (15), shutting down




Option                                  Value                                   
------                                  -----                                   
cluster.lookup-unhashed                 on                                      
cluster.lookup-optimize                 on                                      
cluster.min-free-disk                   10%                                     
cluster.min-free-inodes                 5%                                      
cluster.rebalance-stats                 off                                     
cluster.subvols-per-directory           (null)                                  
cluster.readdir-optimize                on                                      
cluster.rsync-hash-regex                (null)                                  
cluster.extra-hash-regex                (null)                                  
cluster.dht-xattr-name                  trusted.glusterfs.dht                   
cluster.randomize-hash-range-by-gfid    off                                     
cluster.rebal-throttle                  normal                                  
cluster.lock-migration                  off                                     
cluster.force-migration                 off                                     
cluster.local-volume-name               (null)                                  
cluster.weighted-rebalance              on                                      
cluster.switch-pattern                  (null)                                  
cluster.entry-change-log                on                                      
cluster.read-subvolume                  (null)                                  
cluster.read-subvolume-index            -1                                      
cluster.read-hash-mode                  1                                       
cluster.background-self-heal-count      8                                       
cluster.metadata-self-heal              on                                      
cluster.data-self-heal                  on                                      
cluster.entry-self-heal                 on                                      
cluster.self-heal-daemon                enable                                  
cluster.heal-timeout                    600                                     
cluster.self-heal-window-size           1                                       
cluster.data-change-log                 on                                      
cluster.metadata-change-log             on                                      
cluster.data-self-heal-algorithm        (null)                                  
cluster.eager-lock                      on                                      
disperse.eager-lock                     on                                      
disperse.other-eager-lock               on                                      
disperse.eager-lock-timeout             1                                       
disperse.other-eager-lock-timeout       1                                       
cluster.quorum-type                     auto                                    
cluster.quorum-count                    (null)                                  
cluster.choose-local                    true                                    
cluster.self-heal-readdir-size          1KB                                     
cluster.post-op-delay-secs              1                                       
cluster.ensure-durability               on                                      
cluster.consistent-metadata             no                                      
cluster.heal-wait-queue-length          128                                     
cluster.favorite-child-policy           none                                    
cluster.full-lock                       yes                                     
cluster.stripe-block-size               128KB                                   
cluster.stripe-coalesce                 true                                    
diagnostics.latency-measurement         off                                     
diagnostics.dump-fd-stats               off                                     
diagnostics.count-fop-hits              off                                     
diagnostics.brick-log-level             CRITICAL                                
diagnostics.client-log-level            CRITICAL                                
diagnostics.brick-sys-log-level         CRITICAL                                
diagnostics.client-sys-log-level        CRITICAL                                
diagnostics.brick-logger                (null)                                  
diagnostics.client-logger               (null)                                  
diagnostics.brick-log-format            (null)                                  
diagnostics.client-log-format           (null)                                  
diagnostics.brick-log-buf-size          5                                       
diagnostics.client-log-buf-size         5                                       
diagnostics.brick-log-flush-timeout     120                                     
diagnostics.client-log-flush-timeout    120                                     
diagnostics.stats-dump-interval         0                                       
diagnostics.fop-sample-interval         0                                       
diagnostics.stats-dump-format           json                                    
diagnostics.fop-sample-buf-size         65535                                   
diagnostics.stats-dnscache-ttl-sec      86400                                   
performance.cache-max-file-size         0                                       
performance.cache-min-file-size         0                                       
performance.cache-refresh-timeout       1                                       
performance.cache-priority                                                      
performance.cache-size                  32MB                                    
performance.io-thread-count             16                                      
performance.high-prio-threads           16                                      
performance.normal-prio-threads         16                                      
performance.low-prio-threads            16                                      
performance.least-prio-threads          1                                       
performance.enable-least-priority       on                                      
performance.iot-watchdog-secs           (null)                                  
performance.iot-cleanup-disconnected-reqs off                                     
performance.iot-pass-through            false                                   
performance.io-cache-pass-through       false                                   
performance.cache-size                  128MB                                   
performance.qr-cache-timeout            1                                       
performance.cache-invalidation          on                                      
performance.ctime-invalidation          false                                   
performance.flush-behind                on                                      
performance.nfs.flush-behind            on                                      
performance.write-behind-window-size    1MB                                     
performance.resync-failed-syncs-after-fsync off                                     
performance.nfs.write-behind-window-size 1MB                                     
performance.strict-o-direct             off                                     
performance.nfs.strict-o-direct         off                                     
performance.strict-write-ordering       off                                     
performance.nfs.strict-write-ordering   off                                     
performance.write-behind-trickling-writes on                                      
performance.aggregate-size              128KB                                   
performance.nfs.write-behind-trickling-writes on                                      
performance.lazy-open                   yes                                     
performance.read-after-open             yes                                     
performance.open-behind-pass-through    false                                   
performance.read-ahead-page-count       4                                       
performance.read-ahead-pass-through     false                                   
performance.readdir-ahead-pass-through  false                                   
performance.md-cache-pass-through       false                                   
performance.md-cache-timeout            600                                     
performance.cache-swift-metadata        true                                    
performance.cache-samba-metadata        false                                   
performance.cache-capability-xattrs     true                                    
performance.cache-ima-xattrs            true                                    
performance.md-cache-statfs             off                                     
performance.xattr-cache-list                                                    
performance.nl-cache-pass-through       false                                   
features.encryption                     off                                     
encryption.master-key                   (null)                                  
encryption.data-key-size                256                                     
encryption.block-size                   4096                                    
network.frame-timeout                   1800                                    
network.ping-timeout                    42                                      
network.tcp-window-size                 (null)                                  
network.remote-dio                      disable                                 
client.event-threads                    2                                       
client.tcp-user-timeout                 0                                       
client.keepalive-time                   20                                      
client.keepalive-interval               2                                       
client.keepalive-count                  9                                       
network.tcp-window-size                 (null)                                  
network.inode-lru-limit                 200000                                  
auth.allow                              *                                       
auth.reject                             (null)                                  
transport.keepalive                     1                                       
server.allow-insecure                   on                                      
server.root-squash                      off                                     
server.anonuid                          65534                                   
server.anongid                          65534                                   
server.statedump-path                   /var/run/gluster                        
server.outstanding-rpc-limit            64                                      
server.ssl                              (null)                                  
auth.ssl-allow                          *                                       
server.manage-gids                      off                                     
server.dynamic-auth                     on                                      
client.send-gids                        on                                      
server.gid-timeout                      300                                     
server.own-thread                       (null)                                  
server.event-threads                    1                                       
server.tcp-user-timeout                 0                                       
server.keepalive-time                   20                                      
server.keepalive-interval               2                                       
server.keepalive-count                  9                                       
transport.listen-backlog                1024                                    
ssl.own-cert                            (null)                                  
ssl.private-key                         (null)                                  
ssl.ca-list                             (null)                                  
ssl.crl-path                            (null)                                  
ssl.certificate-depth                   (null)                                  
ssl.cipher-list                         (null)                                  
ssl.dh-param                            (null)                                  
ssl.ec-curve                            (null)                                  
transport.address-family                inet                                    
performance.write-behind                on                                      
performance.read-ahead                  on                                      
performance.readdir-ahead               on                                      
performance.io-cache                    on                                      
performance.quick-read                  on                                      
performance.open-behind                 on                                      
performance.nl-cache                    off                                     
performance.stat-prefetch               on                                      
performance.client-io-threads           off                                     
performance.nfs.write-behind            on                                      
performance.nfs.read-ahead              off                                     
performance.nfs.io-cache                off                                     
performance.nfs.quick-read              off                                     
performance.nfs.stat-prefetch           off                                     
performance.nfs.io-threads              off                                     
performance.force-readdirp              true                                    
performance.cache-invalidation          on                                      
features.uss                            off                                     
features.snapshot-directory             .snaps                                  
features.show-snapshot-directory        off                                     
features.tag-namespaces                 off                                     
network.compression                     off                                     
network.compression.window-size         -15                                     
network.compression.mem-level           8                                       
network.compression.min-size            0                                       
network.compression.compression-level   -1                                      
network.compression.debug               false                                   
features.default-soft-limit             80%                                     
features.soft-timeout                   60                                      
features.hard-timeout                   5                                       
features.alert-time                     86400                                   
features.quota-deem-statfs              off                                     
geo-replication.indexing                off                                     
geo-replication.indexing                off                                     
geo-replication.ignore-pid-check        off                                     
geo-replication.ignore-pid-check        off                                     
features.quota                          off                                     
features.inode-quota                    off                                     
features.bitrot                         disable                                 
debug.trace                             off                                     
debug.log-history                       no                                      
debug.log-file                          no                                      
debug.exclude-ops                       (null)                                  
debug.include-ops                       (null)                                  
debug.error-gen                         off                                     
debug.error-failure                     (null)                                  
debug.error-number                      (null)                                  
debug.random-failure                    off                                     
debug.error-fops                        (null)                                  
nfs.enable-ino32                        no                                      
nfs.mem-factor                          15                                      
nfs.export-dirs                         on                                      
nfs.export-volumes                      on                                      
nfs.addr-namelookup                     off                                     
nfs.dynamic-volumes                     off                                     
nfs.register-with-portmap               on                                      
nfs.outstanding-rpc-limit               16                                      
nfs.port                                2049                                    
nfs.rpc-auth-unix                       on                                      
nfs.rpc-auth-null                       on                                      
nfs.rpc-auth-allow                      all                                     
nfs.rpc-auth-reject                     none                                    
nfs.ports-insecure                      off                                     
nfs.trusted-sync                        off                                     
nfs.trusted-write                       off                                     
nfs.volume-access                       read-write                              
nfs.export-dir                                                                  
nfs.disable                             on                                      
nfs.nlm                                 on                                      
nfs.acl                                 on                                      
nfs.mount-udp                           off                                     
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab             
nfs.rpc-statd                           /sbin/rpc.statd                         
nfs.server-aux-gids                     off                                     
nfs.drc                                 off                                     
nfs.drc-size                            0x20000                                 
nfs.read-size                           (1 * 1048576ULL)                        
nfs.write-size                          (1 * 1048576ULL)                        
nfs.readdir-size                        (1 * 1048576ULL)                        
nfs.rdirplus                            on                                      
nfs.event-threads                       1                                       
nfs.exports-auth-enable                 (null)                                  
nfs.auth-refresh-interval-sec           (null)                                  
nfs.auth-cache-ttl-sec                  (null)                                  
features.read-only                      off                                     
features.worm                           off                                     
features.worm-file-level                off                                     
features.worm-files-deletable           on                                      
features.default-retention-period       120                                     
features.retention-mode                 relax                                   
features.auto-commit-period             180                                     
storage.linux-aio                       off                                     
storage.batch-fsync-mode                reverse-fsync                           
storage.batch-fsync-delay-usec          0                                       
storage.owner-uid                       -1                                      
storage.owner-gid                       -1                                      
storage.node-uuid-pathinfo              off                                     
storage.health-check-interval           30                                      
storage.build-pgfid                     off                                     
storage.gfid2path                       on                                      
storage.gfid2path-separator             :                                       
storage.reserve                         1                                       
storage.health-check-timeout            10                                      
storage.fips-mode-rchecksum             off                                     
storage.force-create-mode               0000                                    
storage.force-directory-mode            0000                                    
storage.create-mask                     0777                                    
storage.create-directory-mask           0777                                    
storage.max-hardlinks                   100                                     
storage.ctime                           off                                     
storage.bd-aio                          off                                     
config.gfproxyd                         off                                     
cluster.server-quorum-type              off                                     
cluster.server-quorum-ratio             0                                       
changelog.changelog                     off                                     
changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs  
changelog.encoding                      ascii                                   
changelog.rollover-time                 15                                      
changelog.fsync-interval                5                                       
changelog.changelog-barrier-timeout     120                                     
changelog.capture-del-path              off                                     
features.barrier                        disable                                 
features.barrier-timeout                120                                     
features.trash                          off                                     
features.trash-dir                      .trashcan                               
features.trash-eliminate-path           (null)                                  
features.trash-max-filesize             5MB                                     
features.trash-internal-op              off                                     
cluster.enable-shared-storage           disable                                 
locks.trace                             off                                     
locks.mandatory-locking                 off                                     
cluster.disperse-self-heal-daemon       enable                                  
cluster.quorum-reads                    no                                      
client.bind-insecure                    (null)                                  
features.timeout                        45                                      
features.failover-hosts                 (null)                                  
features.shard                          off                                     
features.shard-block-size               64MB                                    
features.shard-lru-limit                16384                                   
features.shard-deletion-rate            100                                     
features.scrub-throttle                 lazy                                    
features.scrub-freq                     biweekly                                
features.scrub                          false                                   
features.expiry-time                    120                                     
features.cache-invalidation             on                                      
features.cache-invalidation-timeout     600                                     
features.leases                         off                                     
features.lease-lock-recall-timeout      60                                      
disperse.background-heals               8                                       
disperse.heal-wait-qlength              128                                     
cluster.heal-timeout                    600                                     
dht.force-readdirp                      on                                      
disperse.read-policy                    gfid-hash                               
cluster.shd-max-threads                 1                                       
cluster.shd-wait-qlength                1024                                    
cluster.locking-scheme                  full                                    
cluster.granular-entry-heal             no                                      
features.locks-revocation-secs          0                                       
features.locks-revocation-clear-all     false                                   
features.locks-revocation-max-blocked   0                                       
features.locks-monkey-unlocking         false                                   
features.locks-notify-contention        no                                      
features.locks-notify-contention-delay  5                                       
disperse.shd-max-threads                1                                       
disperse.shd-wait-qlength               1024                                    
disperse.cpu-extensions                 auto                                    
disperse.self-heal-window-size          1                                       
cluster.use-compound-fops               off                                     
performance.parallel-readdir            off                                     
performance.rda-request-size            131072                                  
performance.rda-low-wmark               4096                                    
performance.rda-high-wmark              128KB                                   
performance.rda-cache-limit             10MB                                    
performance.nl-cache-positive-entry     false                                   
performance.nl-cache-limit              10MB                                    
performance.nl-cache-timeout            60                                      
cluster.brick-multiplex                 off                                     
cluster.max-bricks-per-process          0                                       
disperse.optimistic-change-log          on                                      
disperse.stripe-cache                   4                                       
cluster.halo-enabled                    False                                   
cluster.halo-shd-max-latency            99999                                   
cluster.halo-nfsd-max-latency           5                                       
cluster.halo-max-latency                5                                       
cluster.halo-max-replicas               99999                                   
cluster.halo-min-replicas               2                                       
cluster.daemon-log-level                INFO                                    
debug.delay-gen                         off                                     
delay-gen.delay-percentage              10%                                     
delay-gen.delay-duration                100000                                  
delay-gen.enable                                                                
disperse.parallel-writes                on                                      
features.sdfs                           on                                      
features.cloudsync                      off                                     
features.utime                          off                                     
ctime.noatime                           on                                      
feature.cloudsync-storetype             (null)

Comment 1 Renich Bon Ciric 2019-04-23 19:43:57 UTC
Geo-replication is failing as well due to this:

==> cli.log <==
[2019-04-23 19:37:29.048169] I [cli.c:845:main] 0-cli: Started running gluster with version 6.1
[2019-04-23 19:37:29.108778] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-04-23 19:37:29.109073] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

==> cmd_history.log <==
[2019-04-23 19:37:30.341565]  : volume geo-replication mariadb 11.22.33.44::mariadb create push-pem : FAILED : Passwordless ssh login has not been setup with 11.22.33.44 for user root.

==> cli.log <==
[2019-04-23 19:37:30.341932] I [input.c:31:cli_batch] 0-: Exiting with: -1

==> glusterd.log <==
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-04-23 19:36:27.419582] and [2019-04-23 19:36:27.419641]
The message "E [MSGID: 106316] [glusterd-geo-rep.c:2890:glusterd_verify_slave] 0-management: Not a valid slave" repeated 2 times between [2019-04-23 19:35:42.340661] and [2019-04-23 19:37:30.340518]
The message "E [MSGID: 106316] [glusterd-geo-rep.c:3282:glusterd_op_stage_gsync_create] 0-management: 11.22.33.44::mariadb is not a valid slave volume. Error: Passwordless ssh login has not been setup with 11.22.33.44 for user root." repeated 2 times between [2019-04-23 19:35:42.340803] and [2019-04-23 19:37:30.340611]
The message "E [MSGID: 106301] [glusterd-syncop.c:1317:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Passwordless ssh login has not been setup with 11.22.33.44 for user root." repeated 2 times between [2019-04-23 19:35:42.340842] and [2019-04-23 19:37:30.340618]

Comment 2 Atin Mukherjee 2019-05-08 04:30:17 UTC
I don't think the upgrade failure or the geo-replication session issue is caused by the missing xlators you highlighted in the report.

If you look at the following log snippet, the cleanup_and_exit (glusterd's shutdown trigger) happened much later than the messages complaining about the missing xlators, and I can confirm those messages are benign.

[2019-04-23 12:38:25.514866] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory
[2019-04-23 12:38:25.522473] E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api
[2019-04-23 12:38:25.555952] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/storage/bd.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.1/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-04-23 12:38:25.514866] and [2019-04-23 12:38:25.514931]
The message "E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.1/rpc-transport/socket.so: undefined symbol: xlator_api" repeated 7 times between [2019-04-23 12:38:25.522473] and [2019-04-23 12:38:25.522545]

################################# There's a gap of ~14 minutes here ###################################################

[2019-04-23 12:52:00.569988] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7504) [0x7fb0f1310504] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xd5) [0x409f45] -->/usr/sbin/glusterd(cleanup_and_exit+0x57) [0x409db7] ) 0-: received signum (15), shutting down

You'd need to provide us the brick logs along with the glusterd logs, plus the 'gluster volume status' and 'gluster get-state' output from the node where you see this happening.
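
For example (a minimal sketch; the paths follow the defaults visible in the logs above):

gluster volume status
gluster get-state                 # writes a state file under /var/run/gluster/
tar czf gluster-logs.tar.gz /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/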

Regarding the geo-rep failures, I'd suggest filing a separate bug once this one stabilises.

Comment 3 Rob de Wit 2019-05-08 07:45:19 UTC
Hi,

I tried upgrading one of the nodes again (a rough command sketch follows the steps):

1) shutdown glusterd 5.6
2) install 6.1
3) start glusterd 6.1
4) no working brick
5) shutdown glusterd 6.1
6) downgrade to 5.6
7) start glusterd 5.6
8) brick is working fine again
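
Roughly the same cycle as commands (a sketch only; the package-manager steps are left as comments because package names and managers differ per distribution, and glusterd is assumed to be managed through systemd):

systemctl stop glusterd
# upgrade the gluster packages to 6.1 with the distro package manager
systemctl start glusterd          # brick process fails to come up
systemctl stop glusterd
# downgrade the gluster packages back to 5.6
systemctl start glusterd          # brick is healthy again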


The volume status shows only the other nodes, since the brick process fails on the node running 6.1:


=== START volume status ===
Status of volume: jf-vol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.0.25:/local.mnt/glfs/brick      49153     0          Y       20952
Brick 10.10.0.208:/local.mnt/glfs/brick     49153     0          Y       29631
Self-heal Daemon on localhost               N/A       N/A        Y       3487 
Self-heal Daemon on 10.10.0.208             N/A       N/A        Y       27031
 
Task Status of Volume jf-vol0
------------------------------------------------------------------------------
There are no active volume tasks
=== END volume status ===


=== START glusterd.log ===
[2019-05-08 07:23:26.043605] I [MSGID: 100030] [glusterfsd.c:2849:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 6.1 (args: /usr/sbin/glusterd --pid-file=/run/glusterd.pid)
[2019-05-08 07:23:26.044499] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 21399
[2019-05-08 07:23:26.047235] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-05-08 07:23:26.047270] I [MSGID: 106479] [glusterd.c:1478:init] 0-management: Using /var/lib/glusterd as working directory
[2019-05-08 07:23:26.047284] I [MSGID: 106479] [glusterd.c:1484:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-05-08 07:23:26.051068] I [socket.c:931:__socket_server_bind] 0-socket.management: process started listening on port (44950)
[2019-05-08 07:23:26.051268] E [rpc-transport.c:297:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/6.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2019-05-08 07:23:26.051282] W [rpc-transport.c:301:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2019-05-08 07:23:26.051292] W [rpcsvc.c:1985:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-05-08 07:23:26.051302] E [MSGID: 106244] [glusterd.c:1785:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-05-08 07:23:26.053127] I [socket.c:902:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 13
[2019-05-08 07:23:28.584285] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 50000
[2019-05-08 07:23:28.650177] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 5104ed01-f959-4a82-bbd6-17d4dd177ec2
[2019-05-08 07:23:28.656448] E [mem-pool.c:351:__gf_free] (-->/usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so(+0x49190) [0x7fa26784e190] -->/usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so(+0x48f72) [0x7fa26784df72] -->/usr/lib64/libglusterfs.so.0(__gf_free+0x21d) [0x7fa26d1f31dd] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size
[2019-05-08 07:23:28.683589] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-05-08 07:23:28.686748] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-05-08 07:23:28.686787] W [MSGID: 106061] [glusterd-handler.c:3472:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-05-08 07:23:28.686819] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-05-08 07:23:28.687629] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2019-05-08 07:23:28.687625] W [MSGID: 106061] [glusterd-handler.c:3472:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-05-08 07:23:28.689771] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-05-08 07:23:29.388437] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 88496e0c-298b-47ef-98a1-a884ca68d7d4, host: 10.10.0.208, port: 0
[2019-05-08 07:23:29.393409] I [glusterd-utils.c:6312:glusterd_brick_start] 0-management: starting a fresh brick process for brick /local.mnt/glfs/brick
[2019-05-08 07:23:29.395426] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-05-08 07:23:29.460728] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2019-05-08 07:23:29.460868] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-05-08 07:23:29.460911] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-05-08 07:23:29.461360] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2019-05-08 07:23:29.462857] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: glustershd already stopped
[2019-05-08 07:23:29.462902] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is stopped
[2019-05-08 07:23:29.462959] I [MSGID: 106567] [glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd service
[2019-05-08 07:23:30.465107] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2019-05-08 07:23:30.465293] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped
[2019-05-08 07:23:30.465314] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped
[2019-05-08 07:23:30.465351] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2019-05-08 07:23:30.465477] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2019-05-08 07:23:30.465489] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped
[2019-05-08 07:23:30.465517] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2019-05-08 07:23:30.465633] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped
[2019-05-08 07:23:30.465645] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped
[2019-05-08 07:23:30.465689] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-05-08 07:23:30.465772] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2019-05-08 07:23:30.466776] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 88496e0c-298b-47ef-98a1-a884ca68d7d4
[2019-05-08 07:23:30.466822] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: a6ff7d5b-1e8d-4cdc-97cf-4e03b89462a3, host: 10.10.0.25, port: 0
[2019-05-08 07:23:30.490461] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: a6ff7d5b-1e8d-4cdc-97cf-4e03b89462a3
[2019-05-08 07:23:47.540967] I [MSGID: 106584] [glusterd-handler.c:5995:__glusterd_handle_get_state] 0-management: Received request to get state for glusterd
[2019-05-08 07:23:47.541003] I [MSGID: 106061] [glusterd-handler.c:5517:glusterd_get_state] 0-management: Default output directory: /var/run/gluster/
[2019-05-08 07:23:47.541052] I [MSGID: 106061] [glusterd-handler.c:5553:glusterd_get_state] 0-management: Default filename: glusterd_state_20190508_092347
=== END glusterd.log ===


=== START glustershd.log ===
[2019-05-08 07:23:29.465963] I [MSGID: 100030] [glusterfsd.c:2849:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 6.1 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/dc47fa45e83d2326.socket --xlator-option *replicate*.node-uuid=5104ed01-f959-4a82-bbd6-17d4dd177ec2 --process-name glustershd --client-pid=-6)
[2019-05-08 07:23:29.466783] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 29165
[2019-05-08 07:23:29.469726] I [socket.c:902:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 10
[2019-05-08 07:23:29.471280] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-05-08 07:23:29.471317] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2019-05-08 07:23:29.471326] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-05-08 07:23:29.471518] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-05-08 07:23:29.471540] W [glusterfsd.c:1570:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(+0xe7b3) [0x7f8e5adb37b3] -->/usr/sbin/glusterfs() [0x411629] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x409db7] ) 0-: received signum (1), shutting down
=== END glustershd.log ===


=== START local.mnt-glfs-brick.log ===
[2019-05-08 07:23:29.396753] I [MSGID: 100030] [glusterfsd.c:2849:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 6.1 (args: /usr/sbin/glusterfsd -s 10.10.0.177 --volfile-id jf-vol0.10.10.0.177.local.mnt-glfs-brick -p /var/run/gluster/vols/jf-vol0/10.10.0.177-local.mnt-glfs-brick.pid -S /var/run/gluster/ccdac309d72f1df7.socket --brick-name /local.mnt/glfs/brick -l /var/log/glusterfs/bricks/local.mnt-glfs-brick.log --xlator-option *-posix.glusterd-uuid=5104ed01-f959-4a82-bbd6-17d4dd177ec2 --process-name brick --brick-port 49153 --xlator-option jf-vol0-server.listen-port=49153)
[2019-05-08 07:23:29.397519] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 28996
[2019-05-08 07:23:29.400575] I [socket.c:902:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 10
[2019-05-08 07:23:29.401901] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-05-08 07:23:29.402622] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-05-08 07:23:29.402631] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 10.10.0.177
[2019-05-08 07:23:29.402649] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-05-08 07:23:29.402770] W [glusterfsd.c:1570:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(+0xe7b3) [0x7fe46b1f77b3] -->/usr/sbin/glusterfsd() [0x411629] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x57) [0x409db7] ) 0-: received signum (1), shutting down
[2019-05-08 07:23:29.403338] I [socket.c:3754:socket_submit_outgoing_msg] 0-glusterfs: not connected (priv->connected = 0)
[2019-05-08 07:23:29.403353] W [rpc-clnt.c:1704:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (unique: 0, XID: 0x2 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
[2019-05-08 07:23:29.403420] W [glusterfsd.c:1570:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(+0xe7b3) [0x7fe46b1f77b3] -->/usr/sbin/glusterfsd() [0x411629] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x57) [0x409db7] ) 0-: received signum (1), shutting down
=== END local.mnt-glfs-brick.log ===


=== START glusterd_state_20190508_092347 ===
[Global]
MYUUID: 5104ed01-f959-4a82-bbd6-17d4dd177ec2
op-version: 50000

[Global options]

[Peers]
Peer1.primary_hostname: 10.10.0.208
Peer1.uuid: 88496e0c-298b-47ef-98a1-a884ca68d7d4
Peer1.state: Peer in Cluster
Peer1.connected: Connected
Peer1.othernames:
Peer2.primary_hostname: 10.10.0.25
Peer2.uuid: a6ff7d5b-1e8d-4cdc-97cf-4e03b89462a3
Peer2.state: Peer in Cluster
Peer2.connected: Connected
Peer2.othernames:

[Volumes]
Volume1.name: jf-vol0
Volume1.id: f90d35dd-b2a4-461b-9ae9-dcfc68dac322
Volume1.type: Replicate
Volume1.transport_type: tcp
Volume1.status: Started
Volume1.profile_enabled: 0
Volume1.brickcount: 3
Volume1.Brick1.path: 10.10.0.177:/local.mnt/glfs/brick
Volume1.Brick1.hostname: 10.10.0.177
Volume1.Brick1.port: 49153
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.port_registered: 0
Volume1.Brick1.status: Stopped
Volume1.Brick1.spacefree: 1891708428288Bytes
Volume1.Brick1.spacetotal: 1891966050304Bytes
Volume1.Brick2.path: 10.10.0.25:/local.mnt/glfs/brick
Volume1.Brick2.hostname: 10.10.0.25
Volume1.Brick3.path: 10.10.0.208:/local.mnt/glfs/brick
Volume1.Brick3.hostname: 10.10.0.208
Volume1.snap_count: 0
Volume1.stripe_count: 1
Volume1.replica_count: 3
Volume1.subvol_count: 1
Volume1.arbiter_count: 0
Volume1.disperse_count: 0
Volume1.redundancy_count: 0
Volume1.quorum_status: not_applicable
Volume1.snapd_svc.online_status: Offline
Volume1.snapd_svc.inited: True
Volume1.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume1.rebalance.status: not_started
Volume1.rebalance.failures: 0
Volume1.rebalance.skipped: 0
Volume1.rebalance.lookedup: 0
Volume1.rebalance.files: 0
Volume1.rebalance.data: 0Bytes
Volume1.time_left: 0
Volume1.gsync_count: 0
Volume1.options.cluster.readdir-optimize: on
Volume1.options.cluster.self-heal-daemon: enable
Volume1.options.cluster.lookup-optimize: on
Volume1.options.network.inode-lru-limit: 200000
Volume1.options.performance.md-cache-timeout: 600
Volume1.options.performance.cache-invalidation: on
Volume1.options.performance.stat-prefetch: on
Volume1.options.features.cache-invalidation-timeout: 600
Volume1.options.features.cache-invalidation: on
Volume1.options.diagnostics.brick-sys-log-level: INFO
Volume1.options.diagnostics.brick-log-level: INFO
Volume1.options.diagnostics.client-log-level: INFO
Volume1.options.transport.address-family: inet
Volume1.options.nfs.disable: on
Volume1.options.performance.client-io-threads: off


[Services]
svc1.name: glustershd
svc1.online_status: Offline

svc2.name: nfs
svc2.online_status: Offline

svc3.name: bitd
svc3.online_status: Offline

svc4.name: scrub
svc4.online_status: Offline

svc5.name: quotad
svc5.online_status: Offline


[Misc]
Base port: 49152
Last allocated port: 49153
=== END glusterd_state_20190508_092347 ===

Comment 4 Atin Mukherjee 2019-05-08 14:02:11 UTC
[2019-05-08 07:23:29.471317] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2019-05-08 07:23:29.471326] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

The two log entries above from the brick log file are the cause: the brick appears to be unable to talk to glusterd.

Could you please check the contents of the glusterd.vol file on this node (locate the file and paste the output of 'cat glusterd.vol')? Do you see an entry 'option transport.socket.listen-port 24007' in the glusterd.vol file? If not, could you add it, restart the node, and see whether that makes any difference?
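
For reference, a minimal sketch of that check (the path /etc/glusterfs/glusterd.vol is the usual default and is assumed here; it can differ per distribution):

# Locate and inspect the management volfile; the path below is the common default.
cat /etc/glusterfs/glusterd.vol
# Check whether glusterd is explicitly pinned to the standard management port 24007.
grep 'transport.socket.listen-port' /etc/glusterfs/glusterd.vol || echo "listen-port not set explicitly"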

Comment 5 Rob de Wit 2019-05-08 15:08:52 UTC
That was it!

The brick now starts up OK.

Thanks a lot!

=== START glusterd.vol ===
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off

# Adding this line made it work:   
    option transport.socket.listen-port 24007

    option ping-timeout 0
    option event-threads 1
#   option transport.address-family inet6
#   option base-port 49152
end-volume
=== END glusterd.vol ===
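
For completeness, a rough sketch of the restart-and-verify step that followed the edit above (the systemd unit name and the verification command are assumptions for a typical setup; the volume name is the one from this report):

# Restart the management daemon so the brick processes can register again on port 24007.
systemctl restart glusterd
# Verify that the brick now shows as online for the volume.
gluster volume status jf-vol0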

Comment 6 Dmitry Melekhov 2019-08-19 10:08:51 UTC
I just hit the same problem during an upgrade from 5 to 6, and the same solution fixed it.
It is not clear to me why this is closed as not a bug.
There is nothing about it in the 6 release notes, so it should work with the default values.
Thank you!

Comment 7 lejeczek 2019-10-07 10:09:56 UTC
I also see this with glusterfs-6.5-1.el7.x86_64 on CentOS 7.7

[2019-10-07 09:17:37.071409] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.5/xlator/mgmt/glusterd.so(+0xe8faa) [0x7fd6204d3faa] -->/usr/lib64/glusterfs/6.5/xlator/mgmt/glusterd.so(+0xe8a75) [0x7fd6204d3a75] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd62c360495] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=IT-RELATED --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-10-07 09:17:37.099416] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.5/xlator/mgmt/glusterd.so(+0xe8faa) [0x7fd6204d3faa] -->/usr/lib64/glusterfs/6.5/xlator/mgmt/glusterd.so(+0xe8a75) [0x7fd6204d3a75] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd62c360495] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=IT-RELATED --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-10-07 09:42:26.314045] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.5/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory
[2019-10-07 09:42:26.328413] E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.5/rpc-transport/socket.so: undefined symbol: xlator_api
[2019-10-07 09:42:26.330640] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.5/xlator/nfs/server.so: cannot open shared object file: No such file or directory
[2019-10-07 09:42:26.348399] W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.5/xlator/storage/bd.so: cannot open shared object file: No such file or directory
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.5/xlator/encryption/crypt.so: cannot open shared object file: No such file or directory" repeated 2 times between [2019-10-07 09:42:26.314045] and [2019-10-07 09:42:26.314307]
The message "E [MSGID: 101097] [xlator.c:218:xlator_volopt_dynload] 0-xlator: dlsym(xlator_api) missing: /usr/lib64/glusterfs/6.5/rpc-transport/socket.so: undefined symbol: xlator_api" repeated 7 times between [2019-10-07 09:42:26.328413] and [2019-10-07 09:42:26.328590]
The message "W [MSGID: 101095] [xlator.c:210:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/6.5/xlator/nfs/server.so: cannot open shared object file: No such file or directory" repeated 30 times between [2019-10-07 09:42:26.330640] and [2019-10-07 09:42:26.331499]

