Bug 1694139 - Error waiting for job 'heketi-storage-copy-job' to complete on one-node k3s deployment.
Summary: Error waiting for job 'heketi-storage-copy-job' to complete on one-node k3s deployment.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Raghavendra Talur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-29 15:46 UTC by it.sergm
Modified: 2019-10-28 22:15 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-28 22:15:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description it.sergm 2019-03-29 15:46:29 UTC
Description of problem:
Deploying k3s with Gluster in a single-node setup. The same solution worked with Kubernetes but does not work with k3s (https://github.com/rancher/k3s).


Version-Release number of selected component (if applicable):
gluster-kubernetes 1.2.0 (https://github.com/gluster/gluster-kubernetes.git)
k3s - tested with v0.2.0 and v0.3.0-rc4 (https://github.com/rancher/k3s)
GlusterFS package - tested with 3.8.x, 3.12.x and 3.13.2
OS - tested with Ubuntu 16.04.6 (4.4.0-143-generic) and Ubuntu 18.04.2 (4.15.0-46-generic) with `apt full-upgrade` applied.


Steps to Reproduce:
1. Install and configure k3s.
# make sure the hostname is listed in /etc/hosts with the relevant IP
git clone --depth 1 https://github.com/rancher/k3s.git
cd k3s; sh install.sh 

# Label node:
kubectl label node k3s-gluster node-role.kubernetes.io/master=""
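
# Optional sanity check (not part of the original steps): confirm the single node is Ready and carries the label
kubectl get nodes --show-labels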


2. Pre-configure Gluster.
# install packages needed for gluster
apt -y install thin-provisioning-tools glusterfs-client

# required modules
cat << 'EOF' > /etc/modules-load.d/kubernetes-glusterfs.conf
# this module is required for glusterfs deployment on kubernetes
dm_thin_pool
EOF

## load the module
modprobe dm_thin_pool
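
# Optional check (not part of the original steps): confirm the module is actually loaded
lsmod | grep dm_thin_pool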

# get the gk-deploy code
cd $HOME
mkdir src
cd src
git clone https://github.com/gluster/gluster-kubernetes.git
cd gluster-kubernetes/deploy
# create the topology file. The private IP 10.0.0.10 was added separately using 'ip addr add dev ens3 10.0.0.10/24'
cat <<EOF > topology.json
{
  "clusters": [
	{
	  "nodes": [
		{
		  "node": {
			"hostnames": {
			  "manage": [
				"k3s-gluster"
			  ],
			  "storage": [
				"10.0.0.10"
			  ]
			},
			"zone": 1
		  },
		  "devices": [
			"/dev/vdb"
		  ]
		}
	  ]
	}
  ]
}
EOF
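
# Optional check (not part of the original steps): make sure the generated topology.json is valid JSON
python3 -m json.tool topology.json > /dev/null && echo "topology.json OK"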

# patch kube-templates/glusterfs-daemonset.yaml as described in https://github.com/gluster/gluster-kubernetes/issues/539#issuecomment-454668538
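# Illustrative only -- the actual changes are in the linked GitHub comment above.
# One way to sanity-check the edited template before deploying is a dry run:
kubectl create --dry-run -f kube-templates/glusterfs-daemonset.yaml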

3. Deploy Gluster:
root@k3s-gluster:~/src/gluster-kubernetes/deploy# ./gk-deploy -n kube-system --single-node -gvy topology.json


Using Kubernetes CLI.

Checking status of namespace matching 'kube-system':
kube-system   Active   4m36s
Using namespace "kube-system".
Checking for pre-existing resources...
  GlusterFS pods ... 
Checking status of pods matching '--selector=glusterfs=pod':

Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ... 
Checking status of pods matching '--selector=deploy-heketi=pod':

Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ... 
Checking status of pods matching '--selector=heketi=pod':

Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ... 
Checking status of pods matching '--selector=glusterfs=s3-pod':

Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ... /usr/local/bin/kubectl -n kube-system create -f /root/src/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount/heketi-service-account created
/usr/local/bin/kubectl -n kube-system create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=kube-system:heketi-service-account 2>&1
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
/usr/local/bin/kubectl -n kube-system label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
Marking 'k3s-gluster' as a GlusterFS node.
/usr/local/bin/kubectl -n kube-system label nodes k3s-gluster storagenode=glusterfs --overwrite 2>&1
node/k3s-gluster labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/src/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /usr/local/bin/kubectl -n kube-system create -f - 2>&1
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... 
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-xvkrp   1/1   Running   0     70s
OK
/usr/local/bin/kubectl -n kube-system create secret generic heketi-config-secret --from-file=private_key=/dev/null --from-file=./heketi.json --from-file=topology.json=topology.json
secret/heketi-config-secret created
/usr/local/bin/kubectl -n kube-system label --overwrite secret heketi-config-secret glusterfs=heketi-config-secret heketi=config-secret
secret/heketi-config-secret labeled
sed -e 's/\${HEKETI_EXECUTOR}/kubernetes/' -e 's#\${HEKETI_FSTAB}#/var/lib/heketi/fstab#' -e 's/\${HEKETI_ADMIN_KEY}' -e 's/\${HEKETI_USER_KEY}' /root/src/gluster-kubernetes/deploy/kube-templates/deploy-heketi-deployment.yaml | /usr/local/bin/kubectl -n kube-system create -f - 2>&1
service/deploy-heketi created
deployment.extensions/deploy-heketi created
Waiting for deploy-heketi pod to start ... 
Checking status of pods matching '--selector=deploy-heketi=pod':
deploy-heketi-5f6c465bb8-zl959   1/1   Running   0     19s
OK
Determining heketi service URL ... OK
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 -- heketi-cli -s http://localhost:8080 --user admin --secret '' topology load --json=/etc/heketi/topology.json 2>&1
Creating cluster ... ID: 949e5d5063a1c1589940b7ff4705dae8
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node k3s-gluster ... ID: 6f8e3cbc0cbf6d668d718cd9bd6022f5
Adding device /dev/vdb ... OK
heketi topology loaded.
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --help --durability=none >/dev/null 2>&1
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 -- heketi-cli -s http://localhost:8080 --user admin --secret '' setup-openshift-heketi-storage --listfile=/tmp/heketi-storage.json --durability=none 2>&1
Saving /tmp/heketi-storage.json
/usr/local/bin/kubectl -n kube-system exec -i deploy-heketi-5f6c465bb8-zl959 -- cat /tmp/heketi-storage.json | /usr/local/bin/kubectl -n kube-system create -f - 2>&1
secret/heketi-storage-secret created
endpoints/heketi-storage-endpoints created
service/heketi-storage-endpoints created
job.batch/heketi-storage-copy-job created

Checking status of pods matching '--selector=job-name=heketi-storage-copy-job':
heketi-storage-copy-job-xft9f   0/1   ContainerCreating   0     5m16s
Timed out waiting for pods matching '--selector=job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.


Actual results:
root@k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl get pods -n kube-system
NAME                             READY   STATUS              RESTARTS   AGE
coredns-7748f7f6df-cchx7         1/1     Running             0          177m
deploy-heketi-5f6c465bb8-5f27p   1/1     Running             0          173m
glusterfs-ntmq7                  1/1     Running             0          174m
heketi-storage-copy-job-qzpr7    0/1     ContainerCreating   0          170m
svclb-traefik-957cdf677-c4j76    2/2     Running             1          177m
traefik-7b6bd6cbf6-rnrxj         1/1     Running             0          177m

root@k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system describe po/heketi-storage-copy-job-qzpr7
Name:               heketi-storage-copy-job-qzpr7
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               k3s-gluster/104.36.17.63
Start Time:         Fri, 29 Mar 2019 08:54:08 +0000
Labels:             controller-uid=36e114ae-5200-11e9-a826-227e2ba50104
                    job-name=heketi-storage-copy-job
Annotations:        <none>
Status:             Pending
IP:                 
Controlled By:      Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:  
    Image:         heketi/heketi:dev
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /db/heketi.db
      /heketi
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-98jvk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  heketi-storage:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-storage-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  heketi-storage-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-storage-secret
    Optional:    false
  default-token-98jvk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-98jvk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                    From                  Message
  ----     ------       ----                   ----                  -------
  Warning  FailedMount  3m56s (x74 over 169m)  kubelet, k3s-gluster  Unable to mount volumes for pod "heketi-storage-copy-job-qzpr7_kube-system(36e1b013-5200-11e9-a826-227e2ba50104)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"heketi-storage-copy-job-qzpr7". list of unmounted volumes=[heketi-storage]. list of unattached volumes=[heketi-storage heketi-storage-secret default-token-98jvk]


Expected results:
All pods running; gk-deploy completes with no errors.


Additional info:
The same Gluster procedure works with single-node Kubernetes, but it does not work with k3s.
The firewall is at defaults, modified only by the iptables rules k3s adds.
I have tried the following variations and none of them work:
- private IP in the topology (also tried the main public IP)
- deploying with a clean drive
- mounting the volume from outside
- updating the Gluster client to v3.12.x on Ubuntu 16.04 and 3.13.2 on Ubuntu 18.04



Gluster logs, volumes:
root@k3s-gluster:~/src/gluster-kubernetes/deploy# kubectl -n kube-system exec -it glusterfs-ntmq7 /bin/bash
[root@k3s-gluster /]# cat /var/log/glusterfs/glusterd.log 
[2019-03-29 08:50:44.968074] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.7 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-03-29 08:50:44.977762] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-03-29 08:50:44.977790] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory
[2019-03-29 08:50:44.977797] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-03-29 08:50:45.002831] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2019-03-29 08:50:45.002862] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device
[2019-03-29 08:50:45.002873] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2019-03-29 08:50:45.002957] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2019-03-29 08:50:45.002968] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2019-03-29 08:50:46.040712] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040765] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:50:46.040768] I [MSGID: 106514] [glusterd-store.c:2262:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 40100
[2019-03-29 08:50:46.044266] I [MSGID: 106194] [glusterd-store.c:3850:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 10
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/lib/glusterd
 15: end-volume
 16:  
+------------------------------------------------------------------------------+
[2019-03-29 08:50:46.044640] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-29 08:54:07.698610] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-03-29 08:54:07.698686] I [MSGID: 106477] [glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated UUID: 9dc908c2-0e7d-4b40-a951-095b78dbaeeb
[2019-03-29 08:54:07.706214] W [MSGID: 101095] [xlator.c:181:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/4.1.7/xlator/nfs/server.so: cannot open shared object file: No such file or directory
[2019-03-29 08:54:07.730620] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a) [0x7f7f4e7f1c9a] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/create/post/S10selinux-label-brick.sh --volname=heketidbstorage
[2019-03-29 08:54:07.863432] I [glusterd-utils.c:6090:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
[2019-03-29 08:54:07.989260] I [MSGID: 106142] [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick on port 49152
[2019-03-29 08:54:07.998472] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-03-29 08:54:08.007817] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-03-29 08:54:08.008060] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2019-03-29 08:54:08.008256] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2019-03-29 08:54:08.008335] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2019-03-29 08:54:08.008360] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-03-29 08:54:08.008376] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2019-03-29 08:54:08.008402] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2019-03-29 08:54:08.008493] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2019-03-29 08:54:08.008656] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2019-03-29 08:54:08.008772] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2019-03-29 08:54:08.008785] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped
[2019-03-29 08:54:08.008808] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2019-03-29 08:54:08.008907] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2019-03-29 08:54:08.008917] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped
[2019-03-29 08:54:08.015319] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a) [0x7f7f4e7f1c9a] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f7f4e7f1765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2019-03-29 08:54:08.025189] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2c9a) [0x7f7f4e7f1c9a] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3) [0x7f7f4e7f16c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7f5395d0f5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=heketidbstorage --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd

[root@k3s-gluster /]# lsblk 
NAME                                                                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1                                                                               7:1    0 87.9M  1 loop 
vdb                                                                               252:16   0  100G  0 disk 
├─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tdata   253:1    0    2G  0 lvm  
│ └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool 253:2    0    2G  0 lvm  
│   ├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7  253:4    0    2G  0 lvm  /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
│   └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4     253:3    0    2G  0 lvm  
└─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4_tmeta   253:0    0   12M  0 lvm  
  └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4-tpool 253:2    0    2G  0 lvm  
    ├─vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7  253:4    0    2G  0 lvm  /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7
    └─vg_fef96eab984d116ab3815e7479781110-tp_c3a55a7f206b236eba17b954622543b4     253:3    0    2G  0 lvm  
loop2                                                                               7:2    0   91M  1 loop 
loop0                                                                               7:0    0 89.3M  1 loop 
vda                                                                               252:0    0   10G  0 disk 
├─vda2                                                                            252:2    0   10G  0 part /var/lib/misc/glusterfsd
└─vda1                                                                            252:1    0    1M  0 part 

[root@k3s-gluster /]# gluster volume info
Volume Name: heketidbstorage
Type: Distribute
Volume ID: 32608bdb-a4a3-494e-9c6e-68d8f780f12c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.0.0.10:/var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

[root@k3s-gluster /]# mount -t glusterfs 10.0.0.10:/heketidbstorage /mnt/glustertest
WARNING: getfattr not found, certain checks will be skipped..

[root@k3s-gluster /]# mount | grep 10.0.0.10
10.0.0.10:/heketidbstorage on /mnt/glustertest type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Comment 1 Atin Mukherjee 2019-04-01 12:54:54 UTC
Could you elaborate on the problem a bit more? Are you seeing the volume mount failing, or is something wrong with the clustering? From a quick scan through the bug report, I don't see anything problematic on the glusterd end.

Comment 2 it.sergm 2019-04-02 06:49:40 UTC
The thing is, I don't see any exception there either, and the heketidbstorage volume mounts manually (with no files inside), but it still doesn't work with k3s: the pod apparently cannot mount the required volumes before starting. k3s itself works fine and can deploy other workloads with no errors.

I could be wrong, but there is only one volume listed from the gluster pod:
[root@k3s-gluster /]# gluster volume list
heketidbstorage

but according to the pod's error, there should be more:
3d12h       Warning   FailedMount               Pod       Unable to mount volumes for pod "heketi-storage-copy-job-qzpr7_kube-system(36e1b013-5200-11e9-a826-227e2ba50104)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"heketi-storage-copy-job-qzpr7". list of unmounted volumes=[heketi-storage]. list of unattached volumes=[heketi-storage heketi-storage-secret default-token-98jvk]

Here is the list of volumes on gluster pod:
[root@k3s-gluster /]# df -h
Filesystem                                                                              Size  Used Avail Use% Mounted on
overlay                                                                                 9.8G  6.9G  2.5G  74% /
udev                                                                                    3.9G     0  3.9G   0% /dev
/dev/vda2                                                                               9.8G  6.9G  2.5G  74% /run
tmpfs                                                                                   798M  1.3M  797M   1% /run/lvm
tmpfs                                                                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                                                                   3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                                                                   3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7  2.0G   33M  2.0G   2% /var/lib/heketi/mounts/vg_fef96eab984d116ab3815e7479781110/brick_65d5aa6369e265d641f3557e6c9736b7

[root@k3s-gluster /]# blkid
/dev/loop0: TYPE="squashfs" 
/dev/loop1: TYPE="squashfs" 
/dev/loop2: TYPE="squashfs" 
/dev/vda1: PARTUUID="258e4699-a592-442c-86d7-3d7ee4a0dfb7" 
/dev/vda2: UUID="b394d2be-6b9e-11e8-82ca-22c5fe683ae4" TYPE="ext4" PARTUUID="97104384-f79f-4a39-b3d4-56d717673a18" 
/dev/vdb: UUID="RUR8Cw-eVYg-H26e-yQ4g-7YCe-NzNg-ocJazb" TYPE="LVM2_member" 
/dev/mapper/vg_fef96eab984d116ab3815e7479781110-brick_65d5aa6369e265d641f3557e6c9736b7: UUID="ab0e969f-ae85-459c-914f-b008aeafb45e" TYPE="xfs" 


Also, here is what I've found on the main node: the host IP is null (by the way, I've previously changed the topology to use both external and private IPs, and nothing changed):
root@k3s-gluster:~# cat /var/log/glusterfs/cli.log.1 
[2019-03-29 08:54:07.634136] I [cli.c:773:main] 0-cli: Started running gluster with version 4.1.7
[2019-03-29 08:54:07.678012] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-29 08:54:07.678105] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-03-29 08:54:07.678268] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-03-29 08:54:07.721606] I [cli-rpc-ops.c:1169:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
[2019-03-29 08:54:07.721773] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-03-29 08:54:07.817416] I [cli.c:773:main] 0-cli: Started running gluster with version 4.1.7
[2019-03-29 08:54:07.861767] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-29 08:54:07.861943] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-03-29 08:54:07.862016] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-03-29 08:54:08.009116] I [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2019-03-29 08:54:08.009314] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2019-03-29 14:18:51.209759] I [cli.c:773:main] 0-cli: Started running gluster with version 4.1.7
[2019-03-29 14:18:51.256846] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-29 14:18:51.256985] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2019-03-29 14:18:51.257093] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-03-29 14:18:51.259408] I [cli-rpc-ops.c:875:gf_cli_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2019-03-29 14:18:51.259587] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0
[2019-03-29 14:18:51.260102] I [cli-rpc-ops.c:875:gf_cli_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2019-03-29 14:18:51.260143] I [input.c:31:cli_batch] 0-: Exiting with: 0

Comment 3 Assen Sharlandjiev 2019-05-08 13:31:33 UTC
Hi, I ran into the same problem.
Checked the syslog on the k3s host node:

#tail -f /var/log/syslog

shows

May  8 16:26:53 k3s-node2 k3s[922]: E0508 16:26:53.466167     922 desired_state_of_world_populator.go:298] Failed to add volume "heketi-storage" (specName: "heketi-storage") for pod "9a3ec318-718e-11e9-9557-3e1cb9b46815" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "heketi-storage" err=no volume plugin matched
May  8 16:26:53 k3s-node2 k3s[922]: E0508 16:26:53.569733     922 desired_state_of_world_populator.go:298] Failed to add volume "heketi-storage" (specName: "heketi-storage") for pod "9a3ec318-718e-11e9-9557-3e1cb9b46815" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "heketi-storage" err=no volume plugin matched

I guess we are missing something on the k3s agent node.

hope this info helps.

Comment 4 Atin Mukherjee 2019-07-15 03:14:31 UTC
Talur - could you check what's going wrong here? I believe this isn't a (core) gluster problem as such.

Comment 5 Raghavendra Talur 2019-10-28 22:15:31 UTC
The error message "list of unmounted volumes=[heketi-storage]" indicates that k3s wasn't able to mount the "glusterfs" volume type; see the other error message: err=failed to get Plugin from volumeSpec for volume "heketi-storage" err=no volume plugin matched.

From what I could find online, k3s does not ship the in-tree volume plugin for Gluster. We (the heketi developers) have not tested heketi with anything other than upstream Kubernetes, so you are in uncharted territory.
A simple workaround I can think of is to split the deployment steps and do the copy job manually; you can find the copy job definition here: https://github.com/heketi/heketi/blob/master/client/cli/go/cmds/heketi_storage.go#L256. A rough sketch of what that could look like is below.
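Untested sketch, for illustration only: it reuses the heketi-storage-secret (and its heketi.db key) plus the heketidbstorage volume from the job spec in this report, and assumes the glusterfs-fuse client is available on the host (glusterfs-client is installed in step 2 of the reproduction); /mnt/heketidb is an arbitrary mount point.

# 1. Extract heketi.db from heketi-storage-secret (the job mounts this secret at /db)
kubectl -n kube-system get secret heketi-storage-secret \
  -o jsonpath="{.data['heketi\.db']}" | base64 -d > heketi.db

# 2. Mount the heketidbstorage volume on the host and copy the db into it
#    (this is what the job's 'cp /db/heketi.db /heketi' would have done)
mkdir -p /mnt/heketidb
mount -t glusterfs 10.0.0.10:/heketidbstorage /mnt/heketidb
cp heketi.db /mnt/heketidb/
umount /mnt/heketidb

# 3. Remove the stuck copy job so the rest of the deployment can continue
kubectl -n kube-system delete job heketi-storage-copy-job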

I will close the bug as we don't really support k3s yet.

