Description of problem:
On an OCP deployment with CNS configured via ansible, the heketi pod goes into an error state when restarted. The fix for https://bugzilla.redhat.com/show_bug.cgi?id=1548322 isn't available in the ansible heketi templates.

# oc logs heketi-storage-1-rhdf7
Heketi 6.0.0
[heketi] ERROR 2018/03/12 05:58:29 /src/github.com/heketi/heketi/apps/glusterfs/app.go:79: invalid log level:
[heketi] INFO 2018/03/12 05:58:29 Loaded kubernetes executor
[heketi] INFO 2018/03/12 05:58:29 Please refer to the Heketi troubleshooting documentation for more information on how to resolve this issue.
[heketi] WARNING 2018/03/12 05:58:29 Server refusing to start.
[heketi] ERROR 2018/03/12 05:58:29 /src/github.com/heketi/heketi/apps/glusterfs/app.go:156: Heketi was terminated while performing one or more operations. Server may refuse to start as long as pending operations are present in the db.
ERROR: Unable to start application

Version-Release number of the following components:
rpm -q openshift-ansible - openshift-ansible-3.9.3-1.git.0.e166207.el7.noarch
rpm -q ansible - ansible-2.4.2.0-2.el7.noarch
ansible --version
ansible 2.4.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible: Always

Steps to Reproduce:
1. deploy OCP + CNS via ansible
2. create pvc and at the same time restart heketi pod

Actual results: heketi pod goes into error state

Expected results: heketi pod should come up

Additional info:
Initial PR on master created: https://github.com/openshift/openshift-ansible/pull/7494
Thanks for the fix. Verified with:

[fedora@ip-172-31-55-221 openshift-ansible]$ git log --oneline -1
4d0941a02 (HEAD -> bz1554219) GlusterFS: Add HEKETI_IGNORE_STALE_OPERATIONS to templates

# yum list installed | grep openshift
atomic-openshift.x86_64 3.9.4-1.git.0.35fdfc4.el7

# oc get pod -n glusterfs -o yaml | grep "image:" | sort -u
image: registry.reg-aws.openshift.com:443/rhgs3/rhgs-gluster-block-prov-rhel7:3.3.1-3
image: registry.reg-aws.openshift.com:443/rhgs3/rhgs-server-rhel7:3.3.1-7
image: registry.reg-aws.openshift.com:443/rhgs3/rhgs-volmanager-rhel7:3.3.1-4

# oc rsh -n glusterfs heketi-storage-1-wf68c
sh-4.2# printenv | grep -i stale
HEKETI_IGNORE_STALE_OPERATIONS=true
Backport for 3.9 created: https://github.com/openshift/openshift-ansible/pull/7511
Verified with openshift-ansible-3.9.9-1.git.0.1a1f7d8.el7: the change has merged and takes effect.

# oc get template heketi -n glusterfs -o yaml | grep -A1 HEKETI_IGNORE_STALE_OPERATIONS
- name: HEKETI_IGNORE_STALE_OPERATIONS
  value: "true"

# oc rsh heketi-storage-1-b5vhp
sh-4.2# env | grep HEKETI_IGNORE_STALE_OPERATIONS
HEKETI_IGNORE_STALE_OPERATIONS=true
sh-4.2# exit
exit
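
For anyone repeating this verification on several clusters, the template check can be wrapped in a small helper that returns an exit status instead of relying on eyeballing the grep output. This is only a sketch: the function name has_ignore_stale is made up for illustration, and it assumes the YAML layout shown in the comment above, where the value: line directly follows the name: line.

```shell
# Hypothetical helper (not part of openshift-ansible or heketi).
# Reads YAML on stdin, e.g. from `oc get template heketi -n glusterfs -o yaml`,
# and exits 0 only if a HEKETI_IGNORE_STALE_OPERATIONS entry is followed
# by a value of true (quoted or unquoted) on the next line.
# Assumption: "value:" sits on the line right after "name:", as in the
# output above. If the name appears more than once, one match is enough.
has_ignore_stale() {
    grep -A1 -e 'name: HEKETI_IGNORE_STALE_OPERATIONS' \
        | grep -Eq 'value: *"?true"? *$'
}
```

Possible usage, with the same template and namespace as in the verification above:

oc get template heketi -n glusterfs -o yaml | has_ignore_stale && echo "template carries the fix"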