Bug 1592487

Summary: Job sync thread fails when /queue directory becomes empty
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: gowtham <gshanmug>
Component: web-admin-tendrl-commons Assignee: gowtham <gshanmug>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.4 CC: dahorak, gshanmug, mbukatov, nthomas, rhs-bugs, sankarshan
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-commons-1.6.3-8.el7rhgs Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 07:07:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503137    

Description gowtham 2018-06-18 15:57:08 UTC
Description of problem:
Each job in the /queue directory has a TTL of 2 days from its creation time. When no new job arrives for two days and all jobs are deleted by TTL expiry, the sync thread in all tendrl components fails with this exception:

Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: Traceback (most recent call last):
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: self.run()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib64/python2.7/threading.py", line 765, in run
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: self.__target(*self.__args, **self.__kwargs)
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 84, in process_job
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: job.save()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/job/__init__.py", line 23, in save
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: if "parent" in self.payload:
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: TypeError: argument of type 'NoneType' is not iterable
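
The traceback ends in a membership test against a payload that is None. A minimal sketch of that failure mode and one possible guard follows. The `Job` class and its method names here are hypothetical stand-ins for illustration; this is not the tendrl-commons class and not necessarily the fix that was shipped.

```python
class Job(object):
    """Hypothetical stand-in for a job object; not the tendrl-commons class."""

    def __init__(self, payload=None):
        self.payload = payload

    def has_parent_unguarded(self):
        # Mirrors the failing check: raises TypeError when payload is None.
        return "parent" in self.payload

    def has_parent(self):
        # Guarded variant: a missing payload is treated as "no parent".
        return self.payload is not None and "parent" in self.payload


def try_unguarded(job):
    """Run the unguarded check and report a TypeError as a string."""
    try:
        return job.has_parent_unguarded()
    except TypeError as exc:
        return "TypeError: %s" % exc
```

With this sketch, `Job(None).has_parent()` returns False, while `try_unguarded(Job(None))` reports the same kind of TypeError seen in the log.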


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. To reproduce manually, set the TTL for all jobs in the /queue directory to 10 seconds.
2. Then run service {tendrl component name} status -l, e.g.
   service tendrl-node-agent status -l
 To set the TTL manually, create a Python script:
    import etcd
    _etcd_args = dict(
        host="127.0.0.1",  # add your etcd machine IP here
        port=2379
    )
    # if certificates are enabled then add those also
    _etcd_args.update(
        {
            "ca_cert": 'etcd_ca_cert_file_path',
            "cert": (
                'etcd_cert_file_path',
                'etcd_key_file_path'
            ),
            "protocol": "https"
        }
    )
    client = etcd.Client(**_etcd_args)
    jobs = client.read("/queue")
    for job in jobs.leaves:
        client.refresh(job.key, 10)

Actual results:
Job sync thread is failing in all tendrl components when /queue directory is empty

Expected results:
Job sync thread should not fail when /queue directory is empty


Additional info:
Initially, on a fresh machine, the /queue directory is not present, and in that case everything works fine. The failure occurs only when the /queue directory is present but contains no jobs.

A missing directory and a present-but-empty directory are different cases; the problem occurs only in the present-but-empty case.
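
The present-but-empty case can be illustrated with a small sketch. It assumes, without confirmation from this report, that listing an empty directory may hand back the directory node itself, which carries no job payload. `Node` and `job_keys` are hypothetical names used only for illustration, and this is not the fix shipped in tendrl-commons.

```python
from collections import namedtuple

# Hypothetical stand-in for an etcd result node: its key and a directory flag.
Node = namedtuple("Node", ["key", "dir"])


def job_keys(queue_key, leaves):
    """Return keys of entries strictly below queue_key.

    The queue directory node itself is skipped, so an empty queue
    produces an empty list instead of a payload-less pseudo job.
    """
    root = queue_key.rstrip("/")
    return [node.key for node in leaves if node.key.rstrip("/") != root]
```

Under that assumption, a listing that contains only the `/queue` node is treated as "no jobs", and the sync loop has nothing to process.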

Comment 2 Martin Bukatovic 2018-06-19 09:37:29 UTC
Could you specify the full version of the affected component? During testing, the
QE team will need to reproduce the bug on the affected version before moving on
to the version with the fix.

Since the reproducer looks clear, I will add qe ack then.

Comment 3 gowtham 2018-06-19 09:53:31 UTC
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch

Comment 4 Martin Bukatovic 2018-06-19 09:55:46 UTC
Thank you. Adding qe ack.

Comment 7 Nishanth Thomas 2018-06-27 10:42:08 UTC
*** Bug 1594655 has been marked as a duplicate of this bug. ***

Comment 9 Daniel Horák 2018-07-13 07:59:45 UTC
Tested and verified on:
  etcd-3.2.7-1.el7.x86_64
  python-etcd-0.4.5-2.el7rhgs.noarch
  rubygem-etcd-0.3.0-2.el7rhgs.noarch
  tendrl-ansible-1.6.3-5.el7rhgs.noarch
  tendrl-api-1.6.3-4.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
  tendrl-commons-1.6.3-8.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
  tendrl-node-agent-1.6.3-8.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-6.el7rhgs.noarch

With a slightly modified version of the script from comment 0:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  import etcd
  _etcd_args = dict(
      host="rhgswa_server.example.com",
      port=2379
  )
  # if certificates are enabled then add those also
  _etcd_args.update(
      {
          "ca_cert": '/etc/pki/tls/certs/ca-usmqe.crt',
          "cert": (
              '/etc/pki/tls/certs/etcd.crt',
              '/etc/pki/tls/private/etcd.key'
          ),
          "protocol": "https"
      }
  )
  client = etcd.Client(**_etcd_args)
  jobs = client.read("/queue")
  for job in jobs.leaves:
      client.refresh(job.key, 10)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a short time, the /queue directory is empty:
  # etcdctl --ca-file /etc/pki/tls/certs/ca-usmqe.crt \
    --cert-file /etc/pki/tls/certs/etcd.crt \
    --key-file /etc/pki/tls/private/etcd.key \
    --endpoints https://${HOSTNAME}:2379 \
    ls /queue
  #

And no traceback appears in the logs from any tendrl service.
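
A check like this could be scripted roughly as follows. This is a sketch only: it assumes log lines in the syslog-style format shown in the description, and `services_with_traceback` is a hypothetical helper, not part of any tendrl package.

```python
import re

# Matches e.g. "... tendrl-monitoring-integration[6703]: Traceback (most ..."
TRACEBACK_RE = re.compile(r"\s(tendrl-[\w-]+)\[\d+\]:\s+Traceback")


def services_with_traceback(log_lines):
    """Return the sorted names of tendrl services that logged a traceback."""
    found = set()
    for line in log_lines:
        match = TRACEBACK_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

An empty result for the log lines collected after the queue has emptied corresponds to the observation above.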

>> VERIFIED

Comment 11 errata-xmlrpc 2018-09-04 07:07:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616