1592487 – Job sync thread fails when /queue directory becomes empty

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	web-admin-tendrl-commons
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	gowtham
QA Contact:	Daniel Horák
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1594655 (view as bug list)
Depends On:
Blocks:	1503137

Reported:	2018-06-18 15:57 UTC by gowtham
Modified:	2018-09-04 07:08 UTC (History)
CC List:	6 users (show)
Fixed In Version:	tendrl-commons-1.6.3-8.el7rhgs
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-04 07:07:28 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	Tendrl commons pull 994	0	None	None	None	2018-06-18 16:07:03 UTC
Red Hat Product Errata	RHSA-2018:2616	0	None	None	None	2018-09-04 07:08:29 UTC

Description gowtham 2018-06-18 15:57:08 UTC

Description of problem:
Each job in the /queue directory has a TTL of 2 days from its creation time. When no new job arrives for two days and all existing jobs are deleted by TTL expiry, the job sync thread in every tendrl component fails with the following exception:

Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: Traceback (most recent call last):
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: self.run()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib64/python2.7/threading.py", line 765, in run
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: self.__target(*self.__args, **self.__kwargs)
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 84, in process_job
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: job.save()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/job/__init__.py", line 23, in save
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: if "parent" in self.payload:
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: TypeError: argument of type 'NoneType' is not iterable
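
The traceback above ends in a membership test ("parent" in self.payload) on a payload that is None. A minimal, self-contained sketch of that failure mode and one possible guard follows. The Job class here is a hypothetical stand-in, not the tendrl-commons implementation, and the guard is only an assumption about how such a case could be handled, not necessarily what the linked pull request does.

```python
class Job(object):
    """Hypothetical stand-in for a job object; not the tendrl-commons class."""

    def __init__(self, job_id, payload=None):
        self.job_id = job_id
        # Stays None when nothing was loaded from the store.
        self.payload = payload

    def save_unguarded(self):
        # Mirrors the failing line in the traceback:
        # a membership test on None raises TypeError.
        return "parent" in self.payload

    def save_guarded(self):
        # Assumed guard: treat a missing payload as "nothing to save".
        if self.payload is None:
            return False
        return "parent" in self.payload


# A job object built from an entry that carries no payload.
empty = Job("no-payload")
try:
    empty.save_unguarded()
    raised = False
except TypeError:
    raised = True

guarded_result = empty.save_guarded()
child_result = Job("j1", {"parent": "j0"}).save_guarded()
```

With the guard in place, a payload-less entry is skipped instead of terminating the thread that calls save().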


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. To reproduce manually, set the TTL for all jobs in the /queue directory to 10 seconds (see the script below).
2. Then run service {tendrl component name} status -l (e.g. service tendrl-node-agent status -l).
 To set the TTL manually, create a Python script:
    import etcd

    # Connection settings for the etcd server.
    _etcd_args = dict(
        host="127.0.0.1",  # add your etcd machine IP here
        port=2379  # must be an integer, not a string
    )
    # If certificates are enabled, add those as well.
    _etcd_args.update(
        {
            "ca_cert": 'etcd_ca_cert_file_path',
            "cert": (
                'etcd_cert_file_path',
                'etcd_key_file_path'
            ),
            "protocol": "https"
        }
    )
    client = etcd.Client(**_etcd_args)
    jobs = client.read("/queue")
    # Shorten the TTL of every job to 10 seconds so the queue empties quickly.
    for job in jobs.leaves:
        client.refresh(job.key, 10)  # refresh is a method of the client object

Actual results:
Job sync thread is failing in all tendrl components when /queue directory is empty

Expected results:
Job sync thread should not fail when /queue directory is empty


Additional info:
On a fresh machine the /queue directory is not present at all, and in that case everything works fine. The failure occurs only when the /queue directory is present but contains no jobs.

An absent directory and a present-but-empty directory are two different cases; the problem affects only the present-but-empty case.
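
To illustrate why the absent and present-but-empty cases can behave differently, here is a small self-contained model. The Node class and both helper functions are hypothetical and do not come from tendrl or python-etcd. The sketch assumes that a leaf-style traversal hands back a childless node as its own leaf, so an empty directory yields itself; whether this is what actually happens in the affected code is an assumption.

```python
class Node(object):
    """Hypothetical key-value tree node; not the python-etcd result class."""

    def __init__(self, key, value=None, is_dir=False, children=None):
        self.key = key
        self.value = value
        self.is_dir = is_dir
        self.children = children or []


def leaves(node):
    # Assumption: a node without children is returned as its own leaf,
    # which means an empty directory yields itself.
    if not node.children:
        yield node
        return
    for child in node.children:
        for leaf in leaves(child):
            yield leaf


def job_nodes(queue):
    """Return only real job entries; queue is None when the directory is absent."""
    if queue is None:
        return []  # absent directory: nothing to process
    # Skip directory nodes so an empty /queue is never treated as a job.
    return [n for n in leaves(queue) if not n.is_dir]


absent = None
empty = Node("/queue", is_dir=True)
populated = Node("/queue", is_dir=True,
                 children=[Node("/queue/j1", value="payload-1")])

naive_empty = [n.key for n in leaves(empty)]
safe_absent = job_nodes(absent)
safe_empty = job_nodes(empty)
safe_populated = [n.key for n in job_nodes(populated)]
```

In this model the naive traversal of the empty directory returns the directory node itself, which has no value to build a job from, while the filtered version returns nothing for both the absent and the empty case.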

Comment 2 Martin Bukatovic 2018-06-19 09:37:29 UTC

Could you specify the full version of the affected component? During testing, the QE
team will need to reproduce the bug on the affected version before moving on to the
version with the fix.

Since the reproducer looks clear, I will add QE ack then.

Comment 3 gowtham 2018-06-19 09:53:31 UTC

tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch

Comment 4 Martin Bukatovic 2018-06-19 09:55:46 UTC

Thank you. Adding qe ack.

Comment 7 Nishanth Thomas 2018-06-27 10:42:08 UTC

*** Bug 1594655 has been marked as a duplicate of this bug. ***

Comment 9 Daniel Horák 2018-07-13 07:59:45 UTC

Tested and verified on:
  etcd-3.2.7-1.el7.x86_64
  python-etcd-0.4.5-2.el7rhgs.noarch
  rubygem-etcd-0.3.0-2.el7rhgs.noarch
  tendrl-ansible-1.6.3-5.el7rhgs.noarch
  tendrl-api-1.6.3-4.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
  tendrl-commons-1.6.3-8.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
  tendrl-node-agent-1.6.3-8.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-6.el7rhgs.noarch

With a slightly modified version of the script from comment 0:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  import etcd
  _etcd_args = dict(
      host="rhgswa_server.example.com",
      port=2379
  )
  # if certificates are enabled then add those also
  _etcd_args.update(
      {
          "ca_cert": '/etc/pki/tls/certs/ca-usmqe.crt',
          "cert": (
              '/etc/pki/tls/certs/etcd.crt',
              '/etc/pki/tls/private/etcd.key'
          ),
          "protocol": "https"
      }
  )
  client = etcd.Client(**_etcd_args)
  jobs = client.read("/queue")
  for job in jobs.leaves:
      client.refresh(job.key, 10)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a short time, the /queue directory is empty:
  # etcdctl --ca-file /etc/pki/tls/certs/ca-usmqe.crt \
    --cert-file /etc/pki/tls/certs/etcd.crt \
    --key-file /etc/pki/tls/private/etcd.key \
    --endpoints https://${HOSTNAME}:2379 \
    ls /queue
  #

And no traceback appears in the logs from any tendrl service.

>> VERIFIED

Comment 11 errata-xmlrpc 2018-09-04 07:07:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
