Description of problem:
Each job in the /queue directory has a TTL of 2 days from its creation time. When no new job arrives for two days and all jobs are deleted by TTL, the sync thread in all tendrl components fails with this exception:

Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: Traceback (most recent call last):
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:   File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:     self.run()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:   File "/usr/lib64/python2.7/threading.py", line 765, in run
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:     self.__target(*self.__args, **self.__kwargs)
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:   File "/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py", line 84, in process_job
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:     job.save()
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:   File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/job/__init__.py", line 23, in save
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]:     if "parent" in self.payload:
Jun 18 15:31:34 tendrl-server tendrl-monitoring-integration[6703]: TypeError: argument of type 'NoneType' is not iterable

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. To reproduce manually, set the TTL of all jobs in the /queue directory to 10 seconds.
2. Then run service {tendrl component name} status -l
   (e.g. service tendrl-node-agent status -l)

To set the TTL manually, create a Python script:

import etcd

_etcd_args = dict(
    host="127.0.0.1",  # add your etcd machine IP here
    port=2379
)

# if certificates are enabled then add those also
_etcd_args.update(
    {
        "ca_cert": 'etcd_ca_cert_file_path',
        "cert": (
            'etcd_cert_file_path',
            'etcd_key_file_path'
        ),
        "protocol": "https"
    }
)

client = etcd.Client(**_etcd_args)
jobs = client.read("/queue")
for job in jobs.leaves:
    # refresh() is a method of the client object, not of the etcd module
    client.refresh(job.key, 10)

Actual results:
The job sync thread fails in all tendrl components when the /queue directory is empty.

Expected results:
The job sync thread should not fail when the /queue directory is empty.

Additional info:
On a fresh machine the /queue directory is not present, and in that case everything works fine. The failure occurs only when the /queue directory is present but contains no jobs. A missing directory and an empty directory are different cases; the problem affects only the present-but-empty case.
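The failure mode in the traceback can be illustrated with a minimal sketch. This is not the tendrl code: the `Job` class, the `save_unguarded` and `save_guarded` methods, and the dictionary used as a store are hypothetical stand-ins. The guard shown is only an assumption about one way to avoid the TypeError, not the fix that was shipped.

```python
class Job(object):
    """Hypothetical stand-in for a job object whose payload may be missing."""

    def __init__(self, job_id, payload=None):
        self.job_id = job_id
        self.payload = payload

    def save_unguarded(self, store):
        # Same membership test as in the traceback; raises TypeError
        # when self.payload is None.
        has_parent = "parent" in self.payload
        store[self.job_id] = {"payload": self.payload, "has_parent": has_parent}
        return True

    def save_guarded(self, store):
        # Assumed defensive variant: skip jobs that carry no payload,
        # as can happen when the queue holds no real job entries.
        if self.payload is None:
            return False
        has_parent = "parent" in self.payload
        store[self.job_id] = {"payload": self.payload, "has_parent": has_parent}
        return True


if __name__ == "__main__":
    store = {}
    empty_job = Job("no-payload")
    try:
        empty_job.save_unguarded(store)
    except TypeError as err:
        print("unguarded save failed: %s" % err)
    print("guarded save stored the job: %s" % empty_job.save_guarded(store))
```

The guarded variant returns False instead of raising, so a sync loop calling it could continue with the next entry rather than terminating the thread.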
Could you specify the full version of the affected component? During testing, the QE team will need to reproduce the bug on the affected version before moving on to the version with the fix. Since the reproducer looks clear, I will add qe ack then.
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
Thank you. Adding qe ack.
*** Bug 1594655 has been marked as a duplicate of this bug. ***
Tested and verified on:
etcd-3.2.7-1.el7.x86_64
python-etcd-0.4.5-2.el7rhgs.noarch
rubygem-etcd-0.3.0-2.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-8.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-6.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-6.el7rhgs.noarch
tendrl-node-agent-1.6.3-8.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-6.el7rhgs.noarch

With a slightly modified version of the script from comment 0:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import etcd

_etcd_args = dict(
    host="rhgswa_server.example.com",
    port=2379
)

# if certificates are enabled then add those also
_etcd_args.update(
    {
        "ca_cert": '/etc/pki/tls/certs/ca-usmqe.crt',
        "cert": (
            '/etc/pki/tls/certs/etcd.crt',
            '/etc/pki/tls/private/etcd.key'
        ),
        "protocol": "https"
    }
)

client = etcd.Client(**_etcd_args)
jobs = client.read("/queue")
for job in jobs.leaves:
    client.refresh(job.key, 10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a short time, the /queue directory is empty:

# etcdctl --ca-file /etc/pki/tls/certs/ca-usmqe.crt \
    --cert-file /etc/pki/tls/certs/etcd.crt --key-file /etc/pki/tls/private/etcd.key --endpoints https://${HOSTNAME}:2379 \
    ls /queue
#

And no traceback appears in the logs of any tendrl service.

>> VERIFIED
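The "no traceback in the logs" check can also be scripted. The following is a minimal sketch, assuming log lines in the format quoted in comment 0; the names `SIGNATURE`, `UNIT_RE` and `find_sync_failures` are hypothetical helpers introduced here, not part of tendrl.

```python
import re

# Error text taken from the traceback in comment 0.
SIGNATURE = "TypeError: argument of type 'NoneType' is not iterable"

# Matches the unit name in "... tendrl-server tendrl-node-agent[1234]: ..."
UNIT_RE = re.compile(r"\s(tendrl-[\w-]+)\[\d+\]:")


def find_sync_failures(log_lines):
    """Return the set of tendrl unit names whose lines contain SIGNATURE."""
    units = set()
    for line in log_lines:
        if SIGNATURE in line:
            match = UNIT_RE.search(line)
            units.add(match.group(1) if match else "unknown")
    return units


if __name__ == "__main__":
    import sys

    # Reads log text from standard input and reports affected units, if any.
    failed = find_sync_failures(sys.stdin)
    if failed:
        print("sync failure seen in: %s" % ", ".join(sorted(failed)))
    else:
        print("no sync failure found")
```

The script reads log text from standard input, so output collected from the tendrl services can be piped into it once the jobs in /queue have expired.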
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616