Bug 1635035

Summary: Running sosreport on OCP cluster node fails with IndexError from kubernetes plugin trying to get namespaces
Product: Red Hat Enterprise Linux 7 Reporter: Candace Sheremeta <cshereme>
Component: sos Assignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA QA Contact: Miroslav Hradílek <mhradile>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.5 CC: agk, bmr, cshereme, cww, gavin, klaas, pdwyer, plambri, sbradley
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sos-3.7-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 13:15:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1594286, 1648022    

Description Candace Sheremeta 2018-10-01 21:24:56 UTC
Description of problem: Running a sosreport on an OCP cluster node fails with the following error from the kubernetes plugin:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in setup
    plug.setup()
  File "/usr/lib/python2.7/site-packages/sos/plugins/kubernetes.py", line 75, in setup
    knsps = [n.split()[0] for n in kn['output'].splitlines()[1:] if n]
IndexError: list index out of range


Version-Release number of selected component (if applicable):
sos-3.5-9.el7_5
seen on OCP 3.9 and 3.10 clusters


How reproducible:
100%

Steps to Reproduce:
1. Set up OCP cluster (I have customers reporting this issue from both 3.9 and 3.10 clusters, and was able to reproduce the issue on master nodes for both versions)
2. Run sosreport
3.

Actual results:
~~~
# sosreport

sosreport (version 3.5)

This command will collect diagnostic and configuration information from
this Red Hat Enterprise Linux system and installed applications.

...

 Setting up archive ...
 Setting up plugins ...
caught exception in plugin method "kubernetes.setup()"
writing traceback to sos_logs/kubernetes-plugin-errors.txt
 Running plugins. Please wait ...
~~~

sos_logs/kubernetes-plugin-errors.txt shows:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in setup
    plug.setup()
  File "/usr/lib/python2.7/site-packages/sos/plugins/kubernetes.py", line 75, in setup
    knsps = [n.split()[0] for n in kn['output'].splitlines()[1:] if n]
IndexError: list index out of range

Expected results:
No exception

Additional info:
N/A

Comment 2 Pavel Moravec 2018-10-02 08:28:08 UTC
Deferring to 7.7, too late for 7.6 (where the bug is expected to be present as well, since the code hasn't changed since sos 3.5).


If you can reproduce it on your system, could you please provide me access to it?

Or at least, provide either:

1) Add to /usr/lib/python2.7/site-packages/sos/plugins/kubernetes.py the print statement:

        kn = self.get_command_output('%s get namespaces' % kube_cmd)
        print("kube_cmd='%s', kn=%s" % (kube_cmd, kn))
        knsps = [n.split()[0] for n in kn['output'].splitlines()[1:] if n]

then re-run the failing sosreport and provide its stdout output.

2) Or provide output of:

kubectl get namespaces
kubectl --kubeconfig=/etc/origin/master/admin.kubeconfig get namespaces

commands (whose parsing to get namespaces is failing)


(FYI in 02193703, I think the sosreport was killed by OOM killer as it was executing another plugin (logs) - see https://bugzilla.redhat.com/show_bug.cgi?id=1183244)

Comment 3 Klaas Demter 2018-10-02 09:43:13 UTC
        if self.check_is_master():
            kube_cmd = "kubectl "
            if path.exists('/etc/origin/master/admin.kubeconfig'):
-                kube_cmd += "--config=/etc/origin/master/admin.kubeconfig"
+                kube_cmd += "--kubeconfig=/etc/origin/master/admin.kubeconfig"


Greetings
Klaas

Comment 4 Pavel Moravec 2018-10-02 09:53:28 UTC
(In reply to Klaas Demter from comment #3)
>         if self.check_is_master():
>             kube_cmd = "kubectl "
>             if path.exists('/etc/origin/master/admin.kubeconfig'):
> -                kube_cmd += "--config=/etc/origin/master/admin.kubeconfig"
> +                kube_cmd +=
> "--kubeconfig=/etc/origin/master/admin.kubeconfig"
> 
> 
> Greetings
> Klaas

That is upstream commit https://github.com/sosreport/sos/commit/63ad6c2 included in 3.6 we rebase to in RHEL 7.6.

Do you suggest this fixes this BZ?

Comment 5 Klaas Demter 2018-10-02 10:02:48 UTC
Yeah, that fixes it for me (Case 02192312)

Comment 8 Candace Sheremeta 2018-10-02 23:42:53 UTC
Hey Pavel,

I can confirm that https://github.com/sosreport/sos/commit/63ad6c2 fixed this issue within my environment as well.

Comment 9 Pavel Moravec 2018-10-03 06:53:41 UTC
.. and here is the problem, which is basically only worked around in sos 3.6:

python
Python 2.7.15 (default, Sep 21 2018, 23:26:48) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> kn = {'status': 1, 'output': u'Error: unknown flag: --config\n\n\nExamples:\n'}
>>> kn
{'status': 1, 'output': u'Error: unknown flag: --config\n\n\nExamples:\n'}
>>> [n.split()[0] for n in kn['output'].splitlines()[1:] if n]
[u'Examples:']
>>>

(so far so good, let's use a longer output snippet, with a line containing only spaces)

>>> kn={'status': 1, 'output': u'Error: unknown flag: --config\n\n\nExamples:\n  # List all pods in ps output format.\n  kubectl get pods\n  \n  # List all pods in ps output format with more information (such as node name).\n  kubectl get pods -o wide\n  \n'}
>>> [n.split()[0] for n in kn['output'].splitlines()[1:] if n]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>

So the problem is when the output contains '\n  \n' substring (line with spaces only) where:
- this line is non-empty, so "if n" is True
- splitting this line returns empty list
- we try to access 1st item of the list

So the code *is* wrong, and it works in 3.6 only because the output does not contain lines with spaces only. Ideally the assignment should be like:

knsps = [n.split()[0] for n in kn['output'].splitlines()[1:] if n and len(n.split())]


So until something gets broken, the code in 3.6 *will* work well. But I will leave the BZ open so that it gets fixed by the improvement in

https://github.com/sosreport/sos/pull/1442
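
The analysis in this comment can be replayed outside of sos with a small standalone sketch. It is only an illustration: the function names and the shortened sample string are made up here, and the guarded variant tests `n.split()` directly, which is an equivalent shorthand for the `if n and len(n.split())` condition proposed above, not necessarily what was merged upstream.

```python
def first_tokens_original(output):
    # Comprehension as quoted in the traceback: only skips truly empty lines.
    return [n.split()[0] for n in output.splitlines()[1:] if n]


def first_tokens_guarded(output):
    # split() returns [] for empty and whitespace-only lines alike,
    # so testing its result covers both cases.
    return [n.split()[0] for n in output.splitlines()[1:] if n.split()]


# Shortened, illustrative version of the kubectl error text from this comment.
sample = (
    "Error: unknown flag: --config\n\n\nExamples:\n"
    "  # List all pods\n  kubectl get pods\n  \n"
)

try:
    first_tokens_original(sample)
except IndexError as exc:
    print("original comprehension failed:", exc)

print("guarded comprehension:", first_tokens_guarded(sample))
```

Note that the guarded variant only avoids the crash. With an error message as input it still returns tokens that are not namespaces, which is why the `--kubeconfig` change from comment 3 matters as well.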

Comment 10 Pavel Moravec 2018-10-03 07:00:58 UTC
For verification:

1) pretend to be a Kubernetes master and fake the kubectl command so that it returns the problematic output:

mkdir -p /etc/origin/master/

echo "echo 'nonempty line'; echo '   '; echo 'another nonempty'" > /usr/bin/kubectl

chmod a+x /usr/bin/kubectl

2) run sosreport:

sosreport -o kubernetes --batch --build

3) check that the kubernetes plugin does not raise the exception shown in the Description
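
As a quick sanity check of why this fake output is a valid reproducer, the following sketch feeds the same three lines to the parsing expression without running sosreport. It assumes the fake script prints exactly the text of the three `echo` commands from step 1; the helper names are hypothetical and not part of sos.

```python
# Text printed by the fake kubectl from step 1 (one echo per line).
fake_output = "nonempty line\n   \nanother nonempty\n"


def parse_unpatched(output):
    # Expression from sos 3.5/3.6 as quoted in the Description.
    return [n.split()[0] for n in output.splitlines()[1:] if n]


def parse_patched(output):
    # Variant that also skips whitespace-only lines (hypothetical helper).
    return [n.split()[0] for n in output.splitlines()[1:] if n.split()]


# The first line is dropped as the header; the second line is three spaces,
# which is truthy but splits into an empty list.
try:
    parse_unpatched(fake_output)
    print("unpatched: no exception")
except IndexError:
    print("unpatched: IndexError, as in the Description")

print("patched:", parse_patched(fake_output))
```

If the unpatched expression did not fail on this input, the steps above would not distinguish a fixed package from an unfixed one.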

Comment 11 Pavel Moravec 2019-03-18 18:13:54 UTC
posted to upstream

Comment 15 errata-xmlrpc 2019-08-06 13:15:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2295