Bug 1513071
Summary: | sos: postgresql plugin unable to collect dump from postgresql95 db | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | David Necpal <dnecpal> |
Component: | sos | Assignee: | Pavel Moravec <pmoravec> |
Status: | CLOSED DUPLICATE | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | agk, bmr, bugs, danken, dnecpal, dougsland, gavin, jbelka, lsvaty, mkalinin, plambri, pmoravec, sbonazzo, sbradley, ylavi |
Target Milestone: | rc | Keywords: | AutomationBlocker, TestBlocker |
Target Release: | 7.5 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | sos-3.5-2.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-12-19 16:44:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1494420 | ||
Bug Blocks: |
Description
David Necpal
2017-11-14 17:23:30 UTC
David is using sos-3.5:

    # rpm -qa | grep sos
    sos-3.5-1.el7.noarch

Pavel, any hint here?

ovirt-log-collector-analyzer requires a dump of the database for its analysis.

I don't understand what you are asking for. Does "it's blocked from SOS site." mean that sos gets stuck during its execution? Or that it doesn't provide data (what data is expected)?

Knowing a bit of context (ovirt-log-collector sets the PGPASSWORD environment variable and calls sosreport, per the additional info), could you run this from the command line:

    export PGPASSWORD=provideThePasswordHere
    sosreport --batch -o postgresql -k postgresql.dbname=engine -k postgresql.dbhost=localhost -k postgresql.dbport=5432 -k postgresql.username=engine -vv --debug

and provide the output? E.g. on a system with no postgres / pg_dump:

    Setting up plugins ...
    [plugin:postgresql] could not run 'pg_dump -U engine -h localhost -p 5432 -w -f /tmp/tmpadBrfZ/sos_pgdump.tar -F t engine': command not found
    [plugin:postgresql] Unable to execute pg_dump. Error()
    Running plugins. Please wait ...

(try to understand why the pg_dump command fails)

RHV 4.2 uses pg 9.5, which is located in /opt/rh/rh-postgresql95/root/

    ++ find /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport -name '*postgresql-sosreport*tar.xz'
    + TAR_WITH_POSTGRES_SOSREPORT=/tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    + '[' -z /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz ']'
    ++ dirname /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    + tar -C /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data -Jxf /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    ++ tar tf /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    ++ grep sos_pgdump.tar
    + PG_DUMP_TAR=
    + :
    + '[' '!' '' ']'
    + echo 'Unable to detect sos_pgdump.tar from sosreport, aborting..'
    /usr/share/ovirt-log-collector/analyzer/unpackAndPrepareDump.sh: line 86: /dev/stderr: Permission denied
    bash-4.2$ ll /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    bash-4.2$ tar -tf /tmp/tmp.JgJ9IqgbY6/unpacked_sosreport/sosreport-LogCollector-20171115133448/log-collector-data/postgresql-sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448.tar.xz
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_commands/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_logs/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_logs/sos.log
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_logs/ui.log
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_reports/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_reports/sos.html
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/sos_reports/sos.txt
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/var/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/var/lib/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/var/lib/pgsql/
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/var/lib/pgsql/initdb_rh-postgresql95-postgresql.log
    sosreport-10-37-138-147.rhev.lab.eng.brq.redhat.com-20171115133448/version.txt

There is no database dump.

The sosreport postgresql plugin uses:

    option_list = [
        ('pghome', 'PostgreSQL server home directory.', '', '/var/lib/pgsql'),
        ('username', 'username for pg_dump', '', 'postgres'),
        ('password', 'password for pg_dump' + password_warn_text, '', False),
        ('dbname', 'database name to dump for pg_dump', '', ''),
        ('dbhost', 'database hostname/IP (do not use unix socket)', '', ''),
        ('dbport', 'database server port number', '', '5432')
    ]

pghome is incorrect in the plugin:

    2017-11-15 18:27:47,801 DEBUG: [plugin:postgresql] could not run 'pg_dump -U engine -h localhost -p 5432 -w -f /tmp/tmp_FYJKL/sos_pgdump.tar -F t engine': command not found
    2017-11-15 18:27:47,801 INFO: [plugin:postgresql] Unable to execute pg_dump. Error()
    2017-11-15 18:27:47,869 DEBUG: [plugin:postgresql] could not run 'scl enable rh-postgresql95 -- pg_dump -U engine -h localhost -p 5432 -w -f /tmp/tmpXMhHFL/sos_scl_pgdump.tar -F t engine': command not found
    2017-11-15 18:27:47,870 INFO: [plugin:postgresql] Unable to execute pg_dump. Error()

(In reply to David Necpal from comment #3)
> RHV 4.2 uses pg 9.5 which is located /opt/rh/rh-postgresql95/root/

.. and this is the problem. On one side:

    # which pg_dump
    /opt/rh/rh-postgresql95/root/usr/bin/pg_dump
    #

which is due to:

    # echo $PATH
    /opt/rh/rh-postgresql95/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
    #

while sosreport knows only this PATH (when I print os.environ["PATH"] there):

    /usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin

I tried to identify where you add /opt/rh/rh-postgresql95/root/usr/bin to the PATH, but no .bash_profile, .bashrc or /etc/bashrc sets it. That's why sosreport doesn't see that path.

Where is the PATH set that way?
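The PATH mismatch described above can be reproduced with Python's `shutil.which`, which searches an explicit PATH string for an executable in much the same way a shell command lookup does. The sketch below also includes a hypothetical `scl_prefixed_path` helper that illustrates the "precede every PATH entry by /opt/rh/${scl}/root" idea (option 2 in comment #10). The helper name, the ordering (all SCL-rooted entries before the original ones) and the temporary directory standing in for /opt/rh are assumptions for illustration, not sos code.

```python
import os
import shutil
import stat
import tempfile


def scl_prefixed_path(scl, path, scl_root="/opt/rh"):
    """Hypothetical helper: prepend <scl_root>/<scl>/root<entry> for each PATH entry.

    SCL-rooted entries come first, followed by the original entries, so a
    binary shipped by the collection is found before a system one.
    """
    prefix = os.path.join(scl_root, scl, "root")
    entries = [entry for entry in path.split(os.pathsep) if entry]
    return os.pathsep.join([prefix + entry for entry in entries] + entries)


# The PATH sosreport reported in this bug (no SCL directory in it).
SOS_PATH = "/usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin"

# Throwaway tree standing in for /opt/rh/rh-postgresql95/root/usr/bin/pg_dump,
# so the sketch also runs on machines without the SCL installed.
fake_opt_rh = tempfile.mkdtemp()
scl_bin = os.path.join(fake_opt_rh, "rh-postgresql95", "root", "usr", "bin")
os.makedirs(scl_bin)
stub = os.path.join(scl_bin, "pg_dump")
with open(stub, "w") as f:
    f.write("#!/bin/sh\nexit 0\n")
os.chmod(stub, os.stat(stub).st_mode | stat.S_IXUSR)

# With sosreport's PATH the SCL copy is invisible: on a host without a
# system pg_dump this prints None, i.e. "command not found".
print(shutil.which("pg_dump", path=SOS_PATH))

# With the SCL-rooted entries prepended, the SCL copy is found first.
scl_path = scl_prefixed_path("rh-postgresql95", SOS_PATH, scl_root=fake_opt_rh)
print(shutil.which("pg_dump", path=scl_path) == stub)  # True

# With the default root, the transformation looks like this:
print(scl_prefixed_path("rh-postgresql95", "/usr/sbin:/usr/bin"))
# /opt/rh/rh-postgresql95/root/usr/sbin:/opt/rh/rh-postgresql95/root/usr/bin:/usr/sbin:/usr/bin
```

Prepending rather than appending is a deliberate choice in this sketch: it makes the collection's pg_dump win even on hosts that also have the base postgresql package, which is the situation the thread worries about when it mentions /usr/bin/pg_dump from postgresql-9.2.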
In a final solution of this problem, we shouldn't set it inside sosreport or call pg_dump with its full path there, since the location of pg_dump depends on the package that provides it, e.g.:

    # which pg_dump
    /usr/bin/pg_dump
    # rpm -qf /usr/bin/pg_dump
    postgresql-9.2.23-1.el7_4.x86_64
    #

(In reply to Pavel Moravec from comment #5)
> Where the PATH is set that way? In a final solution of this problem, we
> shouldnt set it inside sosreport or call pg_dump with full path there, since
> the location of pg_dump depends on package it provides it

Shouldn't 'scl enable rh-postgresql95 -- ...' do the trick of finding the proper path?

(In reply to Douglas Schilling Landgraf from comment #6)
> Shouldn't 'scl enable rh-postgresql95 -- ...' make the trick to find the
> proper path?

I don't know scl, so I can't answer whether it *should*, but it seems it does not:

    [root@system-ge-3 ~]# export PATH=/usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin
    [root@system-ge-3 ~]# scl enable rh-postgresql95 -- pg_dump -U engine -h localhost -p 5432 -w -f /tmp/sos_scl_pgdump.tar -F t engine
    /var/tmp/sclBPKS4Q: line 8: pg_dump: command not found
    [root@system-ge-3 ~]#

(Just a hint: the problem is that the PATH environment variable is "lost" when sosreport is called from bash. Run sosreport inside / from strace to identify what execve does.)

Hi Pavel,

(In reply to Pavel Moravec from comment #8)
> (just a hint: the problem is PATH env.variable is "lost" during calling
> sosreport from bash.
> Run sosreport inside / from strace to identify what
> execve does)

I have added a debug message to the postgresql plugin of sos that displays the PATH environment variable (before executing any command); here is the output:

    # sosreport --batch -o postgresql -k postgresql.dbname=engine -k postgresql.dbhost=localhost -k postgresql.dbport=5432 -k postgresql.username=engine
    <snip>
    /usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin
    pg_dump -U engine -h localhost -p 5432 -w -f /tmp/tmpQLD6KX/sos_pgdump.tar -F t engine
    /usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin
    scl enable rh-postgresql95 -- pg_dump -U engine -h localhost -p 5432 -w -f /tmp/tmpHCmGo8/sos_scl_pgdump.tar -F t engine
    </snip>

The last one should be the postgresql95 one. From the logs:

    2017-11-21 09:15:01,735 INFO: [plugin:postgresql] command 'scl' not found in / - re-trying in host root
    2017-11-21 09:15:01,750 INFO: [plugin:postgresql] Unable to execute pg_dump. Error()

    # whereis scl
    scl: /usr/bin/scl

The PATH printed before the command includes /usr/bin. Outside of sosreport, Python's subprocess module finds the scl command:

    import subprocess
    subprocess.call(['scl', 'enable', 'rh-postgresql95', '--', 'pg_dump', '-U', 'engine', '-h', 'localhost', '-p', '5432', '-w', '-f', '/tmp/tmpHCmGo8/sos_scl_pgdump.tar', '-F', 't', 'engine'], shell=False)

Another point: it looks like you guys will rename the output tar from sos_pgdump.tar to sos_scl_pgdump.tar. Is there a reason to change the name? It can break tools that look for this file.

The problem isn't an inability to run the "scl" command, but to run the "pg_dump" command. Try adding

    self.add_cmd_output('scl enable rh-postgresql95 -- date')

to the plugin and you will notice that the output of the "date" command is collected.

I must take back my statement that the problem is outside sos, to some extent.
sosreport expects all binaries to be under a few specific paths, see:

https://github.com/sosreport/sos/commit/e0d132e60d4be5fcb4a2aeaf8bdb6042f5b42ab5#diff-dd7b3a988df47c4410a00a2a6caeac66R51

where sos overrides the PATH environment variable.

There are 3 possible solutions here:

1) Make a symlink like:

    ln -s /opt/rh/rh-postgresql95/root/usr/bin/pg_dump /usr/bin/pg_dump

(I don't like this much, since it is a one-shot fix just for pg_dump and it prevents installing "normal" postgres in parallel.)

2) Teach sosreport to use SCL paths for commands. I.e. before running a command from SCL collection $scl, precede every path in PATH by /opt/rh/${scl}/root/; then e.g. binaries under /opt/rh/${scl}/root/usr/bin/ will be treated as if they were under /usr/bin/.

This is imho the long-term solution, but a risky fix in the short term.

3) Do a postgres-specific, limited version of the above by using the three lines below (now commented out) in postgresql.py:

    if scl in self.scls_matched:
        # newpath = os.environ["PATH"] + os.pathsep + '/opt/rh/rh-postgresql95/root/usr/bin'
        # self.policy().PATH = newpath
        # self.policy().set_exec_path()

I would vote for 3) in RHEL 7.4.z and 7.5, and 2) in the longer term (RHEL 7.6). Do you agree?

> Other point, looks like you guys will rename the output tar from sos_pgdump.tar to sos_scl_pgdump.tar, is there a reason to change the name? It can break tools that look for this file.

I wanted to distinguish the filenames, to see which one was generated by the regular postgres command and which one by the SCL-based command. If this is treated as a bug and not a feature, I can easily change it. Should I?

For the permanent solution, I need a generic answer to the question "how do I properly run a binary deployed from an SCL package?"

- A binary B installed "regularly" (not via SCL) should be accessible with PATH = /sbin:/bin:/usr/sbin:/usr/bin:/root/bin
- For a binary B from some particular SCL, where should the binary be placed? What command should be used to invoke it?

I.e., in our case, pg_dump is "normally" under /usr/bin/pg_dump. When using the rh-postgresql95 SCL, it is under /opt/rh/rh-postgresql95/root/usr/bin. Does that mean I can add /opt/rh/${scl}/root/usr/bin to PATH for each and every SCL on the system? Or do I also need to precede each and every command by:

    scl enable ${scl} -- ..

The PATH should already be set for this when using scl; however, the PATH is being cut down by scl (a guess). We saw that the pg95 SCL path is not included when sosreport runs pg_dump:

    # PATH=/usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin scl enable rh-postgresql95 -- pg_dump -U engine -h localhost -p 5432 -w -f /tmp/sos_scl_pgdump.tar -F t engine
    /var/tmp/scliff0lx: line 8: pg_dump: command not found

    # scl enable rh-postgresql95 -- sosreport --batch -o postgresql -k postgresql.dbname=engine -k postgresql.dbhost=localhost -k postgresql.dbport=5432 -k postgresql.username=engine
    Setting up archive ...
    Setting up plugins ...
    DEBUG: PATH=/usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin, pg_dump_command=pg_dump     <<<<< missing scl pg 9.5 path

(In reply to Pavel Moravec from comment #10)
> I would vote for 3) in RHEL 7.4.z and 7.5 and 2) in longer perspective
> (RHEL7.6). Do you agree?

Works for me. I have tested your suggestion and it works.

> I wanted to distunguish the filenames to see which one was generated by
> regular postgres cmd and which one via SCL-based cmd. If this is dealt as a
> bug and not a feature, I can easily change it. Should I?

I would vote to keep the original name, so that tools that have been working with sos don't start breaking.

Reassigning to RHEL&sos.

Upstream PR for a generic solution attached(*) and currently applied on the testing machine (to revert it, copy postgresql.py.orig and __init__.py.orig back to their proper locations).

(*) The only concern I have is [1]: how to easily detect the provider of a given SCL?

[1] https://github.com/sosreport/sos/pull/1154#issuecomment-346936562

I am currently planning this for RHEL 7.5 as a follow-up of bz1515113.
If you need a fix in 7.4.z, then please:
1) raise the rhel-7.4.z? flag
2) provide a business justification
3) add OtherQE and provide a QE contact who can verify the fix once available

(In reply to Pavel Moravec from comment #17)
> I am currently planning this to RHEL7.5 as followup of bz1515113.
>
> If you need a fix in 7.4.z, then please:
> 1) raise rhel-7.4.z? flag

Done

> 2) provide business justification

We are analyzing RHV users' cases/reports from 3.x to 4.x that have a sosreport attached, to identify possible or real issues, especially before/after an upgrade. Currently, the sos tool cannot collect a PostgreSQL dump from postgresql95, which affects this analysis.

> 3) add OtherQE and provide a QE contact who can verify the fix once available

I believe the reporter of this bug, David Necpal, is the QE.

The tar archive is really damaged. Reproducible via this script:

    import codecs
    import os
    from subprocess import Popen, PIPE, STDOUT

    dest = '/tmp/pg_dump-mimicked.tar'
    cmd_env = os.environ
    cmd_env['LC_ALL'] = 'C'
    expanded_args = ['timeout', '300s', 'scl', 'enable', 'rh-postgresql95', 'PATH=/opt/rh/rh-postgresql95/usr/local/sbin:/opt/rh/rh-postgresql95/usr/local/bin:/opt/rh/rh-postgresql95/root/bin:/opt/rh/rh-postgresql95/usr/bin:/opt/rh/rh-postgresql95/usr/sbin:/usr/sbin:/usr/bin:/root/bin:/usr/local/bin:/usr/local/sbin pg_dump -U engine -h localhost -p 5432 -w -F t engine']
    p = Popen(expanded_args, shell=False, stdout=PIPE, stderr=PIPE, bufsize=-1, env=cmd_env, close_fds=True)
    stdout, stderr = p.communicate()
    content = stdout#.decode('utf-8', 'ignore')
    f = codecs.open(dest, 'w', encoding='utf-8')
    if isinstance(content, bytes):
        content = content.decode('utf8', 'ignore')
    f.write(content)
    f.close()

called as:

    # export PGPASSWORD=foobar
    # python mimic_pgdump_via_python.py
    # tar tvf /tmp/pg_dump-mimicked.tar
    -rw------- postgres/postgres 1649767 2017-12-19 15:16 toc.dat
    tar: Skipping to next header
    tar: Exiting with failure status due to previous errors
    #

While I don't see anything wrong in
that script (so far)..

TL;DR: sosreport strips some bytes from the binary tarball, because it concludes they are not UTF-8 characters.

Smaller reproducer, a script that should copy aaa.tar to bbb.tar:

    import codecs
    import os
    from subprocess import Popen, PIPE, STDOUT

    dest = 'bbb.tar'
    expanded_args = ['cat', 'aaa.tar']
    p = Popen(expanded_args, shell=False, stdout=PIPE, stderr=PIPE, bufsize=-1, env=os.environ, close_fds=True)
    stdout, stderr = p.communicate()
    content = stdout.decode('utf-8', 'ignore')
    f = codecs.open(dest, 'w', encoding='utf-8')
    if isinstance(content, bytes):
        content = content.decode('utf8', 'ignore')
    f.write(content)
    f.close()

Now run:

    scl enable rh-postgresql95 'pg_dump -U engine -h localhost -p 5432 -w -F t engine' > aaa.tar
    rm -f bbb.tar
    python mimic_pgdump_via_python.py
    tar tvf bbb.tar
    -rw------- postgres/postgres 1649767 2017-12-19 15:57 toc.dat
    tar: Skipping to next header
    tar: Exiting with failure status due to previous errors

hexdump -C shows that the Python script removes some bytes, e.g.:

    00000680 43 4c 00 01 00 00 00 00 a2 00 00 00 52 45 56 4f |CL..........REVO|

becomes:

    00000680 43 4c 00 01 00 00 00 00 00 00 00 52 45 56 4f 4b |CL.........REVOK|

(see the removed "a2" byte there), or:

    000008e0 00 00 00 99 08 00 00 00 00 00 00 00 00 04 00 00 |................|

becomes:

    000008e0 00 00 00 08 00 00 00 00 00 00 00 00 04 00 00 00 |................|

(missing "99").

The reason is that "stdout.decode('utf-8', 'ignore')" and similar calls drop these bytes because they fail to decode as UTF-8 characters :(

This problem is dealt with under bz1515113 as well, so I am closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1515113 ***
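The byte loss explained in the TL;DR comment can be shown with a few lines of Python 3: the 16 bytes below are copied from the first hexdump row, and decoding them as UTF-8 with errors ignored drops the lone 0xa2 byte, leaving the 15 bytes seen in the damaged copy. The sketch then shows a binary-safe alternative that keeps the captured output as bytes and writes it in "wb" mode. The helper `save_command_output` is hypothetical; it is not the change that was made in sos under bz1515113.

```python
import os
import subprocess
import tempfile

# 16 bytes from the first hexdump row above; 0xa2 on its own is not valid UTF-8.
raw = b"CL\x00\x01\x00\x00\x00\x00\xa2\x00\x00\x00REVO"

# What the reproducer does: decode with errors="ignore", then write the text back.
lossy = raw.decode("utf-8", "ignore").encode("utf-8")
print(len(raw), len(lossy))  # 16 15


def save_command_output(args, dest):
    """Hypothetical helper: run a command and store its stdout byte-for-byte."""
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    with open(dest, "wb") as f:  # binary mode: no decoding, so no bytes are dropped
        f.write(proc.stdout)
    return len(proc.stdout)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "aaa.tar")
    dest = os.path.join(tmp, "bbb.tar")
    with open(src, "wb") as f:
        f.write(raw)
    save_command_output(["cat", src], dest)  # same "cat aaa.tar" copy as the reproducer
    with open(dest, "rb") as f:
        copied = f.read()

print(copied == raw)  # True
```

The design point is simply that a pg_dump tar archive is binary data: as long as the output is never turned into text, no error handler ("ignore", "replace" or otherwise) gets a chance to alter it.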