Description of problem:
I didn't notice this one until it was deployed into production, unfortunately. For an existing task, if user1 runs "retrace-server-interact 980813160 crash", it creates the 'crash_cmd' file just fine. However, a second user running the same command gets this:

$ retrace-server-interact 980813160 crash
Traceback (most recent call last):
  File "/usr/bin/retrace-server-interact", line 119, in <module>
    task.set_crash_cmd(' '.join(crash_cmd))
  File "/usr/lib/python2.6/site-packages/retrace/retrace.py", line 2173, in set_crash_cmd
    self.set(RetraceTask.CRASH_CMD_FILE, data)
  File "/usr/lib/python2.6/site-packages/retrace/retrace.py", line 1607, in set
    with open(self._get_file_path(key), mode) as f:
IOError: [Errno 13] Permission denied: '/cores/retrace/tasks/980813160/crash_cmd'

$ ls -lh /cores/retrace/tasks/980813160/crash_cmd
-rw-rw-r--. 1 user1 user1 5 Feb 29 09:29 /cores/retrace/tasks/980813160/crash_cmd

This happens because the task directory does not have the setgid bit set, and for some reason every time we run 'crash' we try to rewrite that crash_cmd file. I think this is due to kernel version detection getting called from that code path.

Version-Release number of selected component (if applicable):
retrace-server-1.14-2.el6.noarch

How reproducible:
Every time on an existing vmcore.

Steps to Reproduce:
1. User1 runs retrace-server-interact <taskid> crash - this creates the 'crash_cmd' file with the user's group permissions
2. User2 tries to run retrace-server-interact <taskid> crash - this backtraces due to being unable to write the 'crash_cmd' file

Actual results:
user2 unable to open the vmcore

Expected results:
user2 able to open the vmcore

Additional info:
This is due to a couple of things:
1. There is no setgid bit set for each 'taskid' directory
2. The prepare_debuginfo may change the 'crash_cmd', so we need to write it after running prepare_debuginfo
3. Every time any user calls 'retrace-server-interact <taskid> crash' we call prepare_debuginfo.

Here is the affected code.

/usr/bin/retrace-server-interact:
110
111        hostarch = os.uname()[4]
112        if hostarch in ["i486", "i586", "i686"]:
113            hostarch = "i386"
114
115        if args.action == "crash":
116            if not task.use_mock(kernelver):
117                crash_cmd = task.get_crash_cmd().split()
118                vmlinux = prepare_debuginfo(vmcore, kernelver=kernelver, crash_cmd=crash_cmd)
119-->             task.set_crash_cmd(' '.join(crash_cmd))
120                if task.has_crashrc():
121                    cmdline = task.get_crash_cmd().split() + ["-i", task.get_crashrc_path(), vmcore, vmlinux]
122                else:
123                    cmdline = task.get_crash_cmd().split() + [vmcore, vmlinux]
124            else:
125                cfgdir = os.path.join(CONFIG["SaveDir"], "%d-kernel" % task.get_taskid())
126                vmlinux = prepare_debuginfo(vmcore, chroot=cfgdir, kernelver=kernelver, crash_cmd=task.get_crash_cmd().split())
127                if task.has_crashrc():
128                    cmdline = ["/usr/bin/mock", "--configdir", cfgdir,

Will need to look at possible solutions. One thought brought up was making the task directory setgid AuthGroup. However, I'll have to look closer at this, since it's not obvious to me why we'd want to write that file again if nothing has changed, or even why we're calling prepare_debuginfo unconditionally (maybe we need the vmlinux path?). It seems like detection may be broken here, since we should call prepare_debuginfo only once, at kernel version detection time. Once we have the kernelver we shouldn't call it again, at least I don't think we should.
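To illustrate the setgid idea mentioned above, here is a minimal sketch, not the retrace-server code. The function names are hypothetical, and it assumes Linux semantics, where a file created inside a setgid directory takes the directory's group instead of the creating user's primary group. Whether the new file is group-writable still depends on the creating user's umask.

```python
import os
import stat
import tempfile


def enable_group_inheritance(directory):
    """Add the setgid bit (and group rwx) to a directory.

    Hypothetical helper: with setgid on the directory, files created in it
    inherit the directory's group rather than the creator's primary group.
    Returns the resulting permission bits.
    """
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_ISGID | stat.S_IRWXG)
    return stat.S_IMODE(os.stat(directory).st_mode)


def create_file(directory, name, data):
    """Create a file in the directory and return its stat result."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(data)
    return os.stat(path)


if __name__ == "__main__":
    # A temporary directory stands in for a task directory here.
    task_dir = tempfile.mkdtemp(prefix="task-")
    print(oct(enable_group_inheritance(task_dir)))
    st = create_file(task_dir, "crash_cmd", "crash")
    print(st.st_gid == os.stat(task_dir).st_gid)
```

This only addresses group ownership of newly created files; files that already exist with the wrong group would still need a one-time chgrp.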
The more I think about it the more I think having any file inside a 'tasks' directory with user != retrace or group != AuthGroup is probably wrong. Maybe the only way to do that is the setgid / setuid bits on the directory.
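A rough way to check for such files is sketched below. This is a hypothetical audit helper, not part of retrace-server; the expected uid and gid are passed in by the caller (for a real installation they would presumably come from the 'retrace' user and the configured AuthGroup, for example via pwd.getpwnam and grp.getgrnam).

```python
import os


def find_foreign_files(root, expected_uid, expected_gid):
    """Return (path, uid, gid) for every entry under root whose owner or
    group differs from the expected values.

    Hypothetical audit helper; lstat is used so symlinks are reported
    themselves rather than followed.
    """
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_uid != expected_uid or st.st_gid != expected_gid:
                mismatches.append((path, st.st_uid, st.st_gid))
    return mismatches


if __name__ == "__main__":
    import sys
    # Usage: script.py <tasks-dir> <uid> <gid>
    for path, uid, gid in find_foreign_files(sys.argv[1], int(sys.argv[2]), int(sys.argv[3])):
        print("%s uid=%d gid=%d" % (path, uid, gid))
```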
This is actually a much bigger problem than I thought. We have a regression in the retrace-server-worker --restart command, as well as the problem with the crash_cmd file. Now we get the following backtrace if we try a retrace-server-worker --restart. Something changed in the management of the retrace_log file in the latest retrace-server-1.14-2.el6.noarch.

$ retrace-server-worker --restart 933538188
Traceback (most recent call last):
  File "/usr/bin/retrace-server-worker", line 28, in <module>
    worker.begin_logging()
  File "/usr/lib/python2.6/site-packages/retrace/retrace_worker.py", line 17, in begin_logging
    self.task._get_file_path(RetraceTask.LOG_FILE))
  File "/usr/lib64/python2.6/logging/__init__.py", line 827, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib64/python2.6/logging/__init__.py", line 846, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/retrace/tasks/933538188/retrace_log'
Created attachment 1133142 [details] WIP patch for resolving crash_cmd issue - setuid / setgid to retrace:CONFIG["AuthGroup"]
Comment on attachment 1133142 [details] WIP patch for resolving crash_cmd issue - setuid / setgid to retrace:CONFIG["AuthGroup"] Experimental patch to resolve this issue
The previously posted patch obviously doesn't work, but if we could do that it would be ideal. I confirmed that setgid on the directory solves the original problem with crash_cmd on existing tasks, though I'm not sure that's the greatest solution. We still have the issue with the worker restart backtrace, so we may need to do significant refactoring and add a helper to write files in the retrace directory; not sure.
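One possible shape for such a helper is sketched here. It is an assumption for illustration, not the posted patch: the name write_shared_file and its parameters are hypothetical. It writes the file, then makes a best-effort attempt to set the owner and group and to add group read/write, reporting rather than raising if that step is not permitted.

```python
import os
import stat


def write_shared_file(path, data, uid=-1, gid=-1):
    """Write data to path, then try to fix ownership and group permissions.

    Hypothetical helper. uid/gid of -1 leave that id unchanged, as with
    os.chown. Returns True if ownership and mode were adjusted, False if
    the operating system refused (for example, an unprivileged user
    calling chown on a file they do not own).
    """
    with open(path, "w") as f:
        f.write(data)
    try:
        os.chown(path, uid, gid)
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_IRGRP | stat.S_IWGRP)
    except OSError:
        # Best effort only: the data was written, so do not fail the caller.
        return False
    return True
```

Treating the ownership step as best effort keeps an unprivileged interactive user from hitting a traceback, at the cost of possibly leaving a file with the wrong group behind.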
Actually the retrace-server-worker --restart bug looks like it will require a much different / unrelated fix so I've renamed this one back to its original purpose and opened a different bug for the --restart: https://bugzilla.redhat.com/show_bug.cgi?id=1314897
Note that this bug _only_ affects existing tasks that were created prior to retrace-server-1.14-2, because the crash_cmd file was not created during the retrace of the vmcore. It is thus really a problem only during upgrades, and when someone re-creates the crash_cmd file with improper group ownership.

That said, there are a lot of problems with ownership of files in 'SaveDir' if a user runs interactive mode commands. For example, I just found it's even possible to take down the 'manager' interface if a user issues --restart of a task and some file required for the 'manager' page is not readable by 'other'. To prevent this we probably need setgid on the SaveDir or some serious refactoring. This is not a new issue, though; it has only become critical because of the 'crash_cmd' file, which doesn't exist on old tasks.

We could defer this bug and, just prior to the upgrade / outage, do the following:
1. Manually create a default "crash_cmd" file for all tasks, with proper retrace:CONFIG["AuthGroup"] perms
2. Upgrade to retrace-server-1.14-2
3. Check and make sure existing tasks can be loaded by multiple users
4. Check to make sure new tasks can be loaded by multiple users
5. Check that retrace-server-worker --restart works

Unfortunately, step #5 will create the same scenario that exists in this bug: crash_cmd will have user1's perms and user2 won't be able to access the vmcore. This is because --restart removes 'crash_cmd' along with other task state files, and because we rewrite the crash_cmd file every time we issue retrace-server-interact.

I may try to fix this last part; that is, I don't think we should be rewriting crash_cmd every time we issue retrace-server-interact. However, fixing this means dealing with where we obtain the vmlinux file, i.e. prepare_debuginfo, which is a whole other mess...
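The "don't rewrite crash_cmd every time" idea could look roughly like the sketch below. This is only an illustration under assumptions: set_if_changed is a hypothetical name and is not how RetraceTask.set is implemented. The point is that a user who can read the file but not write it would no longer fail as long as the content is unchanged.

```python
def set_if_changed(path, data):
    """Write data to path only when it differs from the current content.

    Hypothetical helper. Returns True if the file was (re)written and
    False if the existing content already matched, in which case the
    file is never opened for writing.
    """
    try:
        with open(path) as f:
            if f.read() == data:
                return False
    except (IOError, OSError):
        # Missing or unreadable file: fall through and try to write it.
        pass
    with open(path, "w") as f:
        f.write(data)
    return True
```

A write would still be attempted, and could still fail on permissions, whenever prepare_debuginfo actually changes the command.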
Created attachment 1133267 [details] patch to fix this bug
Created attachment 1133268 [details] patch to fix this bug
Posted pull request on https://github.com/abrt/retrace-server
Created attachment 1133306 [details] updated patch to fix this bug - make chown/chgrp best effort
Created attachment 1133307 [details] updated patch to fix this bug - make chown/chgrp best effort
*** Bug 1195786 has been marked as a duplicate of this bug. ***
retrace-server-1.15-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc
retrace-server-1.15-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337
retrace-server-1.15-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97
retrace-server-1.15-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59
retrace-server-1.15-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337
retrace-server-1.15-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59
retrace-server-1.15-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97
retrace-server-1.15-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc
retrace-server-1.15-2.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97
retrace-server-1.15-2.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59
retrace-server-1.15-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337
retrace-server-1.15-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc
retrace-server-1.15-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-3fd6c52337
retrace-server-1.15-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-88382c37cc
retrace-server-1.15-2.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-1082131b97
retrace-server-1.15-2.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-505f2e5c59
retrace-server-1.15-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
retrace-server-1.15-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
retrace-server-1.15-2.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.
retrace-server-1.16-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a
retrace-server-1.16-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a
I verified this has been fixed in retrace-server-1.15-1.el6
retrace-server-1.16-2.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a
retrace-server-1.16-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-81eb6f786a