Bug 677620

Summary: Blocked on QEMU - Add PID to process name resolution to vdsm logging
Product: Red Hat Enterprise Linux 5 Reporter: Dan Yasny <dyasny>
Component: vdsm22 Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE QA Contact: yeylon <yeylon>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.6 CC: abaron, acathrow, bazulay, danken, iheim, srevivo, syeghiay, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-04-10 07:26:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 677614, 735716    
Bug Blocks:    

Description Dan Yasny 2011-02-15 11:42:36 UTC
Description of problem:
BZ#677614 asks qemu to print, on its stdout, the PID of the process that sent it a SIGKILL/SIGTERM.
Since a bare PID in the logs is useless, we also need to resolve the PID to an actual process name (probably filtering out vdsm itself).

The following patch has been tested and works with the current vdsm22-4.5-63.9.el5 and kvm .224 patched with the qemu patch in BZ#677614. It resolved a case where a customer's custom script killed VMs and vdsm logged those as "# Got shutdown request".

# diff -u vm.py.orig vm.py
--- vm.py.orig	2011-02-15 19:20:58.000000000 +0200
+++ vm.py	2011-02-15 19:20:45.000000000 +0200
@@ -1362,7 +1362,18 @@
         except:
             self.log.error(traceback.format_exc())
         try:
-            self.log.debug('qemu stdouterr: ' + file(self.dumpFile).read())
+            qemu_log = file(self.dumpFile).read()
+            qlog_arr = qemu_log.split()
+            for item in range(len(qlog_arr)):
+                if qlog_arr[item] == 'signal':
+                    try:
+                        pid_num = qlog_arr[item + 4]
+                        pid_cmd = '\nPID ' + pid_num + ' resolves to : ' + file('/proc/%s/cmdline' % pid_num).read() + '\n'
+                        qemu_log += pid_cmd
+                    except:
+                        pass
+            self.log.debug('qemu stdouterr: ' + qemu_log)
+                
         except:
             pass
         t = threading.Thread(target=self._prepostVmScript, args=['post_vm'])
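
A standalone sketch of the same resolution step, outside of vm.py, for anyone who wants to try it without patching vdsm. It assumes Linux procfs, where the arguments in /proc/<pid>/cmdline are separated by NUL bytes. The names read_cmdline, annotate and the proc_root parameter are made up for this illustration and are not part of vdsm.

```python
import os


def read_cmdline(pid, proc_root="/proc"):
    """Return the command line of `pid` as one string, or None if unavailable.

    `proc_root` is a hypothetical parameter, only here so the function can be
    pointed at a fake directory tree when testing.
    """
    path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        # The sender may already have exited, or we may lack permission.
        return None
    # Arguments are NUL-separated; join them with spaces for the log.
    args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
    return " ".join(args) or None


def annotate(qemu_log, pids, resolve=read_cmdline):
    """Return qemu_log followed by one 'PID n resolves to' line per PID."""
    notes = []
    for pid in pids:
        cmd = resolve(pid)
        notes.append("PID %s resolves to: %s" % (pid, cmd or "<unknown>"))
    # The original log text is included exactly once.
    return qemu_log + "".join("\n" + note for note in notes)
```

Note that the sender can exit before the log is read, in which case the PID cannot be resolved any more (or, worse, has been reused by another process), so the result is a hint rather than proof.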



Version-Release number of selected component (if applicable):
vdsm22-4.5-63.9.el5


How reproducible:
always

Steps to Reproduce:
1. Install a host with the patched qemu and vm.py
2. kill -s 15 $qemu_pid
3. Watch the log
  
Actual results:
The log shows "# Got shutdown request", as if this were a proper shutdown of the VM.

Expected results:
The log shows the PID of the process that sent the signal, resolved to the actual command, for troubleshooting.

It would also make sense to filter out all instances where the PID resolves to vdsm itself (an actual destroy call, which is OK), so that the logs keep only the external commands that interfered with qemu.
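
A minimal sketch of that filtering, under the assumption that vdsm signals qemu from its own process (if it goes through a helper or child process, the comparison would need to cover those PIDs too). The function name external_senders is hypothetical.

```python
import os


def external_senders(sender_pids, own_pid=None):
    """Drop our own PID from a list of signal-sender PIDs.

    Assumption: a destroy call issued by vdsm shows up with vdsm's own PID
    as the sender, so anything left over came from outside.
    """
    if own_pid is None:
        own_pid = os.getpid()
    return [pid for pid in sender_pids if pid != own_pid]
```

For example, external_senders([1200, 4242], own_pid=1200) leaves only 4242 to be resolved and logged.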

Additional info:

Comment 1 Dan Kenigsberg 2011-02-17 09:55:54 UTC
Please attach what qemu's new stderr looks like, and in which qemu version it will appear (please add a dependency on the relevant qemu bug).

Comment 2 Dan Yasny 2011-02-17 10:14:39 UTC
(In reply to comment #1)
> Please attach what qemu's new stderr looks like, and in which qemu version it
> will appear (please add a dependency on the relevant qemu bug).

The QEMU BZ is https://bugzilla.redhat.com/show_bug.cgi?id=677614

The output currently looks like this:
  printf("Got signal %d from pid %d\n", info->si_signo, info->si_pid);   
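
Given that format string, the signal number and sender PID could be pulled out of the log with a regular expression instead of counting whitespace-separated tokens. This is only a sketch against the current output; as noted, the final wording in qemu may differ, and the names SIGNAL_RE and parse_signal_lines are made up here.

```python
import re

# Mirrors the printf format "Got signal %d from pid %d" quoted above.
SIGNAL_RE = re.compile(r"Got signal (\d+) from pid (\d+)")


def parse_signal_lines(text):
    """Return a list of (signal_number, sender_pid) tuples found in text."""
    return [(int(sig), int(pid)) for sig, pid in SIGNAL_RE.findall(text)]
```

Unlike the token-offset approach, this does not misfire on other log lines that merely contain the word "signal".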

How it will look when the BZ is finally resolved, I don't know yet; I will have to ask Dor or Gleb. The same goes for the qemu version this will appear in.

adding dep.

Comment 3 Dan Kenigsberg 2011-04-10 07:26:24 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=677614#c13 this should be solved by having an auditctl rule that logs all PIDs on creation. Such a rule would enable the customer to identify the guilty process.

*** This bug has been marked as a duplicate of bug 677614 ***