Bug 1151903

Summary: virt tools --csv output should have a meaningful output
Product: [Community] Virtualization Tools
Reporter: Lingfei Kong <lkong>
Component: libguestfs
Assignee: Richard W.M. Jones <rjones>
Status: NEW
QA Contact: Virtualization Bugs <virt-bugs>
Severity: low
Priority: medium
Version: unspecified
CC: leiwang, linl, ptoscano, wshi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Clone Of:
: 1151905
Type: Bug
Bug Blocks: 1151905, 1288337, 1301891    

Description Lingfei Kong 2014-10-13 02:42:19 UTC
Description of problem:
Some virt tools have a --csv option, but when it is combined with other options the output may be meaningless. Take virt-diff for example. When I run virt-diff with the --csv option, the output is not a usable CSV file: it contains many lines with the same content, and the number of fields differs from line to line. Such output is hard to read and hard to import into a database. If virt-diff provides a --csv option, the option should be easy to use, and when it is combined with other options the output should still be meaningful. The --csv problem may also exist in other virt tools or in combination with other options, such as:
#virt-ls -lR -a rhel.img / --uids --csv |awk -F, '{print NF}' |sort
#virt-diff -a rhel6.6.img -A rhel.img --times --csv  |awk -F, '{print NF}'|sort


Version-Release number of selected component (if applicable):
libguestfs-1.27.61-1.1.el7


How reproducible:
100%


Steps to Reproduce:
1. Get a rhel guest image rhel6.6.img
2. #cp rhel6.6.img rhel.img
3. Boot the guest with rhel.img and shut it down
4. #virt-diff -a rhel6.6.img -A rhel.img --atime --csv
5. #virt-diff -a rhel6.6.img -A rhel.img --atime --csv  |awk -F, '{print NF}' |sort
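The field-count check in step 5 can also be sketched in Python. This is only an illustration, not part of libguestfs: the function name `field_count_histogram` is hypothetical, and the sample text is copied from the output quoted in this report. Unlike `awk -F,`, the `csv` module also copes with quoted fields that contain commas.

```python
import csv
import io
from collections import Counter

def field_count_histogram(csv_text):
    """Return a Counter mapping 'number of fields' to 'number of lines'."""
    reader = csv.reader(io.StringIO(csv_text))
    # Skip completely empty lines, which csv.reader yields as [].
    return Counter(len(row) for row in reader if row)

# Three lines taken from the virt-diff output quoted in this report.
sample = """\
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
"""

hist = field_count_histogram(sample)
for n_fields, n_lines in sorted(hist.items()):
    print(n_fields, n_lines)
```

More than one distinct field count in the result means the output cannot be loaded as a single rectangular table without extra processing.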


Actual results:
.....
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/flush
+,s,0666,0,/var/spool/postfix/public/flush
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,p,0622,0,/var/spool/postfix/public/pickup
+,p,0622,0,/var/spool/postfix/public/pickup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,p,0622,0,/var/spool/postfix/public/qmgr
+,p,0622,0,/var/spool/postfix/public/qmgr
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/showq
+,s,0666,0,/var/spool/postfix/public/showq
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec

#virt-diff -a rhel6.6.img -A rhel.img --atime --csv  |awk -F, '{print NF}' |sort
....
5
5
3
5
5
3
5
5
3
6
6
3
5
5
....


Expected results:
Ideally the CSV output would contain a single header on the first line instead of printing it many times, and would output every field even when the field has no value, i.e. each line should have the same number of fields.
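As a rough sketch of the requested shape (not of how virt-diff would implement it), the following Python post-processes the current output into a rectangular table: one header line, and every row padded to the same width. The column names in `HEADER`, the `extraN` names, and the symlink row in the sample are made up for illustration; the first two sample rows come from this report.

```python
import csv
import io

# Hypothetical column names; virt-diff does not print these.
HEADER = ["change", "type", "mode", "size", "path"]

def normalize(csv_text, header):
    """Drop '#' lines, emit one header, and pad all rows to equal width."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r and r[0] != "#"]
    width = max([len(header)] + [len(r) for r in rows])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Name any columns beyond the known ones extra1, extra2, ...
    extras = [f"extra{i}" for i in range(1, width - len(header) + 1)]
    writer.writerow(header + extras)
    for r in rows:
        writer.writerow(r + [""] * (width - len(r)))
    return out.getvalue()

sample_rows = """\
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,l,0777,4,/tmp/link,dest
"""

normalized = normalize(sample_rows, HEADER)
print(normalized, end="")
```

Padding at the end of the row is the simplest choice, but it only lines columns up when the optional fields are trailing ones; a real fix inside the tool would emit each optional field in a fixed position.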


Additional info:

Comment 1 Pino Toscano 2014-10-14 09:11:57 UTC
I agree, the CSV output of virt-diff is not optimal at all, as it is basically a comma-separated version of the normal output.

Besides the virt-diff options in use (like --extra-stats, etc), the number of fields in the lines whose first field is «-», «+», or «=» depends on:
- file type (symlinks have an additional field after the file name for the link target)
- xattrs (two fields for name and value are added at the end for each xattr)
While this is not a problem to parse, it makes the output harder to import directly without manually checking the fields.

Regarding the repeated lines (like «#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec»): they are not headers, but appear after two lines with the first field being «-» or «+», and they list all the attributes changed in the files mentioned in the two lines above. For example:

  -,s,0666,0,/var/spool/postfix/public/cleanup
  +,s,0666,0,/var/spool/postfix/public/cleanup
  #,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec

This means the content of the file /var/spool/postfix/public/cleanup did not change, but some of its attributes did, and they are: st_ino (inode number), st_atime_sec (access time), st_mtime_sec (modification time), and st_ctime_sec (inode status change time).
Something that could be changed here is to not output the «changed:» field in csv mode, since that is not useful (the first field being «#» already indicates that). 
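The record structure described above (a «-»/«+» pair followed by a «#» line) can be read back with a small parser. The sketch below is only an illustration under two assumptions: the path is the fifth field, which holds for the sample in this report but not necessarily when other options add columns, and the function name `changed_attributes` is hypothetical.

```python
import csv
import io

PATH_INDEX = 4  # assumption: path is the 5th field, as in the sample below

def changed_attributes(csv_text):
    """Map each path to the attribute names on the '#' line that follows it."""
    result = {}
    last_path = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        if row[0] in ("-", "+") and len(row) > PATH_INDEX:
            last_path = row[PATH_INDEX]
        elif row[0] == "#" and last_path is not None:
            attrs = row[1:]
            # Tolerate output with or without the «changed:» marker field.
            if attrs and attrs[0] == "changed:":
                attrs = attrs[1:]
            result[last_path] = attrs
            last_path = None
    return result

record = """\
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
"""

changes = changed_attributes(record)
print(changes)
```

Because the parser skips the «changed:» field only when it is present, it would keep working if that field were dropped from CSV mode.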

Ah, a small nitpick: the uid and gid fields are padded to 4 characters even in csv mode. I sent a small patch for it:
https://www.redhat.com/archives/libguestfs/2014-October/msg00105.html

Comment 5 Richard W.M. Jones 2017-02-16 14:54:56 UTC
This is a genuine bug but until RHEL customers complain I'm going
to deal with it as an upstream bug.