Bug 1151903 - virt tools --csv output should have a meaningful output
Summary: virt tools --csv output should have a meaningful output
Keywords:
Status: NEW
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1151905 1288337 1301891
Reported: 2014-10-13 02:42 UTC by Lingfei Kong
Modified: 2021-04-19 10:34 UTC
4 users

Fixed In Version:
Clone Of:
: 1151905
Environment:
Last Closed:
Embargoed:



Description Lingfei Kong 2014-10-13 02:42:19 UTC
Description of problem:
Some virt tools have a --csv option, but when it is combined with other options the output may be meaningless. Take virt-diff for example: when I run virt-diff with --csv, the output is not a proper CSV file. It contains many lines with identical content, and the number of fields differs from line to line. Such output is hard to read and hard to import into a database. If virt-diff provides a --csv option, its output should be easy to consume, and when --csv is combined with other options the output should still be meaningful. The same --csv problem may also exist in other virt tools or with other option combinations, for example:
#virt-ls -lR -a rhel.img / --uids --csv |awk -F, '{print NF}' |sort
#virt-diff -a rhel6.6.img -A rhel.img --times --csv  |awk -F, '{print NF}'|sort
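The `awk -F, '{print NF}'` pipelines above split on every comma, so they would also miscount any field that is quoted because it contains a comma (a file name, say). Below is a minimal quote-aware sketch in Python, assuming the tool quotes such fields in the usual RFC 4180 way; `field_count_histogram` and `is_rectangular` are hypothetical helper names, and the sample lines are copied from the output in this report.

```python
import csv
import io
from collections import Counter


def field_count_histogram(lines):
    """Map each field count to the number of CSV records that have it.

    csv.reader honours quoting, so a quoted field containing a comma
    counts as one field (awk -F, would split it). Blank lines are skipped.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if row:
            counts[len(row)] += 1
    return counts


def is_rectangular(lines):
    """True if every non-blank record has the same number of fields."""
    return len(field_count_histogram(lines)) <= 1


# A few lines copied from the "Actual results" output in this report.
SAMPLE = """\
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
"""

if __name__ == "__main__":
    hist = field_count_histogram(io.StringIO(SAMPLE))
    for nfields, nrecords in sorted(hist.items()):
        print(f"{nfields} fields: {nrecords} lines")
    print("rectangular:", is_rectangular(io.StringIO(SAMPLE)))
```

To check real output, pass `sys.stdin` instead of the sample, for instance by piping `virt-diff -a rhel6.6.img -A rhel.img --atime --csv` into a script built around these functions. A single entry in the histogram means every line has the same number of fields.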


Version-Release number of selected component (if applicable):
libguestfs-1.27.61-1.1.el7


How reproducible:
100%


Steps to Reproduce:
1. Get a RHEL guest image, rhel6.6.img
2. #cp rhel6.6.img rhel.img
3. Boot the guest from rhel.img, then shut it down
4. #virt-diff -a rhel6.6.img -A rhel.img --atime --csv
5. #virt-diff -a rhel6.6.img -A rhel.img --atime --csv  |awk -F, '{print NF}' |sort


Actual results:
.....
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/flush
+,s,0666,0,/var/spool/postfix/public/flush
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,p,0622,0,/var/spool/postfix/public/pickup
+,p,0622,0,/var/spool/postfix/public/pickup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,p,0622,0,/var/spool/postfix/public/qmgr
+,p,0622,0,/var/spool/postfix/public/qmgr
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,s,0666,0,/var/spool/postfix/public/showq
+,s,0666,0,/var/spool/postfix/public/showq
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec

#virt-diff -a rhel6.6.img -A rhel.img --atime --csv  |awk -F, '{print NF}' |sort
....
5
5
3
5
5
3
5
5
3
6
6
3
5
5
....


Expected results:
Ideally, the CSV output should contain a single header in the first line instead of repeating it many times, and it should output every field even when the field has no value, i.e. each line should have the same number of fields.
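One way to get the shape asked for here is to fix the column list up front and write an empty string for any field a record lacks. The sketch below does that with Python's `csv.DictWriter` and its `restval` parameter; the column names in `COLUMNS` and the helper `write_rectangular_csv` are hypothetical and are not virt-diff's actual layout.

```python
import csv
import io

# Hypothetical column layout, chosen for illustration only.
COLUMNS = ["op", "type", "mode", "size", "path", "link_target"]


def write_rectangular_csv(records, out):
    """Write one header row, then one row per record (a dict).

    Any column missing from a record is written as an empty field
    (restval=""), so every row has len(COLUMNS) fields.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)


if __name__ == "__main__":
    buf = io.StringIO()
    write_rectangular_csv(
        [
            {"op": "-", "type": "s", "mode": "0666", "size": "0",
             "path": "/var/spool/postfix/public/cleanup"},
            {"op": "+", "type": "s", "mode": "0666", "size": "0",
             "path": "/var/spool/postfix/public/cleanup"},
        ],
        buf,
    )
    print(buf.getvalue(), end="")
```

Because the header is written once and missing values become empty fields, every row parses to `len(COLUMNS)` fields; an optional value such as a symlink target becomes a column that is simply empty for other file types.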


Additional info:

Comment 1 Pino Toscano 2014-10-14 09:11:57 UTC
I agree, the CSV output of virt-diff is not optimal at all, as it is basically a comma-separated version of the normal output.

For lines whose first field is «-», «+», or «=», the number of fields varies not only with the virt-diff options (like --extra-stats, etc.) but also with:
- file type (symlinks have an additional field after the file name for the link target)
- xattrs (two fields for name and value are added at the end for each xattr)
While this is not a problem to parse, it makes the output harder to import directly without manually checking the fields.
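Given the layout described above, a consumer can still recover a fixed structure by splitting each «-», «+», or «=» line into its leading fields, an optional link target, and trailing xattr name/value pairs. This Python sketch assumes default options (no --uids, --times, --extra-stats, and so on), so that the path is the fifth field as in the sample output, and assumes symlinks are marked with type «l»; `parse_entry` and `Entry` are hypothetical names.

```python
import csv
from dataclasses import dataclass, field

# Assumption: with default options a "-", "+" or "=" line starts with
# these five fields, as in the sample output in this report. Options
# such as --uids or --times add fields and would change this count.
BASE_FIELDS = 5  # op, type, mode, size, path


@dataclass
class Entry:
    op: str
    ftype: str
    mode: str
    size: str
    path: str
    link_target: str = ""
    xattrs: list = field(default_factory=list)  # (name, value) pairs


def parse_entry(row):
    """Split one parsed CSV row into fixed fields, link target, xattrs."""
    op, ftype, mode, size, path = row[:BASE_FIELDS]
    rest = row[BASE_FIELDS:]
    target = ""
    if ftype == "l" and rest:  # assumed: symlinks carry one extra field
        target, rest = rest[0], rest[1:]
    if len(rest) % 2:
        raise ValueError(f"odd number of xattr fields in {row!r}")
    pairs = list(zip(rest[0::2], rest[1::2]))
    return Entry(op, ftype, mode, size, path, target, pairs)


if __name__ == "__main__":
    line = "-,s,0666,0,/var/spool/postfix/public/cleanup"
    print(parse_entry(next(csv.reader([line]))))
```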

Regarding the repeated lines (like «#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec»): they are not headers, but appear after two lines with the first field being «-» or «+», and they list all the attributes changed in the files mentioned in the two lines above. For example:

  -,s,0666,0,/var/spool/postfix/public/cleanup
  +,s,0666,0,/var/spool/postfix/public/cleanup
  #,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec

This means the content of the file /var/spool/postfix/public/cleanup did not change, but some of its attributes did, namely: st_ino (inode number), st_atime_sec (access time), st_mtime_sec (modification time), and st_ctime_sec (inode status change time).
One thing that could be changed here is to omit the «changed:» field in CSV mode, since it is redundant (a first field of «#» already indicates that).
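The pairing described above can be sketched as follows: each «#» line is attached to the «+» line just before it, and the redundant «changed:» token is dropped. This assumes default options, so the path is the fifth field (index 4); the helper name `group_changes` is hypothetical.

```python
import csv

PATH_INDEX = 4  # assumed: path is the 5th field with default options

# Lines copied from the "Actual results" output in this report.
SAMPLE = """\
-,s,0666,0,/var/spool/postfix/public/cleanup
+,s,0666,0,/var/spool/postfix/public/cleanup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
-,p,0622,0,/var/spool/postfix/public/pickup
+,p,0622,0,/var/spool/postfix/public/pickup
#,changed:,st_ino,st_atime_sec,st_mtime_sec,st_ctime_sec
"""


def group_changes(rows):
    """Yield (path, changed_attributes) for each "#" line.

    A "#" line is only paired with an immediately preceding "+" line,
    so lone "-" or "+" lines (files removed or added) are ignored here.
    """
    prev = None
    for row in rows:
        if not row:
            continue
        if row[0] == "#" and prev is not None and prev[0] == "+":
            attrs = row[1:]
            if attrs[:1] == ["changed:"]:
                attrs = attrs[1:]  # drop the redundant marker
            yield prev[PATH_INDEX], attrs
        prev = row


if __name__ == "__main__":
    for path, attrs in group_changes(csv.reader(SAMPLE.splitlines())):
        print(path, " ".join(attrs))
```

Emitting one such record per file, rather than three lines, would also give the single-header, fixed-width shape requested in the description.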

Ah, a small nitpick: the uid and gid fields are padded to 4 characters even in CSV mode; I sent a small patch for it:
https://www.redhat.com/archives/libguestfs/2014-October/msg00105.html

Comment 5 Richard W.M. Jones 2017-02-16 14:54:56 UTC
This is a genuine bug, but until RHEL customers complain I'm going
to deal with it as an upstream bug.

