Here are results from running the benchmark with various kernels, while keeping xen at xen-3.0.3-96.el5:
host kernel   guest kernel   files/sec   app overhead
-----------------------------------------------------
166xen        (not virt)         139.1         163551
166xen        166xen              31.6          89855
166xen        155xen              49.7          83991
-----------------------------------------------------
162xen        (not virt)         137.1         163551
162xen        166xen              31.8          91714
162xen        155xen              34.1          86513
-----------------------------------------------------
155xen        (not virt)         113.3         175644
155xen        166xen              32.0         131550
155xen        155xen              48.8          84895
All numbers are from "fs_mark -d test -d test2 -s 51200 -n 4096", run with 1 VCPU in the virtualized cases and 4 VCPUs on bare metal. The numbers are reasonably, though not perfectly, reproducible, and they are quite odd, so I'd call them inconclusive at best. At most they hint that 166xen is not faster than 155xen. :-/ They also suggest, more or less, that the host kernel is not relevant.
Important for anyone who wants to reproduce this, e.g. under KVM: if you reuse the same VM with different kernel versions, remember to drop caches (echo 3 > /proc/sys/vm/drop_caches) on the guest _and especially on the host_.
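A minimal sketch of the cache-dropping step described above, wrapped in a helper function. The DROP_CACHES_PATH override is my own addition (purely for illustration/testing; the real target is /proc/sys/vm/drop_caches, which requires root):

```shell
#!/bin/sh
# Sketch: flush and drop kernel caches before each benchmark pass.
# Run this on the guest AND, especially, on the host.
drop_all_caches() {
    # Flush dirty pages first so drop_caches can actually free them.
    sync
    # Writing 3 drops pagecache, dentries and inodes.
    # DROP_CACHES_PATH is a hypothetical override for illustration only;
    # in real use it defaults to the actual sysctl file (needs root).
    echo 3 > "${DROP_CACHES_PATH:-/proc/sys/vm/drop_caches}"
}
```

Usage: call `drop_all_caches` as root right before each fs_mark run, on both host and guest, so a warm cache from a previous kernel's run doesn't skew the results.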
What kind of storage are you using? If this is on a local SAS-class drive, the lower numbers are definitely more realistic. Might the ~138 files/sec be an indication that the write barrier code is not working properly?