While working on improving x86-64 memset, I noticed that movnti is called without sfence:

11:
    movnti %r8,(%rcx)
    movnti %r8,0x8(%rcx)
    movnti %r8,0x10(%rcx)
    movnti %r8,0x18(%rcx)
    movnti %r8,0x20(%rcx)
    movnti %r8,0x28(%rcx)
    movnti %r8,0x30(%rcx)
    movnti %r8,0x38(%rcx)
    add $0x40,%rcx
    dec %rax
    jne 11b
    jmp 4b

It should be

11:
    movnti %r8,(%rcx)
    movnti %r8,0x8(%rcx)
    movnti %r8,0x10(%rcx)
    movnti %r8,0x18(%rcx)
    movnti %r8,0x20(%rcx)
    movnti %r8,0x28(%rcx)
    movnti %r8,0x30(%rcx)
    movnti %r8,0x38(%rcx)
    add $0x40,%rcx
    dec %rax
    jne 11b
    sfence
    jmp 4b
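For readers less familiar with the assembly, the same pattern can be sketched in C with compiler intrinsics. This is only an illustration, not the memset code the report is about: `fill_nt` is a hypothetical helper name, and the sketch assumes a compiler that provides `<emmintrin.h>` on x86-64 (it falls back to ordinary stores elsewhere). Non-temporal stores such as movnti are weakly ordered, so the sketch issues one sfence after the loop, before returning to the caller.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si64 (movnti) and _mm_sfence */
#endif

/* Hypothetical helper for illustration only: writes `pattern` into
 * `blocks` consecutive 64-byte blocks starting at `dst`. */
static void fill_nt(long long *dst, long long pattern, size_t blocks)
{
    assert(dst != NULL);

    for (size_t i = 0; i < blocks; i++, dst += 8) {
        /* Eight 8-byte stores per iteration, like the unrolled loop. */
        for (int j = 0; j < 8; j++) {
#if defined(__x86_64__)
            _mm_stream_si64(dst + j, pattern);  /* non-temporal store */
#else
            dst[j] = pattern;                   /* plain store fallback */
#endif
        }
    }

#if defined(__x86_64__)
    /* Order the non-temporal stores before any later stores. */
    _mm_sfence();
#endif
}
```

A single fence after the loop matches the placement in the proposed fix: it is paid once per call rather than once per 64-byte block.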
Created attachment 250381
A patch to add sfence after movnti
Added upstream.
And fixed in rawhide too.