5
class Base 
{
public:
    virtual void fnc(size_t nm) 
    {
        // do some work here
    }

    void process()
    {
        for(size_t i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};

Can and will the C++ compiler optimize calls to the fnc function from the process function, considering it's going to be the same function every time it's invoked inside the loop? Or is it going to fetch the function address from the vtable every time the function is invoked?

  • 4
    The answer to this question will most likely depend on your compiler, compiler version and compilation flags. Optimizations are mostly at the discretion of the implementation, as long as defined behavior is not altered. – François Andrieux May 4 '17 at 18:12
  • 2
    In short, it is allowed to optimize it, if that's your concern. If it doesn't, feel free to send a bug report to your compiler provider – KABoissonneault May 4 '17 at 18:15
  • 2
    Rules that force optimization generally limit debuggability. C++ generally prefers leaving code generation "implementation defined", or "undefined behavior" when it makes sense – KABoissonneault May 4 '17 at 18:17
  • 1
    I just tried it on gcc 6.3.0 (Debian) and it does fetch the function pointer for every iteration. Interestingly, it did seem to have some optimization where it compared the fetched pointer to the address of Base::fnc and if it compared equal then it skipped the call. – Daniel Schepler May 4 '17 at 18:22
  • 2
    One thing to note is that if I remember correctly, fnc is technically allowed to change the dynamic type of *this through placement new, and therefore the compiler has to be conservative on devirtualization. However, I believe Clang provides an extension to make the compiler assume this never happens. Source: blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html – KABoissonneault May 4 '17 at 18:29
0

I checked an example on godbolt.org. The result is that no, none of the compilers optimizes that.

Here's the test source:

class Base 
{
public:
// made it pure virtual to decrease clutter
    virtual void fnc(int nm) =0;
    void process()
    {
        for(int i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};

void test(Base* b ) {
    return b->process();
}

And the generated asm:

test(Base*):
        push    rbp       ; setup function call 
        push    rbx
        mov     rbp, rdi  ; Base* rbp 
        xor     ebx, ebx  ; int ebx=0;
        sub     rsp, 8    ; advance stack ptr
.L2:
        mov     rax, QWORD PTR [rbp+0]  ; read 8 bytes from our Base*
                                        ; rax now contains vtable ptr 
        mov     esi, ebx                ; int parameter for fnc
        add     ebx, 1                  ; i++
        mov     rdi, rbp                ; (Base*) this parameter for fnc
        call    [QWORD PTR [rax]]       ; read vtable and call fnc
        cmp     ebx, 1000               ; back to the top of the loop 
        jne     .L2
        add     rsp, 8                  ; reset stack ptr and return
        pop     rbx
        pop     rbp
        ret

As you can see, it reads the vtable on each call. I guess it's because the compiler can't prove that you don't change the object's vtable pointer inside the function call (e.g. if you call placement new or something silly), so, technically, the target of the virtual call could change between iterations.

1

Usually, compilers are allowed to optimize anything that doesn't change the observable behavior of a program. There are some exceptions, such as eliding non-trivial copy constructors when returning from a function, but it can be assumed that any change in expected code generation that does not change the output or the side effects of a program in the C++ Abstract Machine can be done by the compiler.

So, can devirtualizing a function change the observable behavior? According to this article, yes.

Relevant passage:

[...] optimizer will have to assume that [virtual function] might change the vptr in passed object. [...]

void A::foo() { // virtual 
 static_assert(sizeof(A) == sizeof(Derived)); 
 new(this) Derived; 
}

This is call of placement new operator - it doesn’t allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? C++ Standard says yes.

Therefore, if the compiler does not have access to the definition of the virtual function (and does not know the concrete type of *this at compile time), then this optimization is risky.

According to the same article, you can pass -fstrict-vtable-pointers to Clang to allow this optimization, at the risk of making your code less compliant with the C++ Standard.
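To make the "concrete type known at compile time" case more tangible, here is a minimal sketch. The class Counter and the function run_on_known_type are hypothetical names introduced only for illustration; they do not come from the question or the article. The idea is that when the object is created locally and its class is marked final, the optimizer can see the dynamic type and may, after inlining process, call fnc directly. Whether a given compiler actually does so is still up to the implementation.

```cpp
#include <cstddef>

class Base {
public:
    virtual ~Base() = default;
    virtual void fnc(std::size_t nm) = 0;

    void process() {
        for (std::size_t i = 0; i < 1000; i++) {
            fnc(i);  // virtual call through `this`
        }
    }
};

// Hypothetical derived class: `final` states that nothing derives from it,
// so a call made through a Counter object cannot reach another override.
class Counter final : public Base {
public:
    void fnc(std::size_t nm) override { sum += nm; }
    std::size_t sum = 0;
};

// The object is created here, so its dynamic type is visible to the
// optimizer. This is an assumption-friendly setup, not a guarantee that
// the vtable lookup disappears.
std::size_t run_on_known_type() {
    Counter c;
    c.process();
    return c.sum;  // sum of 0..999
}
```

Comparing the assembly of run_on_known_type with that of a function taking a Base* from another translation unit is one way to check what your compiler does in each case.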

  • If someone can help me with the formatting, the help would be appreciated – KABoissonneault May 4 '17 at 18:43
  • 3
    I wonder if calling a method that replaced *this would be allowed inside another method that continues to use the this pointer. AFAIK if you access an object whose lifetime has ended, it is UB. You would have to use the pointer you got from placement new to access the new object. My reasoning would be that the implicit this pointer of the process method still points to the old object and therefore the compiler may assume that it is still valid, but I may be wrong about this. – PaulR May 8 '18 at 12:46
  • Actually the following paragraph in your source goes along the same line, so the compiler should be able to assume this is still valid on use. – PaulR May 8 '18 at 12:49
  • Look, this answer was from a whole year ago, so I might be missing something. But creating a new object in the storage of an old one of the same type usually "magically" makes the pointers to the old object point to the new one. Notable exceptions are types with const or reference data members... which I'm not sure if vptrs count as the former. Overall, /shrug – KABoissonneault May 9 '18 at 13:34
0

I wrote a very small implementation and compiled it using g++ --save-temps opt.cpp. This flag keeps the temporary preprocessed file, assembly file, and object file. I ran it once with the virtual keyword and once without. Here's the program.

class Base
{
public:
    virtual int fnc(int nm)
    {
        int i = 0;
        i += 3;
        return i;
    }

    void process()
    {
        int x = 9;
        for(int i = 0; i < 1000; i++)
        {
            x += i;
        }
    }
};

int main(int argc, char* argv[]) {
    Base b;

    return 0;
}

When I ran with the virtual keyword the resulting assembly on an x86_64 Linux box was:

.file  "opt.cpp"
    .section    .text._ZN4Base3fncEi,"axG",@progbits,_ZN4Base3fncEi,comdat
    .align 2
    .weak   _ZN4Base3fncEi
    .type   _ZN4Base3fncEi, @function
_ZN4Base3fncEi:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -24(%rbp)
    movl    %esi, -28(%rbp)
    movl    $0, -4(%rbp)
    addl    $3, -4(%rbp)
    movl    -4(%rbp), %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   _ZN4Base3fncEi, .-_ZN4Base3fncEi
    .text
    .globl  main
    .type   main, @function
main:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    leaq    16+_ZTV4Base(%rip), %rax
    movq    %rax, -16(%rbp)
    movl    $0, %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L5
    call    __stack_chk_fail@PLT
.L5:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   main, .-main
    .weak   _ZTV4Base
    .section    .data.rel.ro.local._ZTV4Base,"awG",@progbits,_ZTV4Base,comdat
    .align 8
    .type   _ZTV4Base, @object
    .size   _ZTV4Base, 24
_ZTV4Base:
    .quad   0
    .quad   _ZTI4Base
    .quad   _ZN4Base3fncEi
    .weak   _ZTI4Base
    .section    .data.rel.ro._ZTI4Base,"awG",@progbits,_ZTI4Base,comdat
    .align 8
    .type   _ZTI4Base, @object
    .size   _ZTI4Base, 16
_ZTI4Base:
    .quad   _ZTVN10__cxxabiv117__class_type_infoE+16
    .quad   _ZTS4Base
    .weak   _ZTS4Base
    .section    .rodata._ZTS4Base,"aG",@progbits,_ZTS4Base,comdat
    .type   _ZTS4Base, @object
    .size   _ZTS4Base, 6
_ZTS4Base:
    .string "4Base"
    .ident  "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
    .section    .note.GNU-stack,"",@progbits

Without the virtual keyword, the final assembly was:

    .file   "opt.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
    .section    .note.GNU-stack,"",@progbits

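Now, unlike the posted question, this example doesn't even call the virtual method (process never calls fnc, and main never calls process), yet the resulting assembly is much larger because the vtable and type info have to be emitted. I did not try compiling with optimizations, but give it a go.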
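For anyone who wants to repeat the experiment closer to the original question, here is a hedged variant of the test program in which process really does call fnc and the result is used, so the loop cannot simply be discarded. The function run and the changed body of fnc are assumptions made for this sketch, not part of the program above. Compiling it with something like g++ -O2 -S opt.cpp and reading the loop in run should show whether the vtable is read on every iteration.

```cpp
class Base {
public:
    virtual ~Base() = default;

    // Depends on the argument so each call contributes to the result.
    virtual int fnc(int nm) { return nm + 3; }

    int process() {
        int x = 9;
        for (int i = 0; i < 1000; i++) {
            x += fnc(i);  // the virtual call under test
        }
        return x;
    }
};

// Hypothetical entry point: taking Base& means the dynamic type is not
// known inside this function unless the compiler inlines it into a caller
// that can see the concrete object.
int run(Base& b) {
    return b.process();
}
```

Returning the accumulated value from process is a deliberate choice: without an observable result, an optimizing compiler is free to drop the whole loop, which would tell you nothing about the virtual dispatch.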
