Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization



On 6/23/07, Oleg Verych <olecom@xxxxxxxxxxxxxx> wrote:
> Why not just show the actual objdump output of the code (maybe with
> different oxygen atoms used in gcc), rather than *talking* about
> optimization and standards, hm?

Here is the objdump output of the two object files.
As you can see, the old one uses 0x38 bytes of stack space while the
new one uses 0x28 bytes, and the object code is shorter as well (two
fewer instructions; retq moves from offset 0x3d9 to 0x3ce).
I think all these benefits come from gcc's __builtin_memset optimization
of the initializer, as opposed to the explicit call to memset.

$ objdump -d /tmp/init.orig.o|grep -A23 -nw '<paging_init>'
525:0000000000000395 <paging_init>:
526- 395: 48 83 ec 38 sub $0x38,%rsp
527- 399: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
528- 39e: fc cld
529- 39f: 31 c0 xor %eax,%eax
530- 3a1: 48 89 d7 mov %rdx,%rdi
531- 3a4: ab stos %eax,%es:(%rdi)
532- 3a5: ab stos %eax,%es:(%rdi)
533- 3a6: ab stos %eax,%es:(%rdi)
534- 3a7: ab stos %eax,%es:(%rdi)
535- 3a8: ab stos %eax,%es:(%rdi)
536- 3a9: 48 89 7c 24 08 mov %rdi,0x8(%rsp)
537- 3ae: ab stos %eax,%es:(%rdi)
538- 3af: 48 c7 44 24 10 00 10 movq $0x1000,0x10(%rsp)
539- 3b6: 00 00
540- 3b8: 48 c7 44 24 18 00 00 movq $0x100000,0x18(%rsp)
541- 3bf: 10 00
542- 3c1: 48 8b 05 00 00 00 00 mov 0(%rip),%rax # 3c8 <paging_init+0x33>
543- 3c8: 48 89 44 24 20 mov %rax,0x20(%rsp)
544- 3cd: 48 89 d7 mov %rdx,%rdi
545- 3d0: e8 00 00 00 00 callq 3d5 <paging_init+0x40>
546- 3d5: 48 83 c4 38 add $0x38,%rsp
547- 3d9: c3 retq
548-
$ objdump -d /tmp/init.new.o|grep -A23 -nw '<paging_init>'
525:0000000000000395 <paging_init>:
526- 395: 48 83 ec 28 sub $0x28,%rsp
527- 399: 48 89 e7 mov %rsp,%rdi
528- 39c: fc cld
529- 39d: 31 c0 xor %eax,%eax
530- 39f: ab stos %eax,%es:(%rdi)
531- 3a0: ab stos %eax,%es:(%rdi)
532- 3a1: ab stos %eax,%es:(%rdi)
533- 3a2: ab stos %eax,%es:(%rdi)
534- 3a3: ab stos %eax,%es:(%rdi)
535- 3a4: ab stos %eax,%es:(%rdi)
536- 3a5: 48 c7 04 24 00 10 00 movq $0x1000,(%rsp)
537- 3ac: 00
538- 3ad: 48 c7 44 24 08 00 00 movq $0x100000,0x8(%rsp)
539- 3b4: 10 00
540- 3b6: 48 8b 05 00 00 00 00 mov 0(%rip),%rax # 3bd <paging_init+0x28>
541- 3bd: 48 89 44 24 10 mov %rax,0x10(%rsp)
542- 3c2: 48 89 e7 mov %rsp,%rdi
543- 3c5: e8 00 00 00 00 callq 3ca <paging_init+0x35>
544- 3ca: 48 83 c4 28 add $0x28,%rsp
545- 3ce: c3 retq
546-
547-00000000000003cf <alloc_low_page>:
548- 3cf: 41 56 push %r14
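
The patch itself is not quoted in this message, so the following is only a minimal sketch of the kind of change the two listings above compare: clearing an automatic array with memset() versus zeroing it in its initializer. The names fill_with_memset, fill_with_initializer, check_same_result and NR_SLOTS are hypothetical and are not taken from arch/x86_64; only the constants 0x1000 and 0x100000 come from the disassembly.

```c
#include <assert.h>
#include <string.h>

#define NR_SLOTS 4	/* hypothetical array length, for illustration only */

/* Old style: leave the array uninitialized, then clear it with memset(). */
static void fill_with_memset(unsigned long out[NR_SLOTS], unsigned long last)
{
	unsigned long slots[NR_SLOTS];

	memset(slots, 0, sizeof(slots));
	slots[0] = 0x1000;
	slots[1] = 0x100000;
	slots[2] = last;
	memcpy(out, slots, sizeof(slots));
}

/* New style: the initializer zeroes every element at the declaration. */
static void fill_with_initializer(unsigned long out[NR_SLOTS], unsigned long last)
{
	unsigned long slots[NR_SLOTS] = { 0 };

	slots[0] = 0x1000;
	slots[1] = 0x100000;
	slots[2] = last;
	memcpy(out, slots, sizeof(slots));
}

/* Self-check: both styles must produce identical array contents. */
static void check_same_result(unsigned long last)
{
	unsigned long a[NR_SLOTS], b[NR_SLOTS];

	fill_with_memset(a, last);
	fill_with_initializer(b, last);
	assert(memcmp(a, b, sizeof(a)) == 0);
	assert(a[3] == 0);	/* the slot that is never assigned stays zero */
}
```

The sketch uses "= { 0 }" because it is standard C; the empty "= {}" form discussed in this thread is accepted by gcc as an extension. Whether one form yields smaller code than the other depends on the compiler version and flags, so comparing the output of gcc -O2 -S for both functions is the way to confirm it for a given build.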



> I bet that will be a key to success. And if you are interested in such
> optimizations, why not grep the whole source tree for this kind of
> thing? I'm not sure one function in arch/x86_64 is the only such
> ``unoptimized'' one. And after doing that maybe you will see that the
> "{}" initializer can be applied not only to integer values (you
> initialized a *long int* array with an *int*, btw), but to structs
> and others.

Yes, with the '{}' initializer, gcc fills the object's memory with zeros.
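
As a small illustration of the point about structs, here is a sketch assuming a made-up struct (struct zone_info and struct_is_zeroed are hypothetical names, not kernel code). It shows that a brace initializer leaves every member of an automatic struct zero or NULL without any memset() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mixing integer and pointer members. */
struct zone_info {
	unsigned long start;
	unsigned long pages;
	int flags;
	const char *name;
};

/* Returns 1 when every member of a brace-initialized struct is zero/NULL. */
static int struct_is_zeroed(void)
{
	struct zone_info z = { 0 };	/* remaining members are zeroed implicitly */

	return z.start == 0 && z.pages == 0 && z.flags == 0 && z.name == NULL;
}
```

One caveat worth keeping in mind: the initializer guarantees the value of each member, but the C standard does not clearly promise that padding bytes between members are cleared, whereas memset() clears every byte. That can matter when a struct is copied out as raw memory.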

As for other places that could be optimized: I see this trivial patch
as a first step. I would like to hear people's comments on it; if this
optimization tests out correctly, it can serve as an example and I'll
try the others.


> Ahh, one more thing about _optimizing_ your time, i.e. not wasting it.
>
> Add to the CC list the people who have already replied to your patch.
> Otherwise you are showing disrespect for them and hiding from further
> discussion.

Thank you, I know that, and I have already subscribed to the linux kernel
mailing list (linux-kernel@xxxxxxxxxxxxxxx) so that I won't miss any
further discussion about it.


> I think you do not, but Linux development does not have an automatic
> system for patch tracking, so you are on your own with your text editor
> and e-mail client on this. Please take care of your time.

What about that?
Do you mean something like git by "an automatic system"?


> --
> frenzy
> -o--=O`C
> #oo'L O
> <___=E M



--
Denis Cheng
Linux Application Developer


