Re: Help about L1 data misses collected by Performance Counters
From: Anton Ertl (anton_at_mips.complang.tuwien.ac.at)
Date: 11/25/04
Date: Thu, 25 Nov 2004 10:54:03 GMT
guanglei li <lglyahoo-misc@yahoo.com.cn> writes:
>Hi all,
> I am using PAPI (Performance API, an API for collecting CPU-related
> data from the hardware performance counters, a set of registers
> in the CPU) to collect data about cache activity on a PowerPC 750,
> but some of the cache data is really puzzling, for example the L1
> data misses. The PPC 750 user manual describes this event as "Number
> of L1 data cache misses. Does not include cache ops". I wrote two C
> programs and measured their L1 load misses. For comparison, I also
> ran the same programs on an Intel Celeron (Coppermine) and used PAPI to
> get the L1 data misses. I put the platform in parentheses right after
> each value to indicate which platform it was measured on:
>
>
>Program 1:
> 1 #define SIZ 8096000
> ....
> 2 register i,b,j;
> 3 char *buffer;
> 4 buffer = (char *)malloc(SIZ);
> 5
> 6 for(j=0;j<SIZ;j++)
> 7 buffer[j]=0x03;
> 8
> 9 //begin counting using PAPI_start_counters
>10 for(j=0;j<20;j++) {
>11 for(i=0;i<SIZ;i++) {
>12 b=buffer[i];
>13 }
>14 }
>15 //end counting using PAPI_stop_counters
>16
>17 return b;
...
> I don't know the reason. Why do the various Line 6-8 variants of
> Program 1 affect the L1 data misses so much? I would very much
> appreciate it if someone could help me with this.
If you don't initialize the array, all page table entries will point
to the same zero-filled page (it does not matter if you allocated the
array with malloc, or implicitly as uninitialized data). The caches
on the CPUs you are looking at are physically tagged, so the page
needs only one copy in the cache, and it will satisfy all the accesses
to the array (which have the same physical address). Therefore you
don't see many cache misses: basically just those for startup and
those for loading the zero-filled page (with 4K pages and 32-byte
lines, that's only 128 misses).
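As a quick check of the 128 figure: the zero-filled page is cached once, so the misses it causes are bounded by the number of cache lines in one page. A minimal sketch of that arithmetic in C follows; the function name `lines_per_page` is made up for illustration, and the 4096-byte page and 32-byte line are the values assumed in the paragraph above.

```c
#include <stddef.h>

/* Hypothetical helper: number of cache lines needed to hold one page.
 * Assumes line_bytes is nonzero and divides page_bytes evenly. */
size_t lines_per_page(size_t page_bytes, size_t line_bytes)
{
    return page_bytes / line_bytes;
}
```

With those values, `lines_per_page(4096, 32)` gives 128, regardless of how large the array is.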
Once you write to the array, the pages will be copied-on-write, and
you will get distinct physical pages (even if the content is the same;
google for mergemem for a remedy). Then the CPU needs to load the
physical memory into the L1 cache for the loads, and will flush out
other cache lines, so you see the behaviour you expected.
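For contrast, a rough estimate of what to expect once every page is distinct can be sketched the same way. This is only a back-of-the-envelope model, not a measurement: it assumes the array is much larger than the L1 data cache, so each pass reloads every line, and that nothing such as prefetching hides misses from the counter. The function name `est_streaming_misses` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical estimate: every pass over the array misses once per
 * cache line. Assumes the array is far larger than the L1 data cache
 * and that line_bytes is nonzero. */
unsigned long long est_streaming_misses(size_t array_bytes,
                                        size_t line_bytes,
                                        unsigned passes)
{
    /* Round up so a partially used last line still counts. */
    unsigned long long lines =
        ((unsigned long long)array_bytes + line_bytes - 1) / line_bytes;
    return lines * passes;
}
```

For Program 1 (SIZ of 8096000 bytes, 32-byte lines, 20 passes) this gives 253000 lines per pass and about 5.06 million misses in total, several orders of magnitude more than the 128 of the uninitialized case.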
Followups set to colds.
- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html