Re: Help about L1 data misses collected by Performance Counters

From: Anton Ertl (anton_at_mips.complang.tuwien.ac.at)
Date: 11/25/04

    Date: Thu, 25 Nov 2004 10:54:03 GMT
    
    

    guanglei li <lglyahoo-misc@yahoo.com.cn> writes:
    >Hi all,
    > I am now using PAPI (Performance API, an API for collecting CPU-related
    > data from the hardware performance counters, a set of registers in the
    > CPU) to collect data about cache activity on a PowerPC 750, but some of
    > the cache data seems really puzzling. For example, the L1 data misses.
    > The description in the PPC 750 user manual is "Number of L1 data cache
    > misses. Does not include cache ops". I wrote two C programs and measured
    > the L1 load misses of each. For comparison, I also ran the same programs
    > on an Intel Celeron (Coppermine) and used PAPI to get the L1 data misses.
    > I put the platform in parentheses right after each value to indicate
    > which platform the value was measured on:
    >
    >
    >Program 1:
    > 1 #define SIZ 8096000
    > ....
    > 2 register i,b,j;
    > 3 char *buffer;
    > 4 buffer = (char *)malloc(SIZ);
    > 5
    > 6 for(j=0;j<SIZ;j++)
    > 7 buffer[j]=0x03;
    > 8
    > 9 //begin counting using PAPI_start_counters
    >10 for(j=0;j<20;j++) {
    >11 for(i=0;i<SIZ;i++) {
    >12 b=buffer[i];
    >13 }
    >14 }
    >15 //end counting using PAPI_stop_counters
    >16
    >17 return b;
    ...
    > I don't know the reason. Why do the various Line6-8 variants of
    > Program 1 affect the L1 data misses so much? I would very much
    > appreciate it if someone could help me with this.

    If you don't initialize the array, all page table entries will point
    to the same zero-filled page (it does not matter if you allocated the
    array with malloc, or implicitly as uninitialized data). The caches
    on the CPUs you are looking at are physically tagged, so the page
    needs only one copy in the cache, and it will satisfy all the accesses
    to the array (which all have the same physical address). Therefore you
    don't see many cache misses: basically only those for startup and those
    for loading the zero-filled page (with 4K pages and 32-byte lines,
    that's only 128 misses).
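
The arithmetic behind the 128 figure, and a rough estimate for the initialized case, can be sketched in C as follows. The function names `lines_per_page` and `streaming_misses` are made up for this illustration. The second function assumes that the array is much larger than the L1 data cache, so every pass has to refetch every line; real counter values may differ because of prefetching, other memory traffic, and counter semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines needed to hold one page. */
size_t lines_per_page(size_t page_size, size_t line_size)
{
    assert(line_size > 0);
    return page_size / line_size;
}

/* Rough estimate of load misses when every pass over the array has to
 * refetch every line (assumption: array much larger than the L1 cache,
 * one distinct physical page per virtual page). */
size_t streaming_misses(size_t array_bytes, size_t line_size, size_t passes)
{
    assert(line_size > 0);
    size_t lines = (array_bytes + line_size - 1) / line_size; /* round up */
    return lines * passes;
}
```

With the values from the post, `lines_per_page(4096, 32)` gives 128, while `streaming_misses(8096000, 32, 20)` gives 5060000 for the 20 passes over the SIZ-byte buffer once its pages are distinct.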

    Once you write to the array, the pages will be copied on write, and
    you will get distinct physical pages (even if the content is the same;
    google for mergemem for a remedy). Then the CPU needs to load the
    physical memory into the L1 cache for the loads, and will evict
    other cache lines, so you see the behaviour you expected.
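
A single write per page should be enough to trigger the copy; the value written does not matter. The following is a minimal sketch, not the original poster's code: `touch_pages` is a hypothetical helper, and the page size is supplied by the caller (on POSIX systems it could come from `sysconf(_SC_PAGESIZE)`). Whether untouched pages really share one zero-filled page depends on the kernel, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Write one byte at every page_size stride of buf, plus the last byte,
 * so that each page the buffer spans is written at least once and the
 * kernel has to back it with its own physical page.
 * Returns the number of stride writes, i.e. ceil(len / page_size). */
size_t touch_pages(volatile char *buf, size_t len, size_t page_size)
{
    size_t touched = 0;

    assert(buf != NULL);
    assert(page_size > 0);

    for (size_t off = 0; off < len; off += page_size) {
        buf[off] = 0x03;      /* same value the original program stores */
        touched++;
    }
    if (len > 0)
        buf[len - 1] = 0x03;  /* covers a trailing page of an unaligned buffer */
    return touched;
}
```

Calling `touch_pages(buffer, SIZ, page_size)` before the counted loop would stand in for lines 6-7 of Program 1 as far as page allocation is concerned, although the data read back then differs from a fully initialized buffer.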

    Followups set to colds.

    - anton

    -- 
    M. Anton Ertl                    Some things have to be seen to be believed
    anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
    http://www.complang.tuwien.ac.at/anton/home.html
    
