Re: [PATCH][1/7] perfctr-2.7.2 for 2.6.6-mm2: core

From: Mikael Pettersson
Date: 05/15/04

    Date:	Sat, 15 May 2004 16:39:10 +0200 (MEST)

    On Fri, 14 May 2004 15:40:05 +0100, Christoph Hellwig wrote:
    > And even without that it's a really horrible
    >interface. Any chance to do a proper fs-based interface ala oprofile?

    I object to "proper" and "fs-based". If that were universally
    true, then we'd have only 3 file-system-related system
    calls (open, read, write), and _everything_ else would
    be expressed using those on various special filesystems.

    Several 64-bit archs already have low-level performance
    counter interfaces in the kernel (not fs-based, I might add).
    This interface is no worse than those.

    The per-process counters API needs to express:
    - Open the perfctr state belonging to a given process
      (a real kernel process, not that process group thing)
      returning a handle (file descriptor).
      The fd is used with mmap() for low-overhead sampling,
      and as an access-rights token in the other operations.
    - Alternatively, create the state and return a handle.
    - Unlink the state represented by a handle from the
      process it is attached to.
    - Update the state's control data. This involves CPU-specific
      data, the signal you want on overflow interrupts, and a
      mask indicating which counter sums you want to preserve
      (otherwise they're reset).
    - Resume the counters. The counters are temporarily suspended
      when the overflow signal handler is invoked; the handler
      uses this operation to tell the kernel to resume the counters.
      The handler can of course choose not to do this.
    - Read the counter sums from the state. This is used when
      user-space can't or doesn't want to use the mmap()ed state.
      (Old P5 and WinChip processors must do this.)
    - Read the control data from the state. Used e.g. when
      the counters are accessed from a different process.
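To make the per-process operations concrete, here is a minimal user-space model of the semantics listed above, in particular the preserve mask on control updates and the suspend/resume handshake around the overflow signal. All names (`toy_vperfctr`, `toy_control`, `toy_resume`, etc.) are hypothetical and are not the perfctr API; in the real design this state lives in the kernel and is reached through the fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCOUNTERS 4 /* hypothetical number of counters per process */

/* Hypothetical in-memory model of one process's perfctr state. */
struct toy_vperfctr {
    uint64_t sum[NCOUNTERS];     /* accumulated counter sums */
    uint32_t control[NCOUNTERS]; /* opaque CPU-specific control words */
    int si_signo;                /* signal sent on overflow (0 = none) */
    bool suspended;              /* true while the overflow handler runs */
};

/* "Update the state's control data": install new control words and
 * reset every sum whose bit is clear in preserve_mask. */
void toy_control(struct toy_vperfctr *s, const uint32_t control[NCOUNTERS],
                 int si_signo, uint32_t preserve_mask)
{
    for (int i = 0; i < NCOUNTERS; i++) {
        s->control[i] = control[i];
        if (!(preserve_mask & (1u << i)))
            s->sum[i] = 0;
    }
    s->si_signo = si_signo;
    s->suspended = false;
}

/* Simulated hardware events; nothing is counted while suspended. */
void toy_count(struct toy_vperfctr *s, int i, uint64_t events)
{
    assert(i >= 0 && i < NCOUNTERS);
    if (!s->suspended)
        s->sum[i] += events;
}

/* Overflow interrupt: suspend the counters, then (in a real kernel)
 * deliver s->si_signo to the process. */
void toy_overflow(struct toy_vperfctr *s) { s->suspended = true; }

/* "Resume the counters": what the signal handler would request. */
void toy_resume(struct toy_vperfctr *s) { s->suspended = false; }

/* "Read the counter sums": the copy-out path for callers that cannot
 * use the mmap()ed state. */
void toy_read_sums(const struct toy_vperfctr *s, uint64_t out[NCOUNTERS])
{
    memcpy(out, s->sum, sizeof s->sum);
}
```

With a preserve mask of `0x5`, counters 0 and 2 keep their sums across a control update while counters 1 and 3 restart from zero; between `toy_overflow` and `toy_resume` no events are accumulated.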

    The global-mode counters API needs to express:
    - Stop all counters on all CPUs.
    - Write control data to a given CPU.
    - Start the counters, with a given sampling interval.
    - Read the control data and counter sums from a given CPU.
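The global-mode sequence (stop, write control data per CPU, start, read) can be modelled the same way. This is a hypothetical single-threaded sketch with invented function names; the rule that control data may only be written while the counters are stopped is an assumption made here to show the intended ordering, as is the unit of the sampling interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 2 /* hypothetical machine size */
#define NCTRS 2 /* hypothetical counters per CPU */

struct toy_cpu_state {
    uint32_t control[NCTRS]; /* control data written for this CPU */
    uint64_t sum[NCTRS];     /* counter sums accumulated on this CPU */
};

static struct toy_cpu_state cpus[NCPUS];
static bool running;
static unsigned sample_interval; /* unit (e.g. timer ticks) is assumed */

/* "Stop all counters on all CPUs." */
void toy_gstop(void) { running = false; }

/* "Write control data to a given CPU." Assumed legal only while
 * stopped; new control data also clears that CPU's sums. */
int toy_gwrite_control(int cpu, const uint32_t control[NCTRS])
{
    if (running || cpu < 0 || cpu >= NCPUS)
        return -1;
    for (int i = 0; i < NCTRS; i++) {
        cpus[cpu].control[i] = control[i];
        cpus[cpu].sum[i] = 0;
    }
    return 0;
}

/* "Start the counters, with a given sampling interval." */
int toy_gstart(unsigned interval)
{
    if (interval == 0)
        return -1;
    sample_interval = interval;
    running = true;
    return 0;
}

/* Stand-in for the kernel's periodic sampling of one CPU's counters. */
void toy_gsample(int cpu, const uint64_t delta[NCTRS])
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (!running)
        return;
    for (int i = 0; i < NCTRS; i++)
        cpus[cpu].sum[i] += delta[i];
}

/* "Read the control data and counter sums from a given CPU." */
int toy_gread(int cpu, struct toy_cpu_state *out)
{
    if (cpu < 0 || cpu >= NCPUS)
        return -1;
    *out = cpus[cpu];
    return 0;
}
```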

    The CPU-specific control data needs to express:
    - Which CPU-counter to map a given counter to. This is
      rarely a 1-to-1 mapping because processors tend to have
      asymmetric counters, and sometimes a large set in which
      only a few are to be used.
      User-space needs to be in charge of this mapping. This
      is NOT something the kernel should be doing behind the
      user's back, precisely because HW isn't symmetric.
      This mapping also affects the user-space sampling code.
    - The per-counter control data to associate with a given counter.
      The amount of this varies considerably.
    - The global control data shared by all counters.
      The amount of this varies considerably.
    - The initial and restart values for interrupt-on-overflow counters.
    - Whether to also sample the CPU's clock-like counter.
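A hypothetical C layout for this control data, plus a check of the kind user space might run before handing it to the kernel. The imaginary CPU below has eight physical counters, each able to count only a subset of events, which is exactly why the logical-to-physical mapping has to be chosen by user space. The field names, the event encoding in the low 5 bits, and the per-counter event tables are all assumptions for illustration, not any real CPU's layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOGICAL 4 /* counters the user may request (hypothetical) */
#define NR_PHYS 8     /* physical counters on this imaginary CPU */

static_assert(NR_PHYS <= 32, "the 'used' bitmask below is 32 bits wide");

struct toy_cpu_control {
    unsigned nctrs;                /* logical counters in use */
    unsigned pmc_map[MAX_LOGICAL]; /* logical -> physical, chosen by user space */
    uint32_t evntsel[MAX_LOGICAL]; /* per-counter control (event select etc.) */
    uint32_t global_ctl;           /* control shared by all counters */
    int32_t ireset[MAX_LOGICAL];   /* initial/restart value for overflow interrupts */
    bool tsc_on;                   /* also sample the clock-like counter */
};

/* Asymmetric hardware: bit e of phys_events[p] set means physical
 * counter p can count event e. */
static const uint32_t phys_events[NR_PHYS] = {
    0x0000000Fu, 0x0000000Fu, /* p0, p1: events 0-3 */
    0x000000F0u, 0x000000F0u, /* p2, p3: events 4-7 */
    0x0000FF00u, 0x0000FF00u, /* p4, p5: events 8-15 */
    0xFFFF0000u, 0xFFFF0000u, /* p6, p7: events 16-31 */
};

#define EVENT_OF(sel) ((sel) & 0x1Fu) /* hypothetical event-id field */

/* Reject mappings that name a nonexistent physical counter, reuse one,
 * or put an event on a counter that cannot count it. */
bool toy_control_valid(const struct toy_cpu_control *c)
{
    uint32_t used = 0; /* physical counters already claimed */
    if (c->nctrs > MAX_LOGICAL)
        return false;
    for (unsigned i = 0; i < c->nctrs; i++) {
        unsigned p = c->pmc_map[i];
        if (p >= NR_PHYS || (used & (1u << p)))
            return false;
        if (!(phys_events[p] & (1u << EVENT_OF(c->evntsel[i]))))
            return false;
        used |= 1u << p;
    }
    return true;
}
```

Note that the kernel can only validate such a mapping; picking a good one (e.g. which of several eligible counters to use) depends on which events the user wants together, which is the point made above.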

    Doing all of this via file-system operations would either
    require a big hierarchy of directories and files, or a smaller
    hierarchy plus parsers for written textual data.

    Passing structs works, except for binary-compatibility
    issues. (And since the structures must be updated to match
    newer CPUs, these issues are very real.)
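The compatibility problem with raw structs is easy to demonstrate. In this hypothetical example a field is appended for a newer CPU; the shared fields keep their offsets, but a kernel that insists on exactly its own `sizeof` rejects every binary compiled against the old layout. (Running 32-bit user space on a 64-bit kernel adds a further mismatch whenever a struct contains `long` or pointer members, since their sizes differ between the two ABIs.) The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control struct as compiled into an old binary. */
struct ctl_v1 {
    uint32_t evntsel[2];
    uint32_t ireset[2];
};

/* The same struct after a field is appended for a newer CPU. */
struct ctl_v2 {
    uint32_t evntsel[2];
    uint32_t ireset[2];
    uint32_t escr[2]; /* hypothetical new-CPU-only field */
};

/* The common prefix keeps its layout; only the total size changes. */
static_assert(offsetof(struct ctl_v1, ireset) == offsetof(struct ctl_v2, ireset),
              "shared fields must not move");

/* Naive kernel-side check: accept only the exact size the kernel was
 * built with. Returns 0 on success, -1 on mismatch. */
int naive_accept(size_t user_size)
{
    return user_size == sizeof(struct ctl_v2) ? 0 : -1;
}
```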

    >Haven't looked over much of the code yet, but the people who support
    >32bit userspace on 64bit architectures will probably kill you for
    >the multiplexer syscall.

    The previous ioctl()-based perfctr-2.6 version supports i386
    binaries on x86_64 kernels, as should this syscall() version.

    Key to this is the structure marshalling code which does
    several things:
    - allows the kernel to add fields (e.g. for new processors)
      without affecting older user-space code
    - allows user-space code to work on an older kernel whose
      structures have fewer fields (supports fewer processors),
      as long as user-space does CPU type detection and doesn't
      attempt to use e.g. P4-only fields on a P6 or K7
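One way to get both properties is a tag/value encoding in which only nonzero fields are emitted. The sketch below is a guess at the general technique, not the actual perfctr marshalling code; the tags and struct names are hypothetical. An older decoder rejects a tag it does not know (the caller asked for something this kernel cannot do), but user space that leaves the newer fields zero on an older CPU never emits those tags. Conversely, a newer decoder that knows more tags accepts an older stream unchanged, because fields missing from the stream simply stay zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field tags; a tag is never reused or renumbered. */
enum { TAG_EVNTSEL0 = 1, TAG_EVNTSEL1 = 2, TAG_ESCR0 = 3 /* added later */ };

struct ctl_new { uint32_t evntsel[2]; uint32_t escr0; }; /* newer user-space view */
struct ctl_old { uint32_t evntsel[2]; };                 /* older kernel view */

/* Emit a (tag, value) pair for each nonzero field.
 * Returns the number of words written, or -1 if buf is too small. */
int encode_new(const struct ctl_new *c, uint32_t *buf, size_t cap)
{
    const struct { uint32_t tag, val; } f[] = {
        { TAG_EVNTSEL0, c->evntsel[0] },
        { TAG_EVNTSEL1, c->evntsel[1] },
        { TAG_ESCR0, c->escr0 },
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof f / sizeof f[0]; i++) {
        if (f[i].val == 0)
            continue; /* unused fields are not transmitted at all */
        if (n + 2 > cap)
            return -1;
        buf[n++] = f[i].tag;
        buf[n++] = f[i].val;
    }
    return (int)n;
}

/* Decode on an older kernel: absent fields stay zero, unknown tags are
 * rejected. Returns 0 on success, -1 on error. */
int decode_old(const uint32_t *buf, size_t n, struct ctl_old *out)
{
    assert(out != NULL);
    *out = (struct ctl_old){0};
    if (n % 2)
        return -1;
    for (size_t i = 0; i < n; i += 2) {
        switch (buf[i]) {
        case TAG_EVNTSEL0: out->evntsel[0] = buf[i + 1]; break;
        case TAG_EVNTSEL1: out->evntsel[1] = buf[i + 1]; break;
        default: return -1; /* field this kernel does not support */
        }
    }
    return 0;
}
```

Here a P6-style request (only `evntsel` set) decodes fine on the old kernel, while a request that sets the P4-style `escr0` is refused rather than silently misinterpreted.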

    The pass-binary-structures-via-marshalling approach works,
    but I admit it is uncommon. Converting to a pseudo-fs
    interface will require a substantial amount of work and code.
    Of course, I will do that if I have no choice...

