ARM Technical Support Knowledge Articles

How can I define memory mapped performance counters in Streamline?

Applies to: DS-5

Answer

One difficulty with defining memory mapped performance counters in Streamline is that the monitored device is most likely controlled by a driver, either statically linked with the kernel or loaded as a module at run time. In most cases, the counters are just a special case of control registers, which the driver normally remaps using the ioremap() function.

While it is possible to create more than one mapping of the same set of registers, this is not good practice and you should avoid it if there is an alternative.

There are three main methods of accessing memory mapped performance counters:

Direct access, double mapping

While it is not best practice, double mapping is sometimes necessary. For example, the main driver, such as the PL310 L2 cache controller driver (arch/arm/mm/cache-l2x0.c in the Linux kernel tree), accesses the registers of your device to initialize and control its behavior. Some of the same registers can be used to read performance data, but the driver does not implement this functionality.

Because the driver is part of the mainline kernel and you have no control over it, you have to set aside the reservations mentioned in this article and create a second, private mapping for the control registers. The gator event source implementation does this in the gator_events_pl310_probe() function in gator_events_pl310.c. The function obtains the virtual base address, pl310_base, using an ioremap() call, and gator then uses this address to access the PL310 registers. For example, the gator_events_pl310_read() function reads the performance data using readl().
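The probe-then-read pattern described above can be sketched as follows. This is a minimal user-space illustration, not the gator source: ioremap(), iounmap() and readl() are replaced by stand-ins so that the sketch builds outside the kernel, the physical base address is a hypothetical placeholder, the sketch_* function names are invented for this example, and the 0x210 offset for the counter 0 value register is an assumption that should be checked against the PL310 documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- User-space stand-ins (assumptions, not kernel code) ---- */
static uint32_t fake_pl310_regs[0x1000 / 4];   /* pretend register window */

/* Stand-in for ioremap(): the real call maps a physical range. */
static void *ioremap(unsigned long phys, size_t size)
{
    (void)phys;
    return size <= sizeof(fake_pl310_regs) ? fake_pl310_regs : NULL;
}

/* Stand-in for iounmap(): nothing to release in this mock. */
static void iounmap(void *addr)
{
    (void)addr;
}

/* Stand-in for readl(): one 32-bit read from a mapped register. */
static uint32_t readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* ---- Sketch of the second, private mapping ---- */
#define PL310_PHYS_BASE      0x1e00a000UL  /* hypothetical, platform specific */
#define PL310_REGS_SIZE      0x1000
#define PL310_EVENT_CNT0_VAL 0x210         /* assumed counter 0 value offset */

static void *pl310_base;   /* private virtual base, separate from the driver's */

/* Create the private mapping once, when the event source is probed. */
static int sketch_pl310_probe(void)
{
    pl310_base = ioremap(PL310_PHYS_BASE, PL310_REGS_SIZE);
    return pl310_base ? 0 : -1;
}

/* Read-only access keeps the risk of disturbing the main driver low. */
static uint32_t sketch_pl310_read_counter0(void)
{
    return readl((unsigned char *)pl310_base + PL310_EVENT_CNT0_VAL);
}

/* Drop the private mapping when the event source goes away. */
static void sketch_pl310_remove(void)
{
    iounmap(pl310_base);
    pl310_base = NULL;
}
```

In a real module the mapping would be created once at probe time and only read from in the read routine, so the second mapping never writes to registers that the main driver owns.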

Direct access, single mapping

If you have control over the PL310 driver, you can add an internal API that returns the virtual base address of its mapping, using a function like l2x0_get_base(). The gator event source implementation calls this function instead of ioremap() and uses the returned value to access the performance counters using the readl() function.

While this makes the architecturally dubious approach of double mapping unnecessary, it still has drawbacks. Most critically, the device registers are accessed from two separate bodies of code. In the worst case, a bug in the gator implementation can corrupt the L2 cache settings and cause serious, hard-to-track issues.
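A single-mapping arrangement might look like the sketch below. It is a user-space illustration under stated assumptions: readl() is a stand-in, sketch_l2x0_init() and sketch_gator_read_counter() are hypothetical names, the 0x210 counter offset is assumed, and l2x0_get_base() is the hypothetical accessor named in the text. Returning a const-qualified pointer is one possible way to limit the drawback described above, because passing it to a write accessor then produces a compiler diagnostic; it is a design suggestion, not something the PL310 driver provides.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- User-space stand-ins (assumptions, not kernel code) ---- */
static uint32_t fake_l2x0_regs[0x1000 / 4];    /* pretend register window */

/* Stand-in for readl(): one 32-bit read from a mapped register. */
static uint32_t readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* ---- Driver side: hypothetical addition to the PL310 driver ---- */
static void *l2x0_base;    /* the driver's one and only mapping */

/* In the kernel the driver would set this from its own ioremap() call. */
static void sketch_l2x0_init(void)
{
    l2x0_base = fake_l2x0_regs;
}

/* Hypothetical accessor: hands out the existing mapping, read-only. */
const volatile void *l2x0_get_base(void)
{
    return l2x0_base;
}

/* ---- gator side: reuse the driver's mapping instead of ioremap() ---- */
#define L2X0_EVENT_CNT0_VAL 0x210   /* assumed counter 0 value offset */

/* Returns 0 and stores the counter, or -1 if the driver has no mapping yet. */
static int sketch_gator_read_counter(uint32_t *value)
{
    const volatile unsigned char *base = l2x0_get_base();

    if (!base)
        return -1;

    *value = readl(base + L2X0_EVENT_CNT0_VAL);
    return 0;
}
```

The gator side still knows register offsets, so the two code bases remain coupled; the sketch only removes the second mapping.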

Performance API

If the PL310 driver provides an API for accessing performance data, you can call it directly in the gator module. For example, if you export the l2x0_get_perf_counter() function using the EXPORT_SYMBOL macro, you only have to call it in gator_events_pl310_read(). There is no need for ioremap() or a private copy of the base address. The API also has to provide a way to configure the performance monitor.

Here is an example of an API call in a gator_*_read routine:

/* Defined in driver's header file */
extern int l2x0_get_perf_counter(void);

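static int gator_myevents_read(int **buffer)
{
    int len = 0;

    if (myevent_enabled) {
        /* Each sample is a (key, value) pair */
        myevent_buffer[len++] = myevent_key;
        myevent_buffer[len++] = l2x0_get_perf_counter();
    }

    /* Hand the filled buffer back to the caller */
    if (buffer)
        *buffer = myevent_buffer;

    return len;
}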

This approach makes the device an exclusive resource: only the main driver is allowed to access its registers. There is one side effect, however. The device driver must be present in the kernel when you load the gator module, because the module loader has to resolve every reference to the l2x0_get_perf_counter() function. This is not a problem for statically linked drivers, but it enforces a specific loading order when the device driver is a loadable module. You must load it before gator, otherwise the kernel cannot resolve the relevant symbols and refuses to load gator.
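On the driver side, the performance API could be sketched as below. This is a user-space illustration rather than driver code: readl() and writel() are stand-ins, l2x0_set_perf_event() is a hypothetical name for the configuration half of the API, and the register offsets and bit layout (event source in bits [5:2] of the configuration register, bit 0 of the control register as the enable) are assumptions that should be checked against the PL310 documentation. In the kernel each function would be followed by an EXPORT_SYMBOL() line.

```c
#include <stdint.h>

/* ---- User-space stand-ins (assumptions, not kernel code) ---- */
static uint32_t fake_l2x0_regs[0x1000 / 4];    /* pretend register window */

/* Stand-in for readl(): one 32-bit read from a mapped register. */
static uint32_t readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Stand-in for writel(): one 32-bit write to a mapped register. */
static void writel(uint32_t val, volatile void *addr)
{
    *(volatile uint32_t *)addr = val;
}

/* ---- What the driver's header file would declare ---- */
void l2x0_set_perf_event(unsigned int event);  /* hypothetical */
int l2x0_get_perf_counter(void);

/* ---- Driver implementation: the mapping never leaves this file ---- */
#define L2X0_EVENT_CNT_CTRL 0x200   /* assumed offsets */
#define L2X0_EVENT_CNT0_CFG 0x208
#define L2X0_EVENT_CNT0_VAL 0x210

static unsigned char *l2x0_base = (unsigned char *)fake_l2x0_regs;

/* Select the event source for counter 0, then enable counting. */
void l2x0_set_perf_event(unsigned int event)
{
    writel((event & 0xf) << 2, l2x0_base + L2X0_EVENT_CNT0_CFG);
    writel(1, l2x0_base + L2X0_EVENT_CNT_CTRL);
}
/* Kernel only: EXPORT_SYMBOL(l2x0_set_perf_event); */

/* Return the current value of counter 0. */
int l2x0_get_perf_counter(void)
{
    return (int)readl(l2x0_base + L2X0_EVENT_CNT0_VAL);
}
/* Kernel only: EXPORT_SYMBOL(l2x0_get_perf_counter); */
```

With an API of this shape, gator needs neither the base address nor any register offsets, which is what makes the device an exclusive resource of its driver.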

You can work around this, if necessary, by using dynamic symbol resolution. You can use the symbol_get() kernel function to access the main driver's function.

For example:

/* Defined in driver's header file */
extern int l2x0_get_perf_counter(void);

static int (*gator_myevent_get_counter)(void);

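static int gator_myevents_start(void)
{
    /* Resolve the symbol once per capture; NULL if the driver is not loaded */
    gator_myevent_get_counter = symbol_get(l2x0_get_perf_counter);

    return 0;
}

static void gator_myevents_stop(void)
{
    /* Release the reference taken by symbol_get() */
    if (gator_myevent_get_counter) {
        symbol_put(l2x0_get_perf_counter);
        gator_myevent_get_counter = NULL;
    }
}

static int gator_myevents_read(int **buffer)
{
    int len = 0;

    if (gator_myevent_get_counter && myevent_enabled) {
        myevent_buffer[len++] = myevent_key;
        myevent_buffer[len++] = gator_myevent_get_counter();
    }

    /* Hand the filled buffer back to the caller */
    if (buffer)
        *buffer = myevent_buffer;

    return len;
}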

Such an implementation does not reference l2x0_get_perf_counter() directly, so it can be loaded even if the requested symbol is not available. Then, when a capture starts, symbol_get() returns the function pointer, or NULL if the function is still not available. Resolving symbols is a very expensive operation and must not be done in any heavily used code, including the gator_*_read() function.
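The cost argument can be illustrated with a small user-space mock of the start, read and stop life cycle. Everything here is a stand-in: sketch_symbol_get() is a hypothetical replacement for symbol_get() that counts lookups, driver_loaded simulates whether the device driver module is present, and the fake counter function returns a fixed value. The point of the sketch is only that the lookup happens once per capture while reads go through the cached pointer.

```c
#include <stddef.h>
#include <string.h>

typedef int (*counter_fn)(void);

/* ---- User-space stand-ins (assumptions, not kernel code) ---- */
static int driver_loaded;   /* simulates the device driver being present */
static int lookup_count;    /* number of (expensive) symbol lookups so far */

/* Plays the role of the driver's exported function. */
static int fake_l2x0_get_perf_counter(void)
{
    return 42;
}

/* Hypothetical stand-in for symbol_get(): function pointer or NULL. */
static counter_fn sketch_symbol_get(const char *name)
{
    lookup_count++;
    if (driver_loaded && strcmp(name, "l2x0_get_perf_counter") == 0)
        return fake_l2x0_get_perf_counter;
    return NULL;
}

/* ---- Event source life cycle ---- */
static counter_fn get_counter;   /* cached result of the lookup */

/* Called once when a capture starts: the only place that resolves symbols. */
static int sketch_start(void)
{
    get_counter = sketch_symbol_get("l2x0_get_perf_counter");
    return 0;   /* a missing driver is not an error, the counter is skipped */
}

/* Called once when a capture stops (the kernel code calls symbol_put() here). */
static void sketch_stop(void)
{
    get_counter = NULL;
}

/* Called very often: only a pointer check and an indirect call. */
static int sketch_read(int *out)
{
    if (!get_counter)
        return 0;   /* no sample produced */

    *out = get_counter();
    return 1;       /* one sample produced */
}
```

If the capture starts before the driver is loaded, reads simply produce no samples until the next capture resolves the symbol again.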

Article last edited on: 2011-09-15 17:29:36

Copyright © 2011 ARM Limited. All rights reserved. External (Open), Non-Confidential