measuring clock cycles per second
Jasen Betts
Posted: Sun Nov 16, 2008 4:58 am    Post subject: Re: measuring clock cycles per second

On 2008-11-15, Chris <chris@thisisnotanemailaddress.ca> wrote:

Quote:
It's better for me to find out now before I invest too much time in this.

Basically, I am benchmarking my algorithm/process for its
appropriateness/likelihood to be executed on specialised, embedded
hardware. The benchmarking app I have written takes a set of input and
executes the algorithm and writes a report on the various timings
(milliseconds) and memory consumption.

Measure the execution of the sample algorithm either in wall time or in
time spent executing the process, and then multiply that by the clock
speed. That'll give you the suggested measure.
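Here is a minimal Python sketch of that estimate. It assumes the CPU runs at a constant nominal clock rate; the 150 MHz figure is just the example CPU from the question, and `estimate_cycles` and `workload` are hypothetical names. The sketch uses process CPU time rather than wall time, assumes no frequency scaling, and counts memory stalls as if they were useful work. Treat the result as a rough approximation, not a measured cycle count.

```python
import time

def estimate_cycles(seconds: float, nominal_hz: float) -> float:
    """Convert elapsed time into an approximate cycle count.

    cycles = seconds * (cycles per second); assumes a constant clock rate.
    """
    return seconds * nominal_hz

def workload(n: int) -> int:
    # Hypothetical stand-in for the algorithm under test.
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.process_time()            # CPU time charged to this process
workload(1_000_000)
cpu_seconds = time.process_time() - start

print(f"{cpu_seconds:.3f} s CPU, about "
      f"{estimate_cycles(cpu_seconds, 150e6):.3g} cycles at 150 MHz")
```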

Quote:
It was suggested to me, by a coworker, that I look into counting the clock
cycle execution of the algorithm (to report the number of clock cycles to
execute the algorithm and various parts of the algorithm). The idea is to
provide a metric that can be used to determine/guesstimate the minimum CPU
requirements and to (possibly) determine/guesstimate the power requirements
of the algorithm

Probably better to pick a common benchmark (it used to be Dhrystone or
Whetstone) that is quoted for the potential target hardware, and
compare that to the same benchmark on your hardware.
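One way to turn such a comparison into a number is sketched below. It assumes both machines have published scores on the same "higher is better" benchmark, and that the algorithm speeds up or slows down in proportion to that benchmark, which is a strong assumption. `scale_runtime` is a hypothetical helper, and the scores in the example are made up.

```python
def scale_runtime(host_seconds: float, host_score: float, target_score: float) -> float:
    """Estimate runtime on the target from a runtime measured on the host.

    Assumes throughput-style scores (higher is better) and that the
    algorithm scales exactly like the benchmark does.
    """
    if host_score <= 0 or target_score <= 0:
        raise ValueError("benchmark scores must be positive")
    return host_seconds * host_score / target_score

# Made-up scores: the host is 4x faster than the target on the benchmark.
print(scale_runtime(2.0, host_score=1000.0, target_score=250.0))
```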

Quote:
(i.e. if it takes N clock cycles on a Pentium 150 MHz CPU,
then it requires X watts/volts/etc.).


Quote:
I know there are a lot of factors that I haven't even begun to consider. I
was just curious if there was a way to report that the algorithm took N
clock cycles to complete. Where N would be a number on the order of 1e9 I'm
sure!

The reading I have done on this suggests that if it takes N clock
cycles on a Pentium 150 MHz CPU, it may not necessarily take M cycles on a
Pentium 3 1 GHz where M < N. It all depends on how many operations the CPU
can execute in one clock cycle.

Yeah, benchmarks were invented for this reason (and also to enable comparison
with wildly different architectures like ARM and 68K). Back in the 90s, when I
was more interested in this, the benchmarks were Dhrystone and Whetstone,
but time has moved on. Still, queries involving the potential target and
the word 'benchmark' may yield useful results.

E.g., a VIA Nehemiah-based processor seems to take about twice as many cycles as a
Celeron on many benchmarks, but uses only 1/4 of the power. Non-x86 cores may be
more efficient.
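The cycle and power ratios can be combined into an energy ratio with a quick calculation. The sketch assumes both chips run the same job. Runtime is cycles divided by clock rate, and energy is power times runtime. The post gives no clock rates, so the example assumes equal clocks. `relative_energy` is a hypothetical helper.

```python
def relative_energy(cycle_ratio: float, power_ratio: float, clock_ratio: float = 1.0) -> float:
    """Energy of chip A relative to chip B for the same job.

    time = cycles / clock rate and energy = power * time, so
    E_A / E_B = power_ratio * cycle_ratio / clock_ratio
    (every ratio is A divided by B).
    """
    return power_ratio * cycle_ratio / clock_ratio

# Twice the cycles at a quarter of the power, same clock rate (assumed):
print(relative_energy(cycle_ratio=2.0, power_ratio=0.25))
```

Under those assumptions the slower chip finishes later but uses about half the energy for the job.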

Rainer Weikusat
Posted: Sun Nov 16, 2008 9:04 pm    Post subject: Re: measuring clock cycles per second

Nate Eldredge <nate@vulcan.lan> writes:
Quote:
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
scholz.lothar@gmail.com writes:
There is no way to do this. Even Linus agrees that profiling is worse
on Linux and one of the things that still needs to be done.

TSC and other CPU counter events are not working because they are
not saved across task switches.

There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose.

Interestingly, that's not quite true. On my Opteron CPU, the TSC is a
model-specific register and can be written with the appropriate
(privileged) WRMSR instruction. (I just tried it and it works, though
the documentation says this feature is not to be relied upon.) So in
principle, you could have the TSC saved on task switches, so it would
count cycles for each process.

Not without hardware support. How is the correction needed to account
for the time between the start of interrupt handling and the reading of
the TSC value (in order to store it to some memory location) supposed
to be determined?

Quote:
FreeBSD has support for CPU performance-monitoring counters (PMC) which
can count not only clock cycles but many other CPU events (jumps taken,
cache misses, pipeline stalls, etc). These can be set to run on a
systemwide or per-process basis. It doesn't appear that Linux has this
support yet, unless I am missing something.

I posted a link to the sourceforge page of a project which provides
exactly this in another subthread (OProfile).

Rainer Weikusat
Posted: Sun Nov 16, 2008 9:11 pm    Post subject: Re: measuring clock cycles per second

Jasen Betts <jasen@xnet.co.nz> writes:
Quote:
On 2008-11-15, Chris <chris@thisisnotanemailaddress.ca> wrote:

It's better for me to find out now before I invest too much time in this.

Basically, I am benchmarking my algorithm/process for its
appropriateness/likelihood to be executed on specialised, embedded
hardware. The benchmarking app I have written takes a set of input and
executes the algorithm and writes a report on the various timings
(milliseconds) and memory consumption.

Measure the execution of the sample algorithm either in wall time or in
time spent executing the process, and then multiply that by the clock
speed. That'll give you the suggested measure.

That holds only if the CPU spent significantly more time working than
waiting for some external event, e.g. data transfer to/from some sort of
memory, and only if a few other assumptions hold too (e.g. that nothing
was causing lots of interrupts during the measurement period and that no
other tasks ran for significant amounts of time). The nasty detail is
that all of these assumptions will usually be true on a dedicated 'test
system', i.e. a developer workstation, but not on a computer that is
actually busy executing multiple tasks concurrently.
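The sketch below shows the gap between the two measures in Python. It assumes a POSIX-like system. `time.perf_counter()` tracks wall time, while `time.process_time()` counts only CPU time charged to this process. Time spent blocked shows up in the first but not the second; here a `sleep` stands in for waiting on an external event. Memory stalls are still charged as CPU time, so neither clock separates them from useful work. The function names are illustrative.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def mostly_waiting():
    time.sleep(0.2)                       # blocked: wall time passes, little CPU time

def mostly_working():
    sum(i * i for i in range(2_000_000))  # busy computing

for name, fn in [("waiting", mostly_waiting), ("working", mostly_working)]:
    wall, cpu = measure(fn)
    print(f"{name}: wall {wall:.3f} s, cpu {cpu:.3f} s")
```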

Nate Eldredge
Posted: Mon Nov 17, 2008 2:37 am    Post subject: Re: measuring clock cycles per second

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

Quote:
Nate Eldredge <nate@vulcan.lan> writes:
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
scholz.lothar@gmail.com writes:
There is no way to do this. Even Linus agrees that profiling is worse
on Linux and one of the things that still needs to be done.

TSC and other CPU counter events are not working because they are
not saved across task switches.

There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose.

Interestingly, that's not quite true. On my Opteron CPU, the TSC is a
model-specific register and can be written with the appropriate
(privileged) WRMSR instruction. (I just tried it and it works, though
the documentation says this feature is not to be relied upon.) So in
principle, you could have the TSC saved on task switches, so it would
count cycles for each process.

Not without hardware support. How is the correction needed to account
for the time between the start of interrupt handling and the reading of
the TSC value (in order to store it to some memory location) supposed
to be determined?

It couldn't, of course.

I was imagining this as a high-resolution clock() that could be read
without a system call. In this case, the time you mention would get
charged to the process, just as it is in the usual implementation. But
at least now it would be a pretty good approximation to the actual
number of cycles used by the process.

If you want to get it exactly on the nose, I think you'd have to run in
privileged mode and disable interrupts. I'm not sure what the value of
that is, however, since there are plenty of other things that will make
your timing fluctuate.

Quote:
FreeBSD has support for CPU performance-monitoring counters (PMC) which
can count not only clock cycles but many other CPU events (jumps taken,
cache misses, pipeline stalls, etc). These can be set to run on a
systemwide or per-process basis. It doesn't appear that Linux has this
support yet, unless I am missing something.

I posted a link to the sourceforge page of a project which provides
exactly this in another subthread (OProfile).

Aha. I didn't follow the link and so I didn't realize that was how
OProfile worked.

Guest
Posted: Wed Nov 19, 2008 10:56 am    Post subject: Re: measuring clock cycles per second

On 14 Nov., 13:50, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
Quote:
scholz.lot...@gmail.com writes:
There is no way to do this. Even Linus agrees that profiling is worse
on Linux and one of the things that still needs to be done.

TSC and other CPU counter events are not working because they are
not saved across task switches.

There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose.

Are you really so inexperienced?

I don't give a shit about the TSC, just the counter values. So of course
the system needs to keep a per-thread adjustment factor which is added
to the counter.

It is also not really important if you have latencies caused by memory
delays or interrupts. You only need relative values. You compare parts
of your program and improvements in your algorithms with these relative
numbers.

The fact and keypoint is: Linux sucks here.

Almost 12 years after the introduction of these counters, the kernel
guys still haven't implemented any way in the normal kernel code to
measure CPU events. Yes, I know there are special kernel patches, but
that sucks and makes it useless for the usual software developers.

Rainer Weikusat
Posted: Wed Nov 19, 2008 6:24 pm    Post subject: Re: measuring clock cycles per second

scholz.lothar@gmail.com writes:
Quote:
On 14 Nov., 13:50, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
scholz.lot...@gmail.com writes:
There is no way to do this. Even Linus agrees that profiling is worse
on Linux and one of the things that still needs to be done.

TSC and other CPU counter events are not working because they are
not saved across task switches.

There is no instruction to set the time stamp counter and there cannot
even be one, because this would directly contradict its purpose.

Are you really so inexperienced?

That hardly follows from a statement of fact which happens to be only
partially true (there is no instruction to store the TSC as such).

Quote:
I don't give a shit about the TSC, just the counter values. So of course
the system needs to keep a per-thread adjustment factor which is added
to the counter.

There still isn't an instruction to do so. In line with that, while
the register could be written to by 'other means', there is no way to
determine such an adjustment factor, because the counter is just going
to keep on counting (which happens to be its purpose), no matter what
the CPU is presently doing or not doing.

Quote:
It is also not really important if you have latencies caused by
memory delays or interrupts.

Latencies caused by memory delays or execution of other tasks,
including interrupt handlers, mean that the TSC difference, divided by
the (nominal) CPU speed, is not an accurate measurement of the number
of CPU clock cycles necessary to execute a particular instruction
sequence, which happens to be what was asked for.

[...]

Quote:
Almost 12 years after the introduction of this counters the kernel
guys still haven't implemented any way in the normal kernel code to
measure CPU events. Yes i know there are special kernel patches but
it sucks and makes it useless for the usual software developers.

There is no difference between 'normal kernel code' and 'kernel
patches' which would justify a classification of the code itself.
Your statement should have been "Linus Torvalds sucks, because even
after twelve years ..., he still isn't distributing a kernel source
tree with a particular feature in it".

Incidentally, this claim is wrong.

Guest
Posted: Thu Nov 20, 2008 8:08 am    Post subject: Re: measuring clock cycles per second

Quote:
There still isn't an instruction to do so. In line with that, while
the register could be written to by 'other means', there is no way to
determine such an adjustment factor, because the counter is just going
to keep on counting (which happens to be its purpose), no matter what
the CPU is presently doing or not doing.

The kernel needs to read the current value of the counter register
before a thread is scheduled off the CPU. When the thread comes back, the
kernel needs to add the difference between the last counter value and
the current one to the adjustment factor. When user code reads the
counter register again, it must subtract the adjustment factor, and it
then gets a correct value that does not (primarily) depend on the CPU
load (provided memory is behaving well and interrupts are routed to
other CPU cores).
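The bookkeeping described above can be simulated without touching real hardware. The sketch is a toy model, not kernel code. A single global counter advances on every tick, and a made-up schedule runs threads in slices on one CPU. Each thread's virtual reading subtracts the ticks that passed while it was switched out. The model deliberately ignores the objection raised upthread: the cycles spent between the interrupt and the counter read are not modelled. `VirtualCounter` and `run_schedule` are hypothetical names.

```python
class VirtualCounter:
    """One thread's view of a free-running cycle counter (toy model)."""

    def __init__(self, created_at: int = 0):
        self.adjustment = 0                # ticks that passed while switched out
        self.switched_out_at = created_at  # a new thread starts out waiting

    def switch_in(self, counter: int) -> None:
        # "Kernel" side: add the gap since the last switch-out.
        self.adjustment += counter - self.switched_out_at

    def switch_out(self, counter: int) -> None:
        # "Kernel" side: remember the raw counter when the thread stops.
        self.switched_out_at = counter

    def read(self, counter: int) -> int:
        # "User" side, while running: raw counter minus the adjustment.
        return counter - self.adjustment


def run_schedule(schedule):
    """Run (thread, ticks) slices on one CPU.

    All threads are assumed to exist from tick 0. Returns each thread's
    virtual reading taken at the end of its last slice.
    """
    counter = 0
    threads = {}
    readings = {}
    for name, ticks in schedule:
        if name not in threads:
            threads[name] = VirtualCounter()
        vc = threads[name]
        vc.switch_in(counter)
        counter += ticks                   # the hardware counter never stops
        readings[name] = vc.read(counter)
        vc.switch_out(counter)
    return readings


schedule = [("A", 30), ("B", 50), ("A", 20), ("B", 10), ("A", 5)]
print(run_schedule(schedule))  # {'A': 55, 'B': 60}
```

Each thread ends up with exactly the ticks it ran (30 + 20 + 5 for A, 50 + 10 for B), even though the raw counter reached 115.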

Of course, the best would be to add this to the CPU hardware.
We can also blame Intel here for a typically brainless implementation,
due to the fact that there is no real competition among CPU vendors
anymore.

Quote:
It is also not really important if you have latencies caused by
memory delays or interrupts.

Latencies caused by memory delays or execution of other tasks,
including interrupt handlers, mean that the TSC difference, divided by
the (nominal) CPU speed, is not an accurate measurement of the number
of CPU clock cycles necessary to execute a particular instruction
sequence, which happens to be what was asked for.

AFAIK there is a counter that tracks the number of executed
instructions. Then we need to use this counter. There are hundreds of
them. Don't hang on to the TSC; that's decade-old technology from the
first Pentium. Core 2 CPUs have much more to offer for exactly this
reason.

Quote:
There is no difference between 'normal kernel code' and 'kernel
patches' which would justify a classification of the code itself.

No, there is an official kernel, as opposed to the hundreds of
different Linux-based distributions. But here you can really blame the
Linux kernel developers and Linus Torvalds personally.

Jacek Dziedzic
Posted: Mon Dec 01, 2008 4:37 am    Post subject: Re: measuring clock cycles per second

Rainer Weikusat wrote:
Quote:
gprof is basically useless except on ancient hardware.

Could you elaborate why?

- J.

Rainer Weikusat
Posted: Mon Dec 01, 2008 4:46 am    Post subject: Re: measuring clock cycles per second

Jacek Dziedzic <jacek.dziedzic__no--spam__@gmail.com> writes:
Quote:
Rainer Weikusat wrote:
gprof is basically useless except on ancient hardware.

Could you elaborate why?

Because its analysis is based on 100Hz-sampling of the instruction
pointer in order to determine where a program spent its running time?

John Hasler
Posted: Mon Dec 01, 2008 5:47 am    Post subject: Re: measuring clock cycles per second

Rainer Weikusat wrote:
Quote:
gprof is basically useless except on ancient hardware.

Jacek Dziedzic writes:
Quote:
Could you elaborate why?

Rainer Weikusat wrote:
Quote:
Because its analysis is based on 100Hz-sampling of the instruction
pointer in order to determine where a program spent its running time?

<http://sourceware.org/ml/binutils/2005-07/msg00423.html>
--
John Hasler
john@dhh.gt.org
Dancing Horse Hill
Elmwood, WI USA

Rainer Weikusat
Posted: Mon Dec 01, 2008 4:42 pm    Post subject: Re: measuring clock cycles per second

John Hasler <john@dhh.gt.org> writes:
Quote:
Rainer Weikusat wrote:
gprof is basically useless except on ancient hardware.

Jacek Dziedzic writes:
Could you elaborate why?

Rainer Weikusat wrote:
Because its analysis is based on 100Hz-sampling of the instruction
pointer in order to determine where a program spent its running time?

http://sourceware.org/ml/binutils/2005-07/msg00423.html

This communicates that some random routine in some version of glibc
tries to autodetect the resolution of the system clock.
Which is supposed to mean precisely what?

Jacek Dziedzic
Posted: Mon Dec 01, 2008 6:22 pm    Post subject: Re: measuring clock cycles per second

Rainer Weikusat wrote:
Quote:
Jacek Dziedzic <jacek.dziedzic__no--spam__@gmail.com> writes:
Rainer Weikusat wrote:
gprof is basically useless except on ancient hardware.
Could you elaborate why?

Because its analysis is based on 100Hz-sampling of the instruction
pointer in order to determine where a program spent its running time?

I was hoping more for an answer than a question.

I still don't see how sampling with an, admittedly, low 100 Hz
frequency makes it useless except on ancient hardware. Even if a short
function called from within an inner loop takes 100 µs, it will get hits
with a sampling interval of 0.01 s in a statistical sense, provided it is
executed many times. Right, the timings may not be very accurate, but
the percentages will be more or less right.

I can understand how programs that have a wall time of under a second
could get poor statistics, but who profiles such programs?
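This statistical argument can be put into numbers. Assume samples land at times uncorrelated with what the program is doing; that is exactly the assumption challenged later in the thread. Then a routine that accounts for a fraction p of the run collects about p × rate × T samples over T seconds. With n total samples, the relative standard error of its measured share is roughly sqrt((1 - p) / (n p)) under a binomial model. `sampling_stats` is a hypothetical helper, and the 1% share is arbitrary.

```python
import math

def sampling_stats(share: float, run_seconds: float, rate_hz: float = 100.0):
    """Expected hit count and relative error of the measured share.

    share: true fraction of run time spent in the routine (0 < share < 1).
    Binomial model: each of the n samples independently lands in the
    routine with probability `share`.
    """
    n = run_seconds * rate_hz                      # total samples taken
    expected_hits = share * n
    rel_error = math.sqrt((1.0 - share) / (n * share))
    return expected_hits, rel_error

for seconds in (0.5, 60.0, 3600.0):
    hits, err = sampling_stats(share=0.01, run_seconds=seconds)
    print(f"{seconds:>7} s: ~{hits:g} hits, +/-{err:.0%} relative error")
```

A sub-second run gives a 1% routine well under one expected hit, while an hour-long run pins its share down to within a few percent of its true value.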

- J.

Rainer Weikusat
Posted: Mon Dec 01, 2008 6:43 pm    Post subject: Re: measuring clock cycles per second

Jacek Dziedzic <jacek.dziedzic__no--spam__@gmail.com> writes:
Quote:
Rainer Weikusat wrote:
Jacek Dziedzic <jacek.dziedzic__no--spam__@gmail.com> writes:
Rainer Weikusat wrote:
gprof is basically useless except on ancient hardware.
Could you elaborate why?
Because its analysis is based on 100Hz-sampling of the instruction
pointer in order to determine where a program spent its running time?

I was hoping more for an answer than a question.

And that was an answer. In the form of a rhetorical question, i.e. one
which has an implied answer.

Quote:
I still don't see how sampling with an, admittedly, low 100 Hz
frequency makes it useless except on ancient hardware. Even if a short
function called from within an inner loop takes 100 µs, it will get
hits with a sampling interval of 0.01 s in a statistical sense, provided
it is executed many times.

Why do you think you can predict which of the millions, if not
billions of values the instruction pointer had during the sampling
interval will be recorded?

Jacek Dziedzic
Posted: Mon Dec 01, 2008 8:46 pm    Post subject: Re: measuring clock cycles per second

Rainer Weikusat wrote:
Quote:

And that was an answer. In the form of a rhetorical question, i.e. one
which has an implied answer.

OK, fair enough.

Quote:
I still don't see how sampling with an, admittedly, low 100 Hz
frequency makes it useless except on ancient hardware. Even if a short
function called from within an inner loop takes 100 µs, it will get
hits with a sampling interval of 0.01 s in a statistical sense, provided
it is executed many times.

Why do you think you can predict which of the millions, if not
billions of values the instruction pointer had during the sampling
interval will be recorded?

Mind you, I'm not saying I can predict a single hit of the profiler
at a single EIP value. I'm saying that, given an adequate number of
repetitions, every part of the code covered by the sampling interval
gets its hits proportionally, however short the part of the code (even a
single instruction) is. Or, in other words, with the number of
iterations N going to infinity, I can predict that every part of the
code that is executed within these iterations will be sampled any
desired number of times, regardless of its size and the sampling
resolution. Or, in other words still, however rare the sampling and
however short the code portion, I can guarantee, with any desired
probability, e.g. 99.9999%, that every EIP value will get at least a
desired number of hits, say 300. Sure, it might take a long time (i.e.
make N really large) if the sampling is done, say, once per second, but
we're arguing about the general principle, right?
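The "any desired probability" claim can be made concrete under the same independence assumption. The number of hits on a code region that takes a fraction p of the run is then binomial. We can search for the smallest sample count n with P(hits ≥ k) at or above the target probability. The sketch uses an exact binomial tail computed in log space and a binary search. `prob_at_least` and `min_samples` are hypothetical helpers, and p = 1e-4 in the example is an arbitrary choice.

```python
import math

def prob_at_least(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the lower tail in log space."""
    if k <= 0:
        return 1.0
    if n < k:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    lower = 0.0
    for i in range(k):  # P(X < k) = sum of P(X = i) for i < k
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        lower += math.exp(log_term)
    return max(0.0, 1.0 - lower)

def min_samples(p: float, k: int, confidence: float) -> int:
    """Smallest n with P(at least k hits among n samples) >= confidence."""
    lo, hi = k, k
    while prob_at_least(hi, p, k) < confidence:   # grow an upper bound
        hi *= 2
    while lo < hi:                                # binary search on n
        mid = (lo + hi) // 2
        if prob_at_least(mid, p, k) >= confidence:
            hi = mid
        else:
            lo = mid + 1
    return hi

n = min_samples(p=1e-4, k=300, confidence=0.999999)
print(f"{n} samples, i.e. about {n / 100 / 3600:.1f} hours at 100 Hz")
```

The result grows roughly like k / p, so the guarantee holds in principle but gets expensive quickly for rarely executed code.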

So I tend towards "realizing the limits of profiling with a small
resolution", yet the claim that the technique is "useless except on
ancient hardware" eludes me.

- J.

Rainer Weikusat
Posted: Mon Dec 01, 2008 9:13 pm    Post subject: Re: measuring clock cycles per second

Jacek Dziedzic <jacek.dziedzic__no--spam__@gmail.com> writes:
Quote:
Rainer Weikusat wrote:
And that was an answer. In the form of a rhetorical question, i.e. one
which has an implied answer.

OK, fair enough.

I still don't see how sampling with an, admittedly, low 100 Hz
frequency makes it useless except on ancient hardware. Even if a short
function called from within an inner loop takes 100 µs, it will get
hits with a sampling interval of 0.01 s in a statistical sense, provided
it is executed many times.
Why do you think you can predict which of the millions, if not
billions of values the instruction pointer had during the sampling
interval will be recorded?

Mind you, I'm not saying I can predict a single hit of the profiler
at a single EIP value. I'm saying that, given an adequate number of
repetitions, every part of the code covered by the sampling interval
gets its hits proportionally, however short the part of the code (even
a single instruction) is.

You already asserted this in your last posting. But this amounts to
'it works because I say it does'. Let's try a little thought experiment.
Assume that a program only executes a single loop and this loop invokes
two subroutines. The average execution time of subroutine #1 is 1/4 of
the sampling interval, and the average execution time of #2 is 3/4. This
means the program spends 1/4 of its time in subroutine #1, yet if the
profiling timer was started 'close' to the start of the loop, the
instruction pointer should basically always be somewhere in #2 when the
value is recorded.

What's wrong with this example?
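This thought experiment can be simulated with a toy model; all names are hypothetical. The loop iteration is exactly one sampling interval long: routine #1 runs for the first quarter and routine #2 for the rest. In the first case the sampler's period equals the loop period, and its phase is chosen so that samples land in #2's window. That sampler attributes every sample to #2. In the second case each sample time gets random jitter, which recovers roughly the true 1/4 : 3/4 split. The model illustrates the aliasing risk only. Whether real programs stay phase-locked with the profiling timer is a separate question.

```python
import random

INTERVAL = 1.0            # sampling period (arbitrary time unit)
SPLIT = 0.25 * INTERVAL   # routine #1 runs for the first quarter of each loop

def routine_at(t: float) -> int:
    """Which routine is executing at time t (loop period == INTERVAL)."""
    return 1 if (t % INTERVAL) < SPLIT else 2

def share_of_routine_1(sample_times) -> float:
    """Fraction of samples attributed to routine #1."""
    times = list(sample_times)
    return sum(routine_at(t) == 1 for t in times) / len(times)

N = 10_000
phase = 0.5 * INTERVAL    # timer phase chosen so samples land inside #2

locked = (phase + i * INTERVAL for i in range(N))
jittered = (phase + i * INTERVAL + random.uniform(0, INTERVAL) for i in range(N))

print("phase-locked:", share_of_routine_1(locked))    # 0.0: #1 is never seen
print("jittered:    ", share_of_routine_1(jittered))  # close to 0.25
```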

[...]

Quote:
So I tend towards "realizing the limits of profiling with a small
resolution", yet the claim that the technique is "useless except on
ancient hardware" eludes me.

The assumption that gprof actually provided useful output at some
point in the past is just a complimentary assumption of mine, because
I have never seen this happen since I first encountered the program
on a 25 MHz processor.
Page 2 of 3
All times are GMT

 