The Quality of Sampled Data

For Engineering Software Management (ESM), a usage metering tool is only as good as the quality and accuracy of the reports it can generate. Since it is highly impractical to record data for every second of usage, we use data sampling to approximate the usage without compromising its accuracy.

Sampling will always give good overall numbers, particularly when usage counts are high. However, if a significant number of checkouts last for a shorter time than the sampling interval, we will lose some of the short-term “fuzz” on the usage curves. At the individual user level, we will need more samples to catch all of those short checkouts.

Total use

For overall use, the sampled result will almost always give good numbers since usage patterns are not random. Although some local max/min values occur more randomly, usage almost always follows fixed trends: users arrive in the morning, usage rises and peaks around noon (with a dip around lunchtime), and drops off in the late afternoon. Even combining input from multiple time zones or regions does not remove this pattern of usage.

With sampling, we obviously cannot see what happens between sample points. We will therefore miss some max and min values, but the overall trend will be visible even with low-frequency sampling.

For example, below is a chart of values that may be considered the “actual” usage of a license, where license usage can change at every “step”.

If we collected samples at every 5th step (steps 1, 6, 11, etc.), we would see the following usage pattern:

Although the small variations are missing, we still have a pretty good picture of the general usage pattern.
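The idea can be illustrated with a small sketch. The usage values below are invented for illustration only; real values would come from the license manager:

```python
# Hypothetical "actual" per-step license usage (invented numbers):
actual = [3, 4, 4, 5, 6, 7, 7, 8, 9, 8,
          8, 7, 6, 6, 5, 6, 7, 8, 8, 7,
          6, 5, 4, 4, 3]

# Take a sample at steps 1, 6, 11, ... (1-indexed), i.e. every 5th step.
interval = 5
sampled = actual[::interval]

print("sampled points:", sampled)
print("peak seen in samples:", max(sampled), "vs actual peak:", max(actual))
```

The samples miss the true peak of 9 (it falls between sample points) but still trace the rise-and-fall shape of the curve.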

When we collect usage data, we treat each observed use as lasting the full sampling interval and summarize it. Although these numbers are not completely accurate per sample, they will provide good results over time if the variation of missed max/min values is fairly random, i.e. we will report approximately as much over-use as under-use, and this averages out over time. In the above example, we would use:

It is important to note that the longer the period a report covers, the more accurate it is likely to be. For instance, a 5-minute value may be markedly wrong, but a whole-day report will likely be quite good, and a month-long report will be very close to the actual values.
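This averaging effect can be sketched with a small simulation. The usage function below is invented (a slow trend plus random short-term fuzz); only the principle matters:

```python
import random

random.seed(1)
interval = 5  # sample every 5th step

# Hypothetical per-step usage: a slow trend plus random short-term fuzz.
def usage_at(step):
    base = 10 + (step % 100) // 10               # slow, repeating trend
    return max(0, base + random.randint(-2, 2))  # random fuzz on top

steps = [usage_at(s) for s in range(10_000)]

def relative_error(window):
    """Compare true usage-time with the sampled estimate over one window."""
    actual = sum(steps[:window])
    # Each sample is treated as lasting the whole sampling interval.
    sampled = sum(steps[0:window:interval]) * interval
    return abs(sampled - actual) / actual

print(f"short window error: {relative_error(50):.1%}")
print(f"long window error : {relative_error(10_000):.1%}")
```

Over a short window, the random fuzz can push the sampled estimate noticeably off; over a long window, the over- and under-estimates largely cancel and the error shrinks toward zero.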

Individual use

When reporting data for specific users, we need to be more careful, since we will only see each user occasionally if checkout times are short.

As a first estimate, when the checkout time is 1/N of the sampling interval, we need, on average, N samples to catch each user. In an extreme case where a special function is checked out for only a single second every 5 minutes in order to perform some form of calculation, our default 5-minute (300-second) sampling will need 300 samples on average to catch the user. Since there are just under 100 sampling periods in an 8-hour working day, we would sample each individual user only one or two times per week. If the function is used less often, i.e. just a few times a day instead of every 5 minutes, then most individual users will never be sampled in an average week or month, and the max-in-use values will almost certainly be wrong.
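The arithmetic behind this estimate can be checked directly with the numbers from the example:

```python
# A quick check of the 1/N estimate using the numbers from the text.
checkout_s = 1      # extreme case: a 1-second checkout
interval_s = 300    # default 5-minute sampling interval

# Probability that any one sample lands inside the checkout window:
p = checkout_s / interval_s

# On average we need 1/p samples to catch the user (geometric distribution).
expected_samples = interval_s / checkout_s   # = 1/p = 300

# An 8-hour working day holds just under 100 sampling periods:
per_day = 8 * 3600 // interval_s             # 96 periods
per_week = 5 * per_day                       # 480 periods

print("expected samples to catch the user:", expected_samples)
print(f"expected catches per week: {per_week * p:.1f}")
```

With 480 sampling periods in a 5-day week and 300 samples needed on average, each such user is caught roughly 1.6 times per week, matching the "one or two times per week" estimate above.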

On the other hand, if usage ever reaches the maximum number of licenses, then all following samples are likely to show this value or one very close to it, so the customer will still see the effect of denials.

Learn more about how Open iT meters the true usage of your engineering software assets. Let us take you through a guided tour on Engineering Software Management.