[rrs-users] Re: rrs variable-length months

Jim C. Nasby decibel at rrs.decibel.org
Mon Apr 18 06:11:37 GMT 2005


I'm cc'ing rrs-users at rrs.decibel.org, since others might be interested
in this. You might want to subscribe.

On Thu, Apr 14, 2005 at 06:11:53AM +0100, Tom Flavel wrote:
> Hi,
> 
> I'm looking to log bandwidth and similar types of data, and RRS looks
> like it'll do roughly what I need.
> 
> My data comes from SNMP: I wrote a function for pgsql which goes and
> gets the data (internally it finds the IPs and passwords for SNMP, etc.
> from other tables), and materialises it as a table, so you can "SELECT
> getsnmpdata()" and be presented with the various rows. This is entirely
> irrelevant, but I thought you might be interested.

Actually, that's an interesting approach. I'd like to include your code
if I can. I guess it's time for a demo directory or something...
 
> Now, I'd like to take 95th percentiles of the data per month. Month
> lengths vary from month to month. My thinking is that if the RRD has
> more than enough rows, I can just ignore the last couple for shorter
> months (right?).

Actually, 'last month' isn't a very accurate label. It should really be
something like 'last 4 weeks'. Right now, the only way you could do a
real 'last month' would be to pull directly from the source data one
month at a time, which wouldn't be terribly useful.
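To put numbers on why a fixed window can't stand in for a calendar month, here's a small Python sketch. It assumes the 'last month' rrs keeps 168 buckets of 4 hours each (the figures given in the next paragraph), so the window is 28 days. The function names are made up for illustration; they aren't part of RRS.

```python
import calendar
from datetime import timedelta

# Assumed from the 'last month' rrs definition: 168 buckets, 4 hours each.
BUCKETS_KEPT = 168
BUCKET_LENGTH = timedelta(hours=4)

def buckets_needed(year: int, month: int) -> int:
    """Number of 4-hour buckets needed to cover a whole calendar month."""
    days = calendar.monthrange(year, month)[1]  # days in that month
    # 4 hours divides a day evenly, so floor division is exact here.
    return timedelta(days=days) // BUCKET_LENGTH

def window_covers_month(year: int, month: int) -> bool:
    """True if the fixed 168-bucket window is long enough for the month."""
    return buckets_needed(year, month) <= BUCKETS_KEPT

if __name__ == "__main__":
    print("window:", BUCKETS_KEPT * BUCKET_LENGTH)  # 28 days
    for m in range(1, 13):
        print(2005, m, buckets_needed(2005, m), window_covers_month(2005, m))
```

With these settings only a 28-day February fits; a 31-day month would need 186 buckets, so ignoring the trailing rows for shorter months doesn't help with the longer ones.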

If you look at the 'last month' entry in the rrs table, you'll see it's
defined to keep 168 buckets. Each bucket is 4 hours long, although the
time_per_bucket field is only used by the code if the rrs doesn't have a
parent. If you want to know how long each bucket *really* is, you need
to trace back through the rrs's parents until you come to the top level.
In this case, the 'last month' rrs (normally rrs_id 6) feeds off rrs_id
4 (last day), which feeds off rrs_id 1 (last hour), which is the top
level. Each bucket in rrs_id 1 is 1 minute long. Each bucket in rrs_id 4
is the aggregate of 30 buckets of rrs_id 1, or 30 minutes. Each bucket
in rrs_id 6 is the aggregate of 8 buckets of rrs_id 4: 8 * 30 minutes =
4 hours, so the 168 buckets cover 28 days.
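Tracing back through the parents can be sketched in Python as below. The Rrs class and its parent_id and parent_buckets fields are hypothetical stand-ins for however the rrs table actually records a parent and its aggregation factor; only time_per_bucket is a real field name here. The numbers are the ones from the example above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional

@dataclass
class Rrs:
    """Hypothetical, simplified view of one row of the rrs table."""
    rrs_id: int
    parent_id: Optional[int] = None   # None means this is the top level
    parent_buckets: int = 1           # parent buckets per bucket (assumed field)
    time_per_bucket: Optional[timedelta] = None  # only honored at top level

def effective_bucket_length(rrs_id: int, table: Dict[int, Rrs]) -> timedelta:
    """Trace back through the parents, multiplying aggregation factors,
    until reaching the top-level rrs whose time_per_bucket is used."""
    factor = 1
    rrs = table[rrs_id]
    while rrs.parent_id is not None:
        factor *= rrs.parent_buckets
        rrs = table[rrs.parent_id]
    if rrs.time_per_bucket is None:
        raise ValueError(f"top-level rrs {rrs.rrs_id} has no time_per_bucket")
    return rrs.time_per_bucket * factor

# The chain from the example: 1 minute -> 30 minutes -> 4 hours.
EXAMPLE = {
    1: Rrs(1, time_per_bucket=timedelta(minutes=1)),    # last hour
    4: Rrs(4, parent_id=1, parent_buckets=30),          # last day
    6: Rrs(6, parent_id=4, parent_buckets=8,            # last month
           time_per_bucket=timedelta(hours=4)),         # ignored: has a parent
}

if __name__ == "__main__":
    length = effective_bucket_length(6, EXAMPLE)
    print(length)        # 4:00:00
    print(168 * length)  # 28 days, 0:00:00
```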

Eventually I'd like to have the ability to do an actual last month, but
I haven't figured out any good way to do it.

How were you thinking of doing a 95th percentile calculation? I think
that would be an interesting example as well.
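For reference, one common way to compute it is the nearest-rank method, sketched here in Python: sort the samples and take the value at rank ceil(0.95 * n), which throws away the top 5%. This is a generic sketch, not something RRS provides, and other percentile definitions (for example, interpolating between ranks) give slightly different answers.

```python
import math
from typing import Sequence

def percentile_nearest_rank(samples: Sequence[float], pct: float = 95.0) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

if __name__ == "__main__":
    # Made-up data: 100 readings of 1..100, so the top 5 are dropped.
    readings = list(range(1, 101))
    print(percentile_nearest_rank(readings))  # 95
```

If the buckets store averages, running this over 4-hour buckets rather than the raw samples will usually give a lower figure, because averaging smooths out short peaks.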

> I assume RRS keeps the number of the most recently updated row
> somewhere, but I can't for the life of me find it... or does the buckets
> approach (which I'm not entirely sure I follow) mean something entirely
> unrelated is happening there?

Take a look at the source_status table. It records the end time of the
last run for each source and rrs, and you can use that to find the
specific bucket.
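Once you have that end time, finding a particular bucket is simple arithmetic. The Python sketch below assumes the buckets are contiguous, all the same length (the effective length you get by tracing back through the parents), and that the newest bucket ends exactly at the recorded end time. It also treats each bucket as covering (start, end]. Those are assumptions about the layout rather than documented RRS behavior, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def recent_buckets(last_end: datetime, bucket_length: timedelta,
                   count: int) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) for the `count` newest buckets, newest first.

    Assumes contiguous, equal-length buckets, with the newest one ending
    at `last_end` (e.g. the end time read from source_status).
    """
    return [
        (last_end - (k + 1) * bucket_length, last_end - k * bucket_length)
        for k in range(count)
    ]

def buckets_back(t: datetime, last_end: datetime,
                 bucket_length: timedelta) -> int:
    """How many buckets before the newest one contains time `t`
    (0 = newest), treating each bucket as covering (start, end]."""
    if t > last_end:
        raise ValueError("t is after the last processed end time")
    return (last_end - t) // bucket_length

if __name__ == "__main__":
    end = datetime(2005, 4, 18, 4, 0)
    print(recent_buckets(end, timedelta(hours=4), 2))
    print(buckets_back(datetime(2005, 4, 17, 23, 0), end, timedelta(hours=4)))
```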

> Any pointers are much appreciated, or advice for other approaches.
> 
> 
> (I'll submit patches when I'm done, if you want them: I'm guessing
> somebody else may want this ability, too)

BTW, there's something you should be aware of in version 0.4: it has code
to throttle updates when it's running behind. Previously an update could
take hours or more to happen if it was far behind; 0.4 limits how much
data is processed. The problem is that if you add a new RRS to an existing
system, the update of that new RRS won't work properly and it will miss a
lot of data. If you need this functionality, you should probably stick
with 0.3 for now (or help me fix the update :P).
-- 
Jim C. Nasby, Database Consultant               decibel at decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


----- End forwarded message -----
