[rrs-users] Re: rrs variable-length months

Jim C. Nasby decibel at rrs.decibel.org
Tue Apr 19 21:31:14 GMT 2005


On Mon, Apr 18, 2005 at 07:44:20AM +0100, Tom Flavel wrote:
> I was hoping to make an aggregate function. I think I can have it store
> each number as an array, in the current state for the aggregate, and
> then sort and return the 5%th element when it's finished. I'm a little
> unsure about how to access arrays via postgresql's SAPI, though, or if
> they're intended to be sorted.
> 
> As far as I understand, an array is arranged consecutivley in memory, so
> I think qsort() will suffice.
> 
> Maybe inserting in order would be better, or maybe just using the state
> for storing the OID of some tempoary table would turn out more effecient;
> suggestions welcome.

Actually, you could probably just do a couple of limit queries to get
the 95% percent. Just figure out how many rows there are, and from that
which exact row you need.

> > > I assume RRS keeps the number of the most recently updated row
> > > somewhere, but I cant for the life of me find it... or does the buckets
> > > approach (which I am not entirley sure I follow) mean something entirley
> > > unrelated is happening there?
> > 
> > Take a look at the source_status table. It's got the last end time run
> > for each source and rrs. You can use that info to find the specific
> > bucket.
> 
> I thought simply LIMITing the retrieved data by however many days are in the
> given month (which would need to be passed in as some parameter, which means
> a special-case function prototype :/), assuming the data has enough rows for
> all possible month lengths?
> 
> Or, if we can just SELECT WHERE start...end, then great. I'm still not
> entirley sure how that's structured, or where I'm supposed to be
> looking, though.

To get the last updated bucket for each source:
SELECT ss.source_id, b.* FROM rrs.bucket b JOIN rrs.source_status ss ON
(b.rrs_id = ss.rrs_id AND b.end_time = ss.last_end_time);

> > > Any pointers are much appreciated, or advice for other approaches.
> > > 
> > > 
> > > (I'll submit patches when I'm done, if you want them: I'm guessing
> > > somebody else may want this ability, too)
> > 
> > BTW, something you should be aware of in version 0.4. It's got code to
> > throttle updates when it's running behind. Previously it could take
> > hours or more for an update to happen if it was far behind. 0.4 will
> > limit how much data is processed. But the problem is that if you add a
> > new RRS to an existing system, the update of that new RRS won't work
> > properly and it will miss a lot of data. If you need this functionality
> > you should probably stick to 0.3 for now (or help me fix the update :P).
> 
> Thanks for the warning... I didn't realise it'd miss data for
> newly-added RRSes. I'll be creating and removing them frequently, as
> customers join and leave (hopefully not many of the latter :)
> 
> Ideally, I'd like to fix that, if I can. My plpgpsql skills are
> non-existant, though: everything I've written is either pure SQL or C.

Yeah, it's on my to-do list, I just haven't gotten to it yet. Before
doing it I need to clean the code up, first. Basically I think I'll be
adding a first_end_time to source_status and using that to determine
what data to process.
-- 
Jim C. Nasby, Database Consultant               decibel at decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


More information about the rrs-users mailing list