rrs-commit: r62 - trunk

decibel at decibel.org decibel at decibel.org
Wed Mar 30 23:33:11 GMT 2005

Author: decibel
Date: Wed Mar 30 23:33:10 2005
New Revision: 62


Modified: trunk/README
--- trunk/README	(original)
+++ trunk/README	Wed Mar 30 23:33:10 2005
@@ -71,6 +71,86 @@
 I've omitted the details of how data is grouped into time buckets from the
 SELECT statements, but that is part of the code that isn't configurable. 
+The throttling system works by processing limited time periods of raw input
+data. Previously, all data that was available would be processed at once.
+Any time RRS detects that it is 'behind', it will engage the throttling code.
+There is a setting ('desired run time', see below) that controls the maximum
+amount of time that RRS should run when throttling. Note that this isn't a hard
+limit, and that the commit time at the end of the run cannot be accounted for.
+This commit time can be quite substantial, and 'desired run time' must be
+adjusted to account for it.
+The throttling code looks at the run time history of the runs in which
+throttling was active. This history is stored in the history_data_interval
+table. In addition to logging how long each run took, this table also logs what
+data interval was used. Using these two pieces of information, RRS calculates
+the ratio of data interval to run time, and multiplies that by the desired run
+time. This provides a new target data interval. This is done for each row of
+history_data_interval (it's actually done in the
+history_data_interval_run_time_v view), and the average of the target data
+intervals (next_data_interval in the view) is used as the amount of data to
+process in this run.
+It's not quite this simple, though. Because this is a very crude method of
+estimation, it can easily produce numbers that are badly out of whack, so
+limits are imposed. There are hard limits for the data interval, both minimum
+and maximum. These ensure that RRS always processes at least some data, and
+that RRS doesn't process too much data. There is also a slew-rate limit that
+prevents the data interval from being increased by more than 3x what it was
+for the last run.
+There are some settings that control the operation of RRS:
+decibel=# select * from setting;
+         setting_name         |  setting   
+------------------------------+------------
+ desired run time             | 30 seconds
+ minimum data interval        | 00:08:20
+ maximum data interval        | 13:53:20
+ history length               | 100
+ history_data_interval length | 20
+'Data interval' refers to the interval of data that will be processed from an
+input table. RRS updates in chunks by limiting the amount of data that will be
+processed from the source table.
+desired run time                This is the maximum amount of time you would like a run to take.
+minimum data interval           The minimum amount of source data to process at one time.
+maximum data interval           The maximum amount of source data to process at one time.
+history length                  How many runs of RRS to keep history for.
+history_data_interval length    How many throttled runs of RRS to keep history for.
+Initially, these aren't set at all. The first time update is run, it (well,
+actually calculate_run_time(..) and log_time(..)) will set them to the
+following default values:
+desired run time                30 seconds
+minimum data interval           desired run time * 10
+maximum data interval           minimum data interval * 100
+history length                  100
+history_data_interval length    20
+These defaults are reasonable for most hardware if RRS is being run once a
+The functions setting_get(setting name (text)) and setting_set(setting name
+(text), setting value (text)) are provided for getting and changing settings.
+You may also operate on the table directly. For example, the following might be
+good if you only run RRS every 10 minutes:
+DELETE FROM rrs.setting;
+SELECT rrs.setting_set('desired run time', '6 minutes'); -- Remember to allow for commit time
+SELECT rrs.setting_set('minimum data interval', '10 minutes');
+Note that RRS will always process at least one bucket interval on a run, no
+matter what the throttling minimum is. This ensures that at least some work is
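
The interval calculation described in the README text above can be sketched
in a few lines. This is only an illustration, not the RRS implementation: the
real calculation lives in the database (the history_data_interval_run_time_v
view), and the function name, its parameters, the order in which the limits
are applied, the fallback for an empty history, and the reading of the 3x
slew-rate limit as "at most 3 times the last interval" are all assumptions
made for this example.

```python
from datetime import timedelta


def next_data_interval(history, desired_run_time, minimum, maximum,
                       last_interval=None, max_growth=3):
    """Estimate how much source data to process in the next throttled run.

    history is a list of (data_interval, run_time) timedelta pairs, one per
    earlier throttled run. All names here are hypothetical.
    """
    # Per-row target: (data interval / run time) * desired run time.
    targets = [
        desired_run_time.total_seconds() * (data_interval / run_time)
        for data_interval, run_time in history
        if run_time > timedelta(0)  # skip rows that would divide by zero
    ]
    if not targets:
        # Assumption: with no usable history, start from the minimum.
        return minimum

    # The average of the per-row targets is the unclamped estimate.
    target = timedelta(seconds=sum(targets) / len(targets))

    # Slew-rate limit (assumed reading: no more than 3x the last interval).
    if last_interval is not None:
        target = min(target, last_interval * max_growth)

    # Hard limits: never below the minimum, never above the maximum.
    return max(minimum, min(target, maximum))


if __name__ == "__main__":
    runs = [
        (timedelta(minutes=10), timedelta(seconds=20)),
        (timedelta(minutes=20), timedelta(seconds=60)),
    ]
    print(next_data_interval(
        runs,
        desired_run_time=timedelta(seconds=30),
        minimum=timedelta(minutes=8, seconds=20),
        maximum=timedelta(hours=13, minutes=53, seconds=20),
    ))
```

With the two sample runs, the per-row targets are 900 and 600 seconds, so
the estimate is their average, 750 seconds, which lies inside the hard
limits and is returned unchanged.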
