Evaluating user-defined conditions

Chris Jack chris_jack at msn.com
Wed Jun 11 13:41:28 BST 2014


Roger Bell_West <roger at firedrake.org> wrote:

>On Tue, Jun 10, 2014 at 11:59:57AM +0100, Chris Jack wrote:
>>Can I suggest you consider including some rudimentary idea of "cost"
>>when you're deciding whether to allow the query to run or not. "Cost"
>>could be in terms of anticipated rows returned and/or total
>>anticipated CPU time.

>Yeah, it shouldn't be that hard to write code to evaluate an
>expression and work out how long it'll take to run.

A very rudimentary approach would be to disallow expressions without an equality on a column (or better at least one column from a list of columns with known good selectivity).

A better approach (closer to what cost based databases do) would be to create a size n histogram of values of all columns that can be in the restriction: n = 20 is typical. Based on this, you can approximate how many rows both equalities and inequalities would probably return. A better solution might consider column value correlation.

Some databases have features that allow you to grab the estimated run time from the cost based optimiser without having to run the query for real. It would be very quick to implement something that grabbed that but obviously it requires a database with the feature.

Clearly whether it's worthwhile at all depends on the maximum number of rows that can be returned, how much of a problem it is if long queries run and the amount of development time available.

>From memory, commercial reporting tools like Business Objects allow specification of a maximum run time/row count restriction so looking into Dave's proposal if you want to implement an equivalent might be the go. For a robust solution, this is probably essential: but it has the obvious downside that the query still had to run for the maximum time. So it's better to also identify badly performing queries ahead of time.

Users cannot be trusted. Even if it's work, unless a long running query is not possible for other reasons, users will create queries that run a long time.

Chris 		 	   		  


More information about the london.pm mailing list