query equivalent: optimize

Showing posts with label optimize. Show all posts

Friday, March 9, 2012

Looking to optimize a query

Hi, I have a query which is rather heavy which calculates commissions. It
is essentially a table of sales which relate to a table of commission
ranges and exceptions. Due to the way in which it works using left/right
joins does not work.
So, what I have done is a select in the following way
select
field1, field2, field3,
(select top 1 comission from comm
where key1=field2 or key2=field2 or key3=field3
order by nnnn)
from sales
and this works. However, I have to select more than one field from the comm
table, so I have
select
field1, field2, field3,
(select top 1 comission from comm
where key1=field2 or key2=field2 or key3=field3
order by nnnn),
(select top 1 range from comm
where key1=field2 or key2=field2 or key3=field3
order by nnnn),
(select top 1 comissionname from comm
where key1=field2 or key2=field2 or key3=field3
order by nnnn)
from sales
so in essence I am matching a row from the sales table to a single row in
the commisions table and taking three fields from the commisions table.
my problem ? - performance.
Is there any better way to do this? Is there a syntax to pull more than one
field from a subselect into the main query? Fields comission, range and
comissionname are all from the same row.
Alternatly is there any way to get a right join to only join the first row
which matches and not all ?
Regards
Ian MurphyIan Murphy wrote:
> Hi, I have a query which is rather heavy which calculates commissions. It
> is essentially a table of sales which relate to a table of commission
> ranges and exceptions. Due to the way in which it works using left/right
> joins does not work.
> So, what I have done is a select in the following way
> select
> field1, field2, field3,
> (select top 1 comission from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn)
> from sales
> and this works. However, I have to select more than one field from the comm
> table, so I have
> select
> field1, field2, field3,
> (select top 1 comission from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn),
> (select top 1 range from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn),
> (select top 1 comissionname from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn)
> from sales
> so in essence I am matching a row from the sales table to a single row in
> the commisions table and taking three fields from the commisions table.
> my problem ? - performance.
> Is there any better way to do this? Is there a syntax to pull more than one
> field from a subselect into the main query? Fields comission, range and
> comissionname are all from the same row.
> Alternatly is there any way to get a right join to only join the first row
> which matches and not all ?
> Regards
> Ian Murphy
Haven't had my caffeine injection yet this morning, but I think
something like this is what you want. You should also review the
execution plan to look for potential bottlenecks:
SELECT
sales.field1,
sales.field2,
sales.field3,
comm.commission,
comm.range,
comm.commissionname
FROM sales sales
LEFT JOIN comm comm
ON
(
sales.field1 = comm.key1
OR sales.field2 = comm.key2
OR sales.field3 = comm.key3
)
WHERE comm.nnnnn = (
SELECT MIN(nnnnn)
FROM comm
WHERE
(
sales.field1 = comm.key1
OR sales.field2 = comm.key2
OR sales.field3 = comm.key3
)
)
Tracy McKibben
MCDBA
http://www.realsqlguy.com|||Ian,
There are a few options. Here is one of them...
SELECT field1, field2, field3, comission, range, comissionname
FROM comm
INNER JOIN (
SELECT key1, key2, key3, MIN(nnnn) AS min_nnnn
FROM comm
GROUP BY key1, key2, key3
) T
ON T.key1 = comm.key1
AND T.key2 = comm.key2
AND T.key3 = comm.key3
AND T.min_nnnn = comm.nnnn
RIGHT JOIN sales
ON field1 = comm.key1
AND field2 = comm.key2
AND field3 = comm.key3
Hope this helps,
Gert-Jan
Ian Murphy wrote:
> Hi, I have a query which is rather heavy which calculates commissions. It
> is essentially a table of sales which relate to a table of commission
> ranges and exceptions. Due to the way in which it works using left/right
> joins does not work.
> So, what I have done is a select in the following way
> select
> field1, field2, field3,
> (select top 1 comission from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn)
> from sales
> and this works. However, I have to select more than one field from the comm
> table, so I have
> select
> field1, field2, field3,
> (select top 1 comission from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn),
> (select top 1 range from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn),
> (select top 1 comissionname from comm
> where key1=field2 or key2=field2 or key3=field3
> order by nnnn)
> from sales
> so in essence I am matching a row from the sales table to a single row in
> the commisions table and taking three fields from the commisions table.
> my problem ? - performance.
> Is there any better way to do this? Is there a syntax to pull more than one
> field from a subselect into the main query? Fields comission, range and
> comissionname are all from the same row.
> Alternatly is there any way to get a right join to only join the first row
> which matches and not all ?
> Regards
> Ian Murphy|||hmm, hadn't thought of using a subselect in the where condition. I'll have
a go and see if it speeds things up a bit.
Thanks
Ian|||On Mon, 17 Jul 2006 07:17:28 -0500, Tracy McKibben wrote:
>Haven't had my caffeine injection yet this morning, but I think
>something like this is what you want.
Hi Tracy,
Now that you've had your first cup of coffee, I'm sure you'll agree that
Ian will have to replace WHERE with AND in order to make it work as
required.
> You should also review the
>execution plan to look for potential bottlenecks:
>SELECT
> sales.field1,
> sales.field2,
> sales.field3,
> comm.commission,
> comm.range,
> comm.commissionname
>FROM sales sales
>LEFT JOIN comm comm
> ON
> (
> sales.field1 = comm.key1
> OR sales.field2 = comm.key2
> OR sales.field3 = comm.key3
> )
>WHERE comm.nnnnn => (
> SELECT MIN(nnnnn)
> FROM comm
> WHERE
> (
> sales.field1 = comm.key1
> OR sales.field2 = comm.key2
> OR sales.field3 = comm.key3
> )
> )
Hugo Kornelis, SQL Server MVP|||Hugo Kornelis wrote:
> On Mon, 17 Jul 2006 07:17:28 -0500, Tracy McKibben wrote:
>> Haven't had my caffeine injection yet this morning, but I think
>> something like this is what you want.
> Hi Tracy,
> Now that you've had your first cup of coffee, I'm sure you'll agree that
> Ian will have to replace WHERE with AND in order to make it work as
> required.
>
No, actually I intended that to be a WHERE clause. I think in this
instance it will work either way. '
Tracy McKibben
MCDBA
http://www.realsqlguy.com|||On Tue, 18 Jul 2006 15:32:16 -0500, Tracy McKibben wrote:
>Hugo Kornelis wrote:
>> On Mon, 17 Jul 2006 07:17:28 -0500, Tracy McKibben wrote:
>> Haven't had my caffeine injection yet this morning, but I think
>> something like this is what you want.
>> Hi Tracy,
>> Now that you've had your first cup of coffee, I'm sure you'll agree that
>> Ian will have to replace WHERE with AND in order to make it work as
>> required.
>No, actually I intended that to be a WHERE clause. I think in this
>instance it will work either way. '
Hi Tracy,
If every row in the sales table has at least one matching row in the
comm table, then it will work either way - but in that case, I see no
reason to use the LEFT JOIN instead of an INNER JOIN.
If there might be rows in the sales tabel without matching rows in the
comm table, then the LEFT JOIN will retain those rows, with NULL for all
values from the comm table (just as in Ian's original query) - but the
WHERE clause will throw them out after all, since it attempts to compare
NULL to the result of a subquery - and a comparison that involves NULL
can never evaluate to True.
Of course, since Ian didn't post anything to test it with, we don't know
if there can be any sales without commission in his database. :-)
--
Hugo Kornelis, SQL Server MVP|||Hugo Kornelis wrote:
> If every row in the sales table has at least one matching row in the
> comm table, then it will work either way - but in that case, I see no
> reason to use the LEFT JOIN instead of an INNER JOIN.
> If there might be rows in the sales tabel without matching rows in the
> comm table, then the LEFT JOIN will retain those rows, with NULL for all
> values from the comm table (just as in Ian's original query) - but the
> WHERE clause will throw them out after all, since it attempts to compare
> NULL to the result of a subquery - and a comparison that involves NULL
> can never evaluate to True.
> Of course, since Ian didn't post anything to test it with, we don't know
> if there can be any sales without commission in his database. :-)
>
Doh! I see what you're talking about now... You're right, that belongs
in the join...
Tracy McKibben
MCDBA
http://www.realsqlguy.com|||There can indeed be sales without commissions. This is the easy part to
deal with. What has caused me a nightmare with this query is that the
commission is calculated depending on the discount. But the commission can
be different for the product, product family or the client in question. AND
it is applied in ranges, so a discount of 5% will earn 10% commission but a
discount of 15% will earn 3% and so on.
There is a simple order of priority to the above.
Thus for each line there are lots of possible commissions which could apply
(or none) but only the first should be selected, hence my solution of
select field1, field2, field3,
(select top 1 commission from comm table where xxxxxxx order by
zzzz,yyy,xxx)
but maybe I have to take another look at using a join.
Ian|||Unfortunatly I already tried that route. I can't use the min() function
because there is no row which will have a min value. The order of the
subselect will be by family, productid and client ref, meaning I don't have
a numeric or alphanumeric value which will be the lowest.
I can order the subselect so that the value I am after is the top row but
can't then pick it for the join.
I think my solution will be to subselect an index value from the comms
table and use that for the left join.
Thanks
Ian|||Well I've managed an improvement, but its not great. I was unable to use
either your suggestion or Tracy's as they both relied on min() to select a
row, which doesn't work in my case. The only was I see to select the
commision row I want is to order the select result and select top 1
As it happens the comm table contains an index and I so I do
select field1, field2, field3,
(select top 1 indexfield from comm where xxx order by aaa,bbb,ccc) as
index
which I save as a view
and I then have another view which does
select * from view1 left join comm on view1.index=comm.index
It runs about 3-4 times faster but is still slow due to the above select
top 1.
Not sure there is really a better way to achieve what I'm trying to do.
Ian|||If you have multiple columns to order by, then it is still possible to
use the method I posted (using derived tables), but it will be more
complex, and might not give the desired performance gain. The derived
table would look something like this:
SELECT JoinKeys, family, productid, MIN("client ref") as "client ref"
FROM ..
INNER JOIN (
SELECT JoinKeys, family, MIN(productid) as min_productid
FROM ...
INNER JOIN (
SELECT JoinKeys, MIN(family) as min_family
FROM ..
GROUP BY JoinKeys
) T1 ON ..JoinKeys = ..JoinKeys AND ..family = ..min_family
GROUP BY JoinKeys, family
) T2 ON ..JoinKeys = ..JoinKeys AND ..family = ..family AND
..productid = ..min_productid
BTW: If you post incomplete queries then you can expect solutions that
are not usuable in your situation.
Gert-Jan
Ian Murphy wrote:
> Unfortunatly I already tried that route. I can't use the min() function
> because there is no row which will have a min value. The order of the
> subselect will be by family, productid and client ref, meaning I don't have
> a numeric or alphanumeric value which will be the lowest.
> I can order the subselect so that the value I am after is the top row but
> can't then pick it for the join.
> I think my solution will be to subselect an index value from the comms
> table and use that for the left join.
> Thanks
> Ian|||Ian,
Another general approach (since you didn't give specifics for a repro) is
select
f1, f2, f3,
cast(substring(commission_range_commissionname,1,25) as money) as
commission,
rtrim(substring(commission_range_commissionname,26,2)) as range,
rtrim(substring(commission_range_commissionname,28,30)) as commissionname
from (
select
f1, f2, f3,
(
select top 1
cast(commission as char(25)) +
cast(range as char(2)) +
cast(commissionname as char(30))
from comm
where key1=f1 or key2=f2 or key3= f3
order by nnnn
) as commission_range_commissionname
) as T
You could probably also solve this in SQL Server 2005 using the APPLY
operator.
Steve Kass
Drew University
www.stevekass.com
"Ian Murphy" <ian@.integra.antispam.-xp.com> wrote in message
news:o9rgie851g1p.icxpoxp9ubv2.dlg@.40tude.net...
> Well I've managed an improvement, but its not great. I was unable to use
> either your suggestion or Tracy's as they both relied on min() to select a
> row, which doesn't work in my case. The only was I see to select the
> commision row I want is to order the select result and select top 1
> As it happens the comm table contains an index and I so I do
> select field1, field2, field3,
> (select top 1 indexfield from comm where xxx order by aaa,bbb,ccc) as
> index
> which I save as a view
> and I then have another view which does
> select * from view1 left join comm on view1.index=comm.index
> It runs about 3-4 times faster but is still slow due to the above select
> top 1.
> Not sure there is really a better way to achieve what I'm trying to do.
> Ian
>|||Thanks for the suggestion steve. I had thought of doing something similar
but didn't try it as I thought it would rather heavy and I was sure there
had to be a more elegant solution. Converting to strings and back again
seems a lot of unnecessary work.|||On Wed, 19 Jul 2006 14:40:06 +0200, Ian Murphy wrote:
>There can indeed be sales without commissions. This is the easy part to
>deal with. What has caused me a nightmare with this query is that the
>commission is calculated depending on the discount. But the commission can
>be different for the product, product family or the client in question. AND
>it is applied in ranges, so a discount of 5% will earn 10% commission but a
>discount of 15% will earn 3% and so on.
>There is a simple order of priority to the above.
>Thus for each line there are lots of possible commissions which could apply
>(or none) but only the first should be selected, hence my solution of
>select field1, field2, field3,
> (select top 1 commission from comm table where xxxxxxx order by
>zzzz,yyy,xxx)
>but maybe I have to take another look at using a join.
Hi Ian,
It might help if you could post CREATE TABLE statements to show us the
structure of your tables (please do include all constraints, indexes,
and properties), INSERT statements with some sample data to illustrate
the problem, and expected results. That would probably give you better
answers faster than continuing this game of twenty questions.
--
Hugo Kornelis, SQL Server MVP|||i didn't post either the query or the table structure because it involves
- a db written in france with slightly cryptic column names in french
- a couple sub queries pulling data from 5 or 6 tables
- A main query pulling data from 4 tables/queries
- output column names in spanish
- its rather a large query in any case
so it didn't seem to make much sense, its unlikely anyone would have been
much the wiser due to the rather cryptic column names.
Ian

Saturday, February 25, 2012

Looking for query plan determination information

Hi,

I was just helping a coworker optimize a query. He had two versions:
one which used UNION for each value for which he was tallying results
and another query which used GROUP BY. Here is an aproximation of what
they were:

Query #1:
---
SELECT 12 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
AND T.col2 = 5
UNION
SELECT 13 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
AND T.col2 = 6

Query #2:
---
SELECT R.row_num AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
INNER JOIN Report_Rows R ON R.col2 = T.col2
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
GROUP BY ALL R.row_num
ORDER BY R.row_num

The Report_Rows table in this case would have had two rows mapping row
12 to a column value of 5 and row 13 to a column value of 6. The
second query was performing horribly until I noticed the ALL keyword
in the GROUP BY, which I didn't think was necessary. When I removed
that it performed more like I expected it to perform.

Before I had noticed that I was scouring over the query plans and
couldn't figure out why in one instance the query optimizer chose to
join My_Table and Another_Table, yet when the ALL keyword was there it
chose to return all of the records from Another_Table (a rather large
table) and join it to the Report_Rows table before then joining to
My_Table, which had the date criteria in the WHERE clause.

So, if you've read this far without giving up...

1. Why would the ALL keyword cause this? I understand the
functionality of ALL, but I still don't see why that caused the
reordering of the joins.

2. (more importantly) Are there any good resources that you know of
that explain how the query optimizer choices its query paths? Do the
"Inside SQL Server" books go into that much detail? Any good online
resources?

Thanks!
-Tom."Thomas R. Hummel" <tom_hummel@.hotmail.com> wrote in message
news:a2c0eeb8.0309160626.901177@.posting.google.com ...
> Hi,
> I was just helping a coworker optimize a query. He had two versions:
> one which used UNION for each value for which he was tallying results
> and another query which used GROUP BY. Here is an aproximation of what
> they were:
> Query #1:
> ---
> SELECT 12 AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> AND T.col2 = 5
> UNION
> SELECT 13 AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> AND T.col2 = 6
> Query #2:
> ---
> SELECT R.row_num AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> INNER JOIN Report_Rows R ON R.col2 = T.col2
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> GROUP BY ALL R.row_num
> ORDER BY R.row_num
> The Report_Rows table in this case would have had two rows mapping row
> 12 to a column value of 5 and row 13 to a column value of 6. The
> second query was performing horribly until I noticed the ALL keyword
> in the GROUP BY, which I didn't think was necessary. When I removed
> that it performed more like I expected it to perform.
> Before I had noticed that I was scouring over the query plans and
> couldn't figure out why in one instance the query optimizer chose to
> join My_Table and Another_Table, yet when the ALL keyword was there it
> chose to return all of the records from Another_Table (a rather large
> table) and join it to the Report_Rows table before then joining to
> My_Table, which had the date criteria in the WHERE clause.
> So, if you've read this far without giving up...
> 1. Why would the ALL keyword cause this? I understand the
> functionality of ALL, but I still don't see why that caused the
> reordering of the joins.
> 2. (more importantly) Are there any good resources that you know of
> that explain how the query optimizer choices its query paths? Do the
> "Inside SQL Server" books go into that much detail? Any good online
> resources?
> Thanks!
> -Tom.

It's almost impossible (at least for me) to know why the optimizer chose a
particular plan without knowing the table structures, indexes and amount of
data, and even with that knowledge, it may not be clear at all. So I can't
say much about your first question, but I can definitely recommend Inside
SQL Server 2000 for a great explanation of what the optimizer considers when
it produces a query plan. There's a lot of detail, including how to go about
using query plans to tune individual queries. Another useful book is
Advanced Transact SQL for SQL Server 2000, which also explains many of the
examples with reference to their query plans.

Simon|||Thomas R. Hummel (tom_hummel@.hotmail.com) writes:
> Before I had noticed that I was scouring over the query plans and
> couldn't figure out why in one instance the query optimizer chose to
> join My_Table and Another_Table, yet when the ALL keyword was there it
> chose to return all of the records from Another_Table (a rather large
> table) and join it to the Report_Rows table before then joining to
> My_Table, which had the date criteria in the WHERE clause.

I can only echo Simon's reply that without table definitions etc, this
is difficult to tell. In fact, even with all information available,
this might be difficult to tell. Understanding the output of a cost-
based optimizer is by no means an easy task.

> 2. (more importantly) Are there any good resources that you know of
> that explain how the query optimizer choices its query paths? Do the
> "Inside SQL Server" books go into that much detail? Any good online
> resources?

Certainly, you learn a great deal from Kalen's book. But I also like to
add that that experience counts a lot too. And some creative thinking.
The basic thing to understand is why a table scan may be better than
an index seek. This is something which also can be extended to joins.
That is a scan + merge/hash join may be faser than seek + loop join.

But then there are all such wild things which includes parallelism that
I find myself understanding only fragments of.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thank you both for the input. I had tried to duplicate the effect with
test tables, but as you know, the query optimizer takes a lot into
account and I couldn't find an example that would be practical for
posting here.

I will give the two books that Simon suggested a more thorough read.
I've seen adverts for Kalen's online webinars as well, so perhaps I'll
look into those.

> But then there are all such wild things which includes parallelism that
> I find myself understanding only fragments of.
I hate to think of something that is complex enough that you have
trouble understanding it Erland... ;-)

Thanks again!
-Tom.

Friday, March 9, 2012

Looking to optimize a query

Looking to optimize a query

Saturday, February 25, 2012

Looking for query plan determination information

query equivalent

Blog Archive

About Me