Showing posts with label union. Show all posts
Showing posts with label union. Show all posts

Friday, March 23, 2012

loop through tables for UNION ALL

Hey folks,
I have a variable number of identically structured & similarly named
tables for which I would like to expeditiously execute a UNION ALL in
succession. e.g. "SELECT * from (x-2) UNION ALL SELECT * FROM (x-1) UNION
ALL SELECT * FROM x UNION ALL" etc... where (x-2), (x-1), x are three such
tables. I can pull all the tables I need to create this very large UNION ALL
query from INFORMATION_SCHEMA.TABLES, so is there any way to somehow
construct a looped stored procedure such that I don't have to manually type
in the full UNION ALL query?
Thanks,
Dave
--
______________________________
Remove "_SPAM" to reply directly.David Shorthouse wrote:
> Hey folks,
> I have a variable number of identically structured & similarly named
> tables
Why?

> for which I would like to expeditiously execute a UNION ALL in
> succession. e.g. "SELECT * from (x-2) UNION ALL SELECT * FROM (x-1) UNION
> ALL SELECT * FROM x UNION ALL" etc... where (x-2), (x-1), x are three such
> tables. I can pull all the tables I need to create this very large UNION A
LL
> query from INFORMATION_SCHEMA.TABLES, so is there any way to somehow
> construct a looped stored procedure such that I don't have to manually typ
e
> in the full UNION ALL query?
>
Create a UNION by querying the INFORMATION_SCHEMA and then cut and
paste the result. Put the UNION in a view and query that in your proc.
Take a look at the partitioned views topic in Books Online.
Better still, consider combining this into a single table unless it's a
case of "I'm not allowed to fix my database".
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||I'll take it as read that performance is not a problem here as if it were
the driving factor a DB redesign would not be out of the question.
Insead of trying to perform a union all, maybe a different appear would be
better...
1. Create a temporary table with the structure of your output.
2. create a cursor which will loop through the tablenames that you need to
query.
3. use the table name to create a dynamic sql statement (see sp_executesql)
that will populate the temporary table with the data that you need to
select.
4. after the loop, perform a single select on the temp table.
Be warned, the performance of this will be extremely bad.
Regards
Colin Dawson
www.cjdawson.com
"David Shorthouse" <davidshorthouse@.shaw_SPAM.ca> wrote in message
news:uX5XwbqPGHA.4900@.TK2MSFTNGP09.phx.gbl...
> Hey folks,
> I have a variable number of identically structured & similarly named
> tables for which I would like to expeditiously execute a UNION ALL in
> succession. e.g. "SELECT * from (x-2) UNION ALL SELECT * FROM (x-1) UNION
> ALL SELECT * FROM x UNION ALL" etc... where (x-2), (x-1), x are three such
> tables. I can pull all the tables I need to create this very large UNION
> ALL query from INFORMATION_SCHEMA.TABLES, so is there any way to somehow
> construct a looped stored procedure such that I don't have to manually
> type in the full UNION ALL query?
> Thanks,
> Dave
> --
> ______________________________
> Remove "_SPAM" to reply directly.
>|||
> Why?
This is simply the way the db has been designed. Individual clients access
individual tables.

> Create a UNION by querying the INFORMATION_SCHEMA and then cut and
> paste the result. Put the UNION in a view and query that in your proc.
> Take a look at the partitioned views topic in Books Online.
> Better still, consider combining this into a single table unless it's a
> case of "I'm not allowed to fix my database".
This is what I am attempting to do; scheduling a table creation by creating
a UNION ALL of these individual tables. Rather than having to manually add
these tables to the UNION ALL query prior to the creation of the one large
result, I am attempting to programmatically create this UNION ALL because
there are a variable number of individual tables. So, I'm not sure how the
idea you have above will accomplish this.
Dave|||
> Insead of trying to perform a union all, maybe a different appear would be
> better...
> 1. Create a temporary table with the structure of your output.
> 2. create a cursor which will loop through the tablenames that you need to
> query.
> 3. use the table name to create a dynamic sql statement (see
> sp_executesql) that will populate the temporary table with the data that
> you need to select.
> 4. after the loop, perform a single select on the temp table.
> Be warned, the performance of this will be extremely bad.
Colin,
Thanks for the ideas. Performance is not an issue here because I am indeed
creating a table from this large UNION ALL query result.|||>> This is simply the way the db has been designed. Individual clients acces
s individual tables. <<
Well, kill the moron who did this and fix it!! The name of this design
flaw is "attribute splitting" and it can take several forms.
1) Putting the same set of entities into many tables. There are two
sub-cases.
a.The absolute newbie usually mimics sequential numbered or
temporally labeled tapes in each table.
b. The newbie with a few ws experience splits the table on the
values of one or more attributes. For example, taking a Personnel
table and making it into "MalePersonnel" and "FemalePersonnel" tables.
2) Taking a single attribute and splitting in into multiple columns.
There are two sub-cases.
a. The multiple columns are in the same table. The most common
example is having a group of bit flag columns instead of a single
well-design encoding in a column. They have to be combined to get the
attribute's value and you need multi-column CHECK() constraints to
control the combinations.
b. The split column is in multiple tables. The most common example
is trying to force an OO design in an RDBMS, so that joins are required
to assemble a meaningful entity. . They have to be combined via a join
to get the attribute's value.|||As David has suggested, look at partitioned views. You can create a
single entity (a view) that contains all of the data from your tables.
Through the use of check constraints, the optimizer can do partition
exclusion, which means it will only select data from the appropriate
underlying tables. The VIEW is also able to be UPDATED, so you end up
dealing with a lot less mess than trying to dynamically select the
appropriate tables during your selection time.
Of course, the only real reason for using a partioned view is to help
performance when dealing with large datasets; in your case, it sounds
like you're having to do an end-run around a design choice that should
be reconsidered.
Stu

Saturday, February 25, 2012

Looking for query plan determination information

Hi,

I was just helping a coworker optimize a query. He had two versions:
one which used UNION for each value for which he was tallying results
and another query which used GROUP BY. Here is an aproximation of what
they were:

Query #1:
---
SELECT 12 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
AND T.col2 = 5
UNION
SELECT 13 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
AND T.col2 = 6

Query #2:
---
SELECT R.row_num AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
INNER JOIN Report_Rows R ON R.col2 = T.col2
WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
GROUP BY ALL R.row_num
ORDER BY R.row_num

The Report_Rows table in this case would have had two rows mapping row
12 to a column value of 5 and row 13 to a column value of 6. The
second query was performing horribly until I noticed the ALL keyword
in the GROUP BY, which I didn't think was necessary. When I removed
that it performed more like I expected it to perform.

Before I had noticed that I was scouring over the query plans and
couldn't figure out why in one instance the query optimizer chose to
join My_Table and Another_Table, yet when the ALL keyword was there it
chose to return all of the records from Another_Table (a rather large
table) and join it to the Report_Rows table before then joining to
My_Table, which had the date criteria in the WHERE clause.

So, if you've read this far without giving up...

1. Why would the ALL keyword cause this? I understand the
functionality of ALL, but I still don't see why that caused the
reordering of the joins.

2. (more importantly) Are there any good resources that you know of
that explain how the query optimizer choices its query paths? Do the
"Inside SQL Server" books go into that much detail? Any good online
resources?

Thanks!
-Tom."Thomas R. Hummel" <tom_hummel@.hotmail.com> wrote in message
news:a2c0eeb8.0309160626.901177@.posting.google.com ...
> Hi,
> I was just helping a coworker optimize a query. He had two versions:
> one which used UNION for each value for which he was tallying results
> and another query which used GROUP BY. Here is an aproximation of what
> they were:
> Query #1:
> ---
> SELECT 12 AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> AND T.col2 = 5
> UNION
> SELECT 13 AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> AND T.col2 = 6
> Query #2:
> ---
> SELECT R.row_num AS [Row],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN @.week_start_date AND
> DATEADD(d, 1, @.week_start_date) THEN 1 ELSE 0 END), 0) AS [Monday],
> ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
> @.week_start_date) AND DATEADD(d, 2, @.week_start_date) THEN 1 ELSE 0
> END), 0) AS [Tuesday]
> FROM My_Table T
> INNER JOIN Another_Table T2 ON T2.col1 = T.col1
> INNER JOIN Report_Rows R ON R.col2 = T.col2
> WHERE T.my_date BETWEEN @.week_start_date AND @.week_end_date
> GROUP BY ALL R.row_num
> ORDER BY R.row_num
> The Report_Rows table in this case would have had two rows mapping row
> 12 to a column value of 5 and row 13 to a column value of 6. The
> second query was performing horribly until I noticed the ALL keyword
> in the GROUP BY, which I didn't think was necessary. When I removed
> that it performed more like I expected it to perform.
> Before I had noticed that I was scouring over the query plans and
> couldn't figure out why in one instance the query optimizer chose to
> join My_Table and Another_Table, yet when the ALL keyword was there it
> chose to return all of the records from Another_Table (a rather large
> table) and join it to the Report_Rows table before then joining to
> My_Table, which had the date criteria in the WHERE clause.
> So, if you've read this far without giving up...
> 1. Why would the ALL keyword cause this? I understand the
> functionality of ALL, but I still don't see why that caused the
> reordering of the joins.
> 2. (more importantly) Are there any good resources that you know of
> that explain how the query optimizer choices its query paths? Do the
> "Inside SQL Server" books go into that much detail? Any good online
> resources?
> Thanks!
> -Tom.

It's almost impossible (at least for me) to know why the optimizer chose a
particular plan without knowing the table structures, indexes and amount of
data, and even with that knowledge, it may not be clear at all. So I can't
say much about your first question, but I can definitely recommend Inside
SQL Server 2000 for a great explanation of what the optimizer considers when
it produces a query plan. There's a lot of detail, including how to go about
using query plans to tune individual queries. Another useful book is
Advanced Transact SQL for SQL Server 2000, which also explains many of the
examples with reference to their query plans.

Simon|||Thomas R. Hummel (tom_hummel@.hotmail.com) writes:
> Before I had noticed that I was scouring over the query plans and
> couldn't figure out why in one instance the query optimizer chose to
> join My_Table and Another_Table, yet when the ALL keyword was there it
> chose to return all of the records from Another_Table (a rather large
> table) and join it to the Report_Rows table before then joining to
> My_Table, which had the date criteria in the WHERE clause.

I can only echo Simon's reply that without table definitions etc, this
is difficult to tell. In fact, even with all information available,
this might be difficult to tell. Understanding the output of a cost-
based optimizer is by no means an easy task.

> 2. (more importantly) Are there any good resources that you know of
> that explain how the query optimizer choices its query paths? Do the
> "Inside SQL Server" books go into that much detail? Any good online
> resources?

Certainly, you learn a great deal from Kalen's book. But I also like to
add that that experience counts a lot too. And some creative thinking.
The basic thing to understand is why a table scan may be better than
an index seek. This is something which also can be extended to joins.
That is a scan + merge/hash join may be faser than seek + loop join.

But then there are all such wild things which includes parallelism that
I find myself understanding only fragments of.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thank you both for the input. I had tried to duplicate the effect with
test tables, but as you know, the query optimizer takes a lot into
account and I couldn't find an example that would be practical for
posting here.

I will give the two books that Simon suggested a more thorough read.
I've seen adverts for Kalen's online webinars as well, so perhaps I'll
look into those.

> But then there are all such wild things which includes parallelism that
> I find myself understanding only fragments of.
I hate to think of something that is complex enough that you have
trouble understanding it Erland... ;-)

Thanks again!
-Tom.