I try to convert a Procedure that join 8 tables with INNER AND OUTER JOIN, my understanding is that the Lookup task is the one to use and I should break these joins into smaller block, it takes a long time to load when I do this, since each of these tables had 10-40mill. rows and I have 8 tables to go thru, currently this Stored Procedure took 3-4min to run, by converting this to 8 Lookup tasks, it ran for 20min. has anyone run into this issue before and know a work around for this.
Thanks
Are you aware of the different caching types of the Lookup Tranformation? by default the lookup caches the whole data set from your lookup table/query; if you are not using a query to limit the number of columns/rows to only the ones the lookup needs, so the cached data is limited to what you really need, this would be a good point to start. Try, if possible, to re-arrange the tasks in your data flow so your lookup transforms are based on the tables with the lesser rows.
You can see more information about lookup tranformation here: http://msdn2.microsoft.com/en-us/library/ms141821.aspx
Lookup transforms may not be the best approach for all cases; if the performace you are getting for having your dataset out of a single query/SP is better than the lookup tasks approcah why would you want to change it?
Thanks
Rafael Salas
|||Yes, I am setting 500mb Cache size and select 3 integer and 2 varchar(50) fields only, I also try to re-arrange the lookup tables, but since all of them are big, it's limited what I can do by re-arranging them. The reason that I am doing this is to evaluate if it pay to convert our current procedure or not, the main thing that I try to test is performance, it's only make sense if SSIS tasks can out performance stored procedure during our data pull. I wish that SSIS allows input parameters mapping in the reference table TAB. this will allows me to limit number of lookup rows that get load into SSIS. also I found that the OLE DB Source task is very picky w/what kind of SQL script it can run, if you have a sub query that contains parameter, it won't take it, even it's perfectly ok to run in TSQL, this is so bad since sometime you get better performance by putting parms into sub query to limit number of rows before you join to another huge table.
Anyway I think I found a work around, I am currently switched over to use Merge Join task, and it seems to work better. But it still had not out performance current stored procedure. I just hope that some one out there w/similar issue, so I can compare note.
Thanks
|||
by using Merge Join trasnformation , I think you may actually be walking away from any performace gain. By definition Merge Join is an asynchonous trasnformation, so it would use more resources(memory) than the lookup transformation and more likely to have poorer performance; it actually requieres the inputs to be sorted . I think there are good reasons for not having a parameter mapping ability in the reference table tab of the lookup and I just can see that it would requiere to execute a query for every row comming to the lookup input. Actually if you want to test it, you can go to advanced tab and edit the query to include and map your parameters; the performance will hit the floor.
BTW, 500Mb in cache size may no be enough for the volume of data in your lookups.
You can review an interesting paper about SSIS performance at:
http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx
Sorry if my post were not helpfull at all
Rafael Salas
|||
Thanks for you responses, it does help. I am just a big disappointed, since all the seminars that I attended seems to advise people to convert their SQL script into SSIS tasks for better maintenance and performance, but I have yet to find a work around that can outperform current stored procedure, since our source schema is fairly complex and required a lot of joins in order to pull data out, I guess you were right that SSIS tasks may not be the best choice for every situation in term of performance. Also, you are right about the Merge Join task, it actually performs 2-3 times worse than current stored procedure, and I guess I will stay w /my stored procedures for now.
Thanks a lot your help
No comments:
Post a Comment