Monday, March 12, 2012

Lookup OR Join?

Does anyone have experience with comparing the performance in dealing with very large data sets to determine the presence of a record based on key fields in order to determine wheter to update or insert a record to a table? With another ETL tool, large lookups perform better with a join transformation component, rather than an actual lookup component, because of the large caching requirements.

In using a join, you simply base the condition on the result of the join, as you can with an SSIS lookup component 'erroring' out. Some tools also have an actual 'upsert' component that helps in achieving such a scenario.

Thoughts?Joe,
During the research for this article: http://www.sqlis.com/default.aspx?311 I was comparing the 2 methods. LOOKUP performed significantly better although it wasn't on a big dataset.

Also I have a case study coming out on my blog imminently (it has to go through manager approval first) in which the 2 methods are compared on a very very large dataset (millions of rows). The performance increase when using LOOKUPs was breathtaking - I can't give details as I'm under unofficial NDA at the moment...but look out for it in the next few days on http://blogs.conchango.com/jamiethomson.

-Jamie|||I have been using a SQL join and creating a new column with either a 'U' for update or 'I' for insert. I am comparing a small number of rows against a large dataset. I am interested in the results of your case study, Jamie - I will watch for it.
-Evan Black

No comments:

Post a Comment