Lookup and commit: how it works
I am importing data from a text file and I have a Lookup to check a reference table.
If the reference row doesn't exist I want to add a row to the reference table; otherwise the row is added to the detail table.
I am using an OLE DB destination to save the reference table rows, with Rows Per Batch set to 1 and Maximum Insert Commit Size set to 1.
When I run the package, duplicate rows appear in the grid view. How can duplicates end up there when the commit size is 1? The second time around the data already exists in the reference table and should go to the detail table, not the reference table.
Funny, but this was just answered in another thread about the lookup cache. The data flow works on buffers, not rows, so the lookup transform gets a whole buffer to process at one time. It looks up every row in that buffer and then passes the buffer on downstream. So any rows with the same key in that buffer will be looked up in the same manner and sent to the same output for processing. No setting on another transform can change this. Furthermore, depending on the cache type, the lookup loads its cache into memory (full and partial cache), so it will never (or not reliably) be updated by any changes to the underlying data.
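The buffer behavior Matt describes can be sketched as follows. This is a simplified Python model, not SSIS code: the whole buffer is matched against one snapshot of the cache before any downstream insert runs, so two rows with the same new key both come out of the no-match output.

```python
# Simplified model of a fully cached SSIS Lookup (illustrative only).
# The reference cache is loaded once; then an entire buffer is matched
# against that same snapshot before any destination commits a row.

def process_buffer(buffer, reference_cache):
    matched, unmatched = [], []
    for row in buffer:                     # every row in the buffer is
        if row["key"] in reference_cache:  # checked against the SAME
            matched.append(row)            # unchanging cache snapshot
        else:
            unmatched.append(row)
    return matched, unmatched

reference_cache = {"A"}                    # keys already in the reference table
buffer = [{"key": "B"}, {"key": "B"}]      # two incoming rows, same new key

matched, unmatched = process_buffer(buffer, reference_cache)
print(len(unmatched))  # 2 -> both "B" rows take the no-match path,
                       # so the reference table gets a duplicate insert
```

This is why the commit size on the destination is irrelevant: the duplicate routing decision has already been made inside the lookup, before either row reaches the destination.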
HTH,
Matt
|||Well, I can guess that. How do I fix this? Add another transform? Where, after the lookup?|||Well, you can't fix it; this is the design of the data flow. You can try some workarounds, such as setting the lookup to no-cache mode or setting the data flow's max rows to a low number (note that setting it to 1 doesn't work, because the data flow rounds the value in order not to waste memory). However, none of these are certain to work in all cases. The only real option is to write your own custom or script component with logic that circumvents the buffering.
Matt
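The custom component Matt suggests would essentially keep its own record of keys it has already routed to the reference table, so duplicates within one buffer are caught. A minimal Python sketch of that per-row logic (the class and output names are illustrative; a real SSIS script component is written in C#/VB.NET against the SSIS object model):

```python
# Sketch of the dedup logic a custom/script component would carry
# (illustrative Python, not the SSIS API).

class DedupLookup:
    def __init__(self, existing_keys):
        # Seed the cache with keys already present in the reference table.
        self.seen = set(existing_keys)

    def route(self, row):
        """Return 'detail' or 'reference' for a single row."""
        if row["key"] in self.seen:
            return "detail"
        # Record the key immediately, before the next row is processed,
        # so a duplicate later in the same buffer goes to 'detail'.
        self.seen.add(row["key"])
        return "reference"

comp = DedupLookup({"A"})
rows = [{"key": "B"}, {"key": "B"}, {"key": "A"}]
print([comp.route(r) for r in rows])  # ['reference', 'detail', 'detail']
```

The difference from the stock Lookup is that the "cache" is updated row by row as decisions are made, rather than being a fixed snapshot for the whole buffer.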
|||Hi, I have a similar problem. Setting the lookup to no cache, or the data flow to a low number of max rows, did not work at all. I ended up building my own asynchronous script component to achieve what the Lookup or SCD wizard should be able to do.
Ccote
|||We are currently investigating a ton of ideas around lookup, and this pattern is one we are taking into consideration.
However, I do want to point out that adding such a pattern requires us to add more logic to the lookup component, and we are somewhat cautious about that. SSIS components are typically very tightly scoped in their functionality: design patterns are built with several different smaller components rather than with one more complex "catch-all" component. Lookup is an example of a component that does one fairly isolated function; complete integration patterns are built around it using other components.
The pattern we are discussing here is actually more complicated than it might first appear, especially when one considers the possibility of errors creeping in through system problems or bad data. (I have never in my career seen a data integration process where unexpected errors of all sorts did not creep in, so I tend to be cheerfully pessimistic in my designs.)
I'll recap the current requirement as follows, to make it clear for other readers:
We want to look up the key from an incoming record in a reference table.|||Great! It's really helpful. I have broken the process into two steps: the first step adds the missing rows to the reference table, and in the next step the lookup works (the way it should).
I can understand the import will be slow if every row is committed before the lookup, but there should be an option; sometimes it simply has to work logically, and "slow" is hardly a concern these days with powerful machines.
Thanks again, Donald, for your response.
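The two-step workaround described above can be sketched like this (illustrative Python, with a set standing in for the reference table): step 1 inserts any missing keys into the reference table before the lookup runs, so in step 2 every row finds a match and goes to the detail table.

```python
# Sketch of the two-step pattern (illustrative only): insert missing
# keys first, then run the lookup in a separate, later step.

def step1_add_missing(source_rows, reference):
    # Insert each distinct missing key into the reference table.
    for key in {r["key"] for r in source_rows}:
        if key not in reference:
            reference.add(key)

def step2_route(source_rows, reference):
    # By now every key exists, so all rows route to the detail table.
    return ["detail" if r["key"] in reference else "reference"
            for r in source_rows]

reference = {"A"}                          # existing reference keys
rows = [{"key": "B"}, {"key": "B"}]        # duplicate new key

step1_add_missing(rows, reference)
print(step2_route(rows, reference))        # ['detail', 'detail']
```

Because the lookup's cache is built after step 1 has committed, the stale-snapshot problem from the single-data-flow design never arises.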