Full caching:
SSIS caches all million records (consuming around 1 GB of RAM) and cripples my lowly system, which has only 1 GB of physical RAM. This works for now, but the lookup tables will keep growing, which means full pre-caching may not scale for this scenario.
Partial caching and No caching:
I don't really know how SSIS works in these modes. I can make my guesses, but it would be great if someone could explain them. The behavior I see when I use a partial cache of varying sizes, or no cache at all, is that SSIS loads one buffer's worth of rows (6040) from the source and processes them, then loads the next 6040 rows from the source. Of course, due to the cache limit, processing slows down considerably. The problem with this approach is that the source can have anywhere from 300,000 to 2 million records. Processing them 6040 rows at a time, at the reduced speed caused by the caching, takes a very long time. Of course, the system doesn't get crippled, but the duration to execute the same package is magnified considerably (I have not collected statistics yet, but it is close to an order of 10, i.e., 5 minutes with full pre-caching becomes 40-50 minutes or maybe more).
Any suggestions? Ideas? Experience with partial cache?
thanks,
Nitesh|||Nitesh,
Partial and no cache are very similar. With no cache, the lookup makes a round trip to the database for each row. With a partial cache, the lookup first checks whether the value to look up is in the cache; if so, it uses it. If not, it makes a round trip to the database and adds that record to the cache. If the cache is full, an LRU (least recently used) policy is used to remove an entry so the new one can be added.
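The partial-cache behavior described above can be sketched in Python (illustrative only; the class, the `max_size` parameter, and the `db_lookup` helper are hypothetical stand-ins, not SSIS internals):

```python
from collections import OrderedDict

def db_lookup(key):
    """Stand-in for a round trip to the database (hypothetical helper)."""
    return {"A": 1, "B": 2, "C": 3}.get(key)

class PartialCacheLookup:
    """Sketch of partial-cache lookup: check the cache first, fall back
    to the database on a miss, and evict the least-recently-used entry
    when the cache overflows."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()   # insertion order tracks recency
        self.db_trips = 0

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            return self.cache[key]
        self.db_trips += 1                   # cache miss: round trip
        value = db_lookup(key)
        self.cache[key] = value
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)   # evict the LRU entry
        return value
```

This is why Matt's option 1 below (ordering the data) helps: repeated keys arriving close together hit the cache instead of triggering a round trip and an eviction.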
So it seems you have several options to evaluate:
1. Order your data so that the lookup hits the cache a lot.
2. Create a temp lookup table that contains the most frequently hit lookup values (if this is indeed the case in your data sets) and do a full cache on it; then use an error output from that lookup to feed a partial- or no-cache lookup for the rows that miss in the fully cached frequent-items lookup.
3. Use a Merge Join as an alternative, since a Merge Join can be used to perform a lookup. Generally it is not as performant as the lookup's caching, but it may be faster if memory constraints force you into partial or no cache.
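Option 3 works because a merge join scans both inputs once in sorted order, so its memory use stays flat regardless of table size. A minimal sketch of the algorithm (illustrative only; it assumes both sides are sorted on the join key and the right side has unique keys, as a lookup table would):

```python
def merge_join(left, right):
    """Sorted merge join: left and right are lists of (key, value)
    pairs sorted by key. Returns (key, left_value, right_value) for
    each left row whose key exists in right. Each side is scanned
    once; no cache is needed."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, lval = left[i]
        rkey, rval = right[j]
        if lkey == rkey:
            out.append((lkey, lval, rval))
            i += 1   # keep j: more left rows may share this key
        elif lkey < rkey:
            i += 1   # left key has no match; skip it
        else:
            j += 1   # right key is behind; advance it
    return out
```

Note that, like the SSIS Merge Join transform, this requires sorted inputs, so the cost of the sort (or an ORDER BY pushed to the source) has to be weighed against the cache savings.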
HTH,
Matt|||Of course you must bear in mind that partial and no cache options do not join correctly if there are any null values :(
(This is a bug that is designed in, and documented as so.)
|||I've found that by writing the SQL to select only the fields you require, rather than checking the boxes in the whole list of available fields for the table, the lookup caches a lot less data. It appears that even if you only select a couple of columns from, say, a 10-column table (one field to be returned and one field for the join, selected from the right table), the lookup still pulls back the whole table rather than just the selected fields it requires.
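A back-of-envelope estimate shows why narrowing the column list matters for the cache footprint (Python sketch; the row count and the ~40-byte column width are hypothetical illustrative numbers):

```python
def cache_size_bytes(rows, col_widths):
    """Rough full-cache footprint: row count times the total width
    of the columns actually cached."""
    return rows * sum(col_widths)

# Hypothetical 10-column table, ~40 bytes per column, 1 million rows.
all_cols = cache_size_bytes(1_000_000, [40] * 10)  # whole table cached
two_cols = cache_size_bytes(1_000_000, [40, 40])   # join key + one output
```

Under these assumptions, caching all ten columns costs roughly 400 MB while caching only the two needed columns costs roughly 80 MB, a 5x reduction.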
Therefore, if you write the SELECT to pull back only the fields you require (for example, a key field for the join and an email address to be returned by the lookup), then when you go to the second tab you will only see these two fields in the table on the right side. You will find that your lookup runs a lot quicker and consumes a lot less memory|||How does one turn on "partial caching" for a lookup transformation?
|||
Set CacheType="Partial"
CacheType is a property of the LOOKUP transform.
-Jamie
|||Ah, yes. I was looking for it on the Edit dialog, not the property page. I sometimes forget to look in both places. Thanks,
Dan