Monday, March 12, 2012

Lookup Functionality

Hi,

I have 3 tables that i gather a single field from each and put them into another table.. simple enough you might think

but when i start using a lookup funciton to check if i have already entered an item (this package could be run a number of times) some duplicates slip though, and as i continue to run the pakcage less and less appear, but never reaching 0

very strange..

doing select statements on the target table reveal strange behaviour.. after the first 'iteration' there are only unique entries in the target table (exactly what i want). After the 2nd, 3rd, 4th etc duplicates appear of upper and lower case

i can only assume that there is something wrong with my query,,, but running it in SQL Manager reveals ~40,000 rows no matter how many times i run it...

more magically appear when hte same statement is ran in SSIS!!

I have tried a view, and using test tables (rather the live ones i am working on) and the reuslt is the same...

any help would be muc happreciated

regards

Chris

You might want to make sure you are converting the values for the Lookup and the value you are matching on to UPPERCASE to avoid any case mismatches.

|||

Hi, thanks for your post.. i have already put it into uppercase, but this has not solved the problem completely..

i still don't understand how the query the first time returns all the values as it should, and when it is re-ran more magically appear but in different cases (there is also an issue with foreign text, but thats something different)

one of my tables is nvarchar, and to accomodate this, the target table for all the entries is nvarchar therefore needing the items entered into it to be converted from varchar

could this be causing the problem perhaps? something to do with unicode / non-unicode?

cheers

Chris

|||

Is the problem only in the Source adapter? If you add a RowCount immediately after the Source, is it returning a variable number of rows?

|||

Hi,

The source returns the same number of rows each time, but for some reason is failing to match them up when doing a lookup and thus insertng duplicates...

one such example is:

select * from Snowflake.DimCity

where city = 'zweibrücken'

zweibrucken 2007-08-01 09:33:00

zweibrücken 2007-08-01 09:34:00

how is this discrepancy being missed first time around?!?

my SELECT statement converts all the city fields to nvarchar on select... however i have to do this becasue some of the source tables are nvarchar and some arent, and SSIS throws errors if i don't convert ;(

help help

|||

You are aware that the lookup caches all of its data at the beginning of the data flow, aren't you? So, by default, it can't match duplicates on rows that you are inserting in the same data flow? You can work around this by using an aggregate transform to eliminate the duplicates, or by disabling caching on the lookup (found on the advanced tab), which forces it to query the database for each row. Unfortunately, disabling caching makes it run slower, and doesn't guarantee you won't get duplicates, because of the way rows are handled in batches.

Also, I am fairly positive that because "u" and "ü" are two different characters, the lookup will not match them. If you looked at the actual bytes making up those two strings, they would be different. You might need to use the Fuzzy Lookup to do your matching in this scenario.

|||Hi,

I have solved this issue now, I converted all the fields in use to nvarchar in the query.. and then the collation of the fileds used in the database to Latin1_General_BIN2

From SQL Server Developer Centre:

"Sorts and compares data in SQL Server tables based on Unicode code points for Unicode data. For non-Unicode data, Binary-code point will use comparisons identical to binary sorts.

The advantage of using a Binary-code point sort order is that no data resorting is required in applications that compare sorted SQL Server data. As a result, a Binary-code point sort order provides simpler application development and possible performance increases."

This is basically what you said in your last post jwelch, comparing the acutal bytes and all seems to be well now!! Smile I no longer have spontaneuously appearing entreis when i run the SSIS package.

Thank you

Chris

No comments:

Post a Comment