Monday, March 26, 2012

Looping through rows in flat file

I'd appreciate some guidance on how to solve a problem. I have a flat file of the format

Header1
Header2
Header3
Data Record
Data Record
... etc. ...
Data Record
Trailer1
Trailer2

I want to loop through each data record. I want to take a column value (a string) from each row, and call a stored procedure passing in that value. The sproc will return another string, which I want to use to replace the original column value from the flat file. If I can't update the original column value, I'm happy to add a derived column instead.

Once I've done this "lookup" for all records, I want to re-create the flat file again with the new column values.

I don't need a detailed solution ... just some pointers on how to go about it.

Greg.

On the surface it looks like you should be able to do this (i.e. execute a sproc) using the OLE DB Command transform. However, I don't think this will work, because you want the return value from the sproc put into the pipeline - which you cannot do with the OLE DB Command transform.

Can you not use the LOOKUP transform to get the new values? If all the sproc does is lookup the input value in a mapping table then LOOKUP transform will do the job.

If not - this will be a little harder. But let's eliminate (or otherwise) the option of using the LOOKUP transform first.

-Jamie

|||The sproc parses the incoming string to attempt to extract two distinct values from it. Each of these values is a primary key column in a separate table, and both are needed to obtain a PK value from another table, so it's not a straightforward lookup.

I guess I could do the lookup using custom SQL in the Lookup transform, and re-write the parsing of the input column as a Script Component (I've already done it in C#, so it should be simple to port to VB.NET).

Apologies as it's a bit of a muddle, but just trying to get my head around a decent solution.

(I've already done this in BizTalk but am looking to use SSIS instead - it's a pity SSIS support for the Web Service tasks is so poor otherwise it would be simpler).

Thanks,

Greg.

|||Did you think about using a staging table? In this scenario you would load the file to a temp (staging) table 1 to 1, then apply a stored procedure to the temp table that would transfer only the needed rows to the production table...|||

Custom SQL in the LOOKUP is a good way to go. In fact, it's considered best practice, because if you simply select a whole table you will be pulling unused data into your LOOKUP cache - VERY BAD.

You could probably do the parsing in a derived column component - no need to resort to code (unless you really want to of course). If you need to derive some values in order to do the LOOKUP then there is nothing at all wrong with doing this in the pipeline - that's what the pipeline is for.
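For instance, if the incoming column held two values separated by a known delimiter (an assumed format - Greg's actual parsing may well be more involved), a pair of Derived Column expressions could split it into the two fields the LOOKUP needs, each entered as the expression of its own new column:

```
Value1:  SUBSTRING(InputCol, 1, FINDSTRING(InputCol, "-", 1) - 1)
Value2:  SUBSTRING(InputCol, FINDSTRING(InputCol, "-", 1) + 1, LEN(InputCol))
```

Here `InputCol`, the hyphen delimiter, and the output column names are all illustrative; the point is just that FINDSTRING/SUBSTRING can often replace hand-written parsing code.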

Anyway... anything that enables you to use the LOOKUP is definitely the way to go, otherwise you'll have to hack around it and that isn't nice.

Incidentally, I agree that the Web Services Task is basically useless. I am currently working on a project where we are consuming web services in SSIS left right and centre and we haven't used the Web Services Task once - it is all done in script. Donald Farmer's book contains an example of how to do this.

-Jamie

|||I'd like to avoid a staging table here as these files really only "pass" the database on their way somewhere else, doing a lookup on the way.

Thanks Jamie. I'll have a look at derived columns. Last question for the moment is,

If I have a Script Component and define an InputColumn with Usage Type of Read/Write ... how do I write to it? Do I need to do it in script using Row.ColumnName = "whatever value"?

Greg.

|||

Jamie,

you can get values from a sproc back into the pipeline using OUTPUT parameters... That shouldn't be any problem...

However, I agree that using lookups (perhaps even several lookups chained one after another) will be the better solution...
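To make Thomas's point concrete, here is a sketch of the pattern - the sproc and parameter names are assumed, not from the thread. The OLE DB Command transform can map an OUTPUT parameter back onto an existing pipeline column:

```sql
-- Illustrative sproc with an OUTPUT parameter (names are assumed)
CREATE PROCEDURE dbo.usp_ResolveValue
    @InputValue  varchar(50),
    @OutputValue varchar(50) OUTPUT
AS
    SET @OutputValue = UPPER(@InputValue)  -- placeholder for the real logic

-- SqlCommand property of the OLE DB Command transform. In the column
-- mappings, bind the first ? to the input column and the second ? to
-- the pipeline column that should receive the result.
EXEC dbo.usp_ResolveValue ?, ? OUTPUT
```

The target column for the second parameter has to already exist in the pipeline (e.g. added earlier as a derived column), since the OLE DB Command transform doesn't create new columns itself.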

|||

Thomas Pagel wrote:

Jamie,

you can get values from a sproc back into the pipeline using OUTPUT parameters... That shouldn't be any problem...

However, I agree that using lookups (perhaps even several lookups chained one after another) will be the better solution...

Ah, thanks Thomas. Care to post a demo on your blog? :)

|||

GregAbd wrote:


If I have a Script Component and define an InputColumn with Usage Type of Read/Write ... how do I write to it? Do I need to do it in script using Row.ColumnName = "whatever value"?

Yeah, basically. There's loads of demo code about for doing this. Here's some from my own page - you'll find stuff on Kirk's blog, Ash's blog, SQLIS.com amongst others as well I'm sure:

http://blogs.conchango.com/jamiethomson/archive/2005/07/25/1841.aspx

http://blogs.conchango.com/jamiethomson/archive/2005/04/09/1265.aspx

-Jamie
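For illustration, a minimal sketch of what that assignment looks like inside an SSIS 2005 Script Component (the column name is assumed, not from the thread):

```vb
' Row-processing method generated by the Script Component designer.
' "MyColumn" is illustrative - it must be selected as an input column
' with its Usage Type set to ReadWrite.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Assigning to the Row property overwrites the column value
    ' for the current row as it passes through the pipeline.
    Row.MyColumn = Row.MyColumn.ToUpper()
End Sub
```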

|||

Jamie,

here you are: http://sqljunkies.com/WebLog/tpagel/archive/2006/01/03/17744.aspx

|||

BTW, in SP1, the Web Services task gets dynamic. :)

K

|||Thanks for all the help so far folks. In my data flow, I've got my flat file coming in. This goes into a script component which takes one column (string), parses it (complicated) and sets two variables in the PostExecute method. This then goes into a lookup.

How can I use those variables in the custom SQL window? I know I can build a custom SQL query in a variable as an expression and use that in the OLE DB Command, but this doesn't seem possible in the Lookup.

FWIW, my lookup is a three-table join along the lines of

SELECT FieldB
FROM A
JOIN B ON A.PK = B.FK1
JOIN C ON C.PK = B.FK2
WHERE A.FieldA = Variable1
  AND C.FieldC = Variable2

Greg.
|||

Greg,

I guess you haven't quite got the concept of the Lookup yet. The Lookup takes a query (when the package starts) and caches its result (by default; you can change that...). In your data flow you define one field (or more) from the pipeline to be matched against the same number of fields in the cache. If there is a match, you get the other fields of the found record in return.

So you don't have to pass the variables of each record to the lookup. Instead, store the results of the script not in variables but in new columns in the pipeline. Then match those columns against the result of the complete query in the Lookup (so the Lookup caches not one record but all possible records, and you get back only the fields of the matching record).

HTH
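Applied to the query Greg posted, the Lookup's reference query would drop the per-row predicates and expose the match fields instead; `FieldA` and `FieldC` then get mapped in the Lookup editor against the two new pipeline columns produced by the script (a sketch reusing Greg's table names):

```sql
-- Reference query for the Lookup cache: returns all candidate rows.
-- The former WHERE clauses on Variable1/Variable2 become column
-- mappings in the Lookup editor (pipeline columns -> A.FieldA, C.FieldC).
SELECT A.FieldA, C.FieldC, FieldB
FROM A
JOIN B ON A.PK = B.FK1
JOIN C ON C.PK = B.FK2
```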

|||I was just in the process of realising that when you posted, Thomas. It's much clearer now.

SSIS has a pretty steep learning curve in terms of understanding what it's capable of, but it's a pretty cool tool to use.

Plenty more questions to come.

Greg.
