
Monday, March 12, 2012

Lookup on input

I need to validate my input rows. A row is valid only if certain other rows exist in the same input table (I am importing data from Excel and Access). I'll give an example to make everything clear:

The input table boys has the following columns: First_Name, Surname and Date_of_birth.

The output table is Twin_Triple_More_Brothers. I would like to insert into this table only boys whose surnames match and whose dates of birth differ by less than one day.

I was thinking about the Lookup component, but I cannot use it that way (or I just do not know how).

Maybe someone has an idea how to do this?
Thanks for the help.

You could try something like this...

- Send your source rows to a Lookup component that matches on Surname and Date_of_birth.

- Send the error rows from that Lookup to a second Lookup that matches on Surname and Date_of_birth minus one day.

- Send the successful lookup rows from both Lookup components to a Union All component.

- Send the output of the Union All component to your destination.

Use a Lookup to match on Surname. Then return Date_of_birth from the lookup table.

Next, hook that up to a Derived Column to calculate the difference between the input Date_of_birth and the lookup Date_of_birth.

Hook that up to a Conditional Split, where you test the calculation from above. If it's within your parameters, send the row to the insert path; otherwise do nothing.

But how can I look up in the input? I know how to search in another table, but I do not know how to search in the input columns.
Again, thanks for the help.

Piotr Stapp wrote:

But how can I look up in the input? I know how to search in another table, but I do not know how to search in the input columns.
Again, thanks for the help.

Create an initial data flow that simply takes your input and loads it into a table. Then in your second data flow, you can use an OLE DB source hooked up to that staging table, with a Lookup pointing at the same table. You might have to work through the matching logic a bit, but it could work for you.

So the only way is to use a temporary table?
Maybe it is not as good as I think, but it is also not so bad.
You could do this in SQL, probably, but this isn't the forum for that... ;)
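
Still, for what it is worth, here is a rough set-based sketch of the same idea in T-SQL, assuming the Excel/Access rows have been staged in a table named boys and that Twin_Triple_More_Brothers has the same three columns (the First_Name comparison is only a stand-in for a proper key column):

-- Boys who share a surname with someone else and whose birth dates are less than one day apart.
INSERT INTO Twin_Triple_More_Brothers (First_Name, Surname, Date_of_birth)
SELECT DISTINCT b1.First_Name, b1.Surname, b1.Date_of_birth
FROM boys AS b1
JOIN boys AS b2
    ON b2.Surname = b1.Surname
   AND b2.First_Name <> b1.First_Name                        -- crude "not the same boy" test
   AND b2.Date_of_birth > DATEADD(day, -1, b1.Date_of_birth)
   AND b2.Date_of_birth < DATEADD(day, 1, b1.Date_of_birth);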

Lookup and commit; how it works

I am importing data from a text file and I have a Lookup that checks a reference table.

If the reference row doesn't exist I want to add a row to the reference table; otherwise the row is added to the detail table.

I am using an OLE DB destination to save the reference table rows, with Rows per batch set to 1 and Maximum insert commit size set to 1.

When I run the package, duplicate rows appear in the grid view. How can duplicates end up there when the commit size is 1? The next time around the data exists in the reference table and should go to the detail table, not to the reference table.

Funny, but this was just answered in another thread about the lookup cache. The dataflow works on buffers, not rows, so the lookup transform gets a whole buffer to process at one time. It then looks up every row in this buffer and after that passes it on downstream. So any row with the same key in that buffer will be looked up in the same manner and sent to the same output for processing. No setting on another transform can change this. Furthermore, depending on the cache type, the lookup will load all of its cache into memory (full and partial cache), so it will never (or not reliably) be updated by any changes to the underlying data.

HTH,

Matt

Well, I can guess that. How do I fix this? Add another transform? Where? After the lookup?

Well, you can't fix it. This is the design of the dataflow. You can try some workarounds such as setting the lookup to no cache mode or the dataflow's max rows to a low number (note that setting it to 1 doesn't work because the dataflow does rounding in order to not waste memory). However, none of these are certain to work in all cases. The only option is to write your own custom or script component that has logic to circumvent the buffering.

Matt


Hi, I have a similar problem. Setting the lookup to no cache, or the dataflow to a low number of max rows, did not work at all. I ended up building my own asynchronous script component in order to achieve what the Lookup or SCD wizard should be able to do.

Ccote


We are currently investigating a ton of ideas around lookup, and this pattern is one we are taking into consideration.

However, I do want to point out that adding such a pattern requires us to add more logic to the lookup component, and we are somewhat cautious about that. SSIS components are typically very tightly scoped in their functionality - design patterns are built with several different smaller components rather than in one more complex "catch-all" component. Lookup is an example of a component that does one fairly isolated function - complete integration patterns are built around it using other components.

The pattern we are discussing here is actually more complicated than might first be thought - especially when one looks at the possibility of errors creeping in through system problems or bad data. (I have never in my career seen a data integration process where unexpected errors of all sorts did not creep in, so I tend to be cheerfully pessimistic in my designs.)

I'll recap the current requirement as follows, to make it clear for other readers:

We want to look up the key from an incoming record in a reference table.

Great!!! It's really helpful. I have broken the process into two steps: the first step adds to the reference table, and in the next step the lookup works (the way it should).

I can understand that the import will be slow if every row is committed before the lookup, but there should be an option; sometimes it just has to work correctly, and slow is hardly a concern these days with powerful machines.

Thanks again, Donald, for your response.
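
For readers who stage the text file in SQL Server first, the same two-step idea can be sketched in plain T-SQL. All table and column names below are made up for illustration:

-- Step 1: add any reference values that are not there yet.
INSERT INTO dbo.RefTable (RefCode)
SELECT DISTINCT s.RefCode
FROM dbo.StagedRows AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.RefTable AS r WHERE r.RefCode = s.RefCode);

-- Step 2: load the detail rows, joining to the now-complete reference table.
INSERT INTO dbo.DetailTable (RefId, Amount)
SELECT r.RefId, s.Amount
FROM dbo.StagedRows AS s
JOIN dbo.RefTable AS r ON r.RefCode = s.RefCode;

Because step 1 has committed before step 2 runs, the join in step 2 always sees the reference rows it needs, which is exactly what a single buffer-at-a-time dataflow pass cannot guarantee.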

Saturday, February 25, 2012

Looking for ideas...

I have a text file that I am importing in a data flow task. Not all of the rows are the same; there are "header" rows that contain a company code.

What I want to do is keep that company code and append it to each row of text that I am writing to a CSV file.

Since you cannot change variables until post-execute, what are my options?

Hope that's clear

Thanks!

BobP

Are you writing a custom script component to handle this? You talk of not being able to change variables until post-execute; just handle the locking yourself and you can, as illustrated here - http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=956181&SiteID=1

From your description, I assume you're trying to denormalize a multi-formatted text file by moving data from header rows to detail rows.

Since your rows are not all the same, set your text file source component to read each line of the source file as a single column. Then write a custom script transformation component to consume that column, parse the data by row type, and output the individual columns that will be consumed by your destination (including the header columns). When the transformation encounters one of these header rows, put the parsed data (Company Code) into variable(s) local to the transformation component, and discard or redirect the header row. For each detail row, create a new output row and populate the "detail" columns from the parsed input data and the "header" column(s) (i.e. Company Code) from those local variable(s). This way your header data is remembered for each subsequent detail row until another header is encountered.

Hope that helps.
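
If you ever land the raw lines in a staging table instead, the same header-to-detail denormalization can be sketched in T-SQL. The staging table, the 'H,' prefix that marks header lines, and the column positions below are all assumptions for illustration only:

-- StagedLines(LineNumber, LineText) is assumed to hold the file in its original order.
SELECT
    (SELECT TOP 1 SUBSTRING(h.LineText, 3, 10)    -- company code taken from the nearest header row above
     FROM dbo.StagedLines AS h
     WHERE h.LineNumber <= d.LineNumber
       AND h.LineText LIKE 'H,%'
     ORDER BY h.LineNumber DESC) AS CompanyCode,
    d.LineText AS DetailLine
FROM dbo.StagedLines AS d
WHERE d.LineText NOT LIKE 'H,%'
ORDER BY d.LineNumber;

The correlated subquery plays the same role as the script component's local variable: for every detail line it reaches back to the most recent header line and carries its company code forward.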

Monday, February 20, 2012

Looking for easy way to update w/o SQL statements

My manager is interested in knowing if there is a way to update our website's SQL database from Excel, similar to importing.

The person who was previously in my position had imported a few hundred new products into the database with an Excel spreadsheet.

Now, we would like to make updates such as price changes or similar adjustments to a number of the products in the database. We could use a web interface, but ours requires us to find each product individually and it takes too much time. I told him that it would probably be necessary to write a SQL statement to update the tables, but we're also interested in maintaining the integrity of the database and are worried about losing data due to a typo. Is it possible to export the db contents to an Excel file, make changes, and then merge those changes into the existing database? I have tried and failed, so I am wondering if any experienced users could help me out.

Also, is there some kind of phpMyAdmin for MS SQL? A free, open source alternative would be best.

Create a view/function/stored procedure that will retrieve the data with the desired values, bcp the data out using this object, truncate the original table, and bulk insert the data back in.

Better check relational integrity issues first.

create view/function/stored procedure
bcp...queryout...
alter table <table_name> nocheck constraint all
truncate table <table_name>
bulk insert...
alter table <table_name> check constraint all

Is this in SQL Server Manager or some similar software? I guess what I'm saying is that I'm inexperienced with Microsoft SQL Server and it seems like your directions are too vague. Thanks for helping me though.

Is the Data in Excel?

I would DTS the data to a staging table, write a sproc that performed audits on the file, and if it passed all the tests do an insert

MOO

Originally posted by Brett Kaiser
Is the Data in Excel?

I would DTS the data to a staging table, write a sproc that performed audits on the file, and if it passed all the tests do an insert

MOO

The data is stored on the webserver, but we have tried exporting to Excel, and when I imported the changes a new table was made or something. It was kind of confusing to work with the Manager.

Originally posted by arosboro
It was kind of confusing to work with the Manager.

That's funny...it's always confusing to work with a manager...ooooh you probably meant Enterprise Manager...

EM for short...

And DTS...

Is the data fixed width or delimited?

One of the things you need to worry about if you export your data, modify it, and then load it back into the database is whether any relational integrity is established between your table and other database tables, either logically or physically. You could potentially screw up your application if you are not careful.

The best way to modify data is, of course, through the interface, but failing that you really need to do your modifications within SQL Server, using SQL statements. If you are going to be using a SQL Server database then you or somebody else in your office needs to learn some of the basics, and what you are trying to do is probably not very complex. I am afraid that you are going to create a very complicated and fragile solution to this problem involving multiple steps and technologies, when the issue could be resolved easily within SQL Server.
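
As a rough example of the staging approach mentioned above, the price-change scenario can usually be handled with a staging table and a single UPDATE. The table and column names here are invented for illustration:

-- 1. Import the edited spreadsheet into a staging table (via DTS or the import wizard),
--    for example dbo.PriceChanges(ProductCode, NewPrice).

-- 2. Sanity-check the staged rows before touching the live table.
SELECT pc.ProductCode, pc.NewPrice
FROM dbo.PriceChanges AS pc
LEFT JOIN dbo.Products AS p ON p.ProductCode = pc.ProductCode
WHERE p.ProductCode IS NULL OR pc.NewPrice <= 0;    -- unknown products or suspicious prices

-- 3. Apply the changes only to the products that were staged.
UPDATE p
SET p.Price = pc.NewPrice
FROM dbo.Products AS p
JOIN dbo.PriceChanges AS pc ON pc.ProductCode = p.ProductCode;

Because only the staged rows are touched, a typo in the spreadsheet can at worst mis-price a single product rather than damage the whole table, and the check in step 2 catches the obvious mistakes before anything is changed.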