Showing posts with label rows. Show all posts

Friday, March 30, 2012

Loss of records

Hi, I have the following problem:
In a flat file I have 250,000 rows; when I load them into the DB, only 249,995 arrive, so 5 get lost.

I can't tell where the error is occurring; the logging doesn't show any errors.
I don't know what the cause of the problem could be.
Does anyone know what might be happening here?

Help, please.

If you execute the package in BIDS you can see how many rows are output from each component. This should make it very easy to see where the rows are being lost.

-Jamie

Losing rows from file to destination table - need troubleshooting help

I am losing data from time to time and cannot figure out where the rows are vanishing to. In essence, I have 5 files that I process. The processing occurs on a daily basis by one SQL Agent job that calls 5 individual packages, each doing a different piece of work.

I stage each file to a temp table, then perform some minor transformations, then union the rowsets from each file before landing to another temporary table. A subsequent package performs additional transformations that apply to the entire dataset, before inserting them to their final destination. Along the way, I reject some records based on the business rules applied. Each package in the entire job is logging (OnError, TaskFailed, Pre and Post execute). There are no errors being generated. No rows are being rejected to my reject tables either.

Without getting into the specific transforms, etc. being used in this complex process, has anyone seen similar unexplained behaviour? I've not been able to identify any pattern, except that it is usually only 1 or 2 of the record types (specific to a source file) that will ever fail to be loaded. No patterns around volumes for specific source files. There are some lookups and dedupes, however I've seen the records 'drop off' before reaching these transforms. I have noticed my final destination load was not using fast load. However, sometimes the records disappear before even getting to my final staging table, which is inserted using fast load. I am going to turn on logging of the OnPipelineRowsSent event. Any other suggestions for troubleshooting/tracking down these disappearing records?

Thanks

Couple more clarifications:

I have run the same files thru manually in debug mode to find that I can watch all the rows thru the entire process.

We have seen strange behaviour in running packages as scheduled jobs thru SQLAgent

Utilizes unions, which seem a bit clunky.

|||

Joe,

We have seen similar problems on my current project so yesterday we turned on OnPipelineRowsSent logging.

Another thing we have done is output the data from each transform component to a file for later examination. The MULTICAST transform is invaluable in order to do this.

As yet we haven't found out what is going on. It's strange.

-Jamie

|||

I'm not sure I would use the adjective strange, but..

In trying to troubleshoot this process, I first changed the union transform that was taking 6 input streams and busted it out to 5 individual waterfalling unions, each with 2 input streams. No change in behaviour.

I then changed the package that moves this data by adding multicasts to output to a file after every transform along the way up to the final destination, after the 5 unions. Just adding the multicasts into the flow has resulted in no rows vanishing for the daily loads for the past week. Unfortunately, I don't have time to troubleshoot further, but I think this demonstrates that there is indeed a serious bug here. I still suspect it has to do with the union transform. I am quite scared for anyone else's shop that has decided to standardize ETL on this tool, as we have. As developers, we only have time to test our code, not to test that the native tool functionality is behaving as expected. In addition, having to monitor on a regular basis that it is performing properly is not acceptable.

Hoping this problem magically went away with SP1....

JH

Losing my margin!?

I am designing a report to output address labels (Avery 5160), and the
spacing of the data in the columns and rows decreases as the print moves
down the page. By the final row of labels, the name line is in the
preceding row. I am using a list object, and the margin settings are per
Avery's spec sheet.
Anyone else experience this and have a fix?
Thanks,
Andy

What rendering output are you using? My guess is you'll have your best luck
with PDF or TIFF. HTML is pretty non-deterministic.
--
Cheers,
'(' Jeff A. Stucker
\
Business Intelligence
www.criadvantage.com
---
|||

I am using PDF. I have also tried to fix the size of the fields; I unchecked
the "Can increase to accommodate contents" option.
Andy
|||

Solved! I placed the list object inside a rectangle to fix the size of the
label and removed the right and bottom padding from the field.

Wednesday, March 28, 2012

loop for insert into

Hello,

I have 10 tables (T1Orj, T2Orj, … T10Orj) and I need to find the modified rows from each table and insert them into T1Bak, T2Bak, … T10Bak. Although the original and bak tables have common fields, the original tables have more fields than the bak tables, and the T1Orj, T2Orj, … T10Orj tables all have different structures.

I can go ahead and write an INSERT INTO query for each table; I am just wondering, is there any way I can do this in a loop for all tables?

You can use a DataAdapter to do this:

DataSet ds = new DataSet();

SqlDataAdapter adp = new SqlDataAdapter("select * from authors; select * from sales", "server=srv;Integrated Security=true;database=pubs");

adp.Fill(ds);

In this case the dataset ends up with two tables,
one for authors and one for sales.

If you want you can then go and rename them, as they end up with names like "Table", "Table1", "Table2", etc.

but they come out in the order you put them in ....

then you can do

Monday, March 26, 2012

Looping through rows in flat file

I'd appreciate some guidance on how to solve a problem. I have a flat file of the format

Header1
Header2
Header3
Data Record
Data Record
... etc. ...
Data Record
Trailer1
Trailer2

I want to loop through each data record. I want to take a column value (string) from each row, and call a stored procedure passing in that value. The s/p will return another string which I want to use to update the original column value from the flat file. If I can't update the original column value I'm happy to add a derived column.

Once I've done this "lookup" for all records, I want to re-create the flat file again with the new column values.

I don't need a detailed solution ... just some pointers on how to go about it.

Greg.

On the surface it looks like you should be able to do this (i.e. execute a sproc) using the OLE DB Command transform however I don't think this will work because you want the return value from the sproc put into the pipeline - which you cannot do with the OLE DB Command transform.

Can you not use the LOOKUP transform to get the new values? If all the sproc does is lookup the input value in a mapping table then LOOKUP transform will do the job.

If not - this will be a little harder. But let's eliminate (or otherwise) the option of using the LOOKUP transform first.

-Jamie

|||The sproc parses the incoming string to attempt to extract two distinct values from it. Each of these values is a primary key column in a separate table, both of which are needed to obtain a PK value from another table, so it's not a straightforward lookup.

I guess I could do the lookup using custom SQL in the Lookup transform, and re-write the parsing of the input column as a Script Component (I've already done it in C#, so it should be simple to port to VB.NET).

Apologies as it's a bit of a muddle, but just trying to get my head around a decent solution.

(I've already done this in BizTalk but am looking to use SSIS instead - it's a pity SSIS support for the Web Service tasks is so poor otherwise it would be simpler).

Thanks,

Greg.

|||Did you think about using a staging table? In this scenario you would load the file to a temp (staging) table 1-to-1. Then apply a stored procedure to the temp table that would transfer only the needed rows to the production table...|||

Custom SQL in the LOOKUP is a good way to go. In fact, best practice states that this is the best thing to do because if you simply select a table you will be selecting unused data into your LOOKUP cache - VERY BAD.

You could probably do the parsing in a derived column component - no need to resort to code (unless you really want to of course). If you need to derive some values in order to do the LOOKUP then there is nothing at all wrong with doing this in the pipeline - that's what the pipeline is for.

Anyway... anything that enables you to use the LOOKUP is definitely the way to go, otherwise you'll have to hack around it and that isn't nice.

Incidentally, I agree that the Web Services Task is basically useless. I am currently working on a project where we are consuming web services in SSIS left right and centre and we haven't used the Web Services Task once - it is all done in script. Donald Farmer's book contains an example of how to do this.

-Jamie

|||I'd like to avoid a staging table here as these files really only "pass" the database on their way somewhere else, doing a lookup on the way.

Thanks Jamie. I'll have a look at derived columns. Last question for the moment is,

If I have a Script Component and define an InputColumn with Usage Type of Read/Write ... how do I write to it? Do I need to do it in script using Row.ColumnName = "whatever value"?

Greg.

|||

Jamie,

you can get values from a sproc back into the pipeline using OUTPUT parameters... That shouldn't be any problem...

However I agree that using lookups (perhaps even several lookups chained behind each other) will be the better solution...

|||

Thomas Pagel wrote:

Jamie,

you can get values from a sproc back into the pipeline using OUTPUT parameters... That shouldn't be any problem...

However I agree that using lookups (perhaps even several lookups chained behind each other) will be the better solution...

Ah, thanks Thomas. Care to post a demo on your blog? :)

|||

GregAbd wrote:


If I have a Script Component and define an InputColumn with Usage Type of Read/Write ... how do I write to it? Do I need to do it in script using Row.ColumnName = "whatever value"?

Yeah, basically. There's loads of demo code about for doing this. Here's some from my own page - you'll find stuff on Kirk's blog, Ash's blog, SQLIS.com amongst others as well I'm sure:

http://blogs.conchango.com/jamiethomson/archive/2005/07/25/1841.aspx

http://blogs.conchango.com/jamiethomson/archive/2005/04/09/1265.aspx

-Jamie

|||

Jamie,

here you are: http://sqljunkies.com/WebLog/tpagel/archive/2006/01/03/17744.aspx

|||

BTW, in SP1, the Web Services task gets dynamic. :)

K

|||Thanks for all the help so far folks. In my data flow, I've got my flat file coming in. This goes into a script component which takes one column (string), parses it (complicated) and sets two variables in the PostExecute method. This then goes into a lookup.

How can I use those variables in the custom SQL window? I know I can build a custom SQL query in a variable as an expression and use that in the OLE DB Command, but this doesn't seem possible in the Lookup.

FWIW, my lookup is a three table join along the lines of

SELECT FieldB FROM A, B, C
WHERE A.FieldA = Variable1
AND C.FieldC = Variable2
AND A.PK = B.FK1
AND C.PK = B.FK2

Greg.
|||

Greg,

I guess you haven't quite understood the concept of the Lookup yet. The Lookup takes a query (when the package starts) and caches the result of that query (by default; you can change that...). In your data flow you define one field (or many) from the pipeline to be matched against the same number of fields in the cache. If there is a match, you get the other fields of the found record in return.

So you don't have to pass the variables of each record to the Lookup. What you have to do is store the results of the script not in variables but in new fields in the pipeline. Then you match these fields against the result of the complete query in the Lookup (so the Lookup doesn't hold one record but all possible records, and only the fields of the matching record come back to the pipeline).
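Following that approach, the reference query for the Lookup cache might look like the following. This is only a sketch based on Greg's earlier three-table join; it assumes (hypothetically) that FieldB lives on table B and that the two keys derived in the script component are matched against A.FieldA and C.FieldC.

```sql
-- Hypothetical sketch of the Lookup's reference query; the Lookup caches
-- this whole result set, and the pipeline fields derived by the script
-- component are mapped to A.FieldA and C.FieldC, with B.FieldB returned.
SELECT A.FieldA, C.FieldC, B.FieldB
FROM A
JOIN B ON A.PK = B.FK1
JOIN C ON C.PK = B.FK2;
```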

HTH

|||I was just in the process of realising that when you posted, Thomas. It's much clearer now.

SSIS is a pretty steep learning curve in terms of understanding what it's capable of but it's a pretty cool tool to use.

Plenty more questions to come.

Greg.

looping through recordset

hello,
i have a select query that returns multiple rows (within a cursor). How do i loop through the rows to process them (in a stored proc)? I do not want to use nested cursors. A code sample is requested.
thanks!

Usually a cursor is created to process multiple rows (unless you have developed a passion for cursors and would like to create one for a single row).

Have you looked at FETCH NEXT?

|||Well, the situation is something like this. I am using a cursor that returns multiple rows; within the while @@FETCH_STATUS = 0 loop, I have another SQL select that returns multiple rows for every row the cursor is processing. I want to process every row returned by the inner select query. Do I have an option other than using nested cursors? Any help is appreciated.

|||If you post the code maybe we can get rid of the cursor.

looping through query result column and PRINT to log file....

i am creating a log file, and at one point I record how many rows are deleted after an update process. I also want to record which ProductIDs were actually deleted... how would I write that?!

Basically I need to know how I would get the list, and I am using the PRINT command... I need to somehow write a loop that works through the list and PRINTs to the file...

Thanks in advance

I got it :)

For anyone who might be interested, here's how it's done:

DECLARE @NbrList varchar(300)
SELECT @NbrList = COALESCE(@NbrList + ', ', '') + CAST(Invoice AS varchar(30))
FROM TRANSACTIONS WHERE ProductID = 'CVSDBN'
PRINT @NbrList

Thanks ;)

Looping rows to Increment a value

I currently have a table that contains a Serial Number and a quantity.

SN Qty
4352301 3
6892103 2

I need to be able to loop through this table and increment each unique Serial Number by the quantity. The desired result set is:

4352301
4352302
4352303
6892103
6892104

I can pull off a Do While loop in ASP; however, I am having trouble with the T-SQL syntax. Any help would be GREAT !!

joe

UPDATE table SET serial = serial + quantity ?

|||Something like this?

Code:
-------------------------------------
create table #tmp(sn int, qty int)
insert into #tmp values(4352301,3)
insert into #tmp values(6892103,2)
select * from #tmp
declare @sn int, @qty int, @cntr int
select @sn = min(sn) from #tmp
while (@sn is not null) begin
    select @qty = qty, @cntr = 1 from #tmp where sn = @sn
    while (@cntr < @qty) begin
        select @sn = @sn + 1, @cntr = @cntr + 1
        insert into #tmp values(@sn,@qty)
    end
    select @sn = min(sn) from #tmp where sn > @sn
end
select * from #tmp order by sn
-------------------------------------

Results:
-------------------------------------
sn qty
---- ----
4352301 3
6892103 2

sn qty
---- ----
4352301 3
4352302 3
4352303 3
6892103 2
6892104 2
-------------------------------------
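As an aside, the same expansion can be written without a loop. A sketch using a recursive CTE against the freshly created #tmp table above (assumes SQL Server 2005 or later; the WHILE loop works on 2000):

```sql
-- Expand each serial number by its quantity, set-based.
WITH expanded (sn, n, qty) AS
(
    SELECT sn, 1, qty FROM #tmp
    UNION ALL
    SELECT sn, n + 1, qty FROM expanded WHERE n < qty
)
SELECT sn + n - 1 AS sn
FROM expanded
ORDER BY sn;
```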

Wednesday, March 21, 2012

loop for insert into

Hello,

I have 10 tables (T1Orj, T2Orj, … T10Orj) and I need to find the modified rows from each table and insert them into T1Bak, T2Bak, … T10Bak. Although the original and bak tables have common fields, the original tables have more fields than the bak tables, and the T1Orj, T2Orj, … T10Orj tables all have different structures.

I can go ahead and write an INSERT INTO query for each table; I am just wondering, is there any way I can do this in a loop for all tables?

You can use EXEC to execute a dynamically constructed SQL statement that would run your query on every table from a list.|||That sounds to me like a job for a trigger, inserting the old data into a backup table at the time the delete/update is done. Otherwise, if this is only a single-execution job, you should go the way the other poster mentioned, although this is based on dynamic SQL, which should be avoided in these cases.

HTH, Jens Suessmeyer.

http://www.sqlserver2005.de
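For what it's worth, a rough sketch of the EXEC approach mentioned above. Every name and the change-detection condition here is hypothetical; in practice the shared column list for each table pair would be generated from INFORMATION_SCHEMA.COLUMNS rather than hard-coded:

```sql
-- Sketch: build and run one INSERT per Orj/Bak table pair.
DECLARE @i int, @sql nvarchar(4000), @cols nvarchar(1000);
SET @i = 1;
WHILE @i <= 10
BEGIN
    SET @cols = N'Key1, Col1, Col2';  -- hypothetical shared columns
    SET @sql = N'INSERT INTO T' + CAST(@i AS nvarchar(2)) + N'Bak (' + @cols + N') '
             + N'SELECT ' + @cols + N' FROM T' + CAST(@i AS nvarchar(2)) + N'Orj '
             + N'WHERE Modified = 1';  -- hypothetical change flag
    EXEC (@sql);
    SET @i = @i + 1;
END
```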

|||

Not really. You can use INFORMATION_SCHEMA.COLUMNS to generate these statements, but there is no real easy way to do it other than an actual compiled statement.

What is the purpose of this? If you want to do this offline (and not with a trigger, as also suggested), the easiest way would be to add a rowversion (timestamp) column to the Orj tables and a binary(8) column to the bak tables (to hold a copy of the rowversion in the bak table)

Then the statements would be:

--new rows
insert into <bak> (<bakColumns>)
select <orjColumnsThatBakHas>
from <orj>
where not exists (select *
from <bak>
where <orj>.key = <bak>.key)

--changed rows
update <bak>
set <bak>.col1 = <orj>.col1,
<bak>.col2 = <orj>.col2,
...
<bak>.colN = <orj>.colN
from <bak>
join <orj>
on <orj>.key = <bak>.key
where <orj>.rowversion <> <bak>.rowversionCopy
--or compare all columns if rowversion is not a possibility

I also am guessing you don't care about deletes, but if you want to delete rows:

delete from <bak>
where not exists (select *
from <orj>
where <orj>.key = <bak>.key)

Monday, March 19, 2012

Lookup Transformation

Is there a way to ignore the unmatched rows using Lookup without creating another destination (Error Transformation for the red connector)?

Sure thing. Once the mappings have been defined in the lookup, click on the "Configure Error Output..." button and set the Error for Lookup Output to "Ignore failure." Then don't hook up the red flow to anything. Just leave it be.|||

shafiqm wrote:

Is there a way to ignore the unmatched rows using Lookup without creating another destination (Error Transformation for red connector)?

Yes, just use 'Ignore failure' in the error output... the unmatched rows will go into the green output, having NULL in the lookup columns

|||

Rafael Salas wrote:

shafiqm wrote:

Is there a way to ignore the unmatched rows using Lookup without creating another destination (Error Transformation for red connector)?

Yes, just use 'Ignore failure' in the error output... the unmatched rows will go into the green output, having NULL in the lookup columns

And if you truly need to ignore the error (unmatched records) then use the red flow and add it to a row counter. That way, you can capture the number of rows and they'll end up being separated from the valid records.

Lookup task with NOLOCK

I have an SSIS package that unpivots data - each row from the source becomes 20-24 rows in the destination db. In the dataflow it looks up some type-data IDs before inserting into the destination table. The whole process flies through the data at an incredible rate, however I find that the final commit on data insertion takes a very long time, and two of the lookup tasks remain yellow throughout. It appears that there may be a contention issue because the final table has FKs to those lookup tables.

In T-SQL it is possible to do SELECT..... WITH (NOLOCK), but when I modified the SQL statement of the lookup task by adding this clause, it caused an error: "Incorrect syntax near the keyword 'with'."

Is there any way around this problem? THNX!

PhilSky

You could split the data-flow into 2, using raw files to pass data between them. This would alleviate blocking problems (if indeed that is the issue).

Incidentally, I have a request for enhancements to the LOOKUP component that means this would be even easier. Feel free to vote for these enhancements here: http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=58d0100b-aa34-4191-ae03-7655543614a1 and leave a comment as well if you could.

-Jamie

|||I've tried that and it works fine. What is the SQL statement you are trying to use? The other option is to change the transaction isolation level for the data flow.
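For reference, the table-hint syntax itself is unremarkable; in a plain SELECT (hypothetical names) it is just:

```sql
SELECT TypeId, TypeCode
FROM dbo.TypeLookup WITH (NOLOCK);
```

If the Lookup still rejects this, the isolation-level suggestion above is the likely workaround.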

Monday, March 12, 2012

Lookup on input

I need to validate my input rows. A row is valid if certain other input rows exist in the same table (I am importing data from Excel and Access). I'll give an example to make everything clear:

Input table boys has the following columns: First_Name, Surname and Date_of_birth.

Output table is Twin_Triple_More_Brothers. I would like to insert into this table only boys whose surnames are equal and whose dates of birth differ by less than one day.

I was thinking about lookup component, but I cannot use it in that way (or I just do not know how).

Maybe someone has an idea how to do this?
Thanks for help.

You could try something like this...

-Send your source to a lookup with the lookup query on the surnames and the date of birth

-Send the error rows of this lookup to another lookup with the lookup query on the surnames and the date of birth minus 1

-Send the successful lookups from the two lookup components to a union all component

-Send the output of the union all component to your destination

|||Use a lookup to match on Surname. Then return Date_of_birth from the lookup table.

Next, hook up to a derived column to calculate the difference between the input Date_of_birth and the lookup Date_of_birth.

Hook that up to a conditional split, where you test the calculation from above. If it's within your parameters, send to the insert path, otherwise do nothing.|||But how can I lookup in input? I know how to search in other table, but I do not know how to search in input columns.
Again thanks for help.
|||

Piotr Stapp wrote:

But how can I lookup in input? I know how to search in other table, but I do not know how to search in input columns.
Again thanks for help.

Create an initial data flow that simply takes your input and loads it to a table. Then in your second data flow, you can use an OLE DB source hooked up to that staging table, having a lookup pointing to the same table. You might have to work through the matching logic a bit, but it could work for you.|||So the only way is to use a temporary table?
Maybe it is not as good as I think, but it is also not so bad.
|||You could do this in SQL, probably, but this isn't the forum for that... ;)
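If you did take it to SQL, the matching logic might be sketched as a self-join along these lines. This assumes the input has been staged to a table named boys, and (simplifying) that brothers have different first names; adjust the day-difference comparison to your exact business rule:

```sql
-- Hypothetical sketch: boys sharing a surname whose dates of birth
-- differ by at most one day.
SELECT DISTINCT a.First_Name, a.Surname, a.Date_of_birth
FROM boys AS a
JOIN boys AS b
  ON  b.Surname = a.Surname
  AND b.First_Name <> a.First_Name
  AND ABS(DATEDIFF(day, a.Date_of_birth, b.Date_of_birth)) <= 1;
```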

Lookup current id in Kimball Type II dimension for fact rows

Hi all,

Maybe someone here can help me out: I have a Kimball type II dimension, where i track changes in a hierarchy. Each row has a RowStartDate and RowEndDate property to indicate from when to when a certain row should be used.

Now I want to load facts to that table. So each fact will have a certain date associated with it that I can use to look up the right Id (a certain SourceId can have multiple integer Ids when there are historic changes) and then load the facts.

Is there a building block I can use for that? I could do this with SQL scripts, but the client would prefer to have as much as possible done in SSIS. The Lookup transformation will only let me specify an equality (inner join where A = B) join, but I need equality on one column (SourceId) and then >= and <= (RowStart and RowEnd) to find the right row version.

Any ideas? Thanks in advance,

John

That question has been asked before. There are several workarounds to that problem; here is one:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=406485&SiteID=1

You may want to try a search for 'SSIS range lookup'
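For comparison, the range lookup is straightforward to express in plain T-SQL; the workarounds in that thread are essentially ways of getting SSIS to do the equivalent of this sketch (fact and dimension names here are hypothetical):

```sql
-- Match each fact to the dimension row version in effect on its date.
SELECT f.SourceId, f.FactDate, d.DimId
FROM staging.FactLoad AS f
JOIN dbo.DimCustomer AS d
  ON  d.SourceId = f.SourceId
  AND f.FactDate >= d.RowStartDate
  AND f.FactDate <= d.RowEndDate;
```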

|||

That's it, excellent! I was googling on the wrong words.

Thanks,

John

Lookup Component Stalls

i am using a lookup component to do a typical SCD: compare the natural keys and, if they are the same, redirect the rows and do whatever; if not present (meaning the error rows), redirect and do whatever.

When I use the component to do a historical load (which means there are no rows in the destination table) and set the memory to partial cache, the data flow stalls after about 46,000 rows; it just doesn't complete after that. But the moment I switch it to full cache, it flows. But partial is what I am supposed to be using, keeping in mind the incremental loads. Why does the component stall?

I had used partial cache in an earlier project with an 18-million-row table (albeit for an incremental load) and it worked fine (though it was slow, at least it worked), but now I am trying to load just 300,000 rows and it stalls.

I am using a machine with 2 GB of RAM and set the memory to 750 MB/500 MB; nothing worked.

I tried two different machines -- same thing happened.

Any insight will be appreciated.

I am just wondering why you said that the partial cache is required for the historical load. I always use full cache (after making sure the server has enough memory available).

How is your Lookup set up? You should be using only the required columns; you said this is for an SCD, so you should be using a query with the business/natural key columns only (please provide the number of columns and the data types of the required columns). Also, how is the source component set up? Are you using 'fast load'?

|||

Hmmm, so you use full cache always? If I leave it at full cache, then from next time onwards, during incremental loads, it will load all 290,000 rows into memory when possibly fewer than 10 rows have changed. I don't want to do that.

Yes, I bring ONLY the required columns: actually 4 columns (NaturalKey, InferredMember, Hash1 and Hash2), all integers.

Yes I use Fast Load.

I use OLEDB for my Source.

|||

JaguarRDA wrote:

Hmmm, so you use full cache always? If I leave it at full cache, then from next time onwards, during incremental loads, it will load all 290,000 rows into memory when possibly fewer than 10 rows have changed. I don't want to do that.

Yes, I bring ONLY the required columns: actually 4 columns (NaturalKey, InferredMember, Hash1 and Hash2), all integers.

Yes I use Fast Load.

I use OLEDB for my Source.

An INT32 field, though (call it "natural key") will only take up 1.1MB of memory to store 290,000 rows. Not too bad, eh?|||

That is exactly my point. Even with 4 columns, SSIS should be able to cache the whole result set in no time. I would recommend you watch the progress tab in BIDS to check how long the lookup caching takes. Check also Task Manager to see if there is any other process taking too many resources on the box. You can replace the OLE DB destination with a rowcount transformation just to test the transformation speed. You can also remove the lookup and destination and use the same row count to measure the 'reading' speed of the source component.

Make sure you are using fast load in your OLE DB destination.

Lookup as Inner join ?

Hello all.....

I have a doubt about whether we can use the Lookup as an inner join.
If I have duplicate keys in the lookup source, can I get the duplicate rows into the destination as well?

Ex:
source:
c1 c2 c3

1 a1 b1
2 a2 b2
3 a3 b3
lookup table:
c1 c4 c5

1 c1 d1
2 c21 c22
2 c23 c24
3 c3 c3

now the result should be:
c1 c2 c3 c4 c5
-- -- -- -- --
1 a1 b1 c1 d1
2 a2 b2 c21 c22
2 a2 b2 c23 c24
3 a3 b3 c3 c3

in MSDN I read that the lookup will get the first match... is it possible to get all matches?

Thank you

No, you can't get multiple matches from the Lookup. You should use the Merge Join component to do that.
|||

Dear friend,

use the merge Join, and you can specify LEFT, INNER or RIGHT JOIN!

Helped?

regards!

|||

yes, I know about the merge join component, but for that I need to sort the data before joining. Until now I have used my source table, which has 6 million records, only once, and with the multicast I generated all the necessary fact tables. To generate one more fact table I need to make an inner join with another table; that's why I asked whether there is any setting where we can get all the matches. I could also use the merge join here, but sorting these 6 million records in the middle of the package takes all the RAM and time.

I can't sort them at the start because this table is already sorted on some other columns for the merge join input.

anyways, thank you for your help. If it's not possible with the lookup, I need to do something else. I think I need to read the table again with the needed keys sorted.

Thank you.

Saturday, February 25, 2012

looking for security guidelines to avoid sql injection risks with xquery

Hi, i hope this forum can help me clarify my security concerns for my scenario:

I want a user to be able to filter sql rows by using xquery on an untyped XML that is stored in a XMLDataType column. My table will have multiple rows/columns, with one column being XML.

I would like to get some guidance on how a user can specify their "filter" condition string using the xquery syntax without harding coding any dependency on the server side (client decides which items they are interested in), and I would like to know the guidelines on how to translate the xquery on the server to avoid any sql injection risks. For instance, perhaps i should double all single quotes in the given xquery string. Also, i am concerned about sql:variables that may be able to access private variables inside a stored proc, has this been an issue?

Thanks!!

All of the issues you bring up are concerns:

-SQL injection
-sql:column
-sql:variable
-very expensive queries can lead to Denial of Service attacks

XQuery is code, so it should be treated similarly to SELECT statements, in general it is difficult to safely execute untrusted queries.

Ways around this would be to:

-Use a set of parameterized stored procedures that reasonably covers the likely set of queries users are going to execute.
-Give each user their own database and assume that they have full SELECT permissions on it. (This won't prevent Denial-of-Service-type attacks, but could prevent users from accessing other users' data in the event of SQL injection.)
-Use full-text search. The queries won't be as expressive, but it might be good enough to filter rows.
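A minimal sketch of the first option: a parameterized stored procedure in which the XQuery text is fixed on the server and the user supplies only a value, passed in via sql:variable. All names here are hypothetical:

```sql
-- The filter value is a parameter, never concatenated into the query text,
-- so user input cannot alter the XQuery itself.
CREATE PROCEDURE dbo.FilterByItem
    @ItemName nvarchar(100)
AS
BEGIN
    SELECT Id, XmlCol
    FROM dbo.Docs
    WHERE XmlCol.exist('/items/item[@name = sql:variable("@ItemName")]') = 1;
END
```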

Looking for ideas...

I have a text file that I am importing in a data flow task. Not all of the rows are the same, there are "header" rows that contain a company code.

What I want to do is keep that company code and append that to the row of text that I am writing to a CSV file.

Since you cannot change variables until the post execute, what are my options?

Hope that's clear

Thanks!

BobP

Are you writing a custom script component to handle this? You talk of not being able to change variables until post execute; just handle the locking yourself, and you can, as illustrated here - http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=956181&SiteID=1

|||From your description, I assume you're trying to denormalize a multi-formatted text file by moving data from header rows to detail rows.

Since your rows are not all the same, set your flat file source component to read each line of the source file as a single column. Then write a custom script transformation component to consume that column, parse the data by row type, and output the individual columns that will be consumed by your destination (including the header columns). When the transformation encounters one of these header rows, put the parsed data (Company Code) into member variable(s) of the transformation component, and discard or redirect the header row. For each detail row, create a new output row and populate the "detail" columns from the parsed input data and the "header" column(s) (i.e. Company Code) from those member variable(s). This way your header data is remembered for each subsequent detail row until another header is encountered.

Hope that helps.

Monday, February 20, 2012

Looking for help with an SQL query.

well, i want to select rows by date from a file. but I want,
in particular, one sum of values from the rows that fall WITHIN
a supplied date range, and a second sum of values from the
rows that have dates FOR ALL TIME UP TO the second date in the date range.

the former, by itself, might be:

SELECT id, value RangedValue
FROM myFile
WHERE date >= [lower date range value]
AND date <= [higher date range value]

and the latter, by itself, might be:

SELECT id, value AllTimeValue
FROM myFile
WHERE date <= [higher date range value]

but I need to grab the two separate sums (RangedValue and AllTimeValue)
using one SQL statement.

I'm thinking that the UNION might work, but my preliminary results are
taking a huge amount of time, and apparently smegging up the (rather
stupid, external) report generator to boot.
If you like the UNION idea, please give me an example.

I should mention that the report generator at very least can do the
(summing) part. I could do the summing at either the SQL level or the
report level. I should also mention that although I only talk about the
one file here (myFile), in fact I need to join to and pull values from
its "parent" file, although I don't think that that should change my
fundamental problem.

Any ideas?

Cheers in advance!

-Glenn

You could try something like this:
SELECT Id
     , SUM(Value) AS AllTimeValue
     , SUM(CASE
             WHEN Date >= [Lower Date Range Value] THEN Value
             ELSE 0
           END) AS RangedValue
FROM MyFile
WHERE Date <= [Higher Date Range Value]
GROUP BY Id;
;)