Showing posts with label process. Show all posts

Friday, March 30, 2012

Losing rows from file to destination table - need troubleshooting help

I am losing data from time to time and cannot figure out where the rows are vanishing to. In essence, I have 5 files that I process. The processing occurs on a daily basis via one SQL Agent job that calls 5 individual packages, each doing a different piece of work.

I stage each file to a temp table, then perform some minor transformations, then union the rowsets from each file before landing to another temporary table. A subsequent package performs additional transformations that apply to the entire dataset, before inserting the rows into their final destination. Along the way, I reject some records based on the business rules applied. Each package in the entire job logs OnError, TaskFailed, and pre- and post-execute events. There are no errors being generated. No rows are being rejected to my reject tables either.

Without getting into the specific transforms, etc. being used in this complex process, has anyone seen similar unexplained behaviour? I've not been able to identify any pattern, except that it is usually only 1 or 2 of the record types (specific to a source file) that fail to load. No patterns around volumes for specific source files. There are some lookups and dedupes; however, I've seen the records 'drop off' before reaching these transforms. I have noticed that my final destination load was not using fast load. However, sometimes the records disappear before even getting to my final staging table, which does insert using fast load. I am going to turn on logging of the OnPipelineRowsSent event. Any other suggestions for troubleshooting/tracking down these disappearing records?

Thanks

A couple more clarifications:

I have run the same files through manually in debug mode and can watch all the rows flow through the entire process.

We have seen strange behaviour running packages as scheduled jobs through SQL Agent.

The process utilizes unions, which seem a bit clunky.

|||

Joe,

We have seen similar problems on my current project so yesterday we turned on OnPipelineRowsSent logging.

Another thing we have done is output the data from each transform component to a file for later examination. The MULTICAST transform is invaluable for this.

As yet we haven't found out what is going on. It's strange.

-Jamie

|||

I'm not sure I would use the adjective strange, but..

In trying to troubleshoot this process, I first changed the union transform that was taking 6 input streams and broke it out into 5 individual waterfalling unions, each with 2 input streams. No change in behaviour.

I then changed the package that moves this data by adding multicasts to output to a file after every transform along the way, up to the final destination after the 5 unions. Just adding the multicasts into the flow has resulted in no rows vanishing from the daily loads for the past week. Unfortunately, I don't have time to troubleshoot further, but I think this demonstrates that there is indeed a serious bug here. I still suspect it has to do with the union transform. I am quite scared for anyone else's shop that has decided to standardize ETL on this tool, as we have. As developers, we only have time to test our code, not to test that the native tool functionality is behaving as expected. In addition, having to monitor on a regular basis that it is performing properly is not acceptable.

Hoping this problem magically went away with SP1....

JH

Monday, March 26, 2012

looping through recordset

hello,
I have a SELECT query that returns multiple rows (within a cursor). How do I loop through the rows to process them (in a stored proc)? I do not want to use nested cursors. A code sample is requested.
thanks!

|||

Usually a cursor is created to process multiple rows (unless you have developed a passion for cursors and would like to create one for a single row).

Have you looked at FETCH NEXT?

|||

Well, the situation is something like this. I am using a cursor that returns multiple rows. Within the WHILE @@FETCH_STATUS = 0 loop, I have another SQL SELECT that returns multiple rows for every row the cursor is processing. I want to process every row returned by the inner SELECT. Do I have an option other than using nested cursors? Any help is appreciated.

|||

If you post the code maybe we can get rid of the cursor.
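Taking up the "get rid of the cursor" suggestion above, a sketch of one alternative: instead of a cursor with a nested cursor, drive a single cursor off the joined result of the outer and inner queries. The table and column names here (Orders, OrderLines) are made up for illustration, not from the original post:

```sql
DECLARE @OrderID int, @LineID int

-- One cursor over the join replaces the outer cursor plus a nested
-- cursor per outer row; each fetch delivers one (order, line) pair.
DECLARE order_lines CURSOR LOCAL FAST_FORWARD FOR
    SELECT o.OrderID, l.LineID
    FROM Orders AS o
    JOIN OrderLines AS l ON l.OrderID = o.OrderID
    ORDER BY o.OrderID, l.LineID

OPEN order_lines
FETCH NEXT FROM order_lines INTO @OrderID, @LineID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- process one (order, line) pair here
    FETCH NEXT FROM order_lines INTO @OrderID, @LineID
END
CLOSE order_lines
DEALLOCATE order_lines
```

If the per-row processing can itself be expressed as set operations, a single UPDATE or INSERT over the same join removes the cursor entirely.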

looping through query result column and PRINT to log file....

I am creating a log file, and at one point I record how many rows are deleted after an update process. I also want to record which ProductIDs were actually deleted... how would I write that?!

Basically I need to know how I would get the list, and I am using the PRINT command... I need to somehow write a loop that works through the list and PRINTs to the file...

Thanks in advance

|||

I got it :)

For anyone who might be interested, here's how it's done:

DECLARE @NbrList VarChar(300)
SELECT @NbrList = COALESCE(@NbrList + ', ', '') + CAST(Invoice AS varchar(30))
from TRANSACTIONS where ProductID = 'CVSDBN'
PRINT @NbrList

Thanks ;)
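Two caveats worth noting about the COALESCE concatenation pattern above, sketched here as a variant (the '(null)' placeholder is my own choice): if any Invoice is NULL, the whole list becomes NULL from that row onward, and varchar(300) can silently truncate a long list.

```sql
DECLARE @NbrList varchar(8000)   -- wider buffer to reduce truncation risk

-- Guard the cast so a NULL Invoice doesn't wipe out the accumulated list
SELECT @NbrList = COALESCE(@NbrList + ', ', '')
                + COALESCE(CAST(Invoice AS varchar(30)), '(null)')
FROM TRANSACTIONS
WHERE ProductID = 'CVSDBN'

PRINT @NbrList
```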

Friday, March 23, 2012

Looping In SQL 2000 (Can it be a nested loop)

I have to automate a process that assigns sales leads to sales people.

For example:
Every day we buy a list of sales leads, it ranges in size from 50 -
100 records.
We have a team of sales people that also can range from 5 - 8 people.

I need to take the new records and divide them evenly among the sales
people.

If i get 50 records, and have 5 sales people, then each sales person
gets 10 leads.
--
So, I'm guessing that I may need to have a nested loop inside this. I
have tried it several different ways, but can't seem to get it quite
right.

DECLARE @TotalRecordCount int, @TotalSalesPeopleCount int,
@AmountForEach int, @LooperSalesPeoplerecords int,
@LooperNewSalesLeadsRecords int, @SalesPersonID int

SELECT @TotalSalesPeopleCount = COUNT(UserId)
FROM SalesPeople
WHERE Active = 1
--
SELECT @TotalRecordCount = COUNT(*)
FROM NewSalesLeads
--
SELECT @AmountForEach = (@TotalRecordCount/@TotalSalesPeopleCount)
--
SELECT @LooperSalesPeoplerecords = 1
SELECT @LooperNewSalesLeadsRecords = 1
--
WHILE @LooperSalesPeoplerecords <= @TotalSalesPeopleCount
BEGIN
WHILE @LooperNewSalesLeadsRecords <= @TotalRecordCount
BEGIN
SELECT @SalesPersonID = (SELECT UserID
FROM SalesPeople
WHERE UniqueId = @LooperSalesPeoplerecords)

SELECT @LooperSalesPeoplerecords =
(@LooperSalesPeoplerecords + 1)

UPDATE SalesLeads
SET SalesPerson_ID = @SalesPersonID
WHERE UNIQUEID = @LooperSalesPeoplerecords

SELECT @LooperSalesPeoplerecords =
(@LooperSalesPeoplerecords + 1)
END
END

--
Table structures

CREATE TABLE [dbo].[SalesPeople] (
[SalesPerson_ID] [int] NOT NULL ,
[FirstName] [varchar](20)NOT NULL
) ON [PRIMARY]
--
INSERT INTO SalesPeople (SalesPerson_ID,FirstName) VALUES (26, 'Bill')
INSERT INTO SalesPeople (SalesPerson_ID,FirstName) VALUES (28, 'Bob')
INSERT INTO SalesPeople (SalesPerson_ID,FirstName) VALUES (37, 'Chris')
---------------
CREATE TABLE [dbo].[SalesLeads] (
[SalesLeadID] [int]NOT NULL ,
[SalesPerson_ID] [int]NOT NULL
) ON [PRIMARY]
--
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1001,0)
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1002,0)
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1003,0)
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1004,0)
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1005,0)
INSERT INTO SalesLeads (SalesLeadID,SalesPerson_ID) VALUES (1006,0)
---------------

So in this case, all 3 salespeople should receive 2 salesleads each.

I dummied this down quite a bit. It actually ends up being more like
15 salespeople and about 400,000 sales leads. But it should work at
any scale.

Thanks for any help you might shed on this.

|||

Hi

Something like this may help. It assumes the sales IDs are sequential, and
does not worry about whether there is an exact division.

UPDATE L
SET SalesPerson_ID = M.SalesPerson_ID
FROM SalesLeads L JOIN
( SELECT P.[FirstName], P.[SalesPerson_ID] ,
( SELECT COUNT(*) FROM [dbo].[SalesPeople] S WHERE S.[SalesPerson_ID] <
P.[SalesPerson_ID] ) AS Rank
FROM [dbo].[SalesPeople] P ) M
ON L.SalesLeadID % ( SELECT COUNT(*) FROM [dbo].[SalesPeople] ) = M.Rank

John
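As a quick sanity check of the join above against the posted sample data (3 salespeople, 6 leads): the correlated COUNT(*) assigns each salesperson a rank equal to the number of salespeople with a lower ID, and the modulo then spreads the leads evenly.

```sql
-- Ranks from the correlated COUNT(*):
--   SalesPerson_ID 26 -> 0 (Bill), 28 -> 1 (Bob), 37 -> 2 (Chris)
-- Bucket for each lead = SalesLeadID % 3:
--   1001 % 3 = 2 -> Chris   1004 % 3 = 2 -> Chris
--   1002 % 3 = 0 -> Bill    1005 % 3 = 0 -> Bill
--   1003 % 3 = 1 -> Bob     1006 % 3 = 1 -> Bob
SELECT SalesLeadID, SalesLeadID % 3 AS LeadBucket
FROM SalesLeads
```

So each of the three salespeople ends up with two leads, as the original post requires.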

"Dave" <funkdm1@yahoo.com> wrote in message
news:f5174e0f.0406180540.2afaed20@posting.google.com...
> [quoted text snipped]

|||

John,

Yeah, we aren't that concerned about having an exactly even
distribution, so uneven divisors are not a problem.

And, unfortunately, the salespeople aren't in sequential order. We have
hundreds of salespeople assigned to various campaigns at any
given time, so this particular campaign may change on a daily basis,
meaning that the SalesPerson_ID is always non-sequential for this
campaign.

Thanks anyway.

"John Bell" <jbellnewsposts@hotmail.com> wrote in message news:<_mDAc.3861$Is4.34270449@news-text.cableinet.net>...
> [quoted text snipped]

|||

Hi

The solution does not rely on the salesperson ID being sequential, as it ranks
them separately. If the sales lead IDs are reasonably sequential then you
should be OK.

John

"Dave" <funkdm1@yahoo.com> wrote in message
news:f5174e0f.0406181026.4c69acff@posting.google.com...
> [quoted text snipped]

|||

I think you meant to have something like this, with the keys shown and
some extra columns added:

CREATE TABLE SalesPeople
(salesperson_id INTEGER NOT NULL PRIMARY KEY,
firstname VARCHAR(20)NOT NULL,
seq_nbr INTEGER NOT NULL,
.. );

Run this guy when the table is changed.

UPDATE SalesPeople
SET seq_nbr
= (SELECT COUNT(*)
FROM SalesPeople AS P1
WHERE P1.salesperson_id <= SalesPeople.salesperson_id)

Now a table for the leads as leads and not what you had:

CREATE TABLE SalesLeads
(saleslead_id INTEGER NOT NULL PRIMARY KEY,
grp_nbr INTEGER,
..);

First group the leads based on the number of sales people:

UPDATE SalesLeads
SET grp_nbr
= (SELECT COUNT(*)
FROM SalesLeads AS L1
WHERE L1.saleslead_id <= SalesLeads.saleslead_id)
% (SELECT COUNT(*) FROM SalesPeople);

Now look at a VIEW to get each sales person the group of leads they
are to work.

CREATE VIEW LeadAssignments (saleslead_id, salesperson_id)
AS SELECT L1.saleslead_id, P1.salesperson_id
FROM Salesleads AS L1, SalesPeople AS P1
WHERE L1.grp_nbr = P1.seq_nbr;

|||

I think I understand what you are saying here. However, this thing has
to be completely automated and just do the update every night.

I do appreciate the suggestions.

I have to think that this is done fairly often, and it can't be this
difficult to find some folks who have written something similar.

Thanks,
Dave

jcelko212@earthlink.net (--CELKO--) wrote in message news:<18c7b3c2.0406181437.187bf80a@posting.google.com>...
> [quoted text snipped]

|||

How about something like this:

----------------------
Create Table #Leads
(
LeadID Int Identity Not Null,
SalesID Int,
UniqueId Int -- Bad name
)

Create Table #SalesPeople
(
SalesID Int Identity (0, 1) Not Null,
UserID Char(30) -- type?
)

DECLARE @SalesPeopleCount int

-- Put sales people in the temp table.
Insert Into #SalesPeople (UserID)
Select UserID From SalesPeople Where Active = 1

-- How many were there?
Set @SalesPeopleCount = @@RowCount

-- Put leads in the temp table.
Insert Into #Leads (UniqueId)
Select UniqueId From NewSalesLeads

-- Match sales people to leads
Update #Leads Set SalesID = LeadID % @SalesPeopleCount

-- Save the matches.
UPDATE SalesLeads
SET SalesPerson_ID = UserID
From
SalesLeads
Join #Leads On SalesLeads.UniqueId = #Leads.UniqueId
Join #SalesPeople On #Leads.SalesID = #SalesPeople.SalesID

----------------------

Temporary IDs will always be sequential. The only problem with
this is that lower sales IDs will consistently get more leads.
You could fix that by adding something variable to LeadID before
performing the mod.

If you're going to use ID columns, you need to treat them as keys.
Joe Celko was using something like this as a bad example a few
weeks ago, trying to convince me that ID columns are a bad idea.
Used halfway like this, I would agree.

Bill
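One possible reading of Bill's suggestion to add "something variable to LeadID before performing the mod", as a sketch against the temp tables above (the day-of-year offset is my own choice, not from the thread):

```sql
DECLARE @Offset int

-- A value that changes between runs; day of year is one arbitrary choice,
-- so the salesperson who receives the 'extra' leads rotates daily.
SET @Offset = DATEPART(dayofyear, GETDATE())

-- Shifting LeadID by the offset rotates which SalesID lands in bucket 0,
-- while keeping the distribution within one lead of even.
UPDATE #Leads
SET SalesID = (LeadID + @Offset) % @SalesPeopleCount
```

This assumes @SalesPeopleCount is still in scope from the script above; any per-run value (a counter table, a random seed) would serve the same purpose.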

Dave wrote:
> [quoted text snipped]

Wednesday, March 21, 2012

Loop container to process all excel files

What I'm trying to achieve is an SSIS package that will pick up one or more Excel files and process the data in them via the conditional splitter, pushing the good rows into a table and all other rows into an error table.
I'm having some issues using the ForEach container to process multiple Excel spreadsheets into tables. The Excel import into the tables is more or less working (it imports data for the good cases, but the Excel Source substitutes a NULL when it gets an unexpected value - but that's a separate problem).
I found something related to this when searching, but it related to the CTPs (June and September), and trying to reuse the connection strings built up there (using my own variable names, naturally) causes a 'Property Value failure':
--
The connection string format is not valid. It must consist of one or more components of the form X=Y, separated by semicolons. This error occurs when a connection string with zero components is set on database connection manager.
--
I attempted to use this:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::RankingFileFullPath] + ";Extended Properties=\"Excel 8.0;HDR=YES\";"
The Excel importer works fine as a stand-alone component. Trying to use the process defined in 'Professional SQL Server Integration Services', pp. 140, I tried to use an expression to assign the variable value to the connection string. I get a validation error:
--
Error at Import TPNB Ranking Excel spreadsheets [Excel Source [1]]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.
Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: One or more component failed validation.
Error at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.
Error at Excel Importer [Connection manager "Excel Connection Manager"]: An OLE DB error has occurred. Error code: 0x80040E4D.
--
Any advice?
....
.... in addition ....
I attempted to change the package - I set the Data Flow task to delay validation, and changed the expression from:
ConnectionString @.[User::RankingFileFullPath]
to
ExcelFilePath @.[User::RankingFileFullPath]
This allowed the package to start debugging, and gave more information in the failure:
--
SSIS package "Excel Importer.dtsx" starting.
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Foreach Loop Container' has been hit
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
Information: 0x4004300A at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: Validation phase is beginning.
Warning: 0x802092A7 at Import TPNB Ranking Excel spreadsheets, ProductSalesRank Table [278]: Truncation may occur due to inserting data from data flow column "Rank" with a length of 1000 to database column "SalesRank" with a length of 50.
Error: 0xC0202009 at Excel Importer, Connection manager "Excel Connection Manager": An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "Unrecognized database format 'D:\Testing\TestRanking.xls'.".
Error: 0xC020801C at Import TPNB Ranking Excel spreadsheets, Excel Source [1]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.
Error: 0xC0047017 at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error: 0xC004700C at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: One or more component failed validation.
Error: 0xC0024107 at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
Warning: 0x80019002 at Foreach Loop Container: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnWarning event' at executable 'Excel Importer' has been hit
Warning: 0x80019002 at Excel Importer: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS breakpoint 'Break when the container receives the OnPostExecute event' at executable 'Excel Importer' has been hit
SSIS package "Excel Importer.dtsx" finished: Failure.
The program '[2460] Excel Importer.dtsx: DTS' has exited with code 0 (0x0).
--
Apparently, I added too much detail, and people are avoiding the question :-)
I've tested doing exactly the same thing, but using CSV's as the source (i.e. text files and not excel files). And it works fine. So there's nothing wrong with the environment, DB etc.
...aaarg|||

I do recall other people having similar problems with Excel but I don't know if there was a resolution.

Sorry...not much help I'm afraid.

-Jamie

|||Thanks for the response - at least I have had one, now
The only reference's I've found pointed me to (a) re-installing service pack 1 to ensure SQL and SSIS etc. are all good. I did that, but no go. The other reference seemed to say I need to write an activeX (or .Net script) component to handle the files. That's just got to be wrong - what's the point of having to continuously kludge different sections of SSIS together with .Net scripting... In that case it seems that we should rather do the whole thing in 'Net ... *sighs* ...
I guess I'm just a bit frustrated that something as simple as importing a few excel files into a SQL table doesn't work simply ... after all the forEach and loop controls was one of the NEW BIG THINGS that SSIS was bringing us.... oh well, it's getting later on a Friday, and it looks like it's going to be a long, frustrating weekend (... it might be better if I knew how to write the .Net )
Well, I hope you (and anyone else reading my ranting) has a good weekend.
|||

I was able to complete your scenario successfully following the instructions at: http://msdn2.microsoft.com/en-us/library/ms345182.aspx

Note that I paid particular attention to this paragraph:

To avoid validation errors later as you configure tasks and data flow components to use this connection manager, assign a default Excel workbook in the Excel Connection Manager Editor. After you create and configure the package, you can delete this value in the Properties window. However, after you delete this value, a validation error may occur because the connection string property of the Excel connection manager is no longer valid until the Foreach Loop runs. In this case, set the DelayValidation property to True on the connection manager, on the tasks in which the connection manager is used, or on the package.

If I did not have DelayValidation set True on the connection manager and have a valid default value in the connection manager I received the same errors as you. I also set DelayedValidation = True on the Data Flow Task for extra caution.

Donald

|||

Hi Donald,

Thanks for the link and the reponse ... I appreciate it.

I'll have a look - the article loks promising. I did already have all my 'delay validations' set to true, and I initially created the excel connection with a valid connection - I had it pointing at one of the file I want to impoty. When it is hard-coded to the excel file, my import runs fine... as soon as I delete the connection name, and replace it witht eh variable name @.[User::RankingFileFullPath], mr problems occur. I'll read through the article, and create a new package, just to ensure I don't inherit some issues I created in my attempts to appease the SSIS connections gods... :-)

thanks again

|||

I followed the article's guidelines, but am having an issue.

The sequence of actions:

Create New Project.

Create ForEach container.

Create Variables ExcelFileFullPath and ExcelConnectionProperties. Set the Container to map the ExcelFileFullPath.

Create Excel Connection, pointing to valid, existing excel file.

Alter Excel connection properties: Delay Validation = true. Expression: ConnectionString:

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

All good, so far.

Create a data flow control in the container. Add an excel source. Try connecting excel source to excel connection - error: Initial error was an Hresult type error (not captured). Trying to go further, I now get 'Invlaid Arguement' when trying to choose name of excel sheet - this happens on new excel source as well.

I'm going to start again, this time creating an excel import that works correctly. Once that's done, I'll try and move it into the container, and apply the properties as described in article, and see if that makes a difference (surely it shouldn't?)....

|||

Status:

I have created a Data Flow Task that opens a HardCoded (or direct) Excel Connection, does a data conversion, then a conditional split, and inserts rows into a OLE DB table connection. I have a ForEach container that correctly populates the excel file path into the ExcelFullFilePath variable. The Data flow task is OUTSIDE the container. All runs through successfully.

Next:

I intend to move the data flow task into the Container, but not change the excel connection, and confirm that it runs successfully. Assumming success, I will then apply the process from the article, specifically:

Set Delay Validation = True on the Package, Containter, Data Flow and Excel Connection.

Set the ConnectionString (via expressions) =

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

I'll respond on progress/results

-- so far:

Run with Data Flow in ForEach container - still runs through, USING Hard-coded connection

Change Delay Validation properties on all objects - still runs through, USING Hard-coded connection

Change ConnectionString property to use dynamic ("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\"") - SUCCESS.

Add additional .xls file to folder for processing - failure. BUT that was (due to a GOOD, DESCRIPTIVE ERROR MESSAGE) because the connection requires the same sheet name for the 2 excel files. I can live with that.

Update 2nd test file sheet name, re-run... SUCCESS. 2 Excel file successfully imported.

Many thanks for the response, especially yours, Donald.

I still don't understand why I had to go through the sequence I had to, i.e. create a fully working excel import, and then alter the properties, but it worked. Also, a useful note for those doing this in the future - make sure the sheets for the multiple files have the same sheet names.

HTH and thanks

|||

Thanks for these detailed notes. They will help us to improve this experience in the future.

Donald

|||Wow, thanks GethWho. I've been looking for this answer all day!

Everyone looking at this make sure the set DelayValidation to true on ALL of your components!

-Matt

Loop container to process all Excel files

What I'm trying to achieve is an SSIS package that will pick up 1 or more Excel files, process the data in them via the conditional splitter, pushing the good data into a table and all other rows into an error table.

I'm having some issues using the ForEach container to process multiple Excel spreadsheets into tables. The Excel import into the tables is more or less working (it imports data for the good cases, but the Excel Source uses a null when it gets an unexpected value - that's a separate problem).

I found something related to this when searching, but it related to the CTPs (June and September), and trying to reuse the connection strings they built up there (using my own variable names, naturally) causes a 'Property Value failure':

--

The connection string format is not valid. It must consist of one or more components of the form X=Y, separated by semicolons. This error occurs when a connection string with zero components is set on a database connection manager.

--

I attempted to use this:

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::RankingFileFullPath] + ";Extended Properties=\"Excel 8.0;HDR=YES\";"

The Excel importer works fine as a stand-alone component. Trying to use the process defined in 'Professional SQL Server Integration Services' pp. 140, I tried to use an expression to assign the variable value to the connection string. I get a validation error:

--

Error at Import TPNB Ranking Excel spreadsheets [Excel Source [1]]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.

Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: component "Excel Source" (1) failed validation and returned error code 0xC020801C.

Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: One or more component failed validation.

Error at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.

Error at Excel Importer [Connection manager "Excel Connection Manager"]: An OLE DB error has occurred. Error code: 0x80040E4D.

--

Any advice?

....

.... in addition ....

I attempted to change the package - I set the Data Flow to Delay Validation, and changed the expression from:

ConnectionString @.[User::RankingFileFullPath]

to

ExcelFilePath @.[User::RankingFileFullPath]

This allowed the package to start debugging, and gave more information in the failure:

--

SSIS package "Excel Importer.dtsx" starting.

SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Excel Importer' has been hit

SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Foreach Loop Container' has been hit

SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit

Information: 0x4004300A at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: Validation phase is beginning.

Warning: 0x802092A7 at Import TPNB Ranking Excel spreadsheets, ProductSalesRank Table [278]: Truncation may occur due to inserting data from data flow column "Rank" with a length of 1000 to database column "SalesRank" with a length of 50.

Error: 0xC0202009 at Excel Importer, Connection manager "Excel Connection Manager": An OLE DB error has occurred. Error code: 0x80004005.

An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "Unrecognized database format 'D:\Testing\TestRanking.xls'.".

Error: 0xC020801C at Import TPNB Ranking Excel spreadsheets, Excel Source [1]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.

Error: 0xC0047017 at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.

Error: 0xC004700C at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: One or more component failed validation.

Error: 0xC0024107 at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.

SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit

Warning: 0x80019002 at Foreach Loop Container: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.

SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit

SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit

SSIS breakpoint 'Break when the container receives the OnWarning event' at executable 'Excel Importer' has been hit

Warning: 0x80019002 at Excel Importer: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.

SSIS breakpoint 'Break when the container receives the OnPostExecute event' at executable 'Excel Importer' has been hit

SSIS package "Excel Importer.dtsx" finished: Failure.

The program '[2460] Excel Importer.dtsx: DTS' has exited with code 0 (0x0).

--

Apparently, I added too much detail, and people are avoiding the question :-)

I've tested doing exactly the same thing, but using CSVs as the source (i.e. text files and not Excel files). And it works fine. So there's nothing wrong with the environment, DB etc.

...aaarg|||

I do recall other people having similar problems with Excel but I don't know if there was a resolution.

Sorry...not much help I'm afraid.

-Jamie

|||Thanks for the response - at least I have had one, now.

The only references I've found pointed me to (a) re-installing Service Pack 1 to ensure SQL and SSIS etc. are all good. I did that, but no go. The other reference seemed to say I need to write an ActiveX (or .Net script) component to handle the files. That's just got to be wrong - what's the point of having to continuously kludge different sections of SSIS together with .Net scripting... In that case it seems that we should rather do the whole thing in .Net ... *sighs* ...

I guess I'm just a bit frustrated that something as simple as importing a few Excel files into a SQL table doesn't work simply ... after all, the ForEach and loop controls were one of the NEW BIG THINGS that SSIS was bringing us.... oh well, it's getting late on a Friday, and it looks like it's going to be a long, frustrating weekend (... it might be better if I knew how to write the .Net)

Well, I hope you (and anyone else reading my ranting) have a good weekend.|||

I was able to complete your scenario successfully following the instructions at: http://msdn2.microsoft.com/en-us/library/ms345182.aspx

Note that I paid particular attention to this paragraph:

To avoid validation errors later as you configure tasks and data flow components to use this connection manager, assign a default Excel workbook in the Excel Connection Manager Editor. After you create and configure the package, you can delete this value in the Properties window. However, after you delete this value, a validation error may occur because the connection string property of the Excel connection manager is no longer valid until the Foreach Loop runs. In this case, set the DelayValidation property to True on the connection manager, on the tasks in which the connection manager is used, or on the package.

If I did not have DelayValidation set to True on the connection manager, and a valid default value in the connection manager, I received the same errors as you. I also set DelayValidation = True on the Data Flow Task for extra caution.

Donald

|||

Hi Donald,

Thanks for the link and the response ... I appreciate it.

I'll have a look - the article looks promising. I did already have all my 'delay validations' set to true, and I initially created the Excel connection with a valid connection - I had it pointing at one of the files I want to import. When it is hard-coded to the Excel file, my import runs fine... as soon as I delete the connection name and replace it with the variable name @.[User::RankingFileFullPath], my problems occur. I'll read through the article and create a new package, just to ensure I don't inherit some issues I created in my attempts to appease the SSIS connection gods... :-)

thanks again

|||

I followed the article's guidelines, but am having an issue.

The sequence of actions:

Create New Project.

Create ForEach container.

Create Variables ExcelFileFullPath and ExcelConnectionProperties. Set the Container to map the ExcelFileFullPath.

Create Excel Connection, pointing to valid, existing excel file.

Alter Excel connection properties: Delay Validation = true. Expression: ConnectionString:

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

All good, so far.

Create a Data Flow Task in the container. Add an Excel source. Try connecting the Excel source to the Excel connection - error: the initial error was an Hresult-type error (not captured). Trying to go further, I now get 'Invalid Argument' when trying to choose the name of the Excel sheet - this happens on a new Excel source as well.

I'm going to start again, this time creating an Excel import that works correctly. Once that's done, I'll try to move it into the container and apply the properties as described in the article, to see if that makes a difference (surely it shouldn't?)....

|||

Status:

I have created a Data Flow Task that opens a hard-coded (or direct) Excel connection, does a data conversion, then a conditional split, and inserts rows into an OLE DB table connection. I have a ForEach container that correctly populates the Excel file path into the ExcelFullFilePath variable. The Data Flow Task is OUTSIDE the container. All runs through successfully.

Next:

I intend to move the Data Flow Task into the container, but not change the Excel connection, and confirm that it runs successfully. Assuming success, I will then apply the process from the article, specifically:

Set DelayValidation = True on the Package, Container, Data Flow and Excel Connection.

Set the ConnectionString (via expressions) =

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

I'll report back with progress/results.

-- so far:

Run with Data Flow in ForEach container - still runs through, USING Hard-coded connection

Change Delay Validation properties on all objects - still runs through, USING Hard-coded connection

Change ConnectionString property to use dynamic ("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\"") - SUCCESS.

Add an additional .xls file to the folder for processing - failure. BUT (thanks to a GOOD, DESCRIPTIVE ERROR MESSAGE) that was because the connection requires the same sheet name for the 2 Excel files. I can live with that.

Update the 2nd test file's sheet name, re-run... SUCCESS. 2 Excel files successfully imported.

Many thanks for the responses, especially yours, Donald.

I still don't understand why I had to go through the sequence I did, i.e. create a fully working Excel import and then alter the properties, but it worked. Also, a useful note for those doing this in the future - make sure the sheets in the multiple files have the same sheet names.

HTH and thanks
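For anyone landing here later, the working configuration from the steps above boils down to the sketch below. The ForEach enumerator settings are my assumption about the usual Foreach File enumerator setup (they were not spelled out step by step above), and the variable names are the ones used in this thread:

```
Excel Connection Manager:
  DelayValidation = True        (also set on the Package, Container and Data Flow Task)
  Property expression on ConnectionString:
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::ExcelFileFullPath]
      + ";Extended Properties=\"" + @[User::ExcelConnectionProperties] + "\""

Foreach Loop Container (Foreach File enumerator - assumed settings):
  Files            = *.xls
  Retrieve         = fully qualified file name
  Variable mapping = index 0 -> User::ExcelFileFullPath
```

And remember the gotcha from the last post: every workbook picked up by the loop needs the same sheet name.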

|||

Thanks for these detailed notes. They will help us to improve this experience in the future.

Donald

|||Wow, thanks GethWho. I've been looking for this answer all day!

Everyone looking at this: make sure you set DelayValidation to True on ALL of your components!

-Matt

Monday, March 12, 2012

Lookup error with code 0x80070057

Hi there,

I'm currently designing an ETL process and I'm using lookup transformations.
In one of them, I encountered a 0x80070057 error which I cannot explain.
When I look at the number of rows already processed, the number is not always the same when the error occurs. That is the first strange thing. A second strange thing is the explanation given by SSIS in the log:

OnError,DWHAPP1,AWW\RS9906,ODSTran1_1_1_DFT1,{002D0747-8F3E-43EF-A0EA-FE925E668ECB},{BAF1A259-7A26-49ED-B4E5-4BB9BB0BF004},08/03/2006 13:01:15,08/03/2006 13:01:15,-1073450974,0x,
The ProcessInput method on component "ODSTran1_1_1_D1_LU2" (15452) failed with error code 0x80070057. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.

I really don't get it :-(

To explain a bit of what I'm doing: I do a lookup to check that the codes used in the facts (transaction table) exist in the referential tables (dimensions). The lookup in which the problem appears is a simple select on a table.
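For reference, the lookup's reference query is nothing more exotic than a plain select against the dimension, along these lines (table and column names here are made up for illustration; the real ones in my package differ):

```sql
-- Illustrative only: the reference query behind the Lookup transformation.
-- The pipeline's fact-code column is matched against CodeKey.
SELECT CodeKey, CodeName
FROM dbo.DimCode;
```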

If someone has an explanation or (better) a solution, shoot! :-)

Renaud

I reply to myself :-)

I've just tried removing the transformation linked to the Lookup error path and making my transformations another way, and it seems to be OK now.

So definitely, there was a random error (because it was not always at the same moment, i.e. not always with the same data) linked to the Lookup error path and the way the Lookup managed it.

Still strange to me, because the error wasn't explicit at all and the solution is more a workaround than a real solution to a real problem.

Bug? That is definitely an open question :-)

|||

When you hooked up the component to the lookup's error output did you configure the output to redirect rows on error?

Thanks,

Matt

|||

Matt David wrote:

When you hooked up the component to the lookup's error output did you configure the output to redirect rows on error?

Thanks,

Matt

Yes, I did so first of all, but I had to change it to 'Ignore failure' and use a workaround to avoid the random error.

The workaround works fine, so I won't waste my time looking for the real explanation of it.

Thanks Matt :-)

Saturday, February 25, 2012

Looking for process Ideas

I'm looking for ideas on how to handle a Pledge Reminder process. For example: a pledge is made to pay $2400 over the next two years. They will pay $100 per month, and each month a reminder will be sent. No real mystery: if there is a balance, you send a reminder. My problem is how to handle things like what if they want to pay quarterly or annually, and how to determine if a payment is really due based on when they paid last, etc... You most likely see what I mean.

If anyone has done this in the past and/or has any ideas and is willing to share I would greatly appreciate any help.

Some stuff that may help you help me better:

tblClient (ClientID)

tblPledge (PledgeID, ClientID, PledgedAmt, PledgeDate, Frequency, NumberYears)

tblPledgePayments (PmtID, PledgeID,PmtAmt,PmtDate)

Using the following definition:

Client (Id, Name, Email)

Pledge (Id, ClientId, PledgedAmt, PledgeDate, Frequency, NumberOfPeriods)

PledgePayments (Id, PledgeId,PmtAmt,PmtDate)

Try

SELECT Name, Email
FROM Client
WHERE Id IN (
    SELECT Pledge.ClientId
    FROM Pledge
    JOIN PledgePayments ON PledgePayments.PledgeId = Pledge.Id
    WHERE Pledge.Frequency = 'M'
    GROUP BY Pledge.ClientId, Pledge.PledgedAmt, Pledge.PledgeDate, Pledge.NumberOfPeriods
    HAVING SUM(PledgePayments.PmtAmt) <
           (Pledge.PledgedAmt / Pledge.NumberOfPeriods) * DATEDIFF(m, Pledge.PledgeDate, GETDATE())
)

Note that the SUM has to live in a HAVING clause (an aggregate is not allowed directly in a WHERE clause), and that DATEDIFF takes the start date (PledgeDate) before the end date (GetDate()). You will need to look up DateDiff in Books Online and check that m is the interval indicator for months.
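To handle quarterly or annual pledges as well, you could generalise the elapsed-period calculation. An untested sketch, assuming Frequency holds 'M' (monthly), 'Q' (quarterly) or 'A' (annual) - those codes are my assumption, not something defined above:

```sql
-- Untested sketch: periods elapsed since the pledge date, by frequency.
SELECT p.Id,
       CASE p.Frequency
            WHEN 'M' THEN DATEDIFF(month,   p.PledgeDate, GETDATE())
            WHEN 'Q' THEN DATEDIFF(quarter, p.PledgeDate, GETDATE())
            WHEN 'A' THEN DATEDIFF(year,    p.PledgeDate, GETDATE())
       END                              AS PeriodsElapsed,
       p.PledgedAmt / p.NumberOfPeriods AS AmountPerPeriod
FROM Pledge AS p;
```

A reminder is then due for any pledge whose total PledgePayments.PmtAmt is less than PeriodsElapsed * AmountPerPeriod.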

HTH

|||Have you done this sort of thing in the past with pledges?|||

No! It is however a fairly simple bit of SQL. You will need to test the query with a number of test cases.