
Friday, March 30, 2012

Losing rows from file to destination table - need troubleshooting help

I am losing data from time to time and cannot figure out where the rows are vanishing to. In essence, I have 5 files that I process. The processing occurs on a daily basis by one SQL Agent job that calls 5 individual packages, each doing a different piece of work.

I stage each file to a temp table, then perform some minor transformations, then union the rowsets from each file before landing them in another temporary table. A subsequent package performs additional transformations that apply to the entire dataset before inserting the rows into their final destination. Along the way, I reject some records based on the business rules applied. Each package in the entire job logs OnError, TaskFailed, PreExecute and PostExecute events. There are no errors being generated. No rows are being rejected to my reject tables either.

Without getting into the specific transforms, etc. being used in this complex process, has anyone seen similar unexplained behaviour? I've not been able to identify any pattern, except that it is usually only 1 or 2 of the record types (specific to a source file) that will ever not be loaded. No patterns around volumes for specific source files. There are some lookups and dedupes, however I've seen the records 'drop off' before reaching these transforms. I have noticed that my final destination load was not using fast load. However, sometimes the records disappear before even getting to my final staging table, which is inserted using fast load. I am going to turn on logging of the OnPipelineRowsSent event. Any other suggestions for troubleshooting/tracking down these disappearing records?
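
Once that logging is enabled, a minimal query along these lines (this is just a sketch - it assumes the SQL Server log provider writing to its default sysdtslog90 table, and the execution GUID is a placeholder) should let me compare the row counts reported between components for a given run:

Code Snippet

--pull the OnPipelineRowsSent messages for one execution so the counts
--can be compared transform by transform
select source, starttime, message
from dbo.sysdtslog90
where event = 'OnPipelineRowsSent'
and executionid = '<execution GUID here>' --placeholder
order by starttime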

Thanks

Couple more clarifications:

I have run the same files through manually in debug mode and can watch all the rows go through the entire process.

We have seen strange behaviour running packages as scheduled jobs through SQL Agent.

The process utilizes unions, which seem a bit clunky.

|||

Joe,

We have seen similar problems on my current project so yesterday we turned on OnPipelineRowsSent logging.

Another thing we have done is output the data from each transform component to a file for later examination. The MULTICAST transform is invaluable in order to do this.

As yet we haven't found out what is going on. It's strange.

-Jamie

|||

I'm not sure I would use the adjective strange, but..

In trying to troubleshoot this process, I first changed the union transform that was taking 6 input streams and broke it out into 5 individual waterfalling unions, each with 2 input streams. No change in behaviour.

I then changed the package that moves this data by adding in multicasts to output to a file after every transform along the way up to the final destination, after the 5 unions. Just adding the multicasts into the flow has resulted in no rows vanishing in the daily loads for the past week. Unfortunately, I don't have time to troubleshoot further, but I think this demonstrates that there is indeed a serious bug here. I still suspect it has to do with the union transform. I am quite scared for anyone else's shop that has decided to standardize ETL on this tool, as we have. As developers, we only have time to test our code, not to test that the native tool functionality is behaving as expected. In addition, having to monitor on a regular basis that it is performing properly is not acceptable.

Hoping this problem magically went away with SP1....

JH

Wednesday, March 28, 2012

Looping thru time..

I have a DTS package in SQL 2000 where I need to use VBScript to loop through the files. I have the package detecting the directory - and the file, BUT the file itself has an interval number from 1-48 set within it. So in one day I am getting 48 different files.

I cannot delete previous files or move them - so each day I need to loop through the day for each interval.

Any thoughts on how I can do this?

Interval 1 = 12:00 AM

Interval 2 = 12:30 AM

Interval 3 = 1:00 AM

etc. etc.

Thanks!

M

I am not clear exactly what you're after. If you want to translate intervals into their respective times for the 48 intervals, you can do it like so:

Code Snippet

--create dummy #digits table
select top 48 i=identity(int,1,1)
into #digits
from syscolumns

--the query
select i, right(convert(varchar,dateadd(minute,(i-1)*30,0),100),7) [time]
from #digits

--drop #digits
drop table #digits

|||

Well, it's not exactly taking the intervals and transposing them into a time. Each interval represents a certain time frame (a 30-minute time frame within the 24 hours - giving me a total of 48 intervals in a day).

The file that I get - I get 48 times a day. So I have to loop through the 48 files in ActiveX for the DTS package, and I am a bit stuck on that part.

|||

Moved to the SSIS forum, since you are looking for a solution with no T-SQL code. It is more likely to get an answer here. Thanks

|||Are you trying to do this in DTS or SSIS? If it's DTS, try this newsgroup - microsoft.public.sqlserver.dts.|||Forums site is screwy - finally was able to get in and edit/reply... this is for DTS in SQL 2000...


Monday, March 26, 2012

Looping through several Excel data sources in SSIS

I am attempting to use the foreach loop structure in an SSIS package to
loop through however many Excel files are placed in a directory and
then perform an import operation into a SQL table on each of these
files sequentially. The closest model for this that I was able to find
in the MS tutorial used a flat file source rather than Excel. That
involved adding a new expression to the Connection Manager that set the
connection string to the current filename, as provided by the foreach
component. That works just fine, but when I attempt to apply the same
method to an Excel source, rather than a flat file source, I cannot get
it to work. I see the following error associated with the Excel source
on the Data Flow page: "Validation error. Data Flow Task: Excel Source
[1]: The AcquireConnection method call to the connection manager "Excel
Connection Manager 1" failed with error code 0xC020200." I think that
it's just a matter of getting the right expression, and I thought that
perhaps I should be constructing an expression for ExcelFilePath rather
than the Connection String, but I have fiddled with it for hours and
haven't come up with something that will be accepted. Has anybody out
there been able to do this, or can perhaps refer me to some
documentation that contains an example of what I am trying to do?
Thanks for any help you can give.

David,

The April 2006 update of SQL Server 2005 Books Online contains a
new topic titled "How to: Loop through Excel Files and Tables", at
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/extran9/html/a5393c1a-cc37-491a-a260-7aad84dbff68.htm
or online at
http://msdn2.microsoft.com/en-us/library/ms345182.aspx

Why don't you see if that helps, and if not, let us know exactly
what point in that article something goes wrong.

The download page for BOL, for local installation, is
http://www.microsoft.com/downloads/...&DisplayLang=en

Steve Kass
Drew University

davidz wrote:
> I am attempting to use the foreach loop structure in an SSIS package to
> loop through however many Excel files are placed in a directory and
> then perform an import operation into a SQL table on each of these
> files sequentially. The closest model for this that I was able to find
> in the MS tutorial used a flat file source rather than Excel. That
> involved adding a new expression to the Connection Manager that set the
> connection string to the current filename, as provided by the foreach
> component. That works just fine, but when I attempt to apply the same
> method to an Excel source, rather than a flat file source, I cannot get
> it to work. I see the following error associated with the Excel source
> on the Data Flow page: "Validation error. Data Flow Task: Excel Source
> [1]: The AcquireConnection method call to the connection manager "Excel
> Connection Manager 1" failed with error code 0xC020200." I think that
> it's just a matter of getting the right expression, and I thought that
> perhaps I should be constructing an expression for ExcelFilePath rather
> than the Connection String, but I have fiddled with it for hours and
> haven't come up with something that will be accepted. Has anybody out
> there been able to do this, or can perhaps refer me to some
> documentation that contains an example of what I am trying to do?
> Thanks for any help you can give.

Looping Through Excel Columns (256 columns)

Hello All,

I have a problem and I wish I could get some answers or advice to solve it.

I have about 20 Excel files, and in each file there is 1 sheet (Planning). What I need to do is loop over the 20 files (actually this is the easy part and I have already done it); the hard part is that while looping I need to open each Excel file, loop over the 256 columns in it, and extract the data from them into a SQL Server database.

Any help will be much appreciated.

Does each sheet have the same number of columns and the same column names?

|||

Hi, thank you for your reply. Yes, the sheets are all the same: same column names, same column data and same number of columns.

|||

Perfect, then inside your loop you'll need a data flow task. Set the source excel connection manager through a variable and send the data to your destination.

|||

That's if I wanted to extract the whole Excel sheet; what I want to do is loop over the 256 columns - loop on each column and extract it where I want.

|||

I don't quite understand. There is an interface to map your source columns to your destination columns.

|||

I found it, thank you a lot.


Friday, March 23, 2012

Looping problem in DTS with ActiveX

I have a DTS package that downloads files via FTP, then processes each file. A few tasks are done when each file is processed:
(a) a holding table is truncated and 1 blank record is inserted into the holding table,
(b) the XML data in the file is inserted into the holding table via TextCopy.exe,
(c) the XML data is parsed using OPENXML and inserted into 2 additional holding tables, and
(d) the XML file is archived to another directory.
After (a), (b), (c), and (d) are completed, the DTS package loops back and executes (a), (b), (c), and (d) for the other remaining files.

It all worked flawlessly in testing, until I commented out a MsgBox line in the ActiveX task for item (b) above. Without the MsgBox command, the other tasks (b) and (c) don't appear to execute, though I can see that the looping is working, since the source files get moved to the archive location (in step (d)).

Attached is a screenshot of the DTS package (it can also be viewed at http://www.nmwildlife.org/images/DTS_screenshot.gif).

I think that the MsgBox issue is a red herring; in other words, I'm thinking that when I click the OK button on the MsgBox, there might be something about the return code which allows the tasks to be executed properly. However, I'm not a VBScript expert, so can't figure out where the problem lies or how to fix it.

Here's the code for the "Import w/ShellCmd" ActiveX task:

Function Main()
Dim objShell
Dim strPath
Dim strCmd

strPath = CSTR(DTSGlobalVariables("gv_FileFullName").Value)

strCmd = """C:\Program Files\Microsoft SQL Server\MSSQL\Binn\TextCopy.exe"" /S ""GROVER"" /U sa /P """" /D TESTProlaw /T dc_XML /C myXML /F " & _
strPath & _
" /W ""WHERE 1=1"" /I /Z"

Set objShell = CreateObject("WScript.Shell")
objShell.Run strCmd
Set objShell = nothing

MsgBox ""

Main = DTSTaskExecResult_Success
End Function

And here's the code for the "Begin Loop" ActiveX task:

Option Explicit

Function Main()

dim pkg
dim stpEnterLoop
dim stpFinished

set pkg = DTSGlobalVariables.Parent
set stpEnterLoop = pkg.Steps("DTSStep_DTSExecuteSQLTask_2") 'Start loop at the "Truncate dc_XML" task
set stpFinished = pkg.Steps("DTSStep_DTSActiveScriptTask_5")

' We want to continue with the loop only of there are more than 1 text file in the directory.
' If the function ShouldILoop returns true then we disable the step that takes us out of the package and continue processing

if ShouldILoop = True then
stpEnterLoop.DisableStep = False
stpFinished.DisableStep = True
stpEnterLoop.ExecutionStatus = DTSStepExecStat_Waiting
else
stpEnterLoop.DisableStep =True
stpFinished.DisableStep = False
stpFinished.ExecutionStatus = DTSStepExecStat_Waiting
End if

Main = DTSTaskExecResult_Success
End Function

Function ShouldILoop

dim fso
dim fil
dim fold
dim pkg
dim counter

set pkg = DTSGlobalVariables.Parent
set fso = CREATEOBJECT("Scripting.FileSystemObject")

set fold = fso.GetFolder(DTSGlobalVariables("gv_FileLocation").Value)

counter = fold.files.count

'So long as there is more than 1 file carry on

if counter >= 1 then

for each fil in fold.Files
DTSGlobalVariables("gv_FileFullName").Value = fil.path
ShouldILoop = CBool(True)
Next

else
ShouldILoop = CBool(False)
End if

End Function

The goal is to get the DTS package to run without having to manually click OK on the MsgBox; that way, I can schedule it to run automatically.

Any help would be greatly appreciated. Thanks in advance!

No, the MsgBox is not returning anything here. It should run perfectly without the MsgBox; I think it was used to give the shell time to complete the process.
What you can try is executing the shell in sync mode and dropping the MsgBox. Use this code and see if it works:

....
Ret = objShell.Run(strCmd, , True)
.....

Looping over files not available in my ssis

I'm downloading zipped files and would like to loop through each file that was downloaded. I'd also like to unzip each file and append all of them to one file. I have a DOS batch file that is fairly simple and would like to emulate it using SSIS. Here is what the DOS batch file looks like.

DATE /T >%TEMP%\D.txt

FOR /F "usebackq tokens=2,3,4 delims=/, " %%i IN ("%TEMP%\D.txt") DO SET fname=TAMF_162%%i%%j%%k-%%k.zip

ECHO xxx>zzzzz
ECHO xxxxx>>zzzzz
ECHO BINARY>>zzzzz
ECHO GET %fname%>>zzzzzz
ECHO QUIT>>zzzzz

FTP -s:zzzzzzz ftp.aaaaa.com

PKUNZIP -o -xxxxxxx downloadedfile_1~1.ZIP

DEL TAMF_1~1.ZIP
DEL zzzzzzzz

EXIT

I would just run my DOS batch file as an external process if the batch file does what you want it to do.|||

We could do that, but then each time this needs to be edited it's two places instead of one. I'd rather convert it over to SSIS.

Big problem with SSIS now. My looping container does not have the option to loop over files. Seems as if others are having the same problem http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=416437&SiteID=1&mode=1

|||Not sure why it would make for something that has to be edited in two places. You would only have the code that does the unzip in one bat file, so if things needed to be edited you would only have to edit the bat file. Must be something I'm missing in what you are doing.|||Swells - I just want to convert the DOS file to SSIS and schedule it in one package, and I'd like to learn how to unzip and parse the files. As you can see from my other posts, my SSIS is somehow broken in that I do not have the option to loop through files. Seems like this may be an issue with SP1.|||Regarding the missing ForEach File enumerator - I've just had a chance to investigate a machine that has this problem. It turned out this is another symptom of the problem discussed in this KB:

http://support.microsoft.com/kb/913817|||

I'm coming up with FlashProp Class. How do I keep the DOS window open? I'm not a console app programmer, sorry. And how do we know that this is the problem?

thanks

|||

Bogey1 wrote:

I'm coming up with FlashProp Class. How do I keep the DOS window open? I'm not a console app programmer, sorry. And how do we know that this is the problem?

thanks

The application does not fix the registry settings, it just lists CLSIDs with incorrect permissions. Once they are identified, open RegEdit and find these CLSIDs under HKEY_LOCAL_MACHINE\Software\Classes\CLSID, and grant everyone read permissions to these keys. You may then close the application, and reboot. The problem should go away.

|||

That seems to not have worked for me. Is there a fix for this? I cannot have my developers run regedit and fix keys to get this working. Can MS provide a tool that will run and fix the keys if this is the problem? Anyone else running into this problem?

thanks Michael and keep me updated. We'd really like to use this on one of our projects here at the University.

|||The problem is caused by third-party installations (the cases I've seen were caused by a Lexmark driver and Flash; there might be other applications incorrectly setting the permissions), so we don't know what keys these applications create and what the appropriate permissions for them are. So it would be risky to write an automated program to fix them. The program in the KB article identifies the problematic keys, but you currently need to fix them in RegEdit.

Wednesday, March 21, 2012

Loop Through Flat Files Based On A Date Range

Hello,

I currently have a For Each File container that loops through all files from a specific directory. The files have a naming convention that looks like this;

CDNSC.CDNSC.SC00015.01012007

The last segment of the file name is the date of the data in the file (mmddyyyy). The create date for these files is always a day later than indicated in the file name.

What I would like to do is to have more control over the 'range' of files that are looped through by using the date portion of the file name to define what group of files should be looped through. Ideally, I would like to have a 'StartDate' variable and an 'EndDate' variable that I could define at run time for the package, and the package would loop through all of the files where the date portion of the file name fell between 'StartDate' and 'EndDate'.

Any ideas on this?

Thank you for your help!

cdun2

One way is to move the files to an archive directory when you're done with them. That way you only pick up the files you need to process. Just throwing that idea out there.|||Thanks for your response. One thing that I have to be prepared for is that I may have to 'reload' data. The date range for any reloading could be anything.|||Well, you could (and perhaps should?) stage the data from the flat files in a staging table. Then, once in the staging table you can keep n number of loads around, all marked by either the load date, or a unique load number. After n number of loads, you can delete the old data. Each row in the staging table would also contain the source file name.

Or, you could simply load a table with the load date and a filename. That way, you've got a table that tells you which files to grab for a given load.
Load File
1 xyz.txt
1 fhs.txt
2 jfb.txt
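
Here's a minimal T-SQL sketch of that idea - the table, file names and date range below are made up, and the date parse assumes the trailing mmddyyyy segment from your example file name:

Code Snippet

--illustrative only: a load-tracking table whose FileDate is parsed from
--the trailing mmddyyyy segment of names like 'CDNSC.CDNSC.SC00015.01012007'
create table dbo.LoadFile
(
    LoadId int not null,
    FileName varchar(260) not null,
    FileDate datetime null
)

declare @name varchar(260), @d char(8)
set @name = 'CDNSC.CDNSC.SC00015.01012007'
set @d = right(@name, 8) --mmddyyyy
insert dbo.LoadFile (LoadId, FileName, FileDate)
values (1, @name, convert(datetime, right(@d, 4) + left(@d, 2) + substring(@d, 3, 2), 112))

--which files fall in the requested range?
declare @StartDate datetime, @EndDate datetime
select @StartDate = '20070101', @EndDate = '20070131'
select FileName from dbo.LoadFile where FileDate between @StartDate and @EndDate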

The above still assumes that you "archive" the files into a subdirectory or somewhere.|||The data from the flat files will wind up in one or more SQL Server tables. One idea I was considering was to store the file names in a table, and maybe parse the date portion of the file name into a column. Then somehow use that as a lookup table to determine which files would be processed.|||

cdun2 wrote:

The data from the flat files will wind up in one or more SQL Server tables. One idea I was considering was to store the file names in a table, and maybe parse the date portion of the file name into a column. Then somehow use that as a lookup table to determine which files would be processed.

Right, but the point is that your first data flow simply loads the data from the files AS IS into a table. Then in the data flow that you use today, you point them to the staging table instead of the flat file(s).|||Oh, I see. The data that I need to process is a subset, and will be aggregated for reporting. The concern that I have is that somewhere down the road, the reporting requirements may change, and the users may need another column from the data. I'm thinking that if that happens, I'll need to add the additional column to the subset table, and reload everything.|||

cdun2 wrote:

Oh, I see. The data that I need to process is a subset, and will be aggregated for reporting. The concern that I have is that somewhere down the road, the reporting requirements may change, and the users may need another column from the data. I'm thinking that if that happens, I'll need to add the additional column to the subset table, and reload everything.

You're going to have to add the column in the flat file source anyway, would be one argument.

Regardless, this is just one way to tackle this problem. There are other ways I'm sure.|||

Actually, the destination table will have just a subset of the columns that are found in the flat files. If a need is found for an additional column of data, it will already be in the flat files. I'll just need to add it to the destination table.

What kind of component could I use in a DataFlow Task to read filenames?

Thanks again.

|||

cdun2 wrote:

Actually, the destination table will have just a subset of the columns that are found in the flat files. If a need is found for an additional column of data, it will already be in the flat files. I'll just need to add it to the destination table.

What kind of component could I use in a DataFlow Task to read filenames?

Thanks again.

If this is the case, then the staging table example still works. In your CONTROL flow, you'd use a foreach loop to spin through the files. Each filename can be put in a variable, which can be used in a data flow inside the foreach loop. Then, using a derived column in the data flow, you'd have access to the variable that contains the current filename.|||

cdun2 wrote:


Hello,

I currently have a For Each File container that loops through all files from a specific directory. The files have a naming convention that looks like this;

CDNSC.CDNSC.SC00015.01012007

The last segment of the file name is the date of the data in the file (mmddyyyy). The create date for these files is always a day later than indicated in the file name.

What I would like to do is to have more control over the 'range' of files that are looped through by using the date portion of the file name to define what group of files should be looped through. Ideally, I would like to have a 'StartDate' variable and an 'EndDate' variable that I could define at run time for the package, and the package would loop through all of the files where the date portion of the file name fell between 'StartDate' and 'EndDate'.

Any ideas on this?

Thank you for your help!

cdun2

Cdun2,

This is an alternative approach to the issue of processing only the files that are within a start/end date range. It uses Expression and precedence constraints:

http://rafael-salas.blogspot.com/2007/02/ssis-loop-through-files-in-date-range.html

BTW,

Just to save extra work down the road, I would recommend importing all the columns of the files from the beginning.

|||Thanks again for your input. I'll dig a little deeper into this next week. As far as bringing in all of the columns of data, the attempt is to aggregate so that we can meet a reporting need and reduce the number of rows of data that we need to store. It's a resource/performance issue for us. I can't bring in any more than I need.|||

cdun2 wrote:

I can't bring in any more than I need.

But you can. To Rafael's point, you can at least have ALL of the flat file columns mapped in the connection manager. You don't have to do anything with them in the data flow, but at least they are there if you need to pick them up later; you won't have to redefine the connection manager.|||

Phil Brammer wrote:

cdun2 wrote:

I can't bring in any more than I need.

But you can. To Rafael's point, you can at least have ALL of the flat file columns mapped in the connection manager. You don't have to do anything with them in the data flow, but at least they are there if you need to pick them up later; you won't have to redefine the connection manager.

Yes, that makes sense. Thanks again.

|||

I took a look at the example here; http://rafael-salas.blogspot.com/2007/02/ssis-loop-through-files-in-date-range.html

Thank you for providing this example. I am having a difficult time making the bridge between the precedence constraint and the Data Flow Task where the content of the qualifying flat files is inserted into the SQL Server table.

loop through certain files

Using SSIS foreach loop, I am looping through files to get the filenames and pass them on as variable.
How is it possible to loop through the files but only pass the files that do not end with ..._Parameters.xml? Or loop through only the ones which end with ..._Parameters.xml?
I think this is to do with the scripting or expression in the foreachloop?
Thanks

You could use a script task to determine which path to take (i.e. is _Parameters.xml or is not _Parameters.xml). One way you could do this would be by using the "Fail" path of the task if it is _Parameters.xml, and otherwise use the success path. I'm not sure if this is the best way to go about this or not though... (You would need to make sure that this task would not cause the container to fail when it does)|||Can't you set the files property (available in the dialog) to "*_Parameters.xml" to get the ones that do match?|||

Good thinking.

Done that now.

Thanks


Loop container to process all excel files

What I'm trying to achieve is an SSIS package that will pick up 1 or more Excel files, process the data in them via the conditional splitter, pushing the good data into a table and all other rows into an error table.
I'm having some issues using the ForEach container to process multiple Excel spreadsheets into tables. The Excel import into the tables is more or less working (it imports data for good cases, but the Excel Source uses a null when it gets an unexpected value - but that's a separate problem).
I found something related to this when searching, but it related to CTPs (June and September) and trying to reuse the connection strings they built up there (using my own variable names, naturally) causes a 'Property Value failure':
--
The connection string format is not valid. It must consist of one or more components of the form X=Y, separated by semicolons. This error occurs when a connection string with zero components is set on database connection manager.
--
I attempted to use this:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::RankingFileFullPath] + ";Extended Properties=\"Excel 8.0;HDR=YES\";"
The Excel importer works fine as a stand-alone component. Trying to use the process defined in 'Professional SQL Server Integration Services', p. 140, I tried to use an expression to assign the variable value to the connection string. I get a validation error:
--
Error at Import TPNB Ranking Excel spreadsheets [Excel Source [1]]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.
Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error at Import TPNB Ranking Excel spreadsheets [DTS.Pipeline]: One or more component failed validation.
Error at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.
Error at Excel Importer [Connection manager "Excel Connection Manager"]: An OLE DB error has occurred. Error code: 0x80040E4D.
--
Any advice?
....
.... in addition ....
I attempted to change the package - I set the Data Flow validation to Delay Validation, and changed the expression to change from:
ConnectionString @[User::RankingFileFullPath]
to
ExcelFilePath @[User::RankingFileFullPath]
This allowed the package to start debugging, and gave more information in the failure:
--
SSIS package "Excel Importer.dtsx" starting.
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnPreExecute event' at executable 'Foreach Loop Container' has been hit
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
Information: 0x4004300A at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: Validation phase is beginning.
Warning: 0x802092A7 at Import TPNB Ranking Excel spreadsheets, ProductSalesRank Table [278]: Truncation may occur due to inserting data from data flow column "Rank" with a length of 1000 to database column "SalesRank" with a length of 50.
Error: 0xC0202009 at Excel Importer, Connection manager "Excel Connection Manager": An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "Unrecognized database format 'D:\Testing\TestRanking.xls'.".
Error: 0xC020801C at Import TPNB Ranking Excel spreadsheets, Excel Source [1]: The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009.
Error: 0xC0047017 at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error: 0xC004700C at Import TPNB Ranking Excel spreadsheets, DTS.Pipeline: One or more component failed validation.
Error: 0xC0024107 at Import TPNB Ranking Excel spreadsheets: There were errors during task validation.
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
Warning: 0x80019002 at Foreach Loop Container: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnQueryCancel event' at executable 'Excel Importer' has been hit
SSIS breakpoint 'Break when the container receives the OnWarning event' at executable 'Excel Importer' has been hit
Warning: 0x80019002 at Excel Importer: The Execution method succeeded, but the number of errors raised (5) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS breakpoint 'Break when the container receives the OnPostExecute event' at executable 'Excel Importer' has been hit
SSIS package "Excel Importer.dtsx" finished: Failure.
The program '[2460] Excel Importer.dtsx: DTS' has exited with code 0 (0x0).--
Apparently I added too much detail, and people are avoiding the question :-)
I've tested doing exactly the same thing, but using CSV's as the source (i.e. text files and not excel files). And it works fine. So there's nothing wrong with the environment, DB etc.
...aaarg|||

I do recall other people having similar problems with Excel but I don't know if there was a resolution.

Sorry...not much help I'm afraid.

-Jamie

|||Thanks for the response - at least I have had one now.
The only references I've found pointed me to (a) re-installing Service Pack 1 to ensure SQL and SSIS etc. are all good. I did that, but no go. The other reference seemed to say I need to write an ActiveX (or .Net script) component to handle the files. That's just got to be wrong - what's the point of having to continuously kludge different sections of SSIS together with .Net scripting... In that case it seems that we should rather do the whole thing in .Net ... *sighs* ...
I guess I'm just a bit frustrated that something as simple as importing a few Excel files into a SQL table doesn't work simply ... after all, the forEach and loop controls were one of the NEW BIG THINGS that SSIS was bringing us.... oh well, it's getting late on a Friday, and it looks like it's going to be a long, frustrating weekend (... it might be better if I knew how to write the .Net)
Well, I hope you (and anyone else reading my ranting) have a good weekend.
|||

I was able to complete your scenario successfully following the instructions at: http://msdn2.microsoft.com/en-us/library/ms345182.aspx

Note that I paid particular attention to this paragraph:

To avoid validation errors later as you configure tasks and data flow components to use this connection manager, assign a default Excel workbook in the Excel Connection Manager Editor. After you create and configure the package, you can delete this value in the Properties window. However, after you delete this value, a validation error may occur because the connection string property of the Excel connection manager is no longer valid until the Foreach Loop runs. In this case, set the DelayValidation property to True on the connection manager, on the tasks in which the connection manager is used, or on the package.

If I did not have DelayValidation set True on the connection manager and have a valid default value in the connection manager I received the same errors as you. I also set DelayedValidation = True on the Data Flow Task for extra caution.

Donald

|||

Hi Donald,

Thanks for the link and the response ... I appreciate it.

I'll have a look - the article looks promising. I did already have all my 'delay validations' set to true, and I initially created the Excel connection with a valid connection - I had it pointing at one of the files I want to import. When it is hard-coded to the Excel file, my import runs fine... as soon as I delete the connection name and replace it with the variable name @[User::RankingFileFullPath], my problems occur. I'll read through the article and create a new package, just to ensure I don't inherit some issues I created in my attempts to appease the SSIS connection gods... :-)

thanks again

|||

I followed the article's guidelines, but am having an issue.

The sequence of actions:

Create New Project.

Create ForEach container.

Create Variables ExcelFileFullPath and ExcelConnectionProperties. Set the Container to map the ExcelFileFullPath.

Create Excel Connection, pointing to valid, existing excel file.

Alter Excel connection properties: Delay Validation = true. Expression: ConnectionString:

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

All good, so far.

Create a data flow control in the container. Add an Excel source. Try connecting the Excel source to the Excel connection - error: the initial error was an HRESULT type error (not captured). Trying to go further, I now get 'Invalid Argument' when trying to choose the name of the Excel sheet - this happens on a new Excel source as well.

I'm going to start again, this time creating an Excel import that works correctly. Once that's done, I'll try and move it into the container, and apply the properties as described in the article, and see if that makes a difference (surely it shouldn't?)....

|||

Status:

I have created a Data Flow Task that opens a HardCoded (or direct) Excel Connection, does a data conversion, then a conditional split, and inserts rows into an OLE DB table connection. I have a ForEach container that correctly populates the Excel file path into the ExcelFullFilePath variable. The Data Flow Task is OUTSIDE the container. All runs through successfully.

Next:

I intend to move the data flow task into the Container, but not change the Excel connection, and confirm that it runs successfully. Assuming success, I will then apply the process from the article, specifically:

Set Delay Validation = True on the Package, Container, Data Flow and Excel Connection.

Set the ConnectionString (via expressions) =

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @.[User::ExcelFilefullPath] + ";Extended Properties=\"" + @.[User::ExcelConnectionProperties] + "\""

I'll respond on progress/results

-- so far:

Run with Data Flow in ForEach container - still runs through, USING Hard-coded connection

Change Delay Validation properties on all objects - still runs through, USING Hard-coded connection

Change ConnectionString property to use dynamic ("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @[User::ExcelFilefullPath] + ";Extended Properties=\"" + @[User::ExcelConnectionProperties] + "\"") - SUCCESS.

Add additional .xls file to folder for processing - failure. BUT that was (due to a GOOD, DESCRIPTIVE ERROR MESSAGE) because the connection requires the same sheet name for the 2 excel files. I can live with that.

Update 2nd test file sheet name, re-run... SUCCESS. 2 Excel files successfully imported.

Many thanks for the response, especially yours, Donald.

I still don't understand why I had to go through the sequence I had to, i.e. create a fully working excel import, and then alter the properties, but it worked. Also, a useful note for those doing this in the future - make sure the sheets for the multiple files have the same sheet names.

HTH and thanks

|||

Thanks for these detailed notes. They will help us to improve this experience in the future.

Donald

|||Wow, thanks GethWho. I've been looking for this answer all day!

Everyone looking at this, make sure to set DelayValidation to true on ALL of your components!

-Matt


Friday, March 9, 2012

looking into rdl file

Some business analysts would like to reverse engineer the SSRS .rdl files by looking into the field mappings and data sources. Do they have to install the client version of Visual Studio, or is there an easier way?

TIA..

You can open the RDL files in Notepad and see that they are written in XML. You can find the query run by looking for the <CommandText> tag, and each field is listed under a <ReportItems> tag. The data source information is also listed in there, under the <ConnectionProperties> tag.

Hope this helps.

Jarret

Wednesday, March 7, 2012

Looking for suggestion of how to manage a DB file when it is getting too big.

Hi all,

At our facility, we have a > 350 GB database, and the DB is broken into 2 files. One file is our Image table, with a size of 300 GB. The other file contains all of the other tables. The image file is getting bigger and bigger every day, and I worry that eventually, when a data file gets too big, it could lead to problems - DB fragmentation, being hard to manage, etc.

My question is: has anyone run into this problem, and how do you solve it?

I am open to any suggestions.

BTW: I am using SQL 2000.

Thanks in advance.

Use filegroups. Assign the frequently modified tables to one filegroup, while those that don't get modified frequently are assigned to another filegroup. When you create filegroups, you also modify your backup strategies.|||

Since your basic problem is with growth of your images, you need to come up with a mechanism for managing those.

If this is SQL Server 2005, then I would implement partitioning against that table on your identifier. That will allow you to split the table across multiple filegroups and files. If this is SQL Server 2000, then you can add multiple files to the filegroup that this table resides on (the files don't necessarily need to be in the same directory or even drive).

It doesn't really matter that they are images or discrete data, managing them occurs using the same process. You need to rely on one or more filegroups each with one or more files in them.
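
For example, something along these lines adds another file to the filegroup that the table lives on (the database name, logical file name, path, sizes and filegroup here are only placeholders - adjust them to your environment):

Code Snippet

--placeholder names, path and sizes; valid in SQL Server 2000
ALTER DATABASE YourImageDb
ADD FILE
(
    NAME = YourImageDb_Data2,
    FILENAME = 'E:\Data\YourImageDb_Data2.ndf',
    SIZE = 10240MB,
    FILEGROWTH = 1024MB
)
TO FILEGROUP [PRIMARY] --or the filegroup your Image table actually lives on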

|||

Thanks all for your suggestions. The main problem we are having is that we have one huge table (300 GB) residing in one filegroup, and splitting a table across multiple files is not an option for us since we are using SQL2k (might have to wait until we upgrade to 2005), but I'm going to use your suggestions with other databases.

Thanks

|||I don't understand why not. I am NOT suggesting splitting across multiple fileGROUPS, because that would require an upgrade to 2005. I am suggesting adding FILES to the existing filegroup. This is absolutely supported and possible in SQL Server 2000. The data in that single table would then be written across multiple FILES within a single filegroup and would at least give you options on managing the storage space.|||

Hmm, I obviously didn't think of this possibility. Data could be written across multiple files! It makes perfect sense now. Damn, how could I not see this :-( - sorry for misunderstanding your comment, and thanks for enlightening me, Michael.

Appreciated.

|||Hello Michael Hotek,

I had gone through the post, but did not understand how it will help the issue.
I have one similar problem here. We are using MS SQL 2000, where there is only one table in the database in which scanned document images are stored, and the database is growing by 2 to 3 GB per day. Currently it is up to 300 GB. For safety reasons we are planning to partition/split the data in the table according to one key field, 'book_no', so that there will be separate partitions for each book_no. At the end we need to combine them all together to form the original table.
Please guide/suggest me how to split the table with a query like 'SELECT * FROM doc_master WHERE book_no=1', split the entire table into separate book_no's, and later combine them together to form the original table?

Right now I am planning like this: create dummy databases for each book_no with the same structure as doc_master, and use an 'INSERT...SELECT...WHERE book_no=?' statement to fill these databases and delete the rows from the original database. Will it be OK without harming the data? Please suggest.|||

Rajeev,

My problem is that I can't reduce the size of my image DB, I can't archive (at least in SQL 2000), I can't partition my table either, and I don't want my data file growing too big. So splitting a single data file of a table into multiple data files (like what Michael suggested) is the way to go. I don't think this will gain much in performance or anything, but having smaller data files would help me manage the files easily. I can also store these smaller data files on separate drives, etc., so the suggestion did help me in managing the physical data files.

I think what you are trying to do is separate data into separate databases. In my opinion, this could lead to more complication, since now you have to manage multiple databases, multiple DB permissions and DB maintenance, and it might also hurt your application performance; in the end you are not gaining much.

|||

You can add data files to a SQL 2000 database, within the same filegroup.

What this will accomplish is that the new data inserted into your table will start spreading to both files. It will not automatically rebalance existing data between files.

Have you experienced specific problems as a result of having a large table, or are you just anticipating that you might?

What is the storage that this database is on?


Saturday, February 25, 2012

Looking for some advice.....

I have a couple of files that are comma separated, and am looking for the best way to take those files, put them in a temp location to run some SQL against, and then update or insert a production SQL database based on an SP I have written that takes a number of variables. I have played around with using the recordset destination, defining all the variables and then using a for each loop, but seem to be stuck. Somewhat new to the whole programming field, and to be honest, it seems a little intimidating with the little I know of the programming side. Just looking for some thoughts.

You could use a data flow to read the flat file, and use an OLEDB Destination to save it to a "temp" table. Use an Execute SQL to update the temp table, then use a second data flow to retrieve the data and send it to an OLE DB Command to call your stored procedure.
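
If you'd rather do the update-or-insert in one set-based Execute SQL step instead of calling the stored procedure row by row, a rough sketch looks like this (the staging and production tables and their columns here are made up for illustration):

Code Snippet

--update the rows that already exist in production...
UPDATE p
SET p.SomeValue = s.SomeValue
FROM dbo.Production p
JOIN dbo.Staging s ON s.BusinessKey = p.BusinessKey

--...then insert the rows that are new
INSERT dbo.Production (BusinessKey, SomeValue)
SELECT s.BusinessKey, s.SomeValue
FROM dbo.Staging s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Production p WHERE p.BusinessKey = s.BusinessKey)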

Looking for Lab files from class

I recently took a class on SQL 2005. The classwork had lab files, but I did not manage to hang on to the files. Are those files available for download from MS?

I am specifically looking for the files from Module 7 "Planning for Source Control, Unit Testing, and Deployment"

TIA

I am certainly interested in hearing what the MS policy is with respect to this type of question; looks interesting.

I have placed a note in the REPORTING forum so that I can get an answer as to how this needs to get handled.

|||

My experience as a trainer (MCT) is that it has been a longstanding situation where class materials are available to class participants only. The lab files, etc., are on the CDs that the class participant is provided along with the training manual(s), and Microsoft will not replace materials for class participants.

Your best option may be to contact the class provider and see if they will replace your 'lost' class materials. Some of the training providers are a bit more helpful in this way than others. I don't think that you will get anywhere with Microsoft.

You can sign up for, and use course 2781 modules on-line at:

https://www.microsoft.com/learning.com/