[AccessD] Infutor Statistics - was RE: [dba-SQLServer]Bulkinsert

JWColby jwcolby at colbyconsulting.com
Wed May 9 16:48:26 CDT 2007


Yep, that's figured out.  Thanks. 


John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Shamil
Salakhetdinov
Sent: Wednesday, May 09, 2007 5:33 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Infutor Statistics - was RE:
[dba-SQLServer]Bulkinsert

Hello John,

VB.NET 2005 "mimics" all (or most) of the usual VB6/VBA functions, which you
can still use. Trim(...) lives in the Strings module of the
Microsoft.VisualBasic namespace, which is always available to VB.NET projects,
so you don't even need an Imports directive or full namespace qualification to
call the Trim(...) function:

Dim s As String = "  test string "
Console.WriteLine("[{0}]",Trim(s))

Results in:  

[test string]

The equivalent native .NET method is just:

Console.WriteLine("[{0}]", s.Trim())

--
Shamil
 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of JWColby
Sent: Thursday, May 10, 2007 1:00 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Infutor Statistics - was RE:
[dba-SQLServer]Bulkinsert

Whadayathinkimstupid?

However... once I am done with my import widget, I will do these fixed price
and make a killing.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of William Hindman
Sent: Wednesday, May 09, 2007 4:51 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Infutor Statistics - was RE: [dba-SQLServer]
Bulkinsert

...go ahead ...tell us you did this fixed price :)

William Hindman

----- Original Message -----
From: "JWColby" <jwcolby at colbyconsulting.com>
To: <dba-sqlserver at databaseadvisors.com>; "'Access Developers discussion and
problem solving'" <accessd at databaseadvisors.com>
Sent: Wednesday, May 09, 2007 1:01 PM
Subject: [AccessD] Infutor Statistics - was RE: [dba-SQLServer] Bulk insert


> Just an FYI.  The table that I have been building this whole time 
> contains 97.5 million records, has exactly 149 (imported) fields, and 
> requires 62.6 gigabytes of data space inside of SQL Server.  It took 2 
> hours and 28 minutes just to build the auto increment PK field after 
> the table finished importing records.  The index space for the table 
> (with just this single index) is 101 megabytes.
>
> There were 56 raw data files which required 75 gigabytes of disk space 
> to hold.  There were 56 CSV files created after stripping out the 
> spaces, which required 40.8 gigabytes of disk space to hold.  Thus by 
> my calculations, roughly 34 gigabytes of disk space was required to 
> hold JUST THE SPACES in the original fixed width files, with the real 
> data occupying 40.8 GB.  It is interesting to note that the raw data 
> in the CSV files was 40.8 GB while the data space required in SQL 
> Server is 62.6 GB.
>
> As the process was built over time, I do not have accurate specs for 
> each and every file, but the process of stripping the spaces off of 
> the fields ran at about 1K records / second.  Given 97.5 million 
> records, this equates to 97.5 thousand seconds to do the space 
> stripping, which is about 27.1 hours.  That of course is done in a VBA 
> application.  Again, I don't have accurate specs for all of the bulk 
> inserts; however, those that I recorded the times for summed to 71.2 
> million records, which took 4674 seconds (1.3 hours) to import using a 
> BULK INSERT statement, which equates to approximately 15K records / 
> second.  Remember that this BULK INSERT is importing precleaned data 
> with pipe delimiters.  Also remember that while the BULK INSERT itself 
> took 1.3 hours, due to the lack of automation in feeding the Sproc 
> file names I had to manually edit the SPROC each time I wanted to 
> import a new file, so the actual import took much longer, since I 
> wasn't necessarily watching the computer as the last SPROC run 
> finished.
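As a sanity check, the throughput arithmetic above works out as follows. This
is an illustrative Python sketch; the numbers come from the message, the
variable names are mine:

```python
# Back-of-the-envelope check of the throughput figures quoted above.
records = 97_500_000          # rows in the final table
strip_rate = 1_000            # records/second for the VBA space-stripping pass
strip_hours = records / strip_rate / 3600
print(f"space stripping: {strip_hours:.1f} hours")   # ~27.1 hours

bulk_records = 71_200_000     # rows with recorded BULK INSERT timings
bulk_seconds = 4674
bulk_rate_k = bulk_records / bulk_seconds / 1000
print(f"bulk insert rate: {bulk_rate_k:.1f}K records/sec")  # ~15.2K records/sec
```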
>
> So there you go, that is what I have been trying to accomplish this 
> last few weeks.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> -----Original Message-----
> From: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of 
> JWColby
> Sent: Wednesday, May 09, 2007 11:02 AM
> To: dba-sqlserver at databaseadvisors.com
> Subject: Re: [dba-SQLServer] Bulk insert
>
> Robert,
>
> That is much appreciated.  Why don't you work with me instead?  I am 
> all for doing it "in SQL Server" (I own and use SQL Server 2005) but 
> it has to be something that I understand and can modify or I will not 
> be able to use it in the end, which would make your efforts wasted.
>
> My high level spec:
>
> 1) Use a "from directory" which is monitored to pick up files from.
> 1a) The from directory changes for each import data set.
> 2) Each file in the "from directory" will contain identically 
> formatted data, but the number of records in the file may vary.
> 3) Each file will be defined by an import spec table, which contains 
> Field Name, Field Width and data type.  If no data type is specified, 
> then VarChar() is used.  The size of the VarChar() field is determined 
> by the spec table.  This allows me to spec (in advance) only the data 
> types that I know are not VarChar(), while still letting me spec as 
> many as I need before the import.
> 4) Once the table is built and populated, a new field called PKID 
> needs to be built.  PKID will be INT (32 bit) PRIMARY KEY, IDENTITY.
>
> Understand that at this time I actually have a functioning system.  It 
> takes a pair of tables which specify the file info (from / to 
> directories, field delimiter etc) and the field info (field name / 
> start position in the fixed width data / field length).  This program 
> (written in VBA) does a conversion from fixed width to a pipe 
> delimited "CSV" file, reading a line, stripping off the spaces, and 
> writing the stripped data lines back out to a CSV file in the "TO 
> Directory", complete with the first line containing field names.
> At the moment I have a hand constructed table in a hand constructed 
> database, which is created by an initial use of the wizard from inside 
> of SQL Server, pulling in the first CSV files created by my program.
> Once that table is created, I use a hand created BULK INSERT Sproc to 
> import the CSV files.  Once the table is fully populated with the 
> contents of all the files, I hand build an ALTER TABLE query to build 
> a PKID INT PRIMARY KEY, IDENTITY.
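The fixed-width-to-pipe-delimited conversion described above can be sketched in
a few lines (illustrative Python only; the original is VBA, and the field
layout here is made up):

```python
# Hypothetical field spec: (name, start position, length) in the fixed-width line.
fields = [("Name", 0, 10), ("City", 10, 8), ("Zip", 18, 5)]

def convert(lines, field_spec):
    """Read fixed-width lines, strip padding spaces from each field,
    and return pipe-delimited rows with a field-name header line."""
    out = ["|".join(name for name, _, _ in field_spec)]   # first line: field names
    for line in lines:
        out.append("|".join(line[s:s + w].strip() for _, s, w in field_spec))
    return out

raw = ["John      Gastonia28054",
       "Shamil    Moscow  00000"]
for row in convert(raw, fields):
    print(row)
```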
>
> As you can see, anywhere you see "hand created", that is an area that 
> needs to be automated.  My thoughts are that creating the table 
> initially will be relatively easy, and in fact I know how: building up 
> a make-table query with the field names and widths taken from the spec 
> table.  I just did not do that because I did not have the time.  The 
> next issue is dynamically creating the Sproc that does the Bulk 
> Insert.
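Dynamically generating the BULK INSERT per file, rather than hand-editing the
Sproc each time, might look something like the sketch below (Python generating
T-SQL; the directory layout and WITH options are my assumptions, though
FIRSTROW = 2 matches the field-name header line described above):

```python
import os

def bulk_insert_sql(table, path):
    """Generate one BULK INSERT statement for a pipe-delimited CSV file
    whose first row holds field names."""
    return (f"BULK INSERT [{table}] FROM '{path}'\n"
            f"WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n', FIRSTROW = 2)")

def statements_for_dir(table, to_dir):
    """One statement per CSV file found in the 'to' directory."""
    return [bulk_insert_sql(table, os.path.join(to_dir, f))
            for f in sorted(os.listdir(to_dir)) if f.lower().endswith(".csv")]

print(bulk_insert_sql("Infutor", "/data/to/Infutor01.csv"))
```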
>
> Now, whether the process of importing the data (inside of SQL Server) 
> strips off the spaces is really somewhat irrelevant at this point, 
> since I have working code to do this.  It is not blazing fast at about 
> 1000 lines / second (for 150 fields), but it is "fast enough".  If I 
> port that to VB.Net I hope / expect to get a speed increase.  The BULK 
> INSERT Sproc that I hand build is currently running about 12K records 
> / sec (for 150 fields).
>
> In the end, this really needs to be an external application driving 
> SQL Server functionality.  I need a place to go to fill in the import 
> spec table, set the from / to directories, set up the name of the 
> table, etc.  My heartache to this point has been the inability to get 
> the SQL Server built-in import stuff to import the data without the 
> external strip / rebuild step, or at least without an elaborate dance 
> to get around the limitations of SQL Server in doing that stuff for 
> me.
>
> The very next thing I have to do is start exporting just the name / 
> address (and PK) of this 100 million record table back out for CASS / 
> NCOA processing.  This requires exporting 2 million records at a time, 
> to a destination directory, with a unique file name (Infutor01.csv, 
> Infutor02.csv...).  Once my CASS program finishes processing I will 
> have a new set of files in yet another directory that I need to pull 
> back in to SQL Server.  Those files will not require the space 
> stripping piece since they will not be fixed width.
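The numbered-chunk naming for that export (Infutor01.csv, Infutor02.csv, ...)
is straightforward to compute; a hypothetical sketch of just the file-name
side, assuming 2-million-record chunks as described:

```python
CHUNK = 2_000_000   # records per export file, per the description above

def chunk_names(total_rows, base="Infutor", chunk=CHUNK):
    """Sequential, zero-padded file names covering total_rows in chunks."""
    n_files = -(-total_rows // chunk)          # ceiling division
    return [f"{base}{i:02d}.csv" for i in range(1, n_files + 1)]

print(chunk_names(97_500_000)[:3])   # ['Infutor01.csv', 'Infutor02.csv', 'Infutor03.csv']
print(len(chunk_names(97_500_000)))  # 49 files for 97.5M rows
```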
>
> I do appreciate all of the advice from all the folks out there that 
> have contributed.  I am slowly but surely learning the pieces and 
> parts that I need to do this part of my job.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> -----Original Message-----
> From: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Robert L.
> Stewart
> Sent: Wednesday, May 09, 2007 9:31 AM
> To: dba-sqlserver at databaseadvisors.com
> Subject: Re: [dba-SQLServer] Bulk insert
>
> John,
>
> I am working on an example of doing this with SQL Server for you.  
> But, since I have the same amount of free time as you do, it is going 
> to take a week or so to complete.  I am using the same concept as you 
> did, with a couple of tables to hold the Import spec and the column 
> definitions for the spec.  I am going to build it for only one table, 
> to show how it can be done.  The rest will be up to you if you want to 
> expand it.
>
> Robert
>
> At 09:27 PM 5/8/2007, you wrote:
>>Date: Tue, 8 May 2007 13:40:59 -0400
>>From: "JWColby" <jwcolby at colbyconsulting.com>
>>Subject: Re: [dba-SQLServer] Bulk insert
>>To: <dba-sqlserver at databaseadvisors.com>
>>Message-ID: <00d901c79198$0a703210$657aa8c0 at m6805>
>>Content-Type: text/plain;       charset="us-ascii"
>>
>>Jim,
>>
>>At this point it is not necessary.  I built a preprocessor in a few 
>>hours using my toy (Access).  My toy application handles everything 
>>exactly as described.  Someday (soon I hope) I will port that to 
>>VB.Net which I hope will be much quicker in the preprocessing department.
>>Then I will be considered by some as being a real man, playing with 
>>real tools.  ;-) Others will still consider me a child, playing with 
>>toys because I didn't take it straight to C#.  SOMEDAY (far in the
>>future) perhaps I will embed those pieces directly in CLR programming 
>>inside of SQL Server 2005.  Then I will be able to look down my nose 
>>at
> those children still playing with toys.
>>
>>For now it works, and with the addition of driving the Sproc from the
>>VBA it will be an integrated application like what I described.
>>
>>John W. Colby
>
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
> 



--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com




