David McAfee
davidmcafee at gmail.com
Mon May 12 12:29:33 CDT 2014
I thought that by choosing the "Fast Load" data access mode option I was using the OpenRowSet method of loading data. I tried using the INSERT (and variations of it) from the links that you provided, but I couldn't get them to work. I'm thinking there are too many moving parts between the dynamic connection strings, databases, variables, and missing/included columns. This isn't (yet) a mission-critical component. It's more of a proof of concept.

A little more back story: each week a coworker downloads a couple of MDBs from LA County's FTP site. They run some existing action queries and create a couple of "clean" MDBs that other people use to query. The trouble is, the data is spread among many different databases. I decided to bring the one "main" (huge) table into a test SQL database. I let it run all weekend, and have brought in all of the records from 3/2006 through the first week of 1/2014.

Currently, if I run sp_spaceused 'TableNameHere' I get the following:

    rows:       26,627,542
    reserved:   51,946,048 KB
    data:       24,125,192 KB  (24 GB for one F'n table...woah!)
    index_size: 27,812,840 KB
    unused:     8,016 KB

This is definitely the biggest table that I've ever worked with! :P

I only bring in new or changed records from the load table. I wrote a view that gives a flattened, identical look and feel to the large table, but only shows the latest status for each "claim". That view takes 9:48 to run and returns 3,029,983 rows.

I'm thinking that I should make a table from this view and have people query against that table, rather than the large table. What do you think? Am I going about this the wrong way?
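Roughly what I have in mind is a minimal sketch like the one below (the view, table, and key names - vwLatestClaimStatus, LatestClaimStatus, ClaimID - are just placeholders for the real objects):

    -- Materialize the "latest status per claim" view into its own table
    -- so people query ~3M rows instead of the 26.6M-row history table.
    -- vwLatestClaimStatus, LatestClaimStatus and ClaimID are placeholder names.
    IF OBJECT_ID('dbo.LatestClaimStatus', 'U') IS NOT NULL
        DROP TABLE dbo.LatestClaimStatus;

    SELECT *
    INTO dbo.LatestClaimStatus
    FROM dbo.vwLatestClaimStatus;

    -- Index the claim key so lookups don't scan the whole table.
    CREATE CLUSTERED INDEX IX_LatestClaimStatus_ClaimID
        ON dbo.LatestClaimStatus (ClaimID);

The idea would be to rebuild that table once a week, right after the batch load finishes.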
There are no daily inserts into this table, just a batch once a week. Many queries against it, though... It almost sounds like a candidate for NoSQL :P

I think I may be giving John Colby a run for his money when it comes to table sizes. :)

Thanks again,
David

On Fri, May 9, 2014 at 1:26 PM, Francisco Tapia <fhtapia at gmail.com> wrote:

> David,
> Based on your data load requirement a CSV may become overwhelming... I
> think instead I would perform an OPENROWSET in step 2.c.iii.
>
> OPENROWSET example:
> http://technet.microsoft.com/en-us/library/ms190312.aspx
>
> Bulk insert example:
> http://technet.microsoft.com/en-us/library/ms174335.aspx
>
> So at step a.i, switch the recovery model to bulk-logged. This will keep
> your database in an almost-full recovery model while minimally logging
> the bulk operations. Then follow the bulk insert example for Jet
> databases (from the bulk insert link above):
>
> INSERT INTO FastLoadTable (Column1, Column2)
> SELECT Column1, Column2
> FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
>     'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';
>     'admin';'', Customers);
>
> Performing loads in this manner will minimize the growth of the log file
> when the bulk inserts are run, so you wouldn't be logging the hundreds of
> records from each of the 700 Access DBs.
>
> Then at step 3.i, switch the recovery model back to full.
>
> By the way, if your business case does not call for full or bulk-logged
> recovery, you can just switch and leave it at simple. Remember, you want
> full recovery if you have a job actively backing up the log file
> regularly, so that you can recover up to the minute before failure in
> case of a catastrophe.
>
> On Fri, May 9, 2014 at 10:28 AM, David McAfee <davidmcafee at gmail.com> wrote:
>
>> Currently I am doing the following:
>>
>> 1. Execute SQL Task - Get list of database connection strings and
>>    initial variables (I'm currently limiting to 10-20 database pulls
>>    at a time)
>> 2. ForEach Loop: (assign loop variables)
>>    a. Truncate load table
>>    b. Script Task (depending on variables, I create different connection
>>       strings and SELECT statements to deal with missing columns)
>>    c. Data Flow (move data from Access to SQL)
>>       i.   OLE DB Source - dynamic connection, SQL command from variable
>>       ii.  Derived Column - create two derived columns: MDBID (PKID to
>>            path and MDB or source DB) and SourceDBDate (date the database
>>            was considered clean/correct)
>>       iii. OLE DB Destination - SQL fast load into the load table
>>    d. Execute SQL Task - exec sproc that flags new records in the load table
>>    e.-k. Execute SQL Tasks - exec sprocs that normalize the data and
>>       update the FKIDs in the load table
>>    l. Execute SQL Task - exec sproc that flags load table records that are
>>       changed from the max equivalent record in the real table
>>    m. Execute SQL Task - exec sproc that appends new and changed records
>>       into the real table
>> 3. Truncate log file
>>
>> Are you saying that I should replace step 2 (i, ii, iii) with something
>> that selects the data from the 700+ various Access databases, saves it
>> into a text/CSV file, then bulk inserts the text file into the OLE
>> Destination load table and manipulates the data from there?
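Putting Francisco's suggestion together, the weekly load wrapped in bulk-logged recovery would look roughly like the sketch below. The database name ClaimsTest and the .mdb path are placeholders; FastLoadTable, Column1/Column2 and Customers are taken from the example in the reply, and OPENROWSET against a Jet database also requires the 'Ad Hoc Distributed Queries' option to be enabled.

    -- Rough sketch of the bulk-logged wrapper described above.
    -- "ClaimsTest" and the .mdb path are placeholders; adjust to the real names.
    -- OPENROWSET against Jet needs ad hoc distributed queries enabled:
    --   EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
    --   EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;

    ALTER DATABASE ClaimsTest SET RECOVERY BULK_LOGGED;

    -- One INSERT ... SELECT per weekly MDB (the SSIS loop would supply the path).
    INSERT INTO dbo.FastLoadTable (Column1, Column2)
    SELECT Column1, Column2
    FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                    'C:\Data\LACounty\Week01.mdb';
                    'admin'; '', Customers);

    ALTER DATABASE ClaimsTest SET RECOVERY FULL;

    -- Back up the log afterwards so the full-recovery log chain stays usable.
    BACKUP LOG ClaimsTest TO DISK = N'D:\Backups\ClaimsTest_log.trn';

If up-to-the-minute recovery isn't needed at all, switching the database to simple recovery, as Francisco notes, avoids the log-management step entirely.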