[dba-SQLServer] Any way to help speed things up?

David McAfee davidmcafee at gmail.com
Mon May 12 12:29:33 CDT 2014


I thought that by choosing the "Fast Load" data access mode option, I was
using the OPENROWSET method of loading data.

I tried using the insert (and variations of it) from the links that you
provided, but I couldn't get them to work.
I'm thinking there are too many issues between all of the dynamic connection
strings, databases, variables, and missing/included columns.

This isn't (yet) a mission critical component. It's more of a proof of
concept.

A little more backstory:

Each week, a coworker downloads a couple of MDBs from LA County's FTP site.
They run some existing action queries and create a couple of "clean" MDBs
that other people then query against.

The trouble is, the data is spread among many different databases.

I decided to bring in the one "main" (huge) table into a test SQL database.

I let it run all weekend, and have brought in all of the records from
3/2006 through the first week of 1/2014.

Currently, if I use


sp_spaceused 'TableNameHere'


I get the following:

26,627,542 rows
51,946,048 KB reserved
24,125,192 KB data (24GB for one F'n table...woah!)
27,812,840 KB index_size
8016 KB unused

This is definitely the biggest table that I've ever worked with! :P

I only bring in new or changed records from the load table.
I wrote a view that gives a flattened, identical look and feel to the large
table, but only shows the latest status for each "claim".

That view takes 9:48 to run and returns 3,029,983 rows.
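
A minimal sketch of that latest-status view (assuming hypothetical names:
dbo.ClaimHistory for the big table, ClaimID as the claim key, and the
SourceDBDate column ordering each claim's versions):

CREATE VIEW dbo.vLatestClaim
AS
SELECT ClaimID, StatusCode, MDBID, SourceDBDate -- plus the other flattened columns
FROM (SELECT ClaimID, StatusCode, MDBID, SourceDBDate,
             ROW_NUMBER() OVER (PARTITION BY ClaimID
                                ORDER BY SourceDBDate DESC) AS rn
      FROM dbo.ClaimHistory) AS h
WHERE rn = 1;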

I'm thinking that I should make a table from this view and have people
query against this table, rather than the large table.
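
If I do materialize it, the weekly rebuild could be as simple as this
(again, all names hypothetical):

IF OBJECT_ID('dbo.LatestClaimReport', 'U') IS NOT NULL
    DROP TABLE dbo.LatestClaimReport;

SELECT *
INTO dbo.LatestClaimReport
FROM dbo.vLatestClaim;

-- Index the rebuilt table to support the common lookups.
CREATE UNIQUE CLUSTERED INDEX IX_LatestClaimReport_ClaimID
    ON dbo.LatestClaimReport (ClaimID);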

What do you think? Am I going about this the wrong way?

There are no daily inserts into this table, just a batch once a week.
Many queries against it though...

It almost sounds like a candidate for NOSQL :P

I think I may be giving John Colby a run for his money when it comes to
table sizes. :)

Thanks again.
David



On Fri, May 9, 2014 at 1:26 PM, Francisco Tapia <fhtapia at gmail.com> wrote:

> David,
>   based on your data load requirements, a CSV may become overwhelming... I
> think instead I would perform an OPENROWSET in step 2.c.iii
>
> openrowset example:
> http://technet.microsoft.com/en-us/library/ms190312.aspx
>
> bulk insert example:
> http://technet.microsoft.com/en-us/library/ms174335.aspx
>
>
> So at step a.i, switch the recovery model to bulk-logged. This keeps your
> database in an almost-full recovery model while minimally logging the bulk
> operations.  Next, follow the bulk insert example for Jet databases:
>
> (from bulk insert link above)
>
> INSERT INTO FastLoadTable (Column1, Column2)
> SELECT Column1, Column2
>    FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
>       'C:\Program Files\Microsoft Office\OFFICE11\SAMPLES\Northwind.mdb';
>       'admin';'',Customers);
>
>
> Performing loads in this manner will minimize the growth of the log file
> when the bulk inserts are run, so you wouldn't be fully logging the
> hundreds of records from each of the 700 Access DBs.
>
> Then, at step 3.i, switch the recovery model back to full.
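>
> A minimal sketch of the two switches (the database name here is
> hypothetical):
>
> ALTER DATABASE ClaimsTest SET RECOVERY BULK_LOGGED;  -- before the weekly load
> -- ... run the bulk inserts ...
> ALTER DATABASE ClaimsTest SET RECOVERY FULL;         -- after the load completes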
>
> By the way, if your business case does not call for full or bulk-logged
> recovery, you can just switch to simple and leave it there.  Remember, you
> want full recovery if you have a job actively backing up the log file
> regularly, so that you can recover up to the minute before failure in case
> of a catastrophe.
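>
> A scheduled log backup could be as simple as (name and path hypothetical):
>
> BACKUP LOG ClaimsTest TO DISK = N'D:\Backups\ClaimsTest_log.trn';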
>
> On Fri, May 9, 2014 at 10:28 AM, David McAfee <davidmcafee at gmail.com> wrote:
>
>> Currently I am doing the following:
>>
>> 1. Execute SQL Task - Get list of Database Connection Strings and initial
>> variables (I'm currently limiting to 10-20 database pulls at a time)
>>
>> 2. ForEach Loop: (assign loop variables)
>>     a. Truncate Load Table
>>     b. Script Task (Depending on variables, I create different connection
>> strings and select statements to deal with missing columns)
>>     c. Data Flow (Move data from Access to SQL)
>>          i. OLE DB Source - Dynamic Connection, SQL Command from Variable
>>          ii. Derived Column - Create two derived columns: MDBID (PKID to
>> path and MDB or source DB), SourceDBDate (Date the Database was considered
>> clean/correct)
>>          iii. OLE DB Destination - SQL fast load into "Load table"
>>     d. Execute SQL Task - Exec Sproc that flags new records in load table
>>     e. - k. Execute SQL Task - Exec Sprocs that normalizes the data and
>> updates FKID in load table
>>     l.  Execute SQL Task - Exec Sproc that flags load table records that
>> are changed from the max equivalent record in the real table (a sketch of
>> this flag-and-append pattern follows the list)
>>     m. Execute SQL Task - Exec Sproc that appends new and changed records
>> into the real table
>>  3. Truncate Log File
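>>
>> For steps 2.l and 2.m, the flag-and-append pattern looks roughly like this
>> (table, column, and flag names here are placeholders, not the real ones):
>>
>> -- Flag load-table rows that have no matching claim/status combination
>> -- already in the real table (i.e. new or changed records).
>> UPDATE l
>> SET l.IsChanged = 1
>> FROM dbo.LoadTable AS l
>> WHERE NOT EXISTS (SELECT 1
>>                   FROM dbo.ClaimHistory AS h
>>                   WHERE h.ClaimID = l.ClaimID
>>                     AND h.StatusCode = l.StatusCode);
>>
>> -- Append the flagged rows into the real table.
>> INSERT INTO dbo.ClaimHistory (ClaimID, StatusCode, MDBID, SourceDBDate)
>> SELECT ClaimID, StatusCode, MDBID, SourceDBDate
>> FROM dbo.LoadTable
>> WHERE IsNew = 1 OR IsChanged = 1;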
>>
>> Are you saying that I should replace step 2 (i, ii, iii) with something
>> that selects the data from the 700+ various Access databases and saves it
>> into a text/CSV file, then bulk inserts the text file into the OLE
>> destination load table, then manipulates the data?
>>
>>
>
>

