[dba-SQLServer] Using Databases on separate machines - performance

Francisco Tapia fhtapia at gmail.com
Wed May 16 11:19:01 CDT 2007


John,
  You're describing distributed queries, which you can perform with a four-part name such as:


SELECT * FROM ServerName.Database.Owner.Table
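
For the four-part name to resolve, the remote box first has to be registered
as a linked server.  A minimal sketch (I'm borrowing the StoneHenge name from
your message below; the login mapping with @useself assumes your credentials
work on both machines, so adjust for your security setup):

-- Register the remote SQL Server as a linked server (run on the local server).
-- 'StoneHenge' is only an example name taken from this thread.
EXEC sp_addlinkedserver
    @server = N'StoneHenge',
    @srvproduct = N'SQL Server';

-- Map the current login to the remote server; @useself = 'true' assumes
-- matching / Windows credentials on both boxes.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'StoneHenge',
    @useself = 'true';

-- After that, the four-part name resolves:
-- SELECT * FROM StoneHenge.SomeDatabase.dbo.SomeTable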


A query like that passes the request through the OLE DB provider and hands the
"work" off to the linked server.  If you were hooked into some other type of
engine (JET, FoxPro, Excel, etc.), more of the work is performed by SQL Server
itself, because those OLE DB providers put restrictions on what can be pushed
down to the data source.  There is more info on distributed queries in the
links below.  I do use distributed queries in my environment, but in my
situation I generally join them against local tables, which means SQL Server
has to pull the remote data over to do the join locally.  If instead I run the
entire query against the linked server, the processing occurs on the remote
server.

such as:

SELECT * FROM ServerName.Database.Owner.Table WHERE Field1 IN ('a', 'b', 'c', 'd')
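
To make the difference concrete, here is a rough sketch (the database, table,
and column names are just placeholders, not your real schema):

-- Joining the linked table against a local table: SQL Server generally has
-- to pull the remote rows across the wire and do the join locally.
SELECT r.PK
FROM StoneHenge.BigDB.dbo.tblRaw AS r
INNER JOIN dbo.tblLocalOrder AS o
    ON o.PK = r.PK;

-- Sending the whole statement with OPENQUERY forces the filtering to run on
-- the linked server; only the qualifying rows come back over the network.
SELECT *
FROM OPENQUERY(StoneHenge,
    'SELECT PK FROM BigDB.dbo.tblRaw WHERE Field1 IN (''a'',''b'',''c'',''d'')');

Whether the optimizer pushes a four-part-name query to the remote server on
its own depends on the provider and the query; OPENQUERY is the blunt way to
guarantee it.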


hth

White Paper on performance:
http://citeseer.ist.psu.edu/rd/93453806%2C732761%2C1%2C0.25%2CDownload/http%3AqSqqSqhome.comcast.netqSq%7EevilconroyqSqICDE2005-MicrosoftSQLServerDistributedQuery.pdf

Architecture:
http://msdn2.microsoft.com/en-us/library/ms191277.aspx

Distributed Queries
http://msdn2.microsoft.com/en-us/library/ms188721.aspx
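
Applied to the queries you describe below (everything here is a guess at your
schema -- the HSID database name and the PKID / Zip / FieldX column names are
just stand-ins), the final join of the PK sets could be written from Azul with
StoneHenge as the linked server, something like:

-- Run on Azul; the two big tables live on StoneHenge, the 180-row ZIP list is local.
SELECT raw.PKID
FROM StoneHenge.HSID.dbo.tblHSIDRaw AS raw
INNER JOIN StoneHenge.HSID.dbo.tblAZHSID AS az
    ON az.PKID = raw.PKID
INNER JOIN dbo.tblZIPS AS z
    ON z.Zip = az.Zip
WHERE raw.FieldX IN ('...')          -- the single-field filter from your query 3
  AND raw.Field1 = '...';            -- stand-in for the 9-field WHERE from your query 1

How much of that join actually runs on StoneHenge versus getting pulled back
to Azul depends on what the optimizer decides it can push to the linked
server, so it is worth checking the execution plan (look for a Remote Query
operator) before committing to the layout.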

On 5/16/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> I now have two "high powered" servers built up.  Both run Windows 2003 and
> SQL Server 2005.
>
> ATM I am running my entire database on a single machine.  The database
> consists of a couple of largish "raw data" tables (65 million recs / 700+
> fields; 97 million recs / 149 fields), which contain the data pulled in
> from text files, with an autoincrement PK added for tracking purposes.
> Each raw data table then has address / PK fields pulled out and "sent out"
> for address validation.  The results are then reimported back into SQL
> Server, into the same DB as the raw data table sits in.  I have created
> separate DBF files for each "database" (raw / validated).
>
> As I mentioned earlier, I am creating a new dbf file set for each "order"
> I receive from my customer, where I build up the views required to process
> that specific order.  That is working quite well BTW.
>
> I have a bunch of questions re performance.  I have discovered that I can
> create queries / views that pull the data straight out of the desired db /
> table when I use that data in another db, simply by referencing the
> database / table.  I think I can do the same thing if pieces are on
> another server instance.  What I am considering doing is placing these
> huge raw / validated database files out on the StoneHenge server, leaving
> the Azul server to contain and process the orders.  Stonehenge is the
> newer machine and has a single partition with 1.6 tbytes open, and I will
> be adding another 500 gb to that soon.  Thus this seems like the logical
> home for the big databases.
>
> I have a gbit switch between all the machines on my network.
>
> My question is, will there be any pros/cons to organizing things this way?
> I can get about 450 MB/sec of burst streaming data off of my RAID arrays,
> which is considerably above the 1 Gbit switch capacity, but it seems
> unlikely that SQL Server would actually process data at that speed anyway.
> So I want to place the big source databases on the new server and the
> order database on the original server.
>
> To give an example of a real order, I created a set of queries:
>
> 1) One query talks to the tblHSIDRaw table (75 million records / 700
> fields), asking for
>
> "SELECT PKID WHERE ... "
>
> The Where clause encompasses about 9 different fields.  All the fields are
> indexed, though how useful the indexes are (in all cases) is in doubt.
>
> 2) Another query uses a small table of 180 ZIPS provided by the client.
> That ZIP table is joined on the ZIP column of tblAZHSID which is the table
> that has been processed for valid addresses.  tblAZHSID is ~50 million
> records with about 20 fields.
>
> "SELECT PK from tblZIPS inner join tblAZHSID on ..."
>
> 3) A third query requests data from tblHSIDRaw with a WHERE on a single
> specific field.
>
> "SELECT PK FROM tblHSIDRaw WHERE FieldX IN ()
>
> I did that just because this has to be ANDed with all 9 fields in the
> first query.  Perhaps not the most efficient, but it works.
>
> 4) A query that inner joins all the PKs and returns a result set.
>
> So what happens if I place these tblHSIDRaw and tblAZHSID tables out on
> Stonehenge?  I assume that server is passed the SELECT clauses and
> performs the processing required to return the requested PKs?
>
> Does Azul do any processing?  Perhaps the join of the result sets in
> query 4 to create the final data set?
>
> Is the process faster or slower than if the whole shootin' match ran on a
> single machine?
>
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>
>


-- 
-Francisco
http://sqlthis.blogspot.com | Tsql and More...


