John W. Colby
jwcolby at colbyconsulting.com
Sun Sep 5 10:12:49 CDT 2004
OK it's time to talk about how to speed things up. I now have a SQL Server machine with (4) 250g SATA data drives each containing one file from the database. The four files together form a single database with a single table which as of this morning contains 27 million records. Doing a count (*) from another machine took 18 minutes to count 24 million records. What can I do to speed up this count function? What can I do in general to speed up accessing the database? I am going to need to do cross tab type queries across all 65 million records to see how many people do or use X thing. I will also need to pull specific fields from all 65 million records WHERE some field .... There are 6000 fields (so far) and I just can't see indexing all 600 fields although indexes on select fields will be a necessity. Table scans on Boolean values is going to take FOREVER. The "Boolean" fields are currently nvar(50) fields holding a Y or N. Will it help to go through the database changing these fields to a different data type? Please be as specific (including instructions on how if at all possible) as you can since a general "do Y" will take me HOURS of research. I LOVE research, research is good for the soul, but I only have so many hours in the day and I'm trying to learn a lot in a short time. John W. Colby www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JMoss Sent: Sunday, September 05, 2004 10:35 AM To: dba-sqlserver at databaseadvisors.com Subject: RE: [dba-SQLServer] SQL Server Hash algorithm - question,then rambling update John, Some software parses the street number, then breaks the street name down into parts also. Doubletake from PeopleSmith Software breaks down names and addresses in this fashion... but its not cheap. -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com]On Behalf Of JMoss Sent: Sunday, September 05, 2004 9:26 AM To: dba-sqlserver at databaseadvisors.com Subject: RE: [dba-SQLServer] SQL Server Hash algorithm - question,then rambling update John, What the client is talking about is taking x characters from first name, x characters from last name, x characters from address1, x characters from city, x characters from state, and x characters from the zip code and creating a match key field from that. Other criteria could be added to ensure the uniqueness of the record. Then you could use something like http://www.codeproject.com/database/xp_md5.asp to build a hash key. Either key could be used for purposes of deduping. Jim -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com]On Behalf Of John W. Colby Sent: Sunday, September 05, 2004 9:07 AM To: dba-sqlserver at databaseadvisors.com Subject: RE: [dba-SQLServer] SQL Server Hash algorithm - question,then rambling update Eric, >Why not use Merge Replication to merge your data? Maybe pure ignorance. As I have said I know nothing about SQL Server, learning as I go. However I am merging data in from a set of 22 comma delimited text files. From the word "replication" I assumed that this would merge data already in a sql server database. >For a unique id you can use the uniqueidentifier field type Precisely. I will be using a uniqueidentifier field as my pk. However I also need a unique index to prevent putting the same record in the database twice. This is a database of people in the US. I need something to attempt to recognize John Colby in Northfield CT as already in the db and not put a second copy in the database. The client says that the industry uses a field where they take N characters from field A and M characters from field B and O characters from field C etc. I haven't seen any sign of such a field in the data that I am importing, but they keep saying we need such a thing. John W. Colby www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Eric Barro Sent: Sunday, September 05, 2004 3:27 AM To: dba-sqlserver at databaseadvisors.com Subject: RE: [dba-SQLServer] SQL Server Hash algorithm - question,then rambling update John, Why not use Merge Replication to merge your data?' For a unique id you can use the uniqueidentifier field type. Select yes for the "is rowguid" option. SQL server creates a unique 16-bit id that you can use. In fact this is the field type used by the field when you include the table for replication. --- Eric Barro Senior Systems Analyst Advanced Field Services (208) 772-7060 http://www.afsweb.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com]On Behalf Of John W. Colby Sent: Saturday, September 04, 2004 11:35 PM To: dba-sqlserver at databaseadvisors.com Subject: [dba-SQLServer] SQL Server Hash algorithm - question,then rambling update Does anyone know of a robust hash algorithm for SQL Server? I need to generate a unique id (natural PK) for the nVLDB address database I am working on. I currently have a gaggle (as opposed to a google) of machines busily importing the raw data into the database and once that is done I have to build a unique identifier that can be used to find an address record if it exists for merges and updates. Anyone who has worked on databases like this feel free to pipe in with how you did this. Rambling update - I have been building a pair of machines with the MSI Nforce3 motherboard for the ATHLON64 - http://www.newegg.com/app/viewProductDesc.asp?description=13-130-457&depa=0 With a 3ghz A64 processor. It promises 4 SATA channels and 2 IDE channels for 8 total devices. It promises that raids can be built from any combination of drives. I had hoped to set up a bunch of raid 1 containing disk pairs to hold the database but in the end had a heck of a time trying to get the raid to function reliably. I was trying to set up a Raid 1 boot disk and spent literally days installing and reinstalling trying to get that working. Never did. I did however get a REAL PERSON at MSI to talk to and with luck will sort that problem next week. I finally simply had to give up for now and get on with the show since I need results next week. As a result I built one of the machines with (4) SATA 250gb disks for a terrabyte of storage, threw in another 250 g drive on an IDE channel to hold the log file and the raw data files from which the import is happening. This whole thing has been an exercise in humility I must admit, with much time spent going nowhere. I purchased (2) 1g Dimms a few weeks ago for an existing Athlon machine (my old desktop) and when I purchased the pair of motherboards I purchased (3) more gig sticks, one of which turns out to be bad. It turns out that the bad stick was one of two I opened to just test the ram and so wrongly came to the conclusion that the ram simply did not work with these motherboards. Luckily after getting one of the motherboards up I went back "one last time" to try and manually tweak the ram parameters and see if they would work (didn't want to RMA them if I didn't have to) and discovered that in fact 2 sticks did work. Long story medium long, TODAY I finally got ALL of the machines up and running. The two new motherboards and my original dev machine (which I had retired in favor of my Laptop). I am so tired of Windows installs I could spit. Thank the big cahuna for high speed internet. I also figured out how to get SQL Server to use multiple files on different disks as a single database, which given my total ignorance about SQL Server I consider to be a major victory. When I started this I thought I needed a single file on a HUGE disk. So I have the (4) 250gb SATA drives each holding a single mdf file for a total capacity of 1 terabyte. By my calculations the data will be around 600+ gbytes, giving me a little head room. A fifth 250gb drive will hold the indexes (assuming I can figure out how to tell SQL where to put them). I now have the second a64 machine, my old desktop, my new laptop and my Wife's desktop all running bcp queries dumping the raw data into the server. Each machine is also simultaneously unzipping a raw data file - ~350g zipped, 10g unzipped. Talk about saturating your network bandwidth! With luck, by this time tomorrow I will have all the data in and be able to start the REAL work. Tomorrow I have to figure out the unique index thing, plus start to look at (become familiar with) the data fields. I also want to build an "autonumber" PK. Eventually I would like to experiment with dividing the data out onto different machines. The database currently has 600 fields and I fear that I am about to bump into the 8k / record limitation that someone on the list has mentioned. If that happens I will have to divide the db vertically. I also need to build some views containing subsets of the fields to make analysis and exporting easier. So much remains to be done, and none of it could proceed until I got all the data in which was a waaaay bigger task than I anticipated. I have ordered the MS Action pack and somewhere down the road I hope to get a Windows 2003 server set up. I have heard rumors that it can run 64 bit mode, and that SQL Server can as well, so if that is true I will be testing a 64 bit system, perhaps setting up two identical systems, one 32 bit and one 64 bit for a side by side speed comparison. Of course I need to get some PAYING work done to allow me to do that. ;-) Anyway, that's all the news that's fit to print. Thanks to all the folks that have been giving me suggestions and reading materials. John W. Colby www.ColbyConsulting.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com