artful at rogers.com
artful at rogers.com
Fri Nov 3 12:31:32 CST 2006
It is split across 8 servers, each running two processors and each having > 2 GB of RAM. In part this arrangement was due to security concerns, which I won't go into, but also the sheer size of the database made it impossible to put it on one conventional server. All the servers run Windows Server 2003. The first iteration ran on SQL 2000, but the conversion to SQL 2005 is underway. The fact of 8 databases, each on a different server, exposed a serious down side as compared to a similar implementation in Oracle -- MS SQL cannot do referential integrity constraints across databases, so a bunch of additional code + duplicate lookup tables were required. In Oracle, one could simply declare tablespaces then place them on different servers, and it would still remain a single database. Until this project, I had never been involved with anything so large, so the issue never arose. ----- Original Message ---- From: JWColby <jwcolby at colbyconsulting.com> To: dba-sqlserver at databaseadvisors.com Sent: Friday, November 3, 2006 9:03:11 AM Subject: Re: [dba-SQLServer] Any gut feelings? Arthur, The other question I have given your experience with these big databases, is what kind of hardware were they using? Was this database you mention running on 1/2 proc machines with 2/4 gig of memory, or was it on processor arrays with a dozen or two or three processors and a dozen or two or three gigabytes of main memory? Running windows (what flavor)? SQL Server 2K (what flavor)? John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of artful at rogers.com Sent: Friday, November 03, 2006 4:18 AM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Any gut feelings? Given the projects upon which I have worked recently, 65M rows is a smallish table. Mind you, I've been doing everything in SQL Server not Access, but I've been working with situations in which the anticipated growth is 1TB per year. The 700 columns is an issue of a different colour and suggests a serious problem, perhaps not in your final solution but certainly in the source data. Some stakeholder must prioritize these columns and then you must break them out into related tables. (i.e. columns most commonly queried, etc.) 65M rows is not really a large table, except in the Access context. The 700 columns is the locus of your problem, not the rowcount. If you can quantify the probablity of a column being queried, then you can break this big table into several related tables according to probablity of query, then join the related tables if and only if their columns are queried. Just my opinion, but based on working with TB-sized tables. Arthur ----- Original Message ---- From: JWColby <jwcolby at colbyconsulting.com> To: dba-sqlserver at databaseadvisors.com Sent: Friday, November 3, 2006 12:58:00 AM Subject: [dba-SQLServer] Any gut feelings? I have this huge table, 65 million records, ~700 fields. The table has a bunch of now useless fields which I will be deleting, basically old cass and ncoa data from a previous (many years ago) address verification on the table. More importantly, it also has what I will call demographics data, but extended into such things as brand preference for consumables, hobbies, types of electronics purchases, etc. Obviously the sheer size is an issue here. What I would think to do, coming from a small database background, is to create a PK on the big table, then break this big table down into smaller tables, each focused on one "set" of demographics data. My question to you guys is, given the size of the table in terms of sheer number of records, is this a valid strategy? If I have an indexed and identical PK in each table, can I break out the Address into a table, age/income/race/religion into a table, then tables for boating, electronics, medicines, etc. with each sub table only containing records where the demographics info is populated. IOW, not everyone has boating info, not everyone has medical info, electronics info etc. so I end up with a smaller subset of records in each table where there is only a record in that table if the person has data for that demographic set. Having done that can I then expect to be able to join up to a half dozen tables to get inner join subsets, outer join mega-sets etc. in anything approaching near real time (minutes)? I am in the process of doing the very first (two) breakouts, address and boating. Obviously I have to create a new table, append the data in, then build an index on the PK for starters. Having done that I can do a quick and dirty test, but I would like your opinions on the general feasibility of this approach given the limitations of current (but state of the art) desktop hardware, SBS 2003 and SQL Server 2000. Having answered this (assuming that the answer is "it is probably a valid approach", I then need assistance on various methods to determine the existence of data within a given record of valid demographic data. IOW, I am pulling all 65 million records, a subset of fields (typically 10 to 30 fields) into a smaller table. This smaller table will eventually have an indexed PK, but NO indexes on the various fields. One way I can think of to do something like this is to generate a "validity" field where I store a number which is simply a count of the fields with something other than a zero length string. So record 1 has 0 valid data fields, rec 2 has 3 valid data fields, 3 has 12 valid data fields and so forth. IF I could generate such a query to get these counts and store it in a single field (and ever get results) and then apply an index on that field, anything over a zero has valid data and could be used to pull records, or (in reverse) records with a zero in the "validity" field could be deleted from the table to drop the number of records in that table. Other thoughts on how to accomplish this objective? Remember that this main table is simply freakin huge, with probably 20-50 different demographic groupings and some where the boundaries aren't even clear. In order to do this I will have to create that same number of tables, cut out the data and get it into these new tables, index the PK, generate counts of the validity data, delete records with no valid data etc. Some kind of automation for doing this would be nice, but basically, I can if necessary just brute force it once I have the procedure down. Ideas? Objections? "Your crazy to even think about this"? John W. Colby Colby Consulting www.ColbyConsulting.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com