[dba-SQLServer] How are indexes built

Thu Nov 29 07:17:40 CST 2007

I am in the tail end of rebuilding the "fact table" for the big database
from hell.  Long ago Arthur managed to get me an "optimum field size" for
each of the fields.  I stripped off a ton (100?) fields out at the beginning
of the table which were very old address validation stuff, and then build a
new table without those fields, and with the rest of the fields set to
exactly the right size to fit the data - usually char(1) rather than the
default varchar(50) which the old table used.  I am now in the "index it"
phase.  Keeping in mind that this data is never updated, once the data is in
place indexes do not have to deal with the insertion problem.

One issue that I have is that in order to keep the speed up, I did not
define a PK field or any indexes before I inserted the data.  Now the data
is so huge that I cannot get it to build (designate) a PK field without a
timeout, or at least it will not work from the built in table modify wizard.
It starts chugging away and then comes back a couple of minutes later with a
timeout message and fails to complete.  So I do not have a real PK.  I do
have a field that contains the data (int) that is the number that represents
a record (the PK field), or is just not actually considered a PK by SQL
Server and I am not sure how to make SQL Server turn it into a PK.  I think
I can do so with an actual SQL statement run as a query but I will have to
research the syntax.  For some reason, SQL Server will run QUERIES to
completion, even if they take hours, but it will cause timeouts for any of
the built in wizards that attempt to modify a table, or for that matter will
time out views if they take too long.  Cut the view SQL into a query and it
will run to completion.  Obviously one of many things I do not understand
about my tool.

So... I am trying to build up indexes to help me with select WHERE clauses
for this beast.  I have discussed some of the types of where clauses I get,
IN(a,b,c,d,e) for a single field, Fld1=A and Fld2=C, etc.  In order to bring
sanity to the table and allow me to test pieces of these select queries, I
am starting to use a strategy of building sub queries to handle the pieces
of a complex where clause, and then using a JOIN for the AND.  In order to
do this I need indexes on the fields used in each sub query.  It is possible
to create cover queries where I include the PKID (field) and the field used
in the select.

I really need to build a clustered index on the PKID field (I think) since
that is what I use to actually reference the entire record, for those cases
where it has to pull data not indexed or where an index is ignored.  I will
need to do that "over the weekend" I think given the physical sort involved.
Will it have to completely rebuild all of my other indexes as well?  I
suspect so.  I really need a single SP that will drop all of my indexes and
another to rebuild all of my indexes.  Then I could just "drop the names" in
an SP and cause the whole lot to build.

I think I am going to have to join a group specializing in SQL Server in
order to get these kinds of questions answered.  Kind of a scary prospect.
I am so "developer oriented" and so ignorant about the workings of the SQL
Server innards.

John W. Colby
Colby Consulting
www.ColbyConsulting.com