[AccessD] Very interesting quirk in table design

Thu May 15 11:22:37 CDT 2003

My response of last night has not showed up so I'll try again:

Speed/simplicity versus storage requiements is given as a reason for using a 
bigger datatype.

If so, why didn't Microsoft make the boolean a four byte datatype?  And when 
we get to 64 bit processors and busses?

The time taken by the processor to mask a byte for use by a 32 bit processor 
compared to the time to transfer a byte from disk to memory is 
insignificant.  The time taken to transfer that same byte over a LAN is many 
orders of magnitude greater.  Now factor in a WAN or a dial up connection 
and the smaller datatype gains significant advantages in speed.  I keep 
hearing that storage is cheap but even so, the data must move from storage 
to processor to memory and back to be used and it is the transit time that 
has the greatest impact on performance.

Assume 100,000 records.
Assume a requirement of 8 yes/no fields.

Case 2 byte Yes/No datatype:  8*2*100,000 = 1,600,000 bytes
Case 1 byte bit field:  1*100,000 = 100,000 bytes

The difference is 1,500,000 bytes and the transit time is enormous compared 
to the processor time required to work with the byte in a 32 bit 
environment.  You lose in simplicity in querying but this can still be 
handled with a bit of skill.

Assume a requirement of 6 yes/no fields and you 'waste' a couple bits

Case 2 byte Yes/No datatype:  6*2*100,000 = 1,200,000 bytes
Case 1 byte bit field:  1*100,000 = 100,000 bytes

Even with wasted bits, the saving is enormous

Assume a requirement of 8 yes/no fields and you use a signed byte vs. the 
two byte boolean datatype.

Case 2 byte Yes/No datatype:  8*2*100,000 = 1,600,000 bytes
Case 1 byte boolean field:  8*100,000 = 800,000 bytes

In this case, you still have better speed due to the overriding impact on 
performance of transfer time plus there is no impact on query complexity.  
This is both simple and faster than a two byte datatype.  In addition, you 
have the benefit of a smaller file (less time to transfer by mail or 
otherwise between remote servers, lower backup requirements) plus you 
benefit from reduced overall network traffic so all users benefit no matter 
what applications they are running.

I think that a smaller datatype is a no brainer on all counts and was merely 
wondering why Microsoft approached the boolean type with No Brain?  Whatever 
their reason, it would appear that a signed byte would handle 0 and -1.

Hen

It's for speed/simplicity versus storage requirements.

A 32 bit (long) data type is the quickest and easiest to
retrieve/mainipuate/store in a 32 bit OS.

If you used a single bit to store a yes/no field, it would reduce the
storage requirement, but think about how you would then have to
store/locate/ index etc that particualr bit. It would make it a very
slow datatype to use in comparison.

If you only have a single Yes/No field in the table, what are you
going to do with the other 7 bits in the byte anyway - so you might
as well just store it as a whole byte. Once you take that step, you
might as well use a long for speed rather than a byte (You don't use
single byte Integer variables do you <g>)

_________________________________________________________________
MSN 8 with e-mail virus protection service: 2 months FREE*  
http://join.msn.com/?page=features/virus