[AccessD] Number vs text data type

Henry Simpson hsimpson88 at hotmail.com
Fri Oct 17 22:48:54 CDT 2003


OK Drew:

And it showed up on mine, but the posts I made earlier may have shown up on 
your system but have yet to appear on mine.  For example, the post I made 
over 48 hours ago obviously made it to other list members even though it has 
not yet made it to my mail.  I see both your reply and my earlier post of 
this date.

Just to be clear, much of the discussion on this topic has been about speed 
of finding or sorting.  The point I was trying to make was that storing 
numbers generally takes less space than storing text and that the storage 
size is the most direct measure of possible performance in a file server 
database because the storage size affects the disk data retrieval time and 
the data transfer time.  The discussion about processor speed is a tempest 
in a teapot and the difference in speed at the processor level between 
numbers and text is the least part of the equation.  The more significant 
factor is at the disk to LAN and LAN to client end.  It's not just a a 
bottle neck but an 'hour' glass pinching off the flow.  If you have half the 
sand, the turn around speed is double.  The discussion about sort and search 
speed on an indexed column on text vs string is a simple 30 odd compares in 
a billion records.  How do 30 processor comparisons even warrant 
consideration in the context of the fact that those billion records have to 
be transferred over a LAN.  It's like talking about how the paint used to 
paint a truck affects the load carrying capacity of that truck because one 
brand or color of paint is a bit heavier than the other.

I personally like to store data composed of numerical characters as numbers 
wherever possible, even if there are additional formatting characters.  As I 
said before, even though dates are stored and retreived as simple numeric 
values, they may be displayed in any number of formats including spelled out 
months in any number of languages, with dates of the week, as a week number 
only, as a quarter, with times, in 24 hour format and so on.  Just because 
Microsoft built it for you doesn't mean that you can't or shouldn't take 
advantage of the same strategy.


Hen


>From: Drew Wutka <DWUTKA at marlow.com>
>Reply-To: Access Developers discussion and problem 
>solving<accessd at databaseadvisors.com>
>To: "'Access Developers discussion and problem solving'" 
><accessd at databaseadvisors.com>
>Subject: RE: [AccessD] Number vs text data type
>Date: Fri, 17 Oct 2003 10:36:51 -0500
>
>It showed up...grin.
>
>Couldn't agree more.  Processor speed isn't a bottleneck in Access, but my
>point is that it's no excuse to not optimize your db's for speed, whether 
>it
>is for faster LAN access or even for local db's, because I promise that 
>it's
>more likely a db will go from local to network then it is for the US postal
>service changing the Zip Code system! <grin>
>
>Drew
>
>-----Original Message-----
>From: Henry Simpson [mailto:hsimpson88 at hotmail.com]
>Sent: Friday, October 17, 2003 9:33 AM
>To: accessd at databaseadvisors.com
>Subject: RE: [AccessD] Number vs text data type
>
>
>A person I know upgraded his computer on a 2 person LAN with a machine 
>three
>
>times as fast.  His data was kept on his secretary's computer on a 100 mbit
>LAN connnection.  I warned him that he wouldn't likely see significant
>difference in performance.  In fact, he could see no difference.  I had
>warned him that a bit of time looking at indexes and addressing data volume
>and recordset size would probably be of more benefit and when all was said
>and done, he came asking for help.  Processor speed has rarely been the
>bottleneck in most Access deployments.
>
>I've made two posts to this effect in the past week and have seen neither 
>of
>
>them show up yet.  The last was over 36 hours ago and I've seen once
>response to it that included an excerpt of my post.  I wonder what fraction
>of all posts made to the list that I ever see.  I tried looking at Drew's
>archive but if I search on my name, the last post by me is August 30, 2003.
>
>I'll send this once per day for the next week until I see it show up.
>
>Hen
>
>
> >From: Drew Wutka <DWUTKA at marlow.com>
> >Reply-To: Access Developers discussion and problem
> >solving<accessd at databaseadvisors.com>
> >To: "'Access Developers discussion and problem solving'"
> ><accessd at databaseadvisors.com>
> >Subject: RE: [AccessD] Number vs text data type
> >Date: Wed, 15 Oct 2003 14:11:41 -0500
> >
> >You are right, Zip Codes could change into anything.  But that applies to
> >everything we know.  Should we make EVERY field a string?  You can only
> >code
> >for the present, and predicted future.  The post office isn't going to
> >change the Zip Code system (at least not the basic 5 digits) without a
> >decent amount of lead/warning time.  They have systems they'd have to
> >modify
> >also.
> >
> >Also, you are right, with today's processors, speed is not as critical.
> >But
> >the question is, when faced with a factor of 1.5x to 2x faster, do you go
> >with the slower process, or the faster process?  Isn't it better to stick
> >with the faster process?  Why sacrifice any speed, to accomodate unknown
> >future variables?
> >
> >Drew
> >
> >-----Original Message-----
> >From: Heenan, Lambert [mailto:Lambert.Heenan at aig.com]
> >Sent: Wednesday, October 15, 2003 1:47 PM
> >To: 'Access Developers discussion and problem solving'
> >Subject: RE: [AccessD] Number vs text data type
> >
> >
> >Ah but...
> >
> >one day in the future the design of zip codes (which is only ONE example 
>of
> >a "number" that is not a number) could well be changed to something 
>utterly
> >different that looks nothing like a number. You can't just keep adding
> >fields and complex put-it-all-back-together routines to reconstruct the
> >original string from a combination of strings and numbers. Well you can,
> >but
> >it might make life miserable as "the management" make more wacky 
>decisions.
> >
> >As for searching for string vs numbers. Here are some results on a test 
>Db
> >I
> >threw together.
> >
> >Using a table with 581,672 records: each record has a longint field and a
> >string field. The long field is populated with random long integer 
>values,
> >and the string field has the string representation of its corresponding
> >integer field. Both fields are indexed with duplicates allowed.
> >
> >The table is in an MDB file (44.9 MBytes) which is on a network drive, 
>and
> >the network is running at just 10-mbit. So then I ran the following code 
>to
> >search for random 'numbers' both as numbers and as strings...
> >
> >Sub testSearch()
> >Dim n As Long
> >Dim s As String
> >
> >Dim TextSearchStart As Single, TextSearchEnd As Single
> >Dim NumericSearchStart As Single, NumericSearchEnd As Single
> >Dim TextSearchTime As Single
> >Dim NumericSearchTime As Single
> >Dim x As Long
> >Dim strHits As Long
> >Dim intHits As Long
> >Dim db As Database
> >Dim rs As Recordset
> >Dim sCriteria As String
> >     Randomize
> >     Set rs = OpenDynaset(db, "tblHuge")
> >     For x = 1 To 1000
> >         With rs
> >             n = CLng(Rnd() * 2 ^ 31)
> >             s = CStr(n)
> >
> >             sCriteria = "nNumber=" & n
> >             NumericSearchStart = Timer
> >             .FindFirst sCriteria
> >             intHits = intHits + Abs(CLng(Not .NoMatch))
> >             NumericSearchEnd = Timer
> >             NumericSearchTime = NumericSearchTime + (NumericSearchEnd -
> >NumericSearchStart)
> >             sCriteria = "sText='" & s & "'"
> >             TextSearchStart = Timer
> >             .FindFirst sCriteria
> >             strHits = strHits + Abs(CLng(Not .NoMatch))
> >             TextSearchEnd = Timer
> >             TextSearchTime = TextSearchTime + (TextSearchEnd -
> >TextSearchStart)
> >         End With
> >     Next x
> >     rs.Close
> >     Debug.Print "Average find times"
> >     Debug.Print "Numbers:" & Format(NumericSearchTime / 1000,
> >"0.0000000000") & " Hits:" & intHits
> >     Debug.Print "Strings:" & Format(TextSearchTime / 1000, 
>"0.0000000000")
>
> >&
> >" Hits:" & strHits
> >     Set db = Nothing
> >     Set rs = Nothing
> >End Sub
> >
> >
> >And here are the results of a few runs, last to first...
> >
> >Average find times
> >Numbers:0.0020859370 Hits:0
> >Strings:0.0025468750 Hits:0
> >Average find times
> >Numbers:0.0023203120 Hits:0
> >Strings:0.0030234380 Hits:0
> >Average find times
> >Numbers:0.0023984380 Hits:0
> >Strings:0.0040195310 Hits:0
> >Average find times
> >Numbers:0.0029257810 Hits:0
> >Strings:0.0046445310 Hits:0
> >
> >As you can see the random search values never generated any hits - not 
>too
> >surprising considering the number of possible values in 32bit long.
> >
> >The numeric search times were fairly consistent, and the string search
> >times
> >steadily dropped. I assume that there is some sort of caching going on
> >here.
> >But the main point is that at worst it took just about twice as long to
> >search for the string as it did to find the long. Not *such* as huge
> >performance hit considering we are talking about thousandths of a second.
> >There is also a fair bit or arithmetic going on between each search.
> >
> >I goes without say (so why am I saying it?) that whne yo take out the
> >indexes on the fields the search times go though the roof (for the Long
> >field as well as the string one).
> >
> >Lambert
> >
> > > -----Original Message-----
> > > From:	Drew Wutka [SMTP:DWUTKA at marlow.com]
> > > Sent:	Wednesday, October 15, 2003 12:26 PM
> > > To:	'Access Developers discussion and problem solving'
> > > Subject:	RE: [AccessD] Number vs text data type
> > >
> > > Ah, but you can always add fields to handle new anomalies!  With the 
>zip
> > > code, just add an 'extension' field.  Original Number field is left
> >intact
> > > (and all 'search' routines) would still work.
> > >
> > > Drew
> > >
> > > -----Original Message-----
> > > From: Heenan, Lambert [mailto:Lambert.Heenan at aig.com]
> > > Sent: Wednesday, October 15, 2003 8:38 AM
> > > To: 'Access Developers discussion and problem solving'
> > > Subject: RE: [AccessD] Number vs text data type
> > >
> > >
> > > This thread has been going on quite some time now, so why should I let
> >it
> > > stop :-)
> > >
> > > One key issue that which has not yet been mentioned but should be
> > > considered
> > > is is there any possibility that the design / structure of the
> >"numerical"
> > > identifier might change?
> > >
> > > To take the example of US Zip codes. Once upon a time there were no 
>Zip
> > > codes, then along came 5 digit zips, and a little later the hyphen 
>plus
> > > four
> > > digit extensions. This kind of change could happen at any point in 
>time
> > > because at the end of the day a postal code is an *arbitrary*
> >identifier.
> > > In
> > > other words there is no arithmetic meaning in a zip code, so it should
> >not
> > > be stored as a number. You CAN store a 5 digit zip code as a Long Int,
> >but
> > > you'll regret it if "the powers that be" decide to change things later
> >on.
> > >
> > > As for searching for strings in a sorted list, while it is slower to
> > > compare
> > > a whole string with another whole string, vs. comparing numeric 
>values,
> >to
> > > search a sorted list really does not take that long as a standard 
>binary
> > > chop approach is going to zero in very quickly on the target string. 
>(If
> > > my
> > > mental arithmetic is up to it, searching for a string in a list of one
> > > billion (10^9) strings would require 30 comparisons at most.) In
> >addition,
> > > the initial stages of the binary chop only need to compare the first
> > > character (or first few) in the string with the first character (or
> >first
> > > few) of the target, not much slower than integer data comparisons. And
> > > it's
> > > just as simple to figure out if a target is within some range.
> > >
> > >
> > >
> > > Lambert
> > >
> > > > -----Original Message-----
> > > > From:	Drew Wutka [SMTP:DWUTKA at marlow.com]
> > > > Sent:	Tuesday, October 14, 2003 5:18 PM
> > > > To:	'Access Developers discussion and problem solving'
> > > > Subject:	RE: [AccessD] Number vs text data type
> > > >
> > > > Yes, but if something can be stored JUST as a number, it is not only 
>a
> > > > storage space issue, but a speed issue.  If you took just the 5 
>digit
> > > Zip
> > > > code, (ignoring any extensions), and you wanted to search for
> >addresses
> > > in
> > > > the range of 70000 to 80000, if the numbers are stored as text, the
> > > > processor is going to have to compare each digit of the number,
> >because
> > > > it's
> > > > looking at ASCII (or unicode) and NOT a number.  However, if it's
> >stored
> > > > as
> > > > a number, then it's a VERY simple check to determine if it falls in
> >that
> > > > range!
> > > >
> > > > Drew
>
>_________________________________________________________________
>Tired of spam? Get advanced junk mail protection with MSN 8.
>http://join.msn.com/?page=features/junkmail
>
>_______________________________________________
>AccessD mailing list
>AccessD at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/accessd
>Website: http://www.databaseadvisors.com
>_______________________________________________
>AccessD mailing list
>AccessD at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/accessd
>Website: http://www.databaseadvisors.com

_________________________________________________________________
Help STOP SPAM with the new MSN 8 and get 2 months FREE*   
http://join.msn.com/?page=features/junkmail



More information about the AccessD mailing list