[AccessD] [SPAM] Re: [SPAM] Re: [SPAM] Re: The iPad is soooo cool (uhhh ok.)

jwcolby jwcolby at colbyconsulting.com
Tue Oct 30 22:38:23 CDT 2012


Yea, except there is no apparent "most used" fields.

I don't even understand what you are trying to teach me.

Imagine that I have a table with people in it.  John Colby has taken a survey which identifies his 
income and age, number of kids, address.  Mostly that info is known about every person.  Further 
John has taken a survey about the car he drives, an entirely different survey about the magazines he 
reads and another survey about the various stuff he does with electronics.  Those surveys make up 
perhaps 20 out of hundreds of fields in this table.  John hasn't taken any of the surveys which were 
amalgamated to supply all of the other fields.

Mary has taken a survey which provides address, income, age, number of kids.  She has taken a survey 
about the medicines she takes, the mortgage she holds, the drinks she prefers.  Those surveys make 
up perhaps 20 fields of the hundreds in the table.  Notice that Mary hasn't taken any surveys about 
cars, magazines or electronics.  James takes a survey which tells whether he owns a dog and / or a cat.

225 million people have somehow managed to get inserted into this table or 250 fields with some 
small percentage of fields in common and mostly filled out and a small handful of the other fields 
filled out.

The data never changes.  It was compiled 3 years ago, purchased by my client last year and will 
never be updated.

My client buys another database (a single table) which contains *only* the names / addresses of 
women who have dogs or cats.  That is all that is in the database.  A field that says "OwnsDog" and 
another which says "owns cat".  "Y" or nothing in the field.

The table contains 11 million names, was compiled 7 years ago, was purchased 4 years ago and will 
never be updated.

My client buys another database which contains *only* females.  Names / addresses, plus age, and a 
set of fields

Children_0_3
Children_4_6
Children_7_10
Children_11_13
Children_14-17

In these fields are codes
0=none
1=male
2=Female
3=both

The database contains 22 million names, was compiled 5 years ago, purchased 3 years ago and will 
never be updated.

My client buys a database of names and addresses of boat owners compiled by the coast guard.  It is 
a single table, names / addresses.  It has the boat length, propulsion type, value, state registered.

It was compiled 8 years ago, purchased 5 years ago and will never be updated.

The people in these tables are not common. i.e. it is not the same people in each table, it is just 
a random set of people who happen to own a dog, have kids, own a boat or...

I started with the infamous "database from hell" back in 2004 and have over the intervening years 
accumulated nine such databases.  Each is a single table the only thing in common is that they all 
contain name / address components plus some kind of information about the set of people contained. 
The people are not common, the names / addresses are not common, the information about the people 
are not common (across the various tables).

That said, I do take the First name, Last name, Address, Zip5 and Zip4 and hash them to create a 
numeric value (hash value) which uniquely identifies that person at that address.  I do this for 
every table.  Now a simple inner join on the hash tells me if I have the same first name / last name 
/ address / zip in two different tables.

Thus I discover (for example) that I have
10,494,488 women in DogsAndCats (surviving address validation)
18,949,333 women in Kids (surviving address validation)

and 2,087,493 women in common between those two tables, i.e. women with kids with dogs/cats/both.

I also know that I have

265,896 in Merchant Vessels (the boats database)
4057 of which are also in my DogsAndCats.
1112 of which are in my Kids database.

Understand that while the facts part never changes, the address part does.  People move.  People 
move and don't provide a Change of Address to the post office.  When that happens we "lose them" 
from the database since we no longer have an address portion to hash.

Furthermore when people move and *do* enter a COA, I now have them in their new and old addresses. 
So if I get a person / address, they might not hash into a current address at a database but they 
may hash into an old address from that database.

I am attempting to generate a list of every valid address encountered in any of my databases.  All 
told I have 435 million names / addresses in 9 databases.  I could then pull a hash for every 
current and past name / address

In any event, I don't get how this is anything like sales by month for Safeway or something of that 
nature.  It seems to be a different beastie entirely from the normal database.

John W. Colby
Colby Consulting

Reality is what refuses to go away
when you do not believe in it

On 10/30/2012 2:57 PM, Charlotte Foust wrote:
> So you could create dimension tables for at least the most commonly used of
> those fields that would allow you to roll up the data quickly..
>
> Charlotte
>
> On Tue, Oct 30, 2012 at 10:55 AM, jwcolby <jwcolby at colbyconsulting.com>wrote:
>
>> In these databases there are only ever two tables, a "name/address" table
>> and a "fact" table.  The two are related 1-1 with a PKID originating in the
>> fact table and propagated into the Name/Address table.
>>
>> The Name/Address table originates in the fact table and I split it out as
>> the first step in processing the database.  The name/address component is
>> sent out for Address validation / cleaning and then imported back into its
>> own separate table.  I do that simply because it allows me to standardize
>> the address validation using my own field names as well as using additional
>> fields returned by the address validation software.
>>
>> Basically the fact table has information like "Marital status", "Length of
>> residence", "Teen Driver", "collectible antiques" etc etc.  270 fields of
>> etc etc.
>>
>> Every field is not filled in for every record, but some fields are filled
>> in for all records.  There are 225 million fact records and only a small
>> portion of them are filled in for any given field.
>>
>> In fact the point of this exercise is to discover how many records are
>> filled in for each field.  We may get 5% for some fields and 90% (or even
>> 100%) for others.
>>
>>
>> John W. Colby
>> Colby Consulting
>>
>> Reality is what refuses to go away
>> when you do not believe in it
>>
>> On 10/30/2012 12:56 PM, Charlotte Foust wrote:
>>
>>> Are you using dimension tables at all, John?  I'd think a hybrid snowflake
>>> schema might help with that kind of grouping.
>>>
>>> Charlotte
>>>
>>> On Tue, Oct 30, 2012 at 7:04 AM, jwcolby <jwcolby at colbyconsulting.com>**
>>> wrote:
>>>
>>>   If I didn't have time for iHumor I would go iCrazy.
>>>>
>>>> I am currently working on a system to build a data dictionary table and
>>>> populate it for a given database.  Then I need to select which columns
>>>> (fields) to count the data items in - a groupby / count query on all
>>>> selected fields.
>>>>
>>>> The databases I use tend to have a ton of columns (277 fields in the one
>>>> I
>>>> am working on now), each column tends to have a 1/0, T/F or a limited
>>>> range
>>>> of codes 1-9 and a-z.  Unfortunately other columns have names, zips,
>>>> address fields etc so I can't do a groupby on those fields or I will get
>>>> tens of millions of counts - not the intention.
>>>>
>>>> So I build a data dictionary table which reads out the standard stuff -
>>>>
>>>>           [SchemaName]
>>>>         ,[TableName]
>>>>         ,[ColumnId]
>>>>         ,[ColumnName]
>>>>         ,[DataType]
>>>>         ,[CharacterMaximumLength]
>>>>         ,[ColumnDescription]
>>>>         ,[ColumnDefault]
>>>>         ,[IsNullable]
>>>>
>>>> plus a couple of extra fields of my own [PerformCount] and (denormalized)
>>>> [CountInfo].
>>>>
>>>> I then populate that table, unselect the fields with things like names /
>>>> addresses, then run a groupby/count on the remaining fields, updating the
>>>> [CountInfo] with the resulting data.  Push it out to a spreadsheet and
>>>> send
>>>> to the customer.
>>>>
>>>> Keeps me busy.
>>>>
>>>>
>>>> John W. Colby
>>>> Colby Consulting
>>>>
>>>> Reality is what refuses to go away
>>>> when you do not believe in it
>>>>
>>>> On 10/30/2012 9:34 AM, jack drawbridge wrote:
>>>>
>>>>   iLike it John, but when do you get time... you always seem up to your
>>>>> iBalls in real work.
>>>>> jack
>>>>>
>>>>> On Tue, Oct 30, 2012 at 8:45 AM, jwcolby <jwcolby at colbyconsulting.com>*
>>>>> ***
>>>>> wrote:
>>>>>
>>>>>    From:
>>>>>
>>>>>>
>>>>>> http://www.techradar.com/us/******news/phone-and-**communications/****<http://www.techradar.com/us/****news/phone-and-communications/****>
>>>>>> <http://**www.techradar.com/us/**news/**phone-and-communications/**<http://www.techradar.com/us/**news/phone-and-communications/**>
>>>>>>>
>>>>>> mobile-phones/apple-samsung-******smartphone-profits-add-up-**to-****
>>>>>> 106-percent-1108455<http://**w**ww.techradar.com/us/news/**<http://www.techradar.com/us/news/**>
>>>>>> phone-and-communications/****mobile-phones/apple-samsung-**
>>>>>> smartphone-profits-add-up-to-****106-percent-1108455<http://**
>>>>>> www.techradar.com/us/news/**phone-and-communications/**
>>>>>> mobile-phones/apple-samsung-**smartphone-profits-add-up-to-**
>>>>>> 106-percent-1108455<http://www.techradar.com/us/news/phone-and-communications/mobile-phones/apple-samsung-smartphone-profits-add-up-to-106-percent-1108455>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>    Apple's sales may only be hampered by the fact that its 7.9-inch iPad
>>>>>>
>>>>>>>
>>>>>>>   mini could be in short supply, as it has already sold out of its
>>>>>> initial
>>>>>> launch stock.  Customers who pre-order the WiFi model of an iPad mini
>>>>>> online Monday face a two-week shipping wait time.
>>>>>>
>>>>>> Uhhh yep, they are soooo cool that Apple can't build enough for all the
>>>>>> iFolks standing in those long iLines.
>>>>>>
>>>>>> My biggest concern is all of the poor iFolks standing in iLines in the
>>>>>> northeast as the hurricane sweeps overhead.  You know they won't
>>>>>> evacuate
>>>>>> and lose their iPlace in iLine.  There are already reports of entire
>>>>>> iLines
>>>>>> of iDiots being washed out to sea.
>>>>>>
>>>>>> The upside is that the US average IQ is inching upwards.
>>>>>>
>>>>>> ;)
>>>>>>
>>>>>> BTW, before I even pressed send I was served with an iCease and iDesist
>>>>>> for using the iCopyrighted phrase iFolks, iPlace and iLine.  For some
>>>>>> reason iDiot wasn't included in the iCease and iDesist. iStrange!
>>>>>>
>>>>>> ;)
>>>>>>
>>>>>> OK, iBack to iWork.
>>>>>>
>>>>>> --
>>>>>> John W. Colby
>>>>>> Colby Consulting
>>>>>>
>>>>>> Reality is what refuses to go away
>>>>>> when you do not believe in it
>>>>>>
>>>>>> --
>>>>>> AccessD mailing list
>>>>>> AccessD at databaseadvisors.com
>>>>>> http://databaseadvisors.com/******mailman/listinfo/accessd<http://databaseadvisors.com/****mailman/listinfo/accessd>
>>>>>> <htt**p://databaseadvisors.com/****mailman/listinfo/accessd<http://databaseadvisors.com/**mailman/listinfo/accessd>
>>>>>>>
>>>>>> <http:**//databaseadvisors.**com/**mailman/listinfo/accessd<http://databaseadvisors.com/**mailman/listinfo/accessd>
>>>>>> **<http://databaseadvisors.com/**mailman/listinfo/accessd<http://databaseadvisors.com/mailman/listinfo/accessd>
>>>>>>>
>>>>>>
>>>>>>>
>>>>>>>   Website: http://www.databaseadvisors.******com<http://www.**
>>>>>> databaseadvisors.com <http://www.databaseadvisors.**com<http://www.databaseadvisors.com>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>   --
>>>> AccessD mailing list
>>>> AccessD at databaseadvisors.com
>>>> http://databaseadvisors.com/****mailman/listinfo/accessd<http://databaseadvisors.com/**mailman/listinfo/accessd>
>>>> <http:**//databaseadvisors.com/**mailman/listinfo/accessd<http://databaseadvisors.com/mailman/listinfo/accessd>
>>>>>
>>>> Website: http://www.databaseadvisors.****com<http://www.**
>>>> databaseadvisors.com <http://www.databaseadvisors.com>>
>>>>
>>>>
>> --
>> AccessD mailing list
>> AccessD at databaseadvisors.com
>> http://databaseadvisors.com/**mailman/listinfo/accessd<http://databaseadvisors.com/mailman/listinfo/accessd>
>> Website: http://www.databaseadvisors.**com<http://www.databaseadvisors.com>
>>



More information about the AccessD mailing list