[dba-VB] How I'm approaching the problem

Charlotte Foust cfoust at infostatsystems.com
Thu Jul 19 15:04:14 CDT 2007


I see.  You had mentioned that before but I forgot (blame my age LOL).
I know about that kind of list from bitter experience, although the
biggest ones I ever had to import were less than 100K names per list.
But I also didn't have the benefits of .Net and had to work strictly in
Access 97, which didn't make it fun.  I still don't see the need to
import the data into a database immediately when you could manipulate
them in xml form until you got them the way you wanted and THEN import
the data once and for all.  You could park the rejects in another
database for reference.

Charlotte

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 12:56 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

>Why import them and then export them back out right away?

Because these lists can come from anywhere.  I don't know until they are
in the database what the field names even are.  ATM I am building a
table that maps "their" field name to my field name, then using that
table to do the export back out to CSV for address processing.  

Furthermore these lists ALWAYS have name / address fields PLUS 10 to 600
OTHER fields which are demographics.  Stuff like age / income, race,
age, number of children, preferences in products etc.  These are
marketing lists, collected from all over the universe.  Thus the data
has to be imported into a raw data table because I use that demographics
information for where clauses in selection queries.  The name / address
has to go back out for address processing because who knows if the
addresses were EVER valid?  We get mis-spellings, but we also get
intentional bad data - John Colby at 1234 None of your business lane.
Obviously we don't want to spend money sending mail to "none of your
business Lane".

Remember, these lists come from all over the (United States) universe.
We have no clue as to the validity of the address.  If the address is
bad
(non-deliverable) then the entire data record is tossed.

Once I export and do address validation, then we know that we can at
least deliver mail to the address.  We still don't know that the person
really lives there, or even that the name is valid.  It might be "Micky
Mouse" at
1723 Twin Pines Drive.  Or the person might have lived there 10 years
ago when they took a survey about smoking (for example) but moved 3
months later...

My job is to attempt to filter the trash as best I can, and address
validation is the one major filter point that really works, but it only
works on the address itself.  OTOH, some of the lists I get are
"pre-validated" in some fashion.  For example I just got a list of 100
million deeds.  The data is "guaranteed to be correct" although even
then the guarantee is not absolute.  But it is far better than "micky
mouse at
1234 none of your business way".

Have you ever joined a web site where they INSISTED on getting your full
name, address, phone, email, etc. and wouldn't save the form unless you
filled in each and every field?  And have you ever said to yourself - "I
want to join this place but I do not want them having all of that".  I
have, and guess what I use?  Micky mouse at 1234 None of your business
way...

So when they sell that name to the wrong person I don't get mail from
them.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Charlotte
Foust
Sent: Thursday, July 19, 2007 3:37 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Why import them and then export them back out right away?  You could
build the hash in a calculated column of a dataset before you import it
in the first place, I think.  Read it into a typeddataset and add a
calculated column for the hash.  The import the enhanced typeddataset
into your table, which includes a column for the hash code.

Charlotte 

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 12:32 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

I will be doing a lot of processing.  There are several scenarios as to
the "source".  It all starts with raw data files brought in from DVDs
containing text files.  Once imported to a database they are immediately
exported back out for address processing.  Once processed they are
immediately reimported, in my current method I import into a separate
database.  I also build a hash of the address, family and person which
goes into 3 fields in the address database / table.  Sometime in the
future, those hashes will be compared across the databases to see where
the various objects match, a person in this table is the same as a
person in that table.  An address in this table is the same as an
address in that table.  That will allow me to build a normalized system
where I can "see" data about the same person, family or address across
lists.

Additionally, the address lists will be exported back out for address
processing periodically (monthly?) to find moves.


John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Charlotte
Foust
Sent: Thursday, July 19, 2007 3:07 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Are you doing any processing on the data, besides importing it into a
table?

Charlotte 

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 11:42 AM
To: dba-vb at databaseadvisors.com
Subject: [dba-VB] How I'm approaching the problem

The problem with that is it takes a LOT of explaining to even get
started.

I am trying to develop a system where I can select a SQL Server instance
on the network.  Once I select a server, I need to select a database if
one exists, or build one if it does not exist.  Once I select (or build)
a database I need to select a table if one exists or build one if it
does not exist.  

Imagine the following scenario:

I have a directory of CSV files that I need to import into SQL Server.
I need to specify the server instance, the database, and a table into
which I want to import the data.  The server instance has to already
exist obviously.  However the database may or may not already exist.  I
need to select the server instance and create the db if none already
else select the db.  Either way I have then selected a db and then need
to see if a table exists, if not create it.  In the end select it.

So...

I have a form.  On the form I have:

1) A control to specify the path to the database files.
2) A control to select a server instance.
3) A control to select a database.
4) A control to select a table.

5) A control to specify a path to the DATA files to be imported.


#2 above needs code (and a combo as the control) to select the server.

#3 above needs code and a combo as the control to select a database
within that server

#4 above needs code and a combo to select a table within that database

1 and 5 already have controls and code to select a directory.

The entire form needs (already has as of today) code to store the
current settings as the form closes and read back out when the form
opens.

A byproduct of the way I work (I need to get real work done) is that I
often build systems like this bottom up instead of top down, though I
think top down.  Thus I build code that does the bottom piece, passing
in hard coded values for server / database / table and then come back
and either EDIT those values for the next set of data or BUILD the
pieces to select those data dynamically (where I am now).

So I have a program already in place that imports an entire directory of
CSV files into a server / database / table.  The server / database /
table pieces are passed in but hard coded in the form classes header.  

I am now trying to create the user interface to allow me to use controls
to browse the server / database / tables.  This seems like something
that .NET would just have as a class.  Dim a clsDBBrowser tree control
and click on the node, and back comes the values in a form that could be
used in an ADO connect string.  Given that I do not see that this is
true, I am building my own (not a tree control atm, but separate
combos).  A tree that looks identical to what already exists in SQL
Server (except with a server node) would be so much cleaner.

So there you have it, what I am trying to accomplish and how I am going
about it.  I have found code to obtain a server list and am working on
getting it functioning.  Why does it NEVER "just work"?  Once that is
working I have seen hints of code that obtains a list of databases in a
server, tables in a database etc.  I am working my way down the food
chain as we speak.  In each case except the server I need to be able to
"type in"
a value if I don't see what I need.  The database will just be created
(I have code to do that) but the table has to be created based on the
first row of a CSV file (I have code written to do that as well).  Bits
and pieces written, filling in the rest.  LOTS of work.

However once I get this finished, I will have a system where I can "find
/ describe" a server / database / table structure, specify where the
files come from and sit back as the files import.  I do this import on a
weekly basis, often more often than that.  It has to be easy.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Charlotte
Foust
Sent: Thursday, July 19, 2007 1:50 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] VB.Net - Option Strict disallows implicit
conversion

Well, if you were REALLY lucky, you'd have someone saying "that works,
but I think I'd do it this way".  There isn't a single right way, there
are just more suitable ways for the present problem.  That's what can
get really maddening.  Of course you do have this list to play in, and
it might help others to see how you're approaching a problem too.

Charlotte 

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com


_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com


_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com





More information about the dba-VB mailing list