Re: scripting / data problem.
- From: Patton Echols <p.echols@xxxxxxxxxxx>
- Date: Mon, 09 Nov 2009 13:13:54 -0800
On 11/09/2009 03:31 AM, Hal Burgiss wrote:
On Sun, Nov 08, 2009 at 11:41:34PM -0800, Patton Echols wrote:
file 1 - | email |
file 2 - | email | f-name | l-name |
file 3 - | f-name | l-name | company | phone |
file 4 - | company | address | city | state |
(By the way, each one has some duplicates with less than complete info.)
What we need is to be able to populate all of these files with all the
available data for each contact. The concept I have in mind is to
compare the files in pairs. So file 1 would go into file-L (for Last)
Then file 1 would be compared with each line of file-L and if there is a
record with matching email, then any extra info is added , but not
overwriting anything that is already in any field So the final file
would look like this:
[...]
I'll vote for the mysql approach. Create a csv for each file, import that intoDo you mean that all the field data must match? Or just, for example,
mysql so you have 4 tables. Then create a query that will dump out the data
the way you want. You can dump that into another table or export it. For this
to work, the fields will have to match in each table. So Comany in file3 is
equivalent to Comany in file4 (not something like Acme, Inc vs Acme,
Incorporated).
the email fields? Because part of the problem is that I have non
matching / missing data depending on file.
In order for the relations to work across all 4 files, the fields that are
being keyed on, *should* match exactly. You can get around the missing data by
using a 'left join' syntax.
Ok, I'll read up on that syntax. Since the point of the exercise is to
merge records where there is overlap AND no conflicting data.
SELECT a.email, b.firstname, b.lastname, c.company, d.address FROM file1 a,
file2 b, file3 c, file4 d WHERE a.email = b.email and
concat(b.firstname,b.lastname) = concat(c.firstname,c.lastname) and c.company
= d.company INTO OUTFILE '/tmp/output.txt' (crude and not tested).
Data manipulation is much more flexible once its in a database. At least when
you have something this squirrelly.
Does the concat commands mean that there would no longer be a separate
first name and last name?
No, that's just part of the query that contstructs a relationship across all
four tables. The output part is the 'b.firstname'. The concat thing is just
one way to make sure the columns you are trying to match, actually do match.
This will be problematic if one table has 'John Doe' and the other has
'Johnathon Q. Doe, III' type of stuff.
My current sense of the data is that the missing info is the basic
problem, not conflicting info. What may need to happen is that we merge
the missing info and leave "duplicate" records to be dealt with by hand
where "John" has his name spelled different ways.
Also, it looks like the "=" is a requirement. Is that what you meant by
the fields having to be the same? Or could I arrange for it to have
file 1 overwrite file 2 for certain fields. Or only blank fields?
An "equivalency" is a lot cleaner for the query part anyway (where you are
matching data in one table against another). But you can do anything you want
with the output part. So, instead of using b.firstname, you could use
c.firstname instead. They would both be in their respective tables (a, b, c,
and d).
Ok, so the output is the part between the SELECT and FROM statements?
So much to learn, so little time! (kidding, this is really helpful)
Thanks.
--
ubuntu-users mailing list
ubuntu-users@xxxxxxxxxxxxxxxx
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
- Follow-Ups:
- Re: scripting / data problem.
- From: Hal Burgiss
- Re: scripting / data problem.
- References:
- scripting / data problem.
- From: Patton Echols
- Re: scripting / data problem.
- From: Hal Burgiss
- Re: scripting / data problem.
- From: Patton Echols
- Re: scripting / data problem.
- From: Hal Burgiss
- scripting / data problem.
- Prev by Date: Re: MacBook Pro and Ubuntu.
- Next by Date: Re: scripting / data problem.
- Previous by thread: Re: scripting / data problem.
- Next by thread: Re: scripting / data problem.
- Index(es):
Relevant Pages
|