inicio mail me! sindicaci;ón

Dennis P. Osorio’s Blog

Technology, E-Activism, and Other Items of Interest

Merging datasets in a spreadsheet

Here´s the scenario: you have two related datasets. Let´s say one is a list of name with phone numbers, and the other is a list of names and email addresses. The two datasets have a unique id in common (like a Social Security number). You should be able to merge the two, right? You´d be surprised at how difficult some folks find this. Well, I´m here to tell you, it´s surprisingly easy.

Here is a picture to illustrate the problem. On the left, is a list of names and phone numbers. On the right, a list of names and email addresses. The goal is to have one list, not two.

The solution is a formula used by both Excel and OpenOffice´s Calc: vlookup.

Here´s how the formula works.
=vlookup(a2

A2 refers to what is being looked up. Meaning look at the cell, A2 and find it somewhere else.

=vlookup(a2;g1:i11;
g1:i11 is the range, that is, the somewhere else in question. A2 is the value being looked up in the range g1:i11. Note: Excel uses commas between each parameter, whereas Calc uses a semi-colon. Otherwise, the formula is identical.

=vlookup(a2;g1:i11;3;
When a match is found now what? Give me the value from cell 2 columns over. 1 is the column where the match is found, 2 is one column to the right, 3 is two columns to the right.

=vlookup(a2;g1:i11;3;0)
The “0″ refers to whether you want a fuzzy match. False or zero is the answer. I want an exact match.

Here it is all merged together.

You might notice that I added some dollar signs. This tends to confuse people. When you copy the formula from one cell to another, the spreadsheet (being so smart) shifts everything. Some of this shifting you want, some of it you don´t. I do want it to shift from a2 to a3 to a4, but I want the range where it is finding these matches to stay exactly the same. To control for this, you add dollar signs to indicate no shifting. There is a dollar sign in front of the a in $a2, because I want column a to stay consistent no matter where I copy and paste this formula to. The “2″ can change, but not the “a”

Similarly I indicated “$g$1:$i$11″ - this is the same as g1:i11, except that I´m specifying that I don´t want any shifting to happen if I copy and paste the formula. Neither column nor row shifting is allowed.

Finally, if you go and delete columns g:i at this point, the email lookup column will change, and you´ll get a bunch of “#N/A” values. There is a reference that you just broke. To correct for this, what I would do is insert a new column next to “email lookup”. Highlight and copy “email lookup”, select the column that you just created and paste as values. If you don´t know what this means, find the edit pull-down menu, and select paste special. There is a choice there for paste values - meaning it will paste only the text (or number). No formatting, no formulas - just the values.

No comments yet »

Your comment

HTML-Tags:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>