How to merge data in an Excel or Calc spreadsheet

Here is the scenario

You have two related datasets in Excel or OpenOffice’s Calc. One is a list of names with phone numbers, and the other is a list of names and email addresses. The two datasets have a unique id in common (like a Social Security number). You want to merge the two of them together. This is a common problem with an easy solution.

Sample

Using the picture to illustrate the problem, on the left-hand side is a list of names and phone numbers. On the right, a list of names and email addresses. The goal is to have one list, not two.

The solution

Vlookup – used by both Excel and OpenOffice’s Calc.

Vlookup: step-by-step

  1. =vlookup(a2 – “a2″ refers to what is being looked up. Meaning look at the cell, a2, and find it somewhere else – like within the other dataset.
  2. =vlookup(a2,g1:i11, – “g1:i11″ is the range, that is, the “somewhere else” in question. A2 is the value being looked up in the range g1:i11. Note: Excel uses commas between each parameter, whereas Calc uses a semi-colon. Otherwise, the formula is identical.
  3. =vlookup(a2,g1:i11,3, – When a match is found, what next? “3″ means: give me the value from the cell 2 columns over from where you found the match.
    • 1 would be the column where the match is found
    • 2 would be one column to the right
    • 3 is two columns to the right.
  4. =vlookup(a2,g1:i11,3,0) – “0″ refers to whether you want a fuzzy match. False or zero is the answer. I want an exact match.

What is with the dollar signs?

I used dollar signs in the images above. The dollar signs tend to confuse people. When you copy the formula from one cell to another, the spreadsheet (being so smart) shifts everything relatively.

Some of this shifting you want, some of it you don’t. I do want it to shift from a2 to a3 to a4, but I want the range where it is finding these matches to stay exactly the same.

To control for this, you add dollar signs to indicate no shifting. There is a dollar sign in front of the a in $a2, because I want column a to stay consistent no matter where I copy and paste this formula. The “2″ can change, but not the “a.”

Similarly I indicated “$g$1:$i$11″ – this is the same as g1:i11, except that I’m specifying that I don’t want any shifting to happen if I copy and paste the formula. Neither column nor row shifting is allowed.

Paste Special

Finally, if you go and delete columns g:i at this point, the email lookup column will change, and you’ll get a bunch of “#N/A” values. This is because you just broke the reference – the formula needs to be able to refer to the other cells.

To correct for this, what I would do is

  • Insert a new column next to “email lookup.”
  • Highlight and copy “email lookup.”
  • Select the column that you just created.
  • Select “paste as values” – if you don’t know what this means, find the edit pull-down menu, and select “paste special.” There is a choice there for paste values – meaning it will paste only the text (or number). No formatting, no formulas – just the values.

Was this exactly what you needed? Did this how-to suck? Did I leave something out? What other spreadsheet how-to’s do you want to see?

This entry was posted in spreadsheets (Excel or Calc) and tagged , , , , , , , , , , , . Bookmark the permalink. Trackbacks are closed, but you can post a comment.

29 Comments

  1. Tina
    Posted July 24, 2009 at 7:05 pm | Permalink

    What if you have more than one column to lookup to get the match value?

  2. Dposorio
    Posted July 25, 2009 at 12:20 pm | Permalink

    Tina: It sounds like you have two datasets. One dataset has two columns (like ‘First Name’ and ‘Last Name’). The other has the information in one column (like ‘name’). And, you want to merge the two datasets. The first step would be to combine the two columns of the first dataset into one column, so that it matches up with the other dataset. I created a new post to illustrate. See http://dposorio.com/wordpress/archives/221.

  3. Posted July 27, 2009 at 10:13 pm | Permalink

    Thanks, I wasn’t aware of vlookup. I assume you could do the same thing with multiple sheets by simply referring to the sheet name?

  4. Posted July 28, 2009 at 8:57 am | Permalink

    Leroy: yes, that’s exactly how it works. You can simply mouse your way to another sheet for the vlookup. Notice how if your sheet name has a space in it, it is enclosed within apostrophes/single quotes. I have found it easy to break these references from one worksheet to another, so I try to minimize it whenever I can.

  5. Ajay
    Posted October 25, 2009 at 2:20 pm | Permalink

    I want to no use of vlookup in data entry for correction on data
    one real sheet and another throughout that new sheet which willl be made
    so how will vlook up used on new sheet with there old refernece…………

  6. Posted April 18, 2010 at 1:18 am | Permalink

    Brilliant. Exactly what I was looking for. Thank you so much. Incindentally, I believe that to paste values in Calc you have to select “text” in the paste special box. A marginally shorter way is to click the little tab next to the paste icon and select “unformatted text” and press enter.

  7. Posted April 28, 2010 at 1:11 pm | Permalink

    Thanks for the feedback, Nat!

  8. Posted March 1, 2011 at 3:35 pm | Permalink

    Can this function be used across multipe sheets within a workbook?

  9. Posted March 1, 2011 at 4:20 pm | Permalink

    Yes! As you’re typing in the formula =vlookup… at certain points (usually after the 1st comma), try clicking on another worksheet and selecting a range. It’ll work beautifully, you’ll end up with something that will look like: =vlookup(D1,Sheet2!A1:C11,2,0) — in this example, the “Sheet2!” indicates the name of a different worksheet.

  10. Mark
    Posted September 21, 2011 at 8:04 am | Permalink

    Great tip! Thanks.

    I’m wondering if their is a method to handle many-to-one relationships. For example if Dennis has 2 phone numbers, can they each be merged?

  11. Posted September 21, 2011 at 8:32 am | Permalink

    Hi Mark:

    Thanks for the question. Are you talking about something like:

    1) Some contact info that includes a column for “phone”
    2) Another contact list that also includes a column for “phone”

    And the problem is that for an individual (say Dennis) is in both lists, has entries for “phone” in each, but the entries are different? And, you don’t want to get rid of either of those entries, you want to keep both?

    If so, I can see a few ways of handling this. Let me know if I’ve got the problem right, so I don’t reply with a solution to a different problem all together.

  12. Mark
    Posted September 22, 2011 at 7:40 am | Permalink

    Hi Dennis,

    Good idea to clarify. My challenge is less about an additional column or data conflict between thetwo sides being merged. It’s more about if there are additional rows/instances on one side of the merge than the other. So in your original sample screenshot if instead of: Dennis, Amy, Layla in column C, rows 2,3,4 it actually read: Dennis, Dennis, Dennis. The additional instances may represent your alternate numbers.

    I looked around a little and found this link, which includes conditional logic. http://www.excelforum.com/excel-worksheet-functions/619202-merging-and-consolidating-two-data-sets.html. I have not tried this yet though.

    thx,
    Mark

  13. Posted September 25, 2011 at 9:30 am | Permalink

    It sounds like it is a question of cleaning up the data.

    If you have multiple entries for the same individual, then the problem is around having duplicates.

    I think the way to handle this is to prep each list for an eventual merge, and that you will spare yourself heartache if you do this BEFORE anything gets merged.

    Take the following with a grain of salt, because I have not seen the file and there is probably a lot I do not know, but this is one way I might handle it.

    Using the original example as a basis, let us imagine that column C has three instances of “Dennis”.

    Before merging anything, I would simply sort by name (column C). I would create a new temporary column (let’s say column D), and enter a formula like:
    =IF(c1=c2,”duplicate”,”").

    If there are duplicates, the cells in column D would say “duplicate”. If not, the cells would look blank.

    When I found duplicates, I’d stare at them, and figure out a plan.

    If the duplicate rows are identical, I can simply delete one. If they differ, I would wonder if one contained obsolete data and I would try to figure a way to get rid of what I do not need.

    If names are identical, but the other columns are not, and I want to keep all the data for some reason, (for example, I have phone entries, they are all correct, it is just that one is a home land line, other is a work land line, cell, work cell, etc…) I would take the most complete entry and start adding columns for phone 1, phone 2, phone 3, etc… and then do some copying and pasting.

    Either way, the goal is to have one row per individual (or whatever) for each list. Once that is accomplished, I would then proceed to the merge portion.

    I hope this is helpful and relevant to what you are facing.

  14. Ti
    Posted October 19, 2011 at 9:05 am | Permalink

    This is really useful but I can’t quite fit it to my problem. I’ve got two columns of data one (column B) is a list of 500 errors over 3 months, each month runs on straight after the next. The other (column A) is the number of times that error has occurred per month. I want to pull into a separate worksheet the total number of instances per error message.

    Is there a way I can lookup up the error message text and get it to return the sum of all the matching numerical entries?

  15. Posted October 19, 2011 at 4:35 pm | Permalink

    Hi Ti,

    The 3 month issue in the same column is kind of throwing me. Do you mean something like:

    Column A: # of instances
    Column B: error type
    Column C: June
    Column D: July
    Column E: August

    Not sure yet, but I think the answer will end up being related to SUMIF() or COUNTIF().

  16. Ti
    Posted October 20, 2011 at 6:03 am | Permalink

    Thanks for getting back to me – the table looks like this

    Col A: # instances
    Col B: type of error

    I’ve removed the month value element, Col B has some errors types repeated, each with their own # value in Col A.

    So error type ‘x’ appears 3 times in Col B with a value for each, error ‘y’ appears twice and error ‘z’ only once and so on.

    I guess I want to pull into a worksheet, one column with a list of all the unique error types and one column with the combined total of all the # of instances for each error type occurence.

    Hope that makes sense! Thanks, Ti

  17. Posted October 20, 2011 at 6:35 am | Permalink

    Ok, this doesn’t sound so bad.

    First, I’d get rid of duplicates from column B: make sure column B is selected -> Data -> Filter -> Advanced Filter (copy to new location, unique records only). This will copy unique values from column B into a new column (let’s say column F)

    Second, next to column F (let’s say column G): for each value, enter something like =sumif(column-B-range,f1,column-A-range).

    I hope this makes sense!

  18. Ti
    Posted October 20, 2011 at 7:23 am | Permalink

    Amazing – it’s worked, thank you!

  19. Will
    Posted November 16, 2011 at 10:45 am | Permalink

    For years, I have not understood the purpose of the $ in formulas. EUREKA! Thank you!

  20. Posted November 16, 2011 at 11:59 am | Permalink

    I know how you feel, Will. I felt the same way.

  21. Sash
    Posted November 17, 2011 at 2:08 am | Permalink

    Hi,
    I am analysing household survey data in Excel and there isa need to merge six datasets. The structue looks as follows:
    Each row contains data on one, unique household.
    Column 1: household ID (like 001-03-11-04, which is actually a combination of codes for household, village, district and region)
    Column 2: First name of household head
    Column 2: Age of household head
    Column 3: Gender of household head (1-male; 2-female)
    Column 4: Membership in… (1-yes; 2-no)

    A total of 44 columns.

    The other dataset has informaton for only some of the households and all the household members are listed in rows, some columns are also different from the other dataset:
    Column 1: Household ID (the same as in the first dataset)
    Column 2: Household member ID (like 001-03-11-04-01, here the last two digits are codes for household head, spouse, son/daughter etc.)
    Column 3: Household member age
    Column 4: Education (for each household member, coded)

    I want to merge the two datasets and use PivotTable to analyse, for example membership in the organizaton by age groups and gender. This is easy if the data is merged.
    Any suggestion is appreciated. Thanks.

  22. Posted November 19, 2011 at 8:17 am | Permalink

    Hi Sash, This doesn’t sound so bad. As I’m sure you already know, merging is made possible because you have a common column in both datasets (Household ID).

    I don’t have a lot of advice, but let me say this:

    – personally, I love making copies of things. I would keep the datasets you have, and make “working” copies of each. I’d delete unnecessary columns from each dataset, so that the end result contained only the columns I wanted.

    – if you have two datasets that are being merged, be deliberate about which dataset is appended to which dataset. Should you start merging using the dataset with more records? Or the one with more columns? Will the end result contain “everything” it can even if there are a lot of blank values?

    – I would probably spend time making sure each dataset is as “clean” and consistent with itself before merging it to another dataset.

    – I also love keeping a “log” of what I’m doing & why in a separate text file (like NotePad).

    – If you’re good about keeping copies at various stages and good about taking notes, you are more free to make mistakes. If you do, take a deep breathe, try to figure out what happened, accept it, and start over.

    Best of luck!

  23. Sash
    Posted November 22, 2011 at 9:54 am | Permalink

    Hi Dposorio,

    Thanks for responding. Yes, I do make a lot of copies and what I actually want is…getting the job done without making too many copies :)
    I am indebted to two actions I take a lot while analysing data in Excel: copy-paste and filtering.

    Have a good day!

  24. Miguel C
    Posted January 8, 2012 at 2:35 pm | Permalink

    Hi Dposorio,

    This how to is great. I am starting to understand the VLOOKUP function but I hope you can help me out with a more complicated way of using it (I think).

    I have 3 different workbooks. Lets keep using names for this example. Workbook 1 has ID, Names and Phone Numbers. The second workbook has ID, Names, Age, Department. The 3rd workbook has ID, Names, Email, Home Address, and Emergency Contact. Workbook 1 is the main workbook as workbook 2 or 3 may be missing some employees (or not, but this makes it impossible to simply sort each workbook and copy/paste the information to the first/main spreadsheet). Can VLOOKUP be used to merge all this information into one? If so, how and if not I hope you know of a way. Thanks for any response.

  25. Posted January 9, 2012 at 11:05 am | Permalink

    Hi Miguel C,

    I’m glad you found this post useful. What you propose is fairly straightforward. And, yes, VLOOKUP can be used to merge all the info into one. The main reason that you will be able to do so is because of the duplicated (triplicated?) ID columns – one in each sheet.

    The trickiest question you have is how and where to merge the information to. Personally, I would create a new sheet. I’d call it summary. It would contain the summarized info from the 3 other sheets thanks to VLOOKUP.

    Here’s a picture of how I organized my “summary” sheet.

    Next, I’d take the ID column from sheet 1 and copy it into column A in the “summary” sheet. After all, sheet 1 is the “main” worksheet. This ID column can be used as the basis for all VLOOKUPS to come.

    To get the VLOOKUP to work from one sheet to another, you’ll type out the formula, and when you get to the 2nd parameter (the table array), just select the section in the other sheet with your mouse.

    The biggest headache, will be cleaning things up. You’ll end up with a bunch of values of zero. These result from doing a VLOOKUP on something that is blank. The VLOOKUP may have worked & found a match, it’s just that what it’s bringing back to you is an empty cell and that’s getting translated to a zero value.

    I would probably go through sheets 1-3 before starting any vlookup, and finding empty cells (this can be done using the pull down menu data -> filter -> aut0-filter). Once they are found, you can substitute empty cells with a value like “BLANK”. This would be a lot easier to clean up than dealing with empty cells.

    I hope you find this useful & not overwhelming. Happy merging!

  26. Al
    Posted January 27, 2012 at 11:32 pm | Permalink

    Hi Dposorio,

    The Vlookup function works great if I use the 1 (1 would be the column where the match is found) as the value from the cell it found. But if I use a 2 instead of 1, I get “#REF!” as a result instead of the numbers in the 1st column to the right of the matched column. I tried to useExcel help but it just added more fog factor.

    Any suggestions?

  27. Roy
    Posted January 28, 2012 at 10:14 am | Permalink

    Hi Dposorio,

    Wonderful information you are providing here. My question is somewhat related in that it involves merging data. I have a list of people and their emails. The company has changed names so the email’s domain changed too. Example john@abc.com is now john@xyz.com. It’s a challenge to do a text to columns because names are of differing lengths. And even if this was possible, how would I incorporate the new domain into the column?
    Thanks in advance for any suggestions,
    Roy

  28. Posted January 28, 2012 at 8:32 pm | Permalink

    Hi Al,

    Typically you get #REF! when you are “looking up” a column that is outside of the table array.

    Imagine that I enter
    =vlookup(a2,g1:i11,3,0)

    In the example, I have columns g-i. That’s 3 columns for my table array. If I had entered a 4 instead of 3 (as the 3rd parameter), Excel would return a value of #REF!

    Check that – is that the problem you’re facing?

    – Denns

  29. Posted January 28, 2012 at 8:43 pm | Permalink

    Hi Roy,

    I’m not entirely sure, but I suspect that the answer might be the use of Contr0l+H (or Find & Replace). You can quickly replace every instance of @abc.com with @xyz.com. There will be options around whether the “find” represents the entire cell contents or not. You can perform this whole Find & Replace operation based on a selection of a column or some cells or on no selection at all (that last one would mean the entire table) – you can even do this across multiple sheets in one Excel file.

    Also, with text-to-columns, one option for creating columns from text is around setting a fixed width. I never use that. I always do it based on a separator – a comma or a semi-colon. You can do more than one pass. One of those passes could be based on the @ symbol. — But, maybe you don’t even need to worry about that – maybe the find & replace thing is all you need.

    – Dennis

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>