VLookup tricks: why 2 VLOOKUPS are better than 1 VLOOKUP

Два ВПР лучше одного, понятно без перевода с английского.
vlookup_shark

For all you VLOOKUP junkies who can’t get enough of VLOOKUP, here is a wonderful reason to use 2 VLOOKUPs instead of 1 !!

Exact Match VLOOKUP is slow

When you have a lot of data in Excel (think 10,000 or 100,000 or 1,000,000 rows) you often need to detect when a value you are looking for does not exist in the data.
You can do this using VLOOKUP and the formula would look something like this, supposing you were trying to get information about a Part Number from the 4th column in a Parts List:

=VLOOKUP(PartNumber,PartsList,4,FALSE)

If Part Number is not in the Parts List then the formula will return #N/A. The final VLOOKUP argument (FALSE) tells VLOOKUP to do a linear exact match search of the Parts List, starting at the first row and looking at each row in turn until it finds the Part Number or reaches the end of the Parts List.

This works well with a small number of VLOOKUPs and when Parts List has a small number of rows, but for large numbers of VLOOKUPs with a large PartsList it can take minutes to calculate.

This problem is made even worse by the fan-out effect of making a change to the Parts List table.
Because of the way that Excel’s smart recalculation engine tracks dependencies, if you make a change to a particular row the Parts List table ALL the formulas that refer to Parts List will be recalculated, even if the result of the formula does not use the row that was changed.

So how do you speed up this slow calculation?

If you sort the PartsList table on Part Number then it would be nice to be able to use a formula like this:

=VLOOKUP(PartNumber,PartsList,4,TRUE)

The TRUE as the final VLOOKUP argument tells VLOOKUP to do a binary search of the parts list.

Binary Search is lightning fast compared to the linear search.

But it does not care about missing data: if the PartNumber does not exist VLOOKUP will happily find the next largest part number that is less than Part Number.

And it won’t tell you that it could not find the Part Number, so its extremely difficult to spot that you have got the wrong answer!

Thats why many people say you should never use the TRUE argument in VLOOKUP.

VLOOKUP has a major design error

Oh and by the way TRUE is the default if you don’t enter anything for the fourth argument.
So if your data is not sorted, or you have missing data, and you forget to add the FALSE argument you will get the wrong answer.

The magic solution: 2 VLOOKUPs are better than 1

OK, so here is the magic trick you can use to make safe fast VLOOKUPs on sorted data, even when there may be missing or mismatched data.

Suppose you use VLOOKUP to lookup the Part Number and return itself from the Part Number Column (column 1) in the Part Number Table.

=VLOOKUP(PartNumber,PartsList,1,TRUE)

If the PartNumber exists this formula will return the same Part Number as the PartNumber you gave it.

But if the PartNumber does not exist the formula will give you a different part number!

So we can use IF to find missing Part Numbers:

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, “Exists”, “Missing”)

And so for the final magic formula you just replace the “Exists” with the VLOOKUP for the data you want!

=IF(VLOOKUP(PartNumber,PartsList,1,TRUE)=PartNumber, VLOOKUP(PartNumber,PartsList,4,TRUE), “Missing”)

Because VLOOKUP binary search is so much faster than VLOOKUP linear search this trick is faster even with small numbers of rows (50 or so).