Donate profile for Rudi at Stack Overflow, Q&A for professional and enthusiast programmers

21/09/2015

695 views

Which Search Algorithm? - Part 1


Which search algorithm to use?

Searching is an enormous topic, so I am not even going to attempt to cover everything there is to cover. This article will be the first in a two part article that will focus on the four most talked about search algorithms, what they are for, when we should use them and how they work. The four algorithms are exhaustive search, binary search, depth-first search, and breadth-first search. In this first article, we will be covering the first two.


In order to get the best out of these articles there are a few things you’re going to need a basic understanding of in advanced. You’re going to need a decent knowledge of abstract data types, in particular arrays, lists, trees, graphs, stacks, and queues. You’re also going to need some familiarity with programming languages in order to interpret the examples I’ll be giving. The examples I will be giving of various search implementations will be in Java, JavaScript and Python. Lastly, an understanding of Big O notation will be an advantage but not essential.


From the most basic perspective performing a search simply means checking a collection for the presence of something and getting some very basic information. That information could simply be a boolean true or false value indicating whether the search found what you were looking for, the information could indicate the position of what you were looking for within a collection, or the information could be the entity you were searching for itself.


Let’s start with the absolute most basic of our four search types.



Exhaustive Search


An exhaustive search has another, less flattering, name; a brute-force search. If you’re a programmer of any description chances are you have used an exhaustive search whether you knew it or not. That’s because an exhaustive search is simply the process of checking one element of a collection at a time for an element that meets a certain criteria. This means that an exhaustive search may potentially have to check every single element in a collection to find what it is looking for. We say that an exhaustive search has a time complexity of O(n), which means that the worst case time required for an exhaustive search scales directly with the size of the collection. As such they can become unmanageably slow when working with very large collections. Below are some examples of simple functions demonstrating exhaustive searches that check for the presence of an integer in an array.



boolean exhaustiveSearch(int find, int[] list){
   for(int item : list){
       if(find == item) return true;
   }
   return false;
}


We use exhaustive searches on flat collections (not trees or graphs, we're talking sets or lists) where we know little to nothing about the order or sorting of the collection. As you'll see later we can do some much more clever searching if we know for certain that a collection's elements are ordered and sorted. Take a look at the graphic below for a visual representation of how it works. Try searching for elements in the collection.





Binary Search


The binary search is the big daddy of search. Unfortunately we can only use a binary search in very specific circumstances. What is it that we need? An ordered, sorted collection. That ordered collection can be an array list (afraid a linked list just won't cut it), a binary tree, or a few other niche data types. We often say that a binary search is a divide and conquer approach. What we mean by this is that we start our search in the middle of the collection (or the root of a binary tree) and we check there. Because the collection is ordered we can now determine something very important. Is our first check greater, or less than the value we are looking for? If it's greater, we know that the value must be on the side of the collection that contains the lesser values (or vice versa) thereby excluding the other half from the rest of the search. So we've only made one actual check but effectively checked half of the whole collection. Fantastic. Now all we do is repeat this process with the next half to check, and keep going until we find what we're looking for or we've run out of elements to check. As before, here are some working examples of functions that perform a binary search on lists.



int binarySearch(int find, int[] ints){
   int bottom = 0;
   int top = ints.length - 1;
   while(bottom <= top){
       int middle = (bottom + top)/2;
       if(ints[middle] == find) {
           return middle;
       } else if(ints[middle] > find) {
           top = middle - 1;
       } else if(ints[middle] < find) {
           bottom = middle + 1;
       }
   }
   return -1;
}


Apparently, given an hour, only 10% of programmers can write a working, bug free, binary search. Reading the above simple examples you might not think it. They seem fairly simple but you might be surprised how many fringe cases, complications and logical issues there are to account for. I'm not going to lie, my binary search examples took me some time.


How great exactly is a binary search compared to a brute-force search? Remember we said a brute-force search has a complexity of O(n)? Well, a binary search has a complexity of O(log2n). What this means is that we'll only have to perform as many checks as times you can half the size the collection. To get a real appreciation of what this means, we could search a collection of one million in twenty checks and a collection of one billion in thirty checks in the worst cases. Pretty fast right? There is a disadvantage of course, we need to be using ordered collections. Adding and removing elements from ordered collections is slow, so you only want to be using binary searches on data that either doesn't change regularly, or can be updated asynchronously by another thread. As before there are working graphical examples to try below;





That's it for our first part, I hope it has helped. In the next article we will be covering breadth-first and depth-first searches. Until then, if you have any questions, suggestions, criticism of praise you can pop them in the comments below.