Analyzing Linear and Binary Search Algorithms

Search is one of the most essential features a software product can offer for a good user experience. For instance, whenever we log on to Amazon, we can go straight to the search bar and type in the product we wish to buy, rather than browsing through all the products one by one!

Search algorithms can get really complex as the amount of data grows, and the key to writing efficient algorithms is to first understand the basics. Today we will look at two simple search algorithms that work on the array data structure. This will give you a sense of how a search operation is performed and what happens to efficiency as the input data grows.

Linear Search

As the name suggests, linear search goes over all the items in the array, one by one, until we find the desired element.

Let’s say we have the following array which stores a list of products added to an online grocery shopping cart:

At checkout, we decide to remove Soda from the list. In order to do that, we first need to find the exact location of the item “Soda” in our list.

A linear search would proceed as follows:

Step 1: Start at index 0. Is the item “Soda”?

Nope, the item is not “Soda”. Proceed to the next element.

Step 2: Is the item at index 1 “Soda”?

Nope. Proceed to the next element.

Step 3: Is the item at index 2 “Soda”?

Nope. Proceed to the next element.

Step 4: Is the item at index 3 “Soda”?

Nope. Proceed to the next element.

Step 5: Is the item at index 4 “Soda”?

Nope. Proceed to the next element.

Step 6: Is the item at index 5 “Soda”?

Yes! Finally! Phew!

The item “Soda” was found at index 5 (position 6) in the array by our linear search algorithm. To find an item at the 6th position, the algorithm took 6 steps. If “Soda” had been at position 7, the algorithm would have taken 7 steps. Thus, we can say that linear search takes n steps to find an element at the nth position. In other words, the complexity of the algorithm is O(n), or order of n.

Implementation in C#

Following is an implementation of the linear search algorithm in C#:

public void LinearSearchProduct()
{
    string[] cart = new string[] { "Apples", "Eggs", "Bread",
                                   "Milk", "Yogurt", "Soda", "Cheese",
                                   "Rice", "Beans" };
    string item = "Soda";
    bool found = false;

    // The for loop goes over each element of the array
    for (int i = 0; i < cart.Length; i++)
    {
        if (cart[i] == item)
        {
            found = true;
            Console.WriteLine("Product found at index: " + i);
            break;
        }
    }

    if (found == false)
    {
        Console.WriteLine("Product not found");
    }
}

Growth Analysis

In our example, we just had an array of nine elements. Imagine if the array contains thousands of elements. Even worse, imagine if the item that we are interested in is at the last position in the array! Since the linear search algorithm would need to go over the entire array, it would probably not be a great option to use when we have large amounts of data.

As we can see in the table below, for every new element added to the array, Linear search would need one additional step to find an item, following a growth complexity of O(n):

Number of Elements in Array | Maximum Number of Steps Taken by Linear Search
5                           | 5
6                           | 6
7                           | 7
8                           | 8
9                           | 9
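
To see this one-for-one growth in code, here is a minimal sketch (the steps counter is added purely for illustration and is not part of the method above) that counts the comparisons made in the worst case:

// A minimal sketch: counting comparisons in a linear search.
// In the worst case (the item is last, or not present at all),
// 'steps' equals the length of the array.
string[] cart = { "Apples", "Eggs", "Bread", "Milk", "Yogurt",
                  "Soda", "Cheese", "Rice", "Beans" };
string item = "Beans"; // worst case: the last element
int steps = 0;

for (int i = 0; i < cart.Length; i++)
{
    steps++;
    if (cart[i] == item)
    {
        break;
    }
}

Console.WriteLine("Comparisons made: " + steps); // prints 9 for this 9-element array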

Binary Search

Binary search is a much faster algorithm than linear search. The only caveat is that Binary search expects the data in our array to be sorted. In other words, the binary search algorithm works on ordered arrays. Once we understand how the algorithm works, this will make much more sense.

The algorithm works by eliminating or ignoring half of the elements of the array with every step until we find the desired element.

First, it starts by looking at the middle element of the array. If the item to be searched is equal to the middle element, we get our result.

Else, it checks if the item to be searched is less than or greater than the middle element.

If the item is less than the middle element, then it considers the first half of the array for the next step and ignores the second half.

If the item is greater than the middle element, then it considers the second half of the array and ignores the first half.

Whichever half it chooses, it repeats all the above steps again by checking the middle element and comparing it with the item to be searched.

We do this until we get the desired element.

Let’s go back to our example to visually see how it works:


We want to search for the item “Soda” in the above array.

The first step would be to sort this array following natural ordering (ascending order, or alphabetical order in the case of strings). So the following would be our input array:

Step 1: Find the middle element of the above array. If the length of the array is n, then the middle element would be at position n/2 if n is even, and (n+1)/2 if n is odd.

Here, the length of the array is 9, so our middle element would be at position (9+1)/2 = 5. (This would be index 4, since array indexes start from 0.)

Is the middle element “Soda”? Nope.

Is “Soda” less than or greater than the middle element? Greater than (alphabetically, “Soda” comes after “Eggs”).

Step 2: Select the second half of the array and find the new middle element:

The length of this new portion is 4, so the middle element would be at the second position (4/2 = 2), which is index 6.

Is the middle element “Soda”? Nope.

Is “Soda” less than or greater than the middle element? Greater than (alphabetically, “Soda” comes after “Rice”).

Step 3: Select the second half of the array and find the new middle element:

The length of this new portion is 2, so the middle element would be at the first position (2/2 = 1), which is index 7.

Is the middle element “Soda”? Yes! Return the element.

Thus, we can see that the binary search algorithm took only 3 steps to find the desired item.

Growth Analysis

One thing to note is that with every step, the number of elements we are looking at reduces by half. Let’s look at this from another perspective: how many elements need to be added to the array before binary search takes one additional step?

Number of Elements in Array | Number of Steps Taken by Binary Search | Log Representation (base 2)
2                           | 1                                      | Log(2) = 1
4                           | 2                                      | Log(4) = 2
8                           | 3                                      | Log(8) = 3
16                          | 4                                      | Log(16) = 4

Thus we can see that whenever the number of elements in the array doubles, binary search takes only 1 additional step to find the item. This is how the time complexity grows for binary search. Since this growth can be represented in logarithmic terms, the complexity of binary search is O(log n), or order of log(n).
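
To see this in code, here is a minimal sketch (the method name CountHalvingSteps is made up purely for illustration) that counts how many times n elements can be halved before only one remains, which mirrors the table above:

// A minimal sketch: each step of binary search discards half of the remaining
// elements, so the number of steps grows like the number of times n can be halved.
public static int CountHalvingSteps(int n)
{
    int steps = 0;
    while (n > 1)
    {
        n = n / 2; // discard half of the remaining elements
        steps++;
    }
    return steps;
}

// CountHalvingSteps(16) returns 4, matching the last row of the table above.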

Implementation

The implementation of binary search in C# is as follows:

public void BinarySearchProduct()
{
    string[] cart = new string[] { "Apples", "Eggs", "Bread",
                                   "Milk", "Yogurt", "Soda", "Cheese",
                                   "Rice", "Beans" };
    string item = "Soda";
    bool found = false;

    // We need to sort the cart before we can apply binary search
    Array.Sort(cart);

    int start = 0;
    int end = cart.Length - 1;

    while (start <= end)
    {
        // Middle of the current search range (matches the walkthrough above)
        int mid = start + (end - start) / 2;

        if (item == cart[mid])
        {
            Console.WriteLine("Element found at index: " + mid);
            found = true;
            break;
        }
        else if (item.CompareTo(cart[mid]) > 0)
        {
            // Item comes after the middle element: search the second half
            start = mid + 1;
        }
        else
        {
            // Item comes before the middle element: search the first half
            end = mid - 1;
        }
    }

    if (found == false)
    {
        Console.WriteLine("Element not found in array");
    }
}

Summary

  1. Linear Search iterates over the entire array, one element at a time, until the desired item is found
  2. Time complexity of Linear search is O(n)
  3. Binary Search works on a sorted array
  4. Binary Search eliminates half of the elements with every step, making the search faster.
  5. Time complexity of Binary Search is O(log n)


The World Of Arrays

Brands of cars. Debit card transactions in a day. A shopping list. All these represent a list or collection of things that can be grouped into a particular category or type.

Type                    | Value
Car Brands              | “Honda”, “Toyota”, “Hyundai”, “Ford”, “Audi”
Debit Card Transactions | “$25”, “$45.67”, “$140”, “$10”
Shopping List           | “Jeans”, “Groceries”, “Books”

All of these types can also form the data that a computer program works on. For instance, whenever you do a self-checkout at Walmart, the software that calculates the total due amount first needs the details of the items that you bought from the store. But how does a computer store and process such a list of items?

Here is where data structures come into play, and when it comes to a collection of items, Array is a simple and powerful data structure that stands out.

Definition of Arrays

An array is a data structure that stores a collection of items at contiguous memory locations. What does “contiguous memory locations” mean? Well, think of computer memory as a collection of blocks, where each block has its own unique address (001, 002, 003, and so on).

Imagining the computer memory as blocks with unique addresses

Each block can be considered as a small storage unit which can hold one element. An array of 5 elements will occupy 5 consecutive blocks in the memory to store its data.

Each memory location is referenced by an index. The first position is index 0 (its location is also the address of the array itself). The subsequent indexes are 1, 2, 3, and so on.

And now if we want to store the values of car brands in our array, the computer will store them as follows:

Declaring and Initializing Arrays

An array is usually declared in a programming language as follows:

//This is C# syntax
//Other programming languages may declare arrays a bit differently

int[] arr = new int[5];        // an array to store 5 integers

string[] cars = new string[5]; // an array to store 5 car brands

The square brackets [] indicate to the compiler that we are declaring an array, and the number inside the square brackets [5] indicates that the array will contain five elements. We need to specify the number of elements in advance so that the array can reserve the appropriate block of memory to store them. (There are collections like dynamic arrays and array lists which do not require us to specify the number of elements in advance, but those topics are not in the scope of this article.)

By using indexes, we can add data to the array:

cars[0] = "Toyota"; // adding Toyota to position 1 (index 0)
cars[1] = "Honda";
cars[2] = "Audi";
cars[3] = "Mercedes";
cars[4] = "Ford";

Reading From an Array

We can read data from the array using indexes as well:

string myCar = cars[1]; // read the car at index 1, or position 2, in the array

Reading from arrays is a fast operation. The computer knows the address of the array, and if we know the position of the value to be read, the computer can do a quick calculation of (address + position) and jump to that address. This is the power of storing items at contiguous locations.

In layman’s terms, reading an element from an array takes only one step: the computer calculates the address of the desired element and reads it.

In computing terms, the read operation of an array would always take constant time, which can be represented as O(1) or order of 1.

Inserting elements into an Array

Data can be inserted into an array, again by using indexes. The easiest way to add data is at the end of the array.

In the above example, provided the array was declared with room for one more element (for example, new string[6]), we can add an element at index 5 as follows:

cars[5] = "Hyundai";

We can also insert data in the middle or at the beginning of an array. But this operation is a bit inefficient, as we need to shift data in the array to accommodate the newly added element. For example, if we have to insert “Kia” at index 2, then we would need to move Audi to index 3, Mercedes to index 4, and so on:

The worst case would be to enter data in the beginning of the array, as we would have to move all values one index to the right:

In layman’s terms, we can say that for an array of “n” elements, the worst-case scenario is inserting an element at the beginning, which requires us to shift elements “n” times before the new element can be inserted.

In formal or computing terms, this operation takes “n” steps in the worst case, and this complexity can be represented as O(n), or order of n.
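
Here is a rough sketch of what that shifting looks like in C# (the helper name InsertAt and its count parameter are made up for illustration; they are not part of the examples above):

// A minimal sketch: inserting a value at a given index means first shifting
// every later element one position to the right.
// 'count' is the number of elements currently in use; the array must have
// at least one unused slot at the end.
public static void InsertAt(string[] arr, int count, int index, string value)
{
    for (int i = count; i > index; i--)
    {
        arr[i] = arr[i - 1]; // shift one position to the right
    }
    arr[index] = value;
}

// Example: inserting "Kia" at index 2 shifts "Audi", "Mercedes" and "Ford"
// one position to the right.
// string[] cars = new string[6] { "Toyota", "Honda", "Audi", "Mercedes", "Ford", null };
// InsertAt(cars, 5, 2, "Kia");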

Deleting from an Array

Deletion would be similar to insertion. Deleting an element from the end of the array would be the easiest. But, if we wish to delete an element from the beginning or middle of the array, then we need to shift all the remaining elements one position to the left.

Deleting from the middle. The elements “Mercedes” and “Ford” need to be moved one position to the left
Deleting from the beginning would require all the other elements to shift one position to the left

Like the insertion operation, the worst-case scenario for deletion is when we want to delete an element from the beginning, and we have to shift all the remaining elements one index to the left. Thus, the complexity of the deletion operation is O(n) as well.
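
Here is a rough sketch of the shifting involved (the helper name RemoveAt and its count parameter are made up for illustration):

// A minimal sketch: deleting the element at a given index means shifting every
// later element one position to the left. 'count' is the number of elements in use.
public static void RemoveAt(string[] arr, int count, int index)
{
    for (int i = index; i < count - 1; i++)
    {
        arr[i] = arr[i + 1]; // shift one position to the left
    }
    arr[count - 1] = null; // clear the now-unused last slot
}

// Example: removing "Audi" (index 2) shifts "Mercedes" and "Ford" one position to the left.
// string[] cars = { "Toyota", "Honda", "Audi", "Mercedes", "Ford" };
// RemoveAt(cars, 5, 2);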

Searching an Array

If we want to search for an element in an array (that is, we do not know the index of the element beforehand), then we need to traverse the array until we find that element. The best-case scenario is that the first element in the array is the one we are looking for. The worst case is when the element is at the end, in which case we have to traverse the entire array to reach it. Again, the complexity of searching for an element in the worst case is O(n).

Another worst-case scenario is that the element you wish to search for does not exist in the array at all. That also leads to a full traversal of the array, giving a complexity of O(n).
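
As a side note, C# already provides a method that performs this traversal for us, Array.IndexOf, which returns -1 when the element is not found:

string[] cars = { "Toyota", "Honda", "Audi", "Mercedes", "Ford" };

int fordIndex = Array.IndexOf(cars, "Ford"); // 4: found at the last position (worst case)
int kiaIndex = Array.IndexOf(cars, "Kia");   // -1: not found, the whole array was traversed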

Summary

  • Arrays are data structures which store a collection of items in contiguous memory locations
  • Each element in an array is assigned an index. The first element will start from index 0
  • Elements can be added and removed from the array using indexes
  • Reading operation takes constant time O(1) and this is one of the strengths of an array
  • Insertion, deletion and search operations take O(n) time in the worst case

A Quick Introduction To Data Structures

Suppose you went to a grocery store and bought two items, let’s say a can of beans and a small packet of rice. You went to the checkout counter, paid for the can of beans and rice, and politely refused to take a bag as these are just two items which you can carry one in each hand and walk to your car. Great!

Now imagine you went to the grocery store and bought eggs, bread, apples, spinach, lettuce, carrots, pasta sauce, pasta….. And you go to the checkout counter. Now you definitely need a bag! As you cannot carry all these items with your two hands, even if it is a short walk to your car.

Computer systems are the same! Our computer programs work on data and logic. For example, in a program that calculates the sum of the numbers 1 and 2, the data would be the numbers 1 and 2 and the logic would be the addition operation. In such small programs, we do not really need a data structure.

//data - just two numbers, we can manage without a data structure
int one = 1;
int two = 2;

//call the logic
int result = sum(one, two);

//the logic
int sum(int a, int b) {
    return (a + b);
}

But we seldom write such small programs in the real world. Consider a similar program which calculates the sum of all the numbers between 1 and 100. It would get really messy to handle 100 numbers if we start assigning a variable each to every number.

//data of a hundred numbers
int one = 1;
int two = 2;
........
int hundred = 100;

//calling the sum logic
//since the sum method accepts only two numbers
//we need to call the method again, and again, and again
int result = sum(one, two);
result = sum(three, result);
result = sum(four, result);
...
//and so on

Do we really need over 100 variables just to calculate a sum of numbers?

Well, we can also use a for loop to make this a bit cleaner, since we know the range of numbers we are working on (1-100)

int sum = 0;
for (int i = 1; i <= 100; i++) {
    sum = sum + i;
}

But, the input numbers would not always be in the form of a “range” which can be represented by the index variable (i) of the for loop. For example, the input numbers could be 1, 200, 99, 450, 77 and so on.

This is where data structures come into play. Just like bags in a grocery store which help organize your groceries, data structures are nothing but storage units which the computer uses to organize the data in its memory.
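
For instance, here is a minimal sketch (the specific numbers are just an example) of how an array lets the same summing loop work on arbitrary values rather than a neat range:

// A minimal sketch: the numbers live in an array, so the loop no longer
// depends on the values forming a range like 1-100.
int[] numbers = { 1, 200, 99, 450, 77 };
int sum = 0;

foreach (int n in numbers)
{
    sum = sum + n;
}

Console.WriteLine(sum); // prints 827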

Let’s go back to the grocery bag example.

Would you put loose eggs into a bag? No, they will break! That’s why eggs are packed in a crate – neatly organized so that they do not break. Similarly, when you start putting groceries in a bag, you will not put tomatoes first and then a bottle of pasta sauce on top of them, to keep the tomatoes from getting squished. Thus, when we put stuff in the bag, we are subconsciously taking care of the following:

  1. Every item is inserted in the bag properly so that nothing breaks
  2. Every item can be retrieved from the bag in proper condition
  3. Every item can be easily searched in the bag, without damaging the bag or other items
  4. Every item can be removed from the bag, without damaging the bag or other items

That is exactly how the computer works with data structures. Each “structure” stores data in such a way so that the basic operations of insertion, retrieval, deletion and search can be done easily without breaking the data structure or affecting the data.

Different data structures store data differently to cater to the different operations that can be performed on the data. Some data structures support easy insertion, while others provide fast search. Following are some common data structures that you will encounter in your coding life:

Arrays: A structure which stores a series of data elements in contiguous locations in memory for easy retrieval.

Linked Lists: Like arrays, they store a series of data elements. Unlike arrays, they do not store them in contiguous locations. Data is stored in nodes, and each node holds a pointer to the memory location of the next node (which holds the next piece of data). This allows for better insertion and deletion operations compared to arrays.

Maps / Dictionaries: Store data as key-value pairs so that data can be looked up efficiently (see the small sketch after this list).

Trees / Graphs: Like linked lists, trees store data in nodes. Each node is structured so that it can have a parent and children. Graphs are similar to trees in the sense that all nodes are connected to form a kind of network. These are advanced data structures and provide the means to solve complex problems.
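
As a small taste of the key-value idea mentioned above, here is a minimal C# sketch (the product names and prices are made up for illustration):

using System;
using System.Collections.Generic;

// A minimal sketch: a dictionary stores key-value pairs, so a value can be
// looked up directly by its key instead of traversing a whole collection.
var prices = new Dictionary<string, decimal>();
prices["Apples"] = 3.50m;
prices["Eggs"] = 2.99m;
prices["Bread"] = 2.25m;

decimal eggPrice = prices["Eggs"]; // direct lookup by key
Console.WriteLine("Eggs cost: " + eggPrice);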

Thus we can see that different structures organize data differently, each providing its own advantages. This was a super fast introduction to the world of data structures, to give you an understanding of where data structures come from (without intimidating you!).

To summarize:

  • Data structures are nothing but different ways in which the computer organizes the data being used in a program
  • Different operations that can be performed on data structures are: insertion, deletion, retrieval and search of data elements
  • There are many data structures which are being used in real world programs. They all organize data differently, with different strengths when it comes to the different operations
  • Arrays, linked lists, maps, dictionaries, trees and graphs are some examples of data structures.

In my upcoming posts, we will do a deep dive into the various data structures and start solving real world problems using those data structures.

Let’s uncomplicate coding!

K.