Search

Once you have a collection of things it becomes a very frequent occurrence that you want to find a particular element of the collection. For example, you might want to search for:

the May 2003 issue of your favorite magazine collection
"the the" in a text document
a phrase in Google
an investment in a portfolio
a credit number number in a global credit collection

Search

In one type of search, you are only interested whether or not the element is in the collection (you don't need a reference to the element). This type of search returns a boolean value (true if the element is in the collection, and false otherwise).

A second type of search, you want the reference to the searched for element. This type of search returns a reference to an object if the element is in the collection, and null otherwise.

Search by Traversal

Both kinds of search can be solved using traversal. By visiting every element in the collection you can determine if a particular element is in the collection.

Suppose you want to search a Portfolio for an Investment in a particular Stock:

// Portfolio pf = ...
// Investment given = ...

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given);  // not quite right
}

The problem with the loop is that the value of found is determined by the last investment visited in the traversal.

Loop Invariants

Note: This is an informal discussion of loop invariants.

A loop invariant is a formal boolean statement about the relationship between variables used in the loop. They can be used to formally prove that a loop is correct.

A loop invariant must be true just before the loop is run (establishing the invariant).
Inside the loop, the invariant might become false.
At the end of the loop, the invariant must be true (maintaining the invariant).

A loop invariant is chosen so that when the loop termination condition is true and the invariant is true then the goal of the loop has been achieved.

Loop Invariants

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given);  // not quite right
}

What is the termination condition of the above loop?

has the last element been visited

What is the loop invariant?

found is true if inv equals given;
otherwise found is false

The invariant tells you immediately what the problem with the loop is: found is determined by the last investment visited in the traversal.

Loop Invariants

What do we want the loop invariant to be?

found is true if inv equals given;
otherwise found maintains its current value

The invariant leads to following correct loop:

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given) || found;
}

Search by Traversal II

When a reference to the searched for object is needed, you need a variable to hold the reference to the found object. If the object is not in the collection, then you normally indicate this by setting the variable to null.

The textbook writes the search loop like so:

Investment lastFound = null;
for (Investment inv : pf)
{
   found = found || inv.equals(given);
   if (inv.equals(given))
   {
      lastFound = inv;
   }
}

If there are multiple Investments that are equal to given then this code yields a reference to the last such Investment.

You must be very careful when writing the if statement to use inv.equals(given) because you want to assign a value to lastFound only if the current investment is equal to the one we are looking for.

Search by Traversal III

Suppose you want to find all of the Investments equal to given. In this case, you must examine all of the investments, and you need to create a collection for all of the found Investments:

Portfolio allFound = new Portfolio("", pf.size());
for (Investment inv : pf)
{
   found = found || inv.equals(given);
   if (inv.equals(given))
   {
      allFound.add(inv);
   }
}

Like the previous example, you must be very careful to use inv.equals(given) in the if statement.

Search Complexity

Suppose you use traversal to search some collection for a particular element. Also, suppose that the collection holds n elements.

What is the smallest number of elements visited in the best case?

1, the desired element happens to be
the first element visited

What is the largest number of elements visited in the worst case?

n, the desired element happens to be
the last element visited,
or it is not in the collection

big-O

Computational complexity is a branch of theoretical computer science concerned with classifying computational problems based on their difficulty. One measure of difficulty is the worst-case complexity: how many basic operations are required to solve a problem of size n where n is a large number.

For search of a collection, n is the size of the collection. In the worst case, you have to visit all n elements in the collection. We say that the time complexity of exhaustive search is T(n) = O(n).

O(n) is pronounced "big-O of n" or "order of n".

Formal Definition of big-O

f(x) = O(g(x)) as x approaches infinity

if and only if there exists positive real numbers M and x₀ such that

|f(x)| <= M|g(x)| for all x > x₀

Practical big-O

Often, the formal defintion of big-O is not used directly; instead, big-O for a function f(x) is derived using two rules:

if f(x) is a sum of terms, keep only the term with the largest growth rate
if f(x) is a product, any constant terms are omitted

Computational Complexity

Complexity does not tell you the actual running time of a program.

Complexity tells you how the running time depends on the size of the input n (for large values of n).

For exhaustive search, if you double n then the running time will also double.

Computational Complexity

More generally, the worst case complexity of a program will vary as some function f(n). In big-O notation we write O(f(n)).

f(`n`)	O(f(`n`))	name
1	O(1)	constant
log n	O(log n)	logarithmic
n	O(n)	linear
n²	O(n²)	quadratic
n^c, c > 1	O(n^c)	polynomial
cⁿ, c > 1	O(cⁿ)	exponential
n!	O(n!)	factorial

Faster Searches

If a collection stores its elements in some special way, then it is possible to perform search faster than O(n).

If a collection stores its elements in sorted order then you can use binary search to find an element. The worst case complexity of binary search is O(log n).

Suppose the collection has 1 billion elements; then, in the worst case, the loop for exhaustive search will run 1 billion times, whereas the loop for binary search will run only 30 times.

Computational Complexity

Suppose you wanted to output the investments in a portfolio in order from the smallest total current value to the highest total current value. A simple algorithm that can perform this sorting is called selection sort.

1. find the investment with the smallest current value
      output it and remove it from the portfolio
2. repeat step 1 until the portfolio is empty

What is the worst-case complexity of step 1? What is the worst-case complexity of the entire algorithm?

Day 23 — Search

Search

Search

Search by Traversal

Loop Invariants

Loop Invariants

Loop Invariants

Search by Traversal II

Search by Traversal III

Search Complexity

big-O

Formal Definition of big-O

Practical big-O

Computational Complexity

Computational Complexity

Faster Searches

Computational Complexity