Today we look at searching a collection for a particular element and computational complexity.
Once you have a collection of things it becomes a very frequent occurrence that you want to find a particular element of the collection. For example, you might want to search for:
In one type of search, you are only interested whether or
not the element is in the collection (you don't need a
reference to the element). This type of search returns a
boolean value (true
if the element is in the
collection, and false
otherwise).
A second type of search, you want the reference to the
searched for element. This type of search returns a
reference to an object if the element is in the collection,
and null
otherwise.
Both kinds of search can be solved using traversal. By visiting every element in the collection you can determine if a particular element is in the collection.
Suppose you want to search a Portfolio
for an Investment
in a particular
Stock
:
// Portfolio pf = ...
// Investment given = ...
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given); // not quite right
}
The problem with the loop is that the value of found
is determined by the last investment visited in the traversal.
Note: This is an informal discussion of loop invariants.
A loop invariant is a formal boolean statement about the relationship between variables used in the loop. They can be used to formally prove that a loop is correct.
true
just
before the loop is run (establishing the invariant).
false
.
true
(maintaining the invariant).
A loop invariant is chosen so that when the loop termination
condition is true
and the
invariant is true
then the goal of the
loop has been achieved.
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given); // not quite right
}
What is the termination condition of the above loop?
has the last element been visited
What is the loop invariant?
found is true if inv equals given;
otherwise found is false
The invariant tells you immediately what the problem with
the loop is: found
is determined by the last investment visited in the traversal.
What do we want the loop invariant to be?
found is true if inv equals given;
otherwise found maintains its current value
The invariant leads to following correct loop:
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given) || found;
}
When a reference to the searched for object is needed,
you need a variable to hold the reference to the found
object. If the object is not in the collection, then
you normally indicate this by setting the variable to
null
.
The textbook writes the search loop like so:
Investment lastFound = null; for (Investment inv : pf) { found = found || inv.equals(given); if (inv.equals(given)) { lastFound = inv; } }
If there are multiple Investment
s
that are equal to given
then
this code yields a reference to the last
such Investment
.
You must be very careful when writing the if statement
to use inv.equals(given)
because you want
to assign a value to lastFound
only if the current
investment is equal to the one we are looking for.
Suppose you want to find all of the Investment
s
equal to given
. In this case, you must
examine all of the investments, and you need to create
a collection for all of the found Investment
s:
Portfolio allFound = new Portfolio("", pf.size()); for (Investment inv : pf) { found = found || inv.equals(given); if (inv.equals(given)) { allFound.add(inv); } }
Like the previous example, you must be very careful to use
inv.equals(given)
in the if statement.
Suppose you use traversal to search some collection for a
particular element. Also, suppose that the collection
holds n
elements.
What is the smallest number of elements visited in the best case?
1, the desired element happens to be
the first element visited
What is the largest number of elements visited in the worst case?
n, the desired element happens to be
the last element visited,
or it is not in the collection
Computational complexity is a branch of theoretical computer science
concerned with classifying computational problems based on their
difficulty. One measure of difficulty is the worst-case complexity:
how many basic operations are required to solve a problem of size
n
where n
is a large number.
For search of a collection, n
is the size of the
collection. In the worst case, you have to visit all n
elements in the collection. We say that the time complexity of
exhaustive search is T(n
) = O(n
).
O(n
) is pronounced "big-O of n" or "order of n".
f(x) = O(g(x)) as x approaches infinity
if and only if there exists positive real numbers M and x0 such that
|f(x)| <= M|g(x)| for all x > x0
Often, the formal defintion of big-O is not used directly; instead, big-O for a function f(x) is derived using two rules:
Complexity does not tell you the actual running time of a program.
Complexity tells you how the running time depends on
the size of the input n
(for large values
of n
).
For exhaustive search, if you double n
then the running time will also double.
More generally, the worst case complexity of a program will
vary as some function
f(n
). In big-O notation we write
O(f(n
)).
f(n ) |
O(f(n )) |
name |
---|---|---|
1 | O(1) | constant |
log n | O(log n) | logarithmic |
n | O(n) | linear |
n2 | O(n2) | quadratic |
nc, c > 1 | O(nc) | polynomial |
cn, c > 1 | O(cn) | exponential |
n! | O(n!) | factorial |
If a collection stores its elements in some special way, then it is possible to perform search faster than O(n).
If a collection stores its elements in sorted order then you can use binary search to find an element. The worst case complexity of binary search is O(log n).
Suppose the collection has 1 billion elements; then, in the worst case, the loop for exhaustive search will run 1 billion times, whereas the loop for binary search will run only 30 times.
Suppose you wanted to output the investments in a portfolio in order from the smallest total current value to the highest total current value. A simple algorithm that can perform this sorting is called selection sort.
1. find the investment with the smallest current value output it and remove it from the portfolio 2. repeat step 1 until the portfolio is empty
What is the worst-case complexity of step 1? What is the worst-case complexity of the entire algorithm?