Today we look at searching a collection for a particular element and computational complexity.
Once you have a collection of things it becomes a very frequent occurrence that you want to find a particular element of the collection. For example, you might want to search for:
In one type of search, you are only interested whether or
not the element is in the collection (you don't need a
reference to the element). This type of search returns a
boolean value (true
if the element is in the
collection, and false
otherwise).
A second type of search, you want the reference to the
searched for element. This type of search returns a
reference to an object if the element is in the collection,
and null
otherwise.
Both kinds of search can be solved using traversal. By visiting every element in the collection you can determine if a particular element is in the collection.
Suppose you want to search a Portfolio
for an Investment
in a particular
Stock
:
// Portfolio pf = ...
// Investment given = ...
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given); // not quite right
}
The problem with the loop is that the value of found
is determined by the last investment visited in the traversal.
A loop invariant is a formal boolean statement about the relationship between variables used in the loop. They can be used to formally prove that a loop is correct.
true
just
before the loop is run (establishing the invariant).
false
.
true
(maintaining the invariant).
A loop invariant is chosen so that when the loop termination
condition is true
and the
invariant is true
then the goal of the
loop has been achieved.
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given); // not quite right
}
What is the termination condition of the above loop?
has the last element been visited
What is the loop invariant?
found is true if inv equals given;
otherwise found is false
The invariant tells you immediately what the problem with
the loop is: found
is determined by the last investment visited in the traversal.
What do we want the loop invariant to be?
found is true if inv equals given;
otherwise found maintains its current value
The invariant leads to following correct loop:
boolean found = false;
for (Investment inv : pf)
{
found = inv.equals(given) || found;
}
When a reference to the searched for object is needed, you use the exact same approach except you also need a reference variable to hold the reference to the searched for object:
Investment lastFound = null; for (Investment inv : pf) { found = found || inv.equals(given); if (found) { lastFound = inv; } }
If there are multiple Investment
s
that are equal to given
then
this code yields a reference to the last
such Investment
.
Suppose you want to find all of the Investment
s
equal to given
. You need to create
a collection for all of the found Investment
s:
Portfolio allFound = new Portfolio("", pf.size()); for (Investment inv : pf) { found = found || inv.equals(given); if (found) { allFound.add(inv); } }
Suppose you use traversal to search some collection for a
particular element. Also, suppose that the collection
holds N
elements.
What is the smallest number of elements visited in the best case?
1, the desired element happens to be
the first element visited
What is the largest number of elements visited in the worst case?
N, the desired element happens to be
the last element visited,
or it is not in the collection
Computational complexity is a branch of theoretical computer science
concerned with classifying computational problems based on their
difficulty. One measure of difficulty is the worst-case complexity:
how many basic operations are required to solve a problem of size
N
where N
is a large number.
For search of a collection, N
is the size of the
collection. In the worst case, you have to visit all N
elements in the collection. We say that the complexity of
exhaustive search is O(N
).
O(N
) is pronounced "big-O of N" or "order of N".
Complexity does not tell you the actual running time of a program.
Complexity tells you how the running time depends on
the size of the input N
(for large values
of N
).
For exhaustive search, if you double N
then the running time will also double.
More generally, the worst case complexity of a program will
vary as some function
f(N
). In big-O notation we write
O(f(N
)).
f(N ) |
O(f(N )) |
name |
---|---|---|
1 | O(1) | constant |
log N | O(log N) | logarithmic |
N | O(N) | linear |
N2 | O(N2) | quadratic |
Nc, c > 1 | O(Nc) | polynomial |
cN, c > 1 | O(cN) | exponential |
N! | O(N!) | factorial |
If a collection stores its elements in some special way, then it is possible to perform search faster than O(N).
If a collection stores its elements in sorted order then you can use binary search to find an element. The worst case complexity of binary search is O(log N).
Suppose you wanted to output the investments in a portfolio in order from the smallest total current value to the highest total current value. A simple algorithm that can perform this sorting is called selection sort.
1. find the investment with the smallest current value output it and remove it from the portfolio 2. repeat step 1 until the portfolio is empty
What is the worst-case complexity of step 1? What is the worst-case complexity of the entire algorithm?