It's useful to estimate the CPU time or memory an algorithm requires. "Complexity analysis" attempts to capture the relationship between the number of data elements and resource usage (time or space) with a simple formula. This matters when you test a program with a small amount of data, but later production runs will use large data sets.
Big-Oh (the "O" stands for "order of") notation is concerned with what happens for very large values of N, so only the largest term in the formula matters. For example, the number of operations in some sorts is N² - N. For large values of N, the N term is insignificant compared to the N² term, so the algorithm is described as O(N²) and the N term is ignored.
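To see where a formula like N² - N comes from, you can count operations directly. This sketch counts the pairwise comparisons a simple selection-style double loop makes (it is an illustration of counting, not any particular library sort):

```java
public class CountOps {
    public static void main(String[] args) {
        int n = 8;
        long comparisons = 0;
        // Compare every pair (i, j) with i < j, as a simple selection sort does.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                comparisons++;
            }
        }
        // n*(n-1)/2 = (N² - N)/2 comparisons; the N² term dominates as N grows.
        System.out.println(comparisons); // 28 for n = 8
    }
}
```

Doubling n roughly quadruples the count, which is exactly the O(N²) behavior described above.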
Here is a table of typical cases. Logarithms to base 2 (as used here) are proportional to logarithms in any other base, so the choice of base doesn't affect the big-oh formula.
N | O(1) constant | O(log N) logarithmic | O(N) linear | O(N log N) | O(N²) quadratic | O(N³) cubic |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 | 4 | 8 |
4 | 1 | 2 | 4 | 8 | 16 | 64 |
8 | 1 | 3 | 8 | 24 | 64 | 512 |
16 | 1 | 4 | 16 | 64 | 256 | 4,096 |
1024 | 1 | 10 | 1,024 | 10,240 | 1,048,576 | 1,073,741,824 |
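The rows above can be reproduced with a few lines of arithmetic. This sketch prints the same columns, clamping the log column to 1 for N = 1 to match the table:

```java
public class GrowthTable {
    public static void main(String[] args) {
        long[] sizes = {1, 2, 4, 8, 16, 1024};
        for (long n : sizes) {
            // Base-2 logarithm, with a floor of 1 for the N = 1 row as in the table.
            long log = Math.max(1, Math.round(Math.log(n) / Math.log(2)));
            System.out.println(n + " 1 " + log + " " + n + " "
                    + (n * log) + " " + (n * n) + " " + (n * n * n));
        }
    }
}
```

Note how the last row grows: going from N = 16 to N = 1024 multiplies the O(N³) column by more than 260,000.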
You should be clear about which cases big-oh notation describes. By default it usually refers to the average case, using random data. However, the characteristics for best, worst, and average cases can be very different, and the use of non-random (often more realistic) data can have a big effect on some algorithms.
Complexity analysis can be very useful, but there are problems with it too.
Big-oh notation can give a very good idea about performance for large amounts of data, but the only way to know for sure is to actually try it with large data sets. There may be performance issues that big-oh notation does not take into account, e.g., the effect on paging as virtual memory usage grows.
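A minimal way to "actually try it" is to time the operation at a few sizes and watch how the cost grows as the size doubles. This sketch times the library sort on random data (absolute times vary by machine and JIT warm-up, so only the growth trend is meaningful):

```java
import java.util.Arrays;
import java.util.Random;

public class TimeIt {
    public static void main(String[] args) {
        Random rand = new Random(42);
        // Double the data size each iteration and watch how the time grows.
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            int[] data = rand.ints(n).toArray();
            long start = System.nanoTime();
            Arrays.sort(data);  // O(N log N) library sort
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " elements: " + elapsedMs + " ms");
        }
    }
}
```

For an O(N log N) algorithm, doubling N should slightly more than double the time; a quadrupling would suggest O(N²) behavior.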
Here is a table of typical search cases.
Type of Search | Big-Oh | Comments |
---|---|---|
Linear search array/ArrayList/LinkedList | O(N) | |
Binary search sorted array/ArrayList | O(log N) | Requires sorted data. |
Search balanced tree | O(log N) | |
Search hash table | O(1) | Average case. |
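The first two rows can be contrasted in a few lines. This sketch finds the same value with an O(N) linear scan and with `Arrays.binarySearch`, which is O(log N) but requires the array to be sorted:

```java
import java.util.Arrays;

public class SearchDemo {
    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};

        // Linear search: O(N), works on unsorted data too.
        int linearIndex = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == 23) { linearIndex = i; break; }
        }

        // Binary search: O(log N), but only valid on sorted data.
        int binaryIndex = Arrays.binarySearch(sorted, 23);

        System.out.println(linearIndex + " " + binaryIndex); // 5 5
    }
}
```

With 10 elements the difference is invisible; with a billion elements the binary search needs about 30 probes while the linear search may need a billion.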
Algorithm | array / ArrayList | LinkedList |
---|---|---|
access front | O(1) | O(1) |
access back | O(1) | O(1) |
access middle | O(1) | O(N) |
insert at front | O(N) | O(1) |
insert at back | O(1) | O(1) |
insert in middle | O(N) | O(1) |
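The insert-at-front row is easy to demonstrate. Each `ArrayList.add(0, x)` must shift every existing element, while a `LinkedList` just relinks its head node. (Note the O(1) middle insertion for a linked list assumes you already hold an iterator at that position; reaching the middle is itself O(N).) A sketch:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class FrontInsert {
    public static void main(String[] args) {
        int n = 30_000;

        // ArrayList: each add(0, x) shifts all existing elements -> O(N) per insert.
        List<Integer> arrayList = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) arrayList.add(0, i);
        long arrayMs = (System.nanoTime() - t0) / 1_000_000;

        // LinkedList: adding at the front just relinks the head node -> O(1) per insert.
        List<Integer> linkedList = new LinkedList<>();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) linkedList.add(0, i);
        long linkedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("ArrayList:  " + arrayMs + " ms");
        System.out.println("LinkedList: " + linkedMs + " ms");
    }
}
```

The ArrayList loop is O(N²) overall, the LinkedList loop O(N), and the gap widens rapidly as n grows.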
Some sorting algorithms show variability in their Big-Oh performance. It is therefore interesting to look at their best, worst, and average performance. For this description "average" is applied to uniformly distributed values. The distribution of real values for any given application may be important in selecting a particular algorithm.
Type of Sort | Best | Worst | Average | Comments |
---|---|---|---|---|
BubbleSort | O(N) | O(N²) | O(N²) | Not a good sort, except with ideal data. |
Selection sort | O(N²) | O(N²) | O(N²) | Perhaps the best of the O(N²) sorts. |
QuickSort | O(N log N) | O(N²) | O(N log N) | Good, but its worst case is O(N²). |
HeapSort | O(N log N) | O(N log N) | O(N log N) | Typically slower than QuickSort, but worst case is much better. |
Example. I had to sort a large array of numbers. The values were almost always already in order, and even when they weren't, typically only one number was out of order. Only rarely were the values completely disorganized. I used a bubble sort because it was O(N) for my "average" data. This was many years ago, when CPUs were 1000 times slower. Today I would simply use the library sort for the amount of data I had, because the difference in execution time would probably go unnoticed. However, there are always data sets so large that the choice of algorithm really matters.
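For nearly sorted data like that in the example, the library sort is the natural modern choice. `Arrays.sort` on an `int[]` uses a tuned quicksort variant, and the object-array sort (TimSort in current Java versions) is itself O(N) on already-sorted input, so it handles this case well:

```java
import java.util.Arrays;

public class LibrarySort {
    public static void main(String[] args) {
        // Nearly sorted data, like the example above: one element out of place.
        int[] data = {1, 2, 3, 9, 5, 6, 7, 8, 4, 10};
        Arrays.sort(data);  // the library sort; no need to hand-pick an algorithm
        System.out.println(Arrays.toString(data));
        // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Unless profiling shows the sort is a bottleneck, the library sort is almost always the right default.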