Soldat Forums Archive - Efficient searching (and sorting)

March 30, 2005, 7:40 am

Only way to go is the binary search, which means you have to sort your data... say you ahve 10000 items, if they are sorted, a binary search can find ANY element of the 10000 in only -14- tries compared toa sequential search which can take 10000 tries if what you are looking for is on the end (order lg(n) algorithm, versus sequential search with order n)

please see my post below

Vijchtidoodah

March 30, 2005, 7:45 am

Do you mind telling me what that means and why we need it on whatever it's for?

chrisgbk

March 30, 2005, 7:35 pm

First notes: I discuss complexity in "Big-O" notation. This is the order of magnitude of the complexity, not an actual value. n is used to refer to the size of the data. Something that takes n+10 steps has the same order of magnitude as something that takes n steps, O(n). lg(n) means the log to the base of 2 of n. ie: lg(1024) = 10, since 2^10 = 1024. lg(2048) = 11, since 2^11 = 2048.

The first topic to be covered is sorting, since it is the crucial component of an efficient sort. The sorting algorithm I will discuss is the merge sort algorithm. The way this algorithm works is you take an array and split and merge it so that it is sorted. NOTE: it's always easier to sort as you add elements as opposed to just adding and then sorting. See below.

Example of merge sort
[code]
[6 7 8 3 4 2]
\ split /
[6 7 8] [3 4 2]
\ split / \ split /
[6] [7 8] [3] [4 2]
\split/ \split/
[6] [7] [8] [3] [4] [2]
\merge/ \merge/
[6] [7 8] [3] [2 4]
\merge/ \merge/
[6 7 8] [2 3 4]
\merge/
[2 3 4 6 7 8]
[/code]

Time complexity of merge sort = O(n * lg(n))
This merge sort code is efficient memory wise.
[code]
procedure mergeSort (array[] arrayToBeSorted)
mergeSortAscending(arrayToBeSorted, 0, length of array)
end procedure

procedure mergeSortAscending(array[] arrayToBeSorted, low, high)
if (high - low) > 1
split = (high + low)/2
(NOTE: use integer division that rounds down, ie: 1/2 = 0, 2/2 = 1, 3/2 = 1
mergeSortAscending(arrayToBeSorted, low, split)
mergeSortAscending(arrayToBeSorted, split, high)
arrayToBeSorted = mergeAscending(arrayToBeSorted, low, high, split)
end if
end procedure

procedure mergeAscending(array[] arrayToBeMerged, low, high, split) returns array[]
array[] firstPart = copy elements of arrayToBeMerged from low to split

firstIndex=low, secondIndex=split, targetIndex=low
loop while firstIndex <= (split - low) and secondIndex <= (high - split)
if element in firstPart at firstIndex < element in arrayToBeMerged at secondIndex
set element in arrayToBeMerged at targetIndex to element in firstPart at firstIndex
increment targetIndex by 1
increment firstIndex by 1
else
set element in arrayToBeMerged at targetIndex to element in arrayToBeMerged at secondIndex
increment targetIndex by 1
increment secondIndex by 1
end if
end loop
return arrayToBeMerged
end procedure
[/code]
If you keep your data sorted and you want to add an element, it's better insert it at the correct place. To do this, you simply go through your array until you find an element thats greater than the element you want to add, and insert it before it. ex: [1 2 3 5 6] insert 4, compare 1, 2, 3, 5, 5 is greater than 4, so insert it before 5. Time complexity of insertion is O(n), so it's easy to see that staying sorted from the beginning is MUCH better. (Here's something for you to try: try to adapt the binary search to find the proper place to add an element. Then your new binary insertion beomes O(lg(n)). Quite speedy ;))

Now, onto searching.

With the binary search algorithm, 50% of your searchspace +1 is cut out in each pass. So say you had 10,000 elements. After 1 comparison, you only have to check within 4999 elements. After the 2nd pass, you only have 2499 elements. And so on. Time complexity of binary search is O(lg(n))
[code]
procedure binarySearch(array[] arrayToSearch, key) returns index of match
found = -1
low = 0
high = length of array
loop while found == -1 and low <= high
guess = (high + low) / 2 (see note in merge sort on integer division)
if element in array at guess = key
found = guess
else if element > key
high = guess - 1
else
low = guess + 1
end if
end loop
return found
end procedure

[/code]

This has many possible uses, can be used for virtually anything, just thought I would share. :) And apologies if any errors exist in my psuedo code, try implementing it and tell me if it crashes >.>

FliesLikeABrick

March 30, 2005, 7:56 pm

thanks chrisgbk, but i already know what binary searching and sorting is all about :P

I'm not sure when i talked to you and why you left this link for me in IRC, but uhm.... thanks?(see edit)

I have spent quite some time studying linked data structures and have been working on quite a few C++ projects that work with optimized data structures and algorithms... not to mention in quite a few classes that deal with it

Lately i have started working in php on a text-based stats system.. just for my own amusement. The problem I hit was that with 25,000 lines in a text file, 78 columns of data, and one huge 2d array, it becomes very inefficient. php doesn't seem to have any advanced data structures like C++ style maps (it would be sex if php did), so I have started coding my own sorted insert and binary search functions to deal with it, reducing the big-O notation of my work from linear (N) time to ln(n), a drastic change for working with a 2d array containing 25,000 x 78 elements (mostly int, some string)

was: O(n) 25,000 for a wost-case search
now: O(ln(n)) ---> 14 and change :P

After working for a while in php to define all my own functions to work with this particular 2d array and keep it sorted, i've decided that i think im gonna write it in C++ at first, just because it is easier and i am more familiar with the advanced data structures available. If i was to ever release it, i would do it in php somehow.

edit: oooh.. i looked at my logs and saw why you left the link... you were talking to enesce about this and he said i might be interested... gotcha