1 Microsoft Imagine Cup


1 1 Microsoft Imagine Cup http://www.thespoke.net/imagine john@johndowns.co.nz

2 2 CompSci 105 SS 2005 Principles of Computer Science Lecture 24: Tables and Hashing

3 3 Tables What is a table??

4 4 ADT Table operations: createTable(), isEmpty(), tableLength(), tableInsert(item), tableDelete(searchKey), tableRetrieve(searchKey), tableTraverse()

5 5 Search Key It is important that the search key remain the same as long as the item is stored in the table.
public abstract class KeyedItem {
    private Comparable searchKey;

    public KeyedItem(Comparable key) {
        searchKey = key;
    } // end constructor

    public Comparable getKey() {
        return searchKey;
    } // end getKey
} // end KeyedItem

6 6 Implementation?? Implementations for the ADT Table
– Linear approaches
  Unsorted, array based
  Unsorted, reference based
  Sorted (by search key), array based
  Sorted (by search key), reference based
– Non-linear approach
  Binary Search Tree
The requirements of a particular application influence the selection of an implementation
– What operations are used, and how often

7 7 Which to use??

8 8 [Diagram labels: Program that uses a table; ADT Table; Unsorted Array; Binary Search Tree] Textbook, p. 504-517

9 9 Databases Relational databases are simply a set of tables filled with data. They use a variety of methods to store and retrieve that data.

10 10 Hash Tables

11 11 Wouldn’t it be nice? Table ADT: the key can be used as an array index.
ID  Surname      First Name
1   Eksepshen    Catchda
3   Baseeks      Beegoh
0   Headnode     Dummy
2   Gettingsoon  Ayplus

12 12 Wouldn’t it be nice? Table ADT: the key can be used as an array index.
ID  Surname      First Name
1   Eksepshen    Catchda
3   Baseeks      Beegoh
0   Headnode     Dummy
2   Gettingsoon  Ayplus
Array, indexed by ID:
0   Headnode, Dummy
1   Eksepshen, Catchda
2
3   Baseeks, Beegoh

13 13 The Problem
ID       Surname      First Name
9978291  Eksepshen    Catchda
3024817  Baseeks      Beegoh
3423930  Headnode     Dummy
2048171  Gettingsoon  Ayplus
The ID range is far greater than can or should be stored...

14 14 The Problem The general case is where we have a large range of possible keys/values but are only storing a small number of items. How do we distribute items in a smaller space?

15 15 Naive Solution If we have N possible search key values and M locations, simply divide N into M lots, e.g. N = 1-1000, M = 10:
Index  Key range
0      1-100
1      101-200
2      201-300
3      301-400
4      401-500
5      501-600
6      601-700
7      701-800
8      801-900
9      901-1000

16 16 What about collisions? If we want to store two items with search keys 150 and 160, they will collide: both fall in the second lot, so both map to array index 1. (Same ranges and indexes 0-9 as on the previous slide.)
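The range-splitting idea and its collision can be sketched in a few lines of Java. This is only an illustration under the slide's example values (keys 1-1000, 10 locations); the class and method names are hypothetical and do not come from the lecture.

```java
// Sketch only: NaiveBuckets and bucketFor are hypothetical names.
class NaiveBuckets {
    static final int MIN_KEY = 1;     // smallest possible key (N = 1-1000)
    static final int MAX_KEY = 1000;  // largest possible key
    static final int LOCATIONS = 10;  // M, the number of array locations

    // Maps a key in 1..1000 to an index 0..9 by splitting the key range
    // into LOCATIONS equal lots of 100 keys each.
    static int bucketFor(int key) {
        if (key < MIN_KEY || key > MAX_KEY) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        int lotSize = (MAX_KEY - MIN_KEY + 1) / LOCATIONS; // 100
        return (key - MIN_KEY) / lotSize;
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(150)); // 1
        System.out.println(bucketFor(160)); // 1, the same index: a collision
    }
}
```

Keys 150 and 160 land on the same index, which is the collision the slide describes.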

17 17 Hash Functions [Diagram: key 3423930 -> Hash Function -> ? -> Hash Table with slots 0-9]

18 18 A Hash Function Hash function: ID % 10. Hash table: slots 0-9.
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon

19 19 Collision Hash function: key % 10
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon
Hash table: 0 Headnode; 7 Baseeks; all other slots empty. Eksepshen and Gettingsoon both hash to slot 1 and collide.
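The arithmetic behind this slide can be checked with a short sketch. The class name is hypothetical, and the code assumes non-negative IDs (Java's % can return a negative result for negative operands).

```java
// Sketch only: ModTenHash is a hypothetical name for the slide's "key % 10" function.
class ModTenHash {
    // Hashes a non-negative ID to a slot 0..9 using its last decimal digit.
    static int hash(int id) {
        return id % 10;
    }

    public static void main(String[] args) {
        int[] ids = {9978291, 3024817, 3423930, 2048171};
        String[] surnames = {"Eksepshen", "Baseeks", "Headnode", "Gettingsoon"};
        for (int i = 0; i < ids.length; i++) {
            System.out.println(surnames[i] + " -> slot " + hash(ids[i]));
        }
    }
}
```

Eksepshen and Gettingsoon both end in the digit 1, so they compete for slot 1, while Headnode goes to slot 0 and Baseeks to slot 7.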

20 20 Hash Function Tricks [Diagram: key 3423930 -> ? -> Hash Table with slots 0-9]

21 21 Requirements of Hash Functions
– Don’t produce values outside of the array
– Distribute items as evenly as possible
– Use all available space in the array to minimise collisions

22 22 Selecting Digits Hash function: digits 3 and 5. 3423930 -> 29. How big do we need?

23 23 Folding Digits Hash function: sum of all digits. 3423930 -> 3+4+2+3+9+3+0 = 24. How big do we need?

24 24 Folding Digits Hash function: group and add digits. 3423930 -> 342+393+0 = 735. How big do we need?
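The three digit-based hash functions from the last three slides can be sketched as follows. The method names are hypothetical, and the code assumes non-negative keys with at least five digits, counting digit positions from 1 at the left as the slide's example (3423930 -> 29) implies.

```java
// Sketch only: DigitHashes and its method names are hypothetical.
class DigitHashes {
    // Digit selection: take the 3rd and 5th digits, counting from the left.
    // Assumes the key has at least five digits.
    static int selectDigits(int key) {
        String s = Integer.toString(key);
        int third = s.charAt(2) - '0';
        int fifth = s.charAt(4) - '0';
        return third * 10 + fifth;
    }

    // Folding, variant 1: add up every decimal digit of a non-negative key.
    static int sumDigits(int key) {
        int sum = 0;
        for (int k = key; k > 0; k /= 10) {
            sum += k % 10;
        }
        return sum;
    }

    // Folding, variant 2: cut the digits into groups from the left and add the groups.
    static int foldGroups(int key, int groupSize) {
        String s = Integer.toString(key);
        int sum = 0;
        for (int i = 0; i < s.length(); i += groupSize) {
            int end = Math.min(i + groupSize, s.length());
            sum += Integer.parseInt(s.substring(i, end));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(selectDigits(3423930));  // 29
        System.out.println(sumDigits(3423930));     // 24
        System.out.println(foldGroups(3423930, 3)); // 735
    }
}
```

For seven-digit keys this also answers "How big do we need?": two selected digits give values 0-99 (100 slots), the digit sum is at most 7 x 9 = 63 (64 slots), and folding in groups of three is at most 999 + 999 + 9 = 2007 (2008 slots).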

25 25 Handling Characters Hash function: sum of Unicodes. Example key: “Catchda”. Fold these as well?
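A minimal sketch of the "sum of Unicodes" idea, using a hypothetical helper name. Note that String.charAt returns UTF-16 code units; for plain letters such as those in "Catchda" these are the same as the Unicode code points.

```java
// Sketch only: StringHash and sumOfChars are hypothetical names.
class StringHash {
    // Adds up the character codes of every character in the string.
    static int sumOfChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // C=67, a=97, t=116, c=99, h=104, d=100, a=97
        System.out.println(sumOfChars("Catchda")); // 680
    }
}
```

One weakness of a plain sum is that any rearrangement of the same letters gives the same value, which is one reason to consider folding or combining it with another step.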

26 26 Are these any good?? Do they evenly distribute values? No mention of array size?

27 27 Modulo Arithmetic Hash function: % tablesize. [Diagram: key 3423930 hashed into a table with slots 0-9]

28 28 Modulo Arithmetic Hash function: % tablesize. [Diagram: key 3423930 hashed into a table with slots 0-6]

29 29 Hash Functions Can combine multiple hash functions into one, e.g. combine folding with modulus.
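One way to combine the two steps is to fold first and then reduce the result modulo the table size, so the index always fits the array. This is a sketch under that assumption; the names are hypothetical and keys are assumed to be non-negative.

```java
// Sketch only: CombinedHash and its method names are hypothetical.
class CombinedHash {
    // Folding step: sum of the decimal digits of a non-negative key.
    static int sumDigits(int key) {
        int sum = 0;
        for (int k = key; k > 0; k /= 10) {
            sum += k % 10;
        }
        return sum;
    }

    // Plain modulo hashing: the result is always in 0..tableSize-1.
    static int modHash(int key, int tableSize) {
        return key % tableSize;
    }

    // Combined hashing: fold the key, then apply the modulus.
    static int foldThenMod(int key, int tableSize) {
        return sumDigits(key) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(modHash(3423930, 10));     // 0
        System.out.println(modHash(3423930, 7));      // 6
        System.out.println(foldThenMod(3423930, 10)); // 24 % 10 = 4
        System.out.println(foldThenMod(3423930, 7));  // 24 % 7 = 3
    }
}
```

Passing the table size as a parameter means the same function works for the 10-slot and the 7-slot tables shown on the two modulo slides.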

30 30 Solutions to Collision?? All methods will result in collisions. There are many solutions....

31 31 Separate Chaining Hash function: key % 10
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon
Hash table: 0 -> Headnode; 1 -> Gettingsoon -> Eksepshen; 7 -> Baseeks; all other slots empty.

32 32 Separate Chaining Could use ANY of the data structures so far. Search time is reduced, but extra data structures are required. Can’t we just use the array??
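A minimal sketch of separate chaining, assuming each slot holds a linked list of entries and the hash function is key % table size. The class and method names are hypothetical, and new entries are appended to the end of a chain, which may differ from the order drawn on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch only: ChainedTable and its members are hypothetical names.
class ChainedTable {
    // One stored item: an ID (the search key) and a surname.
    static class Entry {
        final int id;
        final String surname;

        Entry(int id, String surname) {
            this.id = id;
            this.surname = surname;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new LinkedList<>()); // every slot starts with an empty chain
        }
    }

    private int hash(int id) {
        return id % slots.size(); // assumes non-negative IDs
    }

    // Colliding items simply share a chain, so insertion never fails.
    void insert(int id, String surname) {
        slots.get(hash(id)).add(new Entry(id, surname));
    }

    // Only the one chain the key hashes to is searched.
    String retrieve(int id) {
        for (Entry e : slots.get(hash(id))) {
            if (e.id == id) {
                return e.surname;
            }
        }
        return null; // not in the table
    }

    int chainLength(int slot) {
        return slots.get(slot).size();
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(10);
        table.insert(9978291, "Eksepshen");
        table.insert(3024817, "Baseeks");
        table.insert(3423930, "Headnode");
        table.insert(2048171, "Gettingsoon");
        System.out.println(table.retrieve(2048171)); // Gettingsoon
        System.out.println(table.chainLength(1));    // 2
    }
}
```

Any of the earlier table implementations could stand in for the linked list in each slot; the list is used here only because it is the simplest choice.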

33 33 Linear Probing Hash function: key % 10
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon
Hash table: 0 Headnode; 1 Eksepshen; 7 Baseeks; all other slots empty. Clustering

34 34 Linear Probing Hash function: key % 10
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon
3153010  Elizabeth II
Hash table: 0 Headnode; 1 Eksepshen; 2 Gettingsoon; 7 Baseeks; all other slots empty. Clustering

35 35 Linear Probing Hash function: key % 10
ID       Surname
9978291  Eksepshen
3024817  Baseeks
3423930  Headnode
2048171  Gettingsoon
3153010  Elizabeth II
Hash table: 0 Headnode; 1 Eksepshen; 2 Gettingsoon; 3 Elizabeth II; 7 Baseeks; all other slots empty. Clustering
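The insertions on these three slides can be traced with a small sketch of linear probing: hash the key, and if that slot is taken, step forward one slot at a time (wrapping around) until an empty one is found. The class and method names are hypothetical.

```java
// Sketch only: ProbingTable and its members are hypothetical names.
class ProbingTable {
    private final Integer[] keys;  // null marks an empty slot
    private final String[] names;

    ProbingTable(int size) {
        keys = new Integer[size];
        names = new String[size];
    }

    // Inserts the item and returns the slot it ended up in, or -1 if the table is full.
    int insert(int id, String surname) {
        int start = id % keys.length; // assumes non-negative IDs
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length; // probe the next slot, wrapping around
            if (keys[slot] == null) {
                keys[slot] = id;
                names[slot] = surname;
                return slot;
            }
        }
        return -1;
    }

    String nameAt(int slot) {
        return names[slot];
    }

    public static void main(String[] args) {
        ProbingTable table = new ProbingTable(10);
        System.out.println(table.insert(9978291, "Eksepshen"));    // 1
        System.out.println(table.insert(3024817, "Baseeks"));      // 7
        System.out.println(table.insert(3423930, "Headnode"));     // 0
        System.out.println(table.insert(2048171, "Gettingsoon"));  // 2 (slot 1 is taken)
        System.out.println(table.insert(3153010, "Elizabeth II")); // 3 (slots 0, 1, 2 are taken)
    }
}
```

Elizabeth II hashes to slot 0 but ends up in slot 3, behind a run of occupied slots; that growing run is the clustering the slides point at.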

36 36 Finding a Node

37 37 Finding a Node Hash function: key % 10
Hash table: 0 Headnode; 1 Eksepshen; 2 Gettingsoon; 3 Elizabeth II; 7 Baseeks; all other slots empty.
Problem: Find the item with key 9978291.
Solution: Search as if we were ADDING the item, checking each place we come across. Stop if the item is found or we reach null.
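The search rule can be sketched against the finished table from the linear probing slides, written here as an array literal of keys with null for empty slots. The names are hypothetical, and the sketch assumes no items have been deleted, since a deleted slot would look like the end of a probe run.

```java
// Sketch only: ProbingSearch and find are hypothetical names.
class ProbingSearch {
    // Returns the slot holding target, or -1 if it is not in the table.
    // Follows the same probe sequence an insertion of target would follow.
    static int find(Integer[] keys, int target) {
        int start = target % keys.length; // assumes non-negative keys
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (keys[slot] == null) {
                return -1;   // reached an empty slot: the item would have been stored here
            }
            if (keys[slot] == target) {
                return slot; // found it (the Integer is unboxed for the comparison)
            }
        }
        return -1;           // probed every slot without finding it
    }

    public static void main(String[] args) {
        // Slots 0-9: Headnode, Eksepshen, Gettingsoon, Elizabeth II, then Baseeks at 7.
        Integer[] keys = {3423930, 9978291, 2048171, 3153010, null, null, null, 3024817, null, null};
        System.out.println(find(keys, 9978291)); // 1
        System.out.println(find(keys, 3153010)); // 3, after probing slots 0, 1 and 2
        System.out.println(find(keys, 1111111)); // -1, stops at empty slot 4
    }
}
```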

38 38 Efficiency?? What is the efficiency of these operations?? What is it dependent upon?? When is it best/worst??

39 39 Efficiency

