Managing Query Execution in Database Engine Part – 7
Making Use of Indexes
The nested loop technique can be made more efficient when indexes are available on the join columns of both relations, Table 1 and Table 2.
Assume that indexes are available on the join columns Column 1 and Column 2 of Table 1 and Table 2 respectively. Both indexes can then be scanned to determine whether a pair of rows has the same value of the join column. When the values match, the row from Table 1 is selected, and then all rows from Table 2 that have the same join column value are selected by scanning the index on the join column of Table 2. The index on the join column of Table 1 is then scanned to check whether more than one row has the same value. All rows of Table 1 with that join column value are selected and joined with the rows of Table 2 that have already been selected. The process then continues with the next value for which rows are available in both Table 1 and Table 2.
Clearly this technique needs substantial storage, because all rows from Table 1 and Table 2 that share the same join column value must be held at once. The cost of the join when the indexes are used can be estimated as follows. Let the cost of reading the indexes be INXN1 and INXN2; the total reading cost is then:
Cost = INXN1 + INXN2 + N1 + N2
The savings from using indexes can be large enough to justify building an index at the time a join needs to be computed.
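The index scan described above can be sketched in Python. This is only an illustration under stated assumptions: the tables are in-memory lists of dictionaries, and the hypothetical helper build_index stands in for a real on-disk index by returning the distinct join values in sorted order together with the positions of the rows that hold them.

```python
from collections import defaultdict

def build_index(rows, key):
    """Stand-in for an index: sorted (value, [row positions]) pairs."""
    positions = defaultdict(list)
    for pos, row in enumerate(rows):
        positions[row[key]].append(pos)
    # Keys are distinct, so sorting the pairs orders them by join value.
    return sorted(positions.items())

def index_join(table1, table2, col1, col2):
    """Join two tables by scanning both indexes in step."""
    idx1 = build_index(table1, col1)
    idx2 = build_index(table2, col2)
    i = j = 0
    result = []
    while i < len(idx1) and j < len(idx2):
        value1, rows1 = idx1[i]
        value2, rows2 = idx2[j]
        if value1 < value2:
            i += 1          # no partner in Table 2 for this value
        elif value1 > value2:
            j += 1          # no partner in Table 1 for this value
        else:
            # Same join value: pair every matching row of Table 1
            # with every matching row of Table 2.
            for p1 in rows1:
                for p2 in rows2:
                    result.append((table1[p1], table2[p2]))
            i += 1
            j += 1
    return result
```

Only rows whose join value appears in both indexes are ever fetched from the tables, which is where the saving over the plain nested loop comes from.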
The Sort Merge Technique
The nested loop method is simple but involves comparing every block of Table 1 with every block of Table 2. This can be avoided when both relations are ordered on the join column. The sort merge algorithm was introduced by Blasgen and Eswaran in 1977. It is a classical procedure that has been the method of choice for joining relations that have no index on either of the two join columns.
This technique involves sorting Table 1 and Table 2 on the join columns, if they are not already sorted, and storing them as temporary lists. The lists are then scanned block by block, and the rows that satisfy the join condition are joined. The advantage of this scheme is that the whole inner relation of the nested loop does not need to be read in for every row of the outer relation. This saving can be substantial when the outer relation is large.
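A minimal in-memory sketch of the sort merge idea follows. It assumes the tables are Python lists of dictionaries and ignores block-wise disk access; the function name sort_merge_join and its parameters are chosen for illustration only.

```python
def sort_merge_join(table1, table2, col1, col2):
    """Sort both tables on the join column, then merge them in one pass."""
    left = sorted(table1, key=lambda row: row[col1])
    right = sorted(table2, key=lambda row: row[col2])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][col1], right[j][col2]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            # Find the end of the run of equal values in the right table.
            j_end = j
            while j_end < len(right) and right[j_end][col2] == lv:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and left[i][col1] == lv:
                for k in range(j, j_end):
                    result.append((left[i], right[k]))
                i += 1
            j = j_end
    return result
```

After sorting, each table is passed over once, apart from re-reading a run of equal join values in the second table.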
Let the cost of sorting the two relations X and Y (here Table 1 and Table 2) be C_X and C_Y respectively, and let the cost of reading the two sorted relations into main memory be M_X and M_Y respectively. The total cost of the join is then:
Cost = C_X + C_Y + M_X + M_Y
When one or both relations are already sorted on the join column, the cost of the join is reduced accordingly.
The algorithm can be improved by using a multi-way merge sort, for which the cost of sorting is proportional to n log n.
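To get a feel for the formula, the cost can be evaluated for some made-up sizes. The sketch below rests on two assumptions that are not part of the article: the sorting cost of a relation of n blocks is taken as exactly n * log2(n) block operations, and the cost of reading a relation is taken as its number of blocks.

```python
import math

def sort_cost(n_blocks):
    """Assumed sorting cost: n * log2(n) block operations."""
    return n_blocks * math.log2(n_blocks) if n_blocks > 1 else 0.0

def sort_merge_cost(n1, n2, sorted1=False, sorted2=False):
    """Sorting cost of each relation plus the cost of reading each one."""
    c_x = 0.0 if sorted1 else sort_cost(n1)
    c_y = 0.0 if sorted2 else sort_cost(n2)
    m_x, m_y = n1, n2  # assumed: reading costs one unit per block
    return c_x + c_y + m_x + m_y

# Two relations of 1024 blocks each, neither sorted beforehand.
print(sort_merge_cost(1024, 1024))                              # 22528.0
# The same relations when both are already sorted on the join column.
print(sort_merge_cost(1024, 1024, sorted1=True, sorted2=True))  # 2048.0
```

Under these assumptions the sorting terms dominate, which shows how much the join cost drops when the relations are already sorted.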
Simple Hash Join Technique
This technique involves building a hash table of the smaller relation, Table 1, by hashing every row on its join column. Since Table 1 is assumed to be too large to fit in main memory, the hash table would normally not fit in main memory either. The hash table therefore has to be built in stages. A number of addresses of the hash table are first selected such that the rows hashed to those addresses can be kept in main memory. The rows of Table 1 that do not hash to these addresses are written back to disk as a new relation, which takes the place of Table 1 in the next pass. The algorithm then works as follows:
(a) Scan Table 1 and hash every row on its join column. If the hashed value equals one of the addresses held in main memory, store the row in the hash table. Otherwise write the row back to disk in a new relation, the new Table 1.
(b) Scan Table 2 and hash every row on its join column. Exactly one of the following three conditions must hold:
1. The hashed value equals one of the selected addresses, and one or more rows of Table 1 with the same join column value exist. The matching rows of Table 1 are combined with the row of Table 2 and output as the next rows of the join.
2. The hashed value equals one of the selected addresses, but there is no row in Table 1 with the same join column value. Such rows of Table 2 are discarded.
3. The hashed value does not equal any of the selected addresses. Such rows are written back to disk as a new relation, the new Table 2.
These steps continue until Table 2 has been scanned completely.
(c) Repeat steps (a) and (b) until Table 1, Table 2, or both are exhausted.
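Steps (a) to (c) can be sketched as follows. This is an in-memory illustration only: the lists remaining1 and remaining2 stand in for the relations written back to disk, and the parameters num_addresses and addresses_per_pass are hypothetical knobs that model how many hash addresses fit in main memory during one pass.

```python
def simple_hash_join(table1, table2, col1, col2,
                     num_addresses=8, addresses_per_pass=2):
    """Multi-pass simple hash join over lists of dictionaries."""
    def address(value):
        return hash(value) % num_addresses

    result = []
    first = 0
    while table1 and table2 and first < num_addresses:
        # Addresses whose rows are kept in memory during this pass.
        selected = range(first, min(first + addresses_per_pass, num_addresses))

        # Step (a): build the hash table from Table 1, defer the rest.
        hash_table = {}
        remaining1 = []
        for row in table1:
            if address(row[col1]) in selected:
                hash_table.setdefault(row[col1], []).append(row)
            else:
                remaining1.append(row)

        # Step (b): probe with Table 2.
        remaining2 = []
        for row in table2:
            if address(row[col2]) in selected:
                # Case 1: matches are output. Case 2: no match, row is dropped.
                for match in hash_table.get(row[col2], []):
                    result.append((match, row))
            else:
                # Case 3: row belongs to a later pass.
                remaining2.append(row)

        # Step (c): repeat with the deferred rows and the next addresses.
        table1, table2 = remaining1, remaining2
        first += addresses_per_pass
    return result
```

Each pass reads and rewrites the rows that were deferred, so the number of passes grows as less of the hash table fits in memory.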
Grace Hash Join Technique
This technique is a modification of the simple hash join technique in which the partitioning of Table 1 is completed before Table 2 is scanned, and the partitioning of Table 2 is completed before the joining stage. The technique consists of the following three stages:
1. Partition Table 1 – As Table 1 is supposed to be too large to fit in the main memory, a hash table for it cannot be built in the main memory. The first phase of the algorithm involves partitioning the relation into n buckets, each bucket corresponding to a hash table entry. The number of buckets n is chosen to be large enough so that each bucket will comfortably fit in the main memory.
2. Partition Table 2 – The second stage of the algorithm involves partitioning Table 2 into the same number (n) of buckets, each bucket corresponding to a hash table entry. The same hashing function as for Table 1 is used.
3. Compute the Join – A bucket of Table 1 is read in, and the corresponding bucket of Table 2 is read in. Matching rows from the two buckets are joined and output as part of the join.
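The three stages can be sketched in Python as below. The sketch assumes in-memory lists in place of bucket files on disk, and it joins each pair of buckets with an in-memory hash table, which is one common choice rather than something the description above prescribes. The names partition and grace_hash_join are illustrative.

```python
def partition(rows, col, n):
    """Split rows into n buckets by hashing the join column."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[col]) % n].append(row)
    return buckets

def grace_hash_join(table1, table2, col1, col2, n=4):
    # Stages 1 and 2: partition both tables with the same hash function.
    buckets1 = partition(table1, col1, n)
    buckets2 = partition(table2, col2, n)

    # Stage 3: join each bucket of Table 1 with the matching bucket of Table 2.
    result = []
    for bucket1, bucket2 in zip(buckets1, buckets2):
        hash_table = {}
        for row in bucket1:
            hash_table.setdefault(row[col1], []).append(row)
        for row in bucket2:
            for match in hash_table.get(row[col2], []):
                result.append((match, row))
    return result
```

Because both tables use the same hash function, rows with equal join values always land in buckets with the same number, so no bucket has to be compared with any other.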
Hybrid Hash Join Technique
The hybrid hash join algorithm is a modification of the Grace hash join method.
Aggregation
Aggregation is frequently found in queries, given how often one needs to find an average, a maximum, or how many times something occurs. The aggregate functions supported in Structured Query Language (SQL) are average, maximum, minimum, count, and sum. Aggregation can itself be of different kinds. It may require only a single relation with no grouping, for instance finding the lowest mark in a subject, or it may still involve one relation but require something like finding the number of patients under each doctor. The latter kind clearly needs the rows of the relation to be grouped before the aggregation can be applied.
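The two kinds of aggregation can be illustrated with a short Python sketch. The table layouts and column names (subject, mark, doctor) are invented for the example; a database engine would typically do the grouping by sorting or hashing on the grouping column.

```python
from collections import defaultdict

def lowest_mark(marks, subject):
    """Aggregation without grouping: one minimum over the selected rows."""
    return min(row["mark"] for row in marks if row["subject"] == subject)

def patients_per_doctor(patients):
    """Aggregation with grouping: one count per doctor."""
    counts = defaultdict(int)
    for row in patients:
        counts[row["doctor"]] += 1  # group on the doctor column
    return dict(counts)

marks = [
    {"student": "A", "subject": "Maths", "mark": 71},
    {"student": "B", "subject": "Maths", "mark": 58},
    {"student": "A", "subject": "Physics", "mark": 40},
]
patients = [
    {"name": "P1", "doctor": "Rao"},
    {"name": "P2", "doctor": "Sen"},
    {"name": "P3", "doctor": "Rao"},
]
print(lowest_mark(marks, "Maths"))    # 58
print(patients_per_doctor(patients))  # {'Rao': 2, 'Sen': 1}
```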
This is the last part of the article series “Managing Query Execution in Database Engine”. I hope readers have benefited from it.
Thank you.