# Internal Sorting And External Sorting English Language Essay

✅ Paper Type: Free Essay |
✅ Subject: English Language |

✅ Wordcount: 5278 words |
✅ Published: 1st Jan 2015 |

Sorting is the operation of arranging data in a given order so that searching it becomes easier. Many sorting techniques have been developed to produce results faster and to manage data more conveniently. Sorting and searching are fundamental operations in computer science: searching is the operation of locating a particular record within existing information, and information retrieval normally involves searching, sorting and merging. This chapter discusses sorting and searching techniques in detail. Sorting matters in almost every computer application, and many algorithms are available for sorting a given set of elements.

Let us look at two classes of sorting techniques and analyze their performance. The two classes are:

Internal Sorting

External Sorting

Internal sorting takes place in the main memory of a computer. Internal sorting methods are applied to small collections of data; that is, the entire collection to be sorted is small enough that the sorting can take place within main memory. We will study the following methods of internal sorting:

1. Insertion sort

2. Selection sort

3. Merge Sort

4. Radix Sort

5. Quick Sort

6. Heap Sort

7. Bubble Sort

Sorting involves many algorithms, so we should first understand what an algorithm is.

Informally, an algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. An algorithm is thus a sequence of computational steps that transform the input into the output.

We can also view an algorithm as a tool for solving a well-specified computational problem. The statement of the problem specifies in general terms the desired input/output relationship. The algorithm describes a specific computational procedure for achieving that input/output relationship.

For example, one might need to sort a sequence of numbers into non-decreasing order. This problem arises frequently in practice and provides fertile ground for introducing many standard design techniques and analysis tools. Here is how we formally define the sorting problem.

Input: a sequence of n numbers (a1, a2, ..., an).

Output: a reordering (a'1, a'2, ..., a'n) of the input sequence such that a'1 <= a'2 <= ... <= a'n.

## Insertion Sort

This is a naturally occurring sorting method exemplified by a card player arranging the cards dealt to him. He picks up the cards as they are dealt and inserts them into the required position. Thus at every step, we insert an item into its proper place in an already ordered list.

We will illustrate insertion sort with an example (refer to Figure 10.1) before presenting the formal algorithm.

Sort the following list using the insertion sort method:

To find the correct position, search the list until an item just greater than the target is found. Shift all the items from this point one position down the list, and insert the target in the vacated slot. Repeat this process for every element in the list; the result is a sorted list.
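The search, shift and insert steps can be sketched in C++ as follows. This is a minimal illustration rather than the formal algorithm the essay refers to; the function name `insertionSort` and the use of `std::vector<int>` are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Sorts `a` into non-decreasing order, in place.
// (Hypothetical helper name; integer keys are an assumption.)
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int target = a[i];      // next item to place; a[0..i-1] is already sorted
        std::size_t j = i;
        // Shift every item greater than the target one slot down the list.
        while (j > 0 && a[j - 1] > target) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = target;          // insert the target in the vacated slot
    }
}
```

Because an item is shifted only when it is strictly greater than the target, equal items keep their original relative order.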

## Bubble Sort

In this sorting algorithm, multiple swappings take place in one pass. Smaller elements move or ‘bubble’ up to the top of the list, hence the name given to the algorithm.

In this method, adjacent members of the list to be sorted are compared. If the item on top is greater than the item immediately below it, then they are swapped. This process is carried on until the list is sorted.

The detailed algorithm follows:

Algorithm: BUBBLE SORT

1. Begin

2. Read the n elements

3. for i = 1 to n
       for j = n downto i+1
           if a[j] < a[j-1]
               swap(a[j], a[j-1])

4. End // of Bubble Sort

Total number of comparisons in bubble sort:

= (N-1) + (N-2) + . . . + 2 + 1

= (N-1)*N / 2 = O(N^2)

This inefficiency arises because each swap moves an item only to the adjacent position.
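As a rough illustration, the bubble sort pseudocode can be translated to C++ with zero-based indices. The function name `bubbleSort` and the returned comparison counter are assumptions added so that the N(N-1)/2 count can be checked; they are not part of the original algorithm.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `a` in place and returns the number of comparisons performed.
// (Hypothetical helper; the counter exists only to illustrate the analysis.)
std::size_t bubbleSort(std::vector<int>& a) {
    std::size_t comparisons = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Walk from the bottom of the list towards position i,
        // letting the smallest remaining item bubble up.
        for (std::size_t j = n - 1; j > i; --j) {
            ++comparisons;
            if (a[j] < a[j - 1]) {
                std::swap(a[j], a[j - 1]);
            }
        }
    }
    return comparisons;
}
```

For a list of 5 elements the passes make 4, 3, 2 and 1 comparisons, 10 in total, which matches (N-1)*N / 2.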

## Quick Sort

This is one of the most widely used internal sorting algorithms. In its basic form, it was invented by C.A.R. Hoare in 1960. Its popularity lies in its ease of implementation, moderate use of resources and acceptable behaviour for a variety of sorting cases. The basis of quick sort is the divide and conquer strategy, i.e. divide the problem (the list to be sorted) into sub-problems (sub-lists) until solved sub-problems (sorted sub-lists) are found. This is implemented as follows:

Choose one item A[I] from the list A[ ].

Rearrange the list so that this item is in the proper position, i.e., all preceding items have a lesser value and all succeeding items have a greater value than this item.

1. Place A[0], A[1] .. A[I-1] in sublist 1

2. A[I]

3. Place A[I + 1], A[I + 2] … A[N] in sublist 2

Repeat this procedure for sublist 1 and sublist 2 until A[ ] is a sorted list.

As can be seen, this algorithm has a recursive structure.

The 'divide' procedure is of utmost importance in this algorithm. This is usually implemented as follows:

1. Choose A[I] as the dividing element.

2. From the left end of the list (A[0] onwards), scan until an item A[R] is found whose value is greater than A[I].

3. From the right end of the list (A[N] backwards), scan until an item A[L] is found whose value is less than A[I].

4. Swap A[R] and A[L].

5. Continue steps 2, 3 and 4 until the scan pointers cross. Stop at this stage.

6. At this point, sublist1 and sublist2 are ready.

7. Now do the same for each of sublist1 and sublist2.
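The scan-and-swap partitioning can be sketched in C++ as below. This is one common variant, not necessarily the exact procedure the essay has in mind: it assumes the middle element as the dividing element, and its scans also stop on items equal to the dividing element, which keeps the pointers inside the list. The names `quickSort`, `lo` and `hi` are chosen for the example.

```cpp
#include <utility>
#include <vector>

// Sorts a[lo..hi] (inclusive bounds) in place.
void quickSort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;                    // zero or one item: already sorted
    int pivot = a[lo + (hi - lo) / 2];       // dividing element (assumed: middle item)
    int i = lo;                              // left scan pointer
    int j = hi;                              // right scan pointer
    while (i <= j) {
        while (a[i] < pivot) ++i;            // scan from the left end
        while (a[j] > pivot) --j;            // scan from the right end
        if (i <= j) {
            std::swap(a[i], a[j]);           // exchange the out-of-place pair
            ++i;
            --j;
        }
    }
    // The pointers have crossed: a[lo..j] and a[i..hi] are the two sublists.
    quickSort(a, lo, j);
    quickSort(a, i, hi);
}

// Convenience overload that sorts the whole list.
void quickSort(std::vector<int>& a) {
    if (!a.empty()) quickSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

Choosing the middle item rather than the first or last avoids the worst-case behaviour on input that is already sorted, although other inputs can still trigger it.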


EXTERNAL SORT GENERAL

## Merging Lists Outline

1. Load the next sorted runs R1 and R2 into main memory buffers B1 and B2 a page at a time (i.e., initially the first page from each run) (see left figure).

• Obviously R1 >= B1 and R2 >= B2 (a run might be larger than a buffer).

• The remaining pages will be loaded into main memory during subsequent steps.

2. Initialize indices i, j to the head of each list (i.e., i = j = 0).

3. Compare B1[i] with B2[j] and move the smallest item to the OUTPUT buffer.

• If B1[i] was the smallest item then i++, else j++ (see right figure).

• If OUTPUT gets full, it is appended to the end of a file on DISK and cleared in RAM.

4. Repeat the above until either index i or j reaches the end of its buffer.

• At this point write the remaining records to OUTPUT, flush it to disk and finish.

5. Repeat the procedure from 1-4 until all runs have been traversed.
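The compare, move and flush steps of the outline can be sketched in C++. This is a simplified, in-memory illustration under several assumptions: each run is held entirely in a `std::vector<int>` (page-at-a-time loading is omitted), the vector `disk` stands in for the output file, and the names `mergeRuns` and `outputCapacity` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Merges two sorted runs into `disk`, passing every record through a
// bounded OUTPUT buffer that is flushed whenever it fills up.
void mergeRuns(const std::vector<int>& run1, const std::vector<int>& run2,
               std::size_t outputCapacity, std::vector<int>& disk) {
    std::vector<int> output;
    output.reserve(outputCapacity);

    // Move one record to OUTPUT; when OUTPUT is full, append it to the
    // "file" and clear it in memory.
    auto emit = [&](int value) {
        output.push_back(value);
        if (output.size() >= outputCapacity) {
            disk.insert(disk.end(), output.begin(), output.end());
            output.clear();
        }
    };

    std::size_t i = 0, j = 0;                 // heads of the two runs
    while (i < run1.size() && j < run2.size()) {
        if (run1[i] <= run2[j]) emit(run1[i++]);
        else                    emit(run2[j++]);
    }
    // One run is exhausted: write the remaining records of the other.
    while (i < run1.size()) emit(run1[i++]);
    while (j < run2.size()) emit(run2[j++]);

    // Final flush of whatever is still buffered.
    disk.insert(disk.end(), output.begin(), output.end());
}
```

A real external sort would refill B1 or B2 from disk whenever a buffer is exhausted before its run is; the sketch leaves that out to keep the merge logic visible.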

A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in one minute. AlphaSort is a cache-sensitive, memory-intensive sort algorithm. We argue that modern architectures require algorithm designers to re-examine their use of the memory hierarchy. AlphaSort uses clustered data structures to get good cache locality, file striping to get high disk bandwidth, QuickSort to generate runs, and replacement-selection to merge the runs. It uses shared memory multiprocessors to break the sort into subsort chores. Because startup times are becoming a significant part of the total time, we propose two new benchmarks: (1) MinuteSort: how much can you sort in one minute, and (2) PennySort: how much can you sort for one penny.

Internal or External?

In an internal sort, the list of records is small enough to be maintained entirely in physical memory for the duration of the sort.

In an external sort, the list of records will not fit entirely into physical memory at once. In that case, the records are kept in disk files and only a selection of them are resident in physical memory at any given time.

The records stored in the list may be simple (e.g., string or int) or they may be complex structures. In any case, we will assume that:

– there is an implicit conversion of a record into its key value,

– key values may be compared using the usual relational operators (<, >, etc.).

The first assumption requires that the record type implementation provides an appropriate conversion operator.

The second assumption requires that the key type implementation provides an overloading of the relational operators.

None of these features are difficult for a client of the sortable list to implement, and the assumptions will make the implementation of the sortable list considerably simpler and more elegant.
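As a small illustration of the two assumptions, a record type can expose its key through a conversion operator. The `Employee` type, its fields and the `precedes` helper are hypothetical names chosen for this sketch; the key is assumed to be a plain `int`, which already has the relational operators.

```cpp
#include <string>

// Hypothetical record type whose key is the integer `id`.
struct Employee {
    int id;
    std::string name;

    // First assumption: implicit conversion of a record into its key value.
    operator int() const { return id; }
};

// Second assumption: keys are compared with the usual relational operators.
bool precedes(const Employee& a, const Employee& b) {
    int keyA = a;   // implicit conversion to the key
    int keyB = b;
    return keyA < keyB;
}
```

For a key type that is itself a class, the client would also overload `<`, `>` and the other relational operators for that key type.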

Sorting a list requires accessing the data elements stored in the list. For efficiency this suggests that the sort capability should be provided via member functions of the list class in question. We will follow that approach here, although the alternative is certainly feasible.

Building upon the LinkListT or ArrayT classes discussed earlier, we could derive sorted variants, overriding some member functions of the base class and adding new member functions for sorting.

If we do that, what base member functions must be overridden?

– The insertion functions must now take into account the ordering of the list elements, placing each new element in the proper location.

– A number of the base member functions are unsuitable for a sorted list type and must be disabled.

– The search functions may now take advantage of the list element ordering.

template

protected:

LinkNodeT

LinkNodeT

LinkNodeT

public:

LinkListT();

LinkListT(const LinkListT

LinkListT

~LinkListT();

bool isEmpty() const;

bool inList() const;

bool PrefixNode(Item newData);

bool AppendNode(Item newData);

bool InsertAfterCurr(Item newData);

bool Advance();

void gotoHead();

void gotoTail();

bool MakeEmpty();

bool DeleteCurrentNode();

bool DeleteValue(Item Target);

Item getCurrentData() const;

void PrintList(ostream& Out);

Inserting a new record into an ordered list requires:

– searching the list until the appropriate location is found

– creating an empty “slot” in the list to hold the new record

– placing the new record into that “slot”

The search phase is essentially trivial, aside from the issue of ties. Naturally, a binary search should be used where the list supports random access, as a contiguous list does.

The creation of an empty “slot” depends upon the type of list:

– For a contiguous list, the tail of the list must be shifted to make room for the new element.

– For a linked list, a new list node must be allocated and inserted.

Placing the record into the “slot” is trivial.
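For a contiguous list, the three steps can be sketched with the C++ standard library. This is an illustration only, not the LinkListT or ArrayT interface; the function name `insertOrdered` and the use of `std::vector<int>` are assumptions.

```cpp
#include <algorithm>
#include <vector>

// Inserts `record` into an already sorted contiguous list, keeping it sorted.
void insertOrdered(std::vector<int>& list, int record) {
    // Binary search for the first element greater than the record.
    // Ties are resolved by placing the new record after existing equal keys.
    auto slot = std::upper_bound(list.begin(), list.end(), record);
    // insert() shifts the tail of the list to open the slot, then fills it.
    list.insert(slot, record);
}
```

The binary search costs O(log N) comparisons, but shifting the tail still costs O(N) element moves in the worst case, so the insertion as a whole remains linear.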

The approach described so far maintains the list in sorted order at all times, by placing each new element in the appropriate location when it is inserted into the list.

For a list containing N elements, whether contiguous or linked, the average cost of ordered insertion of a single new element is Θ(N). When an exact analysis is done, performance is worse, of course, for a contiguous list.

Greater efficiency can be obtained if the list elements are inserted and then an efficient sort algorithm is applied to the list.

Which approach is preferred depends upon how often the list receives insertions, and whether those insertions tend to be grouped or isolated (in time).

Chapter 2

The Literature Review

Sorting is a field that has interested every researcher concerned with faster execution of database operations and with sorted arrays.

Many researchers have their own views on sorting. We first look at what they say about sorting, then list the problems that currently exist in sorting, and finally propose a possible solution.

Torsten Grust says that a DBMS does not execute a query as a large monolithic block but rather provides a number of specialized routines, the query operators.

• Operators are “plugged together” to form a network of operators, a plan, that is capable of evaluating a given query.

• Each operator is carefully implemented to perform a specific task well (i.e., time- and space-efficient).

Constantin Zopounidis offers his view of sorting. According to him:

When considering a discrete set of alternatives described by some criteria, there are four different kinds of analyses that can be performed in order to provide significant support to decision-makers (Roy, 1985): (1) to identify the best alternative or select a limited set of the best alternatives, (2) to construct a rank ordering of the alternatives from the best to the worst ones, (3) to classify/sort the alternatives into predefined homogenous groups, (4) to identify the major distinguishing features of the alternatives and perform their description based on these features. The former three approaches (choice, ranking, classification/sorting) lead to a specific evaluation outcome. In deriving this outcome, both choice and ranking are based on relative judgments and consequently the evaluation result depends on the considered set of alternatives. On the other hand, taking a classification/sorting decision the decision-maker needs to perform absolute judgments. Since the groups are usually specified independently of the alternatives under consideration, the classification/sorting of the alternatives requires their comparison to some reference profiles that distinguish the groups.

While both classification and sorting refer to the assignment of a set of alternatives into predefined

groups, they differ with respect to the way that the groups are defined. In that sense, classification refers to

the case where the groups are defined in a nominal way. On the contrary, sorting (a term which is widely

used by multicriteria decision aiding (MCDA) researchers) refers to the case where the groups are defined in

an ordinal way starting from those including the most preferred alternatives to those including the least

preferred alternatives. Both kinds of problems have numerous practical applications, included but not

limited to:

• Medicine: performing medical diagnosis through the classification of patients into diseases groups on the basis of some symptoms (Stefanowski and Slowinski, 1998; Tsumoto, 1998; Belacel, 2000; Michalowski et al., 2001).

• Pattern recognition: examination of the physical characteristics of objects or individuals and their classification into appropriate classes (Ripley, 1996; Young and Fu, 1997; Nieddu and Patrizi, 2000). Letter recognition is one of the best examples in this field.

• Human resources management: assignment of personnel into appropriate occupation groups according to their qualifications (Rulon et al., 1967; Gochet et al., 1997).

• Production systems management and technical diagnosis: monitoring the operation of complex production systems for fault diagnosis purposes (Nowicki et al., 1992; Catelani and Fort, 2000; Shen et al., 2000).

• Marketing: customer satisfaction measurement, analysis of the characteristics of different groups of customers, development of market penetration strategies, etc. (Dutka, 1995; Siskos et al., 1998).

• Environmental and energy management, ecology: analysis and measurement of the environmental impacts of different energy policies, investigation of the efficiency of energy policies at the country level (Diakoulaki et al., 1999; Rossi et al., 1999; Flinkman et al., 2000).

• Financial management and economics: business failure prediction, credit risk assessment for firms and consumers, stock evaluation and classification, country risk assessment, bond rating, etc. (Altman et al., 1981; Slowinski and Zopounidis, 1995; Zopounidis, 1998; Doumpos and Zopounidis, 1998; Greco et al., 1998; Zopounidis et al., 1999a,b).

This wide range of real-world applications of the classification/sorting problem has constituted a major

motivation for researchers in developing methodologies for constructing classification/sorting models. The

development of such models necessitates the consideration of a realistic framework that accommodates the

multidimensional nature of real-world decision-making problems. The development of multidimensional

classification models can be traced back to the work of Fisher (1936) on the linear discriminant analysis,

that was later extended to the quadratic form by Smith (1947). Both linear and quadratic discriminant

analysis have dominated the field of classification model development for several decades, along with logit/

probit analysis (Bliss, 1934; Berkson, 1944), which gained the research interest during the 1970s after the

work of McFadden (1974). While these statistical techniques have been heavily criticized for their statistical

assumptions (Altman et al., 1981), they provided the necessary basis for understanding the nature and the

peculiarities of the classification/sorting model development process and the objectives that this process

should accommodate, thus constituting a helpful basis for further research.

The recent research in developing classification/sorting models is based on operations research and

artificial intelligence techniques. Methodologies such as neural networks, machine learning, rough sets,

fuzzy sets and MCDA are considered by researchers both at the theoretical and practical levels. The research

made at the theoretical level focuses on different aspects of the model development and validation

process. At the practical level, researchers focus on the use of classification/sorting methodologies to analyze

real-world problems and provide decision support, or on the investigation of the performance of

different methodologies using real-world data. While all methodologies have both advantages and disadvantages,

their discussion is out of the scope of this paper, which focuses on the MCDA

approach for developing classification and sorting models. Compared to alternative approaches, MCDA

research does not focus solely on developing ”automatic” procedures for analyzing an existing data set in

order to construct a classification/sorting model. MCDA researchers also emphasize on the development of

efficient preference modeling methodologies that will enable the decision analyst to incorporate the

decision-maker’s preferences in the developed classification/sorting model. Following the MCDA framework,

the objective of this paper is to review the research conducted over the last three decades on the development

and application of MCDA classification and sorting methodologies.

The rest of the article is organized as follows. Section 2 provides a review of MCDA classification/

sorting techniques. Section 3 presents applications of these techniques in a variety of real-world

decision-making problems and lists some multicriteria decision support systems which have been developed for

classification and sorting model development. Finally, Section 4 concludes the paper and discusses some

interesting future research directions.

Günter J. Hitsch, Ali Hortaçsu, and Dan Ariely have studied matching and sorting. Sorting need not apply only to data; it can arise anywhere, for example among the users of an online dating site. According to them:

Based on the preference estimates, we predict who matches with whom using the algorithm

of David Gale and Lloyd S. Shapley (1962). Although the Gale-Shapley mechanism does not

provide a literal description of how matches are formed online (or offline), it has some appealing

features for which we consider it as a theoretical benchmark. First, Adachi (2003) shows that

the stable (equilibrium) matching predicted by the Gale-Shapley algorithm can be seen as the

limit outcome of a two-sided search and matching model with negligible search costs, which

resembles the institutional environment of online dating more closely. Second, the Gale-Shapley

model provides an efficiency benchmark, since the stable matching predicted by the algorithm is

also the Pareto-optimal match, within the set of stable matches, for the side of the market (men or

women) that makes the offers in the deferred acceptance procedure (Roth and Sotomayor 1990,

Theorem 2.12).

We first document that the users of the dating site sort along various attributes, such as age,

looks, income, education, and ethnicity. The Gale-Shapley model, based on the estimated mate

preferences, is able to predict these sorting patterns: the observed correlations in user attributes

largely agree with the correlations in the predicted stable matches. This finding provides an

out-of-sample confirmation of the validity of the estimated preferences, which are based only on data

of the users’ first-contact decisions, but not on information about the observed matches which

are typically achieved only after several e-mails are sent back and forth. Furthermore, the result

shows that in online dating, sorting can arise without any search frictions: mate preferences,

rational behavior, and the equilibrium mechanism by which matches are formed generate sorting

patterns that are qualitatively similar to those observed “offline.”

The strong agreement between the observed and predicted matches suggests that the online

dating market achieves an approximately efficient outcome within the set of stable matches.

We further confirm this result by showing that, on average, the site users would not prefer the

matches obtained under the Gale-Shapley mechanism to the actually achieved match. The conclusion

from these findings is that the design of the online dating site that we study provides an

efficient, frictionless market environment.

In the last part of this paper we explore to what extent our approach can also explain “offline”

sorting patterns in marriages. There are two caveats to this exercise. First, mate preferences

and sorting patterns in online dating may differ from the mate preferences and resulting sorting

patterns in a marriage market. Previous studies, however, do not support this objection: Edward

O. Laumann et al. (1994) demonstrate similar degrees of sorting along age, education, and

ethnicity/race across married couples, cohabiting couples, and couples in a short-term relationship

(we confirm these facts using data from the National Survey of Family Growth). Nonetheless,

in order to make statements about marriage patterns based on the preference estimates obtained

from our data, we need to assume that conditional on observable attributes, the users of our dating

site do not differ in their mate preferences from the population at large. Second, the

Gale-Shapley model is a nontransferable utility framework, which may be appropriate for dating,

while marriages may be better described by a transferable utility framework.

We cannot directly compare the attribute correlations observed online to correlations in marriages

due to differences between the online and offline populations along demographic characteristics.

Thus, we reweight our sample of Web site users to match the offline population along

key demographic attributes, and then predict equilibrium matches for this “synthetic” sample

of men and women. We find that the Gale-Shapley algorithm predicts sorting patterns that are

qualitatively, and for some attributes quantitatively, similar to sorting patterns in marriages. This

suggests that preferences are one important cause of offline sorting; the prevalence of ethnically

homogeneous marriages, for example, is unlikely to be due entirely to segregation. However, we

underpredict the exact degree of sorting along some attributes, most prominently for education.

One possible reason for the difference between the actual and predicted correlation in education

(and other attributes) is that search frictions are also one cause of sorting in marriages.

Finally, we attempt to uncover the importance of different preference components in driving

observed sorting patterns. As we discussed above, the horizontal and vertical components of

preferences can, in principle, lead to the same matching outcomes. Our framework enables us to

analyze the relative importance of these two different components by constructing counterfactual

preference profiles that omit one of the components and recomputing the equilibrium matches.

The result of the exercise indicates that “horizontal” preference components are essential in

generating the sorting patterns observed in the data. A similar exercise suggests that same-race

preferences are an essential source of the observed patterns of marriage within ethnic groups.

Our work is related to recent structural econometric work that estimates mate preferences

based on marriage data (Linda Y. Wong 2003; Eugene Choo and Aloysius Siow 2006;

Christopher J. Flinn and Daniela del Boca 2006). The common empirical strategy of these

papers is to fit observed marriage outcomes using a structural model of equilibrium match

formation, in which preferences are parametrized. While Flinn and Del Boca (2006) use the

Gale-Shapley model for the marriage market, Choo and Siow (2006) use a frictionless transferable

utility matching framework. Perhaps the paper that is closest to ours is Wong (2003),

which nests an equilibrium two-sided search model within a maximum likelihood procedure

to explain marriage outcomes in the Panel Study of Income Dynamics (PSID). She also utilizes

data on time-to-marriage to pin down the arrival rate of marriage opportunities. Compared to

these papers, our data contain more detailed mate attribute information; measures of physical

traits, for example, are not used by the papers noted above. Our setting also allows us

to observe the search process directly, providing us with information regarding the choice

sets available to agents, and enabling us to estimate mate preferences based on “first-contact”

behavior alone. On the other hand, our data on online dating are, by design, less related to

marital preferences than data on actual marriages.

Our work is also related to a recent series of papers utilizing data from “speed-dating” events

by Robert Kurzban and Jason Weeden (2005), Raymond Fisman et al. (2006, 2008), and Paul W.

Eastwick and Eli J. Finkel (2008). While the online dating sample we use is larger and, compared

to most of these papers, more representative of the population at large, our revealed preference

findings are similar. The main goal of our paper, however, is to characterize the equilibrium

market outcomes in online dating and marriage markets, which is not attempted by the aforementioned

papers.

Thomas Niemann says that Hash tables are a simple and effective method to implement dictionaries. Average time to search for an element is O(1), while worst-case time is O(n). Cormen [2009] and Knuth [1998] both contain excellent discussions on hashing.

## Theory

A hash table is simply an array that is addressed via a hash function. For example, in Figure 3-1, hashTable is an array with 8 elements. Each element is a pointer to a linked list of numeric data. The hash function for this example simply divides the data key by 8, and uses the remainder as an index into the table. This yields a number from 0 to 7. Since the range of indices for hashTable is 0 to 7, we are guaranteed that the index is valid.


To insert a new item in the table, we hash the key to determine which list the item goes on, and then insert the item at the beginning of the list. For example, to insert 11, we divide 11 by 8 giving a remainder of 3. Thus, 11 goes on the list starting at hashTable[3]. To find a number, we hash the number and chain down the correct list to see if it is in the table. To delete a number, we find the number and remove the node from the linked list.

Entries in the hash table are dynamically allocated and entered on a linked list associated with each hash table entry. This technique is known as chaining. An alternative method, where all entries are stored in the hash table itself, is known as open addressing and may be found in the references.
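The insert, find and delete operations under chaining can be sketched in C++ as follows. This is an illustration under assumptions, not Niemann's code: the class name `ChainedHashTable`, its member names, and the use of `std::list` for each chain are chosen for the example, and keys are taken to be unsigned so that the remainder is never negative.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <list>

constexpr std::size_t HASH_TABLE_SIZE = 8;   // matches the 8-element example

class ChainedHashTable {
public:
    // Remainder of the key divided by the table size, always 0..7 here.
    static std::size_t hash(unsigned key) { return key % HASH_TABLE_SIZE; }

    // Hash the key and insert the item at the beginning of its list.
    void insert(unsigned key) { table_[hash(key)].push_front(key); }

    // Hash the key and chain down the matching list.
    bool contains(unsigned key) const {
        const std::list<unsigned>& chain = table_[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Find the key and remove its node; returns false if it was absent.
    bool erase(unsigned key) {
        std::list<unsigned>& chain = table_[hash(key)];
        auto node = std::find(chain.begin(), chain.end(), key);
        if (node == chain.end()) return false;
        chain.erase(node);
        return true;
    }

private:
    std::array<std::list<unsigned>, HASH_TABLE_SIZE> table_;
};
```

With this table, the keys 11 and 3 both hash to index 3 and therefore share one chain, which is the situation chaining is designed to handle.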

If the hash function is uniform, or equally distributes the data keys among the hash table indices, then hashing effectively subdivides the list to be searched. Worst-case behavior occurs when all keys hash to the same index. Then we simply have a single linked list that must be sequentially searched. Consequently, it is important to choose a good hash function. Several methods may be used to hash key values. To illustrate the techniques, I will assume unsigned char is 8-bits, unsigned short int is 16-bits and unsigned long int is 32-bits.

Division method (tablesize = prime). This technique was used in the preceding example. A hashValue, from 0 to (HASH_TABLE_SIZE – 1), is computed by dividing the key value by the size of the hash table and taking the remainder. For example:

    typedef int HashIndexType;

    HashIndexType hash(int key) {
        return key % HASH_TABLE_SIZE;
    }

Selecting an appropriate HASH_TABLE_SIZE is important to the success of this method. For example, a HASH_TABLE_SIZE divisible by two would yield even hash values for even keys.
