Sunday, 20 November 2011

Sets, Bags and Sequences

Elementary or "naive" set theory is used to define basic mathematical structures. A set is an arbitrary collection of elements, which may be real or imaginary, physical or abstract. In mathematics, sets are usually composed of abstract things like numbers and points, but one can also talk about sets of apples, oranges, people, or canaries. In computer science, sets are composed of bits, bytes, pointers, and blocks of storage. In many applications, the elements are never defined, but are left as abstractions that could be represented in many different ways in the human brain, on a piece of paper, or in computer storage.

Curly braces are used to enclose a set specification. For small, finite sets, the specification of a set can be an exhaustive list of all its elements:
{1, 97, 63, 12}.

This specifies a set consisting of the four integers 1, 97, 63, and 12. Since the order of listing the elements is immaterial, the following specification is equivalent to the one above:
{12, 63, 97, 1}.
`If the set is very large, like the set of all mammals, a complete listing is impossible. It is hard enough to enumerate all the people in a single city, let alone all the cats, dogs, mice, deer, sheep, and kangaroos in the entire world. For such sets, the specification must state some rule or property that determines which elements are in the set:`
`{x | vertebrate(x) and warmBlooded(x) and hasHair(x) and lactiferous(x)}`
This specification may be read the set of all x such that x is vertebrate, x is warm blooded, x has hair, and x is lactiferous. A given set may be specified in more than one way. The following four specifications all determine the same set:
`{1, 2, 3}`
`{x | x is an integer and 0<x<4}{x | x is a positive integer, x divides 6, and x6}{x | x=1 or x=2 or x=3}`

A set specification that lists all elements explicitly is called a definition by extension. A specification that states a property that must be true of each element is called a definition by intension. Only finite sets can be defined by extension. Infinite sets must always be defined by intension or by some operations upon other infinite sets that were previously defined by intension.

In any theory using sets, there are two privileged sets: the empty set {}, which contains no elements at all, and the universal set U, which contains every element that is being considered. In mathematical discussions, for example, the universal set may be the set of all integers Z or the set of all real numbers R. In most discussions, the universal set is usually defined at the beginning of the presentation. Thereafter, other sets are built up from U: subsets of U, pairs of elements of U, sets of sets of elements from U, etc.

Of all the operators that deal with sets, the most basic is , which states whether a particular element is in a set: the notation xS means that x is an element of the set S; it may also be read x is a member of the set S or simply x is in S. All other operators on sets can be defined in terms of . Let A and B be any two sets. Following are the common operators of set theory; listed for each one is its name, standard symbol, informal English definition, and formal definition in terms of :
• Union. AB is the set that contains all the elements in either A or B or both:
`AB = {x | xA or xB}.`
• Intersection. AB is the set that contains all the elements that are in both A and B:
`AB = {x | xA and xB}.`
• Complement. -A is the set that contains everything in the universal set that is not in A:
`-A  = {x | xU and not xA}.`
• Difference. A-B is the set that contains all the elements that are in A but not in B:
`A-B = {x | xA and not xB}.`
• Subset. AB means that every element of A is also an element of B:
```If xA, then xB.
```
In particular, every set is a subset of itself: AA.
• Proper subset. A is a proper subset of B if AB and there is at least one element of B that is not in A:
`If xA,then xB and there exists some bwhere bB and not bA.`
• Superset. A is a superset of B if B is a subset of A.
• Empty set. The empty set has no elements: for every x, it is false that x{}. The empty set is a subset of every set, including itself: for every set A, {}A.
• Disjoint sets. Two sets A and B are said to be disjoint if they have no common elements; i.e., their intersection is the empty set: AB={}.
The operators for union, intersection, and complement satisfy several standard identities. Some of the identities listed below are similar to the rules of ordinary arithmetic. Addition and multiplication, for example, obey the rules of commutativity and associativity, and the minus sign obeys the rule of double complementation. Idempotency, absorption, and De Morgan's laws, however, do not hold for ordinary arithmetic. Distributivity holds for multiplication over addition, but addition does not distribute over multiplication.
• Idempotency. AA is identical to A, and AA is also identical to A.
• Commutativity. AB is identical to BA, and AB is identical to BA.
• Associativity. A(BC) is identical to (AB)C, and A(BC) is identical to (AB)C.
• Distributivity. A(BC) is identical to (AB)(AC), and A(BC) is identical to (AB)(AC).
• Absorption. A(AB) is identical to A, and A(AB) is also identical to A.
• Double complementation. - -A is identical to A.
• De Morgan's laws. -(AB) is identical to -A-B. and -(AB) is identical to -A-B.

For complex sets, the rule for determining which elements are in the set may be too complex to state in a single expression. An example is the set of all grammatical sentences in some language, natural or artificial. Such sets are typically specified by a recursive definition:
• First a finite starting set of elements is given.
• Then some operations are specified for generating new elements of the set from old elements.
• Finally, the set is defined to be the smallest set containing the starting elements and all others that can be derived from them by repeated application of the generating operations.
The set resulting from these operations is said to be the closure of the starting set under the given generating operations. As an example of a recursive definition, the set S of all positive integers not divisible by 3 could be specified by intension:
`S = {x | x is an integer, x>0, and 3 does not divide x}.`
But the property x is an integer depends on some prior definition of the set of all integers. The following recursive definition depends only on the operation of adding 3:
• Let the set {1, 2} be a subset of S.
• If x is any element of S, then x+3 is also an element of S.
• S is the smallest set that has the above two properties; i.e., S is a proper subset of any other set that has those properties.
All elements of S may be enumerated by starting with {1, 2}. The first stage of adding 3 generates the new elements 4 and 5, adding 3 to them gives 7 and 8, then 10 and 11, and so on. The set S is the closure of the set {1, 2} under the operation of adding 3. A recursive definition is a special kind of definition by intension. Formal grammars define languages by a recursive definition in which the generating operations are specified by a set of production rules.

A set has no duplicate elements. Since all duplicates are discarded in computing the union of two sets, the union operator is idempotent: AA=A. In some cases, one may want to allow duplicates; therefore, a bag is a collection of things with possible duplicates. Since there may be more than one occurrence of a given element x, the count operator @ is a generalization of the element operator . The expression x@A is the number of times the element x occurs in the bag A. Bags are useful for many purposes, such as taking averages: if four men have heights of 178cm, 184cm, 178cm, and 181cm, then the set of those numbers is {178, 181, 184} with the average 181; but the bag of the numbers is {178, 178, 181, 184} with average 180.25.
sequence is an ordered bag. To distinguish ordered sequences from unordered sets and bags, the elements of a sequence are enclosed in angle brackets: 178, 184, 178, 181; the empty sequence is written . If a sequence has n elements, the elements are numbered from 1 to n (or alternatively from 0 to n-1). A sequence of two elements is sometimes called an ordered pair; a sequence of three elements, a triple; a sequence of four, aquadruple; a sequence of five, a quintuple; and a sequence of n elements, an n-tuple. Historically, the theory of sets was first defined without considering order. On a piece of paper or in computer storage, however, the elements of a set must be listed in some order. Sequences are therefore easier to represent than bags, and bags are easier to represent than sets: a bag is a sequence with the ordering ignored, and a set is a sequence with both order and duplicates ignored.

New sets may be created by combining elements from the universe U in various ways. The cross product of two sets A and B, written A×B, is the set of all possible ordered pairs with the first element of each pair taken from Aand the second element from B. If A is the set {1,2} and B is the set {x,y,z}, then A×B is the set,
{1,x1,y1,z2,x2,y2,z}.

With the notation for defining a set by a property or rule, it is possible to give a general definition for the cross product A×B:

{x,y | xA and yB}.

The cross product can also be extended to three or more sets. The product A×B×C is defined as

{x,y,z | xAyB, and zC}.

Since René Descartes introduced pairs of numbers for identifying points on a plane, the cross product is also called the Cartesian product in his honor.

Inside a computer or the human brain, all sets that are explicitly stored must be finite. But mathematical definitions and proofs are generally simpler if there is no upper limit on the size of sets. Therefore, definitions in computer science often permit infinite sets, but with the understanding that any implementation will only choose a finite subset. Most infinite sets discussed in computer science are assumed to be countable: a countably infinite set is one whose elements can be put in a one-to-one correspondence with the integers. The set of all real numbers is uncountable, but such sets are far beyond anything that can be implemented in computer systems.

The terminology for sets is quite standard, although some authors use the word class for set and others make a distinction between classes and sets. Bags are not used as commonly as sets, and the terminology is less standard. Some authors use the word multiset for a bag. Sequences are sometimes called lists or vectors, but some authors draw distinctions between them. Some authors use the symbol ∅ for the empty set, but the notation {} is more consistent with the notation  for the empty sequence.