Peter Müller
Globewide Network Academy (GNA)
pmueller@uu-gna.mit.edu
This chapter is a refreshment of procedural programming. It is included here, because we don't know the background you have and because we want to present some general problems associated with procedural programming. If you understand the principles presented here, you are ready to follow the ``real'' object-oriented chapters. If you have problems in understanding this chapter, you might want to learn more about procedural programming prior to reading the following chapters.
Although this chapter is extremely technical and already provides code examples in a particular language, subsequent chapters will start with a higher level of abstraction. We will come back to this level of abstraction when we start introducing C++ as our object-oriented programming language.
Currently every software application has to be object-oriented. Object-orientation became to todays vogue word in software engineering. However, not all applications or programs which are declared to be object-oriented are it really. There is a trend to say that everything where ``objects'' can be identified is ``object-oriented''.
So, why does it make sense to use object-orientation? This tutorial focusses on object-oriented programming, hence, to answer the question let's start with a programming example. Programs use data structures to store data. Several data structures exist, for example lists, trees, arrays, sets, bags or queues to name a few. Each of these data structures can be characterized by their structure and their access methods.
You all know singly linked lists which use a very simple structure as shown in Figure 2.1.
Figure 2.1: Structure of a singly linked list.
It just provides access methods to append a new element to its end and to delete the element at the front. Complex data structures might use already existing ones. For example a queue can be structured like a singly linked list. However, queues provide access methods to put a data element at the end and to get the first data element (first-in first-out (FIFO) behaviour).
We will now present an example which we use to present some design concepts. Since this example is just used to illustrate some design concepts and problems it is neither complete nor optimal. Refer to chapter 10 for a complete object-oriented discussion about the design of data structures.
Suppose you want to program a list in a procedural programming language such as C or Pascal. As you believe that lists are a common data structure, you decide to implement it in a module. Typically, this requires to write two files: the interface definition and the implementation. For obvious reasons, let's stick to C as the implementation language. In C, the interface description is the include or header file. Your first sketch of your list module interface might look like this:
/* A singly linked list module. */ #ifndef __LIST1_H #define __LIST1_H int list_initialize(void); int list_append(void *data); int list_delete(void); void list_end(void); void *list_getFirst(void); void *list_getNext(void); #endif
Interface definitions just describe what is available and not how it is made available. You hide the information of the implementation in the implementation file. This is a fundamental principle in software engineering, so let's repeat it: You hide information of the actual implementation (information hiding). This enables you to change the implementation, for example to use a faster but more memory consuming algorithm for storing elements without the need to change other modules of your program: The call to the provided functions remains the same.
The idea of this interface is as follows: Before using the list one have to call list_initialize() to initialize variables local to the module. The following two functions implement the mentioned access methods append and delete. The append functions needs a more detailed discussion. Function list_append() takes one argument data of arbitrary type. This is necessary since you wish to use your list in several different environments, hence, the type of the data elements to be stored in the list is not known beforehand. Consequently, you have to use a void pointer which allows to point to any kind of data. The third function list_end() needs to be called when the program terminates to enable the module to clean up its internally used variables. For example you might want to release allocated memory.
With the last two functions getFirst() and getNext() you offer a simple mechanism to traverse through the list. Traversing can be done using the following loop:
SomeDataType *datap; datap = (SomeDataType *) list_getFirst(); while (datap) { doSomething(datap); datap = (SomeDataType *) list_getNext(); }
Since you use void pointers you have to cast them to the actual data types stored in your list.
Now you have a list module which allows you to use a list with any type of data elements. But what, if you need more than one list in one of your projects?
You decide to redesign your list module to be able to manage more than one list. You therefore create a new interface description which now includes a definition for a list handle. This handle is used in every function to uniquely identify the list in question. Your include file of your new list module looks like this:
/* A list module for more than one list */ #ifndef __LIST2_H #define __LIST2_H typedef struct list_handle_struct *list_handle_t; list_handle_t list_create(); void list_destroy(list_handle_t this); int list_append(list_handle_t this, void *data); void *list_getFirst(list_handle_t this); void *list_getNext(list_handle_t this); #endif
You use typedef to define a new type list_handle_t which represents your list handle. The type is a pointer to a struct where the actual representation is not further specified. You also hide the implementation detail of this type in your implementation file. Note the difference to the previous version where you just hide functions. Now you also hide information for an user defined data type called list_handle_struct.
You use list_create() to obtain a handle to a new thus empty list. Every other function now contains the special parameter this which just identifies the list in question. All functions now operate on this handle rather than a module global list.
Now you might say, that you can create list objects. Each such object can be uniquely identified by its handle and only those methods are applicable which are defined to operate on this handle.
The example above shows, that you already program with some object-oriented concepts in mind. However, the example implies some problems which we will outline now.
In the example every time you want to use a list, you explicitly have to declare a handle and perform a call to list_create() to obtain a valid one. After the use of the list you must explicitly call list_destroy() with the handle of the list you want to be destroyed. If you want to use a list within a function, say, foo() you use the following code frame:
void foo() { list_handle_t list; list = list_create(); /* Do something with the list. */ ... list_destroy(list); }
Let's compare the list with other data types, for example an integer. Integers are declared within a particular scope (for example within a function). Once you've defined them, you can use them. Once you leave the scope (for example the function where the integer was defined) the integer is lost. It is automatically created and destroyed. Some compilers even initialize newly created integers to a specific value, typically 0 (zero).
Where is the difference to list ``objects''? The lifetime of a list is also defined by its scope, hence, it must be created once the scope is entered and destroyed once it is left. On creation time a list should be initialized to be empty. Therefore we would like to be able to define a list similar to the definition of an integer. A code frame for this would look like this:
void foo() { list_handle_t list; /* List is created and initialized */ /* Do something with the list. */ ... } /* List is destroyed */
Decoupling of data and operations leads usually to a structure based on the operations rather than the data: Modules group common operations (such as those list_...() operations) together. You then use these operations by providing explicitly the data to them on which they should operate. The resulting module structure is therefore oriented on the operations rather than the actual data. One could say that the defined operations specify the data to be used.
In object-orientation, structure is organized by the data. You chooses the data representations which best fit your requirements. Consequently, your programs get structured by the data rather than operations. Thus, it is exactly the other way around: The data specifies valid operations. Now modules group data representations together.
In our list example we have to use a void pointer to allow the list to carry any data we like. This implies, that the compiler cannot guarantee for type safety. Consider the following example which compiles without any compiler warnings nor errors:
void foo() { SomeDataType data1; SomeOtherType data2; list_handle_t list; list = list_create(); list_append(list, (void *) &data1); list_append(list, (void *) &data2); /* Oops */ ... list_destroy(list); }
It is in your responsibility to ensure that your list is used consistently. A possible solution is to additionally add information about the type to each list element. However, this implies more overhead and does not prevent you from knowing what you are doing.
What we would like to have is a mechanism which allows us to specify on which data type the list should be defined. The overall function of the list is always the same, whether we store apples, numbers, cars or even lists. Therefore it would be nice to declare a new list with something like:
list_handle_t<Apple> list1; /* a list of apples */ list_handle_t<Car> list2; /* a list of cars */
The corresponding list routines should then automatically return the correct data types. The compiler should be able to check for type consistency.
The list example include operations to traverse through the list. Typically a cursor is used for that purpose which points to the current element. This implies a traversing strategy which defines the order in which the elements of the data structure are to be visited.
For a simple data structure like the singly linked list one can think of only one traversing strategy. Starting with the leftmost element one successively visits the right neighbours until one reaches the last element. However more complex data structures such as trees can be traversed using different strategies. Consequently it should be possible to separate the actual representation or shape of the data structure from its traversing strategy. We will investigate this in more detail in chapter 10.
What we have shown with the traversing strategy applies to other strategies as well. For example insertion might be done such that an order over the elements is archieved or not.
#define sum(a, b) ((a) + (b))
would replace any occurrences of ``sum(a, b)'' with ``((a) + (b))''. For example
sumOf1and2 = sum(1, 2);
would result in a replacement by the preprocessor to
sumOf1and2 = ((1) + (2));
which assigns ``3'' to sumOf1and2. Thus the preprocessor is really a textual replacement of occurrences of macros defined with ``#define''.
How could this mechanism be used to imitate the parameterized list definitions as presented in section 2.2.3? What changes must be made to the method definitions?
Note: Without changing the interface definition macros cannot be used. Use them for the method calls, not for list object definitions.
Note: Assume that you have a list module which offers an interface as described in section 2.1 ``Handling multiple lists''.