Table of Contents

Object databases
The Object Oriented Programming Model
Classes and Objects
Encapsulation
Inheritance
Polymorphism
User Defined Data Types
Identity
Natural modeling of relationships among objects
Summary
Limits of Conventional Database Systems
Limited data types
Limited modeling of data relationships
No way of grouping code with data
Limited manipulation of data
Poor integration with programming languages
Summary
Persistence and Object Oriented Databases for C++
Persistence
Persistence and object orientation
Resolving references
One-to-many references
Queries
Object management
Transient members
Summary
What can you expect from POET?


Object databases


Object databases are a powerful new tool for software developers (footnote #1). Unlike relational and table-oriented systems, they provide full support for the object oriented programming model used in languages like C++ and Smalltalk. This model is intuitive, good at modeling relationships, and very suitable for large software projects.

Conventional databases are good at managing large amounts of data, sharing data among programs, and fast value-based queries. They are not very good at modeling the relationships among data - everything must be represented as a series of two-dimensional tables.

An object database combines the semantics of an object oriented programming language with the data management and query facilities of a conventional database system. This makes it easy to manage large amounts of data and model the relationships among the data. If an object database is integrated with an object oriented language, then it should support the semantics of that language: relationships established in the program should automatically be represented in the database when objects are stored. This chapter discusses object oriented systems, conventional databases, and object database systems. We will see that object databases have significant advantages compared to conventional table-oriented and relational databases. Small applications will be less complex and easier to understand. Large or complex applications gain a simple, intuitive structure, which may mean the difference between success and failure.

footnote #1: Object databases are sometimes called object oriented databases (OODBs) or object oriented database management systems (ODBMS or OODBMS). We prefer the term "object databases", which is simpler and does not contribute to the proliferation of acronyms in the software industry.


The Object Oriented Programming Model


The first object oriented programming language, Simula, was designed for simulations, as was C++. Natural modeling of objects and their relationships was a driving factor in the design of these languages. Programming in this model is fun; your programs consist of lots of objects asking each other to do things. The early popularity of object oriented languages like Smalltalk was due to the fact that people enjoyed programming in them. People who wanted to do software engineering used more bureaucratic and boring languages.

Today object oriented techniques are often seen as a modern form of software engineering. Over the years we have come to realize that modeling is a good way to develop well-structured software systems. The same structures that help us to express the semantic relationships among objects can be used to design programs that are modular, contain well defined interfaces, and are structured along the lines of the problem that is to be solved. Intelligent use of the object oriented programming model results in programs that are better structured and easier to understand. To understand why this is so we must introduce you to the basics of the object oriented paradigm.


Classes and Objects


Traditional programming languages allow related data to be grouped using data structures. Related code may be grouped by placing it in one program file. A data structure is often directly related to a set of functions which provide it with a certain behavior. For instance, a C program might have a data structure called a circle. This circle might have a radius, a center, a color, and a width and color for drawing the margin. A set of functions will also have to be written for manipulating circles; these functions might draw the circle on the screen, resize the circle, or change its color. These functions make sense only in relation to the circle data structure. In a traditional programming environment we tend to think of the data structure as the circle, and the functions manipulate the circle. However, nothing in the data structure is round; it is actually just a set of parameters that can be used to create a circle. The function that draws a circle converts the data structure into a circle on the screen. Neither the data structure nor the set of associated functions is a circle by itself.
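
To make the contrast concrete, here is a minimal sketch of the traditional approach. It is written in the C subset of C++, so it compiles as either language; the names circle, circle_resize, circle_set_color, circle_area and circle_draw, and the use of an int for the color, are illustrative assumptions. Nothing ties the functions to the structure except their names and parameter types.

```cpp
#include <stdio.h>

/* The data structure: only a set of parameters that describe a circle. */
struct circle
{
    int radius;
    int center_x;
    int center_y;
    int color;              /* hypothetical color code */
};

/* Separate functions that make sense only for struct circle. */
void circle_resize(struct circle *c, int new_radius)
{
    c->radius = new_radius;
}

void circle_set_color(struct circle *c, int new_color)
{
    c->color = new_color;
}

double circle_area(const struct circle *c)
{
    return 3.14159265358979 * c->radius * c->radius;
}

/* "Drawing" is simulated by printing a description of the parameters. */
void circle_draw(const struct circle *c)
{
    printf("circle at (%d, %d), radius %d, color %d\n",
           c->center_x, c->center_y, c->radius, c->color);
}
```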

Object oriented programs allow the programmer to combine related code and data in one structure called an object. The definition of this object is called a class. In our example there might be a class called Circle which contains both the data for a circle and the functions needed to draw it, change it, or report its characteristics. For instance, this could be the class declaration for our Circle:

class Circle
{
private:
    int radius;
    int center_x;
    int center_y;
public:
    void Draw();                    // draw the circle on the screen
    float ComputeArea();            // report a characteristic
    void SetRadius(int NewRadius);  // change the circle
    int SetCenter(int x, int y);
};

Now we can create a circle, set its center and radius, and draw it on the screen:

Circle c;
c.SetRadius(50);
c.SetCenter(100, 200);
c.Draw();
float area = c.ComputeArea();

Classes and objects are simple and elegant even for small programs, but they really shine when used in large, complex projects.
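
The declaration above names member functions without defining them. A minimal runnable completion might look like the following sketch. The function bodies are assumptions: Draw() returns a text description instead of drawing on the screen, so the sketch runs without a graphics library, and the int result of SetCenter(), which the text does not explain, is assumed to be a status code with 0 meaning success.

```cpp
#include <string>

class Circle
{
private:
    int radius = 0;
    int center_x = 0;
    int center_y = 0;
public:
    // Assumption: describe the circle instead of rendering it.
    std::string Draw() const
    {
        return "circle at (" + std::to_string(center_x) + ", " +
               std::to_string(center_y) + "), radius " + std::to_string(radius);
    }
    float ComputeArea() const { return 3.14159265f * radius * radius; }
    void SetRadius(int NewRadius) { radius = NewRadius; }
    // Assumption: the int result is a status code, 0 meaning success.
    int SetCenter(int x, int y)
    {
        center_x = x;
        center_y = y;
        return 0;
    }
};
```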


Encapsulation


Encapsulation refers to two properties of objects which have already been alluded to. First, related code and data are grouped together into one entity called an object. This simplifies the structure of a program by stating explicitly how code and data are related. Second, the class definition can hide some of its members from the rest of the program. In our example all of the data members are 'private'. This means that they can be accessed only by the functions which are members of the Circle class. The rest of the members are 'public', and these serve as the interface to the class. You can change anything in the private part of a class without changing the source code that uses the class (although that code must still be recompiled). If the public interface changes, then code using the class may also have to be changed.
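
As an illustration, suppose the Circle's private representation is changed to store a diameter instead of a radius. In this sketch, code that uses only the public interface keeps working unchanged; GetRadius() is a hypothetical accessor added so the effect can be observed.

```cpp
class Circle
{
private:
    int diameter = 0;   // was 'radius'; callers cannot see the change
public:
    // The public interface is the same as before.
    void SetRadius(int NewRadius) { diameter = 2 * NewRadius; }
    int GetRadius() const { return diameter / 2; }
};
```

A statement such as c.diameter = 10; elsewhere in the program would not compile, because diameter is private.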


Inheritance


Our circle will probably have other data and functions which are needed by any shape. For instance, our circle may have a color, a position, or a line width, and there may be functions that change these values. If every shape has to have these, we can group them into a class called Shape:

class Shape
{
private:
    Color shape_color;      // Color is a struct or class
    Color line_color;
    int x_position;
    int y_position;
    int line_width;
public:
    void SetColor(Color NewColor);
    void SetLineColor(Color NewColor);
    void SetLineWidth(int Width);
    void MoveTo(int x, int y);
};

Now we can give our circle everything that a shape has simply by saying that a circle is a shape. The C++ syntax looks like this:

class Circle : public Shape
{
private:
    int radius;
public:
    void Draw();
    float ComputeArea();
    void SetRadius(int NewRadius);
    int SetCenter(int x, int y);
};

In object oriented systems we call this "inheritance". Since a Circle is a Shape it inherits everything that a Shape has. For instance, we can create a circle and set its color and position:

Circle c;
c.SetColor (Chartreuse);
c.MoveTo (100, 200);

Inheritance simplifies the maintenance of code. If we decide to add a new function or data member to every shape, we do not have to search through our files and make our changes to circles, ovals, squares, trapezoids, etc. Instead, we simply change the Shape class. Since all shapes are derived from Shape, these changes affect every shape in the program.
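
A runnable sketch of this Shape hierarchy follows. Color is assumed to be a small enum (the text leaves its definition open), and GetColor(), X() and Y() are hypothetical accessors added only so the inherited state can be checked. Square is included to show that a second derived class receives the same members without repeating them.

```cpp
// Assumption: Color is a small enum here; the text only says "struct or class".
enum Color { Black, Chartreuse, Navy };

class Shape
{
private:
    Color shape_color = Black;
    Color line_color = Black;
    int x_position = 0;
    int y_position = 0;
    int line_width = 1;
public:
    void SetColor(Color NewColor)     { shape_color = NewColor; }
    void SetLineColor(Color NewColor) { line_color = NewColor; }
    void SetLineWidth(int Width)      { line_width = Width; }
    void MoveTo(int x, int y)         { x_position = x; y_position = y; }

    // Hypothetical accessors, added only so the example can be checked.
    Color GetColor() const { return shape_color; }
    int X() const { return x_position; }
    int Y() const { return y_position; }
};

// Circle adds only what is specific to circles; the rest is inherited.
class Circle : public Shape
{
private:
    int radius = 0;
public:
    void SetRadius(int NewRadius) { radius = NewRadius; }
};

// A second shape gets the same Shape members without repeating them.
class Square : public Shape
{
private:
    int side = 0;
public:
    void SetSide(int NewSide) { side = NewSide; }
};
```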


Polymorphism


Every shape should be able to draw itself on the screen. It is not very easy to write a Draw() function which works for any arbitrary shape, so we will probably have to write a separate Draw() function for each shape. However, it would be nice to be able to write general purpose functions which work properly with any shape. For instance, we might want to write a function which moves an arbitrary shape:

void Move(Shape *shape, int new_x, int new_y)
{
    shape->Erase();
    shape->MoveTo(new_x, new_y);
    shape->Draw();
}

The function call shape->Draw() needs to be able to draw any shape, but we have already said that we need a different Draw() function for each shape. If the shape is a circle then the function Circle::Draw() should be called, if it is a trapezoid then Trapezoid::Draw() should be called. C++ implements polymorphism with virtual functions. In our example the Shape class has a virtual function called Draw():

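class Shape
{
public:
    virtual void Draw();
    virtual void Erase();
    // ... other Shape members as shown earlier
};
class Circle : public Shape
{
public:
    virtual void Draw();    // Circle::Draw overrides Shape::Draw()
    virtual void Erase();   // Circle::Erase overrides Shape::Erase()
};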

Since Draw() is a virtual function, Circle::Draw() overrides Shape::Draw(). This means we can call our Move() function with the address of a circle, and Move() will call Circle::Draw(); if Erase() is declared virtual in the same way, Move() will call Circle::Erase() as well.
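
The following sketch makes the dispatch observable. Instead of drawing, each function appends its own name to a log (an assumption made for the example); the Trapezoid class and the draw_log vector are illustrative. Because Draw() and Erase() are virtual in Shape, the single Move() function calls the versions that belong to the object's actual class.

```cpp
#include <string>
#include <vector>

// Assumption: record calls instead of drawing, so dispatch can be checked.
std::vector<std::string> draw_log;

class Shape
{
public:
    virtual ~Shape() = default;
    virtual void Draw()  { draw_log.push_back("Shape::Draw"); }
    virtual void Erase() { draw_log.push_back("Shape::Erase"); }
    void MoveTo(int x, int y) { x_pos = x; y_pos = y; }
protected:
    int x_pos = 0;
    int y_pos = 0;
};

class Circle : public Shape
{
public:
    void Draw() override  { draw_log.push_back("Circle::Draw"); }
    void Erase() override { draw_log.push_back("Circle::Erase"); }
};

class Trapezoid : public Shape
{
public:
    void Draw() override  { draw_log.push_back("Trapezoid::Draw"); }
    void Erase() override { draw_log.push_back("Trapezoid::Erase"); }
};

// Works for any shape: each virtual call is resolved at run time
// according to the class of the object 'shape' points to.
void Move(Shape *shape, int new_x, int new_y)
{
    shape->Erase();
    shape->MoveTo(new_x, new_y);
    shape->Draw();
}
```

The override keyword (C++11) plays the role of repeating virtual in the derived class, and additionally makes the compiler check that a matching virtual function exists in the base class.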


User Defined Data Types


In object oriented languages it is possible to add new data types. In fact, a new data type is just a class. For instance, we can define a class called Complex which contains both the data structures and the functions needed to implement complex arithmetic. When we define a new data type, it is often useful to redefine operators like +, -, *, or /. In our current example, we might define these operators to support complex arithmetic. Now we can write statements like:

Complex f = 2.43;
f *= 3.141592654;
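
A minimal sketch of such a Complex class, assuming only what the two statements above require (construction from a double and a *= operator), plus +=, + and * for illustration. The accessors Real() and Imag() are hypothetical. The constructor is deliberately not explicit, which is what allows Complex f = 2.43; to convert the double implicitly.

```cpp
class Complex
{
private:
    double re;
    double im;
public:
    // Not 'explicit', so a double converts implicitly to a Complex.
    Complex(double real = 0.0, double imag = 0.0) : re(real), im(imag) {}

    double Real() const { return re; }
    double Imag() const { return im; }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex& operator*=(const Complex& other)
    {
        double new_re = re * other.re - im * other.im;
        double new_im = re * other.im + im * other.re;
        re = new_re;
        im = new_im;
        return *this;
    }

    Complex& operator+=(const Complex& other)
    {
        re += other.re;
        im += other.im;
        return *this;
    }
};

// Binary operators built on the compound assignment operators.
inline Complex operator*(Complex lhs, const Complex& rhs) { return lhs *= rhs; }
inline Complex operator+(Complex lhs, const Complex& rhs) { return lhs += rhs; }
```

Standard C++ also provides std::complex in the <complex> header; the hand-written class only illustrates how a user-defined type is built.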


Identity


Conventional database systems distinguish entities by the values they contain. Object oriented systems give each object its own identity. It can be distinguished from any other object, even if another object contains exactly the same values, and references can be made to any object. In C++ the identity of an object is its address, which can be used in pointer references throughout the program.
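
The difference between equal values and identity can be shown in a few lines. In this sketch (the Point type and both helper functions are illustrative), two objects with the same contents are equal by value but are still distinct objects, because their addresses differ.

```cpp
struct Point
{
    int x;
    int y;
};

// Value-based comparison: do the two objects contain the same values?
bool SameValues(const Point& a, const Point& b)
{
    return a.x == b.x && a.y == b.y;
}

// Identity: are these the same object, i.e. at the same address?
bool SameObject(const Point& a, const Point& b)
{
    return &a == &b;
}
```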


Natural modeling of relationships among objects


The major problem in large, complex software systems is managing the relationships among components. Objects are well suited to the natural modeling of relationships, which is a major reason that object oriented languages are so useful for large projects. Computer scientists often talk about modeling the world with ISA and HASA relationships. The ISA relationship represents the phrase "is a", and is used to denote set/subset relationships: a programmer ISA person, a laser printer ISA printer, and a help window ISA window. The HASA relationship represents the phrase "has a", and is used to denote components or relationships: a programmer HASA salary, a laser printer HASA current font, and a help window HASA border. The object oriented programming model has direct support for these concepts. ISA is modeled through the class hierarchy; since a programmer ISA person, Person will be the base class for Programmer. HASA is modeled either by containment or by pointer references. For instance, since a laser printer HASA current font, our LaserPrinter object might contain a Font object or a pointer to a Font object.
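
A sketch of these relationships in C++, using the text's own examples; the member names (name, salary, face, size_pt, current_font) are illustrative assumptions. Containment stores the component inside the object, while a pointer reference lets the component exist independently and be shared.

```cpp
#include <string>

class Person
{
public:
    std::string name;
};

// ISA: a programmer is a person, so Person is the base class.
class Programmer : public Person
{
public:
    double salary = 0.0;           // HASA by containment
};

class Font
{
public:
    std::string face;
    int size_pt = 12;
};

class LaserPrinter
{
public:
    Font* current_font = nullptr;  // HASA by pointer reference
};
```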

Our laser printer probably has lots of fonts besides the active one. We need some way to represent a set of references, which we might call a HASMANY relationship. Object oriented languages often do this through container classes, but there is no set of container classes available in all C++ implementations. Later we will show how we have solved this problem by implementing our own containers to be used with the database.
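
Since the 1998 ISO standard, the C++ standard library has included container classes such as std::vector. A sketch of the HASMANY relationship using one follows; it is not the container POET provides, and AddFont() and SelectFont() are hypothetical helpers.

```cpp
#include <string>
#include <vector>

class Font
{
public:
    std::string face;
};

class LaserPrinter
{
public:
    std::vector<Font*> fonts;       // HASMANY: every font the printer has
    Font* current_font = nullptr;   // HASA: the active one

    void AddFont(Font* f) { fonts.push_back(f); }

    // Make the named font current; returns false if the printer lacks it.
    bool SelectFont(const std::string& face)
    {
        for (Font* f : fonts)
        {
            if (f->face == face)
            {
                current_font = f;
                return true;
            }
        }
        return false;
    }
};
```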


Summary


The strength of object oriented programming is that programs closely reflect the structure of the problem to be solved. Related code and data are grouped into objects, each of which has a clearly defined public interface. The logical relationships among objects can be explicitly stated using inheritance and polymorphism. Object oriented programs are modular, easy to understand, and easy to maintain.


Limits of Conventional Database Systems


Database systems are designed for managing large amounts of data, and they provide many important features that object oriented programming languages do not: permanent storage, fast queries, sharing of objects among programs, device-independent formats, and sophisticated error handling for database operations. Relational database systems and table-oriented systems based on B-trees or the Indexed Sequential Access Method (ISAM) are the standard systems currently used in most software development. Each requires that all data be portrayed as a series of two-dimensional tables. The relational model defines the structures, operations, and design principles to be used with these tables. These systems are quite appropriate for some applications and were a real breakthrough in their time, but software developers are rapidly learning that life is not a series of two-dimensional tables. The growing complexity of modern programs and the increasing use of dynamic data models have pushed traditional databases to their limit. The limited data models they support can result in significant software development costs, since they do not allow program designs that closely match the problem domain. For some application areas, like Computer Aided Design, Computer Aided Engineering, Multimedia and Office Automation, they are not even worth considering.


Limited data types


Conventional databases have a fixed set of data types. The better systems include both simple data types like INTEGER, FLOAT, or CHAR and complex data types like DATE, TIME, or CURRENCY. Modern software systems, however, often contain data types which are not easily modeled using such predefined types. For example, a CAD program might have an array of shapes, or a desktop publishing program might model a page as a series of frames which may contain bit maps, paragraphs, or vector drawings. We have already seen that object oriented programs allow us to declare new data types as needed.

New data types cannot be added by the user. If your database does not have the data type you need, you are stuck. Aggregate data types like arrays are rarely available. The only way to group data is to put it in a table.


Limited modeling of data relationships


In conventional database systems each item is represented as a row in a table. Tables may be accessed sequentially or by searching for values. The only way to express relationships among items is by setting values in the rows. In each table one or more columns are chosen as the primary key, whose value must be unique for each row in the table. For instance, the primary keys for a student, a teacher, and a class might each be represented as identification numbers.

The relational model is weak when showing many-to-many relationships, which generally require the introduction of a new table. In our example a student may take many classes and a class may have many students, so the only way to show which students are taking a class is to create an 'enrollment' table with one row for each student in each class, containing the student identification number and the class identification number. Since relational databases have no concept of hierarchy, it is difficult to model the ISA relationship. Suppose we have a 'people' table, a 'students' table, and a 'teachers' table. Every student is also a person, so a given student will have fields in both the 'people' table and the 'students' table. To update all of a student's information you must find the rows of each table whose identification numbers match. Every level in the hierarchy requires a new table, and every program using the database must update every relevant table appropriately. The hierarchy is not explicitly represented in the database; you simply have to know why the various tables are there.
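
The contrast can be sketched in C++. The first half models the tables as vectors of rows linked only by identification numbers, so finding the students in a class means scanning the enrollment table and matching IDs; the second half stores the same many-to-many relationship directly as pointers in the objects. All names are illustrative, and the class is called Course because class is a C++ keyword.

```cpp
#include <string>
#include <vector>

// Relational style: rows refer to each other only through ID values.
struct StudentRow    { int student_id; std::string name; };
struct EnrollmentRow { int student_id; int class_id; };   // the extra table

// Which students take a class? Scan the enrollment table, then look up
// each matching student ID in the student table.
std::vector<std::string> StudentsInClass(const std::vector<StudentRow>& students,
                                         const std::vector<EnrollmentRow>& enrollment,
                                         int class_id)
{
    std::vector<std::string> names;
    for (const EnrollmentRow& e : enrollment)
        if (e.class_id == class_id)
            for (const StudentRow& s : students)
                if (s.student_id == e.student_id)
                    names.push_back(s.name);
    return names;
}

// Object style: each object holds references to the related objects.
struct Student;
struct Course
{
    std::string title;
    std::vector<Student*> students;
};
struct Student
{
    std::string name;
    std::vector<Course*> courses;
};
```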


No way of grouping code with data


We have already seen that object oriented programming languages allow related code and data to be combined to form objects. There is no way to do this in a conventional database system. If you know the name of a table you may use it, and the system will not prevent you from changing the wrong table. As long as you have the right password, everything in the database is globally accessible to all of your code.


Limited manipulation of data


Database languages are often very poor at manipulating data. SQL, for instance, provides an elegant query language but only limited operations for manipulating data, and these are not completely consistent. To be formal we would say that SQL is not computationally complete even though it is relationally complete; a normal human being might say that SQL is great for searching but lousy for anything else. Because of this, most serious relational database applications use languages like COBOL or C for data manipulation; database manipulation and queries are done using embedded SQL, an interface which allows SQL statements to be inserted in programs written in other languages.


Poor integration with programming languages


In the last section we mentioned that most serious database applications are written in conventional programming languages - but these languages do not understand tables and rows, and SQL does not understand the data structures of other languages. Since the database and the host programming language use two different models and different data types, the programmer has two choices: write the program in terms of the database's tables and rows, or use the host language's own data structures and constantly convert between the two systems.

The first method does not let the programmer use many features of the host language; the second means a great deal of overhead and frustration, since the relationships among data must be constantly converted to support both programming models. Such a program has two distinct designs, one for the program itself and one for the database.
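
The conversion burden of the second method can be sketched as follows. The program's Employee objects point to their managers; to store them, each pointer must be replaced by the manager's key, and after reading the rows back, every key must be looked up and turned into a pointer again. The types, the function names, and the convention that a manager_id of 0 means "no manager" are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Program side: objects linked by pointers.
struct Employee
{
    int id;
    std::string name;
    Employee* manager;    // nullptr if none
};

// Database side: a flat row; the pointer has become a key value.
struct EmployeeRow
{
    int id;
    std::string name;
    int manager_id;       // 0 means "no manager" (assumed convention)
};

// Object -> row: replace the pointer with the referenced object's key.
EmployeeRow ToRow(const Employee& e)
{
    return EmployeeRow{e.id, e.name, e.manager ? e.manager->id : 0};
}

// Rows -> objects: create every object first, then turn keys back into
// pointers. std::map nodes do not move, so the pointers stay valid.
void FromRows(const std::vector<EmployeeRow>& rows, std::map<int, Employee>& objects)
{
    for (const EmployeeRow& r : rows)
        objects[r.id] = Employee{r.id, r.name, nullptr};
    for (const EmployeeRow& r : rows)
        if (r.manager_id != 0)
            objects[r.id].manager = &objects.at(r.manager_id);  // throws if the key is dangling
}
```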


Summary


To be stored in a conventional database, data must be dissected into a series of two-dimensional tables. Only predefined data types are supported. Object oriented programming languages have a rich set of features for creating data types and representing the relationships among data, and these features are not supported in such databases.


Persistence and Object Oriented Databases for C++


An object database is a database which fully supports the object oriented model. Like an object oriented programming language, it is designed for expressing the relationships among data. Like a conventional database, it is designed for managing large amounts of data and fast value-based queries. Persistence is a language extension which allows the programmer to store and retrieve objects. We have chosen to implement persistence as an extension to C++, which is widespread, portable, efficient, easily extensible, and particularly good at expressing the relationships among objects. You program in standard C++ and use your favorite compiler. Our language extensions are limited to the declaration syntax for persistent classes. We provide a precompiler, which converts your persistent classes to vanilla C++ code, and a class library, which implements the object database.

The rest of this chapter briefly discusses the basic features of POET. It is somewhat abstract - the next chapter will give you a concrete understanding of POET with short programs that show it actually being used.


Persistence


The original implementation of Smalltalk had a simple method for storing objects; the program's entire memory image could be dumped to disk and restored when running the program later. This scheme has some real advantages. It is very simple to implement, requires almost no effort from the programmer, and fully implements all aspects of the programming language (after all, the program sticks everything in memory somewhere!). It also has some real disadvantages. The number of objects that can be stored depends on the amount of available main memory, only the whole programming context may be stored and retrieved, objects may not be shared among programs or retrieved on another kind of computer, and there is no way to implement intelligent error recovery.

In POET, a class is persistent if it is defined using the 'persistent' keyword. Every object of a persistent class has the ability to store itself in a database.


Persistence and object orientation


If you store an object in a database and read it back, it should behave exactly as though it had never been stored. The object you read from the database must have the same identity, encapsulation, inheritance structure, polymorphism, and references as the original object. Many "object oriented" database systems flunk this test! POET correctly handles all aspects of an object's identity and behavior.


Resolving references


POET automatically converts your C++ pointers and references to a form that can be stored in the database. The objects or data to which they refer are also stored. This means that POET does not just store objects, it also stores the relationships among objects that are found in your C++ program. When you read an object from the database all references are resolved, the referenced objects or data are loaded into memory, and your pointers are set to the appropriate RAM addresses. This means that you can access objects directly using the C++ pointers in your object - everything you need is sitting at the right place in your RAM! If your data structures are densely connected or large, then you may want to decide when to load referenced data and objects. POET allows you to do this with on demand references.

To do this correctly POET needs to know the location and type of all pointers and references in your persistent objects. The PTXX precompiler parses your persistent class definitions and stores your class definitions in the database.
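
The idea behind resolving references can be sketched in standard C++. This is not POET's implementation: Store() walks everything reachable from an object and replaces pointers with object IDs, and Load() recreates each stored object once and sets the pointers to the new in-memory addresses. Both functions remember what they have already handled, so shared references and cycles are stored and loaded only once. Here the knowledge of where the pointers are is written by hand into each function; according to the text, POET obtains it from the class definitions through the PTXX precompiler. Person, StoredPerson and Database are illustrative names.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical persistent class: a person who may point to a partner.
struct Person
{
    std::string name;
    Person* partner = nullptr;
};

// Stored form: the pointer is replaced by an object ID (0 means null).
struct StoredPerson
{
    std::string name;
    int partner_id = 0;
};

using Database = std::map<int, StoredPerson>;   // object ID -> record

// Store p and everything reachable from it; returns p's object ID.
int Store(const Person* p, Database& db, std::map<const Person*, int>& ids)
{
    if (p == nullptr)
        return 0;
    auto found = ids.find(p);
    if (found != ids.end())
        return found->second;                    // already stored: reuse its ID
    int id = static_cast<int>(ids.size()) + 1;
    ids[p] = id;                                 // register first, so cycles terminate
    db[id].name = p->name;
    int partner_id = Store(p->partner, db, ids); // store the referenced object too
    db[id].partner_id = partner_id;
    return id;
}

// Load the object with this ID and resolve its references.
Person* Load(int id, const Database& db, std::map<int, std::unique_ptr<Person>>& loaded)
{
    if (id == 0)
        return nullptr;
    auto found = loaded.find(id);
    if (found != loaded.end())
        return found->second.get();              // already in memory: same object
    const StoredPerson& record = db.at(id);
    loaded[id] = std::make_unique<Person>();
    Person* p = loaded[id].get();                // register first, so cycles terminate
    p->name = record.name;
    p->partner = Load(record.partner_id, db, loaded);  // pointer set to a RAM address
    return p;
}
```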


One-to-many references


Objects often need to reference many other objects. For instance, a father may have many children. C++ does not have a standard way for expressing one-to-many relationships. POET provides a container class called a set which can be used to hold a variable number of items. You can place sets in your objects to hold references when you have one-to-many relationships.


Queries


Using queries, you can find objects in your database. The result of a query is stored in a set. This set can be sorted based on any values in the object. Queries can specify values for any data member in an object, and can also specify values of objects referenced by an object. To speed up data access you can define indexes for your classes.
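
A rough sketch of these two ideas in standard C++, not POET's query interface: Query() selects matching objects into a result set and sorts it by a member, and BuildAgeIndex() builds an index as a sorted map from key value to object, so a range of values can be found without scanning every object. The Person type and both function names are illustrative.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Person
{
    std::string name;
    int age;
};

// Select objects with age >= min_age into a result set sorted by name.
// The result points into 'all', which must not change while it is used.
std::vector<const Person*> Query(const std::vector<Person>& all, int min_age)
{
    std::vector<const Person*> result;
    for (const Person& p : all)
        if (p.age >= min_age)
            result.push_back(&p);
    std::sort(result.begin(), result.end(),
              [](const Person* a, const Person* b) { return a->name < b->name; });
    return result;
}

// An index on 'age': keys are kept sorted, so lookups need no full scan.
std::multimap<int, const Person*> BuildAgeIndex(const std::vector<Person>& all)
{
    std::multimap<int, const Person*> index;
    for (const Person& p : all)
        index.emplace(p.age, &p);
    return index;
}
```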


Object management


Each object may exist only once in memory. This ensures that changes made to an object in one part of a program will not be overwritten by another part of the same program. POET is careful to avoid duplicating objects. Whenever a database operation loads an object, POET first checks to see if it is already in memory. If so, it simply returns a pointer to the existing object.

Since each object may have any number of references to it, it is not safe to simply delete the object. But your memory fills up quickly if you never delete anything! POET uses a counter to keep track of the number of references made to each object. When you are done with a reference, you call the object's Forget() method. If there are no other active references, the object is deleted.
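
Both mechanisms can be sketched together. The classes below are hypothetical and only borrow the name Forget() from the text: ObjectCache::Load() returns the object already in memory when there is one, so each object exists only once, and counts each reference handed out; Forget() releases one reference and deletes the object when none remain. Where a real database would read the object from disk, this sketch simply creates it.

```cpp
#include <map>
#include <memory>

class ObjectCache;

// Hypothetical persistent object that counts references made to it.
class CachedObject
{
public:
    CachedObject(ObjectCache* cache, int id) : cache_(cache), id_(id) {}
    int Id() const { return id_; }
    int RefCount() const { return refs_; }
    void Forget();   // release one reference; defined after ObjectCache
private:
    friend class ObjectCache;
    ObjectCache* cache_;
    int id_;
    int refs_ = 0;
};

class ObjectCache
{
public:
    // Return the object if it is already in memory; otherwise create it.
    CachedObject* Load(int id)
    {
        std::unique_ptr<CachedObject>& slot = objects_[id];
        if (!slot)
            slot = std::make_unique<CachedObject>(this, id);  // a database would read from disk here
        ++slot->refs_;
        return slot.get();
    }
    bool InMemory(int id) const { return objects_.count(id) != 0; }
private:
    friend class CachedObject;
    std::map<int, std::unique_ptr<CachedObject>> objects_;
};

void CachedObject::Forget()
{
    if (--refs_ == 0)
    {
        ObjectCache* cache = cache_;   // copy what is needed before *this is destroyed
        int id = id_;
        cache->objects_.erase(id);     // deletes this object; no members are used afterwards
    }
}
```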


Transient members


Objects sometimes contain data or references to data that should not be stored. For instance, an object may contain a pointer to a bit image which is needed only temporarily. You can define these members to be transient so that they will not be stored in the database.
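
A sketch of what transient means, assuming a hand-written save and load routine rather than POET's mechanism: Save() writes only the persistent members of the hypothetical Picture class, so the temporary bit_image pointer never reaches the store, and an object read back by LoadPicture() starts with that member null.

```cpp
#include <sstream>
#include <string>

// Hypothetical class with one transient member.
struct Picture
{
    std::string title;                    // persistent
    int width = 0;                        // persistent
    unsigned char* bit_image = nullptr;   // transient: temporary, never stored
};

// Write only the persistent members.
std::string Save(const Picture& p)
{
    std::ostringstream out;
    out << p.width << ' ' << p.title;
    return out.str();
}

// Read the persistent members back; transient members keep their defaults.
Picture LoadPicture(const std::string& data)
{
    std::istringstream in(data);
    Picture p;
    in >> p.width;
    in.get();                   // skip the separating space
    std::getline(in, p.title);  // the title may contain spaces
    return p;
}
```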


Summary


POET is integrated with the C++ programming language. It lets you program in standard C++ using your favorite compiler. POET is fully object oriented - the objects you read from the database look and act just like the objects you stored. Moreover, POET automatically resolves your pointer references and stores referenced objects and data in the database. This means that POET does not just store objects, it also stores the relationships among objects that are found in your C++ program. When you load an object these relationships are restored. POET also allows you to do value-based queries, just as you would in conventional database systems.


What can you expect from POET?


POET gives you an object database with full support for the semantics of C++. It is powerful and easy to use. When we say that POET is object oriented we mean that it uses classes and objects to provide these features:

Encapsulation

Inheritance

Polymorphism

User-defined data types

Identity

Natural modeling of relationships among objects

When we say that it is a database we mean that it provides these features:

Long term storage

Large capacity for storage

Value-based queries

Sharing objects among programs

Device-independent formats

Sophisticated error handling for database operations

We find that an object database should support:

Resolution of references in the program

One-to-many references

Value-based queries with sorted results

Indexes

Intelligent object management

Transient members

Object oriented programming is well suited for large and complex programs. Databases are well suited for large amounts of data and for fast value-based queries. POET gives you both. When you use POET you have all the tools you need for complex programs which access large amounts of data.


Copyright (c) 1996 POET Software, Inc. All rights reserved. Reproduction in whole or in part in any form or medium without the express permission of POET Software, Inc. is prohibited.