Are you one of those people who are sure that in the year 2050 most databases will be using the relational model and the SQL language? If not, this article is for you.
What is wrong with the relational model and why don't I like SQL?
Well the model does not seem to be bad and the language is good. However FORTRAN also worked very well. Why then develop other languages like C and C++? FORTRAN gurus probably believed that it was nonsense and called those working on new languages idiots. FORTRAN is well suited for math applications however it does not fare well for developing applications with advanced GUI. It worked well in character based screen environments but the interface has changed since the IBM370 times.
I think that somewhat the same situation is happening with relational databases. The relational model works well with the documents that came to computer world from being paper based. In the pre-computer era those documents were kept on cards or paper based tables. Such tables were often based on fixed number of columns and variable number of rows. The same paradigm was transferred to computers. Obviously E.F. Codd made a great contribution since he developed the mathematical foundation on which processing of such information is based. Operations on tables produce another table. That is all wonderful, but the shape of the table was still the same. I do not want to say that hierarcical or network models of data existing at the same time were better than the relational one. The relational model is an excellent match for financial application, however this model puts strong restrictions on the way the data can be organized. Representation of all information in relational form is not always possible and can be very artificial at best.
Let's take a table from a relational database, but tweak the relational paradigm a little bit and see what will happen:
One of the cornerstones of the relational model is atomicity of the field. But why can the field not be a structure? Let's say that this is possible. In that case the fields of that structure could be structures themselves. Looks like the field has acquired hierarchy and the table got another dimension. Such structuring of the information allows retrieval of all information about the object in one call.
C/C++ has a very powerful concept called pointers. A pointer does not itself hold value but points to the location where the value is held. In case of databases the pointer should point to the location of the information. This pointer is not the common key field that joins two relational tables. The key as it is does not exist. The pointer is resembles a link from one web page to another one.
The pointer holds the value of a unique identifier that can be translated to the address of the record. Therefore each record must have a unique identifier. We can retrieve information from another table using the pointer. The pointer links a field in one record and a record in another (or the same) table. When retrieving the information by pointer there is no need for a search, so the access rate is faster.
Let's move further. Why must the number of fields in different records be the same? Suppose some information is the same for every record but we also give each record some individuality. Actually the same goal can be achieved in the relational model but it will require creation of one extra table per new field. In our case we will extend our definition of record - the record will have constant set of fields with any regular relational table plus variable set of fields that we will call properties.
There is one more suggestion. Why should any particular field have the same type in each row of the table? Why can't the field hold its own definition? Whether the field is atomic or a structure, it can hold its own definitions.
One of the very important changes in the approach to programming during last 10-15 years was wide acceptance of object-oriented languages. I think that simple translation of that paradigm to the world of databases did not prove itself but there are some qualities that we can still borrow. The main building block of any object-oriented language is a class, which is roughly a set of data and methods that work with that data. Add methods (functions) to the data we have been discussing and the result resembles and OO class.
Now our table has a more complicated structure. It still has rows but the rows are of variable length and type. One can see that the relation model is a subset of the one proposed here. It also incorporates features of earlier hierarchical and network database models as well as includes the so-called Entity-Attribute-Value model.
We have one little problem here however. It is not very logical to name such a complex structure as table. I think it would be better to transform the term 'Table' to 'Class' and the term 'Record' to 'Object'.
So what do we finally have here?
1.Class has an arbitrary number of fields and methods.
2.The object (instance of the class) has a constant set of fields defined at the class level and variable set of fields that are pertinent to that particular instance.
3.The field can hold either a value or be a pointer to the object.
4.The field can be atomic or be a structure itself.
5.The structure of the field can be self-describing (holds its own schema)
Based on these principles, I have been developing a DBMS that I call ConteXt. During this process I have discovered that the major portion of application programmer time is spent on writing user interfaces. Usually the DBMS has only the "database engine" that supplies data access. An application programmer writes the interface between application and the "database engine" using SQL. This approach follows in the path of database products such as Access and FoxPro where tools for defining a user interface were included. ConteXt has a database browser (I call it DB-browser) that allows interaction between the end user and the data held in ConteXt. This browser incorporates various user interface and administration functions. The use of such tools changes the way to write database applications. Now the programmer does not write the whole application but just a few compoments to "help" the browser. All standard functions responsible for data display, manipulation, search, editing etc. are already incorporated into the browser. The programmer can create methods that are triggered when the user performs some actions. If there are no methods defined the browser invokes default actions.
To display information the browser uses so called forms. The forms are pertinent to a class and define how the instances of that class are being displayed. Class can have an unlimited number of associated forms. The database browser can also work over the web in the context of any standard web browser that supports Java Applets. Along with the regular fields ConteXt also has "virtual fields". They are not present in instances of the class but are being calculated at runtime. Other than that, there is virtually no difference between regular and virtual fields.
You can download evaluation versions ConteXt for Linux, FreeBSD and Windows from http://www.unixspace.com/context . There is much else that can be done. For example, the browser currently does not support self-describing structured fields.
If anybody wants to participate in this project I would welcome it.
You should read "The Third Manifesto" by C J Date and Hugh Darwen. Among other things they define the "Two Great Blunders" made by OO databases - and you have just committed both of them. They are, in no particular order:
- introducing pointers into the database
- believing that a table (relation variable) is a class (domain)
They also show that the ONE good thing that OO can contribute is user-defined datatypes, and that the relational model has always supported those anyway (though commercial DBMSs have not until recently, and still not well).
Before you try to overthrow relational, it is best to understand it. Yes, it will still be around in 2050. What you are proposing is a big step backwards to pre-relational, hierarchic databases with pointers.
ConteXt is not a hierarchic databases with pointers. ConteXt is not a pure OO database. The Context database model can be considered as a collection of object-oriented, network and semistructured models or as a some kind of object database. In other words this is a flexible model, you can use any type of database structure depending on task.
Concerning "believing that a table (relation variable) is a class (domain)"
It depend on what we call 'class'. The class in ConteXt is a directory that contains: a file - a common schema of object, executable file with methods, internal class containing forms, and one or several files containing a set of objects. We can call 'class' only the schema of the object and its associated methods. In this case a class does not include objects and it is not a variable. The problem is - we have the terms 'class' and 'object' (record) but we have no term for an array (set)
(Sorry, the last phrase in my previous message has been lost )
The problem is - we have the terms 'class' and 'object' (record) but we have no term for an array (set) of objects. Physically it is a file (or several files) but I have no idea how it should be named (extent?).