PDA

View Full Version : How to model searchable properties of an entity


Dilip Angal
08-13-04, 02:14
Hi
I have been struggeling with this question for a while.

Let us say I have Part# and it has very large number of flexible
attributes defined by the user, such as color, width, height, etc...
Depending on Part number, these attributes may change.

To make the things worse, they are searchable, meaning show me all the
parts which are blue in color and whose width is less than 10 inches,
etc.

Now how do I model my parts table.
If I make, name value pair, them my search will has as many self joins
as I have ands in my query.
If I model attributes as columns, then I don't even know all the
possible attributes I can have and besides, even if I know them, not
all attributes are applicable to all the parts and hence I have really
screwed up table

Any help will be appreciated.
Dilip

Alan
08-13-04, 08:04
"Dilip Angal" <dilip_angal@yahoo.com> wrote in message
news:df683587.0408122214.51852da3@posting.google.c om...
> Hi
> I have been struggeling with this question for a while.
>
> Let us say I have Part# and it has very large number of flexible
> attributes defined by the user, such as color, width, height, etc...
> Depending on Part number, these attributes may change.
>
> To make the things worse, they are searchable, meaning show me all the
> parts which are blue in color and whose width is less than 10 inches,
> etc.
>
> Now how do I model my parts table.
> If I make, name value pair, them my search will has as many self joins
> as I have ands in my query.
> If I model attributes as columns, then I don't even know all the
> possible attributes I can have and besides, even if I know them, not
> all attributes are applicable to all the parts and hence I have really
> screwed up table
>
> Any help will be appreciated.
> Dilip


This is a common problem, usually discussed in university courses. Is this
for a class assignment?

Dilip Angal
08-13-04, 14:20
"Alan" <not.me@uhuh.rcn.com> wrote in message news:<V62Tc.11115$dG.2722@trndny02>...
> "Dilip Angal" <dilip_angal@yahoo.com> wrote in message
> news:df683587.0408122214.51852da3@posting.google.c om...
> > Hi
> > I have been struggeling with this question for a while.
> >
> > Let us say I have Part# and it has very large number of flexible
> > attributes defined by the user, such as color, width, height, etc...
> > Depending on Part number, these attributes may change.
> >
> > To make the things worse, they are searchable, meaning show me all the
> > parts which are blue in color and whose width is less than 10 inches,
> > etc.
> >
> > Now how do I model my parts table.
> > If I make, name value pair, them my search will has as many self joins
> > as I have ands in my query.
> > If I model attributes as columns, then I don't even know all the
> > possible attributes I can have and besides, even if I know them, not
> > all attributes are applicable to all the parts and hence I have really
> > screwed up table
> >
> > Any help will be appreciated.
> > Dilip
>
>
> This is a common problem, usually discussed in university courses. Is this
> for a class assignment?

No Not for Class assignment. I have a real life need.
I would be curious to find out what are the solutions given in university courses.
Dilip

--CELKO--
08-13-04, 21:58
>> Let us say I have Part# and it has very large number of flexible
attributes defined by the user, such as color, width, height, etc...
<<

Fire the DBA immediately; users NEVER get to define the database and
this bum is not doing his job.

>> Depending on Part number, these attributes may change.<<

No, the values of the attributes will change. If the attributes
change, we have a new entity, by definition. Or a bad data model.

>> To make the things worse, they are searchable, meaning show me all
the parts which are blue in color and whose width is less than 10
inches, etc. <<

If the goods are that weird and changable, look at using a Google or
EBay document search engine. Put a relational inventory system on the
front end tpo track the document side of the house.

Very few businesses work this way, just as very people actually search
on oddball random attributes.

>> If I make, name value pair, them my search will has as many self
joins
as I have ands in my query. <<

This design error is called EAV (entity-attribute-value) and it is
common enough to have a name -- like "cancer" :)

The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.

Dilip Angal
08-14-04, 02:11
CELKO
May be I did not clarify my problem.
My business needs are I need to allow users to define attributes on
the fly.
This is not the DBA's fault.

For e.g Part# 1001 has 20 attributes and part # 1002 has 10.
User may choose to add 1 more to Part#20.
That does not change the part entity. It only adds the attribute.
If I do that I will have as many entities as number of parts I have in
database.

ABout your point, that data model is screwed is,.
well that is my exact question. What should be the data model for this
business neeed.

In your own example, If I had 200 cars and each car having one or
other attribute different than others, (different meaning, it is
specified for one car but not for other) then I will end up with 200
tables.
In my case I have 100,000 parts. I will end up with 100,000 tables.

Dilip

celko212@earthlink.net (--CELKO--) wrote in message news:<18c7b3c2.0408131758.65f91710@posting.google.com>...
> >> Let us say I have Part# and it has very large number of flexible
> attributes defined by the user, such as color, width, height, etc...
> <<
>
> Fire the DBA immediately; users NEVER get to define the database and
> this bum is not doing his job.
>
> >> Depending on Part number, these attributes may change.<<
>
> No, the values of the attributes will change. If the attributes
> change, we have a new entity, by definition. Or a bad data model.
>
> >> To make the things worse, they are searchable, meaning show me all
> the parts which are blue in color and whose width is less than 10
> inches, etc. <<
>
> If the goods are that weird and changable, look at using a Google or
> EBay document search engine. Put a relational inventory system on the
> front end tpo track the document side of the house.
>
> Very few businesses work this way, just as very people actually search
> on oddball random attributes.
>
> >> If I make, name value pair, them my search will has as many self
> joins
> as I have ands in my query. <<
>
> This design error is called EAV (entity-attribute-value) and it is
> common enough to have a name -- like "cancer" :)
>
> The classic scenario calls for a root class with all the common
> attributes and then specialized sub-classes under it. As an example,
> let's take the class of Vehicles and find an industry standard
> identifier (VIN), and add two mutually exclusive sub-classes, Sport
> utility vehicles and sedans ('SUV', 'SED').
>
> CREATE TABLE Vehicles
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) NOT NULL
> CHECK(vehicle_type IN ('SUV', 'SED')),
> UNIQUE (vin, vehicle_type),
> ..);
>
> Notice the overlapping candidate keys. I then use a compound candidate
> key (vin, vehicle_type) and a constraint in each sub-class table to
> assure that the vehicle_type is locked and agrees with the Vehicles
> table. Add some DRI actions and you are done:
>
> CREATE TABLE SUV
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
> CHECK(vehicle_type = 'SUV'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE Sedans
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
> CHECK(vehicle_type = 'SED'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> I can continue to build a hierarchy like this. For example, if I had
> a Sedans table that broke down into two-door and four-door sedans, I
> could a schema like this:
>
> CREATE TABLE Sedans
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
> CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Vehicles(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE TwoDoor
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
> CHECK(vehicle_type = '2DR'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Sedans(vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> CREATE TABLE FourDoor
> (vin CHAR(17) NOT NULL PRIMARY KEY,
> vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
> CHECK(vehicle_type = '4DR'),
> UNIQUE (vin, vehicle_type),
> FOREIGN KEY (vin, vehicle_type)
> REFERENCES Sedans (vin, vehicle_type)
> ON UPDATE CASCADE
> ON DELETE CASCADE,
> ..);
>
> The idea is to build a chain of identifiers and types in a UNIQUE()
> constraint that go up the tree when you use a REFERENCES constraint.
> Obviously, you can do variants of this trick to get different class
> structures.
>
> If an entity doesn't have to be exclusively one subtype, you play with
> the root of the class hierarchy:
>
> CREATE TABLE Vehicles
> (vin CHAR(17) NOT NULL,
> vehicle_type CHAR(3) NOT NULL
> CHECK(vehicle_type IN ('SUV', 'SED')),
> PRIMARY KEY (vin, vehicle_type),
> ..);
>
> Now start hiding all this stuff in VIEWs immediately and add an
> INSTEAD OF trigger to those VIEWs.

mAsterdam
08-14-04, 04:05
Dilip Angal wrote:

> CELKO

Mind if I join?

> May be I did not clarify my problem.
> My business needs are I need to allow users to define attributes on
> the fly.

To me this suggests: for this requirement use searchable
documents with tags, not tables. 'Attribute' has several
meanings.

> This is not the DBA's fault.
>
> For e.g Part# 1001 has 20 attributes and part # 1002 has 10.
> User may choose to add 1 more to Part#20.

> That does not change the part entity. It only adds the attribute.
> If I do that I will have as many entities as number of parts I have in
> database.

> ABout your point, that data model is screwed is,.
> well that is my exact question. What should be the data model for this
> business neeed.
>
> In your own example, If I had 200 cars and each car having one or
> other attribute different than others, (different meaning, it is
> specified for one car but not for other) then I will end up with 200
> tables.
> In my case I have 100,000 parts. I will end up with 100,000 tables.

Model what you *can*, not what you cannot.


BTW could you please stop topposting?

Todd B
08-14-04, 13:08
dilip_angal@yahoo.com (Dilip Angal) wrote in message news:<df683587.0408122214.51852da3@posting.google.com>...
> Hi
> I have been struggeling with this question for a while.
>
> Let us say I have Part# and it has very large number of flexible
> attributes defined by the user, such as color, width, height, etc...
> Depending on Part number, these attributes may change.
>
> To make the things worse, they are searchable, meaning show me all the
> parts which are blue in color and whose width is less than 10 inches,
> etc.
>
> Now how do I model my parts table.
> If I make, name value pair, them my search will has as many self joins
> as I have ands in my query.
> If I model attributes as columns, then I don't even know all the
> possible attributes I can have and besides, even if I know them, not
> all attributes are applicable to all the parts and hence I have really
> screwed up table
>
> Any help will be appreciated.
> Dilip

create table inventory (
part_number int not null primary key;
part_name varchar(100)
);

create table attributes (
attribute_name varchar(20) not null primary key
);

create table attributed (
attribute_name varchar(20) not null,
part_number int not null,
foreign key attribute_name references attributes (attribute_name) on
update cascade on delete cascade,
foreign key part_number referenes inventory (part_number) on update
cascade on delete cascade,
primary key (attribute_name, part_number)
);

Of course, this only works if all of your attributes are of the type
varchar(20) and only allow an attribute on a single part once (i.e.
you can't be both 'blue' and in addition 'blue').

Also, if you have a hierarchical model (like in a bill of materials),
you might be better off looking into nested sets or nested intervals.
In that case, see some of Joe Celko's or Vadim Tropashko's work. It
would require, however, some procedural code, if you are not
disinclined to that.

Todd

Dawn M. Wolthuis
08-14-04, 13:59
"Dilip Angal" <dilip_angal@yahoo.com> wrote in message
news:df683587.0408132211.1ef5dc39@posting.google.c om...
> CELKO
> May be I did not clarify my problem.
> My business needs are I need to allow users to define attributes on
> the fly.
> This is not the DBA's fault.

Scenarios such as this are really quite common and typically toss SQL-DBMS
folks into either significant design errors or dog ugly designs (from my
perspective, that is, although they are elegant in their own way, see
Celko).

If you are saddled with an RDBMS, then this might not be helpful at all, but
if you are positioned to use an old-fashioned non-RDBMS solution (or a new
agile database, perhaps) you might want a scenario like this:

Don't have users defining stored data (what would be base tables in an
RDBMS) but permit them to define derived data, aka UDF's, aka virtual
fields. In particular, they could be granted the ability to provide a
synonym for an attribute and a redefinition of the type for that attribute
(DBA's are rolling their eyes). Then I can provide attributes named ATTR1,
ATTR2, ATTR3, etc and allow these to be redefined (by the user) as either
single-valued attributes or multi-valued lists of whatever type is needed.
This is a common approach when using IBM U2 databases, for example (UniData
or UniVerse) and is employed by at least one application software vendor I
know who has the best profitability over time (> 30 years) of just about any
application software provider (IIRC).

Although I would not recommend it, you could even permit the same "dummy
field" to be redefined differently for different records, with synonyms for
the very same field, even if it is employed as a number for one subclass of
this entity and alpha for another.

It is this type of flexibility and the related maintenance complexities that
RDBMS's were built to avoid. However, they seem to have thrown the baby out
with the bath water.

Signed --old DP professional

--CELKO--
08-14-04, 15:02
>> May be I did not clarify my problem. My business needs are I need
to allow users to define attributes on the fly. This is not the DBA's
fault. <<

But it is his responsibility. You do not let users define attributes
on the fly. NEVER.

If you do, you will soon have "color" from an American, "colour" from
a Brit, "clr" from a geek who likes to abbreviate, and "clor" from
someone with bad typing skills. The values will "red" from the
English speakers, "rouge" from the Francophones and the Pantone color
number from the people who really use colors (http://www.pantone.com).
Likewise, numeric values will be given on different scales and to
different precision.

Data integrity lasts about a year in actual use with this kind of
design.

>> That does not change the part entity. It only adds the attribute.
<<

What is the BASIC definition of an entity in the RDBMS? An entity
*IS* defined by a single set of attributes and it is modeled in a
table with columns that have those attributes.

The whole point is that this no longer the original Parts table when
it changes. If I started with a wooden charriot, added a motor,
rubber tires, a metal body, etc. would it still be that same charriot?
Nope, I converted it into an automobile.

You are thinking like an OO person in a relational database.

>> About your point, that data model is screwed, well that is my exact
question. What should be the data model for this business need. <<

I gave you one way to do class heirarchies with PK-FK references. The
overhead is pretty bad if the nesting gets deep or very wide. But you
have to actually design each final sub-class.

>> In your own example, If I had 200 cars and each car having one or
other attribute different than others, (different meaning, it is
specified for one car but not for other) then I will end up with 200
tables. <<

Yes, this is why we have a limited range of types of vehicles and
options, which are defined by law for tax purposes, marketing, etc.
The customizations are off to the side in 200 small tables, if you
really needed huge amounts of data on them. Any answer is going to be
VERY industry-specific.

The ultimate "high volume, high turnover rate, mixed inventory"
problem is EBay. They use text for their descriptions and do document
management, not RDBMS. They do let the users write their own data,
but use parsing routines to put the item in a category and extract key
words for text searching.

Laconic2
08-14-04, 15:47
"--CELKO--" <jcelko212@earthlink.net> wrote in message
news:18c7b3c2.0408141102.52d7eb35@posting.google.c om...
> >> May be I did not clarify my problem. My business needs are I need
> to allow users to define attributes on the fly. This is not the DBA's
> fault. <<
>
> But it is his responsibility. You do not let users define attributes
> on the fly. NEVER.

I get your point, but the attitude you recommend sometimes just does not
prevail in the real world.

It reminds me of the lawyer, commenting on some celebrity case, who says,
"I would never permit my client to talk to the press." That's a good
poistion to take, but the lawyer isn't in control, much as he'd like to be.

It would be better to simply tell such a person what is going to go wrong.
You did this, quite well, in the rest of the post that i quoted above.

Dilip Angal
08-15-04, 22:04
I still don't know how to convince you guys.
You are all pointing to the process issues that users may give
different spellings for color.
I can handle that part by having an admin to create the properties who
makes sure that the property being added to the part is really
required and is not used in past by any one in any different context.

But the basic fact does not change.
I need to keep adding these properties from time to time because user
communitiy can not decide all of them upfront.

Also, I may have 500 such properties and I can have up to 1M different
part numnbers. This will give you some idea of complexity.
User may choose, show me all the parts with width 10 inches, and price
less than $10 and .... can go on for ever. I need to come back in
resonable time (Reasonable can be couple of seconds)

Also note that the Search engine techniques used by EBya like company
works only if these attributes are no updated. If they are ever
updated, Search engines like inverted index fails miserably.

So, please try to understand my problme and accept it as it is and if
you have solution to solve it, please let me know.
Dilip

toddkennethbenson@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.0408140908.6cef8c9f@posting.google.com>...
> dilip_angal@yahoo.com (Dilip Angal) wrote in message news:<df683587.0408122214.51852da3@posting.google.com>...
> > Hi
> > I have been struggeling with this question for a while.
> >
> > Let us say I have Part# and it has very large number of flexible
> > attributes defined by the user, such as color, width, height, etc...
> > Depending on Part number, these attributes may change.
> >
> > To make the things worse, they are searchable, meaning show me all the
> > parts which are blue in color and whose width is less than 10 inches,
> > etc.
> >
> > Now how do I model my parts table.
> > If I make, name value pair, them my search will has as many self joins
> > as I have ands in my query.
> > If I model attributes as columns, then I don't even know all the
> > possible attributes I can have and besides, even if I know them, not
> > all attributes are applicable to all the parts and hence I have really
> > screwed up table
> >
> > Any help will be appreciated.
> > Dilip
>
> create table inventory (
> part_number int not null primary key;
> part_name varchar(100)
> );
>
> create table attributes (
> attribute_name varchar(20) not null primary key
> );
>
> create table attributed (
> attribute_name varchar(20) not null,
> part_number int not null,
> foreign key attribute_name references attributes (attribute_name) on
> update cascade on delete cascade,
> foreign key part_number referenes inventory (part_number) on update
> cascade on delete cascade,
> primary key (attribute_name, part_number)
> );
>
> Of course, this only works if all of your attributes are of the type
> varchar(20) and only allow an attribute on a single part once (i.e.
> you can't be both 'blue' and in addition 'blue').
>
> Also, if you have a hierarchical model (like in a bill of materials),
> you might be better off looking into nested sets or nested intervals.
> In that case, see some of Joe Celko's or Vadim Tropashko's work. It
> would require, however, some procedural code, if you are not
> disinclined to that.
>
> Todd

pstnotpd
08-16-04, 05:37
I've done something like this a couple of years ago. What I ended up
doing was creating tables for meta-data, somewhat like Todd is
suggesting, but more elaborate. This came rather naturally as we were
trying to implement a product hierarchy in which a certain product class
had certain properties and specific products could have seperate properties.

This would allow you to add a part as an instance of a pre-defined part-
class and add your own attributes to it as well. I must agree that this
is an OO way of thinking translated to relational, but it's not mutually
exclusive.


Dilip Angal wrote:

> I still don't know how to convince you guys.
> You are all pointing to the process issues that users may give
> different spellings for color.
> I can handle that part by having an admin to create the properties who
> makes sure that the property being added to the part is really
> required and is not used in past by any one in any different context.
>
> But the basic fact does not change.
> I need to keep adding these properties from time to time because user
> communitiy can not decide all of them upfront.
>
> Also, I may have 500 such properties and I can have up to 1M different
> part numnbers. This will give you some idea of complexity.
> User may choose, show me all the parts with width 10 inches, and price
> less than $10 and .... can go on for ever. I need to come back in
> resonable time (Reasonable can be couple of seconds)
>
> Also note that the Search engine techniques used by EBya like company
> works only if these attributes are no updated. If they are ever
> updated, Search engines like inverted index fails miserably.
>
> So, please try to understand my problme and accept it as it is and if
> you have solution to solve it, please let me know.
> Dilip

Laconic2
08-16-04, 09:29
"Dilip Angal" <dilip_angal@yahoo.com> wrote in message
news:df683587.0408151804.e7bbb5a@posting.google.co m...
> I still don't know how to convince you guys.
> You are all pointing to the process issues that users may give
> different spellings for color.
> I can handle that part by having an admin to create the properties who
> makes sure that the property being added to the part is really
> required and is not used in past by any one in any different context.
>
Thank you for clarifying the issues.

By "admin", do you mean a DBA?

If so, I agree with you that that's the person who should control traffic,
and add new properties when they are needed,
and are not old properties in a new disguise. Having this under a DBA's
control is better than having it under the application programmer's control,
the users' control, or out of control.

The question is how is it best to exercise this control?


> But the basic fact does not change.
> I need to keep adding these properties from time to time because user
> communitiy can not decide all of them upfront.

In a classical database, new properties are discovered over time, but at a
much slower rate than you are suggesting.

In a classical approach, a truly new property is added to the database by
the DBA using ALTER TABLE ... ADD COLUMN, or in extreme cases CREATE TABLE.
The DBA is essentially revising and extending the schema so as to evolve the
schema in parallel with the evolving understanding of the subject matter.

The reason OO programmers don't like this is that they don't want to deal
with a dynamically evolving schema. They would either have to revise and
extend their programs after they are already in the field, or code a much
more complex application to deal with an evolving schema.

So they propose a compromise: a static schema that can descibe a
dynamically evolving subject matter. The problem is that that doesn't
eliminate the pain. It simply transfers it to someone else. Your admin is
goin to be a very busy fellow. He's basically forced to maintain in data
what a classical database would have maintained in metadata.

It gets worse. A year and a half down the road, performing useful queries
on this data is going to be like looking for the body of a murder victim in
a landfill. The container wasn't built with that kind of search in mind.

Tony
08-16-04, 10:11
dilip_angal@yahoo.com (Dilip Angal) wrote in message news:<df683587.0408151804.e7bbb5a@posting.google.com>...
> I still don't know how to convince you guys.
> You are all pointing to the process issues that users may give
> different spellings for color.
> I can handle that part by having an admin to create the properties who
> makes sure that the property being added to the part is really
> required and is not used in past by any one in any different context.
>
> But the basic fact does not change.
> I need to keep adding these properties from time to time because user
> communitiy can not decide all of them upfront.

You can add new columns to tables from time to time? Your application
could use the data dictionary to determine what columns exist (dynamic
SQL).

> Also, I may have 500 such properties and I can have up to 1M different
> part numnbers. This will give you some idea of complexity.
> User may choose, show me all the parts with width 10 inches, and price
> less than $10 and .... can go on for ever. I need to come back in
> resonable time (Reasonable can be couple of seconds)

These cannot be realistic requirements! What user is going to be
looking for anything 10 inches wide and costing under $10? Unless
they just want something to use for a door-stop, and don't care
whether it is a hammer or a toaster...

> Also note that the Search engine techniques used by EBya like company
> works only if these attributes are no updated. If they are ever
> updated, Search engines like inverted index fails miserably.
>
> So, please try to understand my problme and accept it as it is and if
> you have solution to solve it, please let me know.

I think people have understood, and given useful advice. Either you
fully model your data (1000 different tables for 1000 different
entities or whatever it is), or you don't (e.g. you use the ghastly
"EAV" approach). Or you adopt a "middle ground" and fully model the
most important, common and stable features (e.g. price - everything
has a price, right?), then add some flexibility for the unforeseen
(which could be free text, EAV or whatever).

pstnotpd
08-16-04, 10:13
> In a classical database, new properties are discovered over time, but at a
> much slower rate than you are suggesting.
Perhaps, but they're getting more 'classical' by the day.

> In a classical approach, a truly new property is added to the database by
> the DBA using ALTER TABLE ... ADD COLUMN, or in extreme cases CREATE TABLE.
> The DBA is essentially revising and extending the schema so as to evolve the
> schema in parallel with the evolving understanding of the subject matter.
Yep

> The reason OO programmers don't like this is that they don't want to deal
> with a dynamically evolving schema. They would either have to revise and
> extend their programs after they are already in the field, or code a much
> more complex application to deal with an evolving schema.
>
> So they propose a compromise: a static schema that can descibe a
> dynamically evolving subject matter. The problem is that that doesn't
> eliminate the pain. It simply transfers it to someone else. Your admin is
> goin to be a very busy fellow. He's basically forced to maintain in data
> what a classical database would have maintained in metadata.
I don't think I agree on this one. What Dilip is asking for is a way to
allow part 'attributes' to be governed by other tables. Something which
is 'doable' in the relational sense. Your ADD COLUMN approach will
require alot of superflous columns in the part table. If someone wants
to add an attribute 'diameter' for a part 'bolt', this would hardly be
applicable for the part 'chicken'. I think it's possible to do this
properly, the performance question aside.

> It gets worse. A year and a half down the road, performing useful queries
> on this data is going to be like looking for the body of a murder victim in
> a landfill. The container wasn't built with that kind of search in mind.
That's very true.

Tony
08-16-04, 10:16
pstnotpd <schoenmakers@tpd.tno.nl> wrote in message news:<cfpv8e$55q$1@voyager.news.surf.net>...
> I've done something like this a couple of years ago. What I ended up
> doing was creating tables for meta-data, somewhat like Todd is
> suggesting, but more elaborate. This came rather naturally as we were
> trying to implement a product hierarchy in which a certain product class
> had certain properties and specific products could have seperate properties.

The DBMS already contains meta-data in the form of the data
dictionary, which tells you what attributes (columns) each entity type
(table) has, and what values are permitted there (constraints). Why
not use that rather than reinvent it?

pstnotpd
08-16-04, 10:20
> The DBMS already contains meta-data in the form of the data
> dictionary, which tells you what attributes (columns) each entity type
> (table) has, and what values are permitted there (constraints). Why
> not use that rather than reinvent it?

Because the DD contains meta-data about the DBMS tables itself, which I
didn't want.

Todd B
08-16-04, 13:14
dilip_angal@yahoo.com (Dilip Angal) wrote in message news:<df683587.0408151804.e7bbb5a@posting.google.com>...
> I still don't know how to convince you guys.
> You are all pointing to the process issues that users may give
> different spellings for color.

Yes, everyone sees a lot of threads along those lines in this group.

> I can handle that part by having an admin to create the properties who
> makes sure that the property being added to the part is really
> required and is not used in past by any one in any different context.
>
> But the basic fact does not change.
> I need to keep adding these properties from time to time because user
> communitiy can not decide all of them upfront.

I made a simple suggestion to you, but of course, as it seems in many
of these groups, until you are very specific, than you will find only
simple suggestions.

> Also, I may have 500 such properties and I can have up to 1M different
> part numnbers. This will give you some idea of complexity.
> User may choose, show me all the parts with width 10 inches, and price
> less than $10 and .... can go on for ever. I need to come back in
> resonable time (Reasonable can be couple of seconds)

From your description so far, this doesn't really sound that
complicated, even by the size that you're talking about.

> Also note that the Search engine techniques used by EBya like company
> works only if these attributes are no updated. If they are ever
> updated, Search engines like inverted index fails miserably.
>
> So, please try to understand my problme and accept it as it is and if
> you have solution to solve it, please let me know.
> Dilip

I'm trying to understand your problem, but I don't think it was
flushed out as of yet (i.e. I don't think I have enough information
about your situation). Or maybe I'm being stupid; that's a definite
posibility :)

Todd

Tony
08-16-04, 16:15
pstnotpd <schoenmakers@tpd.tno.nl> wrote in message news:<cfqfrt$fir$1@voyager.news.surf.net>...
> > The DBMS already contains meta-data in the form of the data
> > dictionary, which tells you what attributes (columns) each entity type
> > (table) has, and what values are permitted there (constraints). Why
> > not use that rather than reinvent it?
>
> Because the DD contains meta-data about the DBMS tables itself, which I
> didn't want.

So what sort of meta-data do you want then? I may be missing something here.

pstnotpd
08-17-04, 02:50
> So what sort of meta-data do you want then? I may be missing something here.
The meta data describing the part attributes. Maybe I'm using the wrong
word here. It's more like a template.

B.T.W. I'm not trying to be a zealot here, I just think Dilip's problem
can be solved because I've done something similar.

Laconic2
08-17-04, 08:58
"pstnotpd" <schoenmakers@tpd.tno.nl> wrote in message
news:cfs9rq$r41$1@voyager.news.surf.net...
> > So what sort of meta-data do you want then? I may be missing something
here.

> The meta data describing the part attributes. Maybe I'm using the wrong
> word here. It's more like a template.

data that describes parts is data.

data that describes data is meta data.

I'm not sure exactly what you mean by a template. I've used the term myself
in home grown tools I've built to improve my productivity when the client's
clock is running. But you may mean something completely different.




>
> B.T.W. I'm not trying to be a zealot here, I just think Dilip's problem
> can be solved because I've done something similar.

IMO, nobody is trying to be a zealot here. Tony doesn't come across as a
zealot either.

If I come across as a zealot, it's not intentional. I've seen the solution
you outline, after it's been in the field for about a year, and I don't
think you fully appreciate the downside of your solution.

Mixing data about more than one attribute in a single column, and then
using metadata stored in a nearby column to unscramble the first column has
lots of downsides. There are downsides for revising and extending the
application, downsides for performance, downsides for the learning curve
of new people who have to use the data, etc.

That doesn't mean your solution can't work. It does mean that these
downsides need to be evaluated and mitigated before accepting your solution.

pstnotpd
08-17-04, 09:10
Laconic2 wrote:
> data that describes parts is data.
True

> data that describes data is meta data.
True

> I'm not sure exactly what you mean by a template. I've used the term myself
> in home grown tools I've built to improve my productivity when the client's
> clock is running. But you may mean something completely different.
I do

> Mixing data about more than one attribute in a single column, and then
> using metadata stored in a nearby column to unscramble the first column has
> lots of downsides. There are downsides for revising and extending the
> application, downsides for performance, downsides for the learning curve
> of new people who have to use the data, etc.
Won't argue about that. Elegant is not a word I would use to describe
this one.

> That doesn't mean your solution can't work. It does mean that these
> downsides need to be evaluated and mitigated before accepting your solution.
Yep.

I think Dilips main problem is trying to get an OO mindset translated to
relational. Did we ever hear what kind of DB he's actually using?

I for one would be really interested in another way of doing this
without having to define a load of tables or 1 table with a load of
columns which will be mostly NULL by nature.

Laconic2
08-17-04, 10:16
"pstnotpd" <schoenmakers@tpd.tno.nl> wrote in message
news:cft046$627$1@voyager.news.surf.net...
> Laconic2 wrote:

> > I'm not sure exactly what you mean by a template. I've used the term
myself
> > in home grown tools I've built to improve my productivity when the
client's
> > clock is running. But you may mean something completely different.
> I do

Let's see if we can sort out what we each mean by "template". There must be
some overlap. Neither you nor I came to this term in a vacuum.


When I built a "template processor" for myself, here's what I was faced
with:

I wanted a tool that I could use to expand a template into a series of
texts, based on a list of "actual values".
The template consisted of "fixed text" and "formal parameters" with a way
to tell them apart.
The expander would simply read the template once for each entry in the list
of actual values. The output of this pass
would contain the fixed text, and the actual values from the list entry
instead of the formal parameters.

Turned out to be easy to implement in a language that was lass than a true
programming language, but available at all the sites I visited.

The process is a little like a mail merge process, although I learned it
from people who used it for defining macros in assembly language. The
expansion of the macros ended up doing something like what you get out a
mail merge, except that it operated on code rather than on a form letter.

Turns out to be spectacularly useful, in the right hands. Since I was
building it for myself, I made no attempt to make it "idiot proof". Turns
out to be very easy to make data lists from the output of SQL statements,
and pretty easy to make templates by abstracting a repetitive process.

Thing is, that in this tool, I freely and knowingly blended data and meta
data in ways that, I claim, are not good to do with publicly shared data.
However this blending was carefully "encapsulated" inside my tool. By the
time the output was deliverable, it was in a form that could easily be
explained.

Perhaps you could tell me a little more what you mean by a "template".

Laconic2
08-17-04, 10:33
"pstnotpd" <schoenmakers@tpd.tno.nl> wrote in message
news:cft046$627$1@voyager.news.surf.net...


> I think Dilips main problem is trying to get an OO mindset translated to
> relational. Did we ever hear what kind of DB he's actually using?
>

In the original question Dilip mentioned the alternative of storing each
attribute in a column of its own. That sure sounds like SQL and relational
to me. I guess I sort of took it for granted from that point on. But, now
that you mention it, I don't recall him directly saying.

Trying to reconcile the OO mindset and the relational mindset is a quandary
that professionals have faced tens of thousands of times in the last 14
years or so. I have tried to describe each of these two mindsets to both
camps more than once in public forums ( forums =fora ?). I usually
simplify each description to the point where the mindset's adherents think
my description is caricature. And each group tends think of the other
group as some malignant mixture of crazy, stupid, ignorant, and evil.

This is actually a very common reaction when two cultures meet.

I usually end up with everybody mad at me, and ready to roast me. I'm a
little hesitant to try it again.

> I for one would be really interested in another way of doing this
> without having to define a load of tables or 1 table with a load of
> columns which will be mostly NULL by nature.

What's wrong with "a load of tables"? If each table relates to a
"proposition" expressed as a "relation", then if you have a lot of
propositions, perhaps you ought to have a lot of tables.

It continues to baffle me why people think that terabytes of data is "not
complex", but a hundred tables or so is beyond human comprehension. But
that probably reflects the fact that I'm closer to the "data centric"
mindset than I am to the OO mindset.

Rene Hartmann
08-17-04, 15:04
On Tue, 17 Aug 2004, Laconic2 wrote:
>
> What's wrong with "a load of tables"? If each table relates to a
> "proposition" expressed as a "relation", then if you have a lot of
> propositions, perhaps you ought to have a lot of tables.
>

I am also wondering why many people feel so unformfortable about
having may tables. For most OO people, it is quite natural to have
many classes, and to have "complex objects" which consist of many
"sub-objects". But when it's about a relational model of the data, they
say: "Oh look, how complex and clumsy this is - the data is spread over so
many tables!"

Seems that there is some psychological problem about it. OO
Programmers feel good if they have all their data in a single "object", no
matter how complex that objects is and of how many sub-objects it is
a kind of "object", so they feel uncomfortable if data is spread over
several rows. Joining these rows together appears a clumsy operation to
them, though it's logically a simple operation and also not necessarily
a slow one. (Though it may sometimes be slow in practice)

--
Rene=B4 Hartmann

Rene Hartmann
08-17-04, 16:01
I=B4m sory - somehow a line was deleted from my posting, so I fix it here:
>
> Seems that there is some psychological problem about it. OO
> Programmers feel good if they have all their data in a single "object", n=
o
> matter how complex that objects is and of how many sub-objects it is
composed of. To them, a row in a relational table is
> a kind of "object", so they feel uncomfortable if data is spread over
> several rows. Joining these rows together appears a clumsy operation to
> them, though it's logically a simple operation and also not necessarily
> a slow one. (Though it may sometimes be slow in practice)

--
Rene=B4 Hartmann

Laconic2
08-17-04, 16:22
"Rene Hartmann" <rehartmann@t-online.de> wrote in message
news:Pine.LNX.4.33.0408172050260.1642-100000@linux.local...
> I am also wondering why many people feel so unformfortable about
> having may tables. For most OO people, it is quite natural to have
> many classes, and to have "complex objects" which consist of many
> "sub-objects". But when it's about a relational model of the data, they
> say: "Oh look, how complex and clumsy this is - the data is spread over so
> many tables!"

I think there are many reasons. But the one that strikes me as the deepest
one is that
object oriented and data centric people have a fundamentally different way
of thinking about
abstraction.

Abstraction, in general, is the deliberate omission from consideration of
certain details, that are seen as immaterial to the matter at hand.
Basically, the human mind can't cope with "reality" unless some details
are omitted.

There is a special form of abstraction in the OO world called,
"encapsulation". Most readers will already know what encapsulation is, so
I won't try to define it. But basically, "encapsulation" facilitates
abstraction by hiding information that ought to be hidden, for some reason.
If that information changes later on, the consequenses are contained
precisely because the information was hidden.

Data Centric people (like myself) go for a different special form of
abstraction called "data independence". Basically a query's results (and to
some extent it's performance) should not depend on features of the data
unused by the query. Thus, if a query pulls last names, first names, and
phone numbers out of a database, it should work the same if someone comes
along and changes all the zip codes from 5 digits to 9 digits.

Encapsulation and Data Independence both protect systems from what has been
called "the ripple effect". (You throw a rock in the middle of a pond, and
pretty soon, there are ripples all over the surface. ) But they do so in
such radically different ways that each of them looks counterproductive, if
not downright dangerous, to the other camp.

mAsterdam
08-17-04, 18:01
Laconic2 wrote:

> Rene Hartmann wrote:
>
>>I am also wondering why many people feel so unformfortable about
>>having may tables. For most OO people, it is quite natural to have
>>many classes, and to have "complex objects" which consist of many
>>"sub-objects". But when it's about a relational model of the data,
>>they say: "Oh look, how complex and clumsy this is - the data is
>> spread over so many tables!"
>
> I think there are many reasons. But the one that strikes me as
> the deepest one is that object oriented and data centric people
> have a fundamentally different way of thinking about abstraction.

Different? From what? I guess you meant from thinking in
processes or from object thinking, and where they are not the
same I'll assume object thinking.

> Abstraction, in general, is the deliberate omission from consideration of
> certain details, that are seen as immaterial to the matter at hand.
> Basically, the human mind can't cope with "reality" unless some details
> are omitted.

That is the common ground - a good place to start (re)building the
framework, I think.

> There is a special form of abstraction in the OO world called,
> "encapsulation". Most readers will already know what encapsulation is, so
> I won't try to define it. But basically, "encapsulation" facilitates
> abstraction by hiding information that ought to be hidden, for some reason.
> If that information changes later on, the consequenses are contained
> precisely because the information was hidden.

Encapsulation is one form of information hiding in order to
modularise. The basis for modularisation in OO is
somewhat anthropomorphic: the system is thought of as
pieces of data with behaviour. These pieces interact.
They live within running programs.

> Data Centric people (like myself) go for a different special form of
> abstraction called "data independence". Basically a query's results (and to
> some extent it's performance) should not depend on features of the data
> unused by the query. Thus, if a query pulls last names, first names, and
> phone numbers out of a database, it should work the same if someone comes
> along and changes all the zip codes from 5 digits to 9 digits.

> Encapsulation and Data Independence both protect systems from what has been
> called "the ripple effect". (You throw a rock in the middle of a pond, and
> pretty soon, there are ripples all over the surface. ) But they do so in
> such radically different ways that each of them looks counterproductive, if
> not downright dangerous, to the other camp.

You speak of camps - I admit I recognise them.
Yet - both ways of thinking are necessary to develop
a working system. Data lying around uselessly or objects having
significance only within one program-lifetime don't count
as a useful system.

pstnotpd
08-18-04, 02:03
A 'load of tables' holding the same abstraction (in this case 'part')
usually result in report queries having to UNION over these tables.

Rene Hartmann wrote:

> On Tue, 17 Aug 2004, Laconic2 wrote:
>
>>What's wrong with "a load of tables"? If each table relates to a
>>"proposition" expressed as a "relation", then if you have a lot of
>>propositions, perhaps you ought to have a lot of tables.

pstnotpd
08-18-04, 02:28
Rene Hartmann wrote:

> I am also wondering why many people feel so unformfortable about
> having may tables. For most OO people, it is quite natural to have
> many classes, and to have "complex objects" which consist of many
> "sub-objects". But when it's about a relational model of the data, they
> say: "Oh look, how complex and clumsy this is - the data is spread over so
> many tables!"
Maybe because of the fact that due to the inheritance relationship of
object you can refer to the sub-objects as an instance of the parent
class. So if properly designed the complexity of the class hierarchy is
not relevant.

> Seems that there is some psychological problem about it. OO
> Programmers feel good if they have all their data in a single "object", no
> matter how complex that objects is and of how many sub-objects it is
> a kind of "object", so they feel uncomfortable if data is spread over
> several rows. Joining these rows together appears a clumsy operation to
> them, though it's logically a simple operation and also not necessarily
> a slow one. (Though it may sometimes be slow in practice)
>
> --
> Rene´ Hartmann
>

Let me please make clear again that I'm neither an OO nor a relational
proponent. I'm just interested in the mix and I am not convinced that
the 'load of tables' is a good idea. But neither am I convinced that my
original idea will turn out very practical either.

According to Fabian Pascal's excellent 'debunking' site an OO type
problem like this can be properly done in an RDBMS, so I'm curious how.

Tony
08-18-04, 04:51
"Laconic2" <laconic2@comcast.net> wrote in message news:<_sGdndMfcq9Xi7_cRVn-qQ@comcast.com>...
> What's wrong with "a load of tables"? If each table relates to a
> "proposition" expressed as a "relation", then if you have a lot of
> propositions, perhaps you ought to have a lot of tables.
>
> It continues to baffle me why people think that terabytes of data is "not
> complex", but a hundred tables or so is beyond human comprehension. But
> that probably reflects the fact that I'm closer to the "data centric"
> mindset than I am to the OO mindset.

I couldn't agree more. And if the issue IS coming from an OO
perspective, than how is having hundreds of classes and subclasses
good while hundreds of tables is bad?

I think one possible issue is this requirement to perform queries like
"select everything that is blue". If we have 1000 different product
types, and of those 500 have a colour attribute, then either (a) we
need a supertype table that pulls together either all 1000 product
types, or just the 500 that have a colour, or (b) we need to UNION
together 500 tables. Of course, this is only an issue if such queries
ever make any sense: e.g. maybe someone with a colour-coordinated
kitchen might want to see all kitchen equipment that comes in blue.

mAsterdam
08-18-04, 05:07
Tony wrote:

> Laconic2 wrote:
>
>>What's wrong with "a load of tables"? If each table relates to a
>>"proposition" expressed as a "relation", then if you have a lot of
>>propositions, perhaps you ought to have a lot of tables.
>>
>>It continues to baffle me why people think that terabytes of data is "not
>>complex", but a hundred tables or so is beyond human comprehension. But
>>that probably reflects the fact that I'm closer to the "data centric"
>>mindset than I am to the OO mindset.
>
> I couldn't agree more. And if the issue IS coming from an OO
> perspective, than how is having hundreds of classes and subclasses
> good while hundreds of tables is bad?

An important difference would be: the hundreds of tables,
and so the complexity, are by default exposed. The subclasses
(and complexitiy) are hidden.

> I think one possible issue is this requirement to perform queries like
> "select everything that is blue". If we have 1000 different product
> types, and of those 500 have a colour attribute, then either (a) we
> need a supertype table that pulls together either all 1000 product
> types, or just the 500 that have a colour, or (b) we need to UNION
> together 500 tables.

.... and hide the base tables.

> Of course, this is only an issue if such queries
> ever make any sense: e.g. maybe someone with a colour-coordinated
> kitchen might want to see all kitchen equipment that comes in blue.

There will be very weird searches possible.

Wolfgang Loch
08-18-04, 06:07
> And if the issue IS coming from an OO perspective, than how
> is having hundreds of classes and subclasses good while
> hundreds of tables is bad?

I think the real question is runtime versus design time. From what I
understand the attributes have to be added dynamically at runtime by
users without assistance from a programmer or DBA.

The challenge from OO perspective is similar to data-centric view.
Neither can't create hundreds of classes at runtime nor you can create
hundreds of tables at runtime. What is need is a dynamic structure that
allows for adding attributes to existsing entities or classes.

In an OO program I would use a Map or Hash data structure. A
corresponding structure is required to persist this Hash into the
database. The database should allow queries by name-value pairs which
returns all objects having a certain value of a user-defined attribute.

Wolfgang

Bernard Peek
08-18-04, 07:46
In message <18c7b3c2.0408131758.65f91710@posting.google.com>, --CELKO--
<jcelko212@earthlink.net> writes
>>> Let us say I have Part# and it has very large number of flexible
>attributes defined by the user, such as color, width, height, etc...
><<
>
>Fire the DBA immediately; users NEVER get to define the database and
>this bum is not doing his job.

It's normal for the attributes of the entity (the metadata) to be known
at design time, and the DBA is required to extract that knowledge from
the users and code it into the database.

There are some situations where that process model fails because the
metadata is unknown at design time, the information is not available to
the users and so can't be encoded as the database structure. The example
I usually quote is the library system. New classifications need to be
added and the system has to be able to cope with it.

This isn't a desirable situation and it's best avoided if possible, but
if you get handed a lemon like this then knowing a recipe for lemonade
is useful.

One solution is to train all of the users up to DBA level and give all
of them CREATE and ALTER TABLE permissions. When a user discovers a new
attribute they just add a new field to the appropriate table. Having
multiple DBAs working on one system is problematic but survivable.

The entity-attribute-value system is an alternative but I don't believe
it is much better, and I speak as one who has built such systems. You
still need every user to have pretty much the same training as a DBA. If
you can't ensure that all of your users are of that calibre then the
system will rapidly become unworkable. I sometimes use an e-a-v system
as part of the process of designing a new system, but I haven't yet
handed over a finished design with it still in place.

If you are going to have the metadata evolving as the database is built
then there is no alternative but to have highly trained users with an
understanding of information management. There are several different
ways of using them but this basic requirement is pretty much
unavoidable.

If your budget won't stretch that far then try to fund a one-way ticket
to Acapulco.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Karel Miklav
08-18-04, 08:20
Bernard Peek wrote:
> If you are going to have the metadata evolving as the database is built
> then there is no alternative but to have highly trained users with an
> understanding of information management. There are several different
> ways of using them but this basic requirement is pretty much unavoidable.

There is an alternative. Although you may not have a clue what the
metadata will look like you can implement its functional categories in
your software. Then require users to categorize the metadata as they
enter it.

Principle is the same, but if lots of folks will use this application it
might be easier to add some code than reeducate the whole industry.

Regards,
Karel Miklav

Laconic2
08-18-04, 09:35
"Bernard Peek" <bap@shrdlu.com> wrote in message
news:pXPshxP7CzIBFwKF@shrdlu.com...

> The entity-attribute-value system is an alternative but I don't believe
> it is much better, and I speak as one who has built such systems.You
> still need every user to have pretty much the same training as a DBA. If
> you can't ensure that all of your users are of that calibre then the
> system will rapidly become unworkable. I sometimes use an e-a-v system
> as part of the process of designing a new system, but I haven't yet
> handed over a finished design with it still in place.

Bernard,

Good post. I like the way you outline the trade offs in dealing with a bad
but real situation.

One point I want to make clear about the EAV solution. At the conceptual
level of data abstraction, this is
tantamount to giving the users CREATE TABLE or ADD COLUMN privileges. In
essence, what we are permitting the
users to do is revise and extend the portion of the subject matter modeled
by the database.

That is why, IMO, "You still need every user to have pretty much the same
training as a DBA."

Intelligent use of EAV and intelligent use of CREATE TABLE, ADD COLUMN
require the same level of sophistication concerning mapping the subject
matter into data. At other levels of abstraction they are, of course,
different. But this is what the proponents of EAV sometimes overlook.
People who can't do data modeling can't use EAV.

Laconic2
08-18-04, 10:09
"Tony" <andrewst@onetel.net.uk> wrote in message
news:c0e3f26e.0408180051.2033a4fd@posting.google.c om...

> I couldn't agree more. And if the issue IS coming from an OO
> perspective, than how is having hundreds of classes and subclasses
> good while hundreds of tables is bad?

Good post Tony. I can't tell you the number of times you've said what I was
about to say, only better. I cuts down on the number of responses I post.


In this case, I think we got an answer from an Object perspective: it has
to do with the way inheritance works.
In particular, a specialized object really belongs to more than one class.
It might belong to the class "parts" and also to the class "pneumatic
wheels". A "pneumatic wheel" is just a special class of "part". If we ask
what the "unit price" or the "maximum tire pressure" of an object is,
it's the object's responsibility to figure out whether to answer the
question like a part or like a pneumatic wheel. Therefore the coomplexity
doesn't burden the user of the object. That isn't exactly what the object
centered opinion said, but it's in the same spirit.

You don't get the same level of detail hiding in a database. That is, not
unless you start to use views. You can store data that pertains to all
parts, like unit price, in one table. You can store data that pertains
only to pneumatic wheels, like maximum tire pressure in another table. And
you can provide a view that combines the data together.

But there are objections to that.

>
> I think one possible issue is this requirement to perform queries like
> "select everything that is blue". If we have 1000 different product
> types, and of those 500 have a colour attribute, then either (a) we
> need a supertype table that pulls together either all 1000 product
> types, or just the 500 that have a colour, or (b) we need to UNION
> together 500 tables. Of course, this is only an issue if such queries
> ever make any sense: e.g. maybe someone with a colour-coordinated
> kitchen might want to see all kitchen equipment that comes in blue.

It gets worse. Once a beaureaucracy gets large enough, you get people who
operate on the data without a clue as to what it really means. The
following dialogue is not invented:

"We shouldn't have sent this guy a letter. He's a B73."
"What the hell is a B73?"
"I have no idea. But we never send them any letters."

This really happens. Anybody who never heard an exchange like this has just
never worked in a beaureaucratic environment.
In cases like this you need to either consult an EAV table, or a view that
performs hundreds of unions.

The real problem, for me, is that what I learned, years ago, is to first
model the data, and then build the system.

What most people like Dilip want to do, rightly or wrongly, is build the
system, and then model the data.

Consider the title of this thread: "How to model searchable properties of
an entity". If you build a world wide web first, then want to make
everything searchable, the solution is Google. I'm not sure I have a real
good soultion in the case at hand.
But I'm sure that EAV has a downside, and a real big one.

Laconic2
08-18-04, 10:21
"mAsterdam" <mAsterdam@vrijdag.org> wrote in message
news:41231c36$0$48933$e4fe514c@news.xs4all.nl...

> ... and hide the base tables.

I think you are onto something here. The idea needs work, but it's
potentially so powerful that it could result in a useful synthesis between
object centered thinking and data centered thinking. I encourage you to
develop this further.

> There will be very weird searches possible.

Yes. That's always true. But we should not focus all our design effort on
the most unexpected searches.

If the truly wierd (or perhaps "truly novel") uses of the data take more
time and effort than the expected uses of the data, that's probably
appropriate.

Laconic2
08-18-04, 10:24
"Wolfgang Loch" <spam@wolosoft.com> wrote in message
news:cfv9ou$2ut$1@news.epages.de...
> The challenge from OO perspective is similar to data-centric view.
> Neither can't create hundreds of classes at runtime nor you can create
> hundreds of tables at runtime. What is need is a dynamic structure that
> allows for adding attributes to existsing entities or classes.

Actually, you can do both of those things. It just takes time, at runtime.

The things that you build in at design time (or maybe at build time) are
the things you anticipated.

--CELKO--
08-18-04, 11:30
>> There are some situations where that process model fails because
the metadata is unknown at design time, the information is not
available to the users and so can't be encoded as the database
structure. The example I usually quote is the library system. New
classifications need to be added and the system has to be able to cope
with it. <<

That is a bad example; libraries have been pretty well defined and
encoded for at least the last century. If I need a new
classification, I have the Dewey Decimal framework to look it up if it
already exists, or to get some open slots if it really is brand new.

>> One solution is to train all of the users up to DBA level and give
all of them CREATE and ALTER TABLE permissions. When a user discovers
a new attribute they just add a new field to the appropriate table.
Having multiple DBAs working on one system is problematic but
survivable. <<

Been there; done that. Prostate cancer is also survivable.

>> The entity-attribute-value system is an alternative but I don't
believe it is much better, and I speak as one who has built such
systems. You still need every user to have pretty much the same
training as a DBA. If you can't ensure that all of your users are of
that calibre then the system will rapidly become unworkable. <<

You have to ensure that all the present users are of the highest
calibre and then ensure that all the FUTURE users are too! If you can
make prediction like that, then play the horses and get out of IT.

>> I sometimes use an e-a-v system as part of the process of designing
a new system, but I haven't yet handed over a finished design with it
still in place. <<

If I need a "sand box" database during the design phase, I have found
it is easy for me to write DDL on the fly. If I am gathering very raw
data, a notebook is usually fine. I don't see EAV as even a good
development tool.

>> If you are going to have the metadata evolving as the database is
built then there is no alternative but to have highly trained users
with an understanding of information management. There are several
different ways of using them but this basic requirement is pretty much
unavoidable. <<

I agree. Have you run into the "Agile Database" people yet? They
seem to want to let the programmers design the database on the fly and
get rid of the DBA completely.

Laconic2
08-18-04, 11:57
"--CELKO--" <jcelko212@earthlink.net> wrote in message
news:18c7b3c2.0408180730.484a40bc@posting.google.c om...

> That is a bad example; libraries have been pretty well defined and
> encoded for at least the last century. If I need a new
> classification, I have the Dewey Decimal framework to look it up if it
> already exists, or to get some open slots if it really is brand new.

I disagree. A library is a classic example of building a repository of
knowledge where the structure of the content of the knowledge is only dimly
known at build time. The sum total of human knowledge is nearly four times
what it was when Dewey Decimal was first thought up, and yet it's still
working. Not bad.

Of course there are a few quirks: A tiny portion of the number space was
dedicated to all of "information science", while huge areas of numbers are
reserved for new advances in phlogiston theory. But you can't win them all.

But it is useful to think about the difference in our expectations of "the
DBA" and of "the librarian". We do not expect the same level of control or
accountability from a librarian that we do from a DBA.

Or at least you and I both continue to have high expectations of a DBA.
While I disagree with you from time to time, I think we rarely disagree at
the fundamental level.

Bernard Peek
08-18-04, 13:56
In message <cfvhht02nig@enews3.newsguy.com>, Karel Miklav
<karel@inetis.spppambait.com> writes
>Bernard Peek wrote:
>> If you are going to have the metadata evolving as the database is
>>built then there is no alternative but to have highly trained users
>>with an understanding of information management. There are several
>>different ways of using them but this basic requirement is pretty
>>much unavoidable.
>
>There is an alternative. Although you may not have a clue what the
>metadata will look like you can implement its functional categories in
>your software. Then require users to categorize the metadata as they
>enter it.

As I read it that requires you to know enough about the unknown metadata
to be able to define categories for it. That's how the Dewey
classification system works. You define 10 major categories and fit new
entries into one of them. The problem is that one of these classes is
always "miscellaneous" and that particular bin always ends up a mess.

The difficulty with handling unknowns is that they are unknown in
unknown ways. You can't be certain that the next one will fit into any
of the categories that you created. Unless, of course, you make the
system completely generic. That's the e-a-v system.

>
>Principle is the same, but if lots of folks will use this application
>it might be easier to add some code than reeducate the whole industry.

You can take steps to reduce the effort needed but you can't eliminate
the work completely and all of the users still need to be fully trained.
If you have the ten Dewey classes you still have to train all of your
librarians about boundary conditions. For instance which class does
computing belong in? You have a choice of mathematics, electronics,
logic or miscellaneous.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Bernard Peek
08-18-04, 14:05
In message <dqudncaVN9gpx77cRVn-vw@comcast.com>, Laconic2
<laconic2@comcast.net> writes

>People who can't do data modeling can't use EAV.

Regrettably it doesn't stop them from trying. As Joe Celko points out,
you can get away with it for a while but it will probably collapse in a
heap. Librarians have successfully made a system like this work but it
takes a prodigious amount of effort. For a start all of the users are
likely to have at least two degrees, one of them in library science
itself. Secondly they have an auditing system in place that analyses new
additions and vets them for acceptability, then propagates metadata to
every other librarian that needs it.

Some 20 years ago I invented a way of building and maintaining an e-a-v
database using similar techniques. Now I have the Internet and WWW it
could be built. There are still situations where something like this
could be useful but I doubt that there is space in the market for more
than one.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Bernard Peek
08-18-04, 14:23
In message <18c7b3c2.0408180730.484a40bc@posting.google.com>, --CELKO--
<jcelko212@earthlink.net> writes
>>> There are some situations where that process model fails because
>the metadata is unknown at design time, the information is not
>available to the users and so can't be encoded as the database
>structure. The example I usually quote is the library system. New
>classifications need to be added and the system has to be able to cope
>with it. <<
>
>That is a bad example; libraries have been pretty well defined and
>encoded for at least the last century. If I need a new
>classification, I have the Dewey Decimal framework to look it up if it
>already exists, or to get some open slots if it really is brand new.

I've explained Dewey's limitations in another post in this thread.

>
>>> One solution is to train all of the users up to DBA level and give
>all of them CREATE and ALTER TABLE permissions. When a user discovers
>a new attribute they just add a new field to the appropriate table.
>Having multiple DBAs working on one system is problematic but
>survivable. <<
>
>Been there; done that. Prostate cancer is also survivable.

Heh! I occasionally have to remind people that not everything is
improved by being done gigitally.

>
>>> The entity-attribute-value system is an alternative but I don't
>believe it is much better, and I speak as one who has built such
>systems. You still need every user to have pretty much the same
>training as a DBA. If you can't ensure that all of your users are of
>that calibre then the system will rapidly become unworkable. <<
>
>You have to ensure that all the present users are of the highest
>calibre and then ensure that all the FUTURE users are too! If you can
>make prediction like that, then play the horses and get out of IT.

Regrettably the IT industry has retreated from me. Anyone want to hire a
DBA in the UK?

>
>>> I sometimes use an e-a-v system as part of the process of designing
>a new system, but I haven't yet handed over a finished design with it
>still in place. <<
>
>If I need a "sand box" database during the design phase, I have found
>it is easy for me to write DDL on the fly. If I am gathering very raw
>data, a notebook is usually fine. I don't see EAV as even a good
>development tool.

I worked on the design of a CASE tool and an e-a-v system was part of
the metadata repository.

>
>>> If you are going to have the metadata evolving as the database is
>built then there is no alternative but to have highly trained users
>with an understanding of information management. There are several
>different ways of using them but this basic requirement is pretty much
>unavoidable. <<
>
>I agree. Have you run into the "Agile Database" people yet? They
>seem to want to let the programmers design the database on the fly and
>get rid of the DBA completely.

I haven't. I'd probably stroke my beard and smile. I was there 20 years
ago. I'm currently monitoring a project where the database is being
designed by a java programmer. He doesn't see the point of putting
business rules in the database because it's easier to code them in the
application.




--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Bernard Peek
08-18-04, 14:33
In message <_sGdndMfcq9Xi7_cRVn-qQ@comcast.com>, Laconic2
<laconic2@comcast.net> writes


>I usually end up with everybody mad at me, and ready to roast me. I'm a
>little hesitant to try it again.

I get that reaction when I tell the OO people that the concept of class
is a fundamental error in OO thinking.


>
>> I for one would be really interested in another way of doing this
>> without having to define a load of tables or 1 table with a load of
>> columns which will be mostly NULL by nature.
>
>What's wrong with "a load of tables"? If each table relates to a
>"proposition" expressed as a "relation", then if you have a lot of
>propositions, perhaps you ought to have a lot of tables.

Absolutely. The data drives the design. If the data is complex then
learn to deal with it.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Bernard Peek
08-18-04, 15:00
In message <c0e3f26e.0408160611.70207f65@posting.google.com>, Tony
<andrewst@onetel.net.uk> writes


>> Also, I may have 500 such properties and I can have up to 1M different
>> part numnbers. This will give you some idea of complexity.
>> User may choose, show me all the parts with width 10 inches, and price
>> less than $10 and .... can go on for ever. I need to come back in
>> resonable time (Reasonable can be couple of seconds)
>
>These cannot be realistic requirements! What user is going to be
>looking for anything 10 inches wide and costing under $10? Unless
>they just want something to use for a door-stop, and don't care
>whether it is a hammer or a toaster...

You are making value judgements about the data without knowing what the
users' requirements may be. As a DBA you should aim to understand your
data well enough to justify those value judgements. But until you have
verified the requirements with the relevant user you have to avoid
jumping to unjustified conclusions.

You can't know that nobody will ever perform a search which includes
size < 10" and cost <$10. Someone somewhere may urgently want a doorstop
and not care whether it is a hammer or a toaster as long as it is
available immediately.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

pstnotpd
08-19-04, 01:54
--CELKO-- wrote:
>>>The entity-attribute-value system is an alternative but I don't
> believe it is much better, and I speak as one who has built such
> systems. You still need every user to have pretty much the same
> training as a DBA. If you can't ensure that all of your users are of
> that calibre then the system will rapidly become unworkable. <<
Why? As I understand it the point of EAV is to build the user interface
to represent the logical data model. You wouldn't expect 'users' to
query directly on the physical model would you?

Karel Miklav
08-19-04, 06:45
Bernard Peek wrote:
> In message <cfvhht02nig@enews3.newsguy.com>, Karel Miklav
> <karel@inetis.spppambait.com> writes
>> Bernard Peek wrote:
>>> If you are going to have the metadata evolving as the database is
>>> built then there is no alternative but to have highly trained users
>>> with an understanding of information management. There are several
>>> different ways of using them but this basic requirement is pretty
>>> much unavoidable.
>>
>> There is an alternative. Although you may not have a clue what the
>> metadata will look like you can implement its functional categories in
>> your software. Then require users to categorize the metadata as they
>> enter it.
>
> As I read it that requires you to know enough about the unknown metadata
> to be able to define categories for it. That's how the Dewey
> classification system works. You define 10 major categories and fit new
> entries into one of them. The problem is that one of these classes is
> always "miscellaneous" and that particular bin always ends up a mess.

Don't know about libraries but people classified things into discrete
categories from ancient times. I can see the problem of the
"miscellaneous" category but it's just a design error itself. Like if
you're trying to patch a roof with a sieve in case some rain might want
to come in. If you have some time to waste I suggest reading structural
anthropology, i.e. Claude Levi-Strauss.

> The difficulty with handling unknowns is that they are unknown in

No, they're not, theoretically. People can not perceive uncategorizable
miracles (ok, some can :), neither they can enter them into information
systems. I'm talking about functional categories and if you don't know
what is the use of data you're collecting then why bother, there are
random generators.

> unknown ways. You can't be certain that the next one will fit into any
> of the categories that you created. Unless, of course, you make the
> system completely generic. That's the e-a-v system.

Not generic but abstract. Users can not define functional categories on
the fly, unless you're one level more abstract then them.

>> Principle is the same, but if lots of folks will use this application
>> it might be easier to add some code than reeducate the whole industry.
>
> You can take steps to reduce the effort needed but you can't eliminate
> the work completely and all of the users still need to be fully trained.
> If you have the ten Dewey classes you still have to train all of your
> librarians about boundary conditions. For instance which class does
> computing belong in? You have a choice of mathematics, electronics,
> logic or miscellaneous.

That is simple, because users already know their domain, but are usually
unwilling to put off things for some loooong lasting IT torture. I'm
suggesting to build some abstraction rather than drop users on bare
bones of the database. But hey, we al know that :)

Regards,
Karel Miklav

pstnotpd
08-19-04, 06:47
> It reminds me of the C language.
>
> The greatest advantage of the C language is that in the hands of a
> competent programmer it can do anything.
>
> The greatest disadvantage of the C language is that in the hands of an
> incompetent programmer it can do anything.

Can't argue with that :-)

Tony
08-19-04, 07:38
"Laconic2" <laconic2@comcast.net> wrote in message news:<ybGdnVI2P9IS_77cRVn-jA@comcast.com>...
> "Tony" <andrewst@onetel.net.uk> wrote in message
> news:c0e3f26e.0408180051.2033a4fd@posting.google.c om...
>
> > I couldn't agree more. And if the issue IS coming from an OO
> > perspective, than how is having hundreds of classes and subclasses
> > good while hundreds of tables is bad?
>
> Good post Tony. I can't tell you the number of times you've said what I was
> about to say, only better. I cuts down on the number of responses I post.

Thanks - but in fact I often think the same (well, vice versa!)

> In this case, I think we got an answer from an Object perspective: it has
> to do with the way inheritance works.
<SNIP>

Yes, we have. Mind you, the OP is talking about allowing users to
define new product types with new attributes over time, and I don't
imagine an OO designer wants the users defining new classes "on the
fly" any more than we want the users defining new tables.

> It gets worse. Once a beaureaucracy gets large enough, you get people who
> operate on the data without a clue as to what it really means. The
> following dialogue is not invented:
>
> "We shouldn't have sent this guy a letter. He's a B73."
> "What the hell is a B73?"
> "I have no idea. But we never send them any letters."

I love that!

> The real problem, for me, is that what I learned, years ago, is to first
> model the data, and then build the system.
> What most people like Dilip want to do, rightly or wrongly, is build the
> system, and then model the data.

Me too, and I'm lucky enough to be working in an environment where
that is still practiced reasonably well. But I have a friend who
works as a DBA for a large oil company that uses Smalltalk and Oracle,
and she gets her requirements from the Smalltalk team: "we've designed
this new set of screens for <whatever>, and we need you to create some
tables to save the data into". Most tables (created before she
joined) have no constraints of any kind, and the majority of her time
seems to be spent writing scripts to find and fix corrupt data. But I
digress...

> But I'm sure that EAV has a downside, and a real big one.

Yup. And I must admit, maybe I have just been lucky (again) never to
have come up against such a requirement. I do remember once in my
early days in relational databases seeing a database design for a
local government inventory system. It had separate tables called
things like street_lamp, post_box, telegraph_pole, etc. etc. I
remember saying to a colleague "that's crazy, defining a separate
table for each item type". Now I seem to have been advocating that
approach here! Actually, I think my inclination now would probably
still be more pragmatic: for the most part, all those inventory items
can be considered similar: their most important attributes would be
stuff like location (grid ref), cost, date installed, date last
inspected, make, model, etc. common to all. That data should probably
reside in a single table. With a "notes" attribute, there really
might not be a need to hold further item-specific information for 80%
of the item types. Of course, I am probably over-simplifying
drastically here.

Bernard Peek
08-19-04, 07:40
In message <cg1fa1$52m$1@voyager.news.surf.net>, pstnotpd
<schoenmakers@tpd.tno.nl> writes
>--CELKO-- wrote:
>>>>The entity-attribute-value system is an alternative but I don't
>> believe it is much better, and I speak as one who has built such
>> systems. You still need every user to have pretty much the same
>> training as a DBA. If you can't ensure that all of your users are of
>> that calibre then the system will rapidly become unworkable. <<
>Why? As I understand it the point of EAV is to build the user interface
>to represent the logical data model. You wouldn't expect 'users' to
>query directly on the physical model would you?

He was quoting me there. The point of EAV is to build a database without
a logical data model. You don't need one to build an EAV system.

It reminds me of the C language.

The greatest advantage of the C language is that in the hands of a
competent programmer it can do anything.

The greatest disadvantage of the C language is that in the hands of an
incompetent programmer it can do anything.



--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Tony
08-19-04, 07:58
pstnotpd <schoenmakers@tpd.tno.nl> wrote in message news:<cg1fa1$52m$1@voyager.news.surf.net>...
> --CELKO-- wrote:
> >>>The entity-attribute-value system is an alternative but I don't
> > believe it is much better, and I speak as one who has built such
> > systems. You still need every user to have pretty much the same
> > training as a DBA. If you can't ensure that all of your users are of
> > that calibre then the system will rapidly become unworkable. <<
> Why? As I understand it the point of EAV is to build the user interface
> to represent the logical data model. You wouldn't expect 'users' to
> query directly on the physical model would you?

Actually, Mr Celko didn't say that, he was quoting someone else in a
previous post. I think "DBA" here is meant in the "database designer"
sense: with EAV, if you allow users to define new entity types and
attribute types, then they must also define all the integrity rules.
Essentially, the user IS the (logical) database designer. This has
nothing to do with the physical level.

It is easy to imagine this becoming a mess. For example, suppose you
already have the product "light bulb" in your EAV database, and one of
the attributes is "fitting type", with values "wide bayonet", "narrow
bayonet", "wide screw", "narrow screw" (let's suppose it's as simple
as that). Then after a couple of years you start selling lamps; one
of the attributes of lamp is "bulb fitting type" with the same values
as above (of course). Of course, you will use the same "fitting type"
domain we already have, right? Assuming the user is diligent enough
to do so, they are following the same kind of thinking as an "analyst"
or a "database designer". Chances are quite high they'll just create
a new domain, with slightly different codes and names.

Laconic2
08-19-04, 08:54
"Tony" <andrewst@onetel.net.uk> wrote in message
news:c0e3f26e.0408190338.288380cc@posting.google.c om...
> "Laconic2" <laconic2@comcast.net> wrote in message
news:<ybGdnVI2P9IS_77cRVn-jA@comcast.com>...
> > "Tony" <andrewst@onetel.net.uk> wrote in message
> > news:c0e3f26e.0408180051.2033a4fd@posting.google.c om...
> > "We shouldn't have sent this guy a letter. He's a B73."
> > "What the hell is a B73?"
> > "I have no idea. But we never send them any letters."
>
> I love that!

How about:

"We shouldn't have sent this guy a letter. He's a B73."
"What the hell is a B73?"
"Say, you're new around here, aren't you?"

or:

"We shouldn't have sent this guy a letter. He's a B73."
"What the hell is a B73?"
"Hey! You're supposed to be the database guy! And you're asking me?"

Laconic2
08-19-04, 09:19
"Tony" <andrewst@onetel.net.uk> wrote in message
news:c0e3f26e.0408190338.288380cc@posting.google.c om...

> Me too, and I'm lucky enough to be working in an environment where
> that is still practiced reasonably well. But I have a friend who
> works as a DBA for a large oil company that uses Smalltalk and Oracle,
> and she gets her requirements from the Smalltalk team: "we've designed
> this new set of screens for <whatever>, and we need you to create some
> tables to save the data into". Most tables (created before she
> joined) have no constraints of any kind, and the majority of her time
> seems to be spent writing scripts to find and fix corrupt data. But I
> digress...

I don't think that's a digression at all. When the Smalltalk team says,
"We've designed this new set of screens",
that as close as that culture is going to get to a "conceptual data model".
It's not that the Smalltalk people have small minds. They think in
concepts, but they are not used to projecting their concepts onto the flat
screen of "data". So, instead, they project their concepts onto the flat
screen of "set of screens".

BTW, I've worked in environments where the shoe was on the other foot:
"We've designed this set of tables to hold the data we need. Now go off and
build a set of screens to capture it." It turns out that this is a
nightmare for the screen designer, because a set of tables isn't really a
conceptual data model, either.

The better way is to start from the requirements and derive a conceptual
data model, and a companion conceptual process model. Then, everything
else gets derived and/or designed from there. I'm convinced that OOA or UML
can express both the conceptual process model and the conceptual data model
in a single model, but I can't verify that from my own experience.


> Yup. And I must admit, maybe I have just been lucky (again) never to
> have come up against such a requirement. I do remember once in my
> early days in relational databases seeing a database design for a
> local government inventory system. It had separate tables called
> things like street_lamp, post_box, telegraph_pole, etc. etc. I
> remember saying to a colleague "that's crazy, defining a separate
> table for each item type". Now I seem to have been advocating that
> approach here!

Somewhere in this long discussion, someone asked "why have a load of tables
for what is basically the same abstraction?"

That question deserves a more thoughtful answer than I have been able to
give. Before you can deal with that question, you have to be able to
answer the question "when are two abstractions the same abstraction?" This
sounds like an easy question, but it's not.


With regard to "parts" , there is a level of abstraction at which they are
all the same. If I want to order a wheel and a tie rod,
I probably have to know exactly the same data about each to fill out the
order form. I think you've outlined that data below.

But if I'm diagnosing to determine whether a new wheel or a new tie rod (or
both) will fix the problem, I probably need to treat wheels and tie rods as
different types of entities, because they play different roles in the
diagnosis model.

Diagnosis and order taking are both "valid" ways of looking at the data.
Depending on the requirements, the system may have to support both.

>Actually, I think my inclination now would probably
> still be more pragmatic: for the most part, all those inventory items
> can be considered similar: their most important attributes would be
> stuff like location (grid ref), cost, date installed, date last
> inspected, make, model, etc. common to all. That data should probably
> reside in a single table. With a "notes" attribute, there really
> might not be a need to hold further item-specific information for 80%
> of the item types. Of course, I am probably over-simplifying
> drastically here.

The problem I have with "notes" is that "notes" are not data. Invariably,
someone down the road asks, "why can't you just search the notes field?"
If you make the mistake of trying to explain the difference between parsing
natural language and scanning structured data, you are apt to be dismissed
as a geek that doesn't understand people very well.

Laconic2
08-19-04, 09:29
"Bernard Peek" <bap@shrdlu.com> wrote in message
news:GXRUQf9QJIJBFwXx@shrdlu.com...
> In message <cg1fa1$52m$1@voyager.news.surf.net>, pstnotpd

> He was quoting me there. The point of EAV is to build a database without
> a logical data model. You don't need one to build an EAV system.

Excellent point!

>
> It reminds me of the C language.
>
> The greatest advantage of the C language is that in the hands of a
> competent programmer it can do anything.
>
> The greatest disadvantage of the C language is that in the hands of an
> incompetent programmer it can do anything.

Way back in 1980, I decided to learn a new "structured" language, instead
of ALGOL. Because of what was available on the machine I was moving towards
(a VAX), I chose PASCAL.

I got a lot of contempt from C programmers, who told me that "strong typing
is for sissies".

They got to learn how the debugger works, real well. I had already learned
that in an earlier lifetime, so I didn't want to learn it again. Instead,
I spent the time learning data normalization.

Now people are telling me that "normalization is for sissies".

Laconic2
08-19-04, 10:17
"Tony" <andrewst@onetel.net.uk> wrote in message
news:c0e3f26e.0408190358.1e4c234a@posting.google.c om...
> pstnotpd <schoenmakers@tpd.tno.nl> wrote in message
news:<cg1fa1$52m$1@voyager.news.surf.net>...
> > --CELKO-- wrote:

> It is easy to imagine this becoming a mess.

It's hard to imagine it NOT becoming a mess. Essentially the logical model
of an EAV is in the hands of a committee: the comittee of users.

This commitee never meets, and its members are not accountable to each
other, nor to its originator. Worst of all, the members aren't accountable
to the future users.

I think that 99 times out of a hundred, they will make a mess.

Gene Wirchenko
08-19-04, 15:26
"Laconic2" <laconic2@comcast.net> wrote:

>"Tony" <andrewst@onetel.net.uk> wrote in message
>news:c0e3f26e.0408190358.1e4c234a@posting.google.c om...
>> pstnotpd <schoenmakers@tpd.tno.nl> wrote in message
>news:<cg1fa1$52m$1@voyager.news.surf.net>...
>> > --CELKO-- wrote:
>
>> It is easy to imagine this becoming a mess.
>
>It's hard to imagine it NOT becoming a mess. Essentially the logical model

That was my thought, too.

>of an EAV is in the hands of a committee: the comittee of users.
>
>This commitee never meets, and its members are not accountable to each
>other, nor to its originator. Worst of all, the members aren't accountable
>to the future users.

Even more worst <G>, they are not aware that they should be
accountable.

>I think that 99 times out of a hundred, they will make a mess.

Optimist.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Gene Wirchenko
08-19-04, 15:26
"Laconic2" <laconic2@comcast.net> wrote:

>"Tony" <andrewst@onetel.net.uk> wrote in message
>news:c0e3f26e.0408190338.288380cc@posting.google.c om...
>> "Laconic2" <laconic2@comcast.net> wrote in message
>news:<ybGdnVI2P9IS_77cRVn-jA@comcast.com>...
>> > "Tony" <andrewst@onetel.net.uk> wrote in message
>> > news:c0e3f26e.0408180051.2033a4fd@posting.google.c om...
>> > "We shouldn't have sent this guy a letter. He's a B73."
>> > "What the hell is a B73?"
>> > "I have no idea. But we never send them any letters."
>>
>> I love that!
>
>How about:
>
> "We shouldn't have sent this guy a letter. He's a B73."
> "What the hell is a B73?"
> "Say, you're new around here, aren't you?"
>
>or:
>
> "We shouldn't have sent this guy a letter. He's a B73."
> "What the hell is a B73?"
> "Hey! You're supposed to be the database guy! And you're asking me?"

Or:

"We shouldn't have sent this guy a letter. He's a B73."
"What the hell is a B73?"
"Oh, we amalgamated N45 and N46 into B73."

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Bernard Peek
08-19-04, 15:38
In message <c0e3f26e.0408190338.288380cc@posting.google.com>, Tony
<andrewst@onetel.net.uk> writes

>Yes, we have. Mind you, the OP is talking about allowing users to
>define new product types with new attributes over time, and I don't
>imagine an OO designer wants the users defining new classes "on the
>fly" any more than we want the users defining new tables.

That's dynamic reclassification. I've mentioned here before that I'm
looking for an OO database that supports dynamic reclassification and
multiple inheritance.




--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author. Will work for money.

Bernard Peek
08-19-04, 16:05
In message <yL-dnWnPUNPANbncRVn-hA@comcast.com>, Laconic2
<laconic2@comcast.net> writes

>The better way is to start from the requirements and derive a conceptual
>data model, and a companion conceptual process model.

Then add an event model and you have SSADM.

[...]

>Somewhere in this long discussion, someone asked "why have a load of tables
>for what is basically the same abstraction?"
>
>That question deserves a more thoughtful answer than I have been able to
>give. Before you can deal with that question, you have to be able to
>answer the question "when are two abstractions the same abstraction?" This
>sounds like an easy question, but it's not.

The glib answer to the question is "because there are many entities in
the ERD."

My answer to people who want to move the physical design away from the
logical data structure is to assert that there is always a price to pay
for doing that.

>
>With regard to "parts" , there is a level of abstraction at which they are
>all the same. If I want to order a wheel and a tie rod,<