# Thread: Celko: Best fit query or T-Join

1. Registered User
Join Date
Jan 2013
Posts
355
Provided Answers: 1

## Unanswered: Celko: Best fit query or T-Join

Dr. Codd's T-Join

In the Second Version of the relational model in 1990, Dr. E. F. Codd introduced a set of new theta operators, called T-operators, which were based on the idea of a best-fit or approximate equality (Codd 1990). The algorithm for the operators is easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We want (class_size < or <= room_size) to be true after the assignments are made. The first < will allow us a few empty seats in each room for late students; the <= can have exact matches.

Th naive approaches Codd gave were: (1) sort the tables in ascending order by classroom size and then matched the number of students in a class. (2) sort the tables in descending order by classroom size and then matched the number of students in a class.

We start with the following tables:

CREATE TABLE Rooms
(room_nbr CHAR(2) NOT NULL PRIMARY KEY,
room_size INTEGER NOT NULL
CHECK (room_size > 0));

INSERT INTO Rooms
VALUES
('r1', 70),
('r2', 40),
('r3', 50),
('r4', 85),
('r5', 30),
('r6', 65),
('r7', 55);

CREATE TABLE Classes
(class_nbr CHAR(2) NOT NULL PRIMARY KEY,
class_size INTEGER NOT NULL
CHECK (class_size > 0));

INSERT INTO Classes (class_nbr, class_size)
VALUES
('c1', 80),
('c2', 70),
('c3', 65),
('c4', 55),
('c5', 50),
('c6', 40);

The “best fit” problem is to assign classes to rooms with the fewest empty chairs in the room and nobody standing. The starting point is to find all the rooms that can hold a particular class.

SELECT room_nbr, class_nbr, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size <= room_size;

Now, let's try to find the best fits:

WITH RC --- all legal pairs
AS
(SELECT *, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size <= room_size)

SELECT * -- best fit
FROM RC
WHERE fit = (SELECT MIN(fit)
FROM RC AS RCa
WHERE RC.class_nbr = RCa.class_nbr);

Results
==============
c1 80 r4 85 5
c2 70 r1 70 0
c3 65 r6 65 0
c4 55 r7 55 0
c5 50 r3 50 0
c6 40 r2 40 0

This is very good, but it depends on the data! Slight change in the legal pairs CTE or the data, to allow for spare seating, will invalidate the results.

WITH RC --- all legal pairs with spare room
AS
(SELECT *, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size < room_size)

SELECT * -- best fit
FROM RC
WHERE fit = (SELECT MIN(fit)
FROM RC AS RCa
WHERE RC.class_nbr = RCa.class_nbr);

Results
=================
c1 80 r4 85 5 ◄ Opps!
c2 70 r4 85 15 ◄ Opps! Duplicate room number!
c3 65 r1 70 5
c4 55 r6 65 10
c5 50 r7 55 5
c6 40 r3 50 10

Anyone want to try for general solution?

2. Registered User
Join Date
Mar 2003
Posts
280
Originally Posted by Celko
Dr. Codd's T-Join

In the Second Version of the relational model in 1990, Dr. E. F. Codd introduced a set of new theta operators, called T-operators, which were based on the idea of a best-fit or approximate equality (Codd 1990). The algorithm for the operators is easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We want (class_size < or <= room_size) to be true after the assignments are made. The first < will allow us a few empty seats in each room for late students; the <= can have exact matches.

Th naive approaches Codd gave were: (1) sort the tables in ascending order by classroom size and then matched the number of students in a class. (2) sort the tables in descending order by classroom size and then matched the number of students in a class.

We start with the following tables:

CREATE TABLE Rooms
(room_nbr CHAR(2) NOT NULL PRIMARY KEY,
room_size INTEGER NOT NULL
CHECK (room_size > 0));

INSERT INTO Rooms
VALUES
('r1', 70),
('r2', 40),
('r3', 50),
('r4', 85),
('r5', 30),
('r6', 65),
('r7', 55);

CREATE TABLE Classes
(class_nbr CHAR(2) NOT NULL PRIMARY KEY,
class_size INTEGER NOT NULL
CHECK (class_size > 0));

INSERT INTO Classes (class_nbr, class_size)
VALUES
('c1', 80),
('c2', 70),
('c3', 65),
('c4', 55),
('c5', 50),
('c6', 40);

The “best fit” problem is to assign classes to rooms with the fewest empty chairs in the room and nobody standing. The starting point is to find all the rooms that can hold a particular class.

SELECT room_nbr, class_nbr, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size <= room_size;

Now, let's try to find the best fits:

WITH RC --- all legal pairs
AS
(SELECT *, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size <= room_size)

SELECT * -- best fit
FROM RC
WHERE fit = (SELECT MIN(fit)
FROM RC AS RCa
WHERE RC.class_nbr = RCa.class_nbr);

Results
==============
c1 80 r4 85 5
c2 70 r1 70 0
c3 65 r6 65 0
c4 55 r7 55 0
c5 50 r3 50 0
c6 40 r2 40 0

This is very good, but it depends on the data! Slight change in the legal pairs CTE or the data, to allow for spare seating, will invalidate the results.

WITH RC --- all legal pairs with spare room
AS
(SELECT *, (room_size - class_size) AS fit
FROM Classes, Rooms
WHERE class_size < room_size)

SELECT * -- best fit
FROM RC
WHERE fit = (SELECT MIN(fit)
FROM RC AS RCa
WHERE RC.class_nbr = RCa.class_nbr);

Results
=================
c1 80 r4 85 5 ◄ Opps!
c2 70 r4 85 15 ◄ Opps! Duplicate room number!
c3 65 r1 70 5
c4 55 r6 65 10
c5 50 r7 55 5
c6 40 r3 50 10

Anyone want to try for general solution?
For the trivial case (distinct room and class sizes) I believe it is sufficient to discalify rows where there exists a better fit:

Code:
```SELECT c.*, r.*, (room_size - class_size) AS fit
FROM Classes c, Rooms r
WHERE class_size <= room_size
and not exists (
select 1 from classes c2
where c2.class_nbr <> c.class_nbr
and c2.class_size between c.class_size and r.room_size
)```
If classes can have equal and size, dito for rooms we need to pick one over another as a best fit. Seems reasonable to enumerate each set with:

Code:
`row_number () over (partition over size order by <p.k.>)`
and use that in a predicate.
Last edited by lelle12; 01-21-13 at 06:38. Reason: wrong order of class_size, room_size in exists clause

3. Registered User
Join Date
Jan 2013
Posts
355
Provided Answers: 1
I tried ordering the rooms and the classes by size with ROW_NUMBER(), then match the two lists in sequence at the first legal matchingposition. It does not give an optimal solution. Try this data set:

To see how this will work, let's add another column to the table of legal (class_nbr, room_nbr) pairs to see how well they fit.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit
= (SELECT MIN(P2.fit)
FROM Pairs AS P2
WHERE P1.room_nbr = P2.room_nbr);

r01 c04 2
r02 c04 1
r03 c06 5
r04 c06 4
r05 c09 2
r06 c12 5
r07 c13 5
r08 c13 4
r09 c15 5
r10 c15 4
r11 c18 5

This time we did not have an exact fit, so let's look for (fit = 1) into a working table and remove 'r02' and 'c04' from their tables. The second step is

r02 c04 1

We can now remove the best fit rooms and classes, and look the next set of remaining best fits.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

(SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit = 2);

r05 c09 2

Here is the step for a fit of 3, 4, 5 and 6

r01 c05 3

r04 06 4
r08 c13 4
r10 c15 4

r06 c12 5
r11 c18 5

r03 c07 6
r07 c14 6
r09 c16 6

At this point, the original tables of rooms and classes cannot be paired. Notice that as we removed rooms and classes, the fit numbers changed. This is a characteristic of greedy algorithms; taking the “big bites” can leave sub-optimal leftovers on the plate.

If you had put the two sorted lists together and matched them to the first fit, then you get ('r01', 'c04') which has a fit of 2. But when we match ('r02', 'c04') we have a fit of 1.

4. Registered User
Join Date
Mar 2003
Posts
280
Originally Posted by Celko
I tried ordering the rooms and the classes by size with ROW_NUMBER(), then match the two lists in sequence at the first legal matchingposition. It does not give an optimal solution. Try this data set:
I don't see a set here so I assume you missed pasting it. However reading the rest of your post I get the feeling that you would like to find an optimal solution for packing classes into rooms. I might have misunderstood, but AFAIK know what you describe is the knapsack problem ( Knapsack problem - Wikipedia, the free encyclopedia ). I wish I had a polynomial time solution for that, but I don't ;-) Perhaps there is a constraint I miss that makes the problem easier than KSP, is there?

5. Registered User
Join Date
Jan 2013
Posts
355
Provided Answers: 1
INSERT INTO Classes (class_nbr, class_size)
VALUES
('c01', 106), ('c02', 105), ('c03', 104), ('c04', 100), ('c05', 99),
('c06', 90), ('c07', 89), ('c08', 88), ('c09', 83), ('c10', 82),
('c11', 81), ('c12', 65), ('c13', 50), ('c14', 49), ('c15', 30),
('c16', 29), ('c17', 28), ('c18', 20), ('c19', 19);

INSERT INTO Rooms (room_nbr, room_size)
VALUES
('r01', 102), ('r02', 101), ('r03', 95), ('r04', 94), ('r05', 85),
('r06', 70), ('r07', 55), ('r08', 54), ('r09', 35), ('r10', 34),
('r11', 25), ('r12', 18);

To see how this will work, let's add another column to the table of legal (class_nbr, room_nbr) pairs to see how well they fit.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit
= (SELECT MIN(P2.fit)
FROM Pairs AS P2
WHERE P1.room_nbr = P2.room_nbr);

r01 c04 2
r02 c04 1
r03 c06 5
r04 c06 4
r05 c09 2
r06 c12 5
r07 c13 5
r08 c13 4
r09 c15 5
r10 c15 4
r11 c18 5

This time we did not have an exact fit, so let's look for (fit = 1) into a working table and remove 'r02' and 'c04' from their tables. The second step is

r02 c04 1

We can now remove the best fit rooms and classes, and look the next set of remaining best fits.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

(SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit = 2);

r05 c09 2

Here is the step for a fit of 3, 4, 5 and 6

r01 c05 3

r04 c06 4
r08 c13 4
r10 c15 4

r06 c12 5
r11 c18 5

r03 c07 6
r07 c14 6
r09 c16 6

At this point, the original tables of rooms and classes cannot be paired. Notice that as we removed rooms and classes, the fit numbers changed. This is a characteristic of greedy algorithms; taking the “big bites” can leave sub-optimal leftovers on the plate.

6. Registered User
Join Date
Mar 2003
Posts
280
Originally Posted by Celko
INSERT INTO Classes (class_nbr, class_size)
VALUES
('c01', 106), ('c02', 105), ('c03', 104), ('c04', 100), ('c05', 99),
('c06', 90), ('c07', 89), ('c08', 88), ('c09', 83), ('c10', 82),
('c11', 81), ('c12', 65), ('c13', 50), ('c14', 49), ('c15', 30),
('c16', 29), ('c17', 28), ('c18', 20), ('c19', 19);

INSERT INTO Rooms (room_nbr, room_size)
VALUES
('r01', 102), ('r02', 101), ('r03', 95), ('r04', 94), ('r05', 85),
('r06', 70), ('r07', 55), ('r08', 54), ('r09', 35), ('r10', 34),
('r11', 25), ('r12', 18);

To see how this will work, let's add another column to the table of legal (class_nbr, room_nbr) pairs to see how well they fit.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit
= (SELECT MIN(P2.fit)
FROM Pairs AS P2
WHERE P1.room_nbr = P2.room_nbr);

r01 c04 2
r02 c04 1
r03 c06 5
r04 c06 4
r05 c09 2
r06 c12 5
r07 c13 5
r08 c13 4
r09 c15 5
r10 c15 4
r11 c18 5

This time we did not have an exact fit, so let's look for (fit = 1) into a working table and remove 'r02' and 'c04' from their tables. The second step is

r02 c04 1

We can now remove the best fit rooms and classes, and look the next set of remaining best fits.

WITH Pairs (room_nbr, class_nbr, fit)
AS
(SELECT R.room_nbr, C.class_nbr,
(R.room_size - C.class_size)
FROM Rooms AS R, Classes AS C
WHERE C.class_size <= R.room_size)

(SELECT P1.room_nbr, P1.class_nbr, fit
FROM Pairs AS P1
WHERE P1.fit = 2);

r05 c09 2

Here is the step for a fit of 3, 4, 5 and 6

r01 c05 3

r04 c06 4
r08 c13 4
r10 c15 4

r06 c12 5
r11 c18 5

r03 c07 6
r07 c14 6
r09 c16 6

At this point, the original tables of rooms and classes cannot be paired. Notice that as we removed rooms and classes, the fit numbers changed. This is a characteristic of greedy algorithms; taking the “big bites” can leave sub-optimal leftovers on the plate.
Ok, I can see where you are heading. Given this algorithm we would need a cursor, or we could probably use a module from 9.7 to create a recursive function. I don't think it will be possible to use a CTE to solve it. I'll do some more thinking and see if I can come up with some additional solution

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•