I'm new here. I stumbled on this forum in my search for information on how to realize a mediated database, i.e. a virtual database, that holds no data but can be queried, and in turn it queries a range of existing databases. This should be a fairly interesting thing to do if you'd like to integrate existing databases I imagine, and I'm sure it's quite common, e.g. when companies merge.
However, I've haven't had much success in finding either books or guides on how to do this. What I've found are scientific articles on mediated databases using methods called Global-As-View and Local-As-View, which I've familiarised myself with. The problem is that they all deal with database integration on a theoretical level, usually described in terms of Datalog language.
I'd like to run some tests but don't know how to begin implementing the theories of the above mentioned methods, and so I'm wondering if someone could point me to information about practicalities in relation to implementation of such methods, be they books, articles or any other material.
I should mention that I'm looking for implementation information within the realm of SQL DBMS.
Last edited by seron; 11-25-09 at 08:22.
Reason: added info about SQL
I am not familiar with the term mediated virtual database, but have done what you describe using a database that contains nothing but views to other databases. In SQL-Server you define a set of linked servers (connections to the other servers and databases) then build a view that selects from the linked servers. If the databases are all on the same server it is even easier, you just define views that reference the databases and tables directly.
This functionality is usually implemented by (and is better suited to) a data warehouse.
The creature you are describing makes the application subject to all the vagaries of each different database engine, application, security settings, etc.
There is most likely a good reason you cannot find much reference material on the subject.
If it's not practically useful, then it's practically useless.
@MarkATrombley: Is it the case that you create views based which relations are available in each materialized database, and if multiple databases can answer the same query posed to the view-only, i.e. virtual, database, the view queries each relevant source and integrate their answer into the view answer? If you later want to add another materialized database that might contribute with the same type of information, would you have to adjust your view to query also the new database?
@blindman: What I'm looking for are methods to integrate databases, where queries to the integrated database are propagated to the source databases momentarily. As I understand, a Data Warhouse is something that is updated periodically. It is therefore not applicable, although methods pertaining to gathering and transforming may be.
Yes, if the requirement is to retrieve something like part information from multiple databases I would create a view that was three select statements unioned together, or use one database as the driver and do joins to the other databases.
When response time is an issue I have also gone the more datawarehouse type route, created an scheduled job that creates local copies of the relevant tables from the various databases then my view selects from them. I didn't mention this before since you specified tableless, but it does make the data available when one of the systems is slow or unavailable for some reason.
Mashups offer a way to do what you want, and people will want to subscribe to the sources of data.
Thinking along those lines, you might also find it interesting to look at these Data Models on my Database Answers Web Site on a few related Topics :-
1) Mashups - Enterprise Data Mashups Data Model
Years back I produced a web based reporting system that ran on Sybase initially but would call reporting stored procs on a whole bunch of other database servers (SQL Server, Sybase and Oracle if I remember correctly). The front end looked the same to the user so they weren't aware they were working on a different db. Each database had the same set of stored procs that could be called and could be controlled via the data supplied - obviously they all produced entirely different types of reporting but always in roughly the same format ie HR reporting for some, trade reporting for others etc. The central Sybase server just contained the info on how to connect to the other servers and what systems each user could access. A fair amount of control over the reporting was added so users could have default reports and automated reports as well as the normal ad hoc reporting with drill downs etc. It worked quite well and produced around 6000 reports per day and it didn't need any looking after except when introducing a new system into the reporting system.