Pervasync - The Database Synchronization Software for Pervasive Computing
As computing devices become more and more pervasive and capable, it is getting both feasible and necessary to run a database system on the devices for the management of the large amount of data that could be stored locally on devices. The device databases would need to exchange data with a central database for backup, data collection and distribution purposes. The exchange of data between the local and central databases is called database synchronization.
Pervasync provides a sync solution for the world's most popular commercial database, Oracle, and the world's most popular open source database, Mysql. The time has come for offline applications to use the world's best databases on devices (workstations, PCs, laptops and PDAs). The benefits of using general-purpose Oracle and Mysql as offline databases over using little known, little used "mobile" databases are obvious: robustness, feature set and application portability.
Key features of Pervasync:
* Supports Oracle and Mysql
Yes you can now sync Mysql databases on personal computers with Oracle databases in your data center. Of course you can also sync Mysql with Mysql, and Oracle with Oracle.
* Publish and Subscribe Model
You publish sync objects and control who can sync what. You also can specify one-way or two-way sync on a per sync table base.
* Supports Schema Evolution
In the initial sync, the DB schema is created automatically on client. Later on, changes to the server schema are propagated to clients during synchronization.
* Supports Data Sub-Setting
From ground up, the system is designed to support data sub-setting, so that each client can synchronize its private data as well as shared data with a central server.
* Web-Based Admin Console
The web-based admin console makes it easy to monitor and control the server, view client sync history. In addition, it enables you do the publish-and-subscribe by pointing and clicking.
* Built to Perform and Scale
It is easy for everybody to say this. But we really mean it.
* Supports File Sync
A bonus. In additional to database data, you can also synchronize selected files in selected folders .
What are the differences between Database synchronization and replication?
Mysql DBMS has built-in replication support. However it does not have synchronization support as those provided by Sybase SqlAnywhere mobilink and Oracle Lite synchronization. If you want to use Mysql as an offline database, you need to sync it with a central database by yourself or find a third party sync provider.
So what are the differences between replication and synchronization? Database synchronization is closely related to database replication. In fact, sometimes people use the terms interchangeably. However, there are big differences, understanding which will help us understand the different approaches used for solving replication and synchronization problems.
Replication is mostly used in situations where identical replicas of the complete data set are maintained on two or more database instances for high availability and performance reasons. Database instances can often work independently as backups for each other. The relationships between the instances are equal, symmetric. Normally the number of DB instances is small.
On the other hand, in a database synchronization system, typically you have a big central database on the server side and a large number of small databases each residing on a device. The central database contains data for all the devices while each device's local database only contains the device's private data and some shared data.
In the non-database world, Synchronization is also often used to describe the data exchange between a more temporal sub-set of data and a more persistent full-set of data. For instance, parts of a file could be buffered in-memory by an operating system and are "synchronized" with the file on hard disk. Another example is the synchronization of the data in a CPU cache memory with the data in the main memory.
What are the different approaches used for database synchronization and replication?
In database replication schemes, physical transactions on each node are recorded and played back on all the other nodes. This technique would only work if each node has a replica of the full-set data.
There is also a stability issue with physical transaction based replications when the number of nodes goes up. Transactions on different replicas could conflict with each other. To handle this, normally cross system locking or complicated conflict resolution schemes are needed. In fact, they are used in eager replication and lazy replication respectively.
Eager replication synchronously updates all replicas as part of one atomic transaction. This is also called synchronous replication or pessimistic replication as it incurs global locking and waiting.
In contrast to eager replication, lazy replication allows updates of any replicas without locking others. The local transactions then propagate to other replicas in background. This scheme is also called asynchronous replication or optimistic replication since it is based on the assumption that conflicts will occur rarely. Here each host must apply transactions generated by the rest of the hosts. Conflicts must be detected and resolved. Since the transactions are final at their origin nodes, usually manual or complicated ad hoc conflict resolutions are required at the destination nodes.
Researchers have shown that the traditional transactional replication has unstable behaviors as the workload scales up: a ten-fold increase in node and traffic gives a thousand fold increase in deadlocks or reconciliations/conflict resolutions. A system that performs well on a few nodes may become unstable as the system scales up to even a dozen of nodes. To alleviate this problem, master-slave schemes are created where writes are only allowed on the master server and the slave servers are made read-only.
The traditional database replication schemes are clearly not suited for database synchronization, which involves hundreds or even thousands of nodes in one system. The traditional replication systems are symmetric and each node contains a full-set data, while a synchronization system is asymmetric and the client nodes contain sub-sets of the data on central server. The asymmetry, together with the instability introduced with large number of nodes calls for a different approach than propagating physical transactions to all nodes, as used in replication.
The key to a successful synchronization scheme is server-centric. Replication schemes, except for the special purpose master-slave scheme, treat all the nodes equally, symmetrically. Transactions on each node are propagated to all other nodes and conflict detection and resolution are done on the destination nodes. In contrast, database synchronization employs an asymmetric client-server paradigm. In the server centric synchronization system, the server database is the single source of truth. Client nodes only talk to the server, not each other. Conflict detection and resolution only happen on one node, the server. This ensures the system stability.
Physical transactions applied on the central database are final. In contrast, device database serves as a cache of a sub-set of central database data. Physical transactions applied on the device database are tentative until they are checked in to the central database. At check in time, the changes committed by all the local physical transactions form a single logical transaction, which is applied directly to only the central database, not directly to other devices. The checking in of logical transactions in the sync system is just like the committing of user physical transactions in a traditional client-server DB system.
The checkin of client transactions is pretty straightforward except that you cannot simply record the SQL statements on client and replay them on server. Remember the client has a sub-set of server data and the same SQL statement could do different things on server than on client. Instead, you keep track of changes to records/rows and apply the changes to server DB.
On the other hand, it is trickier to refresh client DB with server transactions. Again, you cannot simply record the SQL statements and should instead track the changes to records/rows. In addition, you need to handle logical inserts and deletes. For example, let’s say you have a tasks table on server containing all the tasks. You want each user/client to sync only the tasks assigned to him or her. In case you re-assign the task from user A to user B, user A should get a logical delete and user B should get a logical insert. It is called a “logical” insert/delete since the task wasn’t physically inserted or deleted on server.
Fortunately a good synchronization system should be able to handle all the intricacies like change tracking, logical inserts/deletes, conflict detection and resolution.
The dangers of timestamp based database synchronization
We know that in synchronization you cannot simply record the SQL statements on one database and replay them on another database. The reason is that the client database typically only has a sub-set of server database data and the same SQL statement could do different things on server than on client. Instead, you should somehow track the changes to records/rows and apply the changes to the destination DB.
The most popular change tracking method on server DB is based on timestamps. It looks very straightforward. You add a timestamp column to your table and update the timestamp column whenever you change a row. This can be done in your SQL statement or through triggers. By the way, deletes have to be tracked separately, maybe in a companion table. Then at sync time, the sync client would come in with a last sync timestamp and you select all the server changes that have a newer timestamp than the last sync timestamp.
As I said, this is a widely used technique when people have to implement their own sync logic. Also, some sync solutions put the burden of change tracking on app developers and this is the number one recommended technique. However, be aware of its pitfalls.
One obvious pitfall is system time. This is timestamp based so be careful with system time. Don’t adjust system time even it is wrong. Do not sync during time shifts between daylight saving time and standard time.
There is a more serious problem with this technique that could cause change loss. Let me try to explain it. The default isolation level for Mysql with innodb and Oracle is “Read Committed”, which means that others cannot see the changes before a transaction is committed. Let’s say at time T1 you modified a record R1. The timestamp column would have a value of T1. Then before you commit, a sync happened at T2 (T2 > T1). This sync could not pickup the change to record R1 since the transaction was not committed and the change was invisible. At T3 you committed the transaction and at T4 (T4>T3>T2>T1) you do another sync. Guess what, you still won’t get the change to R1! It was committed but it has a timestamp T1 that is older than last sync time T2. The client will forever miss the change no matter how many times you sync.
This problem is not so well known and is very hard to workaround in a production environment.
Fortunately Persasync (http://www.pervasync.com) has an innovatively designed sync engine that can take care of all the sync issues for you. You don’t need to have any timestamp column to worry about. Just do your normal DB DML operations and the system would track the changes for you and guarantee no change loss.