My first thought was to store each version in a single column as gzip-compressed CSV.
Without having tested it, I expect the search performance would be unacceptable.
It is probably better to accept the extra storage.
If storage is a problem you could always create a set of triggers:
UPDATE triggers need to record only the old values of the columns that have changed;
DELETE triggers have to record the entire row that is being deleted.
The savings, though, are minimal, because you will still need to store the primary key (orderid), the column name (or a numeric representation of it, to reduce storage) and the old value. You will need to create fields for the INT, DATE and VARCHAR datatypes to hold those old values. Finally, you will also need a transaction number for each record. The transaction number links rows that belong to the same operation: for example, if the columns noBoxes and weightKg are updated at the same time, the two audit rows share the same transaction number, so they can be identified as parts of one update.
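The scheme above might look like the following sketch. All names here (order_audit, tx_no, col_id, and the two example columns noBoxes and weightKg) are assumptions for illustration, not an actual schema:

```sql
-- Sketch of a per-column audit table; names are assumptions.
CREATE TABLE order_audit (
  orderId     INT          NOT NULL,  -- primary key of the audited row
  tx_no       BIGINT       NOT NULL,  -- groups columns changed by one update
  col_id      SMALLINT     NOT NULL,  -- numeric column identifier (saves storage)
  old_int     INT              NULL,  -- one slot per datatype
  old_date    DATE             NULL,
  old_varchar VARCHAR(255)     NULL,
  PRIMARY KEY (orderId, tx_no, col_id)
);

DELIMITER //
CREATE TRIGGER tp_orders_audit_upd AFTER UPDATE ON tp_orders
FOR EACH ROW
BEGIN
  -- One audit row per column that actually changed; all columns changed
  -- by the same row update share the same transaction number.
  SET @tx := COALESCE(@tx, 0) + 1;
  IF NOT (OLD.noBoxes <=> NEW.noBoxes) THEN
    INSERT INTO order_audit (orderId, tx_no, col_id, old_int)
    VALUES (OLD.orderId, @tx, 1, OLD.noBoxes);
  END IF;
  IF NOT (OLD.weightKg <=> NEW.weightKg) THEN
    INSERT INTO order_audit (orderId, tx_no, col_id, old_int)
    VALUES (OLD.orderId, @tx, 2, OLD.weightKg);
  END IF;
END//
DELIMITER ;
```

The NULL-safe comparison `<=>` is used so that changes to and from NULL are also caught.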
If there are many deletes, then every field is effectively being written out, so you may end up with more data in your audit table than in the original.
I presented a simplified version as an example here.
The biggest concern is performance while trying to keep storage minimized.
A row, once created in tp_orders and the other tables, will never be deleted, only flagged as cancelled in another table.
A history table with a foreign key constraint on orderId and one column for each datatype used in the main table [TINYINT(1), TINYINT(3), SMALLINT(5), MEDIUMINT(8), VARCHAR, DATE, DATETIME] might be a good performance / storage compromise.
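A sketch of such a history table might look like this; the table and column names are assumptions, and the actual schema of tp_orders is not shown here:

```sql
-- One history table for tp_orders, with one nullable column per datatype
-- that occurs in the main table. Names are illustrative only.
CREATE TABLE tp_orders_history (
  historyId    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  orderId      INT UNSIGNED NOT NULL,     -- references the main table
  changedAt    DATETIME     NOT NULL,
  col_id       TINYINT(3)   NOT NULL,     -- which column of tp_orders changed
  old_tiny1    TINYINT(1)       NULL,     -- exactly one of these is non-NULL,
  old_tiny3    TINYINT(3)       NULL,     -- depending on the datatype of the
  old_small5   SMALLINT(5)      NULL,     -- changed column
  old_medium8  MEDIUMINT(8)     NULL,
  old_varchar  VARCHAR(255)     NULL,
  old_date     DATE             NULL,
  old_datetime DATETIME         NULL,
  PRIMARY KEY (historyId),
  CONSTRAINT fk_hist_order FOREIGN KEY (orderId)
    REFERENCES tp_orders (orderId)
) ENGINE=InnoDB;
```

Since rows in tp_orders are never deleted, the foreign key never blocks a delete; it only guarantees that history rows always point at an existing order.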
Enumerating the main table's column names will, however, make changes to the table structure during development more difficult.
Creating and inserting the optimized history records should probably be handled by PHP rather than by triggers, because the logic needed might be a little too much for SQL.
Performance-wise, I will also have to compare JOIN syntax against separate searches on every table issued via PHP.
Actually, performing the work on the database server is more efficient as long as the work is simple. Handling this in PHP requires the PHP engine to start up, requests to be sent by the client, and those requests to be parsed and finally executed on the server. Triggers perform all the work in one place, without starting any new processes.
As for the queries, if you create VIEWS once you know the SELECT syntax that you would need then this would further simplify the development by calling the VIEW to get the historical data.
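Such a VIEW might be sketched as follows; the view name, join columns and the exact SELECT list are assumptions:

```sql
-- Wrap the history SELECT in a view so callers need not repeat the join.
CREATE VIEW v_order_history AS
SELECT o.orderId,
       h.changedAt,
       h.col_id,
       h.old_varchar,
       h.old_date,
       h.old_medium8
FROM tp_orders o
JOIN tp_orders_history h ON h.orderId = o.orderId;

-- Callers then only need:
-- SELECT * FROM v_order_history WHERE orderId = 12345;
```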
My main concern with using VIEWs, as opposed to combining separate queries in PHP, is the limited RAM on the server.
I do not know how MySQL behaves when rendering a VIEW of, e.g., 50 resulting rows out of 4 tables holding far beyond one million rows each.
If the VIEW first combines the 4 tables (in RAM) and only then extracts the resulting 50 rows, I might hit a RAM bottleneck.
Does anyone know for sure how MySQL behaves when rendering small VIEWs from huge data sets?
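For what it is worth, MySQL documents two view processing algorithms: MERGE, where the view definition is merged into the outer query so that WHERE conditions and indexes on the base tables still apply, and TEMPTABLE, where the view result is materialized into a temporary table first. A plain join view is normally eligible for MERGE, which would avoid the feared full materialization, but EXPLAIN is the way to verify this for a concrete view (the view name below is hypothetical):

```sql
-- Hypothetical view name; substitute your own.
EXPLAIN SELECT *
FROM v_order_history
WHERE orderId = 12345;
-- If the plan shows only the base tables with indexed access (e.g. "ref" or
-- "eq_ref"), the view was merged and the WHERE condition was pushed down.
-- A "DERIVED" table in the plan means MySQL materialized the view first.
```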