Archetypes storage and versioning
April 27, 2008
CMFEditions is a generic versioning product for Plone and a framework for more specialized versioning tasks such as staging (Iterate). It is included in the Plone bundle and is activated by default on new sites. It is very simple to use and well documented: a really nice product from a developer's point of view, and a very useful tool from a customer's point of view. Digging deeper, however, you can find something dangerous from a storage point of view.
So what do we mean by storage?
First, there are the Archetypes storages (AttributeStorage, now replaced by AnnotationStorage, FileSystemStorage, SQLStorage, etc.). They define how a field value is stored. Here we only need to know whether the value ends up inside or outside Zope. Let's dig deeper.
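To make the "inside or outside Zope" distinction concrete, here is a minimal sketch with toy classes. AttributeLikeStorage, AnnotationLikeStorage, FileSystemLikeStorage and Document are hypothetical stand-ins, not the real Archetypes storages; only the rough get/set(name, instance, ...) shape is modeled on Archetypes. The first two keep the value on the object (so it would be pickled into the ZODB), the third writes it to a file on disk.

```python
import os
import tempfile


class AttributeLikeStorage:
    """Toy storage: the value is an attribute of the content object itself,
    so it lives inside the object database."""

    def set(self, name, instance, value):
        setattr(instance, "_field_" + name, value)

    def get(self, name, instance):
        return getattr(instance, "_field_" + name)


class AnnotationLikeStorage:
    """Toy storage: the value goes into a per-object annotations mapping,
    which is still inside the object database."""

    def set(self, name, instance, value):
        if not hasattr(instance, "_annotations"):
            instance._annotations = {}
        instance._annotations[name] = value

    def get(self, name, instance):
        return instance._annotations[name]


class FileSystemLikeStorage:
    """Toy storage: the value is written to a file outside the object database."""

    def __init__(self, root):
        self.root = root

    def _path(self, name, instance):
        return os.path.join(self.root, f"{instance.uid}-{name}")

    def set(self, name, instance, value):
        with open(self._path(name, instance), "wb") as f:
            f.write(value)

    def get(self, name, instance):
        with open(self._path(name, instance), "rb") as f:
            return f.read()


class Document:
    def __init__(self, uid):
        self.uid = uid


doc = Document("doc1")
AttributeLikeStorage().set("title", doc, "Hello")
AnnotationLikeStorage().set("description", doc, "Short text")
FileSystemLikeStorage(tempfile.mkdtemp()).set("file", doc, b"%PDF-1.4 ...")
# The file bytes are not among the object's own attributes.
print(sorted(vars(doc)))  # ['_annotations', '_field_title', 'uid']
```

Anything serialized together with the object (the first two cases) is what a versioning tool sees when it copies the object; the file written by the third storage is not.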
Then there are the ZODB storages. They manage how each object is stored and how the ZODB itself is structured:
- FileStorage: the whole ZODB is a single, potentially big file called Data.fs
- DirectoryStorage: the ZODB is spread over a directory tree in the filesystem, with each object stored in its own file
- RelStorage: objects are stored in an external relational database (e.g. PostgreSQL)
These storages now support blobs for file objects: the file data is stored outside the main database, and the ZODB only keeps a reference to it.
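The blob idea can be illustrated with a small sketch. ToyDatabase and its methods are hypothetical and only model the principle (payload in a separate file, reference in the main store); they are not the ZODB blob API.

```python
import os
import tempfile
import uuid


class ToyDatabase:
    """Toy object store: records live in one dict (think of a single Data.fs),
    while blob payloads are written to their own files in a blob directory."""

    def __init__(self, blob_dir):
        self.records = {}
        self.blob_dir = blob_dir

    def store_inline(self, oid, data):
        # The whole payload is kept inside the main store.
        self.records[oid] = {"data": data}

    def store_blob(self, oid, data):
        # The payload goes to a separate file; the main store keeps only its path.
        path = os.path.join(self.blob_dir, uuid.uuid4().hex)
        with open(path, "wb") as f:
            f.write(data)
        self.records[oid] = {"blob_path": path}

    def load(self, oid):
        record = self.records[oid]
        if "blob_path" in record:
            with open(record["blob_path"], "rb") as f:
                return f.read()
        return record["data"]


db = ToyDatabase(tempfile.mkdtemp())
db.store_inline("small", b"tiny")
db.store_blob("big", b"x" * 1_000_000)
print(db.load("big") == b"x" * 1_000_000)  # True
print(list(db.records["big"]))  # ['blob_path']: only a reference is recorded
```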
There is also a CMFEditions storage for versioning information and data. It is currently based on ZVC (Zope Version Control), which stores everything in Data.fs.
How is a document revision created?
CMFEditions doesn't know anything about Archetypes. It is a cross-cutting tool that could quickly be adapted if we moved from Archetypes to something else, and we want it to keep this independence.
When a new revision is created, the document is treated as a plain Python object, and each attribute is stored according to strategies defined by modifiers. An attribute doesn't reveal which Archetypes storage is used for it. Until now, CMFEditions has assumed that AnnotationStorage or AttributeStorage is used, so it simply copies each attribute elsewhere in the ZODB. With FileSystemStorage, the whole file is copied from the filesystem into the ZODB for each revision… and here size does matter: smaller is better.
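A minimal sketch of this copy-every-attribute approach, under the assumption that a revision is simply a deep copy of the object's attributes passed through a list of modifiers. Repository, drop and Document are hypothetical names; real CMFEditions modifiers have their own interfaces. Here, stored_bytes counts a payload once per revision, which is roughly what a repository that pickles each revision separately pays (in memory, Python may share the immutable bytes object).

```python
import copy


class Repository:
    """Toy version repository: every revision is a full copy of the
    object's attributes, stored next to the live object."""

    def __init__(self):
        self.revisions = []

    def save(self, obj, modifiers=()):
        # The object is handled as a plain Python object: nothing here knows
        # which Archetypes storage a given attribute came from.
        attrs = copy.deepcopy(vars(obj))
        # Each modifier may rewrite or drop attributes before storage.
        for modifier in modifiers:
            attrs = modifier(attrs)
        self.revisions.append(attrs)
        return len(self.revisions) - 1

    def retrieve(self, obj, revision):
        # Copy the saved attributes back onto the live object.
        obj.__dict__.update(copy.deepcopy(self.revisions[revision]))
        return obj

    def stored_bytes(self, attr):
        # Total size of one bytes attribute, counted once per saved revision.
        return sum(len(r.get(attr, b"")) for r in self.revisions)


def drop(name):
    """Hypothetical modifier factory: keep one attribute out of revisions."""
    def modifier(attrs):
        attrs.pop(name, None)
        return attrs
    return modifier


class Document:
    def __init__(self, title, data):
        self.title = title
        self.data = data


doc = Document("Report", b"x" * 1_000_000)
repo = Repository()
for i in range(3):
    doc.title = f"Report v{i}"  # only the title changes
    repo.save(doc)
print(repo.stored_bytes("data"))  # 3000000: the unchanged payload counted per revision
```

Passing modifiers=[drop("data")] to save keeps the payload out of the revisions, but then nothing can restore it, which is why the storage itself has to take part.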
To get storage information we need to use the Archetypes API, so we need an extra modifier for each Archetypes storage. But then, what do we do with this information? You have to modify your specific Archetypes storage so that it can store revision information and restore it when needed. CMFEditions passes very little information along during these operations.
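The following sketch shows the shape of such a per-storage modifier, under stated assumptions: FileSystemLikeStorage, StorageAwareModifier, save_revision and restore_revision are hypothetical names, not the actual FileSystemStorage patch or CMFEditions interfaces. The dict passed to the modifier stands in for the Archetypes API lookup of which storage each field uses, and the storage only receives a uid, a field name and a revision number.

```python
class FileSystemLikeStorage:
    """Toy external storage (stand-in for a FileSystemStorage-style storage).
    It keeps the current value per (uid, field) and, once patched for
    versioning, its own revision copies as well."""

    def __init__(self):
        self.current = {}
        self.revisions = {}

    def set(self, uid, field, value):
        self.current[(uid, field)] = value

    def get(self, uid, field):
        return self.current[(uid, field)]

    # Extra hooks the storage must grow so that versioning can work with it.
    def save_revision(self, uid, field, rev):
        self.revisions[(uid, field, rev)] = self.current[(uid, field)]

    def restore_revision(self, uid, field, rev):
        self.current[(uid, field)] = self.revisions[(uid, field, rev)]


class StorageAwareModifier:
    """Hypothetical modifier: for fields kept in a versioning-aware storage,
    delegate to the storage instead of copying data into the repository."""

    def __init__(self, field_storages):
        # Stand-in for querying the Archetypes API: field name -> storage.
        self.field_storages = field_storages

    def on_save(self, uid, rev, attrs):
        # The storage only receives uid, field name and revision number.
        for field, storage in self.field_storages.items():
            if hasattr(storage, "save_revision"):
                storage.save_revision(uid, field, rev)
                attrs.pop(field, None)  # keep the payload out of the repository
        return attrs

    def on_restore(self, uid, rev):
        for field, storage in self.field_storages.items():
            if hasattr(storage, "restore_revision"):
                storage.restore_revision(uid, field, rev)


fss = FileSystemLikeStorage()
fss.set("doc1", "file", b"first version")
modifier = StorageAwareModifier({"file": fss})
print(modifier.on_save("doc1", 0, {"title": "Doc", "file": b"first version"}))
# {'title': 'Doc'}
fss.set("doc1", "file", b"second version")
modifier.on_restore("doc1", 0)
print(fss.get("doc1", "file"))  # b'first version'
```

Every storage that wants to be versioned needs both hooks, and one such modifier per storage type, which is the duplication the next section questions.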
The Novell Plone team submitted a patch to FileSystemStorage that does this, and upcoming releases should include it.
What can we do to simplify storage nesting?
This approach doesn't allow advanced strategies beyond versioning to be defined in Archetypes storages. But why would we need to define advanced strategies in Archetypes storages at all?
The only real storages should be the ZODB storages. But they don't let you create advanced strategies that manage objects in different ways according to rules.
What is the advantage of implementing strategies directly in the ZODB?
CMFEditions would not need to know anything about field storage: everything would be managed by the ZODB. You would not need to configure storage strategies in several configuration files, and developers would not need to know about them. Blob is one such strategy: an Archetypes field would just indicate whether it uses standard or blob storage.
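A sketch of this proposal, assuming a toy database layer that owns every placement decision. FieldDecl, ToyObjectStore and save_revision are hypothetical. The only thing a field declares is "standard" or "blob", and the versioning side just hands over values. The content-addressed blob store is an illustrative strategy chosen for this sketch, not how ZODB blobs work; it shows that the storage layer can apply a strategy the versioning tool never sees.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDecl:
    name: str
    kind: str = "standard"  # the only thing a field declares: "standard" or "blob"

    def __post_init__(self):
        if self.kind not in ("standard", "blob"):
            raise ValueError(f"unknown field kind: {self.kind!r}")


class ToyObjectStore:
    """Toy database layer that owns the storage strategy for every field."""

    def __init__(self):
        self.records = []  # object state, one dict per stored version
        self.blobs = {}    # content hash -> bytes, kept apart from records

    def store(self, schema, values):
        record = {}
        for field in schema:
            value = values[field.name]
            if field.kind == "blob":
                # Illustrative strategy (not ZODB's): content-addressed blobs,
                # so an unchanged file is kept only once across versions.
                key = hashlib.sha256(value).hexdigest()
                self.blobs.setdefault(key, value)
                record[field.name] = ("blob", key)
            else:
                record[field.name] = ("inline", value)
        self.records.append(record)
        return len(self.records) - 1

    def load(self, index):
        out = {}
        for name, (where, payload) in self.records[index].items():
            out[name] = self.blobs[payload] if where == "blob" else payload
        return out


def save_revision(store, schema, values):
    # The versioning side never asks where or how each value is stored.
    return store.store(schema, values)


schema = [FieldDecl("title"), FieldDecl("file", kind="blob")]
store = ToyObjectStore()
r0 = save_revision(store, schema, {"title": "v0", "file": b"big payload"})
r1 = save_revision(store, schema, {"title": "v1", "file": b"big payload"})
print(store.load(r1))  # {'title': 'v1', 'file': b'big payload'}
print(len(store.blobs))  # 1
```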