MiddleKit version 1.0.2
MiddleKit provides an object-relational mapping layer that enables developers to write object-oriented, Python-centric code while enjoying the benefits of a relational database.
The benefits of Python include: generality, OOP (encapsulation, inheritance, polymorphism), a mature language, and many libraries, both standard and 3rd party.
The benefits of a relational database include: data storage, concurrent access and 3rd party tools (such as report generators).
MiddleKit can use any of the following databases as a back-end:
All attributes in an object model must be typed. MiddleKit divides types into 4 major categories: basic types, date and time types, enumerations, and object references.
The basic types are easy to work with and do what you expect. They are the same as the Python types with the addition of one special case: bool.
While MiddleKit is designed to be generic and therefore database agnostic, concrete examples pay good dividends. So the following table includes the type equivalents for MySQL.
Basic Types
MiddleKit | Python | MySQL | Notes |
---|---|---|---|
bool | int | bool | Py int = 0 or 1 |
int | int | int | |
long | long | bigint | 64-bit int |
float | float | double | 64-bit float |
string | string | varchar(*) | |
(*) The MySQL type is char if the minimum and maximum length are equal.
MiddleKit supports types of date, time and datetime. With a default Python installation and MiddleKit, these are expressed as Python strings such as '2001-02-27'. If you have installed the mxDateTime package, then these values will be expressed as instances of DateTime and DateTimeDelta.
With a default Python installation and MiddleKit, you can pass strings:
Date/Time Types
MiddleKit | Python | MySQL | Notes |
---|---|---|---|
datetime | string/DateTime | datetime | |
date | string/DateTime | datetime | |
time | string/DateTimeDelta | time | |
Enumerations are provided through the enum type, which is directly supported in MySQL. In Python, these enumerations are kept as case sensitive strings. The object model must specify the valid values of the enumeration. While that can be done via a column named "Enums", this is more frequently specified in the "Extras" column, where less common attribute specs are usually placed:
Enums='red, green, blue'
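To make the rule concrete, here is a minimal sketch of how such a spec could be split into names and checked. The helpers parse_enums and is_valid_enum are hypothetical and not part of MiddleKit; the only behavior taken from the text above is that values are compared as case sensitive strings.

```python
def parse_enums(spec):
    """Split an Enums spec such as 'red, green, blue' into a list of names."""
    return [name.strip() for name in spec.split(',') if name.strip()]


def is_valid_enum(value, spec):
    """Return True if value is one of the listed names (case sensitive)."""
    return value in parse_enums(spec)


colors = 'red, green, blue'
print(parse_enums(colors))
print(is_valid_enum('green', colors))
print(is_valid_enum('Green', colors))  # differs only in case, so not valid
```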
There are two types of object references: a single reference and a list of references. In relational terminology, that would be "1-to-1" and "1-to-many" respectively.
The type for a single reference in Python is indicated by simply naming the class of the object that can be referred to. That class must be defined in the object model and will therefore have its own table in the database. User-defined classes are required to be capitalized (while other types are lower case). For example:
Attribute | Type |
---|---|
address | Address |
billingInfo | BillingInfo |
The type for a list of references is specified by list of ClassName
and represents the 1-to-many relationship. This will be an ordinary Python list, except that some invisible MiddleKit machinery in the background will perform various services (fetch objects on demand, insert new objects, etc.). For example:
Class | Attribute | Type |
---|---|---|
Contact | ||
addresses | list of Address |
Note that the Address class referred to will need a "back reference" attribute which, for each Address instance, refers to the corresponding Contact instance. By default, MiddleKit assumes that this attribute is named the same as the class containing the list attribute ('Contact' in this example), but with the first letter converted to lowercase to match MiddleKit naming conventions (therefore, 'contact'):
Class | Attribute | Type |
---|---|---|
Address | ||
contact | Contact | |
name | string |
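The default naming rule for the back reference can be sketched in a line of Python. The function name default_back_ref_name is hypothetical; it only illustrates the "lowercase the first letter" convention described above and is not MiddleKit's own code.

```python
def default_back_ref_name(class_name):
    """Lowercase only the first letter of the containing class name."""
    return class_name[:1].lower() + class_name[1:]


print(default_back_ref_name('Contact'))
print(default_back_ref_name('BillingInfo'))
```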
Using the example of 'Author' and 'Book' objects, we may have many books written by a single author, and a book may have multiple authors (i.e. many-to-many). To set up this relationship in MiddleKit, use an intermediate object to represent the relationship. In many cases it makes sense to name this object using a verb, such as "Wrote":
Class | Attribute | Type |
---|---|---|
Author | ||
name | string | |
books | list of Wrote | |
Book | ||
title | string | |
authors | list of Wrote | |
Wrote | ||
author | Author | |
book | Book |
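The shape of this model can be sketched with plain Python classes. This is not the code MiddleKit generates; the classes below are simplified stand-ins, and the helpers titles_by and authors_of are hypothetical. The sketch only shows how both sides of the many-to-many relationship are reached through the intermediate Wrote object.

```python
class Author(object):
    def __init__(self, name):
        self.name = name
        self.books = []    # list of Wrote


class Book(object):
    def __init__(self, title):
        self.title = title
        self.authors = []  # list of Wrote


class Wrote(object):
    """Intermediate object linking one Author to one Book."""

    def __init__(self, author, book):
        self.author = author
        self.book = book
        author.books.append(self)
        book.authors.append(self)


def titles_by(author):
    """Follow Author -> Wrote -> Book."""
    return [wrote.book.title for wrote in author.books]


def authors_of(book):
    """Follow Book -> Wrote -> Author."""
    return [wrote.author.name for wrote in book.authors]


ann, bob = Author('Ann'), Author('Bob')
guide, notes = Book('Guide'), Book('Notes')
Wrote(ann, guide)
Wrote(bob, guide)
Wrote(ann, notes)
print(titles_by(ann))
print(authors_of(guide))
```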
It was mentioned above that, when using a list attribute, MiddleKit requires a "back reference" and by default assumes this attribute will be named according to the name of the referencing class. If you ever want to create recursive structures such as trees, you may need to override the default using the "BackRefAttr" attribute. A tree node can be implemented by having a reference to its parent node, and a list of children nodes like so:
Class | Attribute | Type | Extras |
---|---|---|---|
Node | |||
parent | Node | ||
children | list of Node | BackRefAttr='parent' |
By specifying the attribute to use as a back reference, you've told MiddleKit that it can fetch children of a specific node by fetching all children whose "parent" attribute matches that node.
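The lookup described here can be pictured with an in-memory sketch. The Node class and the fetch_children helper are hypothetical stand-ins; MiddleKit itself performs the equivalent match against the database rather than over a Python list.

```python
class Node(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def fetch_children(all_nodes, node, back_ref_attr='parent'):
    """Return the nodes whose back reference attribute points at node."""
    return [n for n in all_nodes if getattr(n, back_ref_attr) is node]


root = Node('root')
left = Node('left', parent=root)
right = Node('right', parent=root)
leaf = Node('leaf', parent=left)
nodes = [root, left, right, leaf]

print([n.name for n in fetch_children(nodes, root)])
print([n.name for n in fetch_children(nodes, left)])
```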
There are two additional properties to control how object references are handled when an object is deleted.
For the purpose of discussion, the object containing the attribute is self while the objects being referred to are the others. Now then, the onDeleteSelf property specifies what happens to the other object(s) when the self object is deleted:
There is a similar property onDeleteOther which specifies what happens to the self object when the other object(s) is deleted:
The default value of onDeleteSelf is detach, and the default value of onDeleteOther is deny. In other words, by default, you can delete an object which references other objects, but you can't delete an object which is referenced by other objects. An example specification would be onDeleteOther=cascade.
Note: onDeleteSelf can also be specified for "list of reference" attributes, where it has the same effect as it does when applied to reference attributes.
This is the object model where classes and attributes are defined. See the Quick Start for an example.
This is an optional file containing sample data for the model. See the Quick Start for an example.
Note that a blank field in the samples will be substituted with the default value of the attribute (as specified in the object model, e.g., Classes.csv). To force a None value (NULL in SQL), use 'none' (without the quotes).
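The substitution rule for sample fields can be sketched as follows. The function sample_value is hypothetical, and treating 'none' as an exact, lowercase match is an assumption; the text above only states that a blank field takes the attribute's default and that none forces a None value.

```python
def sample_value(field, default=None):
    """Resolve one raw field from a samples file."""
    text = field.strip()
    if text == '':
        return default   # blank: fall back to the attribute's default
    if text == 'none':
        return None      # explicit none: NULL in SQL
    return text


print(sample_value('', default='unknown'))
print(sample_value('none', default='unknown'))
print(sample_value('Chuck', default='unknown'))
```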
During development you will likely need to modify your MiddleKit model (Classes.csv) and re-generate the classes. If you already have some data in the database, you will need to dump and reload the data, possibly manipulating the data to fit the new schema.
You can dump your existing data into a file in the same format as the Samples.csv file. Then you can rename or remove columns in Samples.csv to match your schema changes, and then run generate, create, insert to recreate your database and reload the data (this procedure is described in the Quick Start guide).
python /Projects/Webware/MiddleKit/Run/Dump.py --db MySQL --model Videos >Videos.mkmodel/Samples.csv
If you need to pass any arguments (e.g. user/password) to the store, use the --prompt-for-args option. You can then enter any arguments you need in Python style:
python /Projects/Webware/MiddleKit/Run/Dump.py --db MySQL --model Videos --prompt-for-args >Videos.mkmodel/Samples.csv
Dumping database ITWorks...
Enter MySQLObjectStore init args: user='me',password='foo'
Alternatively, you can use the DatabaseArgs setting.
An MK model can have configuration files inside it that affect things like code generation.
Settings.config is the primary configuration file.
The Package setting can be used to declare the package that contains your set of middle objects. This is useful for keeping your middle objects packaged away from other parts of your programs, thereby reducing the chances of a name conflict. This is the recommended way of using MK.
An example Settings.config:
{ 'Package': 'Middle', }
Your code would then import classes like so:
from Middle.Foo import Foo
Don't forget to put an __init__.py in the directory so that Python recognizes it as a package.
The SQLLog setting can be used to get MiddleKit to echo all SQL statements to 'stdout', 'stderr' or a filename. For filenames, an optional 'Mode' setting inside SQLLog can be used to write over or append to an existing file. The default is write. Here are some examples:
{ 'SQLLog': { 'File': 'stdout' }, }
{ 'SQLLog': { 'File': 'middlekit.sql' }, }
{ 'SQLLog': { 'File': 'middlekit.sql', 'Mode': 'append' }, }
The Database setting overrides the database name, which is otherwise assumed to be the same as the name of the model. This is particularly useful if you are running two instances of the same application on one host.
{ 'Database': 'foobar', }
The DatabaseArgs setting allows you to specify default arguments to be used for establishing the database connection (e.g. username, password, host, etc.). The possible arguments depend on the underlying database you are using (MySQL, PostgreSQL, etc.). Any arguments passed when creating the store instance will take precedence over these settings.
{ 'DatabaseArgs': {'user': 'jdhildeb', 'password': 's3cr3t'}, }
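The precedence rule can be pictured as a simple dictionary merge. The function effective_db_args is hypothetical and only illustrates the stated behavior that arguments given when creating the store win over those in DatabaseArgs; it is not how MiddleKit is necessarily implemented.

```python
def effective_db_args(settings_args, init_args):
    """Start from the DatabaseArgs defaults, then let store arguments override."""
    merged = dict(settings_args)
    merged.update(init_args)
    return merged


defaults = {'user': 'jdhildeb', 'password': 's3cr3t'}
overrides = {'user': 'prog', 'host': 'localhost'}
print(sorted(effective_db_args(defaults, overrides).items()))
```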
The DeleteBehavior setting can be used to change what MiddleKit does when you delete objects. The default behavior is "delete" which means that objects are deleted from the SQL database when they are deleted from the MiddleKit object store. But setting DeleteBehavior to "mark" causes an extra SQL datetime column called "deleted" to be added to each SQL table, and records that are deleted from the object store in MiddleKit are kept in SQL tables with the deleted field set to the date/time when the object was deleted. This setting has no effect on the visible behavior of MiddleKit; it only changes what happens behind the scenes in the SQL store.
{ 'DeleteBehavior': 'mark', }
The SQLConnectionPoolSize setting is used to create a MiscUtils.DBPool instance for use by the store. For DB-API modules with a threadsafety of only 1 (such as MySQLdb or pgdb), this is particularly useful (in one benchmark, the speed up was 15 - 20%). Simply set the size of the pool in order to have one created and used:
{ 'SQLConnectionPoolSize': 20, }
The SQLSerialColumnName controls the name that is used for the serial number of a given database record, which is also the primary key. The default is 'serialNum' which matches MiddleKit naming conventions. You can change this:
{
    'SQLSerialColumnName': 'SerialNum',  # capitalized, or
    'SQLSerialColumnName': '%(className)sId',  # the name used by older MiddleKits
    # you can use className for lower, ClassName for upper, or _ClassName for as-is
}
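The template substitution can be sketched with ordinary Python string formatting. The function serial_column_name is hypothetical, and reading "lower" and "upper" as applying to the first letter only is an assumption based on MiddleKit's naming conventions.

```python
def serial_column_name(template, class_name):
    """Expand a SQLSerialColumnName template for one class."""
    names = {
        'className': class_name[:1].lower() + class_name[1:],   # first letter lower
        'ClassName': class_name[:1].upper() + class_name[1:],   # first letter upper
        '_ClassName': class_name,                                # as-is
    }
    return template % names


print(serial_column_name('serialNum', 'BillingInfo'))
print(serial_column_name('%(className)sId', 'BillingInfo'))
print(serial_column_name('%(ClassName)sId', 'video'))
```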
The ObjRefSuffixes setting controls the suffixes that are appended to the names of the two SQL columns that are created for each obj ref attribute. The suffixes must be different from each other.
{
    'ObjRefSuffixes': ('ClassId', 'ObjId'),  # the default
}
The UseBigIntObjRefColumns causes MiddleKit to store object references in 64-bit fields, instead of in two fields (one for the class id and one for the obj id). You would only do this for a legacy MiddleKit application. Turning this on obsoletes the ObjRefSuffixes setting.
{
    'UseBigIntObjRefColumns': True,  # use single 64-bit obj ref fields
}
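To picture how two ids can share one 64-bit field, here is a sketch that packs the class id into the high 32 bits and the obj id into the low 32 bits. That layout is an assumption made for illustration, and the function names obj_ref_join and obj_ref_split are likewise not confirmed by this guide.

```python
def obj_ref_join(class_id, obj_id):
    """Pack a class id and an obj id into a single 64-bit integer."""
    return (class_id << 32) | obj_id


def obj_ref_split(obj_ref):
    """Recover (class id, obj id) from a packed 64-bit reference."""
    return (obj_ref >> 32) & 0xFFFFFFFF, obj_ref & 0xFFFFFFFF


packed = obj_ref_join(3, 17)
print(packed)
print(obj_ref_split(packed))
```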
The UsePickledClassesCache setting defaults to False because this feature has proven to be unreliable. When True, it causes MiddleKit to cache the Classes.csv text file as a binary pickle file named Classes.pickle.cache, which reduces subsequent load times by about 40%. The cache will be ignored if it can't be read, is older than the CSV file, has a different Python version, etc. You don't normally need to think about this, but you can turn the use of the cache on or off through this setting.
The DropStatements setting has these potential values:
{
    'DropStatements': 'database',  # database, tables
}
The CacheObjectsForever setting causes MiddleKit to retain references to each object it loads from the database indefinitely. Depending on the amount of data in your database, this can use a lot of memory. The CacheObjectsForever setting defaults to False, which causes MiddleKit to use "weak references" to cache objects. This allows the Python garbage collector to collect an object when there are no other reachable references to the object.
{
    'CacheObjectsForever': True,  # keep objects in memory indefinitely
}
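The difference between the two caching strategies can be demonstrated with the standard weakref module. This is a standalone sketch, not MiddleKit's cache: the Video class and the two cache dictionaries are stand-ins, and the immediate collection shown relies on CPython's reference counting.

```python
import gc
import weakref


class Video(object):
    def __init__(self, serial_num):
        self.serial_num = serial_num


strong_cache = {}                            # like CacheObjectsForever = True
weak_cache = weakref.WeakValueDictionary()   # like the default (False)

kept = Video(1)
strong_cache[1] = kept
dropped = Video(2)
weak_cache[2] = dropped

# Drop the application's own references, then let the collector run.
del kept, dropped
gc.collect()

print(1 in strong_cache)  # the strong cache still holds its object
print(2 in weak_cache)    # the weak cache entry vanished with the object
```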
The AccessorStyle setting can take the values 'methods' (the default) and 'properties'. With methods, your code will look like this:
if email.isVerified():
    pass
user.setName('Chuck')
With properties, it will look like this:
if email.isVerified:
    pass
user.name = 'Chuck'
{ 'AccessorStyle': 'methods', }
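For readers unfamiliar with the distinction, the two styles correspond roughly to the following hand-written classes. These are illustrative stand-ins, not the code MiddleKit generates; the class names are hypothetical, and the only detail taken from this guide is that the value lives in an underscore-prefixed attribute.

```python
class UserWithMethods(object):
    """'methods' style: name() reads, setName() writes."""

    def __init__(self):
        self._name = None

    def name(self):
        return self._name

    def setName(self, value):
        self._name = value


class UserWithProperties(object):
    """'properties' style: plain attribute syntax backed by accessors."""

    def __init__(self):
        self._name = None

    def _getName(self):
        return self._name

    def _setName(self, value):
        self._name = value

    name = property(_getName, _setName)


a = UserWithMethods()
a.setName('Chuck')
b = UserWithProperties()
b.name = 'Chuck'
print(a.name() == b.name)
```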
The UseHashForClassIds setting defaults to False. When False, class ids are numbered 1, 2, 3, ... which implies that as you add and remove classes during development the class ids will change. While not a strict problem, this can cause your production, test and development environments to use different class ids. That can make data comparisons, data migration and sometimes even schema comparisons more difficult. By setting UseHashForClassIds to True, the class ids will be hashed from the class names greatly improving the chances that class ids remain consistent. Caveat 1: The class id will change if you rename the class. Caveat 2: There is no way to dictate the id of a class in the model to make the original id stick when you hit Caveat 1. Despite the caveats, this is still likely a better approach than the serial numbering.
{ 'UseHashForClassIds': False, }
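The trade-off can be illustrated with a small sketch. The hash MiddleKit actually uses is not described here, so zlib.crc32 stands in for it purely as an assumption, and both function names are hypothetical. The point is only that serial ids depend on the position of a class in the model, while hashed ids depend on the class name alone.

```python
import zlib


def serial_class_ids(class_names):
    """Number classes 1, 2, 3, ... in model order (the default scheme)."""
    return dict((name, i) for i, name in enumerate(class_names, 1))


def hashed_class_id(class_name):
    """Derive an id from the name alone; crc32 is only a stand-in hash."""
    return zlib.crc32(class_name.encode('utf-8')) & 0x7FFFFFFF


before = serial_class_ids(['Author', 'Book', 'Loan'])
after = serial_class_ids(['Author', 'Loan'])  # 'Book' was removed
print('serial id of Loan: %d, then %d' % (before['Loan'], after['Loan']))
print(hashed_class_id('Loan') == hashed_class_id('Loan'))  # stable for a given name
```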
For each attribute, foo, MiddleKit stores its value in the attribute _foo, returns it in the accessor method foo() and allows you to set it with setFoo(). You should always use foo() to get the value of an attribute, as there could be some logic there behind the scenes.
Note: MiddleKit 0.9 added an AccessorStyle setting which you should learn about if you prefer Python properties over Python methods.
Given an attribute of type list, with the name "bars", MK will generate a Python method named addToBars() that will make it easy for you to add a new object to the list:
newBar = Bar()
newBar.setXY(1, 2)
foo.addToBars(newBar)
This method actually does a lot more for you, ensuring that you're not adding an object of the wrong type, adding the same object twice, etc. Here is a complete list:
You don't have to remember the details since this behavior is both supplied and what you would expect. Just remember to use the various addToBars() methods.
Similarly to adding a new object to the list with addToBars(), you can also delete an existing object from the list with delFromBars().
Note that a setBars() method is provided for list typed attributes.
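The kind of checking that addToBars() performs can be sketched with hand-written classes. This is not MiddleKit's generated code: the exception types are assumptions, and setting the back reference inside addToBars() is inferred from the earlier discussion of list attributes rather than stated in this section.

```python
class Bar(object):
    def __init__(self):
        self.foo = None  # back reference to the owning Foo


class Foo(object):
    def __init__(self):
        self._bars = []

    def bars(self):
        return self._bars

    def addToBars(self, value):
        if not isinstance(value, Bar):
            raise TypeError('expected a Bar, got %r' % (value,))
        if value in self._bars:
            raise ValueError('object is already in the list')
        value.foo = self  # keep the back reference in step (assumed)
        self._bars.append(value)

    def delFromBars(self, value):
        self._bars.remove(value)
        value.foo = None


foo = Foo()
bar = Bar()
foo.addToBars(bar)
print(len(foo.bars()))
foo.delFromBars(bar)
print(len(foo.bars()))
```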
MiddleKit uses the name of the store as the name of the database. This works well most of the time. However, if you need to use a different database name, there are two techniques available:
1. You can specify the 'Database' setting in Settings.config. See Configuration for an example.
2. You can pass the database name via the object store's constructor arguments, which are then passed on to the DB API module. This technique overrides both the default model name and the model settings. For example:
store = MySQLObjectStore(db='foobar', user='prog', passwd='asdklfj')
Every once in a while, you might get a hankering for iterating over the attributes of an MK object. You can do so like this:
for attr in obj.klass().allAttrs():
    print attr.name()
The klass() method seen above returns the object's MiddleKit Klass, which is the class specification that came from the object model you created. The allAttrs() method returns a klass' list of attributes.
The attributes are instances of MiddleKit.Core.Attr (or one of its subclasses such as ObjRefAttr) which inherits from UserDict and acquires additional methods from mix-ins located in MiddleKit.Design and MiddleKit.Run. Since attributes are essentially dictionaries, you can treat them like so, although if you modify them you are asking for serious trouble.
for attr in obj.klass().allAttrs():
    keys = attr.keys()
    keys.sort()
    print '%s: %s' % (attr.name(), keys)
If you had asked the klass for its attrs() instead of allAttrs(), you would have missed out on attributes that were inherited.
If you want to get a dictionary of all the attribute values for a particular object, don't roll your own code. You can already ask your middle objects for allAttrs(), in which case you get values instead of definitions (which is what Klass returns for allAttrs()).
If you need to delete an object from the object store, you can do so like this:
store.deleteObject(object)
As with other changes, the deletion is not committed until you perform store.saveChanges().
This may raise one of these two exceptions defined in MiddleKit.Run.ObjectStore:
See Object references for the specifications of onDeleteSelf and onDeleteOther.
Sometimes it can be convenient to define an attribute in MiddleKit that does not exist in the SQL database back end. Perhaps you want to compute the value from other attributes, or store the value somewhere else outside of the SQL database. Yet you still want to be able to iterate over the attribute using the allAttrs() method provided in MiddleKit.
To do this, simply set the property isDerived on the attribute in the model file. You will have to write your own setter and getter methods for the attribute.
MiddleKit will use the Default of attributes to generate a DEFAULT sqlValue in the attribute's portion of the SQL CREATE statement, taking care to quote strings properly. This default value is also used in the Python class. But on occasion you may have a need to specify an alternative SQL default (such as GetDate()). When that happens, specify a SQLDefault for the attribute. If you do this in the Extras column, quote the SQL; for example, SQLDefault='GetDate()'. MiddleKit will pass this Python string down to the CREATE statement.
In many situations it is useful to clone a MiddleKit object. One example is to allow a user to create a copy of some record (and all of its values) without having to enter a new record from scratch. Every MiddleKit object has a clone() method which can be used to create a clone of the object. All attribute values of the clone will be set to the same values as the original object.
Depending on your model, you may or may not want sub-objects referenced by the original object to be cloned in addition to the object itself. You can control this by adding a "Copy" column in your Classes.csv file, and set the value for each object reference (single reference or list reference). The possible values are:
If there is no Copy value set for an object reference, 'shallow' is assumed.
The following example should help illustrate how this can be used. In this example we want to be able to clone Book objects. We want a cloned book to have the same author(s), shelf and publisher as the original, but the new book should have a clean loan history (we don't want to clone the loans).
Class | Attribute | Type | Copy | Comment |
---|---|---|---|---|
Book | ||||
title | string | |||
authors | list of Wrote | deep | Wrote objects should be cloned, too. | |
loans | list of Loan | none | We want cloned books to have a clean loan history. | |
shelf | Shelf | shallow | Cloned book should reference the same Shelf object as the original. | |
publisher | string | |||
Wrote | ||||
book | Book | deep | This is a back reference for Book; it needs to be set to 'deep' so that it will be set to the new (cloned) Book object. | |
author | Author | shallow | Don't clone the actual author object. | |
Author | ||||
name | string | |||
wrote | list of Wrote | Cloning a book won't propagate this far, since Wrote.author is set to 'shallow'. | ||
Loan | ||||
book | Book | |||
borrower | string | |||
Shelf | ||||
name | string |
When you clone an object, it is possible to generate a mapping from old to new objects. It is also possible to specify a column which, if there is a value set in it for an object reference attribute, should override the value in the Copy column. See the doc string in MiddleKit.Run.MiddleObject for more details.
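The effect of the Copy values in the Book example can be sketched with a small in-memory model. This is not MiddleKit's clone() implementation: the Record class, the COPY table and the clone function are hypothetical, and the behavior assumed for each value ('deep' clones the referenced objects, 'shallow' reuses them, 'none' leaves the reference empty) is inferred from the comments in the table above. The memo dictionary plays the role of the mapping from old to new objects.

```python
class Record(object):
    """Minimal stand-in for a middle object: a class name plus attributes."""

    def __init__(self, kind, **attrs):
        self.kind = kind
        self.attrs = attrs


# Copy column per (class, attribute); unlisted references default to 'shallow'.
COPY = {
    ('Book', 'authors'): 'deep',
    ('Book', 'loans'): 'none',
    ('Book', 'shelf'): 'shallow',
    ('Wrote', 'book'): 'deep',
    ('Wrote', 'author'): 'shallow',
}


def clone(obj, memo=None):
    """Clone obj, following references according to COPY."""
    if memo is None:
        memo = {}
    if id(obj) in memo:          # already cloned: reuse the new object
        return memo[id(obj)]
    copy = Record(obj.kind)
    memo[id(obj)] = copy
    for name, value in obj.attrs.items():
        mode = COPY.get((obj.kind, name), 'shallow')
        if isinstance(value, list):
            if mode == 'deep':
                copy.attrs[name] = [clone(item, memo) for item in value]
            elif mode == 'none':
                copy.attrs[name] = []
            else:
                copy.attrs[name] = list(value)
        elif isinstance(value, Record):
            if mode == 'deep':
                copy.attrs[name] = clone(value, memo)
            elif mode == 'none':
                copy.attrs[name] = None
            else:
                copy.attrs[name] = value
        else:
            copy.attrs[name] = value   # plain values are copied as-is
    return copy


author = Record('Author', name='A. Writer')
shelf = Record('Shelf', name='Fiction')
book = Record('Book', title='Example', publisher='Acme',
              shelf=shelf, authors=[], loans=[])
wrote = Record('Wrote', book=book, author=author)
book.attrs['authors'].append(wrote)
book.attrs['loans'].append(Record('Loan', book=book, borrower='Pat'))

new_book = clone(book)
new_wrote = new_book.attrs['authors'][0]
print(new_wrote.attrs['book'] is new_book)
print(new_book.attrs['loans'])
```

Because Wrote.book is marked 'deep' and the memo already maps the original Book to its clone, the cloned Wrote object ends up pointing at the new Book rather than the original.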
Model inheritance is an advanced feature for developers who wish to reuse models in other projects that are also model based. In Settings.config, you can specify other models to inherit class definitions from, which are termed parent models:
{ 'Inherit': ['/usr/lib/mkmodels/News', 'Users'], }
Note that the .mkmodel extension is assumed. Also, relative filenames are relative to the path of the model inheriting them.
The essential effect is that the classes found in parent models are available to instantiate, subclass and create sample data from, and are termed inherited classes. You can also redefine an inherited class before using it in other class declarations. Classes are identified strictly by name.
The resolution order for finding a class in a model that has parent classes is the same as the basic method resolution order in Python 2.2, although don't take that to mean that MiddleKit requires Python 2.2 (it requires 2.0 or greater).
Model inheritance does not affect the files found in the parent model directories. Also, settings and sample data are not inherited from parents; only class definitions.
In MiddleKit.Core.Model, the methods of interest that relate to model inheritance are klass(), which will traverse the parent model hierarchy if necessary, and allKlassesInOrder() and allKlassesByName(). See the doc strings for more info.
Caveats:
Class | Attr | Type |
---|---|---|
Base | ||
b | int |
Sub(Base) | ||
c | int |
If instead, B declares Sub first, then it will erroneously pick up the Base from A.
The topic of object-relational mapping (ORM) is an old one. Here are some related links if you wish to explore the topic further:
Scott Ambler has written some papers on the topic of ORM and also maintains a set of ORM related links:
Apple has a very mature (and perhaps complex) ORM framework named Enterprise Objects Framework, or EOF, available in both Java and Objective-C. All the docs are online at the WebObjects page:
Other Python ORMs that we're aware of:
Here's a Perl ORM that someone recommended as interesting:
@@ 2000-10-28 ce: This should be a separate guide.
In the Tests directory of MiddleKit you will find several test case object models.
@@ 2001-02-13 ce: complete this
Known bugs and future work in general are documented in TO DO.
Authors: Chuck Esterbrook