Plone, so where is it useful? – EXAMPLE #1
So cool, I've received a decent amount of email from people all saying the same thing: "Spot on, but how do I go about using it? I need a concrete example of where it would be useful. I want to work smarter, not harder!" I've also had some email from people who have no idea what it's about, asking for clarification. So I'm going to give examples as they crop up in my own work. The idea is that I tested some of the concepts from the talk on a control, that control being pretty much ignorant of all things content management or, for that matter, computer related. So here is a very relevant example:
THE WORK HARDER WAY AKA WASTE MONEY WAY!
Recently at work I've been shoveled a plate of bullshit. It involves taking data from an old content management system and moving it into an XML-formatted file so that it can then be pipelined through Endeca. The data, however, is pretty much all over the place; essentially it lives in a database and in the XMP headers of thumbnail images on the filesystem.
Extracting all of the relevant data from the thumbnails using ExifTool is not the biggest deal in the world, except that not all of the data existed in all of the thumbnails, which means I had to get the remaining data from the database. Also not the biggest deal in the world. However, the file is supposed to be created on a regular basis for new records so that the information can be pipelined into Endeca! This means that any time new data is created in the old content management system, there needs to be a way to add that new information to a file which is then pipelined.
The only caveat: there is no event system, so here is where your developer will hit the first snag with X content management system! The developer can't say "When a new piece of data is added, update the file or send the data to the other system".
So a smart developer will say: well, I'll just use inotify (think of this as a piece of software that stands around monitoring changes as they take place; if you add a file, it sends a message saying "This file has been created!") to watch the filesystem for new thumbnails and snag the information as they are added, and/or use MySQL's event scheduler to dump the specific data sets out at X time, or have it run every second or two. The not-so-smart developer will say: let's run a cron job for this and recreate the full data sets over... and over and over... even when only one tiny piece of data has changed! The computer may be able to do this quickly for small data sets, but as the data sets grow and grow and grow, the time to completely regenerate them takes that much longer.
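To put a rough number on "over and over", here is a minimal Python sketch that counts how many records each approach touches when a single new record shows up. The record shape, the element names and the 10,000-record data set are all made up for illustration; this is not the real export code, just the shape of the problem.

```python
import xml.etree.ElementTree as ET


def full_rebuild(records):
    """Cron-style: rebuild the whole XML document from every record."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record", id=str(rec["id"]))
        item.text = rec["title"]
    return root, len(records)  # second value: records processed this run


def incremental_update(root, new_records):
    """Change-driven: append only the records that were just added."""
    for rec in new_records:
        item = ET.SubElement(root, "record", id=str(rec["id"]))
        item.text = rec["title"]
    return len(new_records)


# Hypothetical data set: 10,000 existing records, then one new one arrives.
existing = [{"id": i, "title": f"thumb-{i}"} for i in range(10_000)]
new = {"id": 10_000, "title": "thumb-10000"}

root, _ = full_rebuild(existing)

# Cron way: regenerate everything just to pick up one new record.
_, cron_work = full_rebuild(existing + [new])

# Change-driven way: touch only the new record.
event_work = incremental_update(root, [new])

print(cron_work, event_work)  # 10001 1
```

The gap between those two numbers only gets wider as the data set grows, and the cron version pays it on every run whether or not anything changed.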
So let's look at all of the problems with this from the developer's viewpoint:
1. If he or she goes the inotify route, they'll need RHEL5 or greater; your environment will probably have a production app supported only on RHEL4 or, even scarier, RHEL3. So they will have to fall back to fam, gamin, or dnotify. (Limited by technology, check!)
2. If he or she goes the MySQL route, you'll need the resources to support a separate MySQL node, primarily because you will not want to be running data retrieval routines on your production box. It's just not a good idea. (Additional resources, check!)
3. Running a cron job every second, or polling every second to check for changes? DUMBBBBBBB. (More wasted resources, check!)
4. "Hi, yeah, well, I'd like to have that information as soon as it happens!" (Can't do that reliably, check!)
5. Race conditions? (check!) Stupid use of resources? (check!) Money out of the window? (check!) Additional bugs and possible data corruption? (check!) More money out of the window? (check!) Eventually having a slow, clunky, unmaintainable system? (check!) More money spent hiring, or time spent getting out of there? (check!)
So far all the avenues end up in wasted resources. Wasted resources, no matter how small, add up over time. Time is money and so are resources.
THE WORK SMARTER WAY, ENJOY LIFE
In Plone there is an event system. So when a new piece of data is added your programmer can say "New data has been added, send the data to the other system". 1 step, 1 process. Because there is no event system in the old content management system, that data has to go through several steps and utilities before it can be used by another system, and since this data isn't managed in any way, it doesn't matter where it ends up or how: it'll always be problematic. With Plone your programmer works smarter, fewer resources are wasted and everyone is happy. As a new piece of data is added, a NewPieceofDataAdded event is kicked off and the file is updated or, better yet, the data is converted and pipelined directly into Endeca. So there is no polling, no extra systems or scripts, no cron jobs. Just a simple routine: when X information is updated, send the info.
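Here is a rough sketch of that "when X happens, send the info" routine in plain Python. To be clear, this is not Plone's actual API: the tiny subscribe/notify registry, the NewPieceOfDataAdded class and the send_to_endeca handler are all hypothetical stand-ins so the example runs on its own. In a real Plone site you would, as far as I know, register a subscriber for an object-added event (such as IObjectAddedEvent from zope.lifecycleevent) through zope.component instead of rolling your own registry.

```python
from collections import defaultdict


class NewPieceOfDataAdded:
    """Hypothetical event carrying the object that was just added."""

    def __init__(self, obj):
        self.object = obj


# Minimal stand-in for an event registry: event type -> list of handlers.
_subscribers = defaultdict(list)


def subscribe(event_type, handler):
    """Register a handler to be called for events of the given type."""
    _subscribers[event_type].append(handler)


def notify(event):
    """Call every handler registered for this event's type."""
    for handler in _subscribers[type(event)]:
        handler(event)


sent_to_endeca = []  # stand-in for the real pipeline


def send_to_endeca(event):
    """Hypothetical handler: push only the new object downstream."""
    sent_to_endeca.append(event.object)


subscribe(NewPieceOfDataAdded, send_to_endeca)


def add_content(store, obj):
    """Adding content fires the event; nothing has to poll for it."""
    store.append(obj)
    notify(NewPieceOfDataAdded(obj))


store = []
add_content(store, {"id": 1, "title": "thumb-1"})
add_content(store, {"id": 2, "title": "thumb-2"})
print(len(sent_to_endeca))  # 2
```

The handler runs exactly once per added object and never when nothing has changed, which is the whole point: no polling loop, no cron schedule to tune.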
This concludes Example #1
December 19th, 2008 - 06:02
Hi, I stumbled on this blog looking for Endeca support. I have to disagree with this. I would use the cron job. The number one concern for developers is managing complexity. By adding the code (or using a tool like inotify) to check for updates, open the file and update it, you add a significant amount of complexity to this process. I don’t know how mission critical your application is or the running time of the code that processes the content and turns it into XML, but with computers these days most applications take less than a couple of minutes to run. Your Endeca partial updates are probably not running every minute, so you can set something more reasonable for the interval on the cron job. If you aren’t using a production server database (you shouldn’t!) for these dumps, then the small amount of overhead it adds is insignificant. And if your databases can fit in RAM, then it’s negligible.
KISS unless you are REALLY observing performance issues.
http://en.wikipedia.org/wiki/KISS_principle
December 19th, 2008 - 13:31
Hi, besides the fact that inotify was created exactly for this purpose:
“Its major use is therefore arguably in desktop search utilities like Beagle, where its functionality permits reindexing of changed files without scanning the filesystem for changes every few minutes, which would be very inefficient.” via http://en.wikipedia.org/wiki/Inotify
The problem is that KISS applies to something engineered and designed well ahead of time. If you want, we can count the number of system calls you would use and the number of system calls I would use. On top of that, my way would take significantly LESS time to implement. I don’t know much about Endeca partial updates or Endeca in general, but it’s a userspace application and most likely NOT using inotify. Inotify was put in the kernel to replace dnotify specifically because it would be quicker for large operations. Another thing you overlook is that I have NFS in the mix; polling that mount or running cron jobs against a mount that is already heavily overworked is slow, especially in my case. Anyway, using another database isn’t really the problem. The problem is that if I did it your way, I would be flying a plane to pick up some OJ from the corner store. I agree that your way would work, but I don’t really feel like flying an airplane all the way around the world to pick up my OJ. Your way is only KISS when you don’t have to worry about the system and its resources (i.e. the pilot of the plane). Don’t be a sloppy developer. You could just WALK to the corner store!
Work SMARTER not HARDER.
December 19th, 2008 - 13:48
Also note that when I say my way, I mean using the event system under Plone. Inotify falls under the work harder way. It would still be faster and simpler than the cron job, though. So it’d be more like driving a car to the corner store.
So what is simpler than this: when an event occurs, say X thing has changed, we kick off an XThingHasChangedEvent and do whatever we need to do, when we need to do it. It doesn’t get any simpler than that. Cron jobs aren’t a proper replacement for an event system.
When something happens, do something. ORRRRR your way, where we don’t know when something happens, so we just run this job over here, over and over: “Never mind me running back and forth doing the same thing over and over again on the same stuff, even though nothing has changed!”
Any Plonistas or Kernel ppl wanna comment?
December 19th, 2008 - 16:12
The way I understood your post is that you can’t use the event system; you only wish you could. I agree that if it’s built in then you should definitely use it, but that’s not the case. You would actually want to run this job in the Endeca pipeline rather than in cron, so it runs the dump right before it needs it. Try dumping the whole XML file while running top on your machine. Look at the resource usage.
December 19th, 2008 - 16:33
Yeah, I’d need to know more about Endeca and how it works, but basically what I’m talking about is what happens before that data is presented to Endeca. What Endeca does with the data, I’m unconcerned with.
So how does the data get into Endeca? If it’s running a cron job that scans the entire filesystem all of the time, we agree: that’s pretty silly.