30 Mar 09
OpenCalais
Looks like it’s time to do some stuff with OpenCalais and my favorite content management system. OpenCalais is basically a machine-learning service that extracts entities and relationships from unstructured or structured text. The generated metadata makes it easier to aggregate and traverse your content. I’m not a big fan of not having the source to the actual web service itself, but it could be useful plugged into Plone and WordPress.
OpenCalais is a Thomson Reuters project!
OpenCalais Example
So after a bit of back and forth with the email servers I got my API key for OpenCalais and toyed around with it a bit.
I took this URL, http://nymag.com/daily/intel/2009/03/pretty_much_everybody_hates_bu.html, which is an article in New York Magazine titled “Pretty Much Everybody Hates Budget Plan by Paterson, Silver, and Smith”.
I set the relevance threshold to 0.5 and spat this URL at Calais to see what I would get back.
That took about 5 minutes, so the next step is to take the entities and create resulting objects for them. The list of all the entities is available here. The higher the relevance the better, so based on the entities we can see that the article has something to do with David Paterson, Sheldon Silver and Sheldon Smith, and that it was published by New York Magazine. There is one error, though: the “Company” Intel doesn’t actually exist in the article. It’s actually talking about “Daily Intel”.
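To make the "take the entities and create resulting objects" step a bit more concrete, here is a rough Python sketch of how it might look. It is not the Calais client or its response schema: the Entity class, the build_entities helper, the dictionary keys and the relevance scores in the sample are all assumptions made up for illustration. It also assumes the threshold means "keep anything scoring 0.5 or higher".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One extracted entity. Field names are assumptions, not the Calais schema."""
    type: str
    name: str
    relevance: float


def build_entities(raw_entities, threshold=0.5):
    """Keep entries at or above the relevance threshold, most relevant first."""
    kept = [
        Entity(e["type"], e["name"], float(e["relevance"]))
        for e in raw_entities
        if float(e["relevance"]) >= threshold
    ]
    return sorted(kept, key=lambda e: e.relevance, reverse=True)


# Placeholder input: the scores (and the last entry) are invented for
# illustration and are NOT the values Calais returned for the article.
sample = [
    {"type": "Person", "name": "David Paterson", "relevance": 0.8},
    {"type": "Person", "name": "Sheldon Silver", "relevance": 0.7},
    {"type": "Company", "name": "Intel", "relevance": 0.6},
    {"type": "Other", "name": "Some Low-Relevance Entity", "relevance": 0.2},
]

entities = build_entities(sample)
for entity in entities:
    print(entity.type, entity.name, entity.relevance)
```

A filter like this only drops low-scoring entries. A mislabelled entity such as the “Company” Intel would still pass if its score is high enough, so that kind of error would need a manual check or a blacklist before objects get created in Plone or WordPress.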