Endeca and Daylife
So I just got back from a meetup hosted by Daylife, where I got to meet Ken Ellis, the Chief Scientist over there, and Daniel Tunkelang, co-founder and Chief Scientist of Endeca. It was a brilliant conversation and I learned a lot about the problems they have in traversing content. Daylife is a service that lets websites aggregate and display content based on topics, people, etc. The way they aggregate their data is brute force, even though some of the problems, or "nits" as Ken called them, are difficult to get through. Some of those nits involve assigning meaning to data, which is simply a difficult problem to solve without some sort of machine learning. For instance, Christopher Warner and Christopher Warner may be two different people. Disambiguation would obviously be required here, defining one as a race-car driver and one as a doctor. Even then it's still an issue, because you have problems of relevance and such. We then went on to talk about APIs, and it would certainly make their job easier if they could connect to the APIs of media organizations, few of which exist apart from the New York Times'. So Daylife is parasitic, yes, but really it's useful for organizations that don't have large amounts of content and want to expose it to others, or that want to expose their content alongside a little bit of something else they feel is useful to their customers. If you're a media organization, licensing your content via an API to, let's say, Daylife, which then passes the cost on to customers who actually need or can use the content, could be lucrative. It's really too early to tell exactly if it's viable, because of the market and how relatively new this is to everyone, but it's really, really interesting stuff. I'm gonna have to have lunch with those guys because they are right around the corner from where I work. Daylife does have open APIs though, and you can check out their labs and widgets.
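To make the disambiguation nit a bit more concrete, here is a minimal sketch of the simplest thing I can think of: score each candidate person by how many of their context words show up in the article. This is not how Daylife does it (I have no idea what their pipeline looks like); the candidate list, the context words and the `disambiguate` function are all made up for illustration.

```python
import re

# Hypothetical candidates: two people sharing one name, each described by
# a handful of context words. In practice these would come from real data.
CANDIDATES = {
    "Christopher Warner (race-car driver)": {"race", "racing", "driver", "lap", "track"},
    "Christopher Warner (doctor)": {"doctor", "patient", "clinic", "hospital", "medicine"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(text, candidates=CANDIDATES):
    """Return the candidate whose context words overlap the text the most.

    Returns None when nothing matches or when the top score is tied,
    because guessing in those cases would be worse than admitting we don't know.
    """
    words = tokenize(text)
    scores = {name: len(words & context) for name, context in candidates.items()}
    top = max(scores.values())
    winners = [name for name, score in scores.items() if score == top]
    if top == 0 or len(winners) != 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(disambiguate("Christopher Warner set the fastest lap at the track on Sunday."))
    print(disambiguate("Christopher Warner saw a patient at the clinic."))
    print(disambiguate("Christopher Warner spoke yesterday."))
```

Even this toy version shows why relevance is still a problem afterwards: an article with no matching words, or with words from both lists, gets no answer at all.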
After talking to Ken earlier on (since I was the first one there), I spent an hour or more talking with Endeca's Chief Scientist, Daniel Tunkelang. It was extremely informative, and as usual it's refreshing to talk to people who understand what you are talking about. Basically we discussed the New York Times corpus and exposing content for exploratory search. Unfortunately, Endeca can't do anything with that stuff yet due to the license issues surrounding the New York Times corpus. However, the developers at the New York Times want it more open, and it's only a matter of time before the rest of the company gets it. As soon as they do he'll be all over it. Supposedly they do exactly what I wanted to do by having all of their content in objects. This allows things like Endeca to drill down into the content and provide better enterprise search simply based on the customer's own corpus! It would also allow Endeca to do other cool things in general, because the relationships are already made through the objects! We also talked about getting more advertising hits through exploratory search: you keep the customer focused on the site by simply exposing more of your content, which is good business because it's positive for the customer and for your business. The best way to do this? Content objects and attributes!! Everyone was saying this all night and I was floored. Other developers seem to get it!! This makes it easy to make relationships based on tags, etc., without Endeca having to figure it out on its own or you having to build some pipeline. You can still do all of that, but it's obviously going to be more difficult. Rationality!
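Here is a rough sketch of why content objects and attributes make the drill-down so easy. It is plain Python, not Endeca's API or anything the New York Times actually runs; `ContentObject`, `refine`, `facet_counts` and the sample articles are all names and data I invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """A piece of content plus its attributes (attribute name -> list of values)."""
    title: str
    attributes: dict = field(default_factory=dict)


# Made-up sample corpus; a real one would come from the CMS.
ARTICLES = [
    ContentObject("Budget vote delayed", {"section": ["Politics"], "person": ["A. Smith"], "tag": ["budget"]}),
    ContentObject("Smith wins primary", {"section": ["Politics"], "person": ["A. Smith"], "tag": ["election"]}),
    ContentObject("Grand Prix recap", {"section": ["Sports"], "person": ["C. Warner"], "tag": ["racing"]}),
    ContentObject("Clinic opens downtown", {"section": ["Health"], "person": ["B. Jones"], "tag": ["health"]}),
]


def refine(items, **selected):
    """Keep only the items that carry every selected attribute value."""
    return [
        item for item in items
        if all(value in item.attributes.get(name, []) for name, value in selected.items())
    ]


def facet_counts(items, name):
    """Count how many items carry each value of one attribute."""
    return Counter(value for item in items for value in item.attributes.get(name, []))


if __name__ == "__main__":
    print(facet_counts(ARTICLES, "section"))          # what a visitor could drill into first
    politics = refine(ARTICLES, section="Politics")   # one click narrows the set
    print(facet_counts(politics, "tag"))              # remaining choices inside Politics
    print([a.title for a in refine(ARTICLES, section="Politics", tag="election")])
```

Because the relationships already live on the objects, each refinement is just a filter and a count, with no separate pipeline guessing at structure. That same list of remaining choices is what keeps a visitor clicking around the site.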
The ACM (you should become a member if you aren't), which uses Plone and also uses Endeca, has lots of content, and the traversal is based on the authors' content itself, which is all pretty much standard due to the way papers are generally formatted. Dan explained to me that he wanted to do a lot more but wasn't able to due to constraints from the organization. It's unfortunate, to say the least. Anyway, overall the meetup was cool, and I was invited to another conference, this one in July in Boston, so I may check that out along with the Franz talk, which I've already RSVP'd for. He gave me his card and, as usual, I had no cards. I'm gonna get some cards done, I guess; saying "just google Christopher Warner" isn't really a good response. He told me to add him on LinkedIn, so I'll do that in a bit.
I met a couple of other interesting people from other newspapers and magazines and from a military security company, plus some people who have actually read my stuff! A New York Times editor was there taking notes, most likely for a story, I suspect, or maybe just to understand this stuff better. Someone remarked that this community is really small, because everyone keeps bumping into each other and we all seem to be on the edge. I didn't see anyone I knew there, but I bet money I'll bump into some heads on the Sun Microsystems campus for sure.
About Christopher Warner