Sunday, November 20, 2005

Syndicating Ping State with OPML

This is a followup on my previous post about RSS Reading lists. The recent increase in activity around OPML has a lot of people thinking of useful extensions for it (Syndication of attention data, OPML Extensions, Identity systems) - much of this is related to subscribing to (rather than importing) OPML files. I'll suggest some OPML extensions of my own and explain why I think they are useful.

Add Time To Live (TTL) Sub-element of <head> (optional)

If clients are going to subscribe to OPML files then there needs to be some way to indicate how long the file should be cached for. This would work basically the same way as it does in RSS 2.0 - the value of the TTL element indicates the lifetime of the document in minutes, and the client should not poll the server any more frequently than that. As with RSS, there needs to be a reasonably accepted default (60 mins is considered a reasonable default for RSS).
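As a rough sketch of how a subscriber might honor such a value, the Python below reads a `<ttl>` child of `<head>` and falls back to 60 minutes when it is absent or unusable. The lowercase element name `ttl`, the function names, and the choice of 60 as the OPML default are assumptions made for illustration; nothing here is part of any published OPML specification.

```python
import xml.etree.ElementTree as ET

# Assumed default, borrowed from the RSS convention mentioned above.
DEFAULT_TTL_MINUTES = 60


def opml_ttl_minutes(opml_text: str) -> int:
    """Return the cache lifetime in minutes from a proposed <ttl> child of <head>."""
    root = ET.fromstring(opml_text)
    raw = root.findtext("head/ttl")  # None when the element is missing
    if raw is None:
        return DEFAULT_TTL_MINUTES
    try:
        ttl = int(raw.strip())
    except ValueError:
        return DEFAULT_TTL_MINUTES  # non-numeric value: ignore it
    return ttl if ttl > 0 else DEFAULT_TTL_MINUTES


def should_refetch(last_fetch_epoch: float, now_epoch: float, ttl_minutes: int) -> bool:
    """True once the cached OPML file is at least ttl_minutes old."""
    return now_epoch - last_fetch_epoch >= ttl_minutes * 60
```

A client would call `should_refetch` before each poll of the OPML file and skip the request while the cached copy is still within its lifetime.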

Add etag attribute to <outline> element (optional)

RSS aggregators keep track of the current state of a subscription using either the HTTP ETag header or the HTTP Last-Modified header. This information is used to conditionally GET the subscription from the server, so the contents are transferred only if they have changed relative to the version the client has. Many services are capable of publishing OPML files dynamically, and some of them, such as the server-based RSS news reader Bloglines, also syndicate the feed contents. In this case, the OPML publisher already knows the state of each feed (i.e. its ETag) and could publish that information in the OPML file using the etag attribute.

How does this help the OPML subscriber? The aggregator now checks for updates on the OPML subscription file. For each RSS outline in the OPML file, if there is an etag attribute, it's compared to the etag the aggregator currently holds for that subscription. If the etags match then the subscription is considered current and doesn't need to be downloaded. If the etags do not match then the subscription needs to be downloaded.

An RSS Reading list might contain 100s of subscriptions. Rather than have N subscribers plus the publisher each check the status of the M subscriptions in the reading list, for (N + 1) x M total pings, the N subscribers could check the status of the publisher's OPML file while the publisher checks the status of the M subscriptions, for a total of N + M pings.
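To make the comparison and the ping arithmetic concrete, here is a minimal sketch in Python. It assumes each subscription outline carries the standard `xmlUrl` attribute plus the proposed `etag` attribute; the function names and the `cached_etags` dictionary (feed URL to the etag the aggregator last saw) are hypothetical.

```python
import xml.etree.ElementTree as ET


def feeds_needing_download(opml_text, cached_etags):
    """Return feed URLs whose published etag is missing or differs from the cached one."""
    root = ET.fromstring(opml_text)
    stale = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url is None:
            continue  # a grouping outline, not a subscription
        published = outline.get("etag")  # the proposed attribute; may be absent
        if published is None or cached_etags.get(url) != published:
            stale.append(url)
    return stale


def total_pings(n_subscribers, m_feeds, shared_state):
    """Count pings per polling cycle with and without shared ping state."""
    if shared_state:
        # N subscribers poll the OPML file; the publisher polls M feeds.
        return n_subscribers + m_feeds
    # N subscribers and the publisher each poll all M feeds.
    return (n_subscribers + 1) * m_feeds
```

With 1,000 subscribers and 100 feeds, `total_pings(1000, 100, False)` gives 100,100 pings per cycle, while `total_pings(1000, 100, True)` gives 1,100. Outlines without an etag are returned as needing a check, since nothing is known about their state.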

Issues

This system is a centralized cache for RSS ping state, which has two well known problems: 1) when to invalidate the cache and 2) the cache is a single point of failure. The first problem seems less severe and could be handled with some sort of user override. In the second case, if the server ceases to publish etag information in the OPML or ceases to publish the OPML file at all, the aggregator could revert to polling the subscriptions directly. Another possibility is that an OPML publisher could publish ping state only for subscriptions that the publisher is responsible for - for example, a server-based RSS aggregator like Bloglines could republish its cached copies of subscriptions, acting as an RSS proxy server.
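The fallback for the single-point-of-failure case might look roughly like the Python sketch below. The `fetch_opml_etags` callable is a hypothetical stand-in for whatever code downloads and parses the OPML file; it is assumed to return a mapping of feed URL to published etag (or `None` where no etag was published), and to return `None` or raise `OSError` when the file cannot be obtained.

```python
from typing import Callable, Dict, List, Optional


def plan_direct_polls(
    fetch_opml_etags: Callable[[], Optional[Dict[str, Optional[str]]]],
    subscriptions: List[str],
    cached_etags: Dict[str, str],
) -> List[str]:
    """Return the feeds the aggregator must contact itself in this cycle."""
    try:
        published = fetch_opml_etags()
    except OSError:
        published = None  # OPML file unreachable
    if published is None:
        # No shared ping state at all: revert to polling every feed directly.
        return list(subscriptions)
    to_poll = []
    for url in subscriptions:
        etag = published.get(url)
        # Poll when the publisher stopped supplying an etag for this feed,
        # or when the published etag differs from the cached one.
        if etag is None or etag != cached_etags.get(url):
            to_poll.append(url)
    return to_poll
```

Passing the fetcher in as a parameter keeps the decision logic separate from the network code, so the degraded path can be exercised without a live server.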

Notes

The WebDAV PROPFIND verb uses basically the same mechanism as a way of performing directory enumeration. OPML can also be used to implement virtual directory structures so the same scheme could be used in that use case as well.

1 Comment:

Anonymous said...

Before I even read the RSS Ping article that I came here for, I have to say: you have the best blog banner caption and title that I've run across yet. Both are awesome, thx for the lol!

2:43 PM  
