Extreme Geek Concepts Ahead - The Snow In The Summer or So-So

5 September

Building a Better Social Graph

Mr. GB points out Bradley Fitzpatrick's latest project. It's a curved piece of wood, with spokes linking the rim to a central shaft. He's calling it the whyle...

Once again, Mr. Fitzpatrick has spent a whole load of time on something that has some limited potential, but is designed to be abused by unscrupulous companies, and he's needlessly repeating work that's already been done. We've alluded to it before, and there's now enough of an excuse to present our initial vision of a distributed social network, one half of the thing that has the capability to kill Facebook as it is currently known, plus the reason why it might not. (The other half is RSS becoming a commodity, and that's when, not if.)

This idea from Mr. Fitzpatrick is the social graph, and it boils down to a familiar problem: how do people know who you are when your identity is different across many services? Behemoths like Yahoo solve this problem by amalgamating all identities, by force if necessary, but Mr. Fitzpatrick proposes a slightly more gentle solution.

We can do better. Let's start with something we can all agree on - each person has a bit of storage space somewhere. It doesn't need to be a lot: 1MB should suffice for most people, 5MB for all but the most voracious consumers. In that bit of storage is a file (or a series of files), and in that file are data. We model the data as falling into two distinct sections:

1. A list of a person's publicly-claimed identities on social networking and other internet sites.

2. A list of the identities that a person knows on other services.

Section 1 would (effectively) be statements of the form, On the BBC, I am daWeaver. On Iziblog, I am daweaver. On OldKidsTvRPG, I am Perkin Flump. These identities would be repudiable - if someone closed, did not wish to publicise, or wished to disown their account on (say) Livejournal, they could remove that from their list. And these identities would not necessarily be complete - if someone wished not to publish On Last FM, I am MadeUpNameNumber36, that is their call.

Section 2 would be statements of the form, On Iziblog, I read and trust sir_quirky_k. On the BBC, I trust Paddy O'Connell. On OldKidsTvRPG, I read and trust Father Flump. These relationships would be updated as and when relationships on the main site changed, but only if there was a corresponding section 1 entry.
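To make the two sections concrete, here is a minimal sketch in Python. The class and method names (Identity, Profile, claim, repudiate, trust) are our own inventions for illustration, not part of FOAF or of any existing proposal. The one rule it enforces is the one stated above: a Section 2 relationship is only recorded when there is a corresponding Section 1 entry for that service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """One 'On <service>, I am <handle>' statement."""
    service: str
    handle: str


@dataclass
class Profile:
    # Section 1: publicly-claimed identities.
    claimed: set = field(default_factory=set)
    # Section 2: service name -> handles this person reads and trusts there.
    trusted: dict = field(default_factory=dict)

    def claim(self, service: str, handle: str) -> None:
        self.claimed.add(Identity(service, handle))

    def repudiate(self, service: str, handle: str) -> None:
        # Identities are repudiable: removing one is always allowed.
        # What should happen to Section 2 entries for that service is
        # not specified in the text, so this sketch leaves them alone.
        self.claimed.discard(Identity(service, handle))

    def trust(self, service: str, handle: str) -> bool:
        # Record the relationship only if we claim an identity on that service.
        if not any(i.service == service for i in self.claimed):
            return False
        self.trusted.setdefault(service, set()).add(handle)
        return True


me = Profile()
me.claim("Iziblog", "daweaver")
me.trust("Iziblog", "sir_quirky_k")   # recorded
me.trust("Last FM", "someone")        # ignored: no Section 1 entry for Last FM
```

Keeping Section 2 keyed by service is just one possible layout; it happens to make the "corresponding Section 1 entry" check a one-liner.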

Effectively, what we have is a file that forms a Friend of a Friend statement. This isn't as trivial an observation as it might sound, because there is already a well-defined structure for FOAF files. While the format we're proposing here need not follow FOAF conventions, we would not re-invent the wheel lightly. It may well be necessary to enhance the existing FOAF definition, and this should be done through consultation rather than by each implementation imposing its own syntax.

Now, how would this system work? Suppose that we come to read and trust another Iziblog account, Vaguelyperfect. Our computer goes off to query Vaguelyperfect's file, and finds that they are also Ganord on OldKidsTvRPG. With this information, it is easier to make connections between various systems.
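The lookup just described might be sketched like this. The FILES dictionary is a stand-in for actually fetching someone's published file over the network, and both function names are hypothetical. The sketch assumes handles are matched exactly, and it adds one precaution that is our assumption rather than part of the proposal: a file is only believed if it claims the identity we started from.

```python
# Hypothetical stand-in for fetching each person's published Section 1 data.
# Key: the (service, handle) we looked up. Value: the identities that file claims.
FILES = {
    ("Iziblog", "Vaguelyperfect"): [
        ("Iziblog", "Vaguelyperfect"),
        ("OldKidsTvRPG", "Ganord"),
    ],
}


def fetch_claims(service, handle):
    """Return the Section 1 list from that account's file, or [] if none is found."""
    return FILES.get((service, handle), [])


def other_identities(service, handle):
    """List the other identities claimed by the person behind (service, handle)."""
    claims = fetch_claims(service, handle)
    start = (service, handle)
    # Ignore a file that does not itself claim the identity we started from.
    if start not in claims:
        return []
    return [claim for claim in claims if claim != start]


print(other_identities("Iziblog", "Vaguelyperfect"))
```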

The little example above only uses section 1 data. Section 2 could be used to determine a popular-with-friends metric, or to traverse from one node to another without depending on a particular system; for instance, if we wanted to get in contact with someone whose wallet we'd found, when all we had was their 3-2-1 Online handle.
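Both Section 2 uses can be sketched in a few lines of Python. Everything here is illustrative: the TRUSTS data is made up (wallet_owner is a placeholder handle), the function names are ours, and for simplicity each person is keyed by a single identity rather than resolved through their Section 1 list. The metric counts how many of our trusted contacts also trust a given identity; the traversal is an ordinary breadth-first search over trust links.

```python
from collections import Counter, deque

# Hypothetical Section 2 data: identity -> identities that person reads and trusts.
TRUSTS = {
    ("Iziblog", "daweaver"): [("Iziblog", "sir_quirky_k"), ("Iziblog", "Vaguelyperfect")],
    ("Iziblog", "sir_quirky_k"): [("Iziblog", "Vaguelyperfect")],
    ("Iziblog", "Vaguelyperfect"): [("3-2-1 Online", "wallet_owner")],
}


def popular_with_friends(me):
    """Count how many of the people we trust also trust each other identity."""
    counts = Counter()
    for friend in TRUSTS.get(me, []):
        for target in TRUSTS.get(friend, []):
            if target != me:
                counts[target] += 1
    return counts


def find_path(start, goal):
    """Breadth-first search along trust links; returns a shortest chain or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in TRUSTS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(("Iziblog", "daweaver"), ("3-2-1 Online", "wallet_owner")))
```

In a real deployment each step of the search would mean fetching another person's file, so the breadth of the search would need limiting; that is outside this sketch.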

The data must be stored as a human-readable XML file. Indeed, the data must be stored as a human-editable XML file, so that the person can verify that their information is as they want it to be. It's not clear whether performance would be best served by splitting into two files, a short Section 1 file of identities and a longer Section 2 file of relationships; or having a Section 2 for each Section 1 entry; or combining both sections into one file. So long as all the information is accessible from the Section 1 entry, that doesn't particularly matter.
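As a rough idea of what a human-editable file could look like, this Python sketch writes both sections into one indented XML document and reads Section 1 back. The element and attribute names (profile, identities, identity, relationships, trusts) are placeholders of our own, not FOAF vocabulary; a real format would be agreed through consultation. It uses the standard library's xml.etree.ElementTree, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_file(claimed, trusted):
    """Serialise both sections as one indented, hand-editable XML document."""
    root = ET.Element("profile")
    section1 = ET.SubElement(root, "identities")
    for service, handle in claimed:
        ET.SubElement(section1, "identity", service=service, handle=handle)
    section2 = ET.SubElement(root, "relationships")
    for service, handle in trusted:
        ET.SubElement(section2, "trusts", service=service, handle=handle)
    ET.indent(root)  # one element per line, so a person can read and edit it
    return ET.tostring(root, encoding="unicode")


def read_identities(xml_text):
    """Parse the Section 1 entries back out of a file, hand-edited or not."""
    root = ET.fromstring(xml_text)
    return [(e.get("service"), e.get("handle"))
            for e in root.findall("./identities/identity")]


text = build_file(
    claimed=[("BBC", "daWeaver"), ("Iziblog", "daweaver")],
    trusted=[("BBC", "Paddy O'Connell"), ("Iziblog", "sir_quirky_k")],
)
print(text)
```

This is the combined-file option; splitting the sections into separate files would only change where build_file writes its two subtrees.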

It is essential for the data to be in a secure space of the user's choosing, and not of the service's choice. We would very strongly argue against the web service keeping the Section 2 data on its own servers. Decentralisation is a necessary evil, for reasons we'll come to later. FTP, or secure FTP, is a mature technology that may prove sufficiently secure for this purpose. It is probably going to be necessary to use Javascript to perform updates, and for the user to authenticate the change with their password, but there must always be an option for the more paranoid user to update the file by hand, using code generated automatically - or even code they've written themselves.

Each service will need to know the location of the files it should be updating, and to which it should refer incoming requests. This should be determined by the internet supplier and customer, in the same way that the user's email address is determined. Something along the lines of http://foaf.isp.net/username/ is our concept. When signing up to a new application, it should be a case of asking for username, password, email address, and (optional) FOAF location. Savvy ISPs will allocate a distinct username and password to the FOAF account, ensuring that it can only update its own area; and will employ content filtering to ensure that it can only contain XML files (or whatever format the standard follows). We freely admit that the security model needs further work.
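A sign-up record along these lines might look like the following Python sketch. The names default_foaf_url, Signup and has_usable_foaf are hypothetical, and the URL pattern simply follows the http://foaf.isp.net/username/ concept above; nothing here is an existing ISP convention. The FOAF location is optional, so the only check is that, when present, it looks like a web address.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlsplit


def default_foaf_url(isp_domain: str, username: str) -> str:
    """Build a location following the http://foaf.isp.net/username/ concept."""
    return f"http://foaf.{isp_domain}/{quote(username, safe='')}/"


@dataclass
class Signup:
    username: str
    password: str
    email: str
    foaf_location: Optional[str] = None  # optional, as in the text

    def has_usable_foaf(self) -> bool:
        """True only if a location was given and it looks like an http(s) URL."""
        if self.foaf_location is None:
            return False
        parts = urlsplit(self.foaf_location)
        return parts.scheme in ("http", "https") and bool(parts.netloc)


form = Signup("username", "not-a-real-password", "user@isp.net",
              default_foaf_url("isp.net", "username"))
print(form.foaf_location, form.has_usable_foaf())
```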

Earlier, we alluded to a reason why Mr. Fitzpatrick's proposal won't work. It's certain to be broken by commercial pressures from people who don't fully understand it. We've chronicled at some length the way his second creation, Livejournal, has been broken by people who didn't share the same vision as the site's customers. Mark Kraft has already given a comprehensive history of Freevote, Mr. Fitzpatrick's first idea.

And it's happened to his third idea, YADIS, precisely because it's been forced onto customers by managers interested in the bottom line. For instance, every Livejournal customer receives a YADIS account. And every Livejournal customer has been deemed to approve a number of third-party sites by authentication through YADIS. The only way a Livejournal customer can prevent this authentication is to completely disable YADIS; in turn, that task can only be done by slinging money to the new owners, and creating a style of one's own - not a task for the faint of heart. It's a literal example of identity theft, it's a gross abuse of the system, and there is more than a little irony that the flaws inherent in Mr. Fitzpatrick's third significant project should be exposed by someone else trying to make a few pennies out of his second.

We never trusted YADIS in the first place, and would go so far as to suggest that any system that is inoperable without YADIS's approximation to authentication is insufficiently secure, and needs to be taken back to the drawing board for redesign. (Indeed, that link reminds us of Mr. Fitzpatrick's sell-em-short-and-flee approach to privacy.)

Given Mr. Fitzpatrick's previous record of failure, we must approach his ideas with great care. We see the storage of the data as key. Anyone can access it, it is intended to be publicly readable, but it must be possible for the user to control exactly what goes into the file.

The proposal outlined in this document relies on each person having a secure space of their own. By designing the system to be decentralised, and by using a limited-access account, it's possible to ensure that the damage from a break-in is greatly reduced. If a YADIS server is compromised - it hasn't happened yet, but it's inevitable that a YADIS server *will* be compromised - then every account stored there will be (at best) suspect. At worst, valueless. When Livejournal was compromised last year, every account stood to lose its identity, through no fault of the account holder. The files required under the plan outlined here can be stored on a common-or-garden web server, and securing those is not difficult.

On this point, we completely disagree with Mr. Fitzpatrick, who proposes (in goal 1b) a primarily-centralised system. There are other points of critique, but none so fundamental to our view of his proposal. We are not entirely convinced that finding all incoming edges (goal 2b) should be handled by this system, rather than the application where the relationship was created. We believe that the remaining goals - particularly the user interface in section 3 - are worthwhile, and suggest that human-readable XML is an excellent method of meeting goal 3c (portability of graph data) while adding the benefit of ensuring the data's accuracy. We strongly disagree with the start of Mr. Fitzpatrick's assumption 4, that a single format is a bad idea; the weight of reasoned argument is against him, and he appears to contradict himself before the end of the paragraph.

The reason why the method outlined here won't be implemented is as simple as it is depressing: no-one makes money out of it, and everyone loses a little. In the profit-driven web, the only things that get done are those that make a fast return on investment. Things that are designed for resilience simply don't happen; bodges, such as YADIS, gain traction far in excess of their quality.
