Social Networks, Databases, Brad's post

Aug. 20th, 2007 | 02:51 pm

Finally got around to reading Brad's article on Social Networks.

http://bradfitz.com/social-graph-problem/

I am not going to comment too much on his ideas. I thought he presented them in a very straightforward manner, and I cannot find much of a reason to disagree with what he has said. I have concerns about how to control the information, but part of me just thinks that we shouldn't be too concerned with the privacy issue. Sign up and use it if you want to; otherwise, reenter your information into each system you want to use.

Ease of use wins over privacy for a lot of people.

When I implemented Zoo for Slashdot back in 2003 (or was it 2002?) I made a point of making all of the social graph public:
http://slashdot.org/~krow/zoo/

You can even "pull" the data via rss:
http://slashdot.org/~krow/zoo/rss

I had hoped that people would find that interesting enough to build applications around, but Slashdot doesn't have the sort of leverage in the consumer market to make that happen. One thing you can note about Slashdot though, is that it exposes the entire range of relationships:

Friends -> People you like
Foes -> People you dislike

And then the perception of the relationship:
Fans -> People who like you (sometimes known as stalkers)
Freaks -> People who dislike you

The "Freaks" part is still unusual for social systems, since almost none of the systems understand the concept of perception. I find that I still have to explain this to people :)
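
The split between declared relationships and perceived ones can be sketched in a few lines. This is only an illustration, not the Zoo code: the user names and the `invert` helper are made up for the example. The point is that Fans and Freaks are never entered by anyone; they fall out of reversing the Friends and Foes edges.

```python
from collections import defaultdict

# Hypothetical sample data: what each user has declared about others.
friends = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": set()}
foes = {"alice": {"carol"}, "bob": set(), "carol": {"bob"}}


def invert(edges):
    """Flip every edge: if A lists B, the result maps B to a set containing A."""
    inverted = defaultdict(set)
    for user, targets in edges.items():
        for target in targets:
            inverted[target].add(user)
    return dict(inverted)


# The "perception" side is derived, never stored by the user directly.
fans = invert(friends)   # people who like you
freaks = invert(foes)    # people who dislike you
```

Here `freaks["bob"]` holds carol, because carol listed bob as a foe; bob did nothing to create that relationship.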

I have been thinking more about the concept of mapping these relationships. Brad's paper talks about the why, but the how-to is more my thing. A system that answers "give me all of user A" is pretty simple. Any database that can handle a lot of rows can handle this. Mapping the relationships, though, is a bit different.

Graphs are not really native to relational databases. They can be made to work in them, but it is far from perfect (and some kudos to Oracle for their CONNECT BY implementation).
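
To make the "can be made to work" point concrete, here is a rough sketch of a degrees-of-separation query against a plain edge table. It does not use Oracle's CONNECT BY; it assumes SQLite (through Python's `sqlite3` module) and a standard SQL recursive common table expression instead. The `friend` table, its columns, and the `within_degrees` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: one row per "src lists dst as a friend".
conn.execute("CREATE TABLE friend (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("c", "a")],
)


def within_degrees(conn, start, max_depth):
    """Return {person: shortest hop count} for everyone within max_depth hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(person, depth) AS (
            SELECT ?, 0
            UNION
            SELECT f.dst, r.depth + 1
            FROM reach AS r JOIN friend AS f ON f.src = r.person
            WHERE r.depth < ?          -- depth bound keeps cycles from looping
        )
        SELECT person, MIN(depth) FROM reach
        WHERE person <> ?
        GROUP BY person
        """,
        (start, max_depth, start),
    ).fetchall()
    return dict(rows)
```

Every extra hop is another self-join over the edge table, which is where the "far from perfect" part shows up once the table gets large.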

Friendster, during its days as a six-degrees site, developed a graph engine for storing and retrieving this sort of data (and they eventually made this a storage engine under MySQL). Whether an engine like this needs to be a part of MySQL is open for debate, but what is needed is an open source graph engine that can handle the data.

It should be queryable for searches on the relationships. It will also need to be:

  • Distributed (multiple nodes)
  • Highly available
  • Able to replicate globally

    The last piece I believe is important, since I can see sites that build tools wanting to be able to efficiently suck up the content from the system as it comes in. Sure, it could be done via "web pings" or polling, but that won't work for the largest systems. I also like the idea of many sites having the "central data". One thing which has bothered me about Wikipedia and other "shared" sorts of sites is the ability to replicate and fork the site. While forking is understood in the Open Source world, I think the concept is not understood just yet in the collaborative web projects.

    Distributed is a given. We don't live in a day and age where we can build "super" single computers to handle everything. Shared-nothing architectures have worked very well for the web, and this sort of methodology is the clear winner.
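
One way to picture the replication requirement is an append-only change log that any site can follow from its own offset, so each follower ends up with a full copy. The sketch below is only that, a picture: the `ChangeLog` and `Replica` classes and the change tuple format are assumptions for illustration, and a real system would need durable storage and a network protocol.

```python
class ChangeLog:
    """Append-only list of relationship changes (hypothetical format)."""

    def __init__(self):
        self._entries = []

    def append(self, change):
        # change is a tuple: (op, src, kind, dst), with op "add" or "remove"
        self._entries.append(change)
        return len(self._entries)

    def since(self, offset):
        """Return the changes after `offset` and the new offset to remember."""
        return self._entries[offset:], len(self._entries)


class Replica:
    """A follower site that keeps its own complete copy of the graph."""

    def __init__(self):
        self.edges = set()
        self.offset = 0

    def sync(self, log):
        changes, self.offset = log.since(self.offset)
        for op, src, kind, dst in changes:
            if op == "add":
                self.edges.add((src, kind, dst))
            elif op == "remove":
                self.edges.discard((src, kind, dst))
```

Because a replica only asks for what came after its last offset, a new site can start from zero and catch up, which is also what makes a fork possible.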

    Comments {8}

    Egor Egorov

    (no subject)

    from: egorfine
    date: Aug. 20th, 2007 10:46 pm (UTC)

    Are there any implementations of a graph database that at least approach the mentioned functionality? Or just ANY open source graph storage implementations?

    Brian "Krow" Aker

    (no subject)

    from: krow
    date: Aug. 21st, 2007 01:14 am (UTC)

    I am not familiar with a commercial version. Open source-wise, I don't believe one exists.

    Mark Atwood

    (no subject)

    from: fallenpegasus
    date: Aug. 21st, 2007 01:19 am (UTC)

    The slashdot links are really slow.

    Is the Zoo function hard on the database?

    Brian "Krow" Aker

    (no subject)

    from: krow
    date: Aug. 21st, 2007 01:46 am (UTC)

    No. The portion of the user object which contains FOF information is serialized into a blob and restored as needed.

    There is a table representation, but it is just there to rebuild the user object if relationships change (when a user changes a state, all users that could possibly be affected are marked dirty, and a task comes along and creates new objects).

    The code is pretty much unchanged since I wrote it.

    I also find it funny to look at, since when I look at Brad's Gearman stuff I see the same elements :)
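
The scheme described in this comment (a table as the source of truth, a serialized blob per user for reads, dirty marking on change, and a background task that rebuilds) could look roughly like the sketch below. It is not the Slashdot code; the class name, the in-memory stand-ins for the table and blob store, and the use of `pickle` are all assumptions made for illustration.

```python
import pickle


class ZooCacheSketch:
    """Toy model: edge set = the table, blobs = serialized user objects."""

    def __init__(self):
        self.edges = set()   # stands in for the table: (src, kind, dst)
        self.blobs = {}      # user -> pickled relationship object
        self.dirty = set()   # users whose blob is out of date

    def set_relationship(self, src, kind, dst):
        # kind is "friend" or "foe"; both ends of the edge are affected.
        self.edges.add((src, kind, dst))
        self.dirty.update({src, dst})

    def rebuild_dirty(self):
        """The 'task that comes along': rebuild blobs for dirty users only."""
        for user in list(self.dirty):
            obj = {"friends": set(), "foes": set(), "fans": set(), "freaks": set()}
            for src, kind, dst in self.edges:
                if src == user:
                    obj["friends" if kind == "friend" else "foes"].add(dst)
                if dst == user:
                    obj["fans" if kind == "friend" else "freaks"].add(src)
            self.blobs[user] = pickle.dumps(obj)
        self.dirty.clear()

    def load(self, user):
        # Reads never touch the table, only the serialized blob.
        return pickle.loads(self.blobs[user])
```

Reads stay cheap because they only deserialize one blob; the cost of walking the table is paid once per change, off the request path.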

    (no subject)

    from: jamesd
    date: Aug. 21st, 2007 04:27 am (UTC)

    You'd be wrong about at least some shared sites not understanding forking. But now look one step beyond: what are the interests of the hosting body or company and its incentives?

    You'll find your answer there if you look at the history of how "ownership" of collaborative works has changed over time, from the owners of the collaborative works to the owners of the host. Earlier this year the Foundation changed its charter to eliminate the class of authors as members with voting rights. Now ask why the only common name for Wikipedia the work is Wikipedia, the name trademarked by the Foundation that hosts it ... and controls what name must be used for the work on its servers. It's following the usual path of taking control over time as the money starts to be significant.

    It is still possible to fork, though - you can technically still get the database... but without the user information you need to fork the history and let people continue to contribute at the fork. Though that could be worked around via notification edits on original user pages to prove the link between two accounts on the different sites.

    Brian "Krow" Aker

    (no subject)

    from: krow
    date: Aug. 21st, 2007 05:06 pm (UTC)

    I am thinking of forking in a positive manner, not in a negative. This is something that I don't believe is understood well.

    So the revision history of Wikipedia is not available? That would not surprise me. Notice that in my thoughts on how this should work for social networks, I have included a "replication" component. In other words, let's let people hook a pipe up to the changes so there are complete copies of the revisions available to anyone who wants them.

    (no subject)

    from: jamesd
    date: Aug. 21st, 2007 09:04 pm (UTC)

    I also think of it as a positive event.

    At the moment the full revision history of articles in the encyclopedia is available, so it is still possible to produce a fork. You can't get the user database, though, so porting the users is a greater hassle.

    I don't currently know of even a complete (all revision history) mirror.

    Lover of Ideas

    Corrections... :-)

    from: omnifarious
    date: Aug. 21st, 2007 09:18 pm (UTC)

    Foe -> Someone you dislike
    Freak -> Someone who dislikes you
