Databases, The End User Experience

Jul. 29th, 2010 | 10:11 am

Does it matter if the end user knows what the database is?

Recently I got a wonderful view of a database from the end user perspective.

While I was traveling I found a restaurant and decided to let friends who live locally know where I was. Partway through my meal I got a message from a local friend that said "Don't eat there, their food always makes people sick!"

"Always" seems like a word that would be a little too strong when applied to a restaurant, right?

Nope. The next day I got to feel the full truth of the word.

A couple of days later I was telling some friends about this, and a local asked me, "Where was this? I want to avoid them." I wasn't asked this question just once; I was asked it a dozen times.

I don't know where the place is. Why is that? Because the system I was using lost the entire day worth of my data. I don't know how often they lose data, but from asking a few other folks, it appears to happen more often than not.

It came up in casual conversation the other day that the site had recently moved off Postgres to another system. Suddenly everything made sense, because the particular solution they moved to is not very durable.

We talk about databases being "transactional" or not. We talk about them being "durable". What matters in the end, to me as an end user, is that when I put my data in a system, I want a confirmation that the system stored it. I don't want to retype my data, and I don't want to collect it again. If I were the operator of the site? I certainly wouldn't want to be losing my users' data.
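To make "a confirmation that the system stored it" concrete, here is a minimal sketch in Python of a write that is acknowledged only after the operating system has been asked to flush it to the storage device. The function name `store_checkin` and the one-record-per-line file are hypothetical, made up for illustration; this is not how any particular site or database does it.

```python
import os
import tempfile


def store_checkin(path, record):
    """Append one record; return True only after fsync has succeeded."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to flush to the storage device
    return True               # only now is it fair to tell the user "stored"


if __name__ == "__main__":
    # Hypothetical usage: a throwaway log file in a temp directory.
    path = os.path.join(tempfile.mkdtemp(), "checkins.log")
    if store_checkin(path, "2010-07-29 some restaurant"):
        print("stored")
```

If the write or the `os.fsync` call fails, an exception is raised and the caller never sees the confirmation, which is the point: the "yes" comes after the data has been handed to the storage device, not before.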

In the MySQL world? MyISAM is the most abandoned storage engine in the stack. People will pick it initially because it is fast, but the first time they discover data corruption or have to deal with multiple hours of recovery time, they quickly move away from it.

As an operator I wouldn't want to have to explain to my users or my boss that we had to wait 12 hours until the database recovered itself (or that it had corrupted itself). "Transactional" systems know how to handle recovery. People will wave their arms about and talk about disk controllers, disk failures, etc. That is hand waving. A properly configured system is redundant. Sure, it can be hit by lightning, but the real issue is most likely going to be that a plug gets pulled or a program crashes.
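As a rough illustration of what "knowing how to handle recovery" can mean after a pulled plug, here is a toy sketch in Python: each log record carries a checksum, and replay stops at the first record that is incomplete or damaged. The names `encode` and `recover` and the line format are assumptions invented for this example. Real transactional engines use far more elaborate write-ahead logs.

```python
import zlib


def encode(record):
    """One log line: CRC32 of the payload in hex, a space, the payload."""
    return f"{zlib.crc32(record.encode('utf-8')):08x} {record}\n"


def recover(log_text):
    """Replay a log, keeping every record before the first damaged one."""
    good = []
    for line in log_text.split("\n"):
        checksum, sep, payload = line.partition(" ")
        if not sep:
            break  # empty or cut-off line: end of the usable log
        if checksum != f"{zlib.crc32(payload.encode('utf-8')):08x}":
            break  # torn or corrupted write: stop replaying here
        good.append(payload)
    return good


if __name__ == "__main__":
    log = encode("checkin one") + encode("checkin two")
    # Simulate a crash that cut the third record off partway through.
    torn = log + encode("checkin three")[:5]
    print(recover(torn))
```

The design choice here is that recovery is quick and mechanical: intact records come back, the half-written tail is discarded, and nobody has to wait hours or guess which data is trustworthy.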

I look at, and even work with, some of the "no-sql" solutions. Some of them I recommend, and others I don't. I look at scale-out needs, usage patterns, and a wide variety of other details.

As an end user, though?

I would like to know that my data was stored, and that I will reliably be able to retrieve it when I want. I don't like outages. Of the services online that I pay for or that I have integrated into my life? I can't imagine wanting to deal with a system which was unreliable. A free service which does not work most of the time is not free. It will consume my time whenever it is not available.

There is an end user experience for the database, and site operators ought to remember this.

Comments {4}

Jay

from: jayp39
date: Jul. 30th, 2010 05:26 am (UTC)

I would very much like to hear which nosql products you recommend and which ones you don't recommend, as we're in the middle of porting some of our data over to one of them. :)

It's worth pointing out that of course if your SQL database is having trouble keeping up with load or concurrency requirements, you will also have reliability issues. Maybe user data doesn't get lost once it's written, but maybe it doesn't get written at all because your DB stopped accepting connections for a little while after running out of them due to locking/concurrency issues. It is, unfortunately, all about trade-offs. I'm hoping this move will relieve some pressure from our DB and that between the two products we will be losing less user data than we have been recently. :)

Mark Atwood

from: fallenpegasus
date: Jul. 30th, 2010 08:21 pm (UTC)

One of the things I took away from NoSQL Live in Boston is that the standard "MySQL master/slave with replication" is even worse than most of the NoSQL solutions out there already, not even having an "eventual consistency" guarantee, instead having a "wishful consistency".

On the other hand, some of the very popular NoSQL solutions right now are really bad. The one that you mentioned without naming, 10gen's MongoDB underlying Foursquare, makes no attempt at all to be durable.

I sometimes quip that MySQL 3.11 became popular a decade ago because it fit very well to quickly and poorly written PHP3 apps. MongoDB may be the modern version of that, fitting very well to quickly and poorly written Ruby apps.

But I fear that because 10gen MongoDB is the one that has VC mindshare over other, technically better, document stores and other NoSQL stores, it might win on the "worse is better" principle.

dormando

from: dormando
date: Jul. 30th, 2010 10:40 pm (UTC)

It really bothers me when data disappears, it bothers you, but I wonder if the masses will ever care.

Facebook loses data often... temporarily, sometimes permanently. Twitter eats things rarely, but it does. A lot of these users have never really used much of a dynamic/social web service before. Most of them are coming from Microsoft products, which lose data as a guarantee.

Hell, Facebook threw out my whole fucking profile on purpose with barely any warning. Didn't seem to slow their growth rate.

Do users actually care more about the service being generally available and fast than about their data being there tomorrow? :/

harryh

from: harryh
date: Jul. 31st, 2010 09:32 pm (UTC)

Hi,

If you are, in fact, referring to foursquare when you say "Because the system I was using lost the entire day worth of my data.", I'd be very, very interested in hearing more specifics about the problem you ran into. While we do store checkin data in MongoDB, we are paranoid, so we store a copy of the data in PostgreSQL as well. I just ran an integrity check on your account and am seeing no differences between the data in Mongo and Postgres.

Scaling a system growing as rapidly as foursquare is very complicated, requiring careful judgement about many engineering tradeoffs. Rest assured, though, that we take data integrity very, very seriously.

-harryh, server lead @ foursquare
