AIO, Sparc/Intel, Archive, maybe posix after all...

Sep. 1st, 2007 | 10:56 am

I'm looking at the final patch for adding Asynchronous IO to the
Archive engine in MySQL. The current patch relies on an AIO library I
wrote myself a while ago to play around with the problem. It works,
but it adds more ongoing threads to MySQL, and I have mixed feelings
about doing that in user space (though it's all the rage among engine
writers).

Last night I got it in my head to try out the POSIX AIO calls. In
part because I've never used them, and in part because I like the
thought of reusing code that someone else wrote (aka supporting my own
AIO means, well... supporting it...).
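
Roughly, the two paths look like this (a minimal sketch rather than the
actual test harness; the file name and buffer size are placeholders, and
it only keeps one request in flight at a time):

/* Sketch: plain read() loop vs. POSIX AIO loop over the same file.
 * Link with -lrt on Linux for the POSIX AIO calls. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 1024          /* 48K is closer to Archive's default */

static void read_loop(int fd)
{
  char buf[BUF_SIZE];
  while (read(fd, buf, sizeof(buf)) > 0)
    ;                          /* plain synchronous read() */
}

static void aio_loop(int fd)
{
  char buf[BUF_SIZE];
  off_t offset = 0;
  struct aiocb cb;
  const struct aiocb *list[1] = { &cb };

  for (;;)
  {
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = offset;

    if (aio_read(&cb) != 0)    /* queue the read */
      break;
    while (aio_error(&cb) == EINPROGRESS)
      aio_suspend(list, 1, NULL);      /* wait for it to finish */
    ssize_t bytes = aio_return(&cb);
    if (bytes <= 0)            /* error or EOF */
      break;
    offset += bytes;
  }
}

int main(void)
{
  int fd = open("datafile", O_RDONLY);   /* placeholder file name */
  if (fd < 0)
    return 1;
  read_loop(fd);               /* or aio_loop(fd) */
  close(fd);
  return 0;
}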

I decided to do a test using 1024 byte buffers on a 1 gig file:

First Intel:

read()
real 0m0.789s
user 0m0.041s
sys 0m0.744s

aio()
real 0m2.487s
user 0m3.067s
sys 0m1.798s


Now Sun:


read()
real 0m21.764s
user 0m1.597s
sys 0m20.121s

aio()
real 0m31.098s
user 0m38.586s
sys 0m42.436s


Eww.... AIO is slower, much slower, with 1024-byte buffers. Archive
uses a much larger read buffer, though; with its default (48K), things
change the other way around:

Intel:

read()
real 0m0.789s
user 0m0.053s
sys 0m0.733s

aio()
real 0m0.360s
user 0m0.382s
sys 0m0.335s


Sparc:

read()
real 0m15.493s
user 0m0.065s
sys 0m8.493s

aio()
real 0m15.242s
user 0m15.511s
sys 0m9.625s


So the AIO system does perform better with a larger buffer on Linux,
and for some reason the Intel 8-way is faster than the Sparc T1000
(though they have nearly identical disks). I reran the test several
times, and for reads Linux was always faster with AIO, and for
Solaris it was sometimes faster and sometimes slower. This surprises
me and makes me think there must be some secret to enabling AIO on
Solaris that I am not aware of.

For Solaris I tried upping the buffer to 128K, and AIO and the normal
read tied. Moving to something ridiculous like 512K just made it slower
again (which is not a surprise). Something has to be wrong, or AIO on
Solaris must just suck. Most likely we can blame this on me, though.

I am aware that Linux has its own native kernel AIO implementation
alongside the POSIX aio_read() calls. I hate writing to OS specifics,
though. I know that each vendor likes to trick out their own particular
calls because they disagree with the standard, but for application
design this creates a portability nightmare.

I need to benchmark my homebuilt solution now against the above. If
it turns out that my homebuilt is either slower or only a little
faster, I am just going to go with the POSIX AIO calls and assume
vendors are putting effort into making AIO calls faster. I've got
better things to do with my time than hack AIO implementations. I can
make it an option that can be enabled/disabled by users so they can
try it out themselves.

Comments {10}

Kytty

from: kytty
date: Sep. 1st, 2007 07:40 pm (UTC)

This is all still quite a bit over my head but I enjoyed reading the process you went through.

from: andythms
date: Sep. 1st, 2007 08:55 pm (UTC)

You mentioned the Sun and Intel kit have nearly identical disks, so I'm wondering how each disk's write cache and read cache are configured. The Sun likely has the read cache enabled and write cache disabled, which is what UFS file systems need. (Quickest way to check: run "format -e", select each drive in turn and use the "cache" command.) Intel kit can vary, which might help explain the much slower Sun numbers you're seeing if the Intel disks have write cache enabled.

Brian "Krow" Aker

from: krow
date: Sep. 1st, 2007 09:21 pm (UTC)

Hi!

The Sun does indeed have both its read and write cache enabled. The Intel has write cache disabled (which is the norm for tests I do). Keep in mind that this is a read test, so I am not sure that the write cache would make much of a difference.

from: andythms
date: Sep. 1st, 2007 10:06 pm (UTC)

Ah, somehow I'd completely missed that this was about reading, so you make a very good point! Sorry about that..!

Brian "Krow" Aker

from: krow
date: Sep. 1st, 2007 10:24 pm (UTC)

Anything is worth asking. I am still looking for a reason as to why Solaris was so slow.

peter_zaitsev

from: peter_zaitsev
date: Sep. 3rd, 2007 11:24 am (UTC)

Check vmstat output for both of them (and of course make the test longer so you can do it).

Solaris loves to do prefetches even if you're doing random IO, which can drag IO performance well down.

This is also why Direct IO on Solaris is such a major boost for InnoDB.

from: jeffr_tech
date: Sep. 4th, 2007 05:04 am (UTC)

Be wary of aio on Linux. It can only actually do the IO asynchronously in very few circumstances. The rest of the time it just issues synchronous IO in the context of the caller, blocking it as if it were a normal read. One major caveat is that it only works with O_DIRECT files if you use the kaio interface. Using POSIX aio, I believe, just falls back to threads in user space, but I'm not certain about that.

The posix interface really sucks anyway. Every reasonable operating system has an alternative to avoid the overhead of posix.
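
For reference, the kaio interface mentioned above is Linux's native AIO
API (io_submit()/io_getevents(), usually reached through libaio). A
minimal sketch of that path, assuming libaio is available and keeping in
mind that O_DIRECT wants aligned buffers (the file name is a placeholder):

/* Native Linux kernel AIO ("kaio") through libaio; link with -laio.
 * Only O_DIRECT files are serviced truly asynchronously on this path. */
#define _GNU_SOURCE            /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  io_context_t ctx = 0;
  struct iocb cb;
  struct iocb *cbs[1] = { &cb };
  struct io_event event;
  void *buf;
  int fd;

  /* O_DIRECT requires aligned buffers, sizes and offsets. */
  if (posix_memalign(&buf, 512, 4096) != 0)
    return 1;

  fd = open("datafile", O_RDONLY | O_DIRECT);
  if (fd < 0)
    return 1;

  io_setup(16, &ctx);                    /* create the AIO context */
  io_prep_pread(&cb, fd, buf, 4096, 0);  /* one 4K read at offset 0 */
  io_submit(ctx, 1, cbs);                /* queue it */
  io_getevents(ctx, 1, 1, &event, NULL); /* wait for completion */

  io_destroy(ctx);
  close(fd);
  free(buf);
  return 0;
}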

Brian "Krow" Aker

from: krow
date: Sep. 4th, 2007 03:50 pm (UTC)

Hi!

I've been finding that it does pretty well overall. The kernel is changing pretty rapidly around aio right now, so a lot of the literature available on the web (aka what Google drags up) is just wrong (much of what people think about O_DIRECT and AIO is out of date at this point). This misinformation is one of the strongest reasons I see to version the kernel.

I am not finding the POSIX API to be that horrible. I wouldn't touch the OS-specific alternatives; I don't have the time to write a specific patch for every OS just because the vendors want to create some specific lock-in. I find it pretty obnoxious that they do this instead of spending the time to make the POSIX interface the best interface.

from: jeffr_tech
date: Sep. 4th, 2007 06:57 pm (UTC)

The real problem with POSIX aio is aio_suspend(). You have to specify a list of all aiocbs and the kernel has to scan that list. This is not only inconvenient but also inefficient. Various operating systems have addressed this in different ways. In FreeBSD you can use kqueue to wait for any aio completion, and it sends the address of the completed cb along with the event.
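
(To make the list scan concrete: with POSIX AIO you end up waiting on
the whole array and then searching it yourself. A sketch, with a
hypothetical number of outstanding requests:)

/* Wait for any of many outstanding POSIX AIO requests with aio_suspend().
 * The entire aiocb list is handed to the kernel, and afterwards the
 * application still scans it to find out which request finished. */
#include <aio.h>
#include <errno.h>
#include <stddef.h>

#define REQUESTS 64            /* hypothetical number of outstanding reads */

static int wait_for_any(struct aiocb *cbs[REQUESTS])
{
  if (aio_suspend((const struct aiocb *const *) cbs, REQUESTS, NULL) != 0)
    return -1;                 /* interrupted or timed out */

  for (int i = 0; i < REQUESTS; i++)     /* scan for the completed one */
    if (cbs[i] != NULL && aio_error(cbs[i]) != EINPROGRESS)
      return i;

  return -1;
}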

Linux decided to make an entirely incompatible aio layer altogether. The kernel aio interfaces definitely still only work asynchronously with O_DIRECT. There are patches floating around to support buffered IO on some filesystems, but they haven't been merged yet. That restriction is specific to the kernel aio.

The POSIX aio on Linux uses a thread pool. You can look at the glibc sources to verify this. This, of course, works with buffered IO and without. It's more overhead than doing it properly in the kernel, but it's also a lot simpler. I believe this is what Solaris has switched to as well.

I understand not wanting to implement OS-specific layers for all of these things. It's just a compromise. Do you want the absolute best performance? Or do you want good performance with lower development cost? It's a difficult balance to strike, especially since you support so many operating systems.

Brian "Krow" Aker

from: krow
date: Sep. 4th, 2007 07:35 pm (UTC)

Hi!

Yes, I was referring to the POSIX aio on Linux not requiring O_DIRECT. I've looked at the direct Linux Kernel API, but I am not likely to implement it anytime soon.

For me it is a bit about bang for the buck. Implementing the POSIX calls gets me performance (most likely) on all platforms. I could cherry-pick platforms to write faster IO implementations against, but my laundry list of features is long. I also know that over time the vendors are likely to improve their POSIX interfaces, so whatever work they do there I can harness.

Sun has its own thread library as well as the POSIX based pthread implementation. For a while the Solaris interface was more efficient, but today? Pthreads is just as good.

You can think of the POSIX interface as leveraging the wisdom of crowds: more people will look at and test it than the vendor-specific interfaces.
