May 13th, 2004


ApacheBench testing of XML Parsing

So, I've been working on a project called PeopleAggregator, and we've been talking about integrating with a lot of different platforms, among them Drupal. (For the record, this is completely unrelated to the MT stuff that went on today. I may write on that later, but really, everyone else has said what I would in a million different ways.) Anyway, we were talking about RAP (the RDF API for PHP) and how it's too bulky and slow to work for what we need.

So, we got a guy on the team - Joel De Gan, who's working on the PeoplesDNS project for us, and he offered to write us a parser. This is going to be a replacement for RAP, for those of us who can't deal with the slowness of RAP.

Now, I don't know much about RAP. And I don't know much about PHP, or parsing XML, or really anything - I pick up the bits I need to know as I go along. So I'm just kind of standing on the sidelines, but today, I got a demo of what Joel's parser can do.

LiveJournal FOAF files are typically big. Mine is no exception - over 100 friends, random contact data, etc. All in all, a 40KB document about me. I want to parse this data, so I tried it with both RAP and Joel's parser.

To take network conditions out of the picture, I copied the file I wanted over locally. To simulate the action of opening a file and reading it, I did keep it on the webserver, so I'll admit there may be some kind of bias in that, but I used the exact same method to open the file in both cases (fopen), so I don't think it would cause any major difference. I also disabled all printed output.

Anyway, I used this file to check the parsers with ab (ApacheBench, the Apache benchmarking utility - it fetches a page a bunch of times and tells you how long it took). With a 50-request run, I got these averages for the two parsers:

Joel De Gan's XML parser, which parses data into a multilevel array (source available):

Requests per second: 11.25 [#/sec] (mean)
Time per request: 88.92 [ms] (mean)
Time per request: 88.92 [ms] (mean, across all concurrent requests)

(Full Stats)
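For anyone wondering what "a multilevel array" looks like in practice, here's a rough sketch of the idea. It's in Python rather than PHP, it is not Joel's code, and the sample document is a made-up stand-in for a FOAF file (no namespaces, to keep it short). The helper name element_to_dict is my own invention for illustration.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Turn one XML element into a nested dict: tag, attributes, text, children."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }

# Tiny hypothetical document standing in for a real FOAF file.
SAMPLE = """<Person>
  <nick>example</nick>
  <knows><Person><nick>friend1</nick></Person></knows>
  <knows><Person><nick>friend2</nick></Person></knows>
</Person>"""

tree = element_to_dict(ET.fromstring(SAMPLE))

# Walk the nested structure: knows -> Person -> nick
friends = [
    knows["children"][0]["children"][0]["text"]
    for knows in tree["children"]
    if knows["tag"] == "knows"
]
print(friends)  # ['friend1', 'friend2']
```

The appeal of this shape is that once the parse is done, everything is plain nested lists and dictionaries, with no model API to learn.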

RAP, which parses into RDF models (source, plus RAP - the parser itself isn't actually included here):

Requests per second: 1.35 [#/sec] (mean)
Time per request: 739.82 [ms] (mean)
Time per request: 739.82 [ms] (mean, across all concurrent requests)

(Full Stats)
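If you want to see how those numbers relate to each other, here's a minimal sketch in Python. It is not ab itself, just a sequential timing loop that reports the same two means, with a throwaway stand-in task where the real test fetched a page. The function name benchmark is made up for illustration. The two "Time per request" lines matching in each run is what you'd expect from ab at a concurrency of 1, which I'm assuming was the case here.

```python
import time

def benchmark(task, requests=50):
    """Run task() sequentially `requests` times and report ab-style means."""
    start = time.perf_counter()
    for _ in range(requests):
        task()
    elapsed = time.perf_counter() - start
    return {
        "requests_per_second": requests / elapsed,
        "ms_per_request": elapsed / requests * 1000.0,
    }

# Stand-in workload; the real runs fetched a PHP page over HTTP.
stats = benchmark(lambda: sum(range(1000)))
print(stats)

# Mean times per request reported for the two parsers, in milliseconds.
joel_ms = 88.92
rap_ms = 739.82

# Requests per second is just 1000 ms divided by the mean time per request.
print(round(1000 / joel_ms, 2))   # 11.25
print(round(1000 / rap_ms, 2))    # 1.35

# Ratio of the two mean times.
print(round(rap_ms / joel_ms, 1))  # 8.3
```

So the two figures in each run are really the same measurement expressed two ways, and the gap between the parsers works out to a bit over eight times.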

So, we've got a parser that, to a guy like me, seems simpler to use (advanced data structures are part of the limited experience I did get from LiveJournal), is lightweight (one file, as opposed to 256 in RAP), and is roughly eight times faster in this test (88.92 ms per request versus 739.82 ms), which is close to an order of magnitude.

That, to me, sounds like a winner. Props to Joel for his great work. His next step is to implement OWL capabilities into RDF parsing, and that's going to kick even more ass. As Eric said at one point about this: "Be still my beating heart."