caedes' Notes

2015-03-23

Comparing Performance: stb_image vs libjpeg(-turbo), libpng and lodepng

Filed under: Linux,Programming — caedes @ 02:06

I recently tried out Sean Barrett’s stb_image and was blown away by how fucking easy it is to use.
Integrating it into your project is trivial: Just add the header and somewhere do:

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"

That’s all. (If you wanna use it in multiple files you just #include "stb_image.h" there without the #define.)

And the API is trivial too:

int width, height, bytesPerPixel;
unsigned char* pixeldata = stbi_load("bla.jpg", &width, &height, &bytesPerPixel, 0);
// if you have already read the image file data into a buffer:
unsigned char* pixeldata2 = stbi_load_from_memory(bufferWithImageData, bufferLength, &width, &height, &bytesPerPixel, 0);
if(pixeldata2 == NULL)
    printf("Some error happened: %s\n", stbi_failure_reason());
// once you're done with the pixels, release them with stbi_image_free()
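
What you get back is documented by stb_image as height scanlines of width pixels, starting with the top-left pixel, with the channels of each pixel interleaved as 8-bit values. With 0 as the last argument the buffer has as many channels as the file, which is what bytesPerPixel reports. Here is a minimal sketch of addressing a single channel. It uses a hand-written 2x2 RGB buffer instead of a real stbi_load() result so it compiles without stb_image.h, and pixel_channel and demo_pixels are hypothetical names, not part of stb_image:

```c
#include <stddef.h>

/* Hypothetical helper (not part of stb_image): returns channel c of the pixel
 * at (x, y) in a tightly packed, row-major buffer with interleaved channels. */
unsigned char pixel_channel(const unsigned char *pixels, int width,
                            int bytesPerPixel, int x, int y, int c)
{
    size_t index = ((size_t)y * (size_t)width + (size_t)x)
                 * (size_t)bytesPerPixel + (size_t)c;
    return pixels[index];
}

/* Stand-in for decoded data: a 2x2 RGB image (3 bytes per pixel). */
const unsigned char demo_pixels[2 * 2 * 3] = {
    255, 0, 0,     0, 255, 0,     /* top row:    red,  green     */
    0, 0, 255,     10, 20, 30     /* bottom row: blue, dark grey */
};
```

With a real image you would pass pixeldata, width and bytesPerPixel from the stbi_load() call above instead of the demo buffer.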

There’s also a simple callback-API which allows you to define some callbacks that stb_image will call to get the data, handy if you’re using some kind of virtual filesystem or want to load the data from .zip files or something.
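
To give an idea of what those callbacks look like: stb_image wants three functions (read, skip, eof) that each take a user pointer. The sketch below implements them for a plain block of memory, which stands in for whatever your virtual filesystem or zip reader provides. MemReader and the mem_* names are made up for this example; only the three function signatures follow stb_image's stbi_io_callbacks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state: a cursor into a block of memory, standing in
 * for whatever a virtual filesystem or zip reader would hand out. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
} MemReader;

/* read: copy up to `size` bytes into `data`, return the number copied. */
int mem_read(void *user, char *data, int size)
{
    MemReader *r = (MemReader *)user;
    size_t left = r->size - r->pos;
    size_t n = (size > 0) ? (size_t)size : 0;
    if (n > left)
        n = left;
    memcpy(data, r->data + r->pos, n);
    r->pos += n;
    return (int)n;
}

/* skip: move the cursor n bytes forward; a negative n moves it back. */
void mem_skip(void *user, int n)
{
    MemReader *r = (MemReader *)user;
    if (n < 0) {
        size_t back = (size_t)(-(long long)n);
        r->pos = (back > r->pos) ? 0 : r->pos - back;
    } else {
        size_t left = r->size - r->pos;
        r->pos += ((size_t)n > left) ? left : (size_t)n;
    }
}

/* eof: nonzero once all bytes have been consumed. */
int mem_eof(void *user)
{
    const MemReader *r = (const MemReader *)user;
    return r->pos >= r->size;
}
```

With stb_image.h included, these would be wired up as stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof }; followed by stbi_load_from_callbacks(&cb, &reader, &width, &height, &bytesPerPixel, 0), where reader is a MemReader you filled in yourself.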
And it supports lots of common image file types including JPEG, PNG, TGA, BMP, GIF and PSD.

So I wondered if there are any downsides regarding speed.
In short: (On my machine) it’s faster than libjpeg, a bit slower than libjpeg-turbo, twice as fast as lodepng (another one-file-png decoder which also has a nice API) and a bit slower than libpng. For smaller images stb_image’s performance is even closer to libpng/libjpeg-turbo. GCC produces faster code than Clang. All in all I find the performance acceptable and will use stb_image more in the future (my first “victim” was Yamagi Quake II).

Average times in milliseconds for decoding a 4000x3000 pixel image, for GCC and clang at different optimization levels:

JPEG

libjpeg, libjpeg-turbo

I used libjpeg binaries from distributions, so compilers and optimization flags on my end didn’t make a difference.

  • Debian Wheezy’s libjpeg8 8d1-deb7u1, no turbo: 130ms
  • Ubuntu 14.04’s libjpeg-turbo8 1.3.0-0ubuntu2: 69ms

stb_image 2.02, using SSE intrinsics

  • clang -O0: 436ms
  • gcc -O0: 402ms
  • clang -O1: 179ms
  • gcc -O1: 97ms
  • clang -O2: 151ms
  • gcc -O2: 93ms
  • clang -O3: 150ms
  • gcc -O3: 85ms
  • gcc -O4: 85ms

Results for JPEG decoding

For JPEG, if you use clang, stb_image is a bit slower than libjpeg (and a lot slower than libjpeg-turbo). If you use GCC (and at least -O1), the performance is between libjpeg and libjpeg-turbo.
Using optimization (at -O1 or more) yields significantly faster decoders than unoptimized (-O0) code (>4x as fast for GCC, almost 3x as fast for clang).

This also shows that GCC seems to optimize this much better than Clang.

So stb_image has competitive performance for loading jpegs.

Update: Test with a smaller image

I also did some tests with a 512x512pixel jpg image:

  • libjpeg-turbo: 3.21ms
  • stb clang -O0: 14.92ms
  • stb gcc -O0: 14.24ms
  • stb clang -O2: 5.19ms
  • stb gcc -O2: 3.72ms
  • stb gcc -O4: 3.33ms

libjpeg-turbo is still faster, but stb_image only takes about 16% (-O2) or 4% (-O4) longer, so it’s much closer than with the big image.
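
For reference, the "takes X% longer" figures in this post are just the ratio of two timings minus one. A tiny sketch of that arithmetic (percent_longer is a name made up for this example):

```c
/* How much longer `candidate_ms` takes than `baseline_ms`, in percent.
 * Hypothetical helper, only here to spell out the arithmetic. */
double percent_longer(double candidate_ms, double baseline_ms)
{
    return (candidate_ms / baseline_ms - 1.0) * 100.0;
}
```

Feeding it the 512x512 numbers gives roughly 15.9% for gcc -O2 (3.72ms vs 3.21ms) and roughly 3.7% for gcc -O4 (3.33ms vs 3.21ms). For the 4000x3000 image the same compilers come out at about 35% (93ms vs 69ms) and about 23% (85ms vs 69ms), which is where "much closer" comes from.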

PNG

I converted the 4000x3000pixel JPEG used above to PNG with compressionlevel 9, using Gimp.
The PNG is pretty big, about 16MB.

libpng 1.2

I used Ubuntu 14.04’s libpng12 (1.2.50-1ubuntu2), so again the compiler and optimization flags didn’t matter.

  • libpng12: 293ms

stb_image 2.02

  • clang -O0: 905ms
  • gcc -O0: 923ms
  • clang -O1: 455ms
  • gcc -O1: 457ms
  • clang -O2: 432ms
  • gcc -O2: 408ms
  • clang -O3: 424ms
  • gcc -O3: 394ms
  • gcc -O4: 393ms

lodepng version 20150321

  • clang -O0: 1902ms
  • gcc -O0: 1862ms
  • clang -O1: 862ms
  • gcc -O1: 814ms
  • clang -O2: 698ms
  • gcc -O2: 680ms
  • clang -O3: 676ms
  • gcc -O3: 587ms
  • gcc -O4: 581ms

Results for PNG decoding:

  • stb_image is a lot faster than lodepng, with and without compiler optimization.
  • gcc produces faster code than clang, but the difference is smaller than in the JPEG case
  • stb_image/lodepng decoders built with -O1 are more than twice as fast as ones built without optimization (-O0)
  • libpng is fastest, optimized stb_image takes about 33-40% longer, optimized lodepng takes about 100-130% longer
  • See below: For smaller images stb_image’s performance is much closer to libpng.

So, I think stb_image’s png decoding speed is still acceptable. However, png in general is kinda slow and should probably not be used for games if you have lots of (big) textures.
If you have the same picture as JPG and PNG (as I had in my tests), decoding from JPG (with stb_image) is more than 4x as fast as decoding from PNG (also stb_image; similar for libjpeg-turbo vs libpng).

If you need lossless quality and/or alpha, maybe just try zipping TGAs (stb_image can read TGA too!), if not JPG might be a good option – or even DDS or a similar format that is compressed and can be directly uploaded to the GPU (can be used with and without alpha channel). With DDS or similar you’ll get additional performance gains because the GPU/GPU driver don’t have to convert them to some internal format and don’t have to generate mipmaps (they’re included).
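
To put a number on "mipmaps included": a full mip chain halves the image until it reaches 1x1, so the number of levels only depends on the larger dimension. A small sketch of that count, assuming the usual convention of rounding down when halving (mip_level_count is a hypothetical helper; whether a given DDS file actually stores the full chain depends on the tool that wrote it):

```c
/* Number of mip levels in a full chain down to 1x1, halving (and rounding
 * down) the larger dimension at each step. Hypothetical helper. */
int mip_level_count(int width, int height)
{
    int largest = (width > height) ? width : height;
    int levels = 0;
    while (largest > 0) {
        ++levels;
        largest /= 2;
    }
    return levels;
}
```

For the 4000x3000 test image that is 12 levels, for a 512x512 texture it is 10. Those are the levels the driver would otherwise have to generate after every PNG or JPG load.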
Rich Geldreich’s Crunch might be of interest: https://code.google.com/p/crunch/

Anyway, if PNGs work for you, using stb_image instead of libpng is feasible and might simplify both your code and your build process.

Update: Test with smaller images

I did some additional tests with a 512x512pixel png that has an alpha-channel, which is probably closer to game development requirements. Because it’s much faster to decode this, I ran 300 decode iterations instead of 100.
Furthermore, I only tested this with -O0 and -O2, which should be most relevant in practice (for debug and release builds).

  • libpng: 6.07ms
  • stb gcc -O0: 19.17ms
  • stb clang -O0: 19.96ms
  • stb gcc -O2: 6.53ms
  • stb clang -O2: 7.00ms
  • stb gcc -O4: 6.22ms
  • lodepng gcc -O0: 31.25ms
  • lodepng gcc -O2: 10.81ms

So while for the huge 24bit RGB png stb_image took about 33-40% longer to decode than libpng, for the small 32bit RGBA png it was less than 10% longer with GCC (about 15% with clang -O2).

And some more for a 512×512 RGB picture without alpha-channel:

  • libpng: 5.00ms
  • stb gcc -O2: 4.99ms
  • stb gcc -O4: 4.69ms

In this case stb_image is even a bit faster than libpng!

How I tested:

I wrote a hacky test program that loads an image file into a buffer, measures how long it takes to decode that buffer 100 times with the tested codec, and divides the result by 100; see imgLoadBench.c.
I ran that 3 times in a row and used the best result.
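
The measurement described above boils down to something like the following sketch. This is not the actual imgLoadBench.c: decode_fn, dummy_decode and both timing functions are names made up for this example, and it uses the standard clock() function (CPU time, which is close to wall-clock time for a single-threaded decode) as an assumption about how to time things.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical decoder signature for this sketch: returns nonzero on success.
 * A real wrapper would call e.g. stbi_load_from_memory() and free the result. */
typedef int (*decode_fn)(const unsigned char *data, size_t len);

/* Stand-in "decoder" so the sketch compiles on its own: just sums the bytes. */
int dummy_decode(const unsigned char *data, size_t len)
{
    volatile unsigned long sum = 0; /* volatile keeps the loop from being dropped */
    for (size_t i = 0; i < len; ++i)
        sum += data[i];
    return 1;
}

/* Average milliseconds per decode over `iterations` calls, -1.0 on failure. */
double avg_decode_ms(decode_fn decode, const unsigned char *data, size_t len,
                     int iterations)
{
    if (iterations <= 0)
        return -1.0;
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i) {
        if (!decode(data, len))
            return -1.0;
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* Best (lowest) average over `runs` repetitions, -1.0 on failure. */
double best_avg_decode_ms(decode_fn decode, const unsigned char *data,
                          size_t len, int iterations, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        double ms = avg_decode_ms(decode, data, len, iterations);
        if (ms < 0.0)
            return -1.0;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Calling best_avg_decode_ms(my_decoder, buffer, bufferLength, 100, 3) would then correspond to "100 iterations, best of 3 runs". Taking the best run rather than the mean is a cheap way to filter out noise from other stuff running on the machine.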

I used a random 4000x3000pixel JPG (about 2.6MB) image taken with a digital camera.
For the png tests I converted it to png (about 16MB) with Gimp, using highest compression level (9).
(I also tried compression level 1 – encoding with that is faster and the resulting file is slightly bigger, but decoding actually takes longer.)

I used clang 3.6 1:3.6.1~svn232753-1~exp1 from http://llvm.org/apt/trusty/ llvm-toolchain-trusty-3.6/main and Ubuntu 14.04’s gcc 4.8 4.8.2-19ubuntu1.
Tests were executed on an Intel Haswell i7-4771 system running Linux Mint 17.1 x86_64 with Kernel 3.16.0-29-lowlatency #39-Ubuntu SMP PREEMPT.

Yeah, all this is not highly scientific, but should give a rough idea of the performance of stb_image and lodepng compared to the “normal” libjpeg, libjpeg-turbo and libpng.

stb_image: Sean Barrett’s stb_ libs on Github
lodepng: Lode Vandevenne’s LodePNG
libjpeg-turbo: Project Homepage
libpng: Project Homepage
RBDoom3BFG: I stole the code to use libpng and libjpeg for comparison there

imgLoadBench.c: My crappy test program

Cheers,
Daniel


3 Comments »

  1. A comparison without WebP and BC7 is poor. For game developers, PNG is history with these 2 formats.

    Comment by SmallStepForMan — 2015-05-12 @ 09:37 | Reply

    • The main topic of this post was evaluating stb_image’s performance and stb_image supports neither WebP nor BC7.

      I don’t see what WebP should be good for here, as it’s not natively supported by GPUs and quite uncommon in general.
      I mentioned DDS and crunch as a side note; BC7/BPTC fits into the same category, so I guess it’s a good thing it’s mentioned in the comments now.
      While we’re at it, ASTC http://en.wikipedia.org/wiki/Adaptive_Scalable_Texture_Compression also looks interesting, but it seems like it’s not supported by desktop GPU drivers

      Comment by caedes — 2015-05-18 @ 18:19 | Reply

  2. […] However, compared to libpng and LodePNG both don’t compress very well – the resulting images are 29%-78% bigger. LodePNG on the other hand produces almost as good results as libpng (only 5%-8% bigger) and is significantly easier to use – integrating it in your project is easy (just drop the source and the header file to your project) and using it is about as easy as stbi_write_png(). (For image loading however I found stb_image much better than LodePNG, as it loads PNGs much faster, see my other article) […]

    Pingback by Comparing png compression ratios of stb_image_write, LodePNG, miniz and libpng | caedes' Notes — 2015-07-18 @ 04:29 | Reply

