Compressed KTX load is extremely slow.
I first asked this question at StackOverflow.com but got no answer there.
The problem is that I tried to switch from DDS to KTX compressed textures (KTX with ETC1/ETC2). I used the Khronos libktx to load the files. The difference versus DDS (using the NVIDIA DDS loader) is huge!
On Twitter I asked Khronos about this issue and they advised me to check whether my GPU supports KTX decompression, about which I really have no idea (I am using OpenGL 4.3 on an NVIDIA Quadro 4000). Does anyone here have a clue where my problem is? Or maybe KTX is indeed that slow! I even stripped the needless branching out of the libktx code, but that hasn't solved the problem. To me it seems to be somewhere at the driver level.
I don't hang out at StackOverflow.
The first question is "are you comparing apples with apples?" That is, are the images in the KTX and DDS files compressed the same way? I suspect the answer is no, and that you are comparing a DDS file containing images compressed with one of the DXTC variants against a KTX file containing images compressed with ETC. I have been told by the NVIDIA OpenGL driver team that the Quadro 4000 does not support ETC in hardware, while it does support DXTC. This means the ETC-compressed images will be decompressed by the OpenGL driver in software and then loaded into GPU memory, while the DXTC-compressed images will simply be loaded into GPU memory as they are. I believe that is the source of the performance difference you are observing.
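One way to check the "apples with apples" question is to look at the glInternalFormat field in the KTX header and see which compression family it belongs to. The following is only a minimal sketch, assuming a KTX 1.1 file already read into memory. It does not use libktx, and the helper names (classify_internal_format, ktx_internal_format) are made up for illustration. The numeric ranges are the token values from the OpenGL registry for the S3TC (DXT), ETC1 and ETC2/EAC formats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compression families relevant to the DDS vs KTX comparison. */
typedef enum { FAMILY_UNKNOWN, FAMILY_S3TC, FAMILY_ETC } tex_family;

/* 12-byte file identifier from the KTX 1.1 file format specification. */
static const uint8_t KTX_ID[12] = {
    0xAB, 0x4B, 0x54, 0x58, 0x20, 0x31, 0x31, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A
};

/* Hypothetical helper: map a GL internal format token to a family. */
static tex_family classify_internal_format(uint32_t fmt)
{
    if (fmt >= 0x83F0 && fmt <= 0x83F3) return FAMILY_S3TC; /* DXT1/3/5 */
    if (fmt >= 0x8C4C && fmt <= 0x8C4F) return FAMILY_S3TC; /* sRGB DXT1/3/5 */
    if (fmt == 0x8D64) return FAMILY_ETC;                   /* GL_ETC1_RGB8_OES */
    if (fmt >= 0x9270 && fmt <= 0x9279) return FAMILY_ETC;  /* ETC2 and EAC */
    return FAMILY_UNKNOWN;
}

/* Hypothetical helper: read glInternalFormat from a KTX 1.1 header held
 * in memory. Returns 0 if the buffer is not a valid KTX 1.1 header. */
static uint32_t ktx_internal_format(const uint8_t *buf, size_t len)
{
    uint32_t endianness, fmt;

    assert(buf != NULL);
    if (len < 64 || memcmp(buf, KTX_ID, 12) != 0)
        return 0;

    memcpy(&endianness, buf + 12, 4); /* endianness marker at offset 12 */
    memcpy(&fmt, buf + 28, 4);        /* glInternalFormat at offset 28 */

    if (endianness == 0x01020304u) {
        /* File was written with the opposite byte order: swap. */
        fmt = (fmt >> 24) | ((fmt >> 8) & 0x0000FF00u) |
              ((fmt << 8) & 0x00FF0000u) | (fmt << 24);
    } else if (endianness != 0x04030201u) {
        return 0;
    }
    return fmt;
}
```

If this reports FAMILY_ETC for the KTX file while the DDS file holds DXT data, the two loads are not doing the same work on hardware without ETC support.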
To truly compare the performance of the DDS and KTX file formats, you should create a KTX file containing DXTC-compressed images or a DDS file containing ETC-compressed images. I do not know if the latter is possible. Unfortunately, toktx does not support converting DDS to KTX at this time. However, the source is available, and the underlying ktxWriteKTXF function in libktx accepts data in any format known to OpenGL, so it would not be difficult to add the feature.
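To give an idea of what a KTX file containing DXTC data looks like, here is a rough sketch that lays out a single-level, single-face KTX 1.1 file with a DXT1 payload in memory. It deliberately avoids libktx, so it does not show how ktxWriteKTXF itself is called. The function names (dxt1_image_size, build_ktx_dxt1) are hypothetical, and the DXT1 blocks are assumed to have already been extracted from the DDS file by some other means.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* DXT1 stores each 4x4 texel block in 8 bytes. */
static uint32_t dxt1_image_size(uint32_t w, uint32_t h)
{
    return ((w + 3) / 4) * ((h + 3) / 4) * 8;
}

/* Hypothetical helper: build a one-level KTX 1.1 file around DXT1 blocks.
 * Returns a malloc'ed buffer (caller frees) or NULL on allocation failure. */
static uint8_t *build_ktx_dxt1(uint32_t w, uint32_t h,
                               const uint8_t *blocks, size_t *out_len)
{
    static const uint8_t id[12] = {
        0xAB, 0x4B, 0x54, 0x58, 0x20, 0x31, 0x31, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A
    };
    uint32_t image_size = dxt1_image_size(w, h);
    uint32_t fields[13] = {
        0x04030201u, /* endianness marker, written in native byte order */
        0,           /* glType: 0 for compressed data */
        1,           /* glTypeSize: 1 for compressed data */
        0,           /* glFormat: 0 for compressed data */
        0x83F0u,     /* glInternalFormat: GL_COMPRESSED_RGB_S3TC_DXT1_EXT */
        0x1907u,     /* glBaseInternalFormat: GL_RGB */
        w,           /* pixelWidth */
        h,           /* pixelHeight */
        0,           /* pixelDepth: 0 for a 2D texture */
        0,           /* numberOfArrayElements: 0, not an array texture */
        1,           /* numberOfFaces */
        1,           /* numberOfMipmapLevels */
        0            /* bytesOfKeyValueData: no metadata */
    };
    size_t total = 12 + sizeof fields + 4 + image_size;
    uint8_t *out;

    assert(blocks != NULL && out_len != NULL);
    out = (uint8_t *)malloc(total);
    if (out == NULL)
        return NULL;

    memcpy(out, id, 12);                     /* identifier, bytes 0..11 */
    memcpy(out + 12, fields, sizeof fields); /* header fields, bytes 12..63 */
    memcpy(out + 64, &image_size, 4);        /* imageSize for level 0 */
    memcpy(out + 68, blocks, image_size);    /* DXT1 blocks, already 4-byte aligned */
    *out_len = total;
    return out;
}
```

Loading a file like this and the original DDS through their respective loaders should then exercise the same hardware path, so any remaining difference would come from the loaders themselves.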
In the interest of making ETC ubiquitous, the working groups made a conscious decision to let older hardware provide support through software decompression, with the understanding that it could lead to poor comparisons like this.
Where is the needless branching in the libktx code that you stripped off?
First, thanks for the clarification that it is actually done in the driver for my card. Is there a list of GPUs that actually support it at the hardware level? Do Grid or Tesla cards support it? Do Kepler hardware cards support KTX? As for the "needless branching": I removed all sorts of safety-check conditions, since in my case I was looking to optimize the loading process. For example, I constrained the mipmap levels to just one, as in my scenario I need no mipmapping.
According to my sources at NVIDIA the only NVIDIA products with ETC2 decoding in hardware are the Logan family products. Those also have ASTC decoding in hardware.
Sorry it took me so long to track down the information.