Do ATI's GPUs or the iPad's GPU require 32-bit alignment, or might something else be the problem?
Most cards are heavily optimized for floating-point vertex data and may not support short, half-float, and similar formats in hardware. If a format is unsupported, the driver would be forced to do the vertex shading in software.
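If alignment does turn out to be the issue, one conservative thing to try is padding every attribute so it starts on a 4-byte boundary and keeping the stride a multiple of 4. The sketch below only computes such a layout. The helper names (`align_up`, `padded_layout`) and the example attributes are made up for illustration, and whether a given ATI or iPad GPU actually needs this is an assumption to verify by profiling.

```python
def align_up(offset, alignment=4):
    """Round offset up to the next multiple of alignment (in bytes)."""
    return (offset + alignment - 1) // alignment * alignment


def padded_layout(attributes, alignment=4):
    """Compute byte offsets and stride for an interleaved vertex.

    attributes: list of (name, bytes_per_component, component_count).
    Every attribute starts on an `alignment`-byte boundary, and the
    returned stride is also a multiple of `alignment`.
    """
    offsets = {}
    cursor = 0
    for name, component_size, count in attributes:
        offsets[name] = cursor
        # Advance past this attribute, then pad up to the boundary.
        cursor = align_up(cursor + component_size * count, alignment)
    return offsets, cursor


# Hypothetical compact vertex: 3 shorts, 3 signed bytes, 2 half floats.
attributes = [
    ("position", 2, 3),  # 6 bytes, padded to 8
    ("normal", 1, 3),    # 3 bytes, padded to 4
    ("texcoord", 2, 2),  # 4 bytes, already aligned
]
offsets, stride = padded_layout(attributes)
print(offsets, stride)
```

The resulting offsets and stride are what you would pass as the pointer offset and stride arguments when describing the interleaved buffer. If the slowdown persists with an aligned layout, switching the attributes back to plain floats is a quick way to test whether the format itself is the problem.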