On Fri, Nov 4, 2011 at 7:23 PM, John Tamplin <email@example.com>
On Fri, Nov 4, 2011 at 6:22 PM, Glenn Maynard <firstname.lastname@example.org>
No, because decoding a non-Unicode encoding requires table lookups, and UTF-8 requires multiple branches per byte. strnlen can be optimized to less than one branch per byte (typically; glibc's optimization is somewhat more complex than that), and it needs no memory access aside from linear access to the string itself.
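To make "less than one branch per byte" concrete, here is a minimal sketch of a word-at-a-time scan in C. It is not glibc's code: `word_strnlen` and `has_zero_byte` are hypothetical names, it assumes 64-bit words, and it assumes the caller guarantees `maxlen` readable bytes (real implementations handle alignment and reading near the end of the string more cleverly). The point is that one zero-byte test covers 8 bytes at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of v is 0x00 (classic bit trick). */
static int has_zero_byte(uint64_t v) {
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical word-at-a-time strnlen.
   Assumption: buf has at least maxlen readable bytes. */
size_t word_strnlen(const char *buf, size_t maxlen) {
    size_t i = 0;
    /* Fast path: a couple of branches per 8-byte word. */
    while (i + 8 <= maxlen) {
        uint64_t w;
        memcpy(&w, buf + i, 8);   /* portable unaligned load */
        if (has_zero_byte(w))
            break;                /* terminator is somewhere in this word */
        i += 8;
    }
    /* Slow path: pin down the zero byte, or scan the short tail. */
    while (i < maxlen && buf[i] != '\0')
        i++;
    return i;
}
```

No decoding happens anywhere: the scan only asks "is any byte zero?", which is why it can stay so cheap.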
The way I usually write it is one switch on the byte ANDed with 0xF8, so there is one branch (aside from validation checks, which you still have to do when computing the length). Single-byte encodings can obviously be done in constant time, but I wouldn't expect them to be used much anyway.
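A rough sketch of the "switch on byte & 0xF8" approach for counting UTF-8 code points, assuming a length-delimited buffer. `utf8_count` is a hypothetical name, and the validation is deliberately minimal (lead bytes, truncation, continuation bytes only; no overlong or surrogate checks). Note the loop still needs its own exit test and per-continuation-byte checks on top of the switch.

```c
#include <stddef.h>

/* Hypothetical: count code points in s[0..n), returning -1 on malformed
   input. Simplified validation: overlong forms and surrogates are not
   rejected. */
long utf8_count(const unsigned char *s, size_t n) {
    size_t i = 0;
    long count = 0;
    while (i < n) {                 /* exit branch */
        size_t len;
        switch (s[i] & 0xF8) {      /* one dispatch per lead byte */
        case 0xC0: case 0xC8: case 0xD0: case 0xD8:
            len = 2; break;         /* 110xxxxx */
        case 0xE0: case 0xE8:
            len = 3; break;         /* 1110xxxx */
        case 0xF0:
            len = 4; break;         /* 11110xxx */
        case 0x80: case 0x88: case 0x90: case 0x98:
        case 0xA0: case 0xA8: case 0xB0: case 0xB8:
        case 0xF8:
            return -1;              /* stray continuation byte or invalid lead */
        default:
            len = 1; break;         /* remaining cases are 0xxxxxxx (ASCII) */
        }
        if (len > n - i)
            return -1;              /* truncated sequence */
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;          /* expected 10xxxxxx */
        i += len;
        count++;
    }
    return count;
}
```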
You need an exit branch, too. You don't need any of that, or validation, or codepoint reconstruction, just to find the first null terminator.
Anyhow, the point was that strnlen is extremely fast. strnlen() processes about 2GB/sec on my system on data in cache. 16-bit equivalents for UTF-16 are also fast. I don't think having a separate call to find the null terminator is an actual performance issue.
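For illustration, a minimal 16-bit analogue of strnlen, assuming UTF-16 stored as `uint16_t` code units; `strnlen16` is a hypothetical name, not a standard function. No surrogate handling is needed because a 0x0000 unit can never occur inside a surrogate pair (surrogates are 0xD800 to 0xDFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of UTF-16 code units before the first 0x0000,
   examining at most maxlen units. Counts code units, not code points. */
size_t strnlen16(const uint16_t *s, size_t maxlen) {
    size_t i = 0;
    while (i < maxlen && s[i] != 0)
        i++;
    return i;
}
```

This byte-by-byte (here, unit-by-unit) form is the simplest version; the same word-at-a-time trick used for 8-bit strings can be adapted to 16-bit units.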