Processing of tails is a large and very common part of vectorized (tiled) operations. To make that problem easier to solve, it would be good to introduce a new feature: an auto shuffle with scatter/gather for memory access, and either permit unaligned access to memory directly or perform it through the auto shuffle.

float8 r1;
int t = ...; // index calculated at run time

1. auto shuffle
r1.s(0,3,4,t,ZERO,ONE,MINUSONE,NONE) = *(float8*)y;

2. auto shuffle over a const vector
r1.s({0,3,4,t,ZERO,ONE,MINUSONE,NONE}) = *(float8*)y;

3. auto shuffle over a variable-indexed array of vectors
const int8 v[3] = { {0,1,2,3,4,5,6,7},                  // no tail
                    {0,1,2,3,4,5,ZERO,ZERO},            // simple tail
                    {0,3,4,1,ZERO,ONE,MINUSONE,NONE} }; // complex tail
for(....)
    // choose k = 0..2
    r1.s(v[k]) = *(float8*)y;

where
a constant number n selects element y.sn (y.s0, y.s1, ...)
const ZERO means the element is set to 0, and nothing is read from memory for this element
const ONE means the element is set to 1, and nothing is read from memory for this element
const MINUSONE means the element is set to -1, and nothing is read from memory for this element
const NONE means the element is left unchanged, and nothing is read from or written to memory for this element
an index can also be held in a variable (see t)
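
To make the intended semantics concrete, here is a rough scalar emulation in plain C of what r1.s(...) = *(float8*)y would do lane by lane. It is only a sketch under assumptions: the function name shuffle_load8 is hypothetical, and the negative values used for ZERO, ONE, MINUSONE and NONE are an arbitrary encoding, since the proposal does not say how these constants would be represented.

```c
#include <assert.h>

/* Assumed encoding of the special constants; not defined by the proposal. */
enum { ZERO = -1, ONE = -2, MINUSONE = -3, NONE = -4 };

/* Hypothetical scalar emulation of: r.s(idx) = *(float8*)y; */
static void shuffle_load8(float r[8], const float *y, const int idx[8])
{
    for (int i = 0; i < 8; ++i) {
        switch (idx[i]) {
        case ZERO:     r[i] = 0.0f;  break;  /* constant, memory is not read */
        case ONE:      r[i] = 1.0f;  break;  /* constant, memory is not read */
        case MINUSONE: r[i] = -1.0f; break;  /* constant, memory is not read */
        case NONE:     break;                /* lane unchanged, memory untouched */
        default:
            /* plain index n selects y.sn; it must address one of the 8 lanes */
            assert(idx[i] >= 0 && idx[i] < 8);
            r[i] = y[idx[i]];
            break;
        }
    }
}
```

With y = {10,11,...,17}, r1 initially all 9, t = 1 and the index list {0,3,4,t,ZERO,ONE,MINUSONE,NONE}, this emulation leaves r1 as {10,13,14,11,0,1,-1,9}. For a tail, only the lanes with plain indices touch memory, so y[6] and y[7] would never be read with the "simple tail" pattern.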