* added bitwise operation based sqrt16
- replacement for fastled, it is about 10% slower for numbers smaller 128 but faster for larger numbers. speed difference is irrelevant to WLED but it saves some flash.
* updated to 32bit, improved for typical WLED use
- making it 32bits allows for larger numbers
- added another initial condition check for medium sized numbers
- increased the "small number" optimization to larger numbers: the function is currently only used to calculate sqrt(x^2+y^2) which even for small segments is larger than the initially used 64, so optimizing for 1024 makes more sense, although the value is arbitrarily chosen