What the code above does can be described with the pseudocode:
> for 8-element chunks in x
> load 8-element chunk into 256-bit variable origx
> instantiate 8-element chunk to store results
> calculate numerators and denominators of the 8-element chunk
> save results into the original array
Specifically, _mm256_mul_ps performs vectorized multiplication, _mm256_div_ps performs vectorized division, and _mm256_add_ps performs vectorized addition.
Note that aligned-load AVX intrinsics such as _mm256_load_ps(&x[i]) assume that the address, i.e. x + i, is 32-byte aligned; otherwise the program will typically crash with a segmentation fault.
Similarly, SSE intrinsics such as _mm_load_ps require the address to be 16-byte aligned.
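One way to deal with the alignment requirement is sketched below. There are two common options: allocate the array on a 32-byte boundary so that the aligned load is valid, or use the unaligned variants _mm256_loadu_ps and _mm256_storeu_ps, which accept any address. The helper names (alloc_floats_32, is_aligned_32, double_in_place) are made up for this illustration and are not part of the code above. The sketch assumes a C11 compiler that provides aligned_alloc, GCC or Clang for the target attribute, and an array length that is a multiple of 8.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats on a 32-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 32. */
static float *alloc_floats_32(size_t n) {
    size_t bytes = (n * sizeof(float) + 31) / 32 * 32;
    return aligned_alloc(32, bytes);
}

/* Hypothetical helper: returns 1 if p lies on a 32-byte boundary. */
static int is_aligned_32(const void *p) {
    return ((uintptr_t)p % 32) == 0;
}

/* Hypothetical helper: doubles every element of x in place.
   Uses the unaligned load/store intrinsics, so x may point anywhere.
   Assumes n is a multiple of 8. The target attribute (GCC/Clang only)
   enables AVX for this function even without the -mavx flag. */
__attribute__((target("avx")))
static void double_in_place(float *x, size_t n) {
    for (size_t i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(&x[i]);  /* safe for any address */
        v = _mm256_add_ps(v, v);            /* v + v == 2 * v */
        _mm256_storeu_ps(&x[i], v);
    }
}
```

With a buffer obtained from alloc_floats_32, the pointer x itself satisfies the requirement of _mm256_load_ps, while x + 1 does not (it is only 4 bytes past the boundary) and has to go through _mm256_loadu_ps, as double_in_place does. Remember to release the buffer with free.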