How would the compiler handle multiple loop iterations updating the same location in memory? For example, if the last line were "result += value," would we still be able to parallelize this without extra code?
@c0d3r The assumption/constraint is that each loop iteration must be independent. That means iterations cannot write to shared state like `result`; if they did, the loop could not be naively parallelized without extra synchronization or reduction support.
This example can be compiled to AVX instructions: because the parallel semantics guarantee that iterations are independent, the compiler is free to map them onto vector (AVX) instructions.
I have some thoughts derived from this example. By using parallel primitives like forall, we shift the burden of parallelization from application programmers to parallel-library programmers, since application programmers get the primitives for free.
In this course, are we more focused on programming parallel applications based on the primitives, or more on programming the primitives (e.g. forall) themselves?
When we say vector instructions, my understanding is that we are referring to instructions designed to operate on n-element registers. Is that correct? In addition, is it possible to concatenate n instructions into a vector to get vectorized vector instructions?
@haboric, your idea about vectorized vector instructions is interesting. As far as I know, the instruction set is fixed at design time and requires hardware support; since you cannot modify the circuitry at runtime, it would not be easy to add such instructions.
Are there potential disadvantages to abstraction? While abstraction can cost some performance compared to explicitly managing the processors, it significantly reduces code complexity, which I think outweighs the drawbacks.