Slide 61 of 66

Can someone explain the middle part of the bottom-left figure? Why can an automatically generated schedule without autotuning beat a schedule that was autotuned for 3 days? I don't think it is a coincidence (given that there were 3 days for tuning).


It's a tiny benchmark, so that's measurement noise (the benchmark completes in a very small amount of time). The two data points would be the same if the measurement harness ran the two codes for longer. I cannot recall the specific measurement details, but this slide shows preliminary results that were recorded as the average (or min) of (I think) 5 runs.
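To make the point about noise concrete, here is a minimal sketch of a timing harness in Python. It is not the harness used for the slide (those details aren't given above); `benchmark`, `tiny_kernel`, `runs`, and `inner_iters` are hypothetical names chosen for illustration. The idea is that batching many calls per timed sample makes each measurement long relative to timer and scheduling jitter, and taking the min over samples discards runs that were disturbed by other activity on the machine.

```python
import time


def benchmark(fn, runs=5, inner_iters=1):
    """Time `fn` and return (min, mean) seconds per call.

    Each of the `runs` samples times a batch of `inner_iters` calls,
    so a very short `fn` can be measured over a longer interval.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(inner_iters):
            fn()
        elapsed = time.perf_counter() - start
        samples.append(elapsed / inner_iters)
    return min(samples), sum(samples) / len(samples)


def tiny_kernel():
    # Hypothetical stand-in for a pipeline that finishes very quickly.
    return sum(range(1000))
```

For example, `benchmark(tiny_kernel, runs=5, inner_iters=1)` will typically give numbers that move around from one invocation to the next, while `benchmark(tiny_kernel, runs=5, inner_iters=10000)` should be noticeably more stable, since each sample now covers a much longer interval.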


Is the reason why the auto-scheduled versions of LOCAL_LAPLACIAN and CAMERA are worse than the human versions the same as the reason why the humans were able to beat the auto-scheduler in MAXFILTER on the next slide? (I.e. the auto-scheduler hasn't implemented an optimization that the human has done.)

I'm also curious about what the auto-scheduler did that makes LOCAL_LAPLACIAN run slower on 12 threads than it does on 6 threads.


Is the preprint of this paper available somewhere?


Yes, it's now available here: