Fixed #34 Customizing scipy's oaconvolve#35
Conversation
|
Review these changes at https://app.gitnotebooks.com/stumpy-dev/sliding_dot_product/pull/35 |
The challenger is the customized version of scipy's oaconvolve.
Observations:
For me, the important one is the first bullet point. Of the four optimization opportunities mentioned in this comment, I've addressed 1, 2, and 3 in this PR. The last item, which is about adjusting the number of multiplication for real-valued arrays, can be explored next. |
|
As a gentle reminder, even if we can do things faster, we will never (??) remove the public |
|
Good reminder. It makes sense!! |
|
|
||
|
|
||
| def test_oaconvolve_sdp_blocksize(): | ||
| from sdp.challenger_sdp import sliding_dot_product |
There was a problem hiding this comment.
This line needs to be modified if, at a later time, we decide to move the proposal to a new file (module).
@seanlaw |
There was a problem hiding this comment.
@NimaSarajpoor I've left some comments but would still like another pass after you've cleaned things up further
I do agree that, for the most part, things look clean. I think it still lacks clarity as to what is happening or why the logic is coded in this way
| if conv_block_size is None: | ||
| # Find optimal block_size based on m and n | ||
| if m >= n / 2: | ||
| conv_block_size = n # i.e. no blocking |
There was a problem hiding this comment.
Not clear why this condition is needed ?
|
|
||
| def setup(Q, T): | ||
| return | ||
| def _compute_block_size(m, n, conv_block_size=None): |
There was a problem hiding this comment.
According to my exploration, replacing this with scipy's function _calc_oa_lens results in a considerable performance hit. I am passing real=True in next_fast_len(math.ceil(opt_size), real=True) in the function here; however, the scipy's function (internally) uses the default real=False.
There was a problem hiding this comment.
I've checked the performance of the challenger with real=False to see its impact on the performance. As shown in the figure below, challenger (in which real=True is passed to next_fast_len) shows a better performance.
rm -rf sdp/__pycache__
./timing.py -timeout 1.0 -pmin 7 -pmax 20 pyfftw challenger challenger_real_false > timing.csv
rm -rf sdp/__pycache__


This PR is to address #34.