Fix: Make concurrent CSV Read function really thread safe#6694
Fix: Make concurrent CSV Read function really thread safe#6694FSchumacher wants to merge 1 commit intoapache:masterfrom
Conversation
| while (true) { | ||
| int current = nextRow.get(); | ||
| int next = (current + 1) % size; |
There was a problem hiding this comment.
WDYT of going with AtomicLong.getAndIncrement and java.lang.Math#floorMod(long, int)?
Then it could work without a loop.
E.g.
private final AtomicLong nextRow = new AtomicLong();
long row = nextRow.getAndIncrement();
int result = (int) Math.floorMod(row, size);There was a problem hiding this comment.
Theoretical we could still run into an overflow and in that case, we would get wrong values ;)
And probably more to the point: the name nextRow is misleading. It should probably be nextRowToReturn or such, as it contains the number, the function should return and the function sets it to the value the next user should read.
Therefore I think the loop is not that badly written.
If we want to take the AtomicLong-road, we should probably start with a value of -1 instead of 0, so that the first row is still read. But in that case we would have to check all uses of the value of nextRow outside of the changed function.
There was a problem hiding this comment.
Math.floorMod resolves the overflow, so I don't understand what you mean by "wrong values"
There was a problem hiding this comment.
floorMod should work well within the range of long. But when our long wraps around, the calculation will be wrong.
It might be clearer, when we use a short instead of a long.
There we have (x % 255) % FILE_SIZE as our current value. Further assume that FILE_SIZE is 200.
Then for x == 254 we get 54 and for x == 255 we get 0 (wrap around of short).
The same will happen (really unlikely, as a long can get quite big) with an int or a long. (That might will happen with a long after 286.967.718 years, when we read about 1000 rows a second (for that complete period) :) ) (and that will be one wrong range, that we skip, in the above short example 55..255)
There was a problem hiding this comment.
Please clarify what exactly would go wrong.
0 is a valid value, isn't it?
There was a problem hiding this comment.
Nothing wrong with the zero per se, but it should have been the 55 in the example I gave.
And we still would have the other problem, that we would have to either decrement the value before returning or start with -1 and look at other uses of nextRow, which doesn't feel right to me, either.
| nextRow = 0; | ||
| int size = fileData.size(); | ||
| if (size <= 0) { | ||
| throw new IllegalStateException("CSV file has no data rows"); |
There was a problem hiding this comment.
Please include the file name, otherwise it would be hard to troubleshoot. An alternative option would be returning 0 and making sure the caller would throw an appropriate error with the file name of the problematic file.
Current implementation could hand out the same row to more than one thread at the same time and could loose track when wrapping around. To fix apache#6673 change to a AtomicInteger and a compareAndSet operation to allow thread safe wrap around. To guard against an endless loop in the case of an empty CSV file, we now throw an ISE in such a case. This fixes apache#6673
2db5f65 to
fff7414
Compare
Current implementation could hand out the same row to more than one thread at the same time and could loose track when wrapping around.
To fix #6673 change to a AtomicInteger and a compareAndSet operation to allow thread safe wrap around. To guard against an endless loop in the case of an empty CSV file, we now throw an ISE in such a case.
This fixes #6673
Description
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: