OneSdsc usability: specifying locations (popeye/SDSC) by kwarg, rest caching disabled by default#121
OneSdsc usability: specifying locations (popeye/SDSC) by kwarg, rest caching disabled by default#121grg2rsr wants to merge 4 commits into
Conversation
… use from popeye or from datauser/SDSC
…tantiate an incompatible combination of location and cache_dir kwargs
| # Ensure parquet tables downloaded to separate location to the dataset repo | ||
| kwargs['tables_dir'] = one.params.get_cache_dir() # by default this is user downloads | ||
| kwargs["tables_dir"] = one.params.get_cache_dir() # by default this is user downloads | ||
| if disable_rest_caching: |
There was a problem hiding this comment.
Better to overload the original kwarg rather than adding a new one to avoid parameter conflicts. You could simply add cache_rest=None to the init signature.
There was a problem hiding this comment.
previously, rest caching needed to be disabled explicitly in case of use on popeye because of lack of writing permissions. This extra keyword was intended for a) not interfering with backward compatibility, and b) making common and working default behavior for both SDSC/datauser usecase and popeye, while still giving the user the option to diverge from this default is needed
| raise ValueError("both location and cache dir are specified, and they are incompatible") | ||
| cache_dir = CACHE_DIR_SDSC | ||
|
|
||
| def __init__(self, *args, cache_dir=CACHE_DIR, **kwargs): |
There was a problem hiding this comment.
Looks like you've changed the default to popeye. You should make others aware of this.
Also all previous code will raise a value error because neither cache_dir, nor location are None by default so if they pass in any other cache dir path without passing location=None they're get a value error. Personally I find having two related parameters confusing and a little redundant given that you've already renamed the cache dir variables to clearer names. If this location field is not used for anything else I'd rather remove it
There was a problem hiding this comment.
Thanks for the comment, and indeed you spotted a crucial break of downward compatibility. And I agree with you that the situation with the two parameters is confusing and not ideal. However, this was done on purpose in order to guarantee downward compatibility and not break previous code ... let me explain:
Previously, users would use the cache_dir argument, which also previously defaulted to popeye, see below the code on the main branch, so this is in place. The error checks after were put in, in order to prevent someone in the future to mess it up and use the arguments in an unintended way.
So this entire confusing situation of the two kwargs was in order to keep things running as they are. Ideally we would only have a location argument, but in order to keep downward compatibility we need to keep the cache_dir setting to popeye.
What then happened, is that by introducing too many checks I actually introduced a case in which users can end up with a ValueError, by passing a cache_dir for SDSC and not passing a location argument, which in the past nobody of course did.
The proposed fix would be to default the location to None. This way, old code will work (as people will have passed either passed cache_dir of SDSC, popeye or None (=popeye). New users can ignore the cache_dir argument and simply use the location which will take care of everything.
this is on the main branch
CACHE_DIR = Path('/mnt/sdceph/users/ibl/data')
CACHE_DIR_FI = Path(SDSC_ROOT_PATH)
class OneSdsc(OneAlyx):
def __init__(self, *args, cache_dir=CACHE_DIR, **kwargs):
if not kwargs.get('tables_dir'):
# Ensure parquet tables downloaded to separate location to the dataset repo
kwargs['tables_dir'] = one.params.get_cache_dir() # by default this is user downloads
super().__init__(*args, cache_dir=cache_dir,
|
The original intention was to use this only for Popeye, as the cache directory is read only. Is there a problem using just the conventional ONE on the datauser machine when doing maintenance? |
The
OneSdscclass does, as it is now, not work out of the box for datauser on SDSC because of differences in the default cache dir.This patch:
SDSCvspopeye, and handles the setting of the correct cache dircache_rest=Nonehas to be passed when usingOneSdscon popeye (as users on popeye do not have write permission on the cache directory). Datauser on SDSC does, and rest caching can be enabled by setting a kwarg explicitly if it is required.