Skip to content

OneSdsc usability: specifying locations (popeye/SDSC) by kwarg, rest caching disabled by default#121

Open
grg2rsr wants to merge 4 commits into
developfrom
OneSDSC_locations
Open

OneSdsc usability: specifying locations (popeye/SDSC) by kwarg, rest caching disabled by default#121
grg2rsr wants to merge 4 commits into
developfrom
OneSDSC_locations

Conversation

@grg2rsr
Copy link
Copy Markdown
Contributor

@grg2rsr grg2rsr commented Jan 15, 2026

The OneSdsc class does, as it is now, not work out of the box for datauser on SDSC because of differences in the default cache dir.

This patch:

  • exposes a new keyword argument to select the location SDSC vs popeye, and handles the setting of the correct cache dir
  • disables rest caching by default. Currently, cache_rest=None has to be passed when using OneSdsc on popeye (as users on popeye do not have write permission on the cache directory). Datauser on SDSC does, and rest caching can be enabled by setting a kwarg explicitly if it is required.

@grg2rsr grg2rsr requested a review from k1o0 January 15, 2026 11:10
…tantiate an incompatible combination of location and cache_dir kwargs
Comment thread deploy/iblsdsc.py
# Ensure parquet tables downloaded to separate location to the dataset repo
kwargs['tables_dir'] = one.params.get_cache_dir() # by default this is user downloads
kwargs["tables_dir"] = one.params.get_cache_dir() # by default this is user downloads
if disable_rest_caching:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to overload the original kwarg rather than adding a new one to avoid parameter conflicts. You could simply add cache_rest=None to the init signature.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously, rest caching needed to be disabled explicitly in case of use on popeye because of lack of writing permissions. This extra keyword was intended for a) not interfering with backward compatibility, and b) making common and working default behavior for both SDSC/datauser usecase and popeye, while still giving the user the option to diverge from this default is needed

Comment thread deploy/iblsdsc.py
raise ValueError("both location and cache dir are specified, and they are incompatible")
cache_dir = CACHE_DIR_SDSC

def __init__(self, *args, cache_dir=CACHE_DIR, **kwargs):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you've changed the default to popeye. You should make others aware of this.

Also all previous code will raise a value error because neither cache_dir, nor location are None by default so if they pass in any other cache dir path without passing location=None they're get a value error. Personally I find having two related parameters confusing and a little redundant given that you've already renamed the cache dir variables to clearer names. If this location field is not used for anything else I'd rather remove it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, and indeed you spotted a crucial break of downward compatibility. And I agree with you that the situation with the two parameters is confusing and not ideal. However, this was done on purpose in order to guarantee downward compatibility and not break previous code ... let me explain:

Previously, users would use the cache_dir argument, which also previously defaulted to popeye, see below the code on the main branch, so this is in place. The error checks after were put in, in order to prevent someone in the future to mess it up and use the arguments in an unintended way.

So this entire confusing situation of the two kwargs was in order to keep things running as they are. Ideally we would only have a location argument, but in order to keep downward compatibility we need to keep the cache_dir setting to popeye.

What then happened, is that by introducing too many checks I actually introduced a case in which users can end up with a ValueError, by passing a cache_dir for SDSC and not passing a location argument, which in the past nobody of course did.

The proposed fix would be to default the location to None. This way, old code will work (as people will have passed either passed cache_dir of SDSC, popeye or None (=popeye). New users can ignore the cache_dir argument and simply use the location which will take care of everything.

this is on the main branch

CACHE_DIR = Path('/mnt/sdceph/users/ibl/data')
CACHE_DIR_FI = Path(SDSC_ROOT_PATH)


class OneSdsc(OneAlyx):

    def __init__(self, *args, cache_dir=CACHE_DIR, **kwargs):
        if not kwargs.get('tables_dir'):
            # Ensure parquet tables downloaded to separate location to the dataset repo
            kwargs['tables_dir'] = one.params.get_cache_dir()  # by default this is user downloads
        super().__init__(*args, cache_dir=cache_dir,

@oliche
Copy link
Copy Markdown
Member

oliche commented May 26, 2026

The original intention was to use this only for Popeye, as the cache directory is read only. Is there a problem using just the conventional ONE on the datauser machine when doing maintenance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants