Merged
2 changes: 2 additions & 0 deletions .github/workflows/docs.yml
@@ -11,6 +11,8 @@ jobs:
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.10.20"
- name: Install scalr requirements
run: |
pip install -r requirements.txt
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-pypi.yml
@@ -15,7 +15,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.9"
python-version: "3.10.20"
- name: Install pypa/build
run: >-
python3 -m
4 changes: 2 additions & 2 deletions .github/workflows/run_isort.yml
@@ -10,10 +10,10 @@ jobs:

steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
- name: Set up Python 3.10.20
uses: actions/setup-python@v2
with:
python-version: 3.9
python-version: "3.10.20"
- name: Install isort
run: pip install isort
- name: Run isort
2 changes: 1 addition & 1 deletion .github/workflows/run_pytest.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.9"]
python-version: ["3.10.20"]

steps:
- uses: actions/checkout@v3
7 changes: 3 additions & 4 deletions README.md
@@ -27,10 +27,10 @@
## Pre-requisites and installation scaLR


- ScaLR can be installed using git or pip. It is tested in Python 3.10 and it is recommended to use that environment.
- ScaLR can be installed using git or pip. It is tested in Python 3.10.20 and it is recommended to use that environment.

```
conda create -n scaLR_env python=3.10
conda create -n scaLR_env python=3.10.20

conda activate scaLR_env
```
@@ -374,5 +374,4 @@ Performs evaluation of best model trained on user-defined metrics on the test se

## Citation

Jogani Saiyam, Anand Santosh Pol, Mayur Prajapati, Amit Samal, Kriti Bhatia, Jayendra Parmar, Urvik Patel, Falak Shah, Nisarg Vyas, and Saurabh Gupta. "scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery." bioRxiv (2024): 2024-09.

Jogani, S., Pol, A. S., Prajapati, M., Samal, A., Bhatia, K., Parmar, J., ... & Gupta, S. (2025). scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery. Briefings in Bioinformatics, 26(3), bbaf243.
7 changes: 3 additions & 4 deletions config/config.yaml
@@ -1,7 +1,7 @@
# Config file for pipeline run.

# DEVICE SETUP.
device: 'cuda'
device: 'cpu'

# EXPERIMENT.
experiment:
@@ -16,8 +16,7 @@ data:
num_workers: 1

train_val_test:
full_datapath: '/path/to/anndata.h5ad'

full_datapath: 'path/to/adata.h5ad'
splitter_config:
name: GroupSplitter
params:
@@ -35,7 +34,7 @@ data:
# params:
# **args

target: Cell_Type
target: cell_type


# FEATURE SELECTION.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -6,14 +6,14 @@ build-backend = "hatchling.build"

name = "pyscaLR"
version = "1.1.0"
requires-python = ">=3.10"
requires-python = ">=3.10.20"
authors = [
{ name="Infocusp", email="saurabh@infocusp.com" },
]
description = "scaLR: Single cell analysis using low resource."
readme = "README.md"
classifiers = [
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Operating System :: OS Independent",
"Intended Audience :: Science/Research"
]
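
Review note on this hunk: under standard version-specifier rules, `requires-python = ">=3.10.20"` rejects earlier 3.10 patch releases (for example 3.10.12) as well as 3.9. The sketch below illustrates that comparison; `MIN_PYTHON` and `meets_minimum` are hypothetical names used only for illustration and are not part of scaLR.

```python
import sys

# Hypothetical minimum, mirroring requires-python = ">=3.10.20" from this diff.
MIN_PYTHON = (3, 10, 20)


def meets_minimum(version_info=None, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor, micro) triple is at least `minimum`."""
    if version_info is None:
        # Default to the interpreter running this code.
        version_info = sys.version_info
    # Tuples compare element by element, so the patch level matters too.
    return tuple(version_info[:3]) >= tuple(minimum)
```

If the intent is only "any Python 3.10 or newer", a looser bound such as `>=3.10` (the previous value) would express that; the stricter bound is worth keeping only if a specific patch release is actually required.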
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,4 +1,4 @@
anndata==0.10.9
anndata>=0.11.2,<0.12
isort==5.13.2
loky==3.4.1
memory-profiler==0.61.0
@@ -15,4 +15,4 @@ tensorboard==2.17.0
toml==0.10.2
torch==2.4.1 --index-url https://download.pytorch.org/whl/cu118
tqdm==4.66.5
yapf==0.40.2
yapf==0.40.2
1 change: 0 additions & 1 deletion scalr/analysis/dge_lmem.py
@@ -11,7 +11,6 @@

from anndata import AnnData
from anndata import ImplicitModificationWarning
import anndata as ad
from anndata.experimental import AnnCollection
from joblib import delayed
from joblib import Parallel
7 changes: 3 additions & 4 deletions scalr/analysis/dge_pseudobulk.py
@@ -4,8 +4,7 @@
from os import path
from typing import Optional, Tuple, Union

from anndata import AnnData
import anndata as ad
from anndata import AnnData, concat
from anndata.experimental import AnnCollection
import matplotlib.pyplot as plt
import numpy as np
@@ -94,15 +93,15 @@ def _make_design_matrix(self, adata: AnnData, cell_type: str):
for sum_sample in condition_subset.obs[self.sum_column].unique():
sum_subset = condition_subset[condition_subset.obs[
self.sum_column] == sum_sample]
subdata = ad.AnnData(
subdata = AnnData(
X=sum_subset[:].X.sum(axis=0).reshape(
1, len(sum_subset.var_names)),
var=DataFrame(index=sum_subset.var_names),
obs=DataFrame(index=[f'{sum_sample}_{condition}']))
subdata.obs[self.design_factor_no_undrscr] = [condition]
design_matrix_list.append(subdata)

design_matrix = ad.concat(design_matrix_list)
design_matrix = concat(design_matrix_list)
return design_matrix

def get_differential_expression_results(self, design_matrix: AnnData,
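
Review note on this hunk: the loop in `_make_design_matrix` sums expression over all cells that share a sample and a condition, then stacks one row per pair. A minimal NumPy-only sketch of that aggregation follows. It is not the scaLR implementation: `pseudobulk_sums` is a hypothetical helper, it works on plain arrays rather than `AnnData`, and `np.unique` returns sorted labels whereas the pandas `.unique()` call in the diff keeps order of appearance.

```python
import numpy as np


def pseudobulk_sums(X, samples, conditions):
    """Sum the rows of X for every (sample, condition) pair.

    Returns (labels, matrix), where matrix has one row per pair.
    """
    X = np.asarray(X)
    samples = np.asarray(samples)
    conditions = np.asarray(conditions)
    labels, rows = [], []
    for condition in np.unique(conditions):
        in_condition = conditions == condition
        for sample in np.unique(samples[in_condition]):
            mask = in_condition & (samples == sample)
            # Sum over cells (axis 0) and keep a 1 x n_genes row.
            rows.append(X[mask].sum(axis=0).reshape(1, X.shape[1]))
            labels.append(f'{sample}_{condition}')
    return labels, np.vstack(rows)
```

The label format `f'{sample}_{condition}'` mirrors the `obs` index built in the hunk above.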
2 changes: 1 addition & 1 deletion scalr/nn/dataloader/simple_metadataloader.py
@@ -71,7 +71,7 @@ def collate_fn(
x = torch.cat(
(x,
torch.as_tensor(self.metadata_onehotencoder[col].transform(
adata_batch.obs[col].values.reshape(-1, 1)).A,
adata_batch.obs[col].values.reshape(-1, 1)).toarray(),
dtype=torch.float32)),
dim=1)
return x, y
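
Review note on this hunk: `.A` was shorthand for `.toarray()` on SciPy sparse matrices and, as far as I know, is no longer available in recent SciPy releases, which is presumably why this PR spells the call out. A small sketch of a densifying helper is below; `to_dense` is a hypothetical name, not a function in scaLR.

```python
import numpy as np
from scipy import sparse


def to_dense(x):
    """Return a NumPy ndarray for either a dense or a SciPy sparse input."""
    if sparse.issparse(x):
        # toarray() is the explicit spelling of the old `.A` attribute.
        return x.toarray()
    return np.asarray(x)
```

A helper like this would also cover the `isinstance(..., np.ndarray)` checks that appear in `data_utils.py` and `file_utils.py` in this PR, though inlining `.toarray()` as the diff does is equally valid.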
1 change: 0 additions & 1 deletion scalr/nn/dataloader/test_simple_metadataloader.py
@@ -1,6 +1,5 @@
'''This is a test file for simplemetadataloader.'''

import anndata
import numpy as np
import pandas as pd

2 changes: 1 addition & 1 deletion scalr/utils/data_utils.py
@@ -46,7 +46,7 @@ def get_random_samples(
random_background_data = data[random_indices].X

if not isinstance(random_background_data, np.ndarray):
random_background_data = random_background_data.A
random_background_data = random_background_data.toarray()

random_background_data = torch.as_tensor(random_background_data,
dtype=torch.float32)
6 changes: 3 additions & 3 deletions scalr/utils/file_utils.py
@@ -7,8 +7,8 @@
from typing import Union

from anndata import AnnData
import anndata as ad
from anndata.experimental import AnnCollection
from anndata.io import read_h5ad
from joblib import delayed
from joblib import Parallel
import numpy as np
@@ -142,7 +142,7 @@ def transform_and_write_data(data: AnnData, chunk_number: int):
if transform:
data = AnnData(data.X, obs=data.obs, var=data.var)
if not isinstance(data.X, np.ndarray):
data.X = data.X.A
data.X = data.X.toarray()
data.X = transform(data.X)

write_data(data, path.join(dirpath, f'{chunk_number}.h5ad'))
@@ -262,7 +262,7 @@ def read_csv(filepath: str, index_col: int = 0) -> pd.DataFrame:

def read_anndata(filepath: str, backed: str = 'r') -> AnnData:
"""This file returns the Anndata object from filepath."""
data = ad.read_h5ad(filepath, backed=backed)
data = read_h5ad(filepath, backed=backed)
return data


@@ -13,14 +13,13 @@

from anndata import AnnData
from anndata import ImplicitModificationWarning
import anndata as ad
from anndata.experimental import AnnCollection
from anndata.io import read_h5ad
from joblib import Parallel, delayed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame
import scanpy as sc
from scipy.optimize import OptimizeWarning
import statsmodels.api as sm
import statsmodels.formula.api as smf
@@ -31,7 +30,7 @@


def main(config):
test_data = sc.read_h5ad(config['full_datapath'], backed='r')
test_data = read_h5ad(config['full_datapath'], backed='r')
dirpath = config['dirpath']
dge_type = config['dge_type']
assert (dge_type == 'DgeLMEM') and ('lmem_params' in config), (
@@ -5,22 +5,21 @@
from typing import Optional, Union, Tuple
import yaml

import anndata as ad
from anndata import AnnData
from anndata.experimental import AnnCollection
from anndata.io import read_h5ad
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
import scanpy as sc

from scalr.analysis import DgePseudoBulk


def main(config):
test_data = sc.read_h5ad(config['full_datapath'], backed='r')
test_data = read_h5ad(config['full_datapath'], backed='r')
dirpath = config['dirpath']
dge_type = config['dge_type']
assert (dge_type == 'DgePseudoBulk') and ('psedobulk_params' in config), (
6 changes: 3 additions & 3 deletions tutorials/pipeline/config_celltype.yaml
@@ -1,7 +1,7 @@
# Config file for pipeline run for cell type classification.

# DEVICE SETUP.
device: 'cuda'
device: 'cpu'

# EXPERIMENT.
experiment:
@@ -15,8 +15,8 @@ data:
sample_chunksize: 20000

train_val_test:
full_datapath: 'data/modified_adata.h5ad'
num_workers: 2
full_datapath: 'path/to/adata.h5ad'
num_workers: 4

splitter_config:
name: GroupSplitter
4 changes: 2 additions & 2 deletions tutorials/pipeline/config_clinical.yaml
@@ -1,7 +1,7 @@
# Config file for pipeline run for clinical condition specific biomarker identification.

# DEVICE SETUP.
device: 'cuda'
device: 'cpu'

# EXPERIMENT.
experiment:
@@ -15,7 +15,7 @@ data:
sample_chunksize: 20000

train_val_test:
full_datapath: 'data/modified_adata.h5ad'
full_datapath: 'path/to/adata.h5ad'
num_workers: 2

splitter_config:
8 changes: 4 additions & 4 deletions tutorials/pipeline/scalr_pipeline.ipynb
@@ -358,7 +358,7 @@
"outputs": [],
"source": [
"#Gene expression values of first 5 cells and 10 genes.\n",
"adata.X[:5,:10].A"
"adata.X[:5,:10]\n"
]
},
{
@@ -385,7 +385,7 @@
"source": [
"# Verifying normalized values in X\n",
"# Getting the sum of gene expression values for the first 10 cells (should be floating-point values).\n",
"adata.X[:10,:].A.sum(axis=1)"
"adata.X[:10,:].sum(axis=1)"
]
},
{
@@ -411,8 +411,8 @@
"outputs": [],
"source": [
"# Getting the maximum and minimum gene expression values for the first 1000 cells.\n",
"max_val = np.max(adata.X[:1000, :].A)\n",
"min_val = np.min(adata.X[:1000, :].A)\n",
"max_val = np.max(adata.X[:1000, :])\n",
"min_val = np.min(adata.X[:1000, :])\n",
"print(f'Max value : {max_val} | Min value : {min_val}')\n",
"# Raising a warning if the values are outside the 0-10 range\n",
"if max_val > 10 or min_val < 0:\n",
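
Review note on the notebook hunks: if `adata.X` is a SciPy sparse matrix (an assumption; the notebook data is not part of this diff), dropping `.A` changes what the cells show. A bare `adata.X[:5,:10]` displays the sparse object's summary rather than the values, and `sum(axis=1)` returns a 2-D `np.matrix`. The sketch below uses a toy matrix with made-up values as a stand-in for `adata.X`.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for adata.X (hypothetical values, not from the tutorial data).
X = sparse.csr_matrix(np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 0.5]]))

# Row sums of a sparse matrix come back as a 2-D np.matrix; flatten to 1-D.
row_sums = np.asarray(X.sum(axis=1)).ravel()

# Sparse matrices expose max()/min() directly, so no dense copy is needed.
# Implicit zeros are included, so min_val is 0.0 here.
max_val = X.max()
min_val = X.min()
```

To print the actual values of the first few cells, `adata.X[:5, :10].toarray()` would be the direct replacement for the old `.A` call.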