- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.
Hi everyone,
Many users probably do not rely on pp.normalize_total for downstream analysis, but I ran into a surprising default behavior that seems worth reporting:
pp.normalize_total() also normalized my .layers['counts'], even though I never passed layer='counts'.
The documentation is unclear on this point, so I am not sure whether this is the intended behavior when layer is left unspecified (Run 3 below gives the same result with an explicit layer=None). If it is intended, it undermines the common practice of saving the raw counts to a layer before library-size normalization.
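One possible mechanism, offered as an assumption rather than a confirmed diagnosis: if normalize_total rescales the matrix it is given in place, then adata.layers['counts'] = adata.X would pick up that scaling too, because the assignment binds a second name to the same matrix object instead of copying it. The NumPy sketch below illustrates this with a hypothetical normalize_total_inplace helper (not scanpy's implementation). Storing X.copy() keeps an independent snapshot; in AnnData terms that would be adata.layers['counts'] = adata.X.copy().

```python
import numpy as np

def normalize_total_inplace(X, target_sum=1e4):
    """Hypothetical stand-in for per-cell total-count normalization.

    Rescales each row (cell) of X in place so that it sums to target_sum.
    """
    row_sums = X.sum(axis=1, keepdims=True)
    X *= target_sum / row_sums  # in-place: mutates the caller's array
    return X

X = np.array([[1.0, 3.0], [2.0, 2.0]])
counts = X              # second name for the SAME array (no copy is made)
counts_copy = X.copy()  # independent snapshot of the raw counts

normalize_total_inplace(X, target_sum=10.0)

print(counts is X)   # True: both names refer to one object
print(counts)        # normalized as well, because it is X
print(counts_copy)   # still holds the raw counts
```
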
Minimal code sample (that we can copy&paste without having any data)
import scanpy as sc
adata = sc.datasets.pbmc3k()
adata.layers['counts'] = adata.X
cell = adata.obs.index[1]
adata.var['mt'] = adata.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
print("Run 1: initial values after simple processing: ")
print('sum of count layer in designated cell: ', adata[cell,:].layers['counts'].sum())
print('obs[total_counts] value in cell: ', adata[cell,:].obs['total_counts'].iloc[0])
print('.X.sum() value in cell: ', adata[cell,:].X.sum())
print('sum of count layer of MALAT1 in cell: ', adata[cell,'MALAT1'].layers['counts'])
print('.X value of MALAT1 in cell: ', adata[cell,'MALAT1'].X)
print("\nRun 2: after sc.pp.normalize_total: ")
sc.pp.normalize_total(adata, target_sum=1e4)
print('sum of count layer in designated cell: ', adata[cell,:].layers['counts'].sum()) # Note that this changed too
print('obs[total_counts] value in cell: ', adata[cell,:].obs['total_counts'].iloc[0])
print('.X.sum() value in cell: ', adata[cell,:].X.sum())
print('sum of count layer of MALAT1 in cell: ', adata[cell,'MALAT1'].layers['counts'])
print('.X value of MALAT1 in cell: ', adata[cell,'MALAT1'].X)
# Reload the dataset so that Run 3 starts from raw counts again
adata = sc.datasets.pbmc3k()
adata.layers['counts'] = adata.X
cell = adata.obs.index[1]
adata.var['mt'] = adata.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
print("\nRun 3: normalization, specifying argument layer=None")
sc.pp.normalize_total(adata, target_sum=1e4, layer=None)
print('sum of count layer in designated cell: ', adata[cell,:].layers['counts'].sum())
print('obs[total_counts] value in cell: ', adata[cell,:].obs['total_counts'].iloc[0])
print('.X.sum() value in cell: ', adata[cell,:].X.sum())
print('sum of count layer of MALAT1 in cell: ', adata[cell,'MALAT1'].layers['counts'])
print('.X value of MALAT1 in cell: ', adata[cell,'MALAT1'].X)
#Output:
Run 1: initial values after simple processing:
sum of count layer in designated cell: 4903.0
obs[total_counts] value in cell: 4903.0
.X.sum() value in cell: 4903.0
sum of count layer of MALAT1 in cell: (0, 0) 142.0
.X value of MALAT1 in cell: (0, 0) 142.0
Run 2: after sc.pp.normalize_total:
normalizing counts per cell
finished (0:00:00)
sum of count layer in designated cell: 10000.049
obs[total_counts] value in cell: 4903.0
.X.sum() value in cell: 10000.049
sum of count layer of MALAT1 in cell: (0, 0) 289.61862
.X value of MALAT1 in cell: (0, 0) 289.61862
Run 3: normalization, specifying argument layer=None
normalizing counts per cell
finished (0:00:00)
sum of count layer in designated cell: 10000.049
obs[total_counts] value in cell: 4903.0
.X.sum() value in cell: 10000.049
sum of count layer of MALAT1 in cell: (0, 0) 289.61862
.X value of MALAT1 in cell: (0, 0) 289.61862
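The reported values are consistent with plain per-cell scaling to target_sum: 142 × 10000 / 4903 ≈ 289.6186, which matches the MALAT1 value in Runs 2 and 3, for both .X and the counts layer. The sketch below reproduces that arithmetic on a toy one-cell SciPy CSR matrix, scaling the stored values in place. The second gene and its 4761 counts are invented only so that the row total equals the observed 4903, and the in-place CSR scaling is an illustration of the technique, not scanpy's code.

```python
import numpy as np
from scipy import sparse

# One cell with a library size of 4903 counts; 142 of them belong to one gene
# (the MALAT1 value reported above). The remaining 4761 counts are lumped into
# a second, hypothetical gene so that the row total matches.
X = sparse.csr_matrix(np.array([[142.0, 4761.0]], dtype=np.float32))

target_sum = 1e4
row_sums = np.asarray(X.sum(axis=1)).ravel()  # per-cell totals: [4903.]
scale = target_sum / row_sums                 # per-cell scale factors

# In-place row scaling of a CSR matrix: every stored value is multiplied by
# the factor of its row (np.diff(indptr) gives the number of stored values per row).
X.data *= np.repeat(scale, np.diff(X.indptr)).astype(X.data.dtype)

print(X[0, 0])  # approximately 289.6186, i.e. 142 * 10000 / 4903
print(X.sum())  # approximately 10000 (float32 arithmetic may add rounding noise)
```
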
Versions
anndata 0.8.0
scanpy 1.9.1
PIL 9.2.0
anndata2ri 1.1
annoy NA
backcall 0.2.0
backports NA
bbknn NA
beta_ufunc NA
binom_ufunc NA
cffi 1.15.1
cloudpickle 2.2.0
colorama 0.4.6
cycler 0.10.0
cython_runtime NA
cytoolz 0.12.0
dask 2022.02.0
dateutil 2.8.2
debugpy 1.6.3
decorator 5.1.1
defusedxml 0.7.1
deprecated 1.2.13
entrypoints 0.4
fsspec 2022.11.0
future_fstrings NA
google NA
h5py 3.7.0
igraph 0.9.1
ipykernel 6.14.0
ipython_genutils 0.2.0
ipywidgets 8.0.2
jedi 0.18.1
jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.4
leidenalg 0.8.10
llvmlite 0.39.1
louvain 0.7.2
markupsafe 2.1.1
matplotlib 3.5.3
matplotlib_inline 0.1.6
mpl_toolkits NA
natsort 8.2.0
nbinom_ufunc NA
numba 0.56.3
numpy 1.21.6
packaging 21.3
pandas 1.3.5
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.31
psutil 5.9.3
ptyprocess 0.7.0
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.8.0
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.13.0
pynndescent 0.5.7
pyparsing 3.0.9
pytz 2022.5
pytz_deprecation_shim NA
rpy2 3.5.1
scib 1.0.4
scipy 1.7.3
seaborn 0.12.1
session_info 1.0.0
six 1.16.0
sklearn 1.0.2
statsmodels 0.13.2
storemagic NA
texttable 1.6.4
threadpoolctl 3.1.0
tlz 0.12.0
toolz 0.12.0
tornado 6.2
tqdm 4.64.1
traitlets 5.5.0
typing_extensions NA
tzlocal NA
umap 0.5.3
wcwidth 0.2.5
wrapt 1.14.1
yaml 6.0
zipp NA
zmq 24.0.1
zope NA
IPython 7.33.0
jupyter_client 7.4.4
jupyter_core 4.11.1
notebook 6.5.1
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
Linux-5.4.0-131-generic-x86_64-with-debian-buster-sid
Session information updated at 2022-12-28 13:52