UPDATE:
This PR now implements a number of quality-of-life changes and resolves issue #31. The proposed multi-atom solute changes will be implemented in a separate PR. See PR #72 for those changes!
Description
This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #31, #66, and #58.
I would like to propose the following outline of new functionality. In this description, I will focus on the outward-facing API, using the somewhat trivial case of water as an example.
- Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.
- The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue; that will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same. (Note that I have already fixed the self-solvation case identified in issue #31.)
water = u.select_atoms(...)
water_O = u.select_atoms(...)
water_O_solute = Solute(water_O, {"water": water})
- Additional initializers will be added to instantiate a Solute; these will support multi-atom solutes.
The first will allow the user to stitch together multiple solutes to create a new solute.
water_H1 = u.select_atoms(...)
water_H2 = u.select_atoms(...)
solute_O = Solute(water_O, {"water": water})
solute_H1 = Solute(water_H1, {"water": water})
solute_H2 = Solute(water_H2, {"water": water})
multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2]) # maybe this should be a dict?
The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.
multi_atom_solute = Solute.from_residue(water, {"water": water})
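As a rough illustration of the stitching idea behind from_solutes, here is a minimal sketch using a toy class. ToySolute, its records list, and its parts attribute are hypothetical stand-ins invented for this example; they are not the package's actual Solute API, and the real implementation may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class ToySolute:
    """Hypothetical stand-in for Solute; not the package's real class."""

    name: str
    # Each record is one solute-atom/solvent contact in one frame.
    records: list = field(default_factory=list)
    # Constituent single-atom solutes; empty for a single-atom solute.
    parts: list = field(default_factory=list)

    @classmethod
    def from_solutes(cls, solutes, name="multi_atom_solute"):
        """Stitch single-atom solutes into one multi-atom solute."""
        merged = []
        for solute in solutes:
            for record in solute.records:
                # Tag each row with the single-atom solute it came from.
                merged.append({**record, "solute_name": solute.name})
        return cls(name=name, records=merged, parts=list(solutes))


# Toy data: two contacts on residue 1, none recorded for H2.
solute_O = ToySolute("O", [{"frame": 0, "residue": 1, "solvent": "water"}])
solute_H1 = ToySolute("H1", [{"frame": 0, "residue": 1, "solvent": "water"}])
solute_H2 = ToySolute("H2", [])

multi = ToySolute.from_solutes([solute_O, solute_H1, solute_H2])
print(len(multi.records))
print([part.name for part in multi.parts])
```

Passing a list keeps the call short, while a dict keyed by name would make the solute_name tag explicit at the call site; the sketch takes the name from each solute instead.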
- To support this, the solvation_data dataframe will gain two additional columns: a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.
- When a multi-atom solute is created, all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the residence classes will be identical.
multi_atom_solute.coordination_number["water"] # valid property
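To make the merge-and-group step concrete, here is a small pandas sketch. The tables are toy data, and every column name other than "residue", "solute_name", and "solvated_atom" is an assumption made for illustration; the real solvation_data layout may differ.

```python
import pandas as pd

# Toy per-atom tables; one row per solute-atom/solvent contact.
oxygen = pd.DataFrame({
    "frame": [0, 0, 0],
    "solvated_atom": [10, 10, 13],
    "residue": [1, 1, 2],
    "solvent_name": ["water", "water", "water"],
})
hydrogen = pd.DataFrame({
    "frame": [0, 0],
    "solvated_atom": [11, 14],
    "residue": [1, 2],
    "solvent_name": ["water", "water"],
})

# Merge the single-atom tables, recording which solute each row came from.
merged = pd.concat(
    [oxygen.assign(solute_name="O"), hydrogen.assign(solute_name="H1")],
    ignore_index=True,
)

# Count contacts per whole solute (residue) per frame, then average.
per_residue = merged.groupby(["frame", "residue"]).size()
mean_coordination = per_residue.mean()
print(per_residue.to_dict())
print(mean_coordination)
```

Grouping on "residue" instead of "solvated_atom" is the only change needed to move from per-atom to per-solute counts in this sketch, which is why a single-atom solute gives the same answer either way.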
- We will retain all of the single-atom Solutes as a property of the multi-atom Solute. This would roughly double the memory footprint, but it would make follow-up analysis easier. I'm a bit torn here, and there may be a better way.
>>> print(multi_atom_solute.atoms)  # what should this be called?
[solute_O, solute_H1, solute_H2]  # maybe this should be a dict?
For a single-atom solute, the atoms list would still be present, but the data within would be identical to the solvation_data of the solute itself. Single-atom solutes are now just a special case of multi-atom solutes.
water_O_solute.atoms[0].solvation_data == water_O_solute.solvation_data
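One way to get the single-atom special case without copying any data is to have a single-atom solute list itself as its only constituent. The sketch below assumes this approach; SketchSolute and its constructor arguments are hypothetical and exist only to illustrate the idea.

```python
class SketchSolute:
    """Hypothetical stand-in; illustrates only the proposed `atoms` list."""

    def __init__(self, solvation_data, atoms=None):
        self.solvation_data = solvation_data
        # A single-atom solute lists itself as its only constituent.
        self.atoms = atoms if atoms is not None else [self]


# Toy rows of (frame, solute atom, solvent name).
water_O_solute = SketchSolute([(0, "O", "water")])
water_H1_solute = SketchSolute([(0, "H1", "water")])

# A multi-atom solute keeps references to its single-atom constituents.
multi_atom_solute = SketchSolute(
    water_O_solute.solvation_data + water_H1_solute.solvation_data,
    atoms=[water_O_solute, water_H1_solute],
)

# Same object, not a copy, so the single-atom case costs no extra memory.
print(water_O_solute.atoms[0].solvation_data is water_O_solute.solvation_data)
print(len(multi_atom_solute.atoms))
```

In this sketch only the multi-atom solute holds a second copy of the rows (the merged table), which is where the rough doubling of memory mentioned above would come from.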
I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.
Todos
Notable points that this PR has either accomplished or will accomplish.
- [x] add new testing data with a multi-atom solute
- [x] remove foolish internal numbering scheme of solvated_atoms
- [x] allow solutes to also act as solvents
- [x] replace all column name strings with variables stored in a column_names.py file
- [ ] make solutions composable so that multi-atom solutes can be constructed systematically
- [ ] put guardrails in place to prevent misuse
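For the column_names.py item above, the pattern is roughly the following. The constant names and string values here are assumptions for illustration and may not match the actual file.

```python
# column_names.py (hypothetical contents): one constant per column name.
FRAME = "frame"
SOLVATED_ATOM = "solvated_atom"
DISTANCE = "dist"


# Elsewhere in the package, code refers to the constants instead of
# repeating string literals, so a rename only touches one file.
def closest_contact(rows):
    """Return the row with the smallest solute-solvent distance."""
    return min(rows, key=lambda row: row[DISTANCE])


rows = [
    {FRAME: 0, SOLVATED_ATOM: 0, DISTANCE: 2.9},
    {FRAME: 0, SOLVATED_ATOM: 1, DISTANCE: 2.1},
]
print(closest_contact(rows)[SOLVATED_ATOM])
```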
Status
enhancement core high-priority