Just finished revamping my chemistry data readers for STM4. And again I'm wondering why so much creativity is wasted on inventing new, similar, unparseable file formats!
Here are the formats STM4 supports for now. After the list I try to collect some of the nice (groan!) features of each of them. Why don't the developers of chemistry programs try to converge on a single format?
- CHGCAR - VASP format that also contains volume data
- Gaussian cube - Gaussian format that also contains volume data
- DCD - FMD, adds trajectory data to a PDB file
- DL_POLY - DL_POLY HISTORY file
- FpStudio - FullProf Suite
- Gulp - GULP input file
- MOL - From MDL
- MOL2 - From Tripos
- PDB - Protein Data Bank
- PDB-Q - An old version of PDB
- POSCAR - VASP, also as concatenated file
- SHEL-X - Shel-X crystallography program
- Siesta - Siesta
- XDATCAR - Another VASP animated format
- XYZ - The simple xyz format, animated also
- XYZ plus unit cell - idem plus a supporting file containing the unit cell
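To give an idea of what reading the simplest format in this list involves, here is a minimal Python sketch of a reader for animated (multi-frame) xyz files. It assumes the common layout: an atom count line, a free-form comment line, then one `element x y z` line per atom, repeated for each frame. The name `read_xyz_frames` is made up for this example; it is not STM4 code.

```python
from typing import Iterator, List, Tuple

Atom = Tuple[str, float, float, float]


def read_xyz_frames(text: str) -> Iterator[List[Atom]]:
    """Yield one list of (element, x, y, z) tuples per frame."""
    lines = text.splitlines()
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[pos].split()[0])  # first line of a frame: atom count
        # The second line is a free-form comment, so the atoms start at pos + 2.
        body = lines[pos + 2 : pos + 2 + natoms]
        if len(body) < natoms:
            raise ValueError("truncated frame")
        frame = []
        for line in body:
            symbol, x, y, z = line.split()[:4]
            frame.append((symbol, float(x), float(y), float(z)))
        yield frame
        pos += 2 + natoms


sample = """2
frame 0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
2
frame 1
H 0.0 0.0 0.0
H 0.0 0.0 0.80
"""
frames = list(read_xyz_frames(sample))
```

Whitespace-separated fields and an explicit atom count are what keep this reader so short.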
And now some of the complaints:
- CHGCAR - It can contain one or two sets of volumetric data. Why should it be so difficult and unreliable to find the start of the second block? And why does dividing the values by the cell volume depend on the file name? And why no sensible file extension? And there is no atom type in the file.
- Gaussian cube - Measurement units: Angstrom or Bohr?
- DCD - It is a binary format, so it almost works
- FpStudio - Why two structures in the same file, with different methods of describing symmetry? Why does it contain rendering options mixed with structural options?
- Gulp - It is more of a human-readable input format
- MOL - At least it is documented, but why does it use fixed-width numeric fields?
- MOL2 - Documented
- PDB - Atom numbers in fixed-width fields, so the number of atoms is limited to 99999; creativity a go-go in the atom name field (obviously without putting the element type in the appropriate field)
- PDB-Q - A column is 10 bytes shorter, so another reader is needed
- POSCAR - Simple format, but the atom types are not in the file
- SHEL-X - No big problems, except understanding symmetry definitions
- Siesta - No problems
- XDATCAR - No problems
- XYZ - No problem
- XYZ plus unit cell - No problem
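The fixed-width complaint about PDB is easier to see in code. The sketch below, in Python, slices one ATOM/HETATM record by column position, using the standard PDB layout (serial in columns 7-11, atom name in 13-16, coordinates in 31-54, element in 77-78). `PdbAtom` and `parse_atom_record` are names invented for this example, and the sample record is assembled piece by piece so the columns are visible. This is only an illustration, not how STM4 reads the file.

```python
from typing import NamedTuple


class PdbAtom(NamedTuple):
    serial: int
    name: str
    x: float
    y: float
    z: float
    element: str  # empty string when the file leaves columns 77-78 blank


def parse_atom_record(line: str) -> PdbAtom:
    """Parse one ATOM/HETATM record by fixed column positions."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return PdbAtom(
        serial=int(line[6:11]),     # 5 columns wide, hence the 99999 limit
        name=line[12:16].strip(),   # " CA " is an alpha carbon, "CA  " is calcium
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        # Many writers leave this field blank; guessing the element from the
        # atom name is ambiguous, so no guess is attempted here.
        element=line[76:78].strip().capitalize(),
    )


# Sample record built from its fields so that each column width is explicit.
sample = (
    "ATOM  " + "    1" + " " + " CA " + " " + "ALA" + " " + "A" + "   1" + "    "
    + "  11.104" + "   6.134" + "  -6.504" + "  1.00" + "  0.00" + " " * 10 + " C"
)
atom = parse_atom_record(sample)
```

Splitting the line on whitespace would not be reliable here, because adjacent fields may touch when their values fill the full width.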
The saga continues...
Labels: chemistry visualization, data formats