Preferred Deposit Structures
While deposit structure is highly flexible in order to accommodate diverse workflows, MorphoSource has preferences and recommendations about which "raw" and "derivative" data to include and which to leave out. These recommendations aim to maximize meaningful reproducibility and reuse potential while minimizing deposit storage requirements. When contributors pay for their data storage, MorphoSource is more open to the inclusion of large but rarely critical raw files, though we may still ask contributors to consider whether using their storage space on such files is in their best interest. Whether or not contributors pay for storage, when special conditions call for deposits that are either more extensive and larger or more limited and more derivative than the preferred structures described below, MorphoSource will accept them if the contributors make a strong case for the deviation.
There is a lot of information on this page. If you are feeling overwhelmed, you may want to first review the page on Example Deposit Structures.
We describe preferences here in terms of an anticipated derivative chain, while being somewhat ambiguous about whether data are technically "raw" or "derived":
PRIMARY data: the "least processed" or "closest to raw" data deposited.
SECONDARY data: the first derivative of the primary data.
TERTIARY data: derivatives of the secondary data.
WHAT ARE ACCEPTABLE DEVIATIONS FROM THE PREFERRED DEPOSIT STRUCTURES?
Very often, a workflow produces many intermediate files between the data considered preferred PRIMARY and preferred SECONDARY in the schema below (see this example). In these scenarios the file deposited as SECONDARY is, strictly speaking, a TERTIARY (or later) derivative in the workflow, but this is fine. The main concern arises when the preferred primary or secondary datasets are missing, so that a more derived file type fills that role instead of its more appropriate tertiary role.
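The relationship between a full processing workflow and the deposited derivative chain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WorkflowStep` class, the `assign_roles` function, and the file labels are hypothetical and are not part of any MorphoSource tool or API, and the workflow is assumed to be a simple linear chain.

```python
from dataclasses import dataclass

ROLES = ("PRIMARY", "SECONDARY", "TERTIARY")


@dataclass
class WorkflowStep:
    name: str        # hypothetical label for a file produced by the workflow
    deposited: bool  # whether the contributor uploads this file


def assign_roles(chain):
    """Label deposited files by their order in a linear derivative chain.

    The least processed deposited file is PRIMARY, the next deposited
    file is SECONDARY, and every later deposited file is TERTIARY.
    Intermediates that are not deposited are skipped.
    """
    roles = {}
    rank = 0
    for step in chain:
        if not step.deposited:
            continue
        roles[step.name] = ROLES[min(rank, len(ROLES) - 1)]
        rank += 1
    return roles


# Hypothetical workflow, ordered from least to most processed.
chain = [
    WorkflowStep("raw scanner output", deposited=False),
    WorkflowStep("reconstructed image stack", deposited=True),
    WorkflowStep("filtered image stack", deposited=False),
    WorkflowStep("segmented image stack", deposited=False),
    WorkflowStep("surface mesh", deposited=True),
    WorkflowStep("decimated surface mesh", deposited=True),
]

roles = assign_roles(chain)
for name, role in roles.items():
    print(f"{role}: {name}")
```

In this sketch the surface mesh is deposited as SECONDARY even though two undeposited intermediates sit between it and the PRIMARY image stack, which is the acceptable kind of deviation described above.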
WHAT WILL HAPPEN IF MY DEPOSITS DEVIATE FROM THE PREFERRED STRUCTURES IN UNACCEPTABLE WAYS?
Most likely a MorphoSource representative will contact you for justification or revision.
Modality | Non-preferred PRIMARY data | Rationale for non-preferred PRIMARY data | Preferred PRIMARY data | Non-preferred SECONDARY data | Preferred/acceptable SECONDARY data | TERTIARY data |
---|---|---|---|---|---|---|
CT/MRI scans | Too raw: Too derived: Other: | Too raw: These files (1) are very large and may be 2-5 times the size of the "preferred" image stacks; (2) often include multiple specimens that were scanned together for efficiency; (3) cannot be visualized three-dimensionally without further processing, and the critical metadata values needed for successful processing are not reliably available; (4) are not requested by our user communities, who (as far as we know) do not work with them beyond first processing scanner output into image stacks. Too derived: These files have been processed too much to (1) effectively communicate the quality or limitations of the raw data and more primary derivatives, or (2) have very good reuse potential. Other: (1) Reconstructed image stacks that include lots of "empty space" waste server space and should be cropped to minimize this prior to upload. (2) Reconstructed image stacks that include multiple specimens break the data model of MorphoSource and are forbidden. (3) Proprietary formats have poor preservation and reuse potential. Their use also deepens inequities between users who can afford expensive software and those who cannot. | | Too derived: Other: | | |
Photogrammetry | Too raw: Too derived: | Too raw: Raw-format, uncropped digital photographs may take up two orders of magnitude more space than "preferred" compressed, cropped images, with no significant effect on the quality of 3D models. Too derived: These files have been processed too much to (1) effectively communicate the quality or limitations of the raw data and more primary derivatives, or (2) have very good reuse potential, including for regenerating 3D models from photo collections with updated algorithms or simply for checking reproducibility. | | Too derived: Other: | | |
Surface scans | | Proprietary formats have poor preservation and reuse potential. Their use also deepens inequities between users who can afford expensive software and those who cannot. | | Too derived: Other: | | |
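
The size ratios quoted in the rationale column can be turned into a rough storage estimate. The sketch below is only arithmetic on those ratios. The function name `scaled_size_gb` and the example sizes (a 4 GB image stack, 0.5 GB of photographs) are hypothetical values chosen for illustration, not MorphoSource figures.

```python
def scaled_size_gb(preferred_gb: float, factor: float) -> float:
    """Estimate the size of a less-processed file set that is
    `factor` times larger than the preferred deposit."""
    return preferred_gb * factor


# Hypothetical CT example: a 4 GB preferred image stack.
# Raw scanner output may be 2-5 times larger.
ct_stack_gb = 4.0
raw_ct_low_gb = scaled_size_gb(ct_stack_gb, 2)
raw_ct_high_gb = scaled_size_gb(ct_stack_gb, 5)

# Hypothetical photogrammetry example: 0.5 GB of compressed, cropped photos.
# Raw-format uncropped photos may be about two orders of magnitude larger.
photos_gb = 0.5
raw_photos_gb = scaled_size_gb(photos_gb, 10 ** 2)

print(f"Raw CT output: {raw_ct_low_gb}-{raw_ct_high_gb} GB vs {ct_stack_gb} GB preferred")
print(f"Raw photos: {raw_photos_gb} GB vs {photos_gb} GB preferred")
```

Under these assumed sizes, keeping the raw scanner output would add 8 to 20 GB to a 4 GB deposit, and keeping raw photographs would add about 50 GB to a 0.5 GB deposit.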