Organism Information - BioSample - NCBI (2023)

  • Valid organism
  • Organism warning
  • Metagenomes (microbiomes)
  • Metagenome-assembled genomes

Valid organism

In general, NCBI BioSample requires that each record have only one organism, with a valid taxonomy name at the species level.


In most cases, the binomial scientific name must be entered, with complete genus and species. This includes model organisms and species with well-known common names. Examples:

  • Homo sapiens instead of "human"
  • Mus musculus instead of "mouse"
  • Escherichia coli instead of "E. coli"

If the name is published and is in the NCBI Taxonomy Database, it will be processed automatically.

Organism warning

If the name is not in the NCBI Taxonomy Database, you will see a message that says:

Warning: Submission processing may be delayed due to necessary curator review. Check the spelling of the organism name; the current information could not be resolved automatically and will require a taxonomy query.

First, check that the organism name is spelled correctly; a typical mistake is a typo such as "Homo spaiens" for "Homo sapiens". Make any necessary corrections and click "Continue" again to resubmit the page.


If the name is spelled correctly, there are valid reasons for the name to trigger the Warning message, including:

  • A validly published name may not yet be in the NCBI taxonomy database. The database is authoritative but not exhaustive. We only add species names when we receive a sequence submission for the organism, so not all valid species are there.
  • You may be submitting a new taxonomy name or a new combination that is not yet published.
  • You may be submitting an organism that has been identified only to genus or a higher rank, but not to species.

If the submitted name is a valid name but is not yet in the NCBI taxonomy database, enter the binomial name and our taxonomists will add it to the database.

For unpublished or unidentified species, our taxonomists assign a provisional tax ID for the organism based on genus and strain or isolate name.

If your organism is unpublished or unidentified, be sure to include a unique identifier, such as an isolate or strain name or a specimen voucher number. The identifier must be unique among your samples and must not include the name of the organism or an abbreviation of it. It should be a series of letters and/or numbers that identifies your organism, for example:


  • organism = Staphylococcus sp.
  • strain = abc123
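The identifier rules above lend themselves to a quick check before submitting. The sketch below is a hypothetical Python helper (`identifier_problems` is not an NCBI tool): it flags identifiers that repeat within a batch and, as an assumed heuristic, any run of three or more letters that looks like the organism name or a prefix-style abbreviation of it. NCBI curators apply the actual rules, so a clean result is only a sanity check.

```python
import re


def identifier_problems(organism: str, identifiers: list[str]) -> list[str]:
    """Flag sample identifiers that appear to break the rules above.

    Hypothetical helper, not an NCBI tool. The abbreviation check is an
    assumed heuristic: a run of 3+ letters in the identifier that is a
    prefix of an organism-name word (e.g. "Staph"), or that starts with one.
    """
    # Placeholder words such as "sp." are not part of the organism's name proper.
    placeholders = {"sp", "bacterium", "archaeon"}
    name_words = [w.lower() for w in re.findall(r"[A-Za-z]+", organism)
                  if w.lower() not in placeholders]
    problems = []
    seen = set()
    for ident in identifiers:
        # Identifiers must be unique among your samples.
        if ident in seen:
            problems.append(f"{ident}: duplicate identifier")
            continue
        seen.add(ident)
        # Identifiers must not contain the organism name or an abbreviation of it.
        for token in re.findall(r"[a-z]{3,}", ident.lower()):
            if any(word.startswith(token) or token.startswith(word)
                   for word in name_words):
                problems.append(f"{ident}: contains the organism name or an abbreviation of it")
                break
    return problems


if __name__ == "__main__":
    batch = ["abc123", "abc124", "Staph07", "abc123"]
    for problem in identifier_problems("Staphylococcus sp.", batch):
        print(problem)
```

The prefix heuristic will occasionally flag harmless identifiers (for example, any identifier starting with "mus" for Mus musculus); that trade-off errs on the side of caution.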

For an organism not identified to genus, use the lowest taxon (phylum, class, order, or family) known to you.

For bacterial or archaeal taxa, add "bacterium" or "archaeon" to the taxon name, for example:

  • Enterobacteriaceae bacterium
  • Nanoarchaeales archaeon

For all other organisms, add "sp." to the name, for example:

  • A termite identified only to the subfamily "Rhinotermitinae", but not to genus, would be submitted as "Rhinotermitinae sp."
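The naming conventions above can be summarized as a small decision procedure. This is a hypothetical Python sketch: the function name, the `lineage` dictionary (rank to name, holding only ranks you can reliably assign) and the fixed rank list are assumptions for illustration, and the resulting name still has to be accepted by NCBI taxonomists.

```python
# Ranks from highest to lowest, following the list in the text above (an assumed subset).
RANKS = ("phylum", "class", "order", "family", "genus")


def provisional_name(domain: str, lineage: dict[str, str]) -> str:
    """Suggest an organism name for a specimen not identified to species.

    Hypothetical helper. `domain` is "Bacteria", "Archaea" or "Eukaryota";
    `lineage` maps rank -> name for the ranks you can reliably assign.
    """
    if domain not in ("Bacteria", "Archaea", "Eukaryota"):
        raise ValueError(f"unexpected domain: {domain!r}")
    # Identified to genus: "<Genus> sp." regardless of domain.
    if lineage.get("genus"):
        return f"{lineage['genus']} sp."
    # Otherwise use the lowest rank that is filled in (None if only the domain is known).
    lowest = next((lineage[r] for r in reversed(RANKS) if lineage.get(r)), None)
    if domain == "Bacteria":
        return f"{lowest} bacterium" if lowest else "bacterium"
    if domain == "Archaea":
        return f"{lowest} archaeon" if lowest else "archaeon"
    return f"{lowest or 'Eukaryota'} sp."


print(provisional_name("Bacteria", {"family": "Enterobacteriaceae"}))
print(provisional_name("Eukaryota", {"order": "Blattodea", "family": "Rhinotermitidae"}))
```

The genus check comes first because a genus-level identification takes the "sp." form (as in "Staphylococcus sp."); the "bacterium"/"archaeon" suffixes only apply above genus.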

If you are submitting multiple "[taxon] sp." samples that you believe belong to more than one putative species, each containing multiple isolates or strains, label them "[taxon] sp. 1", "[taxon] sp. 2", and so on, or use other unique identifiers to group them. A common format in NCBI Taxonomy, which helps keep these names unique, appends the submitter's initials and the year, for example:


  • [taxon] sp. 1ABC-2021
  • [taxon] sp. 2ABC-2021
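The numbering scheme above can be applied mechanically once you have grouped your isolates into putative species. This is a hypothetical Python sketch: `label_isolates` and the `groups` mapping (isolate or strain ID to your own group key) are illustrative assumptions, and the numbering simply follows the order in which each group first appears.

```python
def label_isolates(taxon: str, groups: dict[str, str], initials: str, year: int) -> dict[str, str]:
    """Assign "[taxon] sp. <n><initials>-<year>" names to isolates.

    Hypothetical helper. `groups` maps each isolate or strain ID to your own
    key for the putative species it belongs to; groups are numbered 1, 2, ...
    in the order they first appear.
    """
    numbers: dict[str, int] = {}
    for group in groups.values():
        # Only new groups get the next number; existing groups keep theirs.
        numbers.setdefault(group, len(numbers) + 1)
    return {isolate: f"{taxon} sp. {numbers[group]}{initials}-{year}"
            for isolate, group in groups.items()}


names = label_isolates("Rhinotermitinae", {"T01": "A", "T02": "A", "T03": "B"}, "ABC", 2021)
for isolate, organism in names.items():
    print(isolate, organism)
```

Isolates in the same putative species share one organism name, while each isolate keeps its own strain or isolate identifier.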

Metagenomes (microbiomes)

We consider any environmental or clinical sample that may contain multiple organisms to be a "metagenome." Metagenome samples are typically composed of microbial organisms, including but not limited to archaea, bacteria, and fungi. The terminology is partly historical, as the first instances of this type of sample were for genomic sequencing, but it now includes any sample of this type, regardless of the type of data to be generated. NCBI's "metagenome" taxonomy nodes can be thought of as meaning "microbiomes."

If you submit data from such a sample, you must use one of a special set of taxonomy nodes. The metagenome taxonomy nodes are placed under "unclassified sequences" because they have no specific lineage. They are divided mainly into ecological metagenomes and organismal metagenomes. Nodes are created as needed, so not every conceivable type exists. Current practice is to use an existing node whenever possible and to provide more detailed information in the isolation_source and/or host attributes.

The metagenome name reflects the source of the sample, not the organisms you expect to identify. Use the same name regardless of the type of sequencing you perform. For example, whether you use 16S rRNA primers to target bacterial species or ITS primers to target eukaryotic species, the metagenome name stays the same.

Browse the list and use the taxonomy name that best describes your sample. Choosing a name requires some judgment, and the intent of the study must be considered. Controls on the name you choose are minimal, to allow maximum flexibility, but the name must exist in the NCBI Taxonomy Database. Examples include:

  • A soil sample might use "soil metagenome" or perhaps "rhizosphere metagenome".
  • A plant sample would use "plant metagenome" or could use more specific names like "root metagenome" or "leaf metagenome" if only those tissues were sampled.
  • If the sample is from a specific organ of an animal, use a tissue-specific name when available, such as "skin metagenome" or "liver metagenome". Alternatively, use an organism-specific name such as "mouse skin metagenome" or "human metagenome", or a generic one such as "insect metagenome" or "mollusc metagenome".
  • A goat rumen or mouse cecum sample would use "gut metagenome". In such cases, be sure to include the host organism in the "host" field.
  • Stool or fecal samples would also use "gut metagenome" if the target of the study is the host's gut microbiota. Alternatively, if the study targets the bacterial community that develops in feces after they have left the animal, use "feces metagenome". In either case, give the name of the source organism in the host field.
  • Note that there are several commonly studied organism-specific gut metagenomes, including "human gut metagenome", "mouse gut metagenome", and others.
  • A cyanobacterial enrichment culture would normally use "freshwater metagenome".
  • An artificial community assembled from a set of known organisms and used as a test sample should use "synthetic metagenome".
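The guidance above (an existing metagenome node, plus detail in the isolation_source and/or host attributes, with the host always named for gut and feces samples) can be turned into a rough self-check. This is a hypothetical Python sketch: `metagenome_sample_problems` and the short `HOST_REQUIRED` set are illustrative assumptions, and the function does not verify that a name actually exists in the NCBI Taxonomy Database.

```python
# Names for which the text above asks you to record the host organism.
# A hypothetical, deliberately short list; extend it for your own project.
HOST_REQUIRED = {"gut metagenome", "feces metagenome"}


def metagenome_sample_problems(attrs: dict[str, str]) -> list[str]:
    """Flag obvious issues in one metagenome BioSample's attributes.

    Hypothetical helper. It does not check that the name exists in the
    NCBI Taxonomy Database; browse the metagenome nodes for that.
    """
    problems = []
    organism = attrs.get("organism", "")
    # Metagenome taxonomy names end in "metagenome" (e.g. "soil metagenome").
    if not organism.endswith(" metagenome"):
        problems.append("organism should be a metagenome taxonomy name")
    if organism in HOST_REQUIRED and not attrs.get("host"):
        problems.append(f"{organism!r} sample: name the source organism in 'host'")
    # Detail about the sample belongs in isolation_source and/or host.
    if not (attrs.get("isolation_source") or attrs.get("host")):
        problems.append("describe the sample in 'isolation_source' and/or 'host'")
    return problems


goat_rumen = {"organism": "gut metagenome", "host": "Capra hircus",
              "isolation_source": "rumen contents"}
print(metagenome_sample_problems(goat_rumen))
```

The check is intentionally permissive, mirroring the minimal controls NCBI applies to metagenome names; it only catches omissions, not poor choices of node.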

Metagenome-assembled genomes

Metagenome-assembled genomes (MAGs) represent individual organisms computationally assembled from samples that contain a mixture of organisms. Organism names for MAGs are often assigned with classification tools and reference taxonomies such as SILVA or GTDB, which may use unpublished taxonomic names. Convert unpublished names to the equivalent NCBI taxonomy names. We want names that are taxonomically meaningful, at the lowest rank that can be reliably assigned (division, phylum, class, order, family, genus). Use "bacterium", "archaeon", or "Eukaryota sp." if you have no more information. See the NCBI frequently asked questions on submitting prokaryotic or eukaryotic MAGs to GenBank.


MAGs also require a unique alphanumeric identifier to distinguish each organism. Enter the identifier as the isolate name, even though we recognize that each organism was computationally binned rather than isolated. The identifier must be unique and is usually the same as the sample_name. It should not include the organism name or an abbreviation of it. It should be a series of letters and/or numbers that identifies your organism assembly.

In any of the cases above where you receive a warning message, you can click "Continue" again and the submission will proceed to the next step. At the final step of your submission, you will be notified again that the submission will be delayed for manual review. In most cases, this review takes about two business days.




Article information

Author: Msgr. Refugio Daniel

Last Updated: 17/08/2023
