Import
Many projects start with imports and end with exports; the former are detailed here.
For strategies on migrating whole projects see Migrate to TaxonWorks. This includes an overview of the many ways that data can be added to TaxonWorks.
Tips
If you're running TaxonWorks locally (i.e. not in production or on a sandbox) then you'll likely need to manually trigger processing of the background jobs created during import. See Delayed Jobs for details.
Batch loaders
There are various batch importers available within the UI. These are polished to differing degrees and have various benefits and limitations. The required format, and often an example spreadsheet, is provided in the UI. All batch loaders are two-step, allowing for (and requiring) a preview of results before inserting them into the database.
- To explore available batch loaders click on a Data card in the Hub. If batch loader(s) are available then the batch load link will be enabled.
- Batch importers largely target tab-separated text files, though this is not exclusively the case.
- Notable batch loaders are found in the TaxonNames, Otus, and Sources data cards, though others exist.
- Explore various batch loaders (each data card highlighted in yellow has associated batch loaders at this writing).
Try a batch loader
In your test project,
- Go to the data tab
- Select the Otu Data card
- Click “batch load”
- See instructions in the UI for expected / accepted data types and format.
- Create your own file or use this test file:
  - Header column = otu_name
  - Blank lines are skipped
  - Tab-delimited format, UTF-8 encoding, Unix line endings required
- Browse to your file to select it, click preview
- If data looks as expected, browse to select that file again and click create.
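The file for this exercise can be as simple as a single column. Here is a minimal sketch of generating one, assuming the `otu_name` header and the format requirements listed above (tab-delimited, UTF-8, Unix line endings); the OTU names are hypothetical:

```python
# Hypothetical OTU names; replace with your own.
otu_names = ["Aus bus", "Aus cus", "Aus dus"]

# Single "otu_name" column; "\n" gives the Unix line endings the loader expects.
lines = ["otu_name"] + otu_names
content = "\n".join(lines) + "\n"

# newline="\n" prevents Python from translating to platform line endings.
with open("otus.tsv", "w", encoding="utf-8", newline="\n") as f:
    f.write(content)
```

You would then browse to `otus.tsv` in the preview step above.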
Batch loaders (as of March 2022) include:
- OTUs operational taxonomic units
- simple batch load
- data attributes
- simple batch file
- OTU with identifier batch load
- collecting events
- gpx (collecting events with georeferences)
- castor
- collection objects
- castor
- buffered strings
- descriptors
- qualitative descriptors
- modify gene descriptor
- sequences
- Genbank
- Genbank batch
- primers
- sources
- BibTeX
- taxon names
- simple
- castor
- asserted distributions
- simple
- namespaces
- simple
- sequence relationships
- primers batch
Darwin Core Archive (DwC-A) import
Checklist Data
This method supports uploading checklist data, from simple to somewhat more complex taxon name lists. Below you will find examples to guide you in creating your own dataset, as well as datasets you can use to try in a sandbox.
Preparing a Checklist
- Please check the table below for the terms (fields) the importer recognizes and whether certain fields are required or have dependencies (e.g. formatting, identifiers). This is the mapping step.
- Identifiers are required in the following columns for this method to work: `taxonID`, `acceptedNameUsageID`, `parentNameUsageID`.
- The `originalNameUsageID` column must be present for the dataset to import. The software will generate the numbers for this column if you don't fill it out (it duplicates the number in the `taxonID` column).
- Running the exact same dataset in twice will not duplicate names in the case where:
  - a) the `parentNameUsageID` is null, AND
  - b) you use the `Settings` option to match on existing names where the `parentNameUsageID` is null.
  - IF `parentNameUsageID` is null and you do not use the `Settings` option, the names will be entered (again) as children of `Root` and will say `[GENUS Unspecified]`. These would need to be cleaned up by hand after import.
- The data in your spreadsheet first go through a `Staging` step. You will be able to edit data in each cell at that point, if need be, before you click on `Import`.
- Each name you want to import must have its own record row in your dataset. For example, if you are including higher classification data, each of those higher taxa must have its own row; if not, the higher classification data for each taxon will not import.
- Your dataset needs to be in xlsx, comma-separated (csv), or tab-separated (txt, tsv) format.
- For best results with diacritics (like umlauts or tildes), ensure your data are UTF-8 encoded.
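The identifier requirements above can be sketched as a minimal two-row dataset. All names, authorships, and ID values below are hypothetical; only the column names come from the term table in this section:

```python
import csv
import io

# The identifier columns plus a few required name columns.
fieldnames = [
    "taxonID", "parentNameUsageID", "acceptedNameUsageID", "originalNameUsageID",
    "scientificName", "scientificNameAuthorship", "taxonRank",
]

# A genus and one species; the species' parentNameUsageID points at the
# genus row's taxonID, and each valid name's acceptedNameUsageID is itself.
rows = [
    {"taxonID": "1", "parentNameUsageID": "", "acceptedNameUsageID": "1",
     "originalNameUsageID": "1", "scientificName": "Aus",
     "scientificNameAuthorship": "Smith, 1900", "taxonRank": "genus"},
    {"taxonID": "2", "parentNameUsageID": "1", "acceptedNameUsageID": "2",
     "originalNameUsageID": "2", "scientificName": "Aus bus",
     "scientificNameAuthorship": "Smith, 1900", "taxonRank": "species"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
checklist_csv = buf.getvalue()  # save this out as a .csv file to upload
```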
Tips
- This method imports only. It does not update (e.g. fix typos) on a re-try or add more data to a given existing object in the database.
- Case of `taxonRank` values doesn't seem to matter.
- IF you want to match on existing names, what matters are [TO BE VERIFIED]: only the `scientificName` + `scientificNameAuthorship`. (Note a change to higher classification doesn't seem to matter to making the match.)
| Term | Mapping |
|---|---|
| `taxonID` | REQUIRED - a unique identifier for the taxon in this record row |
| `parentNameUsageID` | REQUIRED - a unique identifier for asserting the correct parent |
| `parentNameUsage` | |
| `acceptedNameUsageID` | REQUIRED - if the name is a valid one, this matches the taxonID |
| `scientificName` | REQUIRED |
| `kingdom` | |
| `class` | |
| `order` | |
| `family` | |
| `genus` | REQUIRED |
| `subgenus` | |
| `specificEpithet` | REQUIRED |
| `infraspecificEpithet` | |
| `taxonRank` | REQUIRED - Family, Genus, Tribe, Subtribe, Species, etc. (not case sensitive) |
| `scientificNameAuthorship` | REQUIRED* - Must provide IF you want to match on existing names in the db (and same format) |
| `originalNameUsageID` | REQUIRED - Column must be present. IF all cells empty, software will populate them with taxonID at Staging step |
| `nomenclaturalCode` | ICZN, ICN - This can be selected in the importer; does not have to be in the spreadsheet |
| `TW:TaxonNameClassification:Latinized:Gender` | maps directly to the TW data model; see TW:<data model>:... |
| `TW:TaxonNameClassification:Latinized:PartOfSpeech` | maps directly to the TW data model; see TW:<data model>:... |
| `TW:TaxonNameRelationship:incertae_sedis_in_rank` | maps directly to the TW data model; see TW:<data model>:... |
| `TW:TaxonNameClassification:Iczn:Fossil` | maps directly to the TW data model; see TW:<data model>:... |
| | *need to search codebase to see if these are supported on import* |
| `taxonomicStatus` | valid, incertae sedis, obsolete combination |
| `originalNameUsage` | |
| `cultivarEpithet` | |
| `nameAccordingTo` | |
| `nomenclaturalStatus` | |
| `taxonRemarks` | |
| `references` | |
The Checklist Importer
What follows are the simplest steps for uploading names into an empty database. It is also possible to match on existing names in your TW project, for example when you are importing children of those names.
- From the `Task` list select `Darwin Core Archive (DwC-A) import`.
- In the importer interface, enter a `Description` for your dataset.
- Next, select the `Dataset type`. In this case, `Checklist`.
- Then, select the relevant `Nomenclature code`.
- Once you have prepared your dataset, upload it by picking the file or dragging and dropping it.
- Depending on the file type (xlsx, csv, txt, tsv) you will need to verify the separator (delimiter) for the fields and strings. With xlsx files, the importer figures this out. With csv (comma) and txt (tab) you will get a pop-up asking you to confirm or pick the correct options.
- In either of these delimiter pop-ups, after you pick or verify, click `Upload`.
*(Screenshots: Checklist CSV and TXT file delimiter verification; the Staging step.)*
- The software will now `Stage` your data (this will take a few seconds or longer, depending on the size of the import).
- At the `Staging` step you can sort on the columns and replace values in any of the cells if necessary (you cannot edit the header rows).
- Note your original dataset is stored permanently, but not with values you change after `Staging`.
- Next click `Import`.
- Names will import, and you can click `Browse` for a given row in your dataset to see the data in TW. *(Screenshot: Browse after upload.)*
- If you get error messages, rows with errors don't upload. You can click where it says `Error` to get the error message.
- For some errors, you can fix the values in the spreadsheet and then try to `Import` those rows again.
- For example, you might discover an "unparsed tail" error for a given cell. Sometimes it indicates there is an encoding (diacritic) issue or a hidden character. Try retyping the value for that cell and then re-import the errored row.
- In the `Import` pop-up, note you can select `Retry errored records` where you've changed the data in the relevant cells, and then click `Start import`.
- You can always download your original dataset.
Sample Datasets
We offer five example datasets (in various file formats) differing in complexity and source (e.g. one comes from the DwC-A file of a Plazi Treatment Bank treatment). Please use them to try out the DwC-A Checklist Importer and as models for your own dataset tests and uploads.
Simplest Basic Checklist
This dataset inserts a genus and 5 species in that genus. We provide this sample dataset in 3 file formats, csv, txt, xlsx. It was used to upload names into an empty project (no records in the database).
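The structure of such a "genus plus 5 species" dataset can be sketched programmatically. The genus and epithets here are hypothetical placeholders, not the names in the actual sample files linked above:

```python
genus = "Aus"                                   # hypothetical genus
species_epithets = ["bus", "cus", "dus", "eus", "fus"]  # hypothetical species

# Genus row first; its taxonID ("1") becomes the parent of every species.
rows = [{"taxonID": "1", "parentNameUsageID": "", "acceptedNameUsageID": "1",
         "scientificName": genus, "taxonRank": "genus"}]
for i, ep in enumerate(species_epithets, start=2):
    rows.append({"taxonID": str(i), "parentNameUsageID": "1",
                 "acceptedNameUsageID": str(i),
                 "scientificName": f"{genus} {ep}", "taxonRank": "species"})
```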
Tips
Import Settings did not seem to matter in this case since we were not trying to match on any existing names in the database.
A Published Genus with many new species
In this use case, we take advantage of the Darwin Core Archive-formatted treatment files that Plazi produces when it pulls names out of existing published literature. With these treatment files you need to add or adjust very few field (term) headers, and the identifiers you need are already in place. This dataset adds 300 names: one new genus and 299 new children of that genus. We tested the case where the validly published genus was NOT already in the database, and we also tested how to match on an existing genus already in the database. See the process below.
If you are adding new children to an existing genus in the database, then be sure to
- Use the `Settings` option to match on existing names in the database. Note well that in order to match on existing names, the `scientificName` string and `scientificNameAuthorship` in the dataset must match the database.
Here is one simple version (derived from Plazi Treatment Bank taxa.txt from inside the DwC-A file for a given treatment). This file will import 300 names. NOT all fields in this file are imported.
From the original taxa.txt file
- we removed all the synonyms, just leaving new species
- we added a row for the Genus, Galeopsomyia, to match the parent in the TW database
- in the genus row, we put a `1` for `taxonID`, `acceptedNameUsageID`, and `originalNameUsageID`
- in the `parentNameUsageID` column we added a `1` for all the species
- for the `scientificNameAuthorship` for the genus row, we made sure to match the author name as it appears in the database
- we edited the combinationAuthor field to match the paper (there was a parsing error in the Plazi treatment, which has been fixed)
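The edits above could also be scripted. This is a sketch only: the species names are hypothetical, and it assumes (as in Plazi taxa.txt files) a `taxonomicStatus` column that lets us drop synonym rows:

```python
def prepare_plazi_rows(rows, genus_row):
    """Drop synonym rows, prepend a genus row, and point all species at it."""
    # Genus gets "1" for taxonID, acceptedNameUsageID, and originalNameUsageID.
    out = [dict(genus_row, taxonID="1", acceptedNameUsageID="1",
                originalNameUsageID="1", parentNameUsageID="")]
    for row in rows:
        if row.get("taxonomicStatus", "").lower() == "synonym":
            continue  # we removed all the synonyms, leaving new species
        out.append(dict(row, parentNameUsageID="1"))  # all species -> genus
    return out

# Hypothetical rows standing in for the parsed taxa.txt contents.
species = [
    {"taxonID": "37", "scientificName": "Galeopsomyia alpha",
     "taxonomicStatus": "valid", "parentNameUsageID": ""},
    {"taxonID": "38", "scientificName": "Galeopsomyia beta",
     "taxonomicStatus": "synonym", "parentNameUsageID": ""},
]
genus = {"scientificName": "Galeopsomyia", "taxonomicStatus": "valid"}
prepared = prepare_plazi_rows(species, genus)
```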
Dataset
- So, if you have names to upload, it can pay to check Plazi Treatment Bank to see if they have already parsed the names of interest from that published literature.
To test the entire scenario, have a look at the Modified taxa.txt file and try using it to import (into a sandbox account). Columns not recognized by the importer will be ignored.
Note there are other useful files in the Plazi treatment DwC-A package
- The references.txt file specifies the page numbers for each new taxon name.
  - With some work, we could adjust the importer to add or match on an existing source.
  - We could imagine, on import, adding a citation for that name inside that source on the specific page.
- With the multimedia.txt data we could link to images (figures) that Plazi processing has deposited in Zenodo as part of creating the treatment.
- Using the occurrences.txt we could pull in data from the "materials examined" information for each specimen cited in the treatment.
Meanwhile, you can use Citations by Source to easily add the source page numbers provided in the treatment to each citation record in TW.
Tips
Do check the page numbers that the treatment file asserts to ensure the paper was parsed correctly.
Bryozoa names from a website
In this example set, we started with names we could see on the web (bryozoa.net) for the year 2008. The following files differ only in file format; each will import 171 names. Note that to create this file, we had to create the identifier columns (`taxonID`, `acceptedNameUsageID`, `parentNameUsageID`, and `originalNameUsageID`). (Some testing suggests that you can leave `originalNameUsageID` empty and the upload will work; the column must be present, however.)
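When names come from a website rather than a database, the identifier columns have to be invented. One way to do that, sketched here with hypothetical names and an assumed fixed rank hierarchy, is to number the rows and track the most recent ID seen at each rank so children can point at their parent:

```python
rank_order = ["order", "family", "genus", "species"]  # assumed hierarchy

def assign_identifiers(names):
    """names: (rank, scientificName) pairs in top-down order."""
    rows, last_id_at_depth = [], {}
    for i, (rank, name) in enumerate(names, start=1):
        depth = rank_order.index(rank)
        parent = last_id_at_depth.get(depth - 1, "")  # most recent shallower ID
        rows.append({"taxonID": str(i), "parentNameUsageID": parent,
                     "acceptedNameUsageID": str(i), "originalNameUsageID": str(i),
                     "scientificName": name, "taxonRank": rank})
        last_id_at_depth[depth] = str(i)
    return rows

rows = assign_identifiers([
    ("family", "Membraniporidae"),
    ("genus", "Membranipora"),
    ("species", "Membranipora membranacea"),
])
```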
More Complex Checklists
Delving into more complex scenarios (synonyms, for example), here are some examples to look at as you plan your name upload strategy. Note that in these datasets the names existed in a source database, so the identifiers came from that database. This set of upload test files comes from work done by the developer who wrote the Checklist Importer code.
Source data from Checklist Bank
See recent work showing how you can use / modify datasets from Checklist Bank for importing into TaxonWorks.
Occurrence Data
To upload occurrence data, TW offers the ability to use a DwC Archive file format. For occurrences, the importer is presently limited to vouchered specimen data records.
To use this approach you must have your specimen data in a single spreadsheet-style format that can be exported as "CSV".
Preparing for an import follows the following general procedures:
- Map your data (provide a column header) for each column of data to be imported
- Configure TaxonWorks for your DwC import by creating records that will be used during the import process
Tips
As part of your process you may need to go back and forth between mapping and configuring
Map your data
The DwC importer provides flexibility in importing diverse data. Column headers fall into several types:
- DwC terms
- User customizable data attributes
- User customizable biocuration classes
- TaxonWorks' model specific attributes
As headers, these will look like this:
| catalogNumber | TW:DataAttribute:CollectionObject:color | caste | TW:CollectingEvent:verbatim_collectors |
|---|---|---|---|
| A DwC term mapping | A user customizable data attribute | A TW biocuration attribute | A TW specific attribute |
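The four header types above can coexist in one file. A sketch of such a header row, where the predicate `color`, the biocuration group `caste`, and the example values are hypothetical:

```python
headers = [
    "catalogNumber",                            # DwC term
    "TW:DataAttribute:CollectionObject:color",  # user data attribute (predicate)
    "caste",                                    # biocuration group column
    "TW:CollectingEvent:verbatim_collectors",   # direct TW model field
]
row = ["INHS 123", "red", "worker", "J. Smith"]  # hypothetical values

# Tab-separated, one header line plus one data line.
tsv = "\t".join(headers) + "\n" + "\t".join(row) + "\n"
```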
Tips
A first step is to go through your data and figure out which column header types you'll need. Start by matching to supported DwC terms, then go on from there.
DwC term mapping
When going from DwC, a flat format, to TaxonWorks you're moving your data from rows to Things. We can group the DwC terms into classes to reflect where they end up in TaxonWorks.
Of the terms described below, the three required for occurrence data import are occurrenceID, scientificName, and basisOfRecord.
Record-level class
| Term | Mapping |
|---|---|
type | It is checked that it equals PhysicalObject before allowing the record to be imported. If the value is empty or term not present it is assumed it is a PhysicalObject |
institutionCode | Selects the repository for the specimen that is registered with an acronym equal to this value |
collectionCode | Paired with institutionCode it is used to select the namespace for catalogNumber from a user-defined lookup table in import settings, the value itself is not imported. |
basisOfRecord | It is checked that it equals an expected valid value for term, e.g. PreservedSpecimen or FossilSpecimen before allowing the record to be imported. If the value is empty it is assumed it is a PreservedSpecimen. For compatibility with GBIF datasets, PRESERVED_SPECIMEN is also allowed. |
Occurrence class
| Term | Mapping |
|---|---|
occurrenceID | Must be unique within your import; for reference/filtering purposes preferably universally unique, though that is not required. |
catalogNumber | The identifier value for a Catalog Number local identifier. The namespace is selected from the namespaces lookup table in import settings queried by institutionCode:collectionCode pair. If you require several records to share the same Catalog Number identifier, you may do so by enabling Containerize specimen with existing ones when catalog number already exists import setting or by distinct recordNumber value. |
recordNumber | The identifier value for Record Number local identifier. If not empty the record requires to have the short name of the Namespace to use in a TW-specific column named TW:Namespace:RecordNumber. This DwC term enables the re-use of the same catalogNumber of both existing collection objects and records in the dataset, as the importer assigns related specimens to a container to allow sharing the same Catalog Number identifier. |
recordedBy | It is imported as-is in verbatim collectors field of the collecting event. Additionally, the value is parsed into people and assigned as collectors of the CE. Previously existing people are not used unless the data origin is the same dataset the record belongs to, otherwise any missing people are created. |
individualCount | The total number of entities associated with the specimen record (e.g. this record may be for a "lot" containing 6 objects). |
sex | Selects the biocuration class from the "sex" biocuration group to be assigned as biocuration classification for the specimen. |
preparations | Selects an existing preparation matching the name with this value. |
Event class
| Term | Mapping |
|---|---|
eventID | The identifier for the Collecting Event. If not empty the importer requires a Namespace for it. You may specify a Namespace in a TW-specific column named TW:Namespace:EventID by either using a global identifier type (e.g. Identifier::Global::Uuid, Identifier::Global::Lsid, etc.), or the short name of the Namespace for the Event local identifier. If no namespace is provided, the importer assigns a dataset-specific one with a synthetic name that you can later change. When an existing Collecting Event already has this identifier, the importer re-uses it and the event-related data is ignored. |
fieldNumber | The identifier value for Field Number local identifier. If not empty the record requires to have the short name of the Namespace to use in a TW-specific column named TW:Namespace:FieldNumber. The verbatim trip identifier is also populated by this DwC term. When an existing Collecting Event already has this identifier, the importer re-uses it and the event-related data is ignored. IMPORTANT: if a Collecting Event is already matched by eventID, this identifier must exactly match the existing one, otherwise the importer will reject the record. Same rejection will occur if mismatch happens the other way around. |
eventDate | The ISO8601-formatted date is split into start year, month and day collecting event fields. If the value is composed of two dates separated by /, then rightmost date is used as end date and split in the same way as start date. If data contradicts dates from other non-empty date-related terms the record will fail to import |
eventTime | Time is split into time start hour, minute, and second of collecting event |
startDayOfYear | Using year and the value for this term month and day are calculated and stored in start year, month, and day collecting event fields. If the computed value contradicts dates from other non-empty date-related terms the record will fail to import. |
endDayOfYear | Using year and the value for this term month and day are calculated and stored in end year, month and day collecting event fields. If the computed value contradicts dates from other non-empty date-related terms the record will fail to import. |
year | The start date year of the collecting event. If the value contradicts dates from other non-empty date-related terms the record will fail to import |
month | The start date month of the collecting event. If the value contradicts dates from other non-empty date-related terms the record will fail to import. |
day | The start date day of the collecting event. If the value contradicts dates from other non-empty date-related terms the record will fail to import |
verbatimEventDate | Verbatim date of the collecting event |
habitat | Verbatim habitat of the collecting event |
samplingProtocol | Verbatim method of the collecting event |
fieldNotes | Field notes of the collecting event |
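The `eventDate` handling described above (an ISO 8601 date, optionally a `start/end` range, split into year, month, and day fields) can be sketched like this. This is an illustration of the described behavior, not the importer's actual code:

```python
def split_event_date(event_date):
    """Split an ISO 8601 eventDate (optionally "start/end") into Y/M/D parts."""
    start, _, end = event_date.partition("/")  # rightmost date is the end date

    def ymd(d):
        parts = (d.split("-") + ["", ""])[:3]  # pad year-only or year-month dates
        return {"year": parts[0], "month": parts[1], "day": parts[2]}

    result = {"start": ymd(start)}
    if end:
        result["end"] = ymd(end)
    return result

single = split_event_date("1999-03-07")
ranged = split_event_date("1999-03-07/1999-04-01")
```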
Location class
| Term | Mapping |
|---|---|
fieldNumber | Verbatim trip identifier of collecting event |
Identification class
| Term | Mapping |
|---|---|
identifiedBy | A list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject. If possible, separate the values in a list with space vertical bar space `\|` (known as a pipe). (e.g. Theodore Pappenfuss \| Robert Macey) |
dateIdentified | The date on which the subject was determined as representing the Taxon. Best practice is to use a date that conforms to ISO 8601-1:2019 see examples. |
Taxon class
| Term | Mapping |
|---|---|
nomenclaturalCode | Selects the nomenclatural code for the taxon ranks used when creating protonyms. The value itself is not imported |
kingdom | Creates (unless already present) a protonym at kingdom rank |
phylum | Creates (unless already present) a protonym at phylum rank |
class | Creates (unless already present) a protonym at class rank |
order | Creates (unless already present) a protonym at order rank |
family | Creates (unless already present) a protonym at family rank |
genus | Ignored. Extracted from scientificName instead |
subgenus | Ignored. Extracted from scientificName instead |
specificEpithet | Ignored. Extracted from scientificName instead |
infraspecificEpithet | Ignored. Extracted from scientificName instead |
scientificName | Several protonyms created (only when not present already) with their corresponding ranks and placements |
taxonRank | The taxon rank of the most specific protonym |
higherClassification | Several protonyms created (only when not already present) with their corresponding ranks and placement. In case a protonym was not already present, only family-group names will be created; names with classification higher than family-group not previously registered will result in error. Names at genus rank or lower are ignored and extracted from scientificName instead |
scientificNameAuthorship | Verbatim author of most specific protonym |
TaxonWorks mappings
The DwC importer task includes some TW-specific mappings that are neither DwC core terms nor in any DwC extension term list. Instead, they are direct mappings to predicates in your project (imported as data attributes for collection objects and collecting events), to biocuration groups and classes, and, as an advanced-use feature, direct mappings to model fields.
Warning
If submitting an actual DwC-A zip file rather than a tab-separated text file or spreadsheet, these TW-specific mappings have to be placed as headers in the core table, not in meta.xml. If you are replacing a mapping from meta.xml, make sure to comment it out, and if inserting columns, make the appropriate adjustments to avoid collisions.
See Configure TaxonWorks for your DwC import for how to create the records referenced in these mappings.
Mappings to project predicates
In cases where you need to import predicate values targeting the imported collection object or collecting event, you may do so by naming the column with a pattern like TW:DataAttribute:<target_class>:<predicate_identifier>. <target_class> may be CollectionObject or CollectingEvent, and <predicate_identifier> may be either the name of the predicate or its URI. As an example, if you have a predicate registered with name ageInDays and URI http://rs.gbif.org/terms/1.0/ageInDays, both TW:DataAttribute:CollectionObject:ageInDays and TW:DataAttribute:CollectionObject:http://rs.gbif.org/terms/1.0/ageInDays can be used to refer to the same predicate.
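The two equivalent header forms can be sketched as follows, reusing the `ageInDays` example from the text; the `predicate_identifier` helper is a hypothetical illustration of how the prefix and identifier relate, not a TaxonWorks function:

```python
name = "ageInDays"
uri = "http://rs.gbif.org/terms/1.0/ageInDays"

# Both headers refer to the same registered predicate.
by_name = f"TW:DataAttribute:CollectionObject:{name}"
by_uri = f"TW:DataAttribute:CollectionObject:{uri}"

def predicate_identifier(header):
    """Strip the TW:DataAttribute:<target_class>: prefix from a column header."""
    # maxsplit=3 keeps any colons inside a URI identifier intact.
    return header.split(":", 3)[3]
```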
Mappings to biocuration groups and classes
The importer is able to map sex into the appropriate biocuration group and select the appropriate class according to the value. For additional mappings you may use a special column name pattern to select a biocuration group, like TW::BiocurationGroup:<group_identifier>, where <group_identifier> can be the name of the biocuration group or its URI. In addition, the values must match an existing biocuration class; you may use either its name or URI. For example, if you have a biocuration group registered with name Caste and URI urn:example:ants:caste and a biocuration class with name Queen and URI urn:example:ants:caste:queen, the following examples all create the same biocuration classification:
| Caste | urn:example:ants:caste |
|---|---|
| Queen | urn:example:ants:caste:queen |
| urn:example:ants:caste:queen | Queen |
Mappings to DwC predicates
Whenever the importer sees that your project has custom attributes for collecting events and/or collection objects that match Darwin Core URI terms (http://rs.tdwg.org/dwc/terms/<term>), they will be imported as data attributes regardless of any existing mapping of the same field. This allows you to preserve verbatim dataset values for reference, and also to import data from terms not supported by the importer.
Direct mapping to TW model fields
This is an advanced mapping and requires knowledge of the underlying TW models. The pattern is TW:<model_class>:<field>, where <model_class> can be either CollectionObject or CollectingEvent, and <field> can be one of those listed below.
| Class | fields |
|---|---|
| CollectionObject | buffered_collecting_event, buffered_determinations, buffered_other_labels, total |
| CollectingEvent | document_label, print_label, verbatim_label, field_notes, formation, group, lithology, max_ma, maximum_elevation, member, min_ma, minimum_elevation, elevation_precision, start_date_day, start_date_month, start_date_year, end_date_day, end_date_month, end_date_year, time_end_hour, time_end_minute, time_end_second, time_start_hour, time_start_minute, time_start_second, verbatim_collectors, verbatim_date, verbatim_datum, verbatim_elevation, verbatim_geolocation_uncertainty, verbatim_habitat, verbatim_latitude, verbatim_locality, verbatim_longitude, verbatim_method, verbatim_trip_identifier |
Configure TaxonWorks for your DwC Occurrence data import
To import your DwC data you may need to create several types of things in TaxonWorks, including namespaces and controlled vocabulary terms.
Namespaces
In the context of the DwC importer namespaces allow TW to
- Assign an Identifier as a CatalogNumber
- Track uniqueness of each object during the import, helping TW to normalize your data, turning it from rows to Things
- Group your Identifiers (and therefore the CollectionObjects they reference) as coming from a specific place
Controlled vocabulary terms
There are several kinds of CVTs that may be used in the import process.
Tips
All CVTs are created and managed via the Manage controlled vocabulary terms task.
Predicates
Think of Predicates as your custom column headers. Predicates are referenced in DataAttributes. Use a Predicate when you want to assign many different values (have rows with many different values) under one heading.
Biocuration classes
Think of biocuration classes as custom attributes for your collection objects, things like 'male', 'pupa', or 'larva'. These let you assign values useful for your curation of your specimens in a controlled way, ensuring problems like 'M.', 'MALE', 'ale' don't happen in what might otherwise be a "Sex" field. [TODO: reference groups?]. This approach is used when your rows have only a few specific values across the dataset.
Unmapped columns
Column headers that can't be linked via one of the 3 mechanisms are ignored during the import process. This means it's important to do some trial runs in a sandbox, or with a smaller dataset to see that your values are mapping over. The Browse collection object task is a good place to check this.
Tips
You can augment your data after import with batch update functionality inside TW. Careful planning can lead to a more efficient overall import process. Sometimes it's easier to work in spreadsheets, sometimes within a database.
Drag and drop
Drag and drop loading of images and documents is accessible in various places, including the Radial annotator and, notably, Tasks -> New image.
Record by record
When first learning TaxonWorks, entering records one-at-a-time offers you the opportunity to learn about more of the features in TW and get a feel for how you and others experience the UI.
For example, suppose you want to enter a specimen record. Two Tasks enable you to do this: the Comprehensive Specimen Digitization Task and the Simple New Specimen Task.
Try Simple New Specimen
In your project, try creating a simple new specimen record.
Note you will need to select a namespace, and you may find you need to add a namespace before you can complete this TW task. Adding a value for namespace ensures your uploaded data records will be unique inside your TW project and across TW projects. You may also need more than one namespace in your project. [Use Tommy’s INHS Insect Collection as an example, with 12 different namespaces that effectively group the various collections housed at INHS ENT].
If you tried the OTU batch loader you can pick one of your OTUs for the name to assign to this specimen.
Add an image if you wish
Select the Preparation type for this specimen. You may need to add a new value to the dropdown using the New preparation type task.
Coming from other software
Scratchpads
We are in the process of exploring two routes to come from Scratchpads to TaxonWorks.
- The DwC import should work well for occurrence data that is based on collected objects.
- The SFG team has worked with a select number of individual Scratchpad curators to script the process of transferring their data. Contact us if you are interested in what this approach entails. Note that this process takes programming effort, which is a limited resource within the SFG.
Importing shapes (GIS)
Background
TaxonWorks comes with its own fixed set of geographic shapes, called Geographic Areas. These consist solely of geopolitical shapes/boundaries at the county, state/province, and country levels. Frequently, though, individuals need more specialized shapes: for example, a national park in which they study, a particular water body, or one particular island. For these needs TaxonWorks has Gazetteers, which are individual named shapes created/imported as needed by users for their own use cases. Gazetteers:
- have a name, such as "Mediterranean Sea" or "Awaji Island" (rarely will these be geopolitical names, as opposed to Geographic Areas)
- have a shape/boundary which is either created by the user or imported from a data file (typically a shapefile)
- are project-level objects, as opposed to Geographic Areas which are community data
Creating Gazetteer shapes
The New Gazetteer task
We'll only touch briefly on the New Gazetteer task, which is geared more toward user-drawn shapes:
The name field is required, ISO 3166 A2/A3 are optional. You can see here that we've selected 2 of our projects to add the Gazetteer we're creating to.
Warning
You can't currently copy shapes between projects after importing (though that option will be available in the future), so this is your one chance to do so.
Creation options are:
- From Leaflet: draw a shape on a map using your mouse
- WKT coordinates: import a shape using the well-known-text format
- Enter coordinates of a point
- Add an existing Geographic Area or Gazetteer to your shape, by union or intersection
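For the WKT option, a shape is pasted as well-known text. Here is a sketch of a simple WKT polygon (a hypothetical rectangular outline, not a real boundary); note that WKT coordinates are ordered longitude then latitude, and the ring closes by repeating its first vertex:

```python
# A closed four-sided ring: five vertices, first == last.
wkt = ("POLYGON ((134.8 34.2, 135.0 34.2, 135.0 34.6, "
       "134.8 34.6, 134.8 34.2))")

# Pull the vertex list back out to illustrate the structure.
coords = wkt[len("POLYGON (("):-2].split(", ")
```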
The Import Gazetteers task
The Import Gazetteers task allows you to import many shapes at once with whatever precision the original shapes were created with, via a shapefile. Typically in this case the shapefile will be something you or a colleague found online and then downloaded.
Tips
At this time we only support import of shapefiles. If you find the shapes you would like to import in a different format, you can likely use software such as ArcGIS (commercial), QGis, or GDAL (more advanced) to convert to the shapefile format.
Warning
We strongly encourage you to perform an initial test import on a sandbox to make sure everything turns out as expected; if something goes wrong partway through the import process and you have to run it multiple times, you may end up with duplicated or missing shapes.
Tips
As with Taxon Name imports, it can be helpful to examine your gazetteer data before attempting an import. If you have experience with GIS software such as ArcGIS (ESRI) or QGIS, checking your data up front can save time. Look for issues such as:
- shapes with invalid geometry
- a column that provides names for your shapes (it can be called anything; it doesn't need to be "name")
- duplicate names on different shapes
- missing shapes or names
- misspelled names or names with extra information you're not interested in
Some of these issues can be dealt with in an ad hoc manner in TaxonWorks, but they will likely be easier to resolve in GIS software designed precisely for such data.
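The duplicate-name check in particular is easy to script. The sketch below assumes you have already extracted the name column from the shapefile's attribute table (for example with your GIS software or a library such as pyshp; that extraction step is not shown). The sample list mimics the unaltered realms shapefile discussed below, where three regions appear as two same-named polygons:

```python
from collections import Counter

# One entry per shape, taken from the shapefile's name column.
# (Hypothetical sample data: three realms deliberately appear twice,
# as in the unaltered realms shapefile.)
names = ["Afrotropic", "Antarctic", "Australasia", "Indomalaya",
         "Nearctic", "Nearctic", "Neotropic",
         "Oceanic", "Oceanic", "Palearctic", "Palearctic"]

counts = Counter(names)
duplicates = {name: n for name, n in counts.items() if n > 1}
print(duplicates)  # any entries here need attention before importing
```

An empty result means every shape has a unique name; anything else is worth resolving (by renaming or by unioning shapes) before you import.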
For these instructions we'll go through the process of importing the 8 Biogeographical Realms of the World as provided by a shapefile we can download online. The original data link is https://data-gis.unep-wcmc.org/portal/home/item.html?id=f196d94a226d430fa214947d51dad35a but we won't be working directly with that downloaded file since it has one of the data issues mentioned above: the shapes for Nearctic, Oceanic, and Palearctic each cross the antimeridian (180° longitude), so that shapefile provides each of those regions as two polygons, both with the same name.
In our case we'd like to be able to filter by the entire region using just one polygon, not two, so we have two choices:
- Import that shapefile as is and then use the Create or Edit Gazetteers task to create the union of each of those pairs (followed by deleting the originals)
- Perform those unions in ArcGIS, QGIS, or similar software and then import that new shapefile instead
We'll go with option 2 here, where we've already created the new shapefile for you, available here. Download that shapefile and unzip it - TaxonWorks doesn't currently support importing the zip file directly.
Importing shapefile files
Shapefiles spread their data over several different files - the ones mandatory for import into TaxonWorks are the .shp, .shx, .dbf, and .prj files (visit the Wikipedia link above for more on what each of those files contributes).
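In other words, the requirement amounts to checking that the .shp file's required sibling files sit alongside it. A minimal sketch (the file names are hypothetical):

```python
from pathlib import Path

REQUIRED = {".shp", ".shx", ".dbf", ".prj"}  # mandatory for TaxonWorks import
OPTIONAL = {".cpg"}                          # include it when you have one

def missing_sidecars(shp_path):
    """Return the required sibling extensions missing for a given .shp file."""
    base = Path(shp_path)
    present = {ext for ext in REQUIRED if base.with_suffix(ext).exists()}
    return sorted(REQUIRED - present)

# e.g. missing_sidecars("realms.shp") returns [] only when all four
# files (realms.shp, realms.shx, realms.dbf, realms.prj) are alongside.
```

If anything is missing from your download, go back to the source: the import cannot proceed without all four files, and .prj in particular (the projection definition) is sometimes distributed separately.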
The easiest way to get our shapefile files into TaxonWorks is to click on the New tab in the 'Shapefile documents' section:
Either drag your files into the drop region, or click in the drop region and select your files in the file selector (you can select them all at once by clicking the first one, then shift-clicking the last one). You should now see the red indicators turn green (along with a yellow indicator for the .cpg file, which is optional but should be included when you have one):
Note that your required shapefile documents have been auto-selected for you: they appear below the selector in rows with trash can icons.
Selecting the Name column
Shapefile attribute data can be thought of much like spreadsheet data: as columns and rows. We need to tell TaxonWorks which column holds the names of our shapes - to do so, click 'Select from shapefile fields' (i.e. columns) for the shapefile field containing the Gazetteer names:
You'll see the list of all of the shapefile columns that might be your name column:
These names were created by whoever made the shapefile, not necessarily with you in mind. In this case the likely choices are Realm and RealmCode, and since RealmCode sounds like a shorthand (it is), we'll try Realm: click 'Realm' and then 'Select name field'.
Tips
See the preview step described below for checking that the field you selected contains the data you expect.
The name field is required for gazetteers; the next two, ISO 3166 A2 and A3, are not. Our shapefile doesn't include them, so we'll skip them.
Selecting a source to cite your imported gazetteers
Next you can enter a Source (e.g. the paper in which this data was described) to cite on each gazetteer you create. We'll pass this time.
Importing gazetteers to multiple projects
You also have the opportunity to choose which project(s) you'd like to import your shapes into - all of the projects you're currently a member of should be available as options:
Warning
You can't currently copy shapes between projects after importing (though that option will be available in the future), so this is your one chance to do so.
Previewing your import data
Before importing, you can check that the choices you made above actually yield reasonable-looking data. It's always a good idea to at least glance at the preview before importing. Here we see:
We see the eight expected regions, each with a single shape - that's what we want.
Note that TaxonWorks does not include a preview of the shapes themselves.
Warning
If we had worked with the original unaltered shapefile instead then at this point we would see Nearctic, Oceanic, and Palearctic each listed twice, each with a count of 2: a red flag that there was an issue requiring our attention.
Tips
The preview rows are ordered by the Name column. The first column of the preview, Record number, gives the row number of each name in the shapefile, in case you need to go back and make changes to your shapefile.
Running the import job
The preview contains the data we're expecting, so click on Process shapefile to submit the import request. Imports run in the background: you should now see your job listed in the Import Job Status section:
Press the refresh button to track your job's status as it runs. Very large, highly detailed shapes have been known to take tens of minutes to import; less detailed shapes can take under a second.
Here we see the job status is Completed, and we've imported all 8/8 shapes with no errors in about 2 seconds:
Checking the results
Click on the Gazetteers link to see a list of some/all of your new Gazetteers, and click one to check the imported shape:
Note that the Palearctic shape here displays as two pieces, split at the antimeridian, as expected.