Let's start the discussion towards creating a universal standard for delivering genealogical source content in digital formats...


Existing Data Models

What existing data models (MARC, MODS, etc.) can we use as a pattern for GenContent? There are existing models, and much discussion has already gone further than the list below. See the BetterGEDCOM wiki for analysis and discussion. FHISO is now taking this forward as a managed project with consensus-building. [Tony Proctor (FHISO)].

[Josh - we need to clearly identify how GenContent is different than the GEDCOM format and how they interrelate. Does anyone want to take a stab at it?]
[Robert - Please refer to the short description of the GEDCOM source model and the BetterGEDCOM description of the GEDCOM source model.]

Some Essential Element Areas
While far from complete, the areas enumerated below are essential to the success of the universal schema among content providers. A key component of the structure will be the integration and addition of the genealogical module. The following are meant to serve as a starting point for the discussion:

1.0 Titles: Incorporating a record's official title as well as the addition of a standard referenced title to ensure consumer accessibility.
1.1 - Authorized Title
1.2 - Published Title
1.3 - FHL Title (is this field needed?) / A good question. Because so much of the data we use can be found at the FHL, it might be a nice crossover into the FHLC and other tools to help users find the data more easily. What are others' thoughts on this?
1.4 - Variant Title (can be repeated)
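
Comment (12) below points out that titles are not unique and suggests pointing back to the FHLC by its titleID instead. The Python sketch below illustrates why that matters for a lookup. The class name, field names, and sample IDs are all invented for illustration and are not part of any agreed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceTitles:
    """Hypothetical container for the 1.0 title elements."""
    authorized_title: str                    # 1.1
    published_title: Optional[str] = None    # 1.2
    fhlc_title_id: Optional[int] = None      # assumed stand-in for 1.3
    variant_titles: List[str] = field(default_factory=list)  # 1.4


# Two different record sets that happen to share a title (made-up data).
records = [
    SourceTitles("Parish registers", fhlc_title_id=1001),
    SourceTitles("Parish registers", fhlc_title_id=1002,
                 variant_titles=["Church records"]),
]

# Keying by ID keeps both records; keying by title silently loses one.
by_id: Dict[int, SourceTitles] = {
    r.fhlc_title_id: r for r in records if r.fhlc_title_id is not None
}
by_title: Dict[str, SourceTitles] = {r.authorized_title: r for r in records}

print(len(by_id), len(by_title))  # 2 1
```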

2.0 Author/Creator: The author/creator of the record set, including provisions for individuals, unknown authors, government entities, and others.
This element should probably use a controlled vocabulary.

3.0 Coverage
3.1 Coverage Date(s): Providing for an accurate description of the dates (likely years) included in the record. [Can be repeated]
3.1.1 - Earliest Record
3.1.2 - Latest Record
3.1.3 - Known Missing Dates [?]
3.2 Coverage Locality: Standardizing the localities applicable to the document. [Can be repeated]
This element should probably refer to a standard place authority in some way.
3.2.1 - City/Township
3.2.2 - County
3.2.3 - State/Province
3.2.4 - Country
3.2.5 - Geocoding
3.3 Coverage Type(s): Categorizing the types of records in the record set. [Can be repeated]
This element should probably use a controlled vocabulary.
3.3.1 - Civil Birth
3.3.2 - Civil Marriage
3.3.3 - Civil Death
3.3.4 - Divorce
3.3.5 - Church
3.3.6 - Cemetery
3.3.7 - Bible
3.3.8 - Funeral Home
3.3.9 - Pauper/Poor House
3.3.10 - Passenger List
3.3.11 - Court
3.3.12 - Naturalization
3.3.13 - Slavery/Freedmen
3.3.14 - Hospital
3.3.15 - Military
3.3.16 - Land/Property
3.3.17 - State Census
3.3.18 - Native Race
3.3.19 - Tax
3.3.20 - Notarial
3.3.21 - Obituary/Newspaper Clipping
3.3.22 - Voter Registration/Voting/Poll Tax
3.3.23 - Probate
3.3.24 - Pension
3.3.25 - Other Vital
3.3.26 - School
3.3.27 - Oral History
3.3.28 - Genealogical Collection
3.3.29 - Lineage Society
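
As a rough illustration of how the coverage elements in 3.0 might fit together, the Python sketch below models one coverage entry. All class and field names (Coverage, CoverageLocality, CoverageType, missing_years) are assumptions made for discussion only, the sample data is invented, and only a few of the record types from 3.3 are listed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CoverageType(Enum):
    """Hypothetical controlled vocabulary for 3.3 (subset only)."""
    CIVIL_BIRTH = "Civil Birth"
    CIVIL_MARRIAGE = "Civil Marriage"
    CIVIL_DEATH = "Civil Death"
    PROBATE = "Probate"


@dataclass
class CoverageLocality:
    """3.2: one locality; every level is optional."""
    city: Optional[str] = None     # 3.2.1
    county: Optional[str] = None   # 3.2.2
    state: Optional[str] = None    # 3.2.3
    country: Optional[str] = None  # 3.2.4


@dataclass
class Coverage:
    earliest_year: int                                        # 3.1.1
    latest_year: int                                          # 3.1.2
    missing_years: List[int] = field(default_factory=list)    # 3.1.3
    localities: List[CoverageLocality] = field(default_factory=list)
    types: List[CoverageType] = field(default_factory=list)

    def covers_year(self, year: int) -> bool:
        """True if the year is in range and not known to be missing."""
        in_range = self.earliest_year <= year <= self.latest_year
        return in_range and year not in self.missing_years


example = Coverage(
    earliest_year=1850,
    latest_year=1900,
    missing_years=[1863, 1864],
    localities=[CoverageLocality(county="Example County", country="USA")],
    types=[CoverageType.CIVIL_BIRTH, CoverageType.CIVIL_DEATH],
)

print(example.covers_year(1870))  # True
print(example.covers_year(1863))  # False
```

Using an enumeration for the record types is one simple way to get the controlled vocabulary mentioned under 3.3; a real schema would more likely reference an external authority list.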

4.0 Original Repository: The original, official repository of the data set, such as a government archive or civil repository.
Should this element use any sort of a controlled vocabulary?
4.1 - Company/Organization
4.1.1 - Company Name
4.1.2 - Address (Street)
4.1.3 - Address (City)
4.1.4 - Address (State/Province)
4.1.5 - Address (Country)
4.1.6 - Address (Zip Code)
4.2 - Collection Title
4.3 - Original Call Number: Reference to microfilmed or printed versions of the data sets including call numbers from major repositories such as the Family History Library, National Archives and Records Administration, etc.

5.0 Subject
5.1 Description: Description of any secondary information about this source: the how, what, why, and where.
5.2 Authorized Headings (draw from LOC authorities?) [Can be repeated]

6.0 Data Owner/Copyright: Detailing the owner of the data, and the holder of its copyright (for all levels, as needed).
6.1 - Copyright Status
6.2 - Copyright Holder

7.0 Original Citation: Citations of the original data set in widely accepted formats, such as the Chicago Manual of Style and Evidence Explained.
7.1 - Chicago
7.2 - Evidence Explained
7.3 - Turabian

8.0 Content Holding: Providing key details regarding the content's current online/offline location(s) and its provider(s). [Can be repeated]
8.1 - Holding Type - (What type of holding is this?)
8.1.1 - Original
8.1.2 - Copy of Original
8.1.3 - Digital Image
8.1.4 - Microfilm
8.1.5 - Microfiche
8.1.6 - Transcription
8.1.7 - Index
This element should probably use a controlled vocabulary.
8.2 - Holding Company/Organization - (
8.2.1 - Holding Company Name
8.2.2 - Address (Street)
8.2.3 - Address (City)
8.2.4 - Address (State/Province)
8.2.5 - Address (Country)
8.2.6 - Address (Zip Code)
8.2.7 - URL (Corporate Page)
8.3 - Holding Title
8.4 - Date of Creation
8.5 - File Format
8.6 - Digitization Method [???]
8.7 - Holding Call Number
8.8 - Holding Citation
8.9 - Holding URL (Page to search online repository)
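
To make the repeatable holding element in 8.0 more concrete, here is a minimal Python sketch of a list of holdings with the holding type drawn from a fixed vocabulary. The names Holding, HoldingType, and holdings_of_type, along with the sample organizations, call number, and URL, are hypothetical and only a few of the 8.x fields are shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class HoldingType(Enum):
    """Controlled vocabulary taken from the 8.1 list above."""
    ORIGINAL = "Original"
    COPY_OF_ORIGINAL = "Copy of Original"
    DIGITAL_IMAGE = "Digital Image"
    MICROFILM = "Microfilm"
    MICROFICHE = "Microfiche"
    TRANSCRIPTION = "Transcription"
    INDEX = "Index"


@dataclass
class Holding:
    holding_type: HoldingType          # 8.1
    organization: str                  # 8.2.1
    title: Optional[str] = None        # 8.3
    call_number: Optional[str] = None  # 8.7
    url: Optional[str] = None          # 8.9


def holdings_of_type(holdings: List[Holding], wanted: HoldingType) -> List[Holding]:
    """Return only the holdings of the requested type."""
    return [h for h in holdings if h.holding_type is wanted]


# Invented sample data: one source with two derivative holdings.
holdings = [
    Holding(HoldingType.MICROFILM, "Example Library", call_number="EX-0001"),
    Holding(HoldingType.DIGITAL_IMAGE, "Example Archive",
            url="https://example.org/search"),
]

online = holdings_of_type(holdings, HoldingType.DIGITAL_IMAGE)
print([h.organization for h in online])  # ['Example Archive']
```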

9.0 File Names: Must provide key information for incorporating page numbers, retakes, etc.
10.0 Language: A language of the intellectual content of the resource.

Genealogical Module: An intricate optional module encompassing genealogical data to ensure the schema can be integrated with family history software and other uses as deemed necessary by genealogists. An expanded version of this module may become the next generation of the GEDCOM format and could include specific details such as individual facts (birthdates, places, events, etc.) as well as analytical genealogical data.


Comments added:
(1) FHISO supports fully the move toward a universal standard for delivering genealogical content in digital formats. We are looking forward to your participation with us in these discussions. --The team from Family History Information Standards Organisation (FHISO)
(2) Added 'Content Type' and 'Description', don't want it to get too large, but it seemed to need those two.
(3) Josh here, began inputting elements that might need a controlled vocabulary and added some numbering to help us track and organize ideas.
(4) Would description be a mapped element to subject, or should that be an additional element? JH
(5) Titles - Should there be an addition and/or clarification to distinguish between the published title, quoted title, common variant (generic) and generic? GJ
(6) Josh here, condensed a few of the elements for clarification and also added more details to the title element per GJ.
(7) Robert - Combined those pieces of data that inform how the data covers history (Area, Time, Type). Coverage is a well-known term in FamilySearch.org
(8) Robert - Added Holding data, trying to collect everything that isn't about the original source into a list of derivatives
(9) Dallan - I wonder if OpenLibrary could be modified to suit. I'd also be happy to extend how sources are modeled on WeRelate, which is open-content (example)
(10) Josh - Dallan, that would be terrific. Can you copy over the OpenLibrary standards, etc. in a section to help us work? It would also be helpful to work with the source model for WeRelate.
(11) Dallan - I created a page for the model that WeRelate uses; I don't know much about OpenLibrary -- I'll check it out and create a page for its model over the weekend
(12) Dallan - regarding 1.3, I spent several weeks massaging and importing the FHLC data into WeRelate, and our users spent months cleaning it up afterward. A crosswalk is possible. Better to use the FHLC titleID (a unique number) than the title though; titles are not unique. We use titleID at WeRelate to point back to the FHLC.
(13) Dallan - the WeRelate source database is open-content, if people here want to use it.
(14) Dallan - I created a new page showing the OpenLibrary model as best as I could determine from their API documentation and examples
(15) GeneJ - Perhaps an aside at this stage, but a discussion about one issue in contentDM (an issue thought to derive from Dublin Core) is active on the Archives & Archivists mailing list. In brief, when a contentDM field contains no information, the practice is not to display the field (as if the field were irrelevant/null). The relevance of the issue is apparent if you consider the case of a date. To archivists and historians (incl. genealogists), knowing an item is undated is material (not an irrelevant field). Posters to the noted thread cite DACS. One writes, "Undated or n.d. intentionally placed in a field is different from a null field. DACS uses 'undated' (quotes in source - Element 2.4.16) as the prescribed term when the materials are not dated and the institution chooses not to give estimated dates. The date element is a required element for minimum descriptive records in DACS ..." (Thank you for allowing me to post. GJ)
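
The distinction raised in comment (15), between a field that was never filled in and a field deliberately marked as undated, can be sketched in Python as follows. The function name display_date and the choice to normalize "n.d." to "undated" are assumptions for illustration only, not a rule taken from DACS or contentDM.

```python
from typing import Optional

# Term used when the material is known to carry no date.
UNDATED = "undated"


def display_date(value: Optional[str]) -> Optional[str]:
    """Return the text to display, or None if the field should be hidden.

    None means nobody recorded anything, so the field is hidden.
    "undated" or "n.d." is a deliberate statement and is always shown.
    """
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.lower() in (UNDATED, "n.d."):
        return UNDATED
    return cleaned


print(display_date(None))         # None
print(display_date("n.d."))       # undated
print(display_date("1850-1860"))  # 1850-1860
```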