Extendedgedcom #153

Merged
smasongarrison merged 10 commits into dev_main from extendedgedcom on Jun 7, 2026

@smasongarrison

Member

This pull request introduces several improvements to the GEDCOM parsing functionality, focusing on more robust and flexible event parsing, better support for different GEDCOM versions, and enhanced usability. The changes include a new version detection utility, improved parsing of event subfields (especially for birth and death events), new utilities for handling GEDCOM latitude/longitude values, and updates to documentation and tests.

GEDCOM Parsing Improvements

  • The event parser in readGedcom now uses level-aware sub-block parsing instead of fixed line offsets, making extraction of DATE, PLAC, CAUS, LATI, and LONG fields more robust and compatible with GEDCOM 5.5.x and 7.x structures. Coordinates are now found regardless of their nesting under PLAC or MAP.
  • New utilities gedcomLatToNumeric and gedcomLonToNumeric convert GEDCOM-style latitude/longitude strings to signed numeric values. These are exported and documented for user convenience.
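
To illustrate what level-aware sub-block parsing means in practice, here is a minimal sketch in R. The helper names (`gedcom_level`, `event_sub_block`, `sub_block_value`) and the sample record are hypothetical and are not the package's implementation; the sketch only assumes that each GEDCOM line starts with its numeric level.

```r
# Hypothetical helpers; names and behaviour are assumptions, not BGmisc's code.

# Return the numeric level at the start of a GEDCOM line, e.g. "2 DATE ..." -> 2.
gedcom_level <- function(line) {
  as.integer(sub("^\\s*(\\d+)\\s.*$", "\\1", line))
}

# Collect the lines nested under the event that starts at index `start`:
# everything that follows until a line at the same or a shallower level.
event_sub_block <- function(lines, start) {
  event_level <- gedcom_level(lines[start])
  if (start >= length(lines)) return(character(0))
  rest <- lines[(start + 1L):length(lines)]
  deeper <- gedcom_level(rest) > event_level
  n <- if (all(deeper)) length(rest) else which(!deeper)[1L] - 1L
  rest[seq_len(n)]
}

# Look a field up by tag name inside the sub-block; NA when the tag is absent.
sub_block_value <- function(sub_block, tag) {
  pattern <- paste0("^\\s*\\d+\\s+", tag, "\\s+(.*)$")
  hit <- grep(pattern, sub_block, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  trimws(sub(pattern, "\\1", hit[1L]))
}

# Invented sample record: PLAC comes before DATE, and CAUS is missing.
record <- c(
  "0 @I1@ INDI",
  "1 BIRT",
  "2 PLAC Springfield",
  "2 DATE 1 JAN 1900",
  "1 DEAT",
  "2 DATE 2 FEB 1980"
)

birth <- event_sub_block(record, which(record == "1 BIRT"))
sub_block_value(birth, "DATE")
sub_block_value(birth, "CAUS")
```

Because fields are looked up by tag within the event's own sub-block, a reordered PLAC/DATE pair still resolves, a missing CAUS yields `NA`, and the DEAT date cannot be picked up as the birth date.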

GEDCOM Version Detection

  • Added a new internal function detectGedcomVersion to extract the GEDCOM version from file headers. The detected version is now attached as an attribute to the resulting data frame in readGedcom, and is also logged when verbosity is enabled.
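
As a rough illustration of header-based version detection, the sketch below reads the VERS line nested under GEDC inside the HEAD record and attaches it to a data frame as an attribute. `detect_version_sketch` and the sample header are hypothetical; the actual detectGedcomVersion may differ in its rules and edge-case handling.

```r
# Hypothetical sketch, not the package's detectGedcomVersion.
detect_version_sketch <- function(lines) {
  head_start <- grep("^\\s*0\\s+HEAD\\b", lines)
  if (length(head_start) == 0L) return(NA_character_)

  # The header ends where the next level-0 record begins.
  level0 <- grep("^\\s*0\\s", lines)
  next0 <- level0[level0 > head_start[1L]]
  head_end <- if (length(next0) > 0L) next0[1L] - 1L else length(lines)
  header <- lines[head_start[1L]:head_end]

  gedc <- grep("^\\s*1\\s+GEDC\\b", header)
  if (length(gedc) == 0L || gedc[1L] == length(header)) return(NA_character_)

  # Keep only the lines nested under GEDC: stop at the next level-0/1 line.
  after <- header[(gedc[1L] + 1L):length(header)]
  stop_at <- grep("^\\s*[01]\\s", after)
  if (length(stop_at) > 0L) after <- after[seq_len(stop_at[1L] - 1L)]

  vers <- grep("^\\s*2\\s+VERS\\s+", after, value = TRUE)
  if (length(vers) == 0L) return(NA_character_)
  trimws(sub("^\\s*2\\s+VERS\\s+", "", vers[1L]))
}

# Invented header: SOUR carries its own VERS, which must not be mistaken
# for the GEDCOM version.
gedcom <- c(
  "0 HEAD",
  "1 SOUR DEMO",
  "2 VERS 9.9",
  "1 GEDC",
  "2 VERS 5.5.1",
  "2 FORM LINEAGE-LINKED",
  "0 @I1@ INDI"
)

# Attach the version the same way the PR describes, as a data frame attribute.
df <- data.frame(id = "I1")
attr(df, "gedcom_version") <- detect_version_sketch(gedcom)
attr(df, "gedcom_version")
```

Storing the version as an attribute keeps the column layout of the result unchanged, which is presumably why the legacy comparison test only needed to ignore the new attribute.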

Date Parsing Enhancements

  • Improved date parsing in postProcessGedcom to handle calendar escapes (e.g., @#DJULIAN@) and trim whitespace after removing qualifiers, resulting in more robust conversion to Date objects.
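
The cleanup described above might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, the qualifier list is a small illustrative subset, and the calendar escape is simply dropped with no Julian-to-Gregorian conversion. Month names are matched against R's built-in `month.abb` so the result does not depend on the session locale.

```r
# Hypothetical sketch of the cleanup step, not postProcessGedcom itself.
clean_gedcom_date <- function(x) {
  x <- gsub("@#D[A-Z ]+@", "", x)                  # drop calendar escapes such as @#DJULIAN@
  x <- gsub("\\b(ABT|CAL|EST|BEF|AFT)\\b", "", x)  # drop a few common qualifiers
  trimws(gsub("\\s+", " ", x))                     # collapse and trim leftover whitespace
}

# Parse a single "D MON YYYY" string; anything else yields NA.
parse_gedcom_date <- function(x) {
  parts <- strsplit(clean_gedcom_date(x), " ", fixed = TRUE)[[1L]]
  if (length(parts) != 3L) return(as.Date(NA))
  month <- match(toupper(parts[2L]), toupper(month.abb))
  iso <- sprintf("%s-%02d-%s", parts[3L], month, parts[1L])
  as.Date(iso, format = "%Y-%m-%d")
}

clean_gedcom_date("@#DJULIAN@ ABT 4 OCT 1582")
parse_gedcom_date("ABT 1 JAN 1900")
parse_gedcom_date("1900")
```

Trimming after the qualifier is removed matters because a leftover leading space would otherwise shift the day, month, and year into the wrong positions.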

Documentation and Testing

  • Updated documentation to reflect the new event parsing strategy and added help files for new utilities and version detection.
  • Expanded test coverage to verify correct handling of missing or reordered event subfields, version detection, and the gedcom_version attribute.

Developer Utilities

  • Added a script to fetch and store the FamilySearch GEDCOM 7 specification tables as internal data for future use.

These changes make the GEDCOM parser more robust to different file structures and versions, improve usability for downstream analysis, and add helpful utilities for working with geographic and version information.

smasongarrison and others added 6 commits June 5, 2026 15:17
…rser

Add extractGedcomLevel, extractEventSubBlock, extractInfoFromLines, and
extractCoordFromSubBlock helpers so processEventLine looks up fields by
tag name rather than fixed line offsets, correctly handling missing or
reordered GEDCOM event sub-fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dge case; add tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…blican calendar escape; fix pkgdown catch-all
@codacy-production

codacy-production Bot commented Jun 5, 2026


Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


Copilot AI left a comment

Contributor

Pull request overview

This PR enhances BGmisc’s GEDCOM I/O by making event parsing more structure-aware across GEDCOM 5.5.x and 7.x variants, adding header-based version detection, and providing user-facing helpers for converting GEDCOM coordinate strings to numeric values.

Changes:

  • Reworked readGedcom birth/death event parsing to use GEDCOM level-aware sub-block parsing and to locate coordinates across nested structures (including the GEDCOM 7.x PLAC → MAP nesting).
  • Added GEDCOM version detection from HEAD/GEDC/VERS, attached as a gedcom_version attribute to readGedcom output, plus new exported gedcomLatToNumeric() / gedcomLonToNumeric() utilities.
  • Expanded tests and updated pkgdown/reference + generated documentation for the new behavior and utilities.
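
For readers unfamiliar with GEDCOM coordinates, they are written as a hemisphere letter followed by decimal degrees (for example `N18.150944` or `W168.150944`). The sketch below shows one way such strings could be turned into signed numbers. The function names are hypothetical and this is not the exported gedcomLatToNumeric() / gedcomLonToNumeric() code, whose input handling may be broader.

```r
# Hypothetical sketch; S and W are treated as negative, N and E as positive.
coord_to_numeric_sketch <- function(x, positive, negative) {
  x <- trimws(as.character(x))
  hemi <- toupper(substr(x, 1L, 1L))
  value <- suppressWarnings(as.numeric(substring(x, 2L)))
  sign <- ifelse(hemi == positive, 1, ifelse(hemi == negative, -1, NA_real_))
  sign * value   # NA when the prefix is unrecognised or the number is malformed
}

lat_sketch <- function(x) coord_to_numeric_sketch(x, "N", "S")
lon_sketch <- function(x) coord_to_numeric_sketch(x, "E", "W")

lat_sketch(c("N18.150944", "S33.8688"))
lon_sketch(c("W168.150944", "E151.2093", NA))
```

Returning `NA` for anything that does not match the expected prefix keeps the conversion vectorised and safe to apply to a whole column.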

Reviewed changes

Copilot reviewed 9 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| R/readGedcom.R | Implements version attribute + level-aware event sub-block parsing and improved date parsing. |
| R/helpReadGedcom.R | Adds internal version detection and new exported lat/lon conversion helpers. |
| tests/testthat/test-readGedcom.R | Adds tests for reordered/missing event subfields, GEDCOM 7 MAP nesting, version detection, and new conversion helpers. |
| tests/testthat/test-readGedcomlegacy.R | Adjusts legacy comparison to ignore the new gedcom_version attribute. |
| NAMESPACE | Exports gedcomLatToNumeric and gedcomLonToNumeric. |
| man/readGedcom.Rd | Updates user docs for the new parsing approach. |
| man/processEventLine.Rd | Updates docs to reflect level-aware sub-block parsing. |
| man/gedcomLatToNumeric.Rd | Adds generated documentation for latitude conversion helper. |
| man/gedcomLonToNumeric.Rd | Adds generated documentation for longitude conversion helper. |
| man/detectGedcomVersion.Rd | Adds generated internal documentation for version detection. |
| data-raw/gedcom_spec.R | Adds a developer script to fetch GEDCOM 7 spec tables and store them as internal data. |
| data-raw/df_royal92.R | Minor boolean literal cleanup (T → TRUE). |
| _pkgdown.yml | Adds GEDCOM reference grouping (but currently has an indentation/config issue). |
Files not reviewed (5)
  • man/detectGedcomVersion.Rd: Language not supported
  • man/gedcomLatToNumeric.Rd: Language not supported
  • man/gedcomLonToNumeric.Rd: Language not supported
  • man/processEventLine.Rd: Language not supported
  • man/readGedcom.Rd: Language not supported

Comment thread R/readGedcom.R
Comment on lines +391 to +396
extractInfoFromLines <- function(lines, tag) {
  pattern <- paste0("\\b", tag, "\\b")
  matches <- lines[grepl(pattern, lines)]
  if (length(matches) == 0L) return(NA_character_)
  extractInfo(matches[1L], tag)
}
Comment thread R/readGedcom.R
Comment on lines +398 to +407
extractCoordFromSubBlock <- function(sub_block, tag) {
  # Searches all levels of the sub-block so it handles:
  #   GEDCOM 5.5.x: LATI/LONG as direct children of the event
  #   GEDCOM 5.5.x standard: LATI/LONG under PLAC (level+2)
  #   GEDCOM 7.x: LATI/LONG under MAP under PLAC (level+3)
  pattern <- paste0("\\b", tag, "\\b")
  matches <- sub_block[grepl(pattern, sub_block)]
  if (length(matches) == 0L) return(NA_character_)
  extractInfo(matches[1L], tag)
}
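
The three layouts listed in the comment above can be laid out side by side to see why the search covers every level of the sub-block. This is a minimal sketch: `find_tag_value` is a hypothetical stand-in for the `extractInfo`-based lookup (which is not shown here), the coordinates are invented, and anchoring the tag to the position right after the level number is this sketch's own choice rather than the pattern used in the PR.

```r
# Hypothetical stand-in for the lookup in the snippet above.
find_tag_value <- function(sub_block, tag) {
  pattern <- paste0("^\\s*\\d+\\s+", tag, "\\s+(.*)$")
  hit <- grep(pattern, sub_block, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  trimws(sub(pattern, "\\1", hit[1L]))
}

# Invented sub-blocks of a level-1 event, one per layout named in the comment.
direct_child <- c("2 PLAC Oslo", "2 LATI N59.9139", "2 LONG E10.7522")
under_plac   <- c("2 PLAC Oslo", "3 LATI N59.9139", "3 LONG E10.7522")
under_map    <- c("2 PLAC Oslo", "3 MAP", "4 LATI N59.9139", "4 LONG E10.7522")

layouts <- list(direct_child, under_plac, under_map)
lati <- vapply(layouts, find_tag_value, character(1), tag = "LATI")
long <- vapply(layouts, find_tag_value, character(1), tag = "LONG")
lati
long
```

Since the lookup ignores the depth at which the tag appears, the same call returns the coordinate for all three layouts, and a sub-block with no coordinates at all returns `NA`.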
Comment thread R/readGedcom.R
Comment thread R/helpReadGedcom.R
Comment thread R/helpReadGedcom.R
Comment thread R/helpReadGedcom.R
Comment thread tests/testthat/test-readGedcom.R
Comment thread tests/testthat/test-readGedcom.R
Comment thread _pkgdown.yml Outdated
Potential fix for pull request finding

Potential fix for pull request finding

Potential fix for pull request finding

Potential fix for pull request finding

Potential fix for pull request finding

Potential fix for pull request finding

Co-Authored-By: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@smasongarrison smasongarrison merged commit e74fca4 into dev_main Jun 7, 2026
13 checks passed
@smasongarrison smasongarrison deleted the extendedgedcom branch June 7, 2026 13:48

2 participants