Extendedgedcom #153
Merged
…rser
Add extractGedcomLevel, extractEventSubBlock, extractInfoFromLines, and extractCoordFromSubBlock helpers so processEventLine looks up fields by tag name rather than fixed line offsets, correctly handling missing or reordered GEDCOM event sub-fields.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arify extractEventSubBlock guard assumption
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dge case; add tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…blican calendar escape; fix pkgdown catch-all
Pull request overview
This PR enhances BGmisc’s GEDCOM I/O by making event parsing more structure-aware across GEDCOM 5.5.x and 7.x variants, adding header-based version detection, and providing user-facing helpers for converting GEDCOM coordinate strings to numeric values.
Changes:
- Reworked `readGedcom` birth/death event parsing to use GEDCOM level-aware sub-block parsing and to locate coordinates across nested structures (including GEDCOM 7.x `PLAC` → `MAP`).
- Added GEDCOM version detection from `HEAD`/`GEDC`/`VERS`, attached as a `gedcom_version` attribute to `readGedcom` output, plus new exported `gedcomLatToNumeric()`/`gedcomLonToNumeric()` utilities.
- Expanded tests and updated pkgdown/reference + generated documentation for the new behavior and utilities.
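To make "level-aware sub-block parsing" concrete, here is a minimal sketch of the idea in R. The helper names (`gedcom_level`, `event_sub_block`, `block_value`) and the sample record are hypothetical and written only for illustration; they are not the PR's `extractGedcomLevel`/`extractEventSubBlock` implementations, which may differ in detail.

```r
# Hypothetical helpers for illustration; not the package's implementation.

# Return the integer level at the start of each GEDCOM line ("2 DATE ..." -> 2).
gedcom_level <- function(lines) {
  as.integer(sub("^\\s*(\\d+)\\s.*$", "\\1", lines))
}

# Collect the lines that belong to the event starting at index `start`:
# every following line whose level is deeper than the event's own level.
event_sub_block <- function(lines, start) {
  levels <- gedcom_level(lines)
  end <- start
  while (end < length(lines) && levels[end + 1L] > levels[start]) {
    end <- end + 1L
  }
  if (end == start) character(0) else lines[(start + 1L):end]
}

# Return the value of the first line in the block whose tag equals `tag`,
# or NA when the tag is absent.
block_value <- function(block, tag) {
  pattern <- paste0("^\\s*\\d+\\s+", tag, "(\\s+(.*))?$")
  hit <- grep(pattern, block, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  sub(pattern, "\\2", hit[1L])
}

# Made-up record: DATE comes after PLAC, and coordinates sit under PLAC -> MAP.
record <- c(
  "0 @I1@ INDI",
  "1 BIRT",
  "2 PLAC Springfield",
  "3 MAP",
  "4 LATI N39.78",
  "4 LONG W89.65",
  "2 DATE 1 JAN 1900",
  "1 DEAT",
  "2 DATE 2 FEB 1980"
)

birth <- event_sub_block(record, which(record == "1 BIRT"))
birth_date <- block_value(birth, "DATE")   # found by tag, not by line offset
birth_lat  <- block_value(birth, "LATI")   # found two levels below PLAC
birth_caus <- block_value(birth, "CAUS")   # tag absent from this event
```

Because the block ends at the next line whose level is not deeper than the event's, the `DATE` under `DEAT` is never mistaken for the birth date, and a missing sub-field simply yields `NA`.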
Reviewed changes
Copilot reviewed 9 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `R/readGedcom.R` | Implements version attribute + level-aware event sub-block parsing and improved date parsing. |
| `R/helpReadGedcom.R` | Adds internal version detection and new exported lat/lon conversion helpers. |
| `tests/testthat/test-readGedcom.R` | Adds tests for reordered/missing event subfields, GEDCOM 7 MAP nesting, version detection, and new conversion helpers. |
| `tests/testthat/test-readGedcomlegacy.R` | Adjusts legacy comparison to ignore the new `gedcom_version` attribute. |
| `NAMESPACE` | Exports `gedcomLatToNumeric` and `gedcomLonToNumeric`. |
| `man/readGedcom.Rd` | Updates user docs for the new parsing approach. |
| `man/processEventLine.Rd` | Updates docs to reflect level-aware sub-block parsing. |
| `man/gedcomLatToNumeric.Rd` | Adds generated documentation for latitude conversion helper. |
| `man/gedcomLonToNumeric.Rd` | Adds generated documentation for longitude conversion helper. |
| `man/detectGedcomVersion.Rd` | Adds generated internal documentation for version detection. |
| `data-raw/gedcom_spec.R` | Adds a developer script to fetch/spec tables for GEDCOM 7 into internal data. |
| `data-raw/df_royal92.R` | Minor boolean literal cleanup (`T` → `TRUE`). |
| `_pkgdown.yml` | Adds GEDCOM reference grouping (but currently has an indentation/config issue). |
Files not reviewed (5)
- man/detectGedcomVersion.Rd: Language not supported
- man/gedcomLatToNumeric.Rd: Language not supported
- man/gedcomLonToNumeric.Rd: Language not supported
- man/processEventLine.Rd: Language not supported
- man/readGedcom.Rd: Language not supported
Comment on lines +391 to +396

```r
extractInfoFromLines <- function(lines, tag) {
  pattern <- paste0("\\b", tag, "\\b")
  matches <- lines[grepl(pattern, lines)]
  if (length(matches) == 0L) return(NA_character_)
  extractInfo(matches[1L], tag)
}
```
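One property of the `\\b<tag>\\b` pattern worth keeping in mind is that it matches the tag anywhere on the line, including inside a value. The sketch below uses two made-up lines to show the difference between that pattern and one anchored to the position right after the level number. Whether this matters in practice depends on how `extractInfo` handles the matched line, which is not shown here, so treat the anchored pattern as an illustrative alternative rather than the PR's fix.

```r
# Made-up lines: the first carries the word DATE inside a NOTE value.
lines <- c("2 NOTE DATE of birth uncertain", "2 DATE 1 JAN 1900")

# Word-boundary match anywhere on the line: the NOTE line matches too.
loose <- grepl("\\bDATE\\b", lines)

# Anchoring the tag to the position right after the level number
# restricts the match to lines whose tag really is DATE.
anchored <- grepl("^\\s*\\d+\\s+DATE\\b", lines)
```

Here `loose` flags both lines while `anchored` flags only the second, so taking `matches[1L]` from the loose result would pick the `NOTE` line.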
Comment on lines +398 to +407

```r
extractCoordFromSubBlock <- function(sub_block, tag) {
  # Searches all levels of the sub-block so it handles:
  #   GEDCOM 5.5.x: LATI/LONG as direct children of the event
  #   GEDCOM 5.5.x standard: LATI/LONG under PLAC (level+2)
  #   GEDCOM 7.x: LATI/LONG under MAP under PLAC (level+3)
  pattern <- paste0("\\b", tag, "\\b")
  matches <- sub_block[grepl(pattern, sub_block)]
  if (length(matches) == 0L) return(NA_character_)
  extractInfo(matches[1L], tag)
}
```
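Once a coordinate string has been extracted, it still carries a hemisphere letter (`N`/`S` for `LATI`, `E`/`W` for `LONG`) rather than a sign. The following is a minimal sketch of that conversion, assuming the letter-plus-decimal-degrees form. `gedcom_coord_to_numeric` is a hypothetical helper written for this example; the exported `gedcomLatToNumeric()` and `gedcomLonToNumeric()` may differ in signature and edge-case handling.

```r
# Hypothetical helper for illustration; not the package's exported functions.
# `positive` and `negative` are the hemisphere letters mapped to +1 and -1.
gedcom_coord_to_numeric <- function(x, positive, negative) {
  x <- trimws(as.character(x))
  hemisphere <- toupper(substr(x, 1L, 1L))
  # Everything after the hemisphere letter is read as decimal degrees;
  # unparseable values become NA instead of raising a warning.
  magnitude <- suppressWarnings(as.numeric(substring(x, 2L)))
  sign <- ifelse(hemisphere == positive, 1,
                 ifelse(hemisphere == negative, -1, NA_real_))
  sign * magnitude
}

# Made-up coordinate strings, including a missing value.
lat <- gedcom_coord_to_numeric(c("N39.78", "S33.87", NA), "N", "S")
lon <- gedcom_coord_to_numeric(c("W89.65", "E151.21", NA), "E", "W")
```

Southern latitudes and western longitudes come out negative, and missing or unrecognized inputs stay `NA`, so the result can be passed straight to plotting or distance code.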
Potential fix for pull request finding
Co-Authored-By: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
8e8801a to 7da356d
2cf3def to 8725309
f2ee3c3 to 704417a
This pull request introduces several improvements to the GEDCOM parsing functionality, focusing on more robust and flexible event parsing, better support for different GEDCOM versions, and enhanced usability. The changes include a new version detection utility, improved parsing of event subfields (especially for birth and death events), new utilities for handling GEDCOM latitude/longitude values, and updates to documentation and tests.
GEDCOM Parsing Improvements
- `readGedcom` now uses level-aware sub-block parsing instead of fixed line offsets, making extraction of `DATE`, `PLAC`, `CAUS`, `LATI`, and `LONG` fields more robust and compatible with GEDCOM 5.5.x and 7.x structures. Coordinates are now found regardless of their nesting under `PLAC` or `MAP`.
- `gedcomLatToNumeric` and `gedcomLonToNumeric` convert GEDCOM-style latitude/longitude strings to signed numeric values. These are exported and documented for user convenience.
GEDCOM Version Detection
- Added `detectGedcomVersion` to extract the GEDCOM version from file headers. The detected version is now attached as an attribute to the resulting data frame in `readGedcom`, and is also logged when verbosity is enabled.
Date Parsing Enhancements
- Updated `postProcessGedcom` to handle calendar escapes (e.g., `@#DJULIAN@`) and trim whitespace after removing qualifiers, resulting in more robust conversion to `Date` objects.
Documentation and Testing
- … `gedcom_version` attribute.
Developer Utilities
These changes make the GEDCOM parser more robust to different file structures and versions, improve usability for downstream analysis, and add helpful utilities for working with geographic and version information.
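As an illustration of the version-detection step, the sketch below reads the version from the `HEAD`/`GEDC`/`VERS` structure and attaches it as a `gedcom_version` attribute. `detect_version` and the sample header are hypothetical; `detectGedcomVersion` itself may be implemented differently.

```r
# Hypothetical helper for illustration; not the package's detectGedcomVersion().
detect_version <- function(lines) {
  gedc <- grep("^\\s*1\\s+GEDC\\s*$", lines)
  if (length(gedc) == 0L) return(NA_character_)
  # Scan only the children of "1 GEDC" (lines at level 2 or deeper)
  # so that an unrelated "2 VERS" under "1 SOUR" is not picked up.
  idx <- gedc[1L] + 1L
  while (idx <= length(lines) && grepl("^\\s*[2-9]", lines[idx])) {
    if (grepl("^\\s*2\\s+VERS\\s+", lines[idx])) {
      return(trimws(sub("^\\s*2\\s+VERS\\s+", "", lines[idx])))
    }
    idx <- idx + 1L
  }
  NA_character_
}

# Made-up header: the product version under SOUR must be ignored.
header <- c(
  "0 HEAD",
  "1 SOUR ExampleApp",
  "2 VERS 1.0",
  "1 GEDC",
  "2 VERS 5.5.1",
  "2 FORM LINEAGE-LINKED",
  "0 @I1@ INDI"
)

# Attach the detected version to a stand-in for the parsed data frame.
parsed <- data.frame(id = 1L)
attr(parsed, "gedcom_version") <- detect_version(header)
version <- attr(parsed, "gedcom_version")
no_version <- detect_version(c("0 HEAD", "1 CHAR UTF-8"))
```

Storing the version as an attribute keeps the data frame's columns unchanged, which is consistent with the legacy comparison test only needing to ignore the new attribute.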