Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #1329 +/- ##
=======================================
Coverage 97.10% 97.10%
=======================================
Files 235 235
Lines 2904 2904
=======================================
Hits 2820 2820
Misses 84 84
…mpat

Python 3.14 changed the json C encoder to access dict subclasses via direct C-level storage rather than calling Python-level __iter__/items(). ruamel.yaml's CommentedMap relies on its Python-level iteration for correct key ordering, so the C shortcut produced a non-deterministic key order on 3.14, causing test_dateparser_data_integrity to fail.

Add _to_plain_types() to recursively convert CommentedMap → OrderedDict and CommentedSeq → list (preserving insertion order via Python-level iteration) before passing data to json.dumps in write_complete_data().
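The conversion described in this commit message could look roughly like the sketch below. This is not the PR's actual _to_plain_types() implementation: the function name to_plain_types and the two stand-in classes are hypothetical, introduced only so the example runs without ruamel.yaml installed. It assumes the YAML container types behave as dict and list subclasses.

```python
import json
from collections import OrderedDict


class FakeCommentedMap(dict):
    """Hypothetical stand-in for ruamel.yaml's CommentedMap (assumed dict subclass)."""


class FakeCommentedSeq(list):
    """Hypothetical stand-in for ruamel.yaml's CommentedSeq (assumed list subclass)."""


def to_plain_types(obj):
    """Recursively rebuild mappings as OrderedDict and sequences as plain list."""
    if isinstance(obj, dict):
        # items() is called at the Python level, so the subclass's own
        # iteration order decides the order of the rebuilt mapping.
        return OrderedDict((key, to_plain_types(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [to_plain_types(item) for item in obj]
    # Scalars (str, int, None, ...) are returned unchanged.
    return obj


data = FakeCommentedMap(
    month=FakeCommentedSeq(["months", "mon"]),
    simplifications=FakeCommentedSeq([FakeCommentedMap(pattern="replacement")]),
)
plain = to_plain_types(data)
serialized = json.dumps(plain, ensure_ascii=False)
```

After the conversion, no YAML-specific container remains in the structure, so the serializer only ever sees OrderedDict, list, and scalar values.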
Pull request overview
This PR aims to fix issue #1123 by making the freshness/relative-time parser understand compact English month abbreviations like 1mon ago by updating English translation/simplification data and adding regression tests. It also tweaks the data-generation script to produce deterministic JSON output across Python versions.
Changes:
- Add English simplification for \d+mon(s) → \d+ month and extend month tokens in EN translation data.
- Update freshness parser tests to assert that 1mon ago / 2mon ago / 3mons ago are parsed as relative months.
- Make write_complete_data.py convert ruamel YAML types to plain mapping/sequence types before json.dumps for stable output.
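As a rough illustration of the simplification rule, the pattern and replacement from this PR can be tried directly with Python's re module. The helper name simplify_mon is hypothetical, and how dateparser itself applies simplifications (flags, ordering relative to other rules) is not shown here.

```python
import re

# Pattern and replacement copied from the simplification entry in this PR.
MON_PATTERN = re.compile(r"(\d+)\s*mons?\b")


def simplify_mon(text):
    """Rewrite '<digits>mon' or '<digits>mons' as '<digits> month'."""
    return MON_PATTERN.sub(r"\1 month", text)


examples = ["1mon ago", "2 mon ago", "3mons ago", "1 monday ago", "mon ago"]
results = {text: simplify_mon(text) for text in examples}
```

The trailing \b keeps the rule from firing inside longer words such as "monday", and the leading (\d+) means a bare "mon" with no number is left untouched.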
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_freshness_date_parser.py | Adds regression coverage asserting *mon(s) ago parses as relative months. |
| dateparser/data/date_translation_data/en.py | Updates EN month tokens and adds a simplification regex for mons?. |
| dateparser_scripts/write_complete_data.py | Normalizes YAML-loaded structures to stable plain types before serialization. |
| dateparser_data/supplementary_language_data/date_translation_data/en.yaml | Mirrors EN month-token + simplification updates in supplementary YAML source. |
  "month",
- "months"
+ "months",
+ "mon",
This is also my main concern with this change.
I wonder if you can use AI to expand the tests with inputs that would become problematic after this change and inputs that were problematic before it, and decide which ones are best left broken, keeping them as expected failures.
- years
month:
- months
- mon
],
"simplifications": [
{
"(\\d+)\\s*mons?\\b": "\\1 month"
- (\d+[.,]?\d*) decades? ago

simplifications:
- (\d+)\s*mons?\b: \1 month
  @parameterized.expand(
      [
-         param("1mon ago"),  # 1116
+         param("1mon ago", ago={"months": 1}, period="month"),  # 1123
+         param("2mon ago", ago={"months": 2}, period="month"),  # 1123
+         param("3mons ago", ago={"months": 3}, period="month"),  # 1123
      ]
  )
- def test_known_issues(self, date_string):
+ def test_known_issues(self, date_string, ago, period):
      self.given_parser()
      self.given_date_string(date_string)
      self.when_date_is_parsed()
      self.then_error_was_not_raised()
-     self.assertEqual(None, self.result["date_obj"])
+     self.then_date_was_parsed_by_freshness_parser()
+     self.then_date_obj_is_exactly_this_time_ago(ago)
+     self.then_period_is(period)
Close #1123