
Finally added semantic_cleanup function. #8

Open
mmzeeman wants to merge 48 commits into master from feature-cleanup-semantic

Conversation

mmzeeman (Member) commented Apr 7, 2026

  • Added semantic cleanup.
  • Removed the array implementation for finding the middle snake. It now uses the prefix and suffix routines from Erlang's binary module.
  • Internal routines now work on UTF-32 binaries instead of working on UTF-8 binaries and repairing matches.
  • Updated PropEr.
  • Removed zotonic_stdlib.
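
To illustrate the second and third points, here is a minimal sketch of how common prefix and suffix lengths could be computed on UTF-32 binaries with the standard `binary` module. The module and function names (`utf32_affix`, `common_prefix/2`, `common_suffix/2`, `to_utf32/1`) are hypothetical and are not taken from the PR. The sketch assumes both inputs are valid UTF-32 binaries, so every code point occupies exactly 4 bytes.

```erlang
-module(utf32_affix).
-export([to_utf32/1, common_prefix/2, common_suffix/2]).

%% Hypothetical helper: convert a UTF-8 binary to UTF-32 (big endian).
to_utf32(Utf8) when is_binary(Utf8) ->
    unicode:characters_to_binary(Utf8, utf8, utf32).

%% Number of code points shared at the start of two UTF-32 binaries.
%% binary:longest_common_prefix/1 counts bytes, and a byte-level match
%% may stop inside a code point, so round down to whole 4-byte units.
common_prefix(A, B) when is_binary(A), is_binary(B) ->
    binary:longest_common_prefix([A, B]) div 4.

%% Number of code points shared at the end of two UTF-32 binaries.
%% Both sizes are multiples of 4, so rounding the byte count down
%% also yields a boundary that is aligned on whole code points.
common_suffix(A, B) when is_binary(A), is_binary(B) ->
    binary:longest_common_suffix([A, B]) div 4.
```

Because every code point has a fixed width, a byte-level match never needs to be repaired for split multi-byte sequences; integer division by 4 is enough.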

mmzeeman requested a review from mworrell April 7, 2026 20:11

Copilot AI left a comment


Pull request overview

This PR modernizes Diffy’s core diff/cleanup pipeline by adding semantic cleanup support and refactoring internal routines to operate on UTF-32 binaries, while also updating build/test tooling (PropEr, Dialyzer/Xref, CI OTP versions) and removing zotonic_stdlib.

Changes:

  • Added diff/3 with options to control linemode and apply semantic/efficiency cleanups in a defined order.
  • Refactored diff internals to run on UTF-32 binaries (including linemode and bisect paths) and replaced zotonic_stdlib HTML escaping with a local implementation.
  • Updated tests (new properties/regressions/corner cases), updated PropEr version, and refreshed tooling/CI configuration.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/diffy.erl | Adds diff/3 options, introduces UTF-32 internal pipeline, implements semantic cleanup, replaces line indexing and HTML escaping. |
| test/diffy_tests.erl | Updates property generators, adds semantic cleanup property + regressions and new linemode/options tests. |
| src/diffy_term.erl | Header updates (@end, copyright year range). |
| src/diffy_simple_patch.erl | Header updates (@end, copyright year range). |
| rebar.config | Removes runtime deps, updates PropEr in test profile, adds ex_doc config, tightens xref/dialyzer warnings. |
| rebar.lock | Removes locked dependencies (now empty). |
| Makefile | Adds doc generation and doc cleanup targets; updates clean/distclean behavior. |
| .github/workflows/test.yml | Updates OTP matrix and checkout action version. |


Copilot AI left a comment


Pull request overview

This PR modernizes Diffy’s diff/cleanup pipeline by introducing semantic cleanup, moving core internals to UTF-32 processing to avoid UTF-8 boundary issues, and updating the project’s tooling/dependencies (PropEr, docs generation, CI OTP versions).

Changes:

  • Added diff/3 with options to apply semantic/efficiency cleanups and disable linemode, and refactored internals to operate on UTF-32 binaries.
  • Implemented/rewired semantic cleanup and strengthened tests (including new linemode corner cases and HTML escaping).
  • Updated build/test tooling: bumped OTP matrix, updated PropEr, removed zotonic_stdlib, added ExDoc integration.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/diffy.erl | Adds diff/3, UTF-32 internal pipeline, semantic cleanup implementation, replaces HTML escaping, refactors linemode/bisect internals. |
| test/diffy_tests.erl | Updates property generators, adds semantic cleanup property, expands linemode/option tests, adds HTML escaping regression. |
| src/diffy_term.erl | Header/doc metadata updates. |
| src/diffy_simple_patch.erl | Header/doc metadata updates. |
| rebar.config | Removes zotonic_stdlib, bumps PropEr (test profile), adds ExDoc configuration, tightens xref/dialyzer checks. |
| rebar.lock | Replaced with empty lock list. |
| Makefile | Adds doc target and clean_doc, adjusts clean/distclean. |
| .github/workflows/test.yml | Updates OTP matrix to 26–28 and bumps checkout action. |


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This PR modernizes Diffy’s diff/cleanup implementation by introducing a semantic cleanup pipeline and shifting core algorithms to operate on UTF-32 internally (with UTF-8 conversion at the public API boundary), along with build/test tooling updates.

Changes:

  • Added diff/3 with options (semantic, efficiency, {efficiency, Cost}, no_linemode) and implemented semantic cleanup stages.
  • Refactored core diff/linemode/bisect/cleanup logic to run on UTF-32 binaries internally, plus updated HTML escaping.
  • Updated tests, CI OTP matrix, and project tooling (PropEr version bump, ExDoc target, removed zotonic_stdlib).
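
As a rough illustration of the option list named above, the following sketch normalises `semantic`, `efficiency`, `{efficiency, Cost}` and `no_linemode` into a map. It is not the PR's implementation: the module name `diff_options_sketch`, the map layout, the error term, and the default cost of 4 (borrowed from the default edit cost in Google's diff-match-patch) are all assumptions made for the example.

```erlang
-module(diff_options_sketch).
-export([parse/1]).

%% Assumed default edit cost; diffy's real default may differ.
-define(DEFAULT_COST, 4).

%% Fold an option list into a map describing which stages to run.
%% Linemode is assumed to be on unless no_linemode is given.
parse(Options) when is_list(Options) ->
    Defaults = #{linemode => true, semantic => false, efficiency => false},
    lists:foldl(fun apply_option/2, Defaults, Options).

apply_option(semantic, Acc) ->
    Acc#{semantic := true};
apply_option(efficiency, Acc) ->
    Acc#{efficiency := ?DEFAULT_COST};
apply_option({efficiency, Cost}, Acc) when is_integer(Cost), Cost > 0 ->
    Acc#{efficiency := Cost};
apply_option(no_linemode, Acc) ->
    Acc#{linemode := false};
apply_option(Other, _Acc) ->
    %% Hypothetical policy: reject unknown options loudly.
    erlang:error({unknown_option, Other}).
```

Normalising the options first means the order in which the cleanups run can be fixed in one place, independent of the order in which the caller lists them.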

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/diffy.erl | Adds diff/3 options and semantic cleanup; refactors core algorithms to UTF-32 internals; updates HTML escaping and related tests. |
| test/diffy_tests.erl | Expands property tests and adds regression/corner-case coverage (semantic cleanup, linemode, HTML escaping). |
| src/diffy_term.erl | Header/doc updates only. |
| src/diffy_simple_patch.erl | Header/doc updates only. |
| rebar.config | Removes zotonic_stdlib, bumps PropEr in test profile, adds ExDoc configuration and tighter xref/dialyzer settings. |
| rebar.lock | Replaces existing lock content with an empty lockfile. |
| Makefile | Adds doc build/clean targets and adjusts clean/distclean behavior. |
| .github/workflows/test.yml | Updates OTP versions tested and bumps checkout action version. |
Comments suppressed due to low confidence (1)

src/diffy.erl:660

  • levenshtein/1 now calls text_size/1, which converts each diff chunk to UTF-32 via unicode:characters_to_binary/3. That allocates a new binary per chunk and can become a noticeable performance/memory regression for large diffs. Consider restoring a non-allocating UTF-8 codepoint counter for text_size/1, or computing lengths during the UTF-32 phase and carrying counts forward.
% @doc Compute the Levenshtein distance, the number of inserted, deleted or substituted characters.
levenshtein(Diffs) ->
    levenshtein(Diffs, 0, 0, 0).

levenshtein([], Insertions, Deletions, Levenshtein) ->
    Levenshtein + max(Insertions, Deletions);
levenshtein([{insert, Data}|T], Insertions, Deletions, Levenshtein) ->
    levenshtein(T, Insertions+text_size(Data), Deletions, Levenshtein);
levenshtein([{delete, Data}|T], Insertions, Deletions, Levenshtein) ->
    levenshtein(T, Insertions, Deletions+text_size(Data), Levenshtein);
levenshtein([{equal, _Data}|T], Insertions, Deletions, Levenshtein) ->
    levenshtein(T, 0, 0, Levenshtein+max(Insertions, Deletions)).
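
A non-allocating counter of the kind suggested above could look roughly like the sketch below. It walks the UTF-8 binary with the `utf8` bit syntax segment type and counts code points without building a UTF-32 copy. The module name `text_size_sketch` is hypothetical, and counting an invalid byte as one unit is an assumption; the actual `text_size/1` in the PR may handle malformed input differently.

```erlang
-module(text_size_sketch).
-export([text_size/1]).

%% Count the code points in a UTF-8 binary without converting it.
text_size(Text) when is_binary(Text) ->
    text_size(Text, 0).

text_size(<<>>, Count) ->
    Count;
text_size(<<_/utf8, Rest/binary>>, Count) ->
    %% Rest is a sub-binary referencing the same data, not a copy.
    text_size(Rest, Count + 1);
text_size(<<_, Rest/binary>>, Count) ->
    %% Assumption: an invalid UTF-8 byte is counted as one unit.
    text_size(Rest, Count + 1).
```

The loop is tail recursive and only advances a sub-binary reference, so the cost is one pass over the data with no intermediate binaries.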


mmzeeman and others added 3 commits April 12, 2026 10:51
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This PR modernizes Diffy’s diff/cleanup implementation by moving internal processing to UTF-32, adding option-driven diff behavior, and updating the build/test toolchain accordingly.

Changes:

  • Added diff/3 with options (semantic/efficiency cleanups and no_linemode) and refactored core diff/cleanup routines to operate internally on UTF-32.
  • Removed zotonic_stdlib dependency and replaced HTML escaping with an internal html_escape/1.
  • Updated test suite (new property + regression tests), CI matrix (OTP 26–28), and build tooling (ExDoc).
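
For readers unfamiliar with what an internal `html_escape/1` involves, here is a minimal sketch. It is not the code from the PR: the module name `html_escape_sketch`, the exact set of escaped characters, and the choice of `&#39;` for the single quote are assumptions.

```erlang
-module(html_escape_sketch).
-export([html_escape/1]).

%% Escape the HTML-significant ASCII characters in a UTF-8 binary.
%% Working byte by byte is safe here because the bytes of a multi-byte
%% UTF-8 sequence are never in the ASCII range.
html_escape(Bin) when is_binary(Bin) ->
    << <<(escape_char(C))/binary>> || <<C>> <= Bin >>.

escape_char($&) -> <<"&amp;">>;
escape_char($<) -> <<"&lt;">>;
escape_char($>) -> <<"&gt;">>;
escape_char($") -> <<"&quot;">>;
escape_char($') -> <<"&#39;">>;
escape_char(C)  -> <<C>>.
```

A helper this small is one reason a project might prefer a local implementation over carrying a runtime dependency for a single function.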

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/diffy.erl | Adds diff/3, UTF-32 internal pipeline, semantic cleanup implementation, and internal HTML escaping. |
| test/diffy_tests.erl | Updates properties, adds semantic cleanup properties/regressions, and adds linemode/options coverage. |
| src/diffy_term.erl | Header/doc updates to align with refreshed module docs/copyright. |
| src/diffy_simple_patch.erl | Header/doc updates to align with refreshed module docs/copyright. |
| rebar.config | Removes runtime deps, updates PropEr version, adds ExDoc and stronger xref/dialyzer settings. |
| rebar.lock | Removes locked deps (now empty lockfile). |
| Makefile | Adds doc/clean_doc targets and adjusts clean/distclean behavior. |
| .github/workflows/test.yml | Updates OTP versions tested and bumps checkout action version. |
Comments suppressed due to low confidence (1)

test/diffy_tests.erl:290

  • In cleanup_efficiency_test/0, the “Null case” assertion calls cleanup_semantic([]) instead of cleanup_efficiency([]). This makes the test less effective and likely isn’t intended given the test name and surrounding assertions.
cleanup_efficiency_test() ->
    % Null case
    ?assertEqual([], cleanup_semantic([])),




Copilot AI left a comment


Pull request overview

This PR modernizes diffy by adding semantic cleanup and refactoring the core diff/cleanup pipeline to run internally on UTF-32 binaries (with UTF-8 conversion only at public API boundaries), alongside dependency/build tooling updates.

Changes:

  • Add diff/3 with options (semantic, efficiency, {efficiency, Cost}, no_linemode) and implement semantic cleanup in the main diff module.
  • Refactor internal diff/cleanup/linemode/bisect routines to operate on UTF-32 for correctness and simpler alignment handling.
  • Update tests and build tooling: new/expanded EUnit + PropEr coverage, remove zotonic_stdlib, update OTP CI matrix, and add ExDoc generation.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/diffy.erl | Adds diff/3 options, semantic cleanup, UTF-32 internal pipeline, and HTML escaping implementation. |
| test/diffy_tests.erl | Extends PropEr properties and adds multiple regression/corner-case tests (linemode/options/cleanup). |
| src/diffy_term.erl | Adjusts term diff to call diffy:diff/3 with [no_linemode] and updates exported types. |
| src/diffy_simple_patch.erl | Header/doc metadata updates only. |
| rebar.config | Removes zotonic_stdlib, updates PropEr in test profile, adds ExDoc config, expands xref/dialyzer settings. |
| rebar.lock | Resets lockfile content to empty. |
| Makefile | Adds doc target and doc cleanup hooks; adjusts clean/distclean. |
| .github/workflows/test.yml | Updates CI to OTP 26–28 and bumps checkout action version. |


mmzeeman and others added 4 commits April 13, 2026 21:08
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
3 participants