Enable EVP flagevaluation system tests for Python by leoromanovsky · Pull Request #7156 · DataDog/system-tests

leoromanovsky · 2026-06-17T00:45:07Z

Motivation

tests/ffe/test_flag_eval_evp.py covers the EVP flagevaluation contract for server-side SDKs. A focused local run for Python now passes against the implementation PR, covering aggregation, bounded context payloads, and natural degraded-tier overflow.

Changes

Enables tests/ffe/test_flag_eval_evp.py for Python in manifests/python.yml at v4.12.0rc1.
Sets the FFE scenario Python flush interval to 60 seconds so the degradation test exercises one aggregation window.
Leaves the shared test expectations from "Add EVP flagevaluation system tests" (#7146) intact.
Keeps this as a sibling PR based on leo.romanovsky/ffe-evp-flagevaluation-system-tests, not stacked on other language enablement branches.

Decisions

This PR enables only Python; other languages are enabled by separate sibling PRs after their own local pass evidence.
Degradation is exercised naturally by exceeding the production per-flag full-tier cap; no test-only cap override is required.
Existing OTel metric coverage stays in tests/ffe/test_flag_eval_metrics.py.

Validation

./build.sh python -w flask-poc - PASS.
./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION --library python --weblog flask-poc -k "test_flag_eval_evp" - PASS, 8 passed, 2577 deselected in 59.91s.
System-test context: Agent 7.80.1; library python@4.12.0-rc1; artifact b9102dea0c5d812f6fecee87eb1fea54d06c09c1; weblog flask-poc; Linux x86_64.
Agent payload evidence:
- evp-count-flag: 1 event, total evaluation_count=5.
- evp-burst-aggregation-flag: 1 event, total evaluation_count=512.
- evp-high-cardinality-aggregation-flag: 128 events, total evaluation_count=128.
- evp-degradation-flag: 10001 events, total evaluation_count=10050, full_total=10000, degraded_total=50.
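One way to read these figures, as a minimal sketch and not the SDK's actual implementation: assume evaluations are aggregated per (flag, context) pair up to a per-flag cap, and that overflow beyond the cap is folded into a single per-flag degraded event. The cap of 10,000 is inferred from full_total=10000 above; the function names, key shapes, and context strings are hypothetical.

```python
from collections import Counter

# Assumption: inferred from full_total=10000 in the payload evidence,
# not taken from the SDK source.
FULL_TIER_CAP = 10_000


def aggregate(evaluations, cap=FULL_TIER_CAP):
    """Aggregate (flag, context) evaluations within one flush window.

    Returns two Counters: full-tier counts keyed by (flag, context) and
    degraded-tier counts keyed by flag only.
    """
    full = Counter()
    degraded = Counter()
    entries_per_flag = Counter()
    for flag, context in evaluations:
        key = (flag, context)
        if key in full:
            # Repeat evaluation of a tracked pair: bump its count.
            full[key] += 1
        elif entries_per_flag[flag] < cap:
            # New pair and the flag still has full-tier room.
            full[key] = 1
            entries_per_flag[flag] += 1
        else:
            # Cap reached: drop the context and count per flag only.
            degraded[flag] += 1
    return full, degraded


def summarize(full, degraded):
    """Reduce the aggregation to the numbers quoted in the evidence list."""
    full_total = sum(full.values())
    degraded_total = sum(degraded.values())
    return {
        "events": len(full) + len(degraded),
        "evaluation_count": full_total + degraded_total,
        "full_total": full_total,
        "degraded_total": degraded_total,
    }


if __name__ == "__main__":
    # 10050 distinct contexts for one flag: 10000 full-tier events plus
    # one degraded event carrying the remaining 50 evaluations.
    evals = [("evp-degradation-flag", f"ctx-{i}") for i in range(10_050)]
    print(summarize(*aggregate(evals)))
```

Under these assumptions, five evaluations of one (flag, context) pair collapse to 1 event with evaluation_count=5, and 128 distinct contexts produce 128 events, which matches the shape of the other entries in the list.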

github-actions · 2026-06-17T00:45:39Z

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
utils/_context/_scenarios/__init__.py                                   @DataDog/system-tests-core

datadog-prod-us1-3 · 2026-06-17T00:58:03Z

Tests

⚠️ Warnings

🚦 19 Pipeline jobs failed

Testing the test | System Tests (dotnet, dev) / End-to-end #2 / poc 2

🧪 5 Tests failed

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start.test_ffe_rc_down_from_start[poc] from system_tests_suite

AssertionError: Flag evaluation failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-a69709c89743c3537b25417d6549890d-73d4aa1debe0c229-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-a69709c89743c3537b25417d6549890d-73d4aa1debe0c229-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-a69709c89743c3537b25417d6549890d-73d4aa1debe0c229-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7f4ec97566f0>.r

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7f4ec97566f0>

    def test_ffe_rc_down_from_start(self):
        """Test that default value is returned when RC is down from start."""
        # Verify tracer received 503 from RC
...

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable.test_ffe_rc_unavailable_graceful_degradation[poc] from system_tests_suite

AssertionError: Baseline request failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-fb57de908ef1291c77216cbe83d24b8e-d65142856a63cd96-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-fb57de908ef1291c77216cbe83d24b8e-d65142856a63cd96-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-fb57de908ef1291c77216cbe83d24b8e-d65142856a63cd96-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7f4ec9757ce0>.baseline_eval

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7f4ec9757ce0>

    def test_ffe_rc_unavailable_graceful_degradation(self):
        """Test that cached flag configs continue working when RC is unavailable."""
        # Verify tracer received 503 from RC
...

Testing the test | System Tests (dotnet, dev) / End-to-end #2 / uds 2

🧪 5 Tests failed

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start.test_ffe_rc_down_from_start[uds] from system_tests_suite

AssertionError: Flag evaluation failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-30517cc944891146913397d5675323c7-b532a671497324bb-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-30517cc944891146913397d5675323c7-b532a671497324bb-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-30517cc944891146913397d5675323c7-b532a671497324bb-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7fee28c0f8c0>.r

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7fee28c0f8c0>

    def test_ffe_rc_down_from_start(self):
        """Test that default value is returned when RC is down from start."""
        # Verify tracer received 503 from RC
...

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable.test_ffe_rc_unavailable_graceful_degradation[uds] from system_tests_suite

AssertionError: Baseline request failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-180a836884298814410c7c28bfee1756-96fc02b5d3e4aa9f-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-180a836884298814410c7c28bfee1756-96fc02b5d3e4aa9f-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-180a836884298814410c7c28bfee1756-96fc02b5d3e4aa9f-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7fee28c0f980>.baseline_eval

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7fee28c0f980>

    def test_ffe_rc_unavailable_graceful_degradation(self):
        """Test that cached flag configs continue working when RC is unavailable."""
        # Verify tracer received 503 from RC
...

Testing the test | System Tests (dotnet, prod) / End-to-end #2 / poc 2

🧪 5 Tests failed

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start.test_ffe_rc_down_from_start[poc] from system_tests_suite

AssertionError: Flag evaluation failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-fb21223f0cf09e48ed232d7336e6a83b-3fd4319bd44583b6-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-fb21223f0cf09e48ed232d7336e6a83b-3fd4319bd44583b6-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-fb21223f0cf09e48ed232d7336e6a83b-3fd4319bd44583b6-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7f1849f3a150>.r

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Down_From_Start object at 0x7f1849f3a150>

    def test_ffe_rc_down_from_start(self):
        """Test that default value is returned when RC is down from start."""
        # Verify tracer received 503 from RC
...

tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable.test_ffe_rc_unavailable_graceful_degradation[poc] from system_tests_suite

AssertionError: Baseline request failed: {"type":"https://tools.ietf.org/html/rfc9110#section-15.5.1","title":"One or more validation errors occurred.","status":400,"errors":{"TargetingKeys":["The TargetingKeys field is required."]},"traceId":"00-c57cc1bfe506b636bd26793523200ea9-38f40f706d72c60e-00"}
assert 400 == 200
 +  where 400 = HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-c57cc1bfe506b636bd26793523200ea9-38f40f706d72c60e-00"}).status_code
 +    where HttpResponse(status_code:400, headers:{'Content-Type': 'application/problem+json; charset=utf-8', 'Date': 'Wed, 17 Jun...ngKeys":["The TargetingKeys field is required."]},"traceId":"00-c57cc1bfe506b636bd26793523200ea9-38f40f706d72c60e-00"}) = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7f1849f3a090>.baseline_eval

self = <tests.ffe.test_dynamic_evaluation.Test_FFE_RC_Unavailable object at 0x7f1849f3a090>

    def test_ffe_rc_unavailable_graceful_degradation(self):
        """Test that cached flag configs continue working when RC is unavailable."""
        # Verify tracer received 503 from RC
...


ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔗 Commit SHA: 7ec9df5

leoromanovsky added 5 commits June 16, 2026 14:38

Add EVP flagevaluation system tests

74547e2

Strengthen EVP flagevaluation stress coverage

29743d2

Remove fixed EVP flagevaluation setup sleeps

d4aebe5

Skip disabled EVP flagevaluation tests before setup

bfc5f34

Enable EVP flagevaluation tests for Python

1581877

leoromanovsky added 9 commits June 16, 2026 21:08

Address EVP flagevaluation review feedback

b8e86f6

Make EVP degradation test hit production cap naturally

411be12

Move FFE fixtures under allowed utils package

8d9591c

Merge remote-tracking branch 'origin/pr/7146' into leo.romanovsky/ffe-evp-flagevaluation-enable-python

b419e40

Make EVP degradation test use batched evaluations

15cfff8

Copy Node Express app into final weblog images

e082ba4

Revert "Copy Node Express app into final weblog images"

8c569a9

This reverts commit e082ba4.

Merge branch 'leo.romanovsky/ffe-evp-flagevaluation-system-tests' of github.com:DataDog/system-tests into leo.romanovsky/ffe-evp-flagevaluation-enable-python

408d50e

test(ffe): keep Python flag evaluation aggregation in one flush

7ec9df5

Base automatically changed from leo.romanovsky/ffe-evp-flagevaluation-system-tests to main June 17, 2026 15:34

leoromanovsky wants to merge 14 commits into main from leo.romanovsky/ffe-evp-flagevaluation-enable-python
