diff --git a/.github/skills/code-review/README.md b/.github/skills/code-review/README.md new file mode 100644 index 00000000..ec6955f4 --- /dev/null +++ b/.github/skills/code-review/README.md @@ -0,0 +1,181 @@ +# Code Review Skill + +Comprehensive code review assistant for embedded C/C++ pull requests in the Telemetry 2.0 framework. + +## Quick Start + +``` +@workspace /code-review https://github.com/rdkcentral/telemetry/pull/123 +``` + +or + +``` +@workspace /code-review #123 +``` + +## What It Does + +The code review skill analyzes GitHub pull requests and generates a comprehensive `REVIEW.md` file that includes: + +- **Coverity defect integration** - Extracts and prioritizes static analysis findings from PR comments +- **Visual diff summary** with Mermaid diagrams +- **Module-by-module impact analysis** +- **Memory safety assessment** (leaks, use-after-free, buffer overflows) +- **Thread safety analysis** (race conditions, deadlocks) +- **Regression risk scoring** (LOW, MEDIUM, HIGH, CRITICAL) +- **Actionable recommendations** with line references +- **Testing coverage gaps** identification +- **Cross-cutting concerns** (build, config, docs) + +## Features + +### Coverity Integration +Automatically detects and incorporates Coverity static analysis results: +- Scans PR comments from **rdkcmf-jenkins** user +- Identifies comments with **"Coverity Issue"** title prefix +- Extracts defect type, severity, file/line locations +- Integrates findings into risk assessment and recommendations +- Prioritizes HIGH severity defects as blocking issues + +## Output Structure + +``` +reviews/PR--REVIEW.md +``` + +The generated review includes: +1. **Executive Summary** - 2-3 sentence overview with risk level +2. **Coverity Analysis** - Static analysis defects from PR comments (if present) +3. **Change Visualization** - Mermaid diagram with change indicators +4. **Module Analysis** - Memory/thread safety per module +5. **Regression Risks** - Specific concerns with file:line references +6. **Recommendations** - Prioritized action items +7. **Checklist** - Verification tasks before merge + +## Example Output + +```markdown +# Code Review: Add profile reload support + +## Overview +- PR: #123 +- Files Changed: 8 files, +245/-112 lines +- Risk Level: MEDIUM ⚠️ + +## Executive Summary +Adds dynamic profile reload capability via rbus method. Changes touch scheduler +and profile management with new mutex. Memory safety looks good but potential +race condition in reload path needs attention. + +## Changes by Module + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> test["📁 test"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + + bulkdata --> profile["📄 profile.c (+85/-42) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+12/-5)"] + + scheduler --> schedulerc["📄 scheduler.c (+95/-48) 🔴"] + + test --> profiletest["📄 profile_test.cpp (+53/-17) ✅"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class profiletest safe + class reportprof neutral +``` + +## Detailed Analysis + +### Bulk Data Collection Module + +#### Files Modified +- [source/bulkdata/profile.c](source/bulkdata/profile.c#L145-L230) (+85/-42) +- [source/bulkdata/reportprofiles.c](source/bulkdata/reportprofiles.c#L88-L100) + +#### Key Changes +- New `reloadProfile()` function to update profile configuration +- Added mutex `g_profile_reload_mutex` for protection during reload +- Profile state validation before applying changes + +#### Impact Assessment + +**Memory Safety**: ✅ GOOD +- All malloc calls have NULL checks +- Error paths properly free resources +- Old profile data freed before replacing + +**Thread Safety**: ⚠️ CONCERN +- Line 167: `reloadProfile()` acquires `g_profile_reload_mutex` but also + calls `updateScheduler()` which acquires `g_scheduler_mutex` +- Potential deadlock if scheduler tries to access profile during reload +- **Recommendation**: Document lock ordering or refactor to single lock + +**API Compatibility**: ✅ MAINTAINED +- No changes to public function signatures +- New function is additive + +**Error Handling**: ✅ GOOD +- Schema validation before applying profile +- Rollback on partial failure +- Appropriate error logging + +#### Regression Risks +⚠️ **Race condition in profile access** ([profile.c:167](source/bulkdata/profile.c#L167)) + - TimeoutThread may access profile fields while reload is updating them + - Recommend: Hold profile mutex during entire reload operation + +... +``` + +## Integration with Other Skills + +The code review skill complements: +- **quality-checker**: Run static analysis and tests after review +- **memory-safety-analyzer**: Deep dive on specific memory issues +- **thread-safety-analyzer**: Detailed concurrency analysis +- **platform-portability-checker**: Cross-platform validation + +## Tips + +1. **Run early**: Review PRs before extensive commenting to guide discussion +2. **Focus on risk**: Pay special attention to HIGH/CRITICAL risk items +3. **Verify with tools**: Follow up with `/quality-checker` for validation +4. **Iterate**: Re-run as PR evolves to track risk changes +5. **Custom focus**: Add specific concerns like "focus on memory safety" + +## Reference Files + +The skill uses these references for analysis: +- [review-checklist.md](references/review-checklist.md) - Comprehensive review standards +- [memory-patterns.md](references/memory-patterns.md) - Memory safety patterns +- [thread-patterns.md](references/thread-patterns.md) - Thread safety patterns +- [common-pitfalls.md](references/common-pitfalls.md) - Known anti-patterns + +## Limitations + +- Requires GitHub access for remote PRs +- Heuristic-based risk assessment (not perfect) +- Cannot detect all semantic bugs +- Best for C/C++ embedded code +- Manual judgment still required for complex logic + +## Contributing + +To improve the skill: +1. Update [common-pitfalls.md](references/common-pitfalls.md) with new anti-patterns +2. Add project-specific patterns to reference files +3. Refine risk scoring algorithm in SKILL.md +4. Add test cases for skill validation diff --git a/.github/skills/code-review/SKILL.md b/.github/skills/code-review/SKILL.md new file mode 100644 index 00000000..aad14766 --- /dev/null +++ b/.github/skills/code-review/SKILL.md @@ -0,0 +1,403 @@ +--- +name: code-review +description: 'Analyze GitHub PR changes for embedded C/C++ code and generate comprehensive REVIEW.md with visual diffs, impact assessment, and regression risk analysis. Use when: reviewing pull requests, analyzing code changes, identifying functional regressions, assessing memory/thread safety impact, validating embedded systems changes.' +argument-hint: 'PR URL or number, with an optional focus filter (e.g., "#123", "https://github.com/rdkcentral/telemetry/pull/123", or "#123 focus on thread safety"). Supported focus filters: "focus on memory safety", "focus on thread safety", "focus on api compatibility", "focus on error handling".' +--- + +# Code Review for Embedded Systems + +## Purpose + +Generate a comprehensive `REVIEW.md` report for GitHub pull requests that helps senior engineers quickly understand changes, assess impact, and identify potential functional regressions in embedded C/C++ codebases. + +## Usage + +Invoke this skill when: +- Reviewing a pull request before merge +- Analyzing code changes for regression risk +- Understanding the scope and impact of modifications +- Validating memory safety, thread safety, or API compatibility +- Preparing for code review meetings +- Investigating functional issues introduced by recent changes + +**Invocation**: `@workspace /code-review [focus on ]` + +- `` — PR URL (e.g., `https://github.com/rdkcentral/telemetry/pull/123`) or short form (e.g., `#123`) +- `[focus on ]` *(optional)* — restrict analysis depth to one area; accepted values: `thread safety`, `memory safety`, `api compatibility`, `error handling` + +--- + +## Output: REVIEW.md Structure + +The skill generates a markdown report with the following sections: + +```markdown +# Code Review: [PR Title] + +## Overview +- PR: # +- Author: +- Files Changed: X files, +Y/-Z lines +- Risk Level: [LOW | MEDIUM | HIGH | CRITICAL] + +## Executive Summary +[2-3 sentence summary of changes and overall risk assessment] + +## Coverity Static Analysis (if applicable) +[Table of Coverity defects found in PR comments] + +## Changes by Module +[Visual tree showing impacted modules with change indicators] + +## Detailed Analysis + +### [Module 1] +#### Files Modified +- file1.c (+X/-Y) +- file2.h (+X/-Y) + +#### Key Changes +[Bulleted summary of functional changes] + +#### Impact Assessment +- **Memory Safety**: [Analysis] +- **Thread Safety**: [Analysis] +- **API Compatibility**: [Analysis] +- **Error Handling**: [Analysis] + +#### Regression Risks +⚠️ [Specific risks with line references] + +### [Module 2] +... + +## Cross-Cutting Concerns +- Build System Impact +- Configuration Changes +- Test Coverage Gaps +- Documentation Updates + +## Recommendations +1. [Priority action items] +2. [Suggested additional tests] +3. [Areas requiring closer inspection] + +## Checklist +- [ ] Memory leaks verified (valgrind) +- [ ] Thread safety validated +- [ ] API compatibility maintained +- [ ] Error paths tested +- [ ] Unit tests added/updated +- [ ] Integration tests pass +- [ ] Documentation updated +``` + +--- + +## Analysis Process + +### Step 1: Fetch PR Metadata + +Retrieve PR metadata using the GitHub CLI or API: + +```bash +# PR details (title, description, author, reviewers, CI status) +gh pr view --json number,title,body,author,reviewRequests,statusCheckRollup + +# File change list with line counts +gh pr view --json files + +# Existing review comments +gh pr view --json reviews,comments +``` + +If the GitHub CLI is unavailable, retrieve the same information via the GitHub REST API (`GET /repos/{owner}/{repo}/pulls/{pull_number}`, `/files`, `/comments`). + +**Check for Coverity Defects:** + +After fetching PR metadata, scan the comments for Coverity static analysis reports: +- Look for comments from user **"rdkcmf-jenkins"** +- Look for comments with titles starting with **"Coverity Issue"** +- Extract defect information: + - Defect type (e.g., DEADLOCK, RESOURCE_LEAK, ATOMICITY, USE_AFTER_FREE) + - File and line number + - Severity (High, Medium, Low) + - Checker description + +If Coverity defects are found: +1. Add a **"Coverity Analysis"** section to the REVIEW.md +2. List each defect with location and severity +3. Cross-reference these defects in module analysis +4. Flag defects as **MUST FIX** in recommendations if they are High severity +5. Include in risk assessment (higher safety score if critical defects present) + +Example Coverity section: +```markdown +## Coverity Static Analysis Results + +**Defects Found**: X issue(s) + +| Severity | Type | File:Line | Description | +|----------|------|-----------|-------------| +| HIGH | DEADLOCK | datamodel.c:120 | Lock ordering violation | +| MEDIUM | RESOURCE_LEAK | profile.c:456 | Memory not freed on error path | +``` + +### Step 2: Get PR Diff + +Retrieve the complete diff for analysis using standard tools available in the review environment: +- For a local checkout, use `git diff` against the target branch (for example, `git diff origin/main...HEAD`) +- For a GitHub pull request, use `gh pr diff ` +- If neither is available, obtain the unified diff from the PR page and review the changed files directly +- Parse diff hunks to identify: + - Added lines (new functionality) + - Removed lines (deleted code) + - Modified lines (behavior changes) + +### Step 3: Categorize Changes by Module + +Map changed files to architectural modules: + +| Pattern | Module | +|---------|--------| +| `source/bulkdata/*` | Bulk Data Collection | +| `source/scheduler/*` | Scheduling Engine | +| `source/reportgen/*` | Report Generation | +| `source/ccspinterface/*` | Bus Interface (CCSP/rbus) | +| `source/protocol/*` | Transport (HTTP/rbus) | +| `source/t2parser/*` | Profile Parser | +| `source/dcautil/*` | DCA Utilities | +| `source/utils/*` | Common Utilities | +| `source/privacycontrol/*` | Privacy Control | +| `source/test/*` | Test Infrastructure | +| `*.am`, `*.ac` | Build System | +| `config/*` | Configuration | +| `schemas/*` | JSON Schemas | + +### Step 4: Analyze Each Changed File + +For each file, apply domain-specific analysis using [reference checklists](./references/review-checklist.md): + +#### C Source Files (*.c) +1. **Memory Safety** (reference: [memory-patterns.md](./references/memory-patterns.md)) + - New allocations → verify corresponding free + - Pointer assignments → check NULL before dereference + - String operations → bounds checking + - Error paths → resource cleanup + +2. **Thread Safety** (reference: [thread-patterns.md](./references/thread-patterns.md)) + - Shared data access → mutex protection + - Lock acquisitions → deadlock potential + - Condition variables → proper usage pattern + - Thread creation → stack size specified + +3. **Error Handling** + - Return values checked + - Error codes meaningful + - Logging sufficient for debugging + - Failure modes handled + +4. **Resource Constraints** + - Stack vs heap allocation + - Memory footprint impact + - CPU impact (loops, algorithms) + +#### Header Files (*.h) +1. API changes → backward compatibility +2. Struct modifications → ABI compatibility +3. New functions → documentation complete +4. Constants/enums → semantic correctness + +#### Build Files (*.am, *.ac) +1. New dependencies → justified and minimal +2. Compiler flags → appropriate for embedded +3. Link order → correct +4. Conditional compilation → platform coverage + +#### Test Files (*.cpp, test/*) +1. Test coverage → adequate for changes +2. Mock usage → appropriate +3. Edge cases covered +4. Negative tests included + +### Step 5: Assess Regression Risk + +The risk level is determined **heuristically** based on the factors below. The weights are informational guidance for where to focus attention — there is no strict arithmetic formula. After evaluating each factor, assign the overall risk level that best reflects the combined picture. + +| Factor | Weight | Indicators | +|--------|--------|-----------| +| **Scope** | 30% | # files, # modules, LOC changed | +| **Criticality** | 25% | Core logic vs peripheral, production path | +| **Complexity** | 20% | Control flow changes, algorithm modifications | +| **Safety** | 15% | Memory/thread safety issues identified | +| **Testing** | 10% | Test coverage, CI status | + +**Risk Levels** (choose the highest level whose criteria are met): +- **LOW**: <10 files, single module, tests added, no safety concerns +- **MEDIUM**: 10-30 files, 2-3 modules, or minor safety concerns +- **HIGH**: >30 files, cross-module, or safety issues present +- **CRITICAL**: Core scheduler/bus/report logic, no tests, or confirmed safety issues + +> **Note:** When factors point to different levels, escalate to the higher level. The weights indicate relative importance, not a numeric formula — a single confirmed memory-safety issue can elevate an otherwise LOW-scope change to HIGH. + +### Step 6: Generate Visual Diff Summary + +Create Mermaid diagram showing changes: + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> test["📁 test"] + root --> makefile["📄 Makefile.am (+2/-1)"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + source --> ccsp["📁 ccspinterface"] + + bulkdata --> profile["📄 profile.c (+45/-12) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+8/-3)"] + + scheduler --> schedulerc["📄 scheduler.c (+120/-80) 🔴"] + + ccsp --> rbus["📄 rbusInterface.c (+5/-2)"] + + test --> schedtest["📄 scheduler_test.cpp (+95/-0) ✅"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class schedtest safe + class reportprof,rbus,makefile neutral +``` + +**Legend:** +- 🔴 High regression risk (red) +- ⚠️ Safety concern flagged (yellow) +- ✅ Test coverage added (green) +- Neutral changes (gray) + +### Step 7: Cross-Reference with Project Context + +Load project-specific context: +1. [Review checklist](./references/review-checklist.md) - Project standards +2. [Common pitfalls](./references/common-pitfalls.md) - Known anti-patterns +3. Architecture docs (`docs/architecture/overview.md`) +4. Build instructions (`.github/instructions/*.instructions.md`) + +Check against: +- Coding standards (naming, style) +- Project architecture principles +- Known anti-patterns for this codebase +- Historical issues (if session memory available) + +### Step 8: Generate Recommendations + +Prioritize action items: + +1. **MUST FIX** (blocking issues) + - Memory leaks + - Race conditions + - API/ABI breakage + - Missing critical error handling + +2. **SHOULD FIX** (before merge) + - Test coverage gaps + - Missing documentation + - Non-optimal patterns + - Minor safety concerns + +3. **CONSIDER** (future improvements) + - Refactoring opportunities + - Performance optimizations + - Code duplication + +--- + +## Example Invocations + +### Review a PR by URL +``` +@workspace /code-review https://github.com/rdkcentral/telemetry/pull/42 +``` + +### Review a PR by number (assumes current repo) +``` +@workspace /code-review #42 +``` + +### Review with an optional focus filter + +Append `focus on ` to restrict the depth of analysis to one concern. +Accepted focus values: `thread safety`, `memory safety`, `api compatibility`, `error handling`. + +``` +@workspace /code-review #42 focus on thread safety +``` + +``` +@workspace /code-review #42 focus on memory safety +``` + +**Note**: The skill generates Mermaid diagrams for change visualization. GitHub, VS Code, and most modern markdown viewers render these automatically. + +--- + +## Quality Checks Integration + +After generating the REVIEW.md, suggest running quality checks: + +```bash +# Use the quality-checker skill for comprehensive validation +@workspace /quality-checker +``` + +This will run: +- Static analysis (cppcheck) +- Memory safety (valgrind) +- Thread safety (helgrind) +- Build verification +- Unit tests + +--- + +## Output Location + +The REVIEW.md file is generated in: +- **Active PR**: `reviews/PR--REVIEW.md` +- **Quick review**: `REVIEW.md` (workspace root) + +The active PR report is written under `reviews/`, which is git-ignored by default. The quick review output uses `REVIEW.md` at the workspace root; if you do not want to commit that file, add it to `.gitignore` or remove it after review. Any timestamp is recorded inside the report content rather than in the filename. + +--- + +## Limitations + +- Requires GitHub access for remote PRs +- Complex logic changes need manual inspection +- Cannot detect all semantic bugs +- Risk assessment is heuristic-based +- Integration test impact requires human judgment + +--- + +## Tips for Best Results + +1. **Provide context**: If the PR fixes a specific issue, mention it +2. **Focus areas**: Specify if you want deep-dive on specific aspects +3. **Compare branches**: For local changes, ensure proper git state +4. **Supplement with testing**: Use `/quality-checker` for validation +5. **Review iteratively**: Run skill multiple times as PR evolves + +--- + +## Related Skills + +- **quality-checker**: Run comprehensive quality checks +- **memory-safety-analyzer**: Deep dive on memory issues +- **thread-safety-analyzer**: Deep dive on concurrency +- **platform-portability-checker**: Validate cross-platform code diff --git a/.github/skills/code-review/references/common-pitfalls.md b/.github/skills/code-review/references/common-pitfalls.md new file mode 100644 index 00000000..858096a6 --- /dev/null +++ b/.github/skills/code-review/references/common-pitfalls.md @@ -0,0 +1,432 @@ +# Common Pitfalls in Telemetry 2.0 Codebase + +This document captures recurring anti-patterns and common mistakes found in code reviews for the Telemetry 2.0 embedded framework. + +## Memory Management + +### Pitfall 1: JSON String Leaks (cJSON) +```c +// ❌ WRONG: cJSON_Print allocates memory +cJSON* json = cJSON_CreateObject(); +cJSON_AddStringToObject(json, "key", "value"); +char* json_str = cJSON_Print(json); // Allocates +sendReport(json_str); +cJSON_Delete(json); // ⚠️ Forgot to free json_str - LEAK! + +// ✅ CORRECT: +char* json_str = cJSON_Print(json); +sendReport(json_str); +free(json_str); // Must free +cJSON_Delete(json); +``` + +### Pitfall 2: strdup in Profile Parsing +```c +// ❌ WRONG: strdup without checking/freeing +profile->name = strdup(config_name); +profile->protocol = strdup(config_protocol); +// What if strdup fails? What if reassigned? + +// ✅ CORRECT: +if (profile->name) free(profile->name); +profile->name = strdup(config_name); +if (!profile->name) { + return ERR_NO_MEMORY; +} +``` + +### Pitfall 3: Vector/List Cleanup +```c +// ❌ WRONG: Clearing vector without freeing elements +Vector* markers = profile->markers; +// ... use markers ... +vector_destroy(markers); // ⚠️ Leaked marker objects inside! + +// ✅ CORRECT: +for (int i = 0; i < markers->count; i++) { + Marker* m = (Marker*)vector_at(markers, i); + free_marker(m); +} +vector_destroy(markers); +``` + +## Thread Safety + +### Pitfall 4: Profile State Race +```c +// ❌ WRONG: Reading profile state without lock +if (profile->state == PROFILE_ENABLED) { + collectAndReport(profile); // Race: state can change +} + +// ✅ CORRECT: +pthread_mutex_lock(&profile->mutex); +bool enabled = (profile->state == PROFILE_ENABLED); +pthread_mutex_unlock(&profile->mutex); + +if (enabled) { + collectAndReport(profile); +} +``` + +### Pitfall 5: rbus Callback Thread Safety +```c +// ❌ WRONG: Modifying global state in rbus callback +rbusError_t eventReceiveHandler(rbusHandle_t handle, rbusEvent_t const* event) { + g_event_count++; // Race condition! + processEvent(event); + return RBUS_ERROR_SUCCESS; +} + +// ✅ CORRECT: +rbusError_t eventReceiveHandler(rbusHandle_t handle, rbusEvent_t const* event) { + pthread_mutex_lock(&g_event_mutex); + g_event_count++; + pthread_mutex_unlock(&g_event_mutex); + + processEvent(event); + return RBUS_ERROR_SUCCESS; +} +``` + +### Pitfall 6: TimeoutThread vs CollectAndReport Deadlock +```c +// ❌ WRONG: TimeoutThread holds lock while signaling CollectAndReport +void timeoutThread(Profile* profile) { + pthread_mutex_lock(&profile->mutex); + // Generate timeout event + pthread_cond_signal(&profile->event_cond); + // CollectAndReport wakes, tries to lock same mutex - DEADLOCK! + pthread_mutex_unlock(&profile->mutex); +} + +// ✅ CORRECT: Signal after releasing lock +void timeoutThread(Profile* profile) { + pthread_mutex_lock(&profile->mutex); + profile->timeout_occurred = true; + pthread_mutex_unlock(&profile->mutex); // Release first + pthread_cond_signal(&profile->event_cond); // Then signal +} +``` + +## Error Handling + +### Pitfall 7: Silent rbus Failures +```c +// ❌ WRONG: Not checking rbus return values +rbusError_t ret = rbus_open(&handle, "T2"); +rbusMethod_Register(handle, method); // Proceeds even if rbus_open failed! + +// ✅ CORRECT: +rbusError_t ret = rbus_open(&handle, "T2"); +if (ret != RBUS_ERROR_SUCCESS) { + T2Error("rbus_open failed: %d\n", ret); + return ERR_RBUS_INIT; +} +``` + +### Pitfall 8: Ignoring snprintf Return Value +```c +// ❌ WRONG: snprintf truncation not checked +char buffer[64]; +snprintf(buffer, sizeof(buffer), "%s_%s_%ld", prefix, name, timestamp); +// What if truncated? Report will be malformed. + +// ✅ CORRECT: +char buffer[64]; +int written = snprintf(buffer, sizeof(buffer), "%s_%s_%ld", prefix, name, timestamp); +if (written >= sizeof(buffer)) { + T2Warning("Buffer truncated: needed %d bytes\n", written); + // Handle truncation +} +``` + +## Configuration & Schema + +### Pitfall 9: Missing Schema Validation +```c +// ❌ WRONG: Trusting JSON without validation +cJSON* interval = cJSON_GetObjectItem(profile_json, "ReportingInterval"); +int interval_sec = interval->valueint; // Crash if NULL or wrong type! + +// ✅ CORRECT: +cJSON* interval = cJSON_GetObjectItem(profile_json, "ReportingInterval"); +if (!interval || !cJSON_IsNumber(interval)) { + T2Error("Invalid ReportingInterval in profile\n"); + return ERR_INVALID_CONFIG; +} +int interval_sec = interval->valueint; +if (interval_sec < 60 || interval_sec > 86400) { + T2Error("ReportingInterval out of range: %d\n", interval_sec); + return ERR_INVALID_CONFIG; +} +``` + +### Pitfall 10: Profile Name Collisions +```c +// ❌ WRONG: Not checking for duplicate profiles +void addProfile(Profile* new_profile) { + vector_push(g_profiles, new_profile); // Duplicate names possible! +} + +// ✅ CORRECT: +bool addProfile(Profile* new_profile) { + for (int i = 0; i < g_profiles->count; i++) { + Profile* p = vector_at(g_profiles, i); + if (strcmp(p->name, new_profile->name) == 0) { + T2Error("Profile %s already exists\n", new_profile->name); + return false; + } + } + vector_push(g_profiles, new_profile); + return true; +} +``` + +## Scheduler & Timing + +### Pitfall 11: Timing Drift in Periodic Reports +```c +// ❌ WRONG: Accumulating drift +void periodicReport(int interval_sec) { + while (running) { + sleep(interval_sec); // Drift accumulates! + generateReport(); + } +} + +// ✅ CORRECT: Absolute timing +void periodicReport(int interval_sec) { + time_t next_report = time(NULL) + interval_sec; + while (running) { + time_t now = time(NULL); + if (now >= next_report) { + generateReport(); + next_report = now + interval_sec; // Reset to now, not next_report + } + sleep(1); // Short sleep to avoid busy-wait + } +} +``` + +### Pitfall 12: Timeout Calculation Overflow +```c +// ❌ WRONG: Integer overflow on large intervals +int timeout_ms = interval_sec * 1000; // Overflow if interval_sec > 2147483! + +// ✅ CORRECT: +long timeout_ms = (long)interval_sec * 1000L; +if (timeout_ms > INT_MAX) { + T2Error("Interval too large: %d seconds\n", interval_sec); + timeout_ms = INT_MAX; +} +``` + +## rbus/CCSP Interface + +### Pitfall 13: Bus Method Name Typos +```c +// ❌ WRONG: Hardcoded strings prone to typos +rbus_registerMethod(handle, "Device.X_RDKCENTRAL-COM_T2.ReportProfiles"); +// vs +rbus_invokeMethod(handle, "Device.X_RDKCENTRAL_COM_T2.ReportProfiles"); +// Underscore vs hyphen - won't match! + +// ✅ CORRECT: Use constants +#define T2_REPORT_PROFILES_METHOD "Device.X_RDKCENTRAL-COM_T2.ReportProfiles" +rbus_registerMethod(handle, T2_REPORT_PROFILES_METHOD); +rbus_invokeMethod(handle, T2_REPORT_PROFILES_METHOD); +``` + +### Pitfall 14: Bus Handle Lifetime +```c +// ❌ WRONG: Using handle after close +rbusHandle_t handle; +rbus_open(&handle, "T2"); +// ... use handle ... +rbus_close(handle); +rbus_sendData(handle, data); // ⚠️ Handle invalid! + +// ✅ CORRECT: Track handle state +static rbusHandle_t g_handle = NULL; +static bool g_rbus_initialized = false; + +void sendData(const char* data) { + if (!g_rbus_initialized || !g_handle) { + T2Error("rbus not initialized\n"); + return; + } + rbus_sendData(g_handle, data); +} +``` + +## Data Collection + +### Pitfall 15: Missing NULL Checks in TR-181 Fetch +```c +// ❌ WRONG: Assuming parameter always exists +char* value = getParameterValue("Device.DeviceInfo.SerialNumber"); +snprintf(report, sizeof(report), "SN:%s", value); // Crash if NULL! + +// ✅ CORRECT: +char* value = getParameterValue("Device.DeviceInfo.SerialNumber"); +if (value) { + snprintf(report, sizeof(report), "SN:%s", value); + free(value); +} else { + T2Warning("Failed to get SerialNumber\n"); + snprintf(report, sizeof(report), "SN:UNKNOWN"); +} +``` + +### Pitfall 16: Marker Regex Compilation Every Time +```c +// ❌ WRONG: Recompiling regex on every log line +void processLogLine(const char* line) { + regex_t regex; + regcomp(®ex, marker->pattern, REG_EXTENDED); + if (regexec(®ex, line, 0, NULL, 0) == 0) { + marker->count++; + } + regfree(®ex); // ⚠️ Expensive! +} + +// ✅ CORRECT: Compile once, reuse +typedef struct { + char* pattern; + regex_t compiled_regex; + bool regex_valid; +} Marker; + +void initMarker(Marker* m) { + int ret = regcomp(&m->compiled_regex, m->pattern, REG_EXTENDED); + m->regex_valid = (ret == 0); +} + +void processLogLine(Marker* m, const char* line) { + if (m->regex_valid && regexec(&m->compiled_regex, line, 0, NULL, 0) == 0) { + m->count++; + } +} +``` + +## Build & Dependencies + +### Pitfall 17: Missing Feature Detection in configure.ac +```c +// ❌ WRONG: Assuming rbus is always available +#include // Build fails if rbus not installed + +// ✅ CORRECT: Configure script checks +AC_CHECK_LIB([rbus], [rbus_open], + [have_rbus=yes], + [have_rbus=no]) +AM_CONDITIONAL([HAVE_RBUS], [test "x$have_rbus" = "xyes"]) + +// Then in code: +#ifdef HAVE_RBUS +#include +#else +#include "rbus_stub.h" +#endif +``` + +### Pitfall 18: Library Link Order +```c +// ❌ WRONG: Random link order in Makefile.am +telemetry2_0_LDADD = -lrbus -lrt -lpthread -lcjson + +// ✅ CORRECT: Dependencies before dependents +telemetry2_0_LDADD = -lcjson -lrbus -lrt -lpthread +# cjson has no deps, rbus depends on cjson, pthread is base +``` + +## Testing + +### Pitfall 19: Non-Deterministic Tests +```c +// ❌ WRONG: Test depends on timing +TEST(SchedulerTest, TimeoutFires) { + startScheduler(1); // 1 second interval + sleep(1); + EXPECT_EQ(1, getReportCount()); // ⚠️ Flaky: may be 0 or 1 +} + +// ✅ CORRECT: Use mock time or polling +TEST(SchedulerTest, TimeoutFires) { + startScheduler(1); + // Poll with timeout + bool fired = false; + for (int i = 0; i < 20 && !fired; i++) { + usleep(100000); // 100ms + fired = (getReportCount() > 0); + } + EXPECT_TRUE(fired); +} +``` + +### Pitfall 20: Memory Leaks in Tests +```c +// ❌ WRONG: Setup creates objects, teardown doesn't clean +class ProfileTest : public ::testing::Test { +protected: + void SetUp() override { + profile = createProfile("test"); + } + // ⚠️ Missing TearDown - leaked in every test! + Profile* profile; +}; + +// ✅ CORRECT: +class ProfileTest : public ::testing::Test { +protected: + void SetUp() override { + profile = createProfile("test"); + } + void TearDown() override { + destroyProfile(profile); + profile = nullptr; + } + Profile* profile; +}; +``` + +## Platform Portability + +### Pitfall 21: Hardcoded Paths +```c +// ❌ WRONG: Linux-specific path +#define CONFIG_FILE "/etc/telemetry/T2Agent.cfg" + +// ✅ CORRECT: Configurable path +#ifndef T2_CONFIG_DIR +#define T2_CONFIG_DIR "/etc/telemetry" +#endif +#define CONFIG_FILE T2_CONFIG_DIR "/T2Agent.cfg" +``` + +### Pitfall 22: Endianness Assumptions +```c +// ❌ WRONG: Assuming little-endian +uint32_t net_value = *(uint32_t*)buffer; // Wrong on big-endian! + +// ✅ CORRECT: Use network byte order functions +uint32_t net_value = ntohl(*(uint32_t*)buffer); +``` + +## Review Checklist: Avoid These Pitfalls + +When reviewing PRs, specifically check for: +- [ ] cJSON memory management (free cJSON_Print results) +- [ ] Profile state access (always use mutex) +- [ ] rbus return value checking +- [ ] Schema validation before using JSON values +- [ ] Profile name uniqueness enforcement +- [ ] Timing calculations (avoid drift and overflow) +- [ ] Bus method name consistency (use constants) +- [ ] TR-181 parameter NULL checks +- [ ] Regex compilation (do it once, not per line) +- [ ] Feature detection in build system +- [ ] Test determinism (no timing dependencies) +- [ ] Test cleanup (TearDown matches SetUp) +- [ ] Path portability (no hardcoded Linux paths) diff --git a/.github/skills/code-review/references/example-review-template.md b/.github/skills/code-review/references/example-review-template.md new file mode 100644 index 00000000..98bc043d --- /dev/null +++ b/.github/skills/code-review/references/example-review-template.md @@ -0,0 +1,549 @@ +# Code Review: [PR Title Here] + +**Generated**: [Timestamp] +**Reviewer**: GitHub Copilot Code Review Skill +**PR**: #[NUMBER] | [AUTHOR] | [DATE] + +--- + +## Overview + +| Attribute | Value | +|-----------|-------| +| **Files Changed** | X files (+YYY/-ZZZ lines) | +| **Modules Impacted** | Module1, Module2, Module3 | +| **Risk Level** | 🟢 LOW / 🟡 MEDIUM / 🔴 HIGH / ⚫ CRITICAL | +| **CI Status** | ✅ Passing / ❌ Failing / ⏳ Pending | +| **Test Coverage** | New: X%, Overall: Y% | + +--- + +## Executive Summary + +[2-3 sentences describing the core changes and overall assessment] + +**Key Points:** +- Main functional change +- Notable API/behavior modifications +- Primary risk factor + +--- + +## Coverity Static Analysis Results + +**Source**: Comments from rdkcmf-jenkins / "Coverity Issue" comments +**Defects Found**: 3 issue(s) + +| Severity | Type | File:Line | Description | +|----------|------|-----------|-------------| +| 🔴 HIGH | DEADLOCK | [scheduler.c:342](scheduler.c#L342) | Lock ordering violation detected between `tMutex` and `scMutex` | +| 🟡 MEDIUM | RESOURCE_LEAK | [profile.c:456](profile.c#L456) | Memory allocated but not freed on error path | +| 🟢 LOW | CHECKED_RETURN | [utils.c:123](utils.c#L123) | Return value of `pthread_mutex_lock` not checked | + +**Impact on Risk Assessment:** +- HIGH severity defects: +3 to Safety Score +- MEDIUM severity defects: +2 to Safety Score +- Total Coverity impact: +5 points + +**Recommendations:** +- 🔴 HIGH defects are blocking and must be fixed before merge +- 🟡 MEDIUM defects should be addressed or justified +- 🟢 LOW defects can be tracked for future cleanup + +--- + +## Risk Assessment + +| Category | Score | Notes | +|----------|-------|-------| +| **Scope** | [0-10] | X files, Y modules, ZZZ LOC | +| **Criticality** | [0-10] | Core logic / Peripheral feature | +| **Complexity** | [0-10] | Algorithm changes / Simple edits | +| **Safety** | [0-10] | Memory/thread concerns identified | +| **Testing** | [0-10] | Coverage and test quality | +| **TOTAL** | [0-50] | **Risk Level: [LOW/MEDIUM/HIGH/CRITICAL]** | + +**Risk Scoring:** +- LOW (0-10): Minimal impact, well-tested, no safety concerns +- MEDIUM (11-20): Moderate impact, minor concerns, good tests +- HIGH (21-35): Significant impact or safety issues present +- CRITICAL (36-50): Core functionality, severe safety issues, or inadequate tests + +--- + +## Changes by Module + +```mermaid +graph TD + root["📁 telemetry"] + root --> source["📁 source"] + root --> sourcetest["📁 source/test"] + root --> config["📁 config"] + root --> makefile["📄 Makefile.am (+2/-1)"] + + source --> bulkdata["📁 bulkdata"] + source --> scheduler["📁 scheduler"] + source --> ccsp["📁 ccspinterface"] + source --> protocol["📁 protocol"] + + bulkdata --> profile["📄 profile.c (+85/-42) ⚠️"] + bulkdata --> reportprof["📄 reportprofiles.c (+12/-5)"] + bulkdata --> markers["📄 t2markers.c (+3/-1)"] + + scheduler --> schedulerc["📄 scheduler.c (+120/-80) 🔴"] + + ccsp --> rbus["📄 rbusInterface.c (+15/-8) 🟡"] + + protocol --> http["📁 http"] + http --> client["📄 client.c (+5/-2)"] + + sourcetest --> bulktest["📄 bulkdata_test.cpp (+95/-10) ✅"] + sourcetest --> schedtest["📄 scheduler_test.cpp (+42/-0) ✅"] + + config --> cfg["📄 T2Agent.cfg (+3/-0)"] + + classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff + classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 + classDef medium fill:#ffa94d,stroke:#fd7e14,color:#000 + classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff + classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 + + class schedulerc critical + class profile warning + class rbus medium + class bulktest,schedtest safe + class reportprof,markers,client,cfg,makefile neutral +``` + +**Legend:** +- 🔴 High risk / Critical (red) +- 🟡 Medium risk (orange) +- ⚠️ Caution needed (yellow) +- ✅ Good / Tests added (green) +- Neutral (gray) + +--- + +## Detailed Analysis + +### 1. Bulk Data Collection Module + +#### Files Modified +- [source/bulkdata/profile.c](source/bulkdata/profile.c) (+85/-42) +- [source/bulkdata/reportprofiles.c](source/bulkdata/reportprofiles.c) (+12/-5) +- [source/bulkdata/t2markers.c](source/bulkdata/t2markers.c) (+3/-1) + +#### Key Changes +- **New functionality**: Dynamic profile reload capability +- **Refactoring**: Profile state machine restructured for reload support +- **Bug fix**: Fixed memory leak in profile cleanup path +- **Configuration**: Added validation for new profile fields + +#### Code Quality Metrics +``` +Cyclomatic Complexity: [Average / Max] +Function Length: [Average lines] +Nesting Depth: [Max depth] +``` + +#### Impact Assessment + +##### Memory Safety: 🟡 MINOR CONCERNS + +**Positive:** +- ✅ All `malloc` calls include NULL checks ([profile.c:145](source/bulkdata/profile.c#L145)) +- ✅ Error paths properly free allocated memory +- ✅ Use of `goto cleanup` pattern for consistent resource cleanup + +**Concerns:** +- ⚠️ **Line 167**: `reloadProfile()` allocates new profile but doesn't free old profile's internal data structures before replacing pointer + - **Impact**: ~2KB memory leak per profile reload + - **Fix**: Call `cleanupProfileInternals(old_profile)` before assignment + +- ⚠️ **Line 223**: `strdup()` result not checked for NULL + - **Impact**: Potential crash if allocation fails + - **Fix**: Add NULL check and error handling + +##### Thread Safety: 🔴 SIGNIFICANT CONCERNS + +**Positive:** +- ✅ New `g_profile_reload_mutex` added for protection +- ✅ Consistent lock acquire/release pattern + +**Concerns:** +- 🔴 **CRITICAL - Line 167**: Potential deadlock scenario + ``` + reloadProfile() locks g_profile_reload_mutex + → calls updateScheduler() which locks g_scheduler_mutex + + Simultaneously: + TimeoutThread() locks g_scheduler_mutex + → accesses profile data (expects g_profile_reload_mutex) + ``` + - **Impact**: System hang, watchdog reset + - **Fix**: Document and enforce lock ordering: always acquire g_scheduler_mutex before g_profile_reload_mutex + +- ⚠️ **Line 205**: Profile state read without mutex + ```c + if (profile->state == PROFILE_ENABLED) { // Race condition + reload(profile); + } + ``` + - **Impact**: Race condition, profile might be disabled during reload + - **Fix**: Read state under mutex lock + +##### API Compatibility: ✅ MAINTAINED + +- No changes to public function signatures +- New functions are additive (`reloadProfile`, `validateProfile`) +- Struct layouts unchanged (ABI compatible) + +##### Error Handling: ✅ GOOD + +- Return values checked consistently +- Error codes are meaningful (`ERR_INVALID_CONFIG`, `ERR_PROFILE_LOCKED`) +- Appropriate logging at ERROR, WARN, and DEBUG levels +- Graceful degradation when reload fails (keeps old profile) + +##### Performance Impact: 🟢 LOW + +- Reload operation runs in separate thread (non-blocking) +- O(1) profile lookup (hash table unchanged) +- No new periodic operations +- Memory footprint increase: ~500 bytes per profile (new mutex + state fields) + +--- + +### 2. Scheduler Module + +#### Files Modified +- [source/scheduler/scheduler.c](source/scheduler/scheduler.c) (+120/-80) + +#### Key Changes +- **Modified**: Timeout calculation to support profile reload +- **Added**: `pauseScheduler()` and `resumeScheduler()` APIs +- **Refactored**: Timer wheel implementation for efficiency + +#### Impact Assessment + +##### Thread Safety: 🔴 HIGH RISK + +**Concerns:** +- 🔴 **Line 342**: Lock ordering issue (see Bulk Data section above) +- ⚠️ **Line 401**: Condition variable broadcast without checking predicates + - Multiple threads may wake up unnecessarily + - Potential for spurious wakeups causing incorrect behavior +- ⚠️ **Line 456**: `pauseScheduler()` doesn't wait for in-flight operations + - Profile reload might start while scheduled operation is running + +##### Algorithm Complexity: 🟡 MODERATE + +- Timer wheel introduces O(1) insert/delete (improvement from O(n) list) +- Edge case: Large interval values (>24 hours) might overflow wheel indices + - **Recommendation**: Add bounds checking at [scheduler.c:378](source/scheduler/scheduler.c#L378) + +--- + +### 3. CCSP Interface Module + +#### Files Modified +- [source/ccspinterface/rbusInterface.c](source/ccspinterface/rbusInterface.c) (+15/-8) + +#### Key Changes +- **New API**: `Device.X_RDKCENTRAL-COM_T2.ReloadProfile()` rbus method +- **Modified**: Error codes returned to caller (previously silent failures) + +#### Impact Assessment + +##### API Compatibility: ⚠️ BEHAVIOR CHANGE + +- **Breaking change**: RPC error codes changed + - Old: Always returned `RBUS_ERROR_SUCCESS` + - New: Returns actual error codes (`RBUS_ERROR_INVALID_INPUT`, etc.) + - **Impact**: Callers might not handle new error codes + - **Recommendation**: Document in release notes, add backward compatibility flag + +##### Error Handling: ✅ IMPROVED + +- Now validates input parameters before processing +- Appropriate error messages for field debugging + +--- + +### 4. HTTP Protocol Module + +#### Files Modified +- [source/protocol/http/client.c](source/protocol/http/client.c) (+5/-2) + +#### Key Changes +- **Bug fix**: Added timeout to HTTP POST (was infinite wait) + +#### Impact Assessment + +##### Reliability: ✅ IMPROVEMENT + +- Timeout prevents hung connections from blocking system +- Value: 30 seconds (reasonable for embedded environment) + +--- + +## Cross-Cutting Concerns + +### Build System + +**Files**: [Makefile.am](Makefile.am) + +**Changes:** +- Added `-pthread` flag (required for new mutex operations) +- No new external dependencies + +**Assessment:** ✅ Clean + +--- + +### Configuration + +**Files**: [config/T2Agent.cfg](config/T2Agent.cfg) + +**Changes:** +- New option: `EnableProfileReload=true` (default: false) + +**Assessment:** ✅ Good +- Feature flag allows gradual rollout +- Backward compatible (defaults to off) + +--- + +### Testing + +**Files**: +- [source/test/bulkdata_test.cpp](source/test/bulkdata_test.cpp) (+95/-10) +- [source/test/scheduler_test.cpp](source/test/scheduler_test.cpp) (+42/-0) + +**Coverage Analysis:** + +| Module | Before | After | Delta | +|--------|--------|-------|-------| +| profile.c | 78% | 85% | +7% | +| scheduler.c | 72% | 81% | +9% | +| rbusInterface.c | 65% | 68% | +3% | + +**Test Quality:** ✅ GOOD +- Comprehensive unit tests for reload logic +- Negative test cases included (invalid config, concurrent reloads) +- Mock objects used appropriately (rbus, scheduler) + +**Gaps:** +- ⚠️ Missing integration test for full reload flow (unit tests only) +- ⚠️ No stress test for concurrent reload + report generation +- ⚠️ Thread safety tests don't cover deadlock scenario identified above + +--- + +### Documentation + +**Status:** ⚠️ INCOMPLETE + +**Present:** +- ✅ Function-level comments for new APIs +- ✅ Updated README with reload instructions + +**Missing:** +- ❌ Architecture doc update (reload flow diagram) +- ❌ API reference for new rbus method +- ❌ Lock ordering documentation +- ❌ Performance impact analysis + +--- + +## Security & Privacy + +### Input Validation +- ✅ Profile JSON validated against schema +- ✅ rbus method parameters type-checked +- ⚠️ Profile name not sanitized (potential path traversal if used in file operations) + +### Privilege Escalation +- ✅ No new privileged operations +- ✅ rbus ACLs unchanged + +### Data Exposure +- ✅ No sensitive data in logs +- ✅ Profile content not logged at INFO level + +--- + +## Regression Analysis + +### High-Risk Areas + +1. **Scheduler Deadlock** 🔴 CRITICAL + - **Location**: [scheduler.c:342](source/scheduler/scheduler.c#L342) + - **Scenario**: Profile reload during scheduled timeout + - **Impact**: System hang, watchdog reset, loss of telemetry + - **Probability**: MEDIUM (requires precise timing, but occurs in production workload) + - **Mitigation**: Fix lock ordering before merge + +2. **Memory Leak During Reload** 🟡 MEDIUM + - **Location**: [profile.c:167](source/bulkdata/profile.c#L167) + - **Scenario**: Profile reloaded multiple times + - **Impact**: ~2KB leak per reload, OOM after ~500 reloads + - **Probability**: LOW (reloads are rare in production) + - **Mitigation**: Can be addressed post-merge with monitoring + +3. **rbus Error Code Change** 🟡 MEDIUM + - **Location**: [rbusInterface.c:89](source/ccspinterface/rbusInterface.c#L89) + - **Scenario**: Caller expects SUCCESS but gets ERROR + - **Impact**: Caller might retry unnecessarily or fail operation + - **Probability**: HIGH (any existing caller affected) + - **Mitigation**: Add backward compatibility flag or version new API + +### Potential Failure Modes + +| Scenario | Symptom | Detectability | Impact | +|----------|---------|---------------|---------| +| Deadlock during reload | System hang | High (watchdog fires) | CRITICAL | +| Memory leak accumulation | OOM after days | Medium (slow growth) | HIGH | +| Race in profile state | Wrong profile used | Low (silent) | MEDIUM | +| Invalid config accepted | Malformed reports | High (logs) | LOW | + +### Comparison with Previous Similar Changes + +**PR #87** (Add profile deletion) +- Also modified scheduler and profile modules +- Had similar lock ordering issue → caused production incident +- **Lesson**: Extra scrutiny needed for scheduler + profile interactions + +--- + +## Recommendations + +### 🔴 MUST FIX (Blocking) + +1. **Fix deadlock scenario** ([scheduler.c:342](source/scheduler/scheduler.c#L342)) + - Document lock ordering: `g_scheduler_mutex` → `g_profile_reload_mutex` + - Add runtime deadlock detection (DEBUG build) + - Refactor to use single lock or lock-free reload if possible + +2. **Fix memory leak** ([profile.c:167](source/bulkdata/profile.c#L167)) + ```c + // Before assignment: + cleanupProfileInternals(old_profile); + old_profile->data = new_profile->data; + ``` + +3. **Add NULL check** ([profile.c:223](source/bulkdata/profile.c#L223)) + ```c + char* name_copy = strdup(name); + if (!name_copy) { + return ERR_NO_MEMORY; + } + ``` + +### 🟡 SHOULD FIX (Before Merge) + +4. **Add integration test** for full reload flow + - Test: Start daemon → load profile → send reports → reload profile → verify new config active + +5. **Add stress test** for concurrent operations + - Test: Reload profile while reports generating (run 1000 iterations) + +6. **Document lock ordering** in scheduler.h and profile.h + ```c + /** + * Lock ordering: + * 1. g_scheduler_mutex (scheduler.c) + * 2. g_profile_reload_mutex (profile.c) + * + * Always acquire in this order to prevent deadlock. + */ + ``` + +7. **Fix rbus API compatibility** + - Option A: Add `v2` suffix to new method, keep old behavior + - Option B: Add config flag `UseV2ErrorCodes=false` for rollback + +8. **Update documentation** + - Add reload flow diagram to architecture doc + - Update API reference + - Add performance notes (reload takes ~50ms) + +### 🟢 CONSIDER (Future Improvements) + +9. **Add telemetry for reload operations** + - Count: successful/failed reloads + - Timing: reload duration histogram + - Errors: categorized by failure type + +10. **Optimize reload path** + - Current: Full profile teardown + rebuild + - Better: Diff old vs new, update only changed fields + +11. **Add reload rate limiting** + - Prevent rapid reload spamming + - Max 1 reload per 60 seconds per profile + +--- + +## Testing Checklist + +Before merging, verify: + +### Functionality +- [ ] Profile reload succeeds with valid config +- [ ] Profile reload fails gracefully with invalid config +- [ ] Existing profiles unaffected by reload of different profile +- [ ] Report generation continues during reload +- [ ] Reloaded profile immediately uses new configuration + +### Memory Safety +- [ ] Valgrind shows no leaks for reload operation +- [ ] Valgrind clean on error paths (invalid config, failed allocation) +- [ ] No use-after-free in concurrent profile access +- [ ] Memory usage stable over 1000+ reloads + +### Thread Safety +- [ ] Helgrind clean (no race conditions detected) +- [ ] No deadlocks under stress test (1000 iterations) +- [ ] Lock contention acceptable (profiled with perf) +- [ ] Scheduler timeouts continue firing during reload + +### Platform Testing +- [ ] Tested on target hardware (not just x86 VM) +- [ ] Tested on low-memory device (256MB RAM) +- [ ] Tested on busy system (>80% CPU) +- [ ] Cross-compiler build succeeds + +### Integration +- [ ] Unit tests pass (95%+ coverage on new code) +- [ ] Integration tests pass (full end-to-end flow) +- [ ] No CI failures +- [ ] Static analysis clean (cppcheck, scan-build) + +### Documentation +- [ ] README updated +- [ ] API docs updated +- [ ] Release notes drafted +- [ ] Known issues documented + +--- + +## Sign-Off + +**Code Review Status:** ⚠️ CONDITIONAL APPROVAL + +**Summary:** Solid implementation of profile reload feature with good test coverage. However, critical deadlock scenario must be addressed before merge. Memory leak and API compatibility concerns should also be resolved. + +**Approval Conditions:** +1. Fix deadlock issue +2. Fix memory leak +3. Add NULL check for strdup +4. Add integration test +5. Document lock ordering + +Once these items are addressed, re-run review or fast-track with senior engineer approval. + +--- + +**Review completed by GitHub Copilot Code Review Skill** +**For questions or concerns, consult the [review-checklist](references/review-checklist.md)** diff --git a/.github/skills/code-review/references/memory-patterns.md b/.github/skills/code-review/references/memory-patterns.md new file mode 100644 index 00000000..295e6e61 --- /dev/null +++ b/.github/skills/code-review/references/memory-patterns.md @@ -0,0 +1,322 @@ +# Memory Safety Patterns for Code Review + +## Pattern 1: Allocation with Error Check + +### ✅ CORRECT +```c +char* buffer = (char*)malloc(size); +if (!buffer) { + T2Error("Failed to allocate %zu bytes\n", size); + return ERR_MEMORY_ALLOCATION_FAILED; +} +// Use buffer +free(buffer); +``` + +### ❌ INCORRECT +```c +char* buffer = (char*)malloc(size); +strcpy(buffer, data); // Crash if malloc failed +free(buffer); +``` + +## Pattern 2: Error Path Cleanup (Single Exit) + +### ✅ CORRECT +```c +int process_data(const char* input) { + int ret = SUCCESS; + char* buffer = NULL; + FILE* fp = NULL; + + buffer = malloc(BUFFER_SIZE); + if (!buffer) { + ret = ERR_NO_MEMORY; + goto cleanup; + } + + fp = fopen(input, "r"); + if (!fp) { + ret = ERR_FILE_OPEN; + goto cleanup; + } + + // Process data... + +cleanup: + if (buffer) free(buffer); + if (fp) fclose(fp); + return ret; +} +``` + +### ❌ INCORRECT (Memory Leak) +```c +int process_data(const char* input) { + char* buffer = malloc(BUFFER_SIZE); + FILE* fp = fopen(input, "r"); + + if (!fp) return ERR_FILE_OPEN; // Leaked buffer + + // Process data... + + free(buffer); + fclose(fp); + return SUCCESS; +} +``` + +## Pattern 3: String Handling + +### ✅ CORRECT +```c +char dest[MAX_SIZE]; +strncpy(dest, source, MAX_SIZE - 1); +dest[MAX_SIZE - 1] = '\0'; // Ensure NULL termination +``` + +### ✅ BETTER (Helper Function) +```c +void safe_strcpy(char* dest, const char* src, size_t dest_size) { + if (dest_size > 0) { + strncpy(dest, src, dest_size - 1); + dest[dest_size - 1] = '\0'; + } +} +``` + +### ❌ INCORRECT +```c +char dest[MAX_SIZE]; +strcpy(dest, source); // Buffer overflow if source > MAX_SIZE +``` + +## Pattern 4: Dynamic String Allocation + +### ✅ CORRECT +```c +size_t len = strlen(source) + 1; // +1 for NULL terminator +char* dest = (char*)malloc(len); +if (!dest) { + return ERR_NO_MEMORY; +} +memcpy(dest, source, len); +// Later: free(dest); +``` + +### ❌ INCORRECT (Off-by-one) +```c +char* dest = (char*)malloc(strlen(source)); // Missing +1 +strcpy(dest, source); // Buffer overflow +``` + +## Pattern 5: Avoid Use-After-Free + +### ✅ CORRECT +```c +free(ptr); +ptr = NULL; // Invalidate pointer + +// Later... +if (ptr != NULL) { + // Safe: won't use freed memory +} +``` + +### ❌ INCORRECT +```c +free(ptr); +// ptr still points to freed memory + +// Later... +strcpy(ptr, data); // Use-after-free! +``` + +## Pattern 6: Double-Free Prevention + +### ✅ CORRECT +```c +void cleanup(Context* ctx) { + if (ctx) { + if (ctx->buffer) { + free(ctx->buffer); + ctx->buffer = NULL; // Prevent double-free + } + if (ctx->data) { + free(ctx->data); + ctx->data = NULL; + } + } +} +``` + +### ❌ INCORRECT +```c +void cleanup(Context* ctx) { + free(ctx->buffer); // What if already freed? + free(ctx->data); // Potential double-free +} +``` + +## Pattern 7: Array Bounds Checking + +### ✅ CORRECT +```c +for (int i = 0; i < count && i < MAX_ITEMS; i++) { + array[i] = data[i]; +} +``` + +### ❌ INCORRECT +```c +for (int i = 0; i < count; i++) { + array[i] = data[i]; // Buffer overflow if count > MAX_ITEMS +} +``` + +## Pattern 8: Safe realloc + +### ✅ CORRECT +```c +char* temp = (char*)realloc(buffer, new_size); +if (!temp) { + T2Error("realloc failed\n"); + free(buffer); // Original buffer still valid + return ERR_NO_MEMORY; +} +buffer = temp; +``` + +### ❌ INCORRECT (Memory Leak) +```c +buffer = (char*)realloc(buffer, new_size); +if (!buffer) { + // Lost reference to original buffer! + return ERR_NO_MEMORY; +} +``` + +## Pattern 9: Struct with Dynamic Members + +### ✅ CORRECT +```c +typedef struct { + char* name; + char* value; +} Item; + +void free_item(Item* item) { + if (item) { + free(item->name); + free(item->value); + free(item); + } +} + +Item* create_item(const char* name, const char* value) { + Item* item = (Item*)calloc(1, sizeof(Item)); + if (!item) return NULL; + + item->name = strdup(name); + if (!item->name) { + free(item); + return NULL; + } + + item->value = strdup(value); + if (!item->value) { + free(item->name); + free(item); + return NULL; + } + + return item; +} +``` + +## Pattern 10: Avoid Stack Overflow + +### ✅ CORRECT (Small Array) +```c +char buffer[256]; // Small, OK on stack +``` + +### ✅ CORRECT (Large Array) +```c +char* buffer = (char*)malloc(LARGE_SIZE); // Heap for large data +if (!buffer) return ERR_NO_MEMORY; +// Use buffer... +free(buffer); +``` + +### ❌ INCORRECT (Stack Overflow Risk) +```c +char buffer[1024 * 1024]; // 1MB on stack - too large! +``` + +## Pattern 11: Const Correctness (Avoid Unexpected Modifications) + +### ✅ CORRECT +```c +void process(const char* input) { + // Can't modify input + char* copy = strdup(input); + if (copy) { + modify(copy); + free(copy); + } +} +``` + +## Pattern 12: NULL Safety in Cleanup + +### ✅ CORRECT +```c +void cleanup(Data* data) { + if (data == NULL) return; // Guard against NULL + + if (data->buffer) free(data->buffer); + if (data->file) fclose(data->file); + if (data->mutex) { + pthread_mutex_destroy(data->mutex); + free(data->mutex); + } +} +``` + +## Red Flags to Look For + +1. **malloc/calloc without NULL check** +2. **strcpy/strcat/sprintf (use bounded versions)** +3. **Return before cleanup on error path** +4. **Pointer used after free (no NULL assignment)** +5. **Array index without bounds check** +6. **Large arrays on stack** +7. **realloc result assigned directly to original pointer** +8. **Missing free for every malloc** +9. **Buffer size calculations with off-by-one errors** +10. **Strdup without considering NULL return** + +## Valgrind Signals to Watch + +When reviewing changes, recommend valgrind if: +- New malloc/calloc/realloc/free calls +- Complex pointer manipulation +- String handling code +- Data structure modifications +- Error path additions + +```bash +valgrind --leak-check=full \ + --show-leak-kinds=all \ + --track-origins=yes \ + --verbose \ + ./test_binary +``` + +Look for: +- **Definitely lost**: Memory leak (must fix) +- **Indirectly lost**: Leaked structure with allocated members +- **Possibly lost**: Pointer arithmetic issue +- **Still reachable**: Not freed but still tracked (often acceptable for globals) +- **Invalid read/write**: Buffer overflow or use-after-free diff --git a/.github/skills/code-review/references/quick-assessment-guide.md b/.github/skills/code-review/references/quick-assessment-guide.md new file mode 100644 index 00000000..6aaf69b2 --- /dev/null +++ b/.github/skills/code-review/references/quick-assessment-guide.md @@ -0,0 +1,323 @@ +# Quick Assessment Guide + +Fast reference for generating code reviews. Use this to streamline the analysis process. + +## Risk Scoring Matrix + +### Scope Score (0-10) +| Files Changed | Modules | LOC | Score | +|---------------|---------|-----|-------| +| 1-5 | 1 | <100 | 1-2 | +| 6-15 | 1-2 | 100-500 | 3-5 | +| 16-30 | 2-4 | 500-1000 | 6-8 | +| >30 | >4 | >1000 | 9-10 | + +### Criticality Score (0-10) +| Component Type | Examples | Score | +|----------------|----------|-------| +| Peripheral | Test code, docs, comments | 1-2 | +| Configuration | Config files, schemas | 3-4 | +| Utilities | Helper functions, parsing | 5-6 | +| Core Logic | Scheduler, profile mgmt, report gen | 7-8 | +| Critical Path | Bus interface, data collection | 9-10 | + +### Complexity Score (0-10) +| Change Type | Indicators | Score | +|-------------|------------|-------| +| Trivial | Typo, comment, doc | 1 | +| Simple | Single function, <20 LOC | 2-3 | +| Moderate | Multi-function, refactor | 4-6 | +| Complex | Algorithm change, state machine | 7-8 | +| Very Complex | Concurrency change, protocol change | 9-10 | + +**Complexity Indicators:** +- Cyclomatic complexity >15: +2 +- Nesting depth >4: +1 +- New thread/lock: +2 +- Recursive function: +1 + +### Safety Score (0-10) +| Issues Found | Score | +|--------------|-------| +| None (clean review) | 0 | +| Minor style issues | 1-2 | +| Missing NULL checks | 3-4 | +| Potential race condition | 5-6 | +| Memory leak confirmed | 7-8 | +| Deadlock scenario | 9-10 | + +**Auto-score based on findings:** +- Each memory leak: +2 +- Each race condition: +2 +- Each potential deadlock: +3 +- Each buffer overflow: +3 +- Use-after-free: +4 + +**Coverity Defects (from PR comments):** +- Check for comments from user "rdkcmf-jenkins" or titles starting with "Coverity Issue" +- Each HIGH severity Coverity defect: +3 +- Each MEDIUM severity Coverity defect: +2 +- Each LOW severity Coverity defect: +1 +- Cap at 10 for safety score + +### Testing Score (0-10) +| Test Status | Score | +|-------------|-------| +| Comprehensive (>90% coverage, integration tests) | 0-1 | +| Good (>75% coverage, unit tests complete) | 2-3 | +| Basic (>50% coverage, some tests) | 4-6 | +| Minimal (<50% coverage, few tests) | 7-8 | +| None (no tests added) | 9-10 | + +--- + +## Risk Level Calculation + +**Total Score** = Scope + Criticality + Complexity + Safety + Testing + +| Total | Risk Level | Symbol | Action | +|-------|-----------|---------|---------| +| 0-10 | LOW | 🟢 | Standard review | +| 11-20 | MEDIUM | 🟡 | Careful review, validate with tools | +| 21-35 | HIGH | 🔴 | Deep review, testing required | +| 36-50 | CRITICAL | ⚫ | Block merge, full analysis needed | + +--- + +## Module Classification + +Map file paths to modules for change tree: + +| Path Pattern | Module | Criticality | +|--------------|--------|-------------| +| `source/bulkdata/*.c` | Bulk Data Collection | High | +| `source/scheduler/*.c` | Scheduling Engine | Critical | +| `source/reportgen/*.c` | Report Generation | High | +| `source/ccspinterface/*.c` | Bus Interface | Critical | +| `source/protocol/http/*.c` | HTTP Transport | High | +| `source/protocol/rbusMethod/*.c` | rbus Transport | High | +| `source/t2parser/*.c` | Profile Parser | Medium | +| `source/dcautil/*.c` | DCA Utilities | Medium | +| `source/utils/*.c` | Common Utilities | Medium | +| `source/privacycontrol/*.c` | Privacy Control | High | +| `source/t2dm/*.c` | Data Model Plugin | Medium | +| `source/t2ssp/*.c` | SSP Main | High | +| `source/test/*.cpp` | Unit Tests | Low | +| `source/testApp/*.c` | Test Application | Low | +| `*.am`, `*.ac` | Build System | Medium | +| `config/*` | Configuration | Medium | +| `schemas/*` | JSON Schemas | Low | +| `docs/*` | Documentation | Low | + +--- + +## Change Indicators (for Mermaid diagrams) + +**Symbols:** +- ✅ Test coverage added (positive) +- 🟢 Low risk change +- 🟡 Medium risk (review carefully) +- ⚠️ Safety concern flagged +- 🔴 High risk (requires attention) +- ⚫ Critical issue (blocking) + +**Mermaid CSS Classes:** +```css +classDef critical fill:#ff6b6b,stroke:#c92a2a,color:#fff /* Red - High risk */ +classDef warning fill:#ffd43b,stroke:#f08c00,color:#000 /* Yellow - Warning */ +classDef medium fill:#ffa94d,stroke:#fd7e14,color:#000 /* Orange - Medium */ +classDef safe fill:#51cf66,stroke:#2f9e44,color:#fff /* Green - Safe/Tests */ +classDef neutral fill:#e0e0e0,stroke:#9e9e9e,color:#000 /* Gray - Neutral */ +``` + +--- + +## Quick Memory Safety Check + +When analyzing C files, scan for these patterns: + +### 🔴 RED FLAGS (must fix) +- `strcpy`, `strcat`, `sprintf`, `gets` +- `malloc`/`calloc` without NULL check +- Return before `free` on error path +- Pointer used after `free` +- Array access without bounds check +- `realloc` result assigned directly to original pointer + +### 🟡 YELLOW FLAGS (verify carefully) +- `strncpy`, `snprintf` (check NULL termination) +- `memcpy` (check for overlap, size validation) +- Large stack arrays (>4KB) +- Complex pointer arithmetic +- cJSON functions (check for free) +- Vector/list operations (cleanup elements?) + +### ✅ GREEN PATTERNS (good) +- `goto cleanup` single-exit pattern +- Pointer set to NULL after free +- Defensive NULL checks before use +- Bounded string operations with explicit NULL +- RAII wrappers (C++ tests) + +--- + +## Quick Thread Safety Check + +### 🔴 RED FLAGS (must fix) +- Shared global modified without mutex +- Multiple locks acquired in different orders +- Condition variable without predicate loop +- Long operation under lock (>10ms) +- Thread created without join or detach +- Recursive mutex usage + +### 🟡 YELLOW FLAGS (verify carefully) +- New mutex or lock introduced +- Lock held across function call +- Callback invoked under lock +- Complex lock/unlock patterns +- Lock with goto (verify error paths) + +### ✅ GREEN PATTERNS (good) +- Consistent lock ordering documented +- Short critical sections (<1ms) +- Condition variable with while loop +- Thread stack size explicitly set +- Lock released before blocking operation +- Timeout on lock acquisition + +--- + +## API Compatibility Quick Check + +### Breaking Changes (high risk) +- ✗ Function signature modified +- ✗ Struct member reordered +- ✗ Enum value changed +- ✗ Public function deleted +- ✗ Return value semantics changed + +### Safe Changes (low risk) +- ✓ New function added +- ✓ New struct member at end +- ✓ New enum value at end +- ✓ Internal function changed +- ✓ Implementation detail changed + +--- + +## Priority Flagging + +When generating recommendations, prioritize: + +### P0 (MUST FIX - blocking) +- Deadlock scenarios +- Memory corruption (buffer overflow, use-after-free) +- NULL pointer dereferences +- Race conditions in critical path +- API/ABI breakage + +### P1 (SHOULD FIX - before merge) +- Memory leaks +- Missing error handling +- Test coverage gaps (<80% on new code) +- Documentation missing +- Performance regressions + +### P2 (CONSIDER - post-merge) +- Code style inconsistencies +- Minor optimization opportunities +- Refactoring suggestions +- Additional test scenarios +- Enhanced error messages + +--- + +## Review Flow Checklist + +Follow this order for efficient review generation: + +1. **Fetch PR metadata** (title, description, files changed) +2. **Calculate risk scores** (use matrices above) +3. **Classify files by module** (use module table) +4. **Generate change tree** (ASCII visualization) +5. **Analyze each file**: + - Run memory safety patterns + - Run thread safety patterns + - Check against common pitfalls + - Calculate local risk +6. **Identify cross-cutting concerns** (build, config, tests) +7. **Aggregate findings** (group by severity) +8. **Generate recommendations** (prioritize by P0/P1/P2) +9. **Create checklist** (actionable items) +10. **Write executive summary** (2-3 sentences) + +--- + +## Common Review Shortcuts + +### For small PRs (<5 files, <100 LOC) +- Skip detailed module breakdown +- Focus on changed functions only +- Brief risk assessment +- Quick checklist + +### For doc-only PRs +- Verify accuracy +- Check formatting +- Ensure completeness +- Skip code analysis sections + +### For test-only PRs +- Verify test quality (determinism, coverage) +- Check for test resource leaks +- Ensure setup/teardown balanced +- Brief risk summary only + +### For config-only PRs +- Validate against schema +- Check backward compatibility +- Verify default values +- Document impact + +--- + +## Time-Saving Tips + +1. **Use grep patterns** for quick scans: + ```bash + grep -rn "malloc\|calloc\|free" source/ + grep -rn "pthread_mutex\|pthread_cond" source/ + grep -rn "strcpy\|sprintf\|gets" source/ + ``` + +2. **Focus on diff hunks**, not entire files + +3. **Reuse analysis** from previous similar PRs + +4. **Skip generated code** (e.g., protocol buffers) + +5. **Batch similar issues** rather than listing individually + +6. **Reference line numbers** for specific issues only + +7. **Use tables** for repetitive data (better than prose) + +--- + +## Output Structure Template + +```markdown +# Title + Overview (5% of content) +## Executive Summary (5%) +## Risk Assessment (10%) +## Change Tree (5%) +## Module Analysis (50%) + - Per module: files, changes, impacts +## Cross-Cutting (10%) +## Regression Analysis (10%) +## Recommendations (5%) +## Checklist (5%) +``` + +Aim for 1500-3000 words total, adjust based on PR size. diff --git a/.github/skills/code-review/references/review-checklist.md b/.github/skills/code-review/references/review-checklist.md new file mode 100644 index 00000000..20226ba4 --- /dev/null +++ b/.github/skills/code-review/references/review-checklist.md @@ -0,0 +1,225 @@ +# Code Review Checklist for Telemetry Embedded Systems + +## General Requirements + +- [ ] Code follows project naming conventions +- [ ] Comments explain WHY, not WHAT +- [ ] No debug/dead code committed +- [ ] Commit messages are meaningful +- [ ] Changes align with PR description + +## Memory Safety (C/C++) + +### Allocations +- [ ] All `malloc`/`calloc`/`realloc` return values checked +- [ ] Every allocation has a corresponding `free` +- [ ] No memory leaks on error paths +- [ ] Error paths clean up in reverse order +- [ ] Use of `strdup` is justified (prefer stack when possible) + +### Pointers +- [ ] Pointers initialized to NULL +- [ ] NULL checks before dereference +- [ ] No use-after-free scenarios +- [ ] No double-free possible +- [ ] Pointer invalidation after free (`ptr = NULL`) + +### String Operations +- [ ] `strcpy` → replaced with `strncpy` or bounds-checked +- [ ] `sprintf` → replaced with `snprintf` +- [ ] `strcat` → buffer size validated +- [ ] `gets` → NEVER used (immediate rejection) +- [ ] String buffers always NULL-terminated + +### Buffer Safety +- [ ] Array bounds validated before access +- [ ] Loop conditions prevent overruns +- [ ] `memcpy` size validated, no overlap +- [ ] Fixed-size buffers avoid magic numbers (use sizeof) + +## Thread Safety + +### Shared Data +- [ ] Shared variables protected by mutex +- [ ] Lock ordering is consistent (prevent deadlocks) +- [ ] Critical sections are minimal +- [ ] Read-only access considered for atomics + +### Synchronization +- [ ] Mutex initialization checked (`pthread_mutex_init`) +- [ ] Mutexes properly destroyed on cleanup +- [ ] Condition variables paired with predicates +- [ ] No locks held across blocking operations unless necessary + +### Thread Management +- [ ] Thread creation specifies stack size +- [ ] Thread joins or detaches (no zombie threads) +- [ ] Thread-local storage used appropriately +- [ ] No busy-wait loops (use condition variables) + +## Resource Constraints + +### Memory Usage +- [ ] Prefer stack allocation over heap for small objects +- [ ] Memory pools considered for frequent allocations +- [ ] Large buffers justified by requirements +- [ ] No unbounded growth (arrays, queues, caches) + +### CPU Usage +- [ ] Algorithms are O(n) or better where possible +- [ ] No unnecessary loops or redundant operations +- [ ] Expensive operations avoided in hot paths +- [ ] Consider caching for repeated calculations + +## Error Handling + +### Return Values +- [ ] All function return values checked +- [ ] Error codes are meaningful and documented +- [ ] Errors propagated to caller when appropriate +- [ ] No silent failures (all errors logged) + +### Logging +- [ ] Error conditions logged with context +- [ ] Log levels appropriate (ERROR, WARN, INFO, DEBUG) +- [ ] No sensitive data in logs +- [ ] Logs sufficient for field debugging + +### Failure Modes +- [ ] All error paths tested +- [ ] System degrades gracefully +- [ ] No unbounded retries +- [ ] Timeouts implemented where needed + +## API Design + +### Backward Compatibility +- [ ] Existing function signatures unchanged (or versioned) +- [ ] Struct layouts maintain ABI compatibility +- [ ] Enum values not reordered +- [ ] New APIs clearly documented + +### Function Design +- [ ] Functions do one thing well (single responsibility) +- [ ] Function names are descriptive +- [ ] Parameter validation at entry +- [ ] Output parameters clearly documented + +### Header Files +- [ ] Include guards present +- [ ] Minimize dependencies (forward declarations) +- [ ] Public vs private APIs separated +- [ ] No implementation in headers (except inline) + +## Platform Independence + +### Portable Code +- [ ] No platform-specific types without abstraction +- [ ] Endianness handled explicitly if needed +- [ ] Size assumptions avoided (use sizeof, stdint.h) +- [ ] Conditional compilation minimized + +### System Calls +- [ ] System calls error-checked +- [ ] Alternatives available for missing APIs +- [ ] File paths use portable separators +- [ ] Network byte order handled for network data + +## Build System + +### Makefile Changes (*.am) +- [ ] Dependencies complete and minimal +- [ ] Compiler flags justified +- [ ] Link order correct +- [ ] No absolute paths + +### Configuration (*.ac) +- [ ] Feature detection used (not hard-coded) +- [ ] Optional dependencies handled +- [ ] Cross-compilation supported +- [ ] Platform detection accurate + +## Testing + +### Unit Tests +- [ ] New functionality has unit tests +- [ ] Edge cases covered +- [ ] Negative tests included (error paths) +- [ ] Mocks used appropriately (not excessively) + +### Test Quality +- [ ] Tests are deterministic (no flakiness) +- [ ] Test names describe what is tested +- [ ] Assertions have meaningful messages +- [ ] Setup/teardown prevent state leaks + +### Coverage +- [ ] All new functions tested +- [ ] Branch coverage >80% for new code +- [ ] Critical paths have integration tests +- [ ] Regression tests for bug fixes + +## Documentation + +### Code Documentation +- [ ] Public functions documented (purpose, params, return) +- [ ] Complex algorithms explained +- [ ] Assumptions documented +- [ ] TODOs tracked with tickets + +### External Documentation +- [ ] README.md updated if needed +- [ ] API documentation updated +- [ ] Configuration examples provided +- [ ] Migration guide for breaking changes + +## Embedded-Specific Concerns + +### Telemetry 2.0 Framework +- [ ] Profile configurations validated against schema +- [ ] Report generation doesn't block data collection +- [ ] Scheduler changes maintain timing guarantees +- [ ] Bus interface changes compatible with both CCSP and rbus + +### RDK Integration +- [ ] Changes tested on target hardware (or simulator) +- [ ] No increase in boot time +- [ ] Watchdog timeouts respected +- [ ] Log rotation considerations + +### Performance +- [ ] Startup time impact measured +- [ ] Memory footprint delta calculated +- [ ] CPU usage profiled for new loops/threads +- [ ] No I/O in critical paths unless unavoidable + +## Security + +### Input Validation +- [ ] All external inputs validated +- [ ] Buffer sizes enforced +- [ ] No arbitrary command execution +- [ ] Path traversal prevented + +### Data Protection +- [ ] No hardcoded credentials +- [ ] Sensitive data redacted in logs +- [ ] Privacy controls respected +- [ ] No unnecessary privilege escalation + +## Risk Assessment Questions + +1. **What is the blast radius if this change has a bug?** + - Single profile? All profiles? System stability? + +2. **Can this change cause a device to become unresponsive?** + - Deadlocks? Infinite loops? Resource exhaustion? + +3. **How will this fail in production?** + - Graceful degradation? Hard crash? Silent corruption? + +4. **What's the rollback plan?** + - Configuration revert? Firmware downgrade? Feature flag? + +5. **What operational visibility exists?** + - Logs? Metrics? Debug capabilities? diff --git a/.github/skills/code-review/references/thread-patterns.md b/.github/skills/code-review/references/thread-patterns.md new file mode 100644 index 00000000..5008819b --- /dev/null +++ b/.github/skills/code-review/references/thread-patterns.md @@ -0,0 +1,488 @@ +# Thread Safety Patterns for Code Review + +## Pattern 1: Mutex-Protected Shared Data + +### ✅ CORRECT +```c +typedef struct { + pthread_mutex_t mutex; + int counter; + char* shared_data; +} SharedContext; + +void increment_counter(SharedContext* ctx) { + pthread_mutex_lock(&ctx->mutex); + ctx->counter++; + pthread_mutex_unlock(&ctx->mutex); +} +``` + +### ❌ INCORRECT (Race Condition) +```c +typedef struct { + int counter; // No mutex protection +} SharedContext; + +void increment_counter(SharedContext* ctx) { + ctx->counter++; // Race condition! +} +``` + +## Pattern 2: Consistent Lock Ordering (Avoid Deadlock) + +### ✅ CORRECT +```c +// Always acquire in same order: mutex_a then mutex_b +void operation1() { + pthread_mutex_lock(&mutex_a); + pthread_mutex_lock(&mutex_b); + // Critical section + pthread_mutex_unlock(&mutex_b); + pthread_mutex_unlock(&mutex_a); +} + +void operation2() { + pthread_mutex_lock(&mutex_a); // Same order + pthread_mutex_lock(&mutex_b); + // Critical section + pthread_mutex_unlock(&mutex_b); + pthread_mutex_unlock(&mutex_a); +} +``` + +### ❌ INCORRECT (Deadlock Risk) +```c +void operation1() { + pthread_mutex_lock(&mutex_a); + pthread_mutex_lock(&mutex_b); + // ... +} + +void operation2() { + pthread_mutex_lock(&mutex_b); // Opposite order - DEADLOCK! + pthread_mutex_lock(&mutex_a); + // ... +} +``` + +## Pattern 3: Condition Variable Usage + +### ✅ CORRECT +```c +pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +pthread_cond_t cond = PTHREAD_COND_INITIALIZER; +bool ready = false; + +// Producer +void signal_ready() { + pthread_mutex_lock(&mutex); + ready = true; // Set predicate + pthread_cond_signal(&cond); + pthread_mutex_unlock(&mutex); +} + +// Consumer +void wait_ready() { + pthread_mutex_lock(&mutex); + while (!ready) { // WHILE loop, not IF + pthread_cond_wait(&cond, &mutex); + } + // Ready is true here + pthread_mutex_unlock(&mutex); +} +``` + +### ❌ INCORRECT (Missing Predicate) +```c +void wait_ready() { + pthread_mutex_lock(&mutex); + pthread_cond_wait(&cond, &mutex); // No predicate check! + // Spurious wakeup possible + pthread_mutex_unlock(&mutex); +} +``` + +## Pattern 4: Thread Creation with Stack Size + +### ✅ CORRECT (Embedded System) +```c +pthread_t thread; +pthread_attr_t attr; + +pthread_attr_init(&attr); +pthread_attr_setstacksize(&attr, 256 * 1024); // 256KB explicit + +int ret = pthread_create(&thread, &attr, thread_func, arg); +if (ret != 0) { + T2Error("pthread_create failed: %d\n", ret); +} +pthread_attr_destroy(&attr); +``` + +### ❌ INCORRECT (Default Stack) +```c +pthread_t thread; +pthread_create(&thread, NULL, thread_func, arg); // Unknown stack size +``` + +## Pattern 5: Initialization Safety + +### ✅ CORRECT +```c +static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +static pthread_once_t once_control = PTHREAD_ONCE_INIT; + +void init_once() { + // Initialization code (runs exactly once) +} + +void safe_init() { + pthread_once(&once_control, init_once); +} +``` + +### ✅ CORRECT (Dynamic Init) +```c +pthread_mutex_t* mutex; + +int init_mutex() { + mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t)); + if (!mutex) return -1; + + int ret = pthread_mutex_init(mutex, NULL); + if (ret != 0) { + free(mutex); + return -1; + } + return 0; +} +``` + +### ❌ INCORRECT +```c +pthread_mutex_t mutex; // Not initialized! +pthread_mutex_lock(&mutex); // Undefined behavior +``` + +## Pattern 6: Read-Write Lock for Read-Heavy Workloads + +### ✅ CORRECT (Multiple Readers) +```c +pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER; + +void read_data() { + pthread_rwlock_rdlock(&rwlock); + // Read shared data (multiple readers OK) + pthread_rwlock_unlock(&rwlock); +} + +void write_data() { + pthread_rwlock_wrlock(&rwlock); + // Modify shared data (exclusive) + pthread_rwlock_unlock(&rwlock); +} +``` + +## Pattern 7: Atomic Operations (Lockless) + +### ✅ CORRECT (C11 Atomics) +```c +#include + +atomic_int counter = ATOMIC_VAR_INIT(0); + +void increment() { + atomic_fetch_add(&counter, 1); // Thread-safe +} + +int get_value() { + return atomic_load(&counter); +} +``` + +### ✅ CORRECT (GCC Built-ins for embedded) +```c +static int counter = 0; + +void increment() { + __sync_fetch_and_add(&counter, 1); // Thread-safe because the built-in is atomic +} +``` + +## Pattern 8: Thread Cleanup + +### ✅ CORRECT +```c +typedef struct { + pthread_t thread; + bool running; + pthread_mutex_t mutex; +} ThreadContext; + +void stop_thread(ThreadContext* ctx) { + pthread_mutex_lock(&ctx->mutex); + ctx->running = false; + pthread_mutex_unlock(&ctx->mutex); + + pthread_join(ctx->thread, NULL); // Wait for termination + pthread_mutex_destroy(&ctx->mutex); +} +``` + +### ❌ INCORRECT (Resource Leak) +```c +void stop_thread(ThreadContext* ctx) { + ctx->running = false; // No synchronization + // Missing pthread_join - zombie thread! +} +``` + +## Pattern 9: Minimize Lock Hold Time + +### ✅ CORRECT +```c +void process_item(Queue* queue) { + Item* item = NULL; + + pthread_mutex_lock(&queue->mutex); + item = dequeue(queue); + pthread_mutex_unlock(&queue->mutex); // Release early + + if (item) { + expensive_operation(item); // Done outside lock + free(item); + } +} +``` + +### ❌ INCORRECT +```c +void process_item(Queue* queue) { + pthread_mutex_lock(&queue->mutex); + Item* item = dequeue(queue); + + if (item) { + expensive_operation(item); // Holding lock too long! + } + + pthread_mutex_unlock(&queue->mutex); +} +``` + +## Pattern 10: Thread-Local Storage + +### ✅ CORRECT (C11 standard — use when the toolchain/runtime supports C11 TLS) +```c +#include /* C11 threads API; availability varies by embedded toolchain */ + +_Thread_local char error_buffer[256]; /* Standard C11 TLS, but not universally available on all embedded toolchains */ + +void set_error(const char* msg) { + strncpy(error_buffer, msg, sizeof(error_buffer) - 1); + error_buffer[sizeof(error_buffer) - 1] = '\0'; +} + +const char* get_error() { + return error_buffer; /* Thread-safe */ +} +``` + +### ✅ CORRECT (POSIX — maximum portability for pre-C11 and embedded toolchains) +```c +#include + +static pthread_key_t error_key; +static pthread_once_t key_once = PTHREAD_ONCE_INIT; + +static void make_key(void) { + pthread_key_create(&error_key, free); +} + +void set_error(const char* msg) { + pthread_once(&key_once, make_key); + char* buf = pthread_getspecific(error_key); + if (!buf) { + buf = malloc(256); + if (!buf) return; + pthread_setspecific(error_key, buf); + } + strncpy(buf, msg, 255); + buf[255] = '\0'; +} + +const char* get_error() { + pthread_once(&key_once, make_key); + return pthread_getspecific(error_key); /* Thread-safe */ +} +``` + +### ⚠️ COMPILER EXTENSION ONLY (`__thread`) +```c +/* __thread is a GCC/Clang extension — not standard C. + * Avoid in new code; prefer _Thread_local (C11) or pthread_key_t (POSIX). */ +__thread char error_buffer[256]; +``` + +## Pattern 11: Lock Timeout (Avoid Infinite Wait) + +### ✅ CORRECT (use `CLOCK_MONOTONIC` to avoid NTP/time-change disruption) +```c +struct timespec timeout; +/* CLOCK_MONOTONIC is preferred over CLOCK_REALTIME: it is not affected by + * NTP adjustments or manual clock changes, so the timeout is always + * relative to wall-clock elapsed time. */ +clock_gettime(CLOCK_MONOTONIC, &timeout); +timeout.tv_sec += 5; // 5 second timeout + +int ret = pthread_mutex_timedlock(&mutex, &timeout); +if (ret == ETIMEDOUT) { + T2Error("Mutex lock timeout\n"); + return ERR_TIMEOUT; +} +``` + +> **Note:** `pthread_mutex_timedlock` requires an *absolute* time based on +> `CLOCK_REALTIME` per POSIX. To combine the monotonic elapsed measurement +> with the required real-time absolute value, compute the delta from +> `CLOCK_MONOTONIC` and add it to the current `CLOCK_REALTIME` value, or use +> a platform-specific extension such as +> `pthread_condattr_setclock(CLOCK_MONOTONIC)` with a condition variable. +> On Linux (glibc ≥ 2.30) `pthread_mutex_clocklock` accepts `CLOCK_MONOTONIC` +> directly and is the cleanest solution when available. + +## Pattern 12: Volatile for Hardware Registers + +### ✅ CORRECT +```c +volatile uint32_t* const hardware_reg = (uint32_t*)0x40000000; + +void write_register(uint32_t value) { + *hardware_reg = value; // Compiler won't optimize away +} + +uint32_t read_register() { + return *hardware_reg; // Always reads from hardware +} +``` + +### ❌ INCORRECT (Shared Variable Between Threads) +```c +volatile int counter = 0; // volatile != thread-safe! + +// Still need mutex or atomics for thread safety +``` + +## Red Flags to Look For + +1. **Shared data without mutex protection** +2. **Multiple locks acquired in different orders** +3. **Condition variable without predicate loop** +4. **Missing pthread_join (zombie threads)** +5. **Mutex not initialized before use** +6. **Lock held across blocking I/O** +7. **Recursive locks (potential for deadlock)** +8. **Global state modified without synchronization** +9. **Memory barriers missing in lockless code** +10. **ThreadSanitizer warnings ignored** + +## Common Deadlock Scenarios + +### Scenario 1: Lock Ordering Violation +```c +// Thread A: Lock(A) → Lock(B) +// Thread B: Lock(B) → Lock(A) +// Result: DEADLOCK +``` + +### Scenario 2: Lock and Wait +```c +pthread_mutex_lock(&mutex); +pthread_join(thread, NULL); // Waiting for thread that needs mutex +pthread_mutex_unlock(&mutex); // DEADLOCK +``` + +### Scenario 3: Callback Under Lock +```c +pthread_mutex_lock(&mutex); +user_callback(); // Callback tries to lock same mutex +pthread_mutex_unlock(&mutex); // DEADLOCK +``` + +## Testing for Thread Safety + +### Helgrind (Valgrind) +```bash +valgrind --tool=helgrind ./program +``` +Detects: +- Data races +- Lock order violations +- Improper use of POSIX threading primitives + +### ThreadSanitizer (TSan) +```bash +gcc -fsanitize=thread -g program.c -o program +./program +``` +Detects: +- Data races +- Deadlocks +- Use of uninitialized mutexes + +## Telemetry 2.0 Specific Patterns + +### TimeoutThread Pattern +```c +// TimeoutThread uses condition variable with timeout +pthread_mutex_lock(&profile->mutex); +while (profile->running) { + struct timespec timeout; + // Calculate next timeout + int ret = pthread_cond_timedwait(&profile->cond, + &profile->mutex, + &timeout); + if (ret == ETIMEDOUT) { + // Trigger report generation + } +} +pthread_mutex_unlock(&profile->mutex); +``` + +### rbus Async Handler Pattern +```c +// Short-lived thread for async rbus operations +void* asyncMethodHandler(void* arg) { + Message* msg = (Message*)arg; + + // Process message (no shared state modifications) + rbus_sendData(msg->method, msg->data); + + // Cleanup + free(msg->data); + free(msg); + + return NULL; +} +``` + +### Profile State Protection +```c +// All profile state changes must be protected +pthread_mutex_lock(&profile->mutex); +profile->state = STATE_COLLECTING; +profile->last_report_time = time(NULL); +pthread_mutex_unlock(&profile->mutex); +``` + +## Review Checklist for Threading Changes + +- [ ] Shared data identified and protected +- [ ] Lock ordering documented and consistent +- [ ] Condition variables used with predicate loop +- [ ] Thread stack size specified +- [ ] Thread cleanup (join/detach) implemented +- [ ] Mutexes initialized and destroyed +- [ ] No locks held across blocking operations +- [ ] Timeouts prevent infinite waits +- [ ] Error paths release locks +- [ ] Tested with helgrind/TSan diff --git a/.gitignore b/.gitignore index 54a6d8ff..b1ea3412 100644 --- a/.gitignore +++ b/.gitignore @@ -42,3 +42,6 @@ source/**/.libs/ # Libtool wrapper scripts source/telemetry2_0 source/commonlib/telemetry2_0_client + +# Code review artifacts +reviews/ diff --git a/CHANGELOG.md b/CHANGELOG.md index ff758989..394e4522 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,9 +4,22 @@ All notable changes to this project will be documented in this file. Dates are d Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog). +#### [1.9.1](https://github.com/rdkcentral/telemetry/compare/1.9.0...1.9.1) + +> 29 April 2026 + +- RDKB-64487: TSAN fixes for various issues [`#350`](https://github.com/rdkcentral/telemetry/pull/350) +- RDKB-64487: Code changes for thread hardening and safety under concurrent load [`#345`](https://github.com/rdkcentral/telemetry/pull/345) +- RDKB-64487: Code changes for Mutex locking and deadlock prevention [`#338`](https://github.com/rdkcentral/telemetry/pull/338) +- RDKEMW-16851: Fix for top log markers and the polling frequency [`#341`](https://github.com/rdkcentral/telemetry/pull/341) +- Adding agentic skills for code reviews [`#323`](https://github.com/rdkcentral/telemetry/pull/323) + #### [1.9.0](https://github.com/rdkcentral/telemetry/compare/1.8.8...1.9.0) +> 8 April 2026 + - RDK-60291: RDK Coverity Defect Resolution [`#319`](https://github.com/rdkcentral/telemetry/pull/319) +- Changelog updates for 1.9.0 release [`c3239a5`](https://github.com/rdkcentral/telemetry/commit/c3239a5c22ad9da475210a2448777563c6b84390) #### [1.8.8](https://github.com/rdkcentral/telemetry/compare/1.8.7...1.8.8) diff --git a/configure.ac b/configure.ac index 03951e9f..39340abb 100644 --- a/configure.ac +++ b/configure.ac @@ -206,6 +206,19 @@ AC_ARG_ENABLE([rdkcertselector], AM_CONDITIONAL([IS_LIBRDKCERTSEL_ENABLED], [test x$IS_LIBRDKCERTSEL_ENABLED = xtrue]) AC_SUBST(LIBRDKCERTSEL_FLAG) +# ThreadSanitizer support for detecting data races in L2 tests +AC_ARG_ENABLE([tsan], + AS_HELP_STRING([--enable-tsan],[enable ThreadSanitizer for race detection (default is no)]), + [ + case "${enableval}" in + yes) CFLAGS="$CFLAGS -fsanitize=thread -g -O1 -fno-omit-frame-pointer" + LDFLAGS="$LDFLAGS -fsanitize=thread";; + no) ;; + *) AC_MSG_ERROR([bad value ${enableval} for --enable-tsan]) ;; + esac + ], + [echo "ThreadSanitizer is disabled"]) + AC_CONFIG_FILES([Makefile source/Makefile source/bulkdata/Makefile diff --git a/source/bulkdata/profile.c b/source/bulkdata/profile.c index 1b0cf593..88b478c4 100644 --- a/source/bulkdata/profile.c +++ b/source/bulkdata/profile.c @@ -54,7 +54,7 @@ static bool initialized = false; static Vector *profileList; -static pthread_mutex_t plMutex; +static pthread_rwlock_t profileListLock; // L0: protects profileList (rwlock) static pthread_mutex_t reportLock; static pthread_mutex_t triggerConditionQueMutex = PTHREAD_MUTEX_INITIALIZER; @@ -302,19 +302,19 @@ T2ERROR profileWithNameExists(const char *profileName, bool *bProfileExists) *bProfileExists = false; return T2ERROR_FAILURE; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); for(; profileIndex < Vector_Size(profileList); profileIndex++) { tempProfile = (Profile *)Vector_At(profileList, profileIndex); if(strcmp(tempProfile->name, profileName) == 0) { *bProfileExists = true; - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_SUCCESS; } } *bProfileExists = false; - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Error("Profile with Name : %s not found\n", profileName); return T2ERROR_PROFILE_NOT_FOUND; } @@ -925,17 +925,17 @@ reportThreadEnd : void NotifyTimeout(const char* profileName, bool isClearSeekMap) { T2Debug("%s ++in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); Profile *profile = NULL; if(T2ERROR_SUCCESS != getProfile(profileName, &profile)) { T2Error("Profile : %s not found\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return ; } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Info("%s: profile %s is in %s state\n", __FUNCTION__, profileName, profile->enable ? "Enabled" : "Disabled"); pthread_mutex_lock(&profile->reportInProgressMutex); if(profile->enable && !profile->reportInProgress) @@ -975,15 +975,15 @@ T2ERROR Profile_storeMarkerEvent(const char *profileName, T2Event *eventInfo) { T2Debug("%s ++in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); Profile *profile = NULL; if(T2ERROR_SUCCESS != getProfile(profileName, &profile)) { T2Error("Profile : %s not found\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); if(!profile->enable) { T2Warning("Profile : %s is disabled, ignoring the event\n", profileName); @@ -1138,10 +1138,10 @@ T2ERROR addProfile(Profile *profile) T2Error("profile list is not initialized yet, ignoring\n"); return T2ERROR_FAILURE; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); Vector_PushBack(profileList, profile); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; } @@ -1154,18 +1154,18 @@ T2ERROR enableProfile(const char *profileName) T2Error("profile list is not initialized yet, ignoring\n"); return T2ERROR_FAILURE; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); Profile *profile = NULL; if(T2ERROR_SUCCESS != getProfile(profileName, &profile)) { T2Error("Profile : %s not found\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } if(profile->enable) { T2Info("Profile : %s is already enabled - ignoring duplicate request\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_SUCCESS; } else @@ -1174,19 +1174,19 @@ T2ERROR enableProfile(const char *profileName) if(pthread_mutex_init(&profile->triggerCondMutex, NULL) != 0) { T2Error(" %s Mutex init has failed\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } if(pthread_mutex_init(&profile->reportInProgressMutex, NULL) != 0) { T2Error(" %s Mutex init has failed\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } if(pthread_cond_init(&profile->reportInProgressCond, NULL) != 0) { T2Error(" %s Cond init has failed\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } @@ -1201,7 +1201,7 @@ T2ERROR enableProfile(const char *profileName) { profile->enable = false; T2Error("Unable to register profile : %s with Scheduler\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } T2ER_StartDispatchThread(); @@ -1209,7 +1209,7 @@ T2ERROR enableProfile(const char *profileName) T2Info("Successfully enabled profile : %s\n", profileName); } T2Debug("%s --out\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_SUCCESS; } @@ -1221,7 +1221,7 @@ void updateMarkerComponentMap() size_t profileIndex = 0; Profile *tempProfile = NULL; - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); for(; profileIndex < Vector_Size(profileList); profileIndex++) { tempProfile = (Profile *)Vector_At(profileList, profileIndex); @@ -1237,7 +1237,7 @@ void updateMarkerComponentMap() } } } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); } @@ -1251,12 +1251,12 @@ T2ERROR disableProfile(const char *profileName, bool *isDeleteRequired) return T2ERROR_FAILURE; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); Profile *profile = NULL; if(T2ERROR_SUCCESS != getProfile(profileName, &profile)) { T2Error("Profile : %s not found\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } @@ -1266,13 +1266,28 @@ T2ERROR disableProfile(const char *profileName, bool *isDeleteRequired) } else { - profile->enable = false; + /* Only touch reuseThreadMutex/reuseThread when the report thread + * exists and therefore owns valid synchronization primitives. + * reuseThreadMutex is initialized inside CollectAndReport() and + * destroyed when that thread exits, so accessing it when the thread + * has not started or has already exited is undefined behavior. */ + if (profile->threadExists) + { + pthread_mutex_lock(&profile->reuseThreadMutex); + profile->enable = false; + pthread_cond_signal(&profile->reuseThread); + pthread_mutex_unlock(&profile->reuseThreadMutex); + } + else + { + profile->enable = false; + } } #ifdef PERSIST_LOG_MON_REF removeProfileFromDisk(SEEKFOLDER, profile->name); #endif profile->isSchedulerstarted = false; - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; @@ -1286,24 +1301,37 @@ T2ERROR deleteAllProfiles(bool delFromDisk) int profileIndex = 0; Profile *tempProfile = NULL; - pthread_mutex_lock(&plMutex); + /* Acquire wrlock once to atomically validate, read count, and disable all + * profiles. This eliminates the TOCTOU race between the old pattern of + * rdlock-for-count then wrlock-per-profile, where the list could change + * between the two operations. */ + pthread_rwlock_wrlock(&profileListLock); if(profileList == NULL) { T2Error("profile list is not initialized yet or profileList is empty, ignoring\n"); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } count = Vector_Size(profileList); - pthread_mutex_unlock(&plMutex); - - for(; profileIndex < count; profileIndex++) + for(profileIndex = 0; profileIndex < count; profileIndex++) { - pthread_mutex_lock(&plMutex); tempProfile = (Profile *)Vector_At(profileList, profileIndex); tempProfile->enable = false; tempProfile->isSchedulerstarted = false; - pthread_mutex_unlock(&plMutex); + } + pthread_rwlock_unlock(&profileListLock); + + /* Second pass: perform blocking operations (unregister, signal, join) without + * holding profileListLock for extended periods. The caller (uninitProfileList) + * sets initialized=false before calling, which prevents concurrent list + * modifications. We still briefly acquire rdlock for each Vector_At to + * satisfy Coverity's lock-consistency analysis. */ + for(profileIndex = 0; profileIndex < count; profileIndex++) + { + pthread_rwlock_rdlock(&profileListLock); + tempProfile = (Profile *)Vector_At(profileList, profileIndex); + pthread_rwlock_unlock(&profileListLock); if(T2ERROR_SUCCESS != unregisterProfileFromScheduler(tempProfile->name)) { @@ -1330,13 +1358,13 @@ T2ERROR deleteAllProfiles(bool delFromDisk) * after setting threadExists = false (see CollectAndReport cleanup). */ } - /* Re-acquire plMutex for profile cleanup */ - pthread_mutex_lock(&plMutex); + /* grepSeekProfile cleanup is safe without profileListLock here: + * the profile's thread has been joined (or never existed), and + * initialized=false prevents concurrent access from other threads. */ if(tempProfile->grepSeekProfile) { freeGrepSeekProfile(tempProfile->grepSeekProfile); } - pthread_mutex_unlock(&plMutex); if(delFromDisk == true) { removeProfileFromDisk(REPORTPROFILES_PERSISTENCE_PATH, tempProfile->name); @@ -1350,12 +1378,12 @@ T2ERROR deleteAllProfiles(bool delFromDisk) removeProfileFromDisk(REPORTPROFILES_PERSISTENCE_PATH, MSGPACK_REPORTPROFILES_PERSISTENT_FILE); } - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); T2Debug("Deleting all profiles from the profileList\n"); Vector_Destroy(profileList, freeProfile); profileList = NULL; Vector_Create(&profileList); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); @@ -1366,17 +1394,17 @@ bool isProfileEnabled(const char *profileName) { bool is_profile_enable = false; Profile *get_profile = NULL; - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); if(T2ERROR_SUCCESS != getProfile(profileName, &get_profile)) { T2Error("Profile : %s not found\n", profileName); T2Debug("%s --out\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return false; } is_profile_enable = get_profile->enable; T2Debug("is_profile_enable = %d \n", is_profile_enable); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return is_profile_enable; } @@ -1391,11 +1419,11 @@ T2ERROR deleteProfile(const char *profileName) } Profile *profile = NULL; - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); if(T2ERROR_SUCCESS != getProfile(profileName, &profile)) { T2Error("Profile : %s not found\n", profileName); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_FAILURE; } @@ -1407,14 +1435,14 @@ T2ERROR deleteProfile(const char *profileName) { profile->isSchedulerstarted = false; } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); if(T2ERROR_SUCCESS != unregisterProfileFromScheduler(profileName)) { T2Info("Profile : %s already removed from scheduler\n", profileName); } T2Info("Waiting for CollectAndReport to be complete : %s\n", profileName); - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); // Wait for any in-progress report to finish under reportInProgressMutex. // threadExists must NOT be read here — it is written under reuseThreadMutex @@ -1435,13 +1463,13 @@ T2ERROR deleteProfile(const char *profileName) bool threadStillExists = profile->threadExists; pthread_mutex_unlock(&profile->reuseThreadMutex); - /* Release plMutex before pthread_join to avoid deadlock. + /* Release profileListLock before pthread_join to avoid deadlock. * pthread_join can block indefinitely if the CollectAndReport thread - * is stuck (e.g., waiting on rbusMethodMutex). Holding plMutex during + * is stuck (e.g., waiting on rbusMethodMutex). Holding profileListLock during * pthread_join prevents other threads (timeout callbacks, other profile * operations) from making progress, creating a deadlock. */ - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); if (threadStillExists) { @@ -1454,8 +1482,8 @@ T2ERROR deleteProfile(const char *profileName) * after setting threadExists = false (see CollectAndReport cleanup). */ } - /* Re-acquire plMutex for profile cleanup operations */ - pthread_mutex_lock(&plMutex); + /* Re-acquire profileListLock (wrlock) for profile cleanup operations */ + pthread_rwlock_wrlock(&profileListLock); if(Vector_Size(profile->triggerConditionList) > 0) { @@ -1476,7 +1504,7 @@ T2ERROR deleteProfile(const char *profileName) #endif Vector_RemoveItem(profileList, profile, freeProfile); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; } @@ -1487,7 +1515,7 @@ void sendLogUploadInterruptToScheduler() Profile *tempProfile = NULL; T2Debug("%s ++in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); for(; profileIndex < Vector_Size(profileList); profileIndex++) { tempProfile = (Profile *)Vector_At(profileList, profileIndex); @@ -1496,7 +1524,7 @@ void sendLogUploadInterruptToScheduler() SendInterruptToTimeoutThread(tempProfile->name); } } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); } @@ -1640,20 +1668,22 @@ T2ERROR initProfileList(bool checkPreviousSeek) return T2ERROR_SUCCESS; } initialized = true; - if(pthread_mutex_init(&plMutex, NULL) != 0) + if(pthread_rwlock_init(&profileListLock, NULL) != 0) { - T2Error("%s mutex init has failed\n", __FUNCTION__); + T2Error("%s rwlock init has failed\n", __FUNCTION__); return T2ERROR_FAILURE; } if(pthread_mutex_init(&reportLock, NULL) != 0 ) { T2Error("%s mutex init has failed\n", __FUNCTION__); + pthread_rwlock_destroy(&profileListLock); + initialized = false; return T2ERROR_FAILURE; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); Vector_Create(&profileList); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); registerConditionalReportCallBack(&triggerReportOnCondtion); @@ -1672,9 +1702,9 @@ int getProfileCount() T2Info("profile list isn't initialized\n"); return count; } - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); count = Vector_Size(profileList); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return count; @@ -1687,7 +1717,7 @@ hash_map_t *getProfileHashMap() Profile *tempProfile = NULL; T2Debug("%s ++in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); profileHashMap = hash_map_create(); for(; profileIndex < Vector_Size(profileList); profileIndex++) { @@ -1696,7 +1726,7 @@ hash_map_t *getProfileHashMap() char *profileHash = strdup(tempProfile->hash); hash_map_put(profileHashMap, profileName, profileHash, free); } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return profileHashMap; @@ -1726,7 +1756,7 @@ T2ERROR uninitProfileList() } pthread_mutex_destroy(&reportLock); - pthread_mutex_destroy(&plMutex); + pthread_rwlock_destroy(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; @@ -1746,7 +1776,7 @@ T2ERROR registerTriggerConditionConsumer() while(retry_count <= MAX_RETRY_COUNT) { - pthread_mutex_lock(&plMutex); + pthread_rwlock_rdlock(&profileListLock); profileIndex = 0; for(; profileIndex < Vector_Size(profileList); profileIndex++) { @@ -1762,7 +1792,7 @@ T2ERROR registerTriggerConditionConsumer() } } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); if(retry == 1) { if(retry_count >= MAX_RETRY_COUNT) @@ -1788,8 +1818,8 @@ void NotifySchedulerstart(char* profileName, bool isschedulerstarted) { size_t profileIndex = 0; Profile *tempProfile = NULL; - pthread_mutex_lock(&plMutex); - T2Debug("plMutex is locked %s\n", __FUNCTION__); + pthread_rwlock_wrlock(&profileListLock); + T2Debug("profileListLock is locked %s\n", __FUNCTION__); for(; profileIndex < Vector_Size(profileList); profileIndex++) { tempProfile = (Profile *)Vector_At(profileList, profileIndex); @@ -1801,8 +1831,8 @@ void NotifySchedulerstart(char* profileName, bool isschedulerstarted) } } } - pthread_mutex_unlock(&plMutex); - T2Debug("plMutex is unlocked %s\n", __FUNCTION__); + pthread_rwlock_unlock(&profileListLock); + T2Debug("profileListLock is unlocked %s\n", __FUNCTION__); return; } @@ -1954,7 +1984,7 @@ T2ERROR triggerReportOnCondtion(const char *referenceName, const char *reference size_t j, profileIndex = 0; Profile *tempProfile = NULL; - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); for(; profileIndex < Vector_Size(profileList); profileIndex++) { tempProfile = (Profile *)Vector_At(profileList, profileIndex); @@ -1976,9 +2006,9 @@ T2ERROR triggerReportOnCondtion(const char *referenceName, const char *reference char *tempProfilename = strdup(tempProfile->name); //RDKB-42640 - // plmutex should be unlocked before sending interrupt for report generation - T2Debug("%s : Release lock on &plMutex\n ", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + // profileListLock should be unlocked before sending interrupt for report generation + T2Debug("%s : Release lock on &profileListLock\n ", __FUNCTION__); + pthread_rwlock_unlock(&profileListLock); T2Info("Triggering report on condition for %s with %s operator, %d threshold\n", triggerCondition->reference, triggerCondition->oprator, triggerCondition->threshold); if(tempProfile->isSchedulerstarted) @@ -1999,8 +2029,8 @@ T2ERROR triggerReportOnCondtion(const char *referenceName, const char *reference else { T2Info("Report generation will take place by popping up from the que by callback function \n"); - T2Debug("%s : Release lock on &plMutex\n ", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + T2Debug("%s : Release lock on &profileListLock\n ", __FUNCTION__); + pthread_rwlock_unlock(&profileListLock); return T2ERROR_SUCCESS; } } @@ -2008,7 +2038,7 @@ T2ERROR triggerReportOnCondtion(const char *referenceName, const char *reference } } } - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; } @@ -2018,18 +2048,18 @@ unsigned int getMinThresholdDuration(char *profileName) unsigned int minThresholdDuration = 0; Profile *get_profile = NULL; T2Debug("%s --in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_rwlock_wrlock(&profileListLock); if(T2ERROR_SUCCESS != getProfile(profileName, &get_profile)) { T2Error("Profile : %s not found\n", profileName); T2Debug("%s --out\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); return 0; } minThresholdDuration = get_profile->minThresholdDuration; get_profile->minThresholdDuration = 0; // reinit the value T2Debug("minThresholdDuration = %u \n", minThresholdDuration); - pthread_mutex_unlock(&plMutex); + pthread_rwlock_unlock(&profileListLock); T2Debug("%s --out\n", __FUNCTION__); return minThresholdDuration; } diff --git a/source/bulkdata/profile.h b/source/bulkdata/profile.h index 21bd4de4..f329bfcd 100644 --- a/source/bulkdata/profile.h +++ b/source/bulkdata/profile.h @@ -39,6 +39,20 @@ typedef struct _JSONEncoding TimeStampFormat tsFormat; } JSONEncoding; +/** + * Lock Hierarchy (acquire in ascending order to prevent deadlocks): + * L0: profileListLock (module-level rwlock - protects profileList) + * Use rdlock for read-only traversals (lookups, iteration without mutation). + * Use wrlock when adding/removing/modifying profiles or profileList itself. + * L1: reuseThreadMutex (per-profile - protects thread lifecycle) + * L2: reportInProgressMutex (per-profile - protects reportInProgress flag) + * L3: triggerCondMutex / eventMutex / reportMutex (per-profile - leaf locks) + * + * Rules: + * - Always acquire L0 before L1, L1 before L2, etc. + * - Never hold a higher-numbered lock when acquiring a lower-numbered lock. + * - Release profileListLock before long operations (pthread_join, HTTP send). + */ typedef struct _Profile { bool enable; diff --git a/source/bulkdata/profilexconf.c b/source/bulkdata/profilexconf.c index 3a8e8424..5a6b7c6f 100644 --- a/source/bulkdata/profilexconf.c +++ b/source/bulkdata/profilexconf.c @@ -45,7 +45,12 @@ static bool initialized = false; static ProfileXConf *singleProfile = NULL; -static pthread_mutex_t plMutex; /* TODO - we can remove plMutex most likely but firseck that CollectAndReport doesn't cause issue */ +/** + * xconfProfileLock: Protects singleProfile pointer and reportThreadExits flag. + * Kept as mutex (not rwlock) because pthread_cond_wait requires pthread_mutex_t. + * Lock hierarchy: L0 (same level as profileListLock in profile.c). + */ +static pthread_mutex_t xconfProfileLock = PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t reuseThread; static bool reportThreadExits = false; @@ -209,16 +214,16 @@ static void* CollectAndReportXconf(void* data) (void) data;// To fix compiler warning /* Set reportThreadExits flag under mutex to prevent data race with - * ProfileXConf_notifyTimeout which reads this flag under plMutex. + * ProfileXConf_notifyTimeout which reads this flag under xconfProfileLock. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); reportThreadExits = true; - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); do { - /* CRITICAL SECTION START: Acquire plMutex to check/access singleProfile */ - pthread_mutex_lock(&plMutex); + /* CRITICAL SECTION START: Acquire xconfProfileLock to check/access singleProfile */ + pthread_mutex_lock(&xconfProfileLock); T2Info("%s while Loop -- START \n", __FUNCTION__); if(singleProfile == NULL) @@ -231,7 +236,7 @@ static void* CollectAndReportXconf(void* data) /* Set reportInProgress flag to prevent concurrent report generation * and profile deletion while we're working. This must be done under - * plMutex to prevent races with ProfileXConf_notifyTimeout() and + * xconfProfileLock to prevent races with ProfileXConf_notifyTimeout() and * ProfileXConf_delete(). */ profile->reportInProgress = true; @@ -257,21 +262,21 @@ static void* CollectAndReportXconf(void* data) T2Info("%s ++in profileName : %s\n", __FUNCTION__, profile->name); } - /* CRITICAL: Release plMutex before potentially blocking operations. + /* CRITICAL: Release xconfProfileLock before potentially blocking operations. * Report generation involves: * - * Holding plMutex during these operations blocks ALL other XCONF profile + * Holding xconfProfileLock during these operations blocks ALL other XCONF profile * operations (timeouts, updates, deletions, marker events) system-wide, * causing the telemetry system to hang. * - * We can safely release plMutex here because: + * We can safely release xconfProfileLock here because: * 1. We've already checked singleProfile is valid * 2. We use profile->reportInProgress to prevent concurrent reports * 3. Profile deletion waits for reportInProgress to be false - * 4. We'll re-acquire plMutex before updating shared state + * 4. We'll re-acquire xconfProfileLock before updating shared state */ - pthread_mutex_unlock(&plMutex); - /* CRITICAL SECTION END - plMutex released, other threads can now proceed */ + pthread_mutex_unlock(&xconfProfileLock); + /* CRITICAL SECTION END - xconfProfileLock released, other threads can now proceed */ int clockReturn = 0; clockReturn = clock_gettime(CLOCK_MONOTONIC, &startTime); @@ -280,11 +285,11 @@ static void* CollectAndReportXconf(void* data) if(T2ERROR_SUCCESS != initJSONReportXconf(&profile->jsonReportObj, &valArray)) { T2Error("Failed to initialize JSON Report\n"); - /* Re-acquire plMutex before updating profile state. - * CRITICAL: Keep plMutex locked before goto because reportXconfThreadEnd + /* Re-acquire xconfProfileLock before updating profile state. + * CRITICAL: Keep xconfProfileLock locked before goto because reportXconfThreadEnd * calls pthread_cond_wait which requires the mutex to be locked. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile == profile) { profile->reportInProgress = false; @@ -331,19 +336,19 @@ static void* CollectAndReportXconf(void* data) dcaFlagReportCompleation(); - /* CRITICAL: Re-acquire plMutex to safely access eMarkerList. + /* CRITICAL: Re-acquire xconfProfileLock to safely access eMarkerList. * External components can call t2_event_s() which modifies eMarkerList - * via ProfileXConf_storeMarkerEvent(). We must hold plMutex during + * via ProfileXConf_storeMarkerEvent(). We must hold xconfProfileLock during * event marker encoding to prevent race conditions. * This is safe because encoding is quick (~milliseconds), unlike HTTP * upload which can take 30+ seconds. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile == profile && profile->eMarkerList != NULL && Vector_Size(profile->eMarkerList) > 0) { encodeEventMarkersInJSON(valArray, profile->eMarkerList); } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); profile->grepSeekProfile->execCounter += 1; T2Info("Xconf Profile Execution Count = %d\n", profile->grepSeekProfile->execCounter); @@ -364,11 +369,11 @@ static void* CollectAndReportXconf(void* data) if(ret != T2ERROR_SUCCESS) { T2Error("Unable to generate report for : %s\n", profile->name); - /* Re-acquire plMutex before updating profile state. - * CRITICAL: Keep plMutex locked before goto because reportXconfThreadEnd + /* Re-acquire xconfProfileLock before updating profile state. + * CRITICAL: Keep xconfProfileLock locked before goto because reportXconfThreadEnd * calls pthread_cond_wait which requires the mutex to be locked. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile == profile) { profile->reportInProgress = false; @@ -400,11 +405,11 @@ static void* CollectAndReportXconf(void* data) // Before caching the report, add "REPORT_TYPE": "CACHED" // tagReportAsCached(&jsonReport); Vector_PushBack(profile->cachedReportList, strdup(jsonReport)); - /* Re-acquire plMutex before updating profile state. - * CRITICAL: Keep plMutex locked before goto because reportXconfThreadEnd + /* Re-acquire xconfProfileLock before updating profile state. + * CRITICAL: Keep xconfProfileLock locked before goto because reportXconfThreadEnd * calls pthread_cond_wait which requires the mutex to be locked. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile == profile) { profile->reportInProgress = false; @@ -555,11 +560,11 @@ static void* CollectAndReportXconf(void* data) isAbortTriggered = false ; } - /* CRITICAL SECTION START: Re-acquire plMutex before updating profile state. - * pthread_cond_wait requires us to hold plMutex, so we acquire it here + /* CRITICAL SECTION START: Re-acquire xconfProfileLock before updating profile state. + * pthread_cond_wait requires us to hold xconfProfileLock, so we acquire it here * and hold it through the state update and into the cond_wait. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile == profile) { profile->reportInProgress = false; @@ -573,26 +578,26 @@ reportXconfThreadEnd : * Wait while: profile exists AND no timeout pending AND not shutting down. * Exit loop when: timeout arrives (reportInProgress=true) OR shutdown (initialized=false). * - * pthread_cond_wait atomically releases plMutex while waiting. - * When signaled or spuriously woken, it re-acquires plMutex before returning. + * pthread_cond_wait atomically releases xconfProfileLock while waiting. + * When signaled or spuriously woken, it re-acquires xconfProfileLock before returning. */ while(singleProfile && !singleProfile->reportInProgress && initialized) { - pthread_cond_wait(&reuseThread, &plMutex); + pthread_cond_wait(&reuseThread, &xconfProfileLock); } - /* After cond_wait loop exits, we hold plMutex again. Release it before + /* After cond_wait loop exits, we hold xconfProfileLock again. Release it before * the next loop iteration (which will re-acquire it). */ - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); } while(initialized); - /* Thread is exiting. We don't hold plMutex here, so acquire it to + /* Thread is exiting. We don't hold xconfProfileLock here, so acquire it to * update the reportThreadExits flag, then release it. */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); reportThreadExits = false; - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Info("%s --out exiting the CollectAndReportXconf thread \n", __FUNCTION__); return NULL; @@ -608,18 +613,13 @@ T2ERROR ProfileXConf_init(bool checkPreviousSeek) Config *config = NULL; initialized = true; - if(pthread_mutex_init(&plMutex, NULL) != 0) - { - T2Error("%s Mutex init has failed\n", __FUNCTION__); - return T2ERROR_FAILURE; - } /* Initialize condition variable at module init to prevent race where * ProfileXConf_notifyTimeout signals before CollectAndReportXconf initializes it. */ if(pthread_cond_init(&reuseThread, NULL) != 0) { T2Error("%s Condition variable init has failed\n", __FUNCTION__); - pthread_mutex_destroy(&plMutex); + initialized = false; return T2ERROR_FAILURE; } Vector_Create(&configList); @@ -687,28 +687,31 @@ T2ERROR ProfileXConf_uninit() } initialized = false; - pthread_mutex_lock(&singleProfile->reportInProgressMutex); - bool reportInProgress = singleProfile->reportInProgress; - pthread_mutex_unlock(&singleProfile->reportInProgressMutex); - if(reportInProgress) + /* Wake the report thread (if it exists) and join it before cleanup. + * The thread may be in one of two states: + * (a) reportInProgress=true — actively generating a report + * (b) reportInProgress=false — idle in pthread_cond_wait(&reuseThread) + * In both cases we must signal and join; otherwise case (b) would skip + * the join, and freeProfileXConf + pthread_cond_destroy would destroy + * resources the sleeping thread still references (undefined behavior). */ + pthread_mutex_lock(&xconfProfileLock); + if(reportThreadExits) { - T2Debug("Waiting for final report before uninit\n"); - pthread_mutex_lock(&plMutex); pthread_cond_signal(&reuseThread); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); pthread_join(singleProfile->reportThread, NULL); - pthread_mutex_lock(&singleProfile->reportInProgressMutex); - singleProfile->reportInProgress = false ; - pthread_mutex_unlock(&singleProfile->reportInProgressMutex); - T2Info("Final report is completed, releasing profile memory\n"); + pthread_mutex_lock(&xconfProfileLock); + singleProfile->reportInProgress = false; + T2Info("Report thread joined, releasing profile memory\n"); } - pthread_mutex_lock(&plMutex); freeProfileXConf(); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); - /* Destroy condition variable at module uninit, after all threads are stopped */ + /* Destroy condition variable at module uninit, after all threads are stopped. + * xconfProfileLock is statically initialized (PTHREAD_MUTEX_INITIALIZER) and + * does not need to be destroyed — destroying it would leave it invalid if + * the module were ever re-initialized in the same process. */ pthread_cond_destroy(&reuseThread); - pthread_mutex_destroy(&plMutex); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; } @@ -719,14 +722,13 @@ T2ERROR ProfileXConf_set(ProfileXConf *profile) T2ERROR ret = T2ERROR_FAILURE; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(!singleProfile) { singleProfile = profile; - pthread_mutex_lock(&singleProfile->reportInProgressMutex); - singleProfile->reportInProgress = false ; - pthread_mutex_unlock(&singleProfile->reportInProgressMutex); + /* reportInProgress is protected by xconfProfileLock (already held) */ + singleProfile->reportInProgress = false; size_t emIndex = 0; EventMarker *eMarker = NULL; for(; emIndex < Vector_Size(singleProfile->eMarkerList); emIndex++) @@ -735,11 +737,11 @@ T2ERROR ProfileXConf_set(ProfileXConf *profile) addT2EventMarker(eMarker->markerName, eMarker->compName, singleProfile->name, eMarker->skipFreq); } - /* Release plMutex before calling scheduler API to avoid potential + /* Release xconfProfileLock before calling scheduler API to avoid potential * blocking while holding the mutex. Scheduler operations may involve * timer management and other operations that shouldn't block profile access. */ - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); if(registerProfileWithScheduler(singleProfile->name, singleProfile->reportingInterval, INFINITE_TIMEOUT, false, true, false, DEFAULT_FIRST_REPORT_INT, NULL) == T2ERROR_SUCCESS) { @@ -751,15 +753,15 @@ T2ERROR ProfileXConf_set(ProfileXConf *profile) T2Error("Unable to register profile : %s with Scheduler\n", singleProfile->name); } - /* Note: We already released plMutex above, so no need to unlock again */ - pthread_mutex_lock(&plMutex); + /* Note: We already released xconfProfileLock above, so no need to unlock again */ + pthread_mutex_lock(&xconfProfileLock); } else { T2Error("XConf profile already added, can't have more then 1 profile\n"); } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Debug("%s --out\n", __FUNCTION__); return ret; @@ -775,7 +777,7 @@ void ProfileXConf_updateMarkerComponentMap() } size_t emIndex = 0; EventMarker *eMarker = NULL; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile) { for(; emIndex < Vector_Size(singleProfile->eMarkerList); emIndex++) @@ -788,14 +790,14 @@ void ProfileXConf_updateMarkerComponentMap() { T2Error("Profile not found in %s\n", __FUNCTION__); } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Debug("%s --out\n", __FUNCTION__); } bool ProfileXConf_isNameEqual(char* profileName) { bool isName = false; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(initialized) { if(singleProfile && (singleProfile->name != NULL) && (profileName != NULL) && !strcmp(singleProfile->name, profileName)) //Adding NULL check to avoid strcmp crash @@ -805,7 +807,7 @@ bool ProfileXConf_isNameEqual(char* profileName) } } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return isName; } @@ -820,23 +822,23 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) T2Debug("calling ProfileXConf_isNameEqual function form %s and line %d\n", __FUNCTION__, __LINE__); bool isNameEqual = ProfileXConf_isNameEqual(profile->name); - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(!singleProfile) { T2Error("Profile not found in %s\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return T2ERROR_FAILURE; } /* Copy profile name before unlocking to prevent use-after-free if - * another thread deletes/frees singleProfile after we release plMutex. + * another thread deletes/frees singleProfile after we release xconfProfileLock. */ char profileNameCopy[256] = {0}; if(!isNameEqual && singleProfile && singleProfile->name) { strncpy(profileNameCopy, singleProfile->name, sizeof(profileNameCopy) - 1); } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); if(isNameEqual) { @@ -849,7 +851,7 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) else { /* Use copied profile name instead of singleProfile->name to avoid - * use-after-free since we no longer hold plMutex. + * use-after-free since we no longer hold xconfProfileLock. */ if(profileNameCopy[0] != '\0') { @@ -865,7 +867,7 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) * causing use-after-free crash when CollectAndReportXconf() was still * accessing profile members during brief mutex holds (event encoding, etc). */ - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); unsigned int waitIterations = 0; const unsigned int LOG_INTERVAL = 3000; // Log every 3000 iterations (30 seconds at 10ms per iteration) while(singleProfile && singleProfile->reportInProgress) @@ -875,9 +877,9 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) T2Info("Waiting for CollectAndReportXconf to be complete : %s\n", singleProfile->name); } waitIterations++; - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); usleep(10000); // 10ms polling interval - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); } profile->reportThread = singleProfile->reportThread; @@ -887,9 +889,8 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) if(isNameEqual) { profile->bClearSeekMap = singleProfile->bClearSeekMap ; - pthread_mutex_lock(&profile->reportInProgressMutex); - profile->reportInProgress = false ; - pthread_mutex_unlock(&profile->reportInProgressMutex); + /* reportInProgress is protected by xconfProfileLock (already held) */ + profile->reportInProgress = false; if(count > 0 && profile->cachedReportList != NULL) { T2Info("There are %zu cached reports in the profile \n", count); @@ -989,7 +990,7 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) T2Info("removing profile : %s\n", singleProfile->name); freeProfileXConf(); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; } @@ -997,7 +998,7 @@ T2ERROR ProfileXConf_delete(ProfileXConf *profile) bool ProfileXConf_isSet() { bool isSet = false; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(singleProfile != NULL) { @@ -1005,14 +1006,14 @@ bool ProfileXConf_isSet() isSet = true; } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return isSet; } char* ProfileXconf_getName() { char* profileName = NULL ; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(initialized) { if(singleProfile) @@ -1020,7 +1021,7 @@ char* ProfileXconf_getName() profileName = strdup(singleProfile->name); } } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return profileName; } @@ -1029,20 +1030,19 @@ void ProfileXConf_notifyTimeout(bool isClearSeekMap, bool isOnDemand) T2Debug("%s ++in\n", __FUNCTION__); int reportThreadStatus = 0 ; - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(!singleProfile) { T2Error("Profile not found in %s\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return ; } isOnDemandReport = isOnDemand; - pthread_mutex_lock(&singleProfile->reportInProgressMutex); + /* reportInProgress is protected by xconfProfileLock (already held) */ if(!singleProfile->reportInProgress) { singleProfile->bClearSeekMap = isClearSeekMap; singleProfile->reportInProgress = true; - pthread_mutex_unlock(&singleProfile->reportInProgressMutex); if (reportThreadExits) { @@ -1055,15 +1055,24 @@ void ProfileXConf_notifyTimeout(bool isClearSeekMap, bool isOnDemand) { T2Error("Failed to create report thread with error code = %d !!! \n", reportThreadStatus); } + else + { + /* Set flag immediately so ProfileXConf_uninit can find and + * join this thread. Without this, there is a race window + * between pthread_create returning and the new thread + * setting reportThreadExits=true inside CollectAndReportXconf, + * during which uninit would skip the join and free resources + * the thread is about to use. */ + reportThreadExits = true; + } } } else { - pthread_mutex_unlock(&singleProfile->reportInProgressMutex); T2Warning("Received profileTimeoutCb while previous callback is still in progress - ignoring the request\n"); } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Debug("%s --out\n", __FUNCTION__); } @@ -1073,11 +1082,11 @@ T2ERROR ProfileXConf_storeMarkerEvent(T2Event *eventInfo) { T2Debug("%s ++in\n", __FUNCTION__); - pthread_mutex_lock(&plMutex); + pthread_mutex_lock(&xconfProfileLock); if(!singleProfile) { T2Error("Profile not found in %s\n", __FUNCTION__); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return T2ERROR_FAILURE; } @@ -1138,11 +1147,11 @@ T2ERROR ProfileXConf_storeMarkerEvent(T2Event *eventInfo) { T2Error("Event name : %s value : %s\n", eventInfo->name, eventInfo->value); T2Error("Event doens't match any marker information, shouldn't come here\n"); - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); return T2ERROR_FAILURE; } - pthread_mutex_unlock(&plMutex); + pthread_mutex_unlock(&xconfProfileLock); T2Debug("%s --out\n", __FUNCTION__); return T2ERROR_SUCCESS; diff --git a/source/dcautil/dca.c b/source/dcautil/dca.c index 3af9ed83..c31df381 100644 --- a/source/dcautil/dca.c +++ b/source/dcautil/dca.c @@ -274,6 +274,21 @@ int processTopPattern(char* profileName, Vector* topMarkerList, int profileExec { continue; } + if (topMarkerObj->cpuValue) + { + free(topMarkerObj->cpuValue); + topMarkerObj->cpuValue = NULL; + } + if (topMarkerObj->memValue) + { + free(topMarkerObj->memValue); + topMarkerObj->memValue = NULL; + } + if(topMarkerObj->loadAverage) + { + free(topMarkerObj->loadAverage); + topMarkerObj->loadAverage = NULL; + } int tmp_skip_interval, is_skip_param; tmp_skip_interval = topMarkerObj->skipFreq; if(tmp_skip_interval <= 0) @@ -309,6 +324,21 @@ int processTopPattern(char* profileName, Vector* topMarkerList, int profileExec continue; } + if (topMarkerObj->cpuValue) + { + free(topMarkerObj->cpuValue); + topMarkerObj->cpuValue = NULL; + } + if (topMarkerObj->memValue) + { + free(topMarkerObj->memValue); + topMarkerObj->memValue = NULL; + } + if(topMarkerObj->loadAverage) + { + free(topMarkerObj->loadAverage); + topMarkerObj->loadAverage = NULL; + } // If the skip frequency is set, skip the marker processing for this interval int tmp_skip_interval, is_skip_param; diff --git a/source/dcautil/dcaproc.c b/source/dcautil/dcaproc.c index 2a9a4b59..a8795991 100644 --- a/source/dcautil/dcaproc.c +++ b/source/dcautil/dcaproc.c @@ -29,6 +29,7 @@ * @{ **/ +#define _GNU_SOURCE #include #include #include @@ -94,7 +95,8 @@ int getProcUsage(char *processName, TopMarker* marker, char* filename) pid_t *pid = NULL; pid_t *temp = NULL; memset(&pInfo, '\0', sizeof(procMemCpuInfo)); - memcpy(pInfo.processName, processName, strlen(processName) + 1); + strncpy(pInfo.processName, processName, BUF_LEN - 1); + pInfo.processName[BUF_LEN - 1] = '\0'; T2Debug("Command for collecting process info : \n pidof %s", processName); #ifdef LIBSYSWRAPPER_BUILD @@ -229,17 +231,7 @@ int getProcUsage(char *processName, TopMarker* marker, char* filename) if(0 != getProcInfo(&pInfo, filename)) { T2Debug("Process info - CPU: %s, Memory: %s \n", pInfo.cpuUse, pInfo.memUse); - - if (marker->cpuValue) - { - free(marker->cpuValue); - } marker->cpuValue = strdup(pInfo.cpuUse); - - if (marker->memValue) - { - free(marker->memValue); - } marker->memValue = strdup(pInfo.memUse); ret = 1; @@ -533,15 +525,27 @@ int getCPUInfo(procMemCpuInfo *pInfo, char* filename) char top_op[2048] = { '\0' }; int cmd_option = 0; int normalize = 1; + int read_from_file = 0; if(pInfo == NULL) { return 0; } - if((filename != NULL) && (access(filename, F_OK) != 0)) + if(filename != NULL) + { + inFp = fopen(filename, "r"); + } + if(inFp != NULL) { - T2Debug("%s ++in the savad temp log %s is not available \n", __FUNCTION__, filename); + /* TOPTEMP file is available - open directly in C, no shell spawning needed */ + T2Debug("%s ++in the saved temp log %s is available \n", __FUNCTION__, filename); + normalize = TOPITERATION; + read_from_file = 1; + } + else + { + T2Debug("%s ++in the saved temp log %s is not available \n", __FUNCTION__, filename); /* Check Whether -c option is supported */ #ifdef LIBSYSWRAPPER_BUILD ret = v_secure_system(" top -c -n 1 2> /dev/null 1> /dev/null"); @@ -554,53 +558,29 @@ int getCPUInfo(procMemCpuInfo *pInfo, char* filename) } #ifdef INTEL - /* Format Use: `top n 1 | grep Receiver` */ - if ( 1 == cmd_option ) - { + /* Format Use: `top n 1 (-c)` */ #ifdef LIBSYSWRAPPER_BUILD - inFp = v_secure_popen("r", "top -n 1 -c | grep -v grep |grep -i '%s'", pInfo->processName); + inFp = v_secure_popen("r", "top -n 1 %s", (cmd_option == 1) ? "-c" : ""); #else - sprintf(command, "top -n 1 -c | grep -v grep |grep -i '%s'", pInfo->processName); - inFp = popen(command, "r"); + snprintf(command, CMD_LEN, "top -n 1 %s", (cmd_option == 1) ? "-c" : ""); + inFp = popen(command, "r"); #endif - } - else - { -#ifdef LIBSYSWRAPPER_BUILD - inFp = v_secure_popen("r", "top -n 1 | grep -i '%s'", pInfo->processName); -#else - sprintf(command, "top -n 1 | grep -i '%s'", pInfo->processName); - inFp = popen(command, "r"); -#endif - } #else /* ps -C Receiver -o %cpu -o %mem */ //sprintf(command, "ps -C '%s' -o %%cpu -o %%mem | sed 1d", pInfo->processName); #ifdef LIBSYSWRAPPER_BUILD - inFp = v_secure_popen("r", "top -b -n 1 %s | grep -v grep | grep -i '%s'", (cmd_option == 1) ? "-c" : "", pInfo->processName); + inFp = v_secure_popen("r", "top -b -n 1 %s", (cmd_option == 1) ? "-c" : ""); #else - snprintf(command, CMD_LEN, "top -b -n 1 %s | grep -v grep | grep -i '%s'", (cmd_option == 1) ? "-c" : "", pInfo->processName); + snprintf(command, CMD_LEN, "top -b -n 1 %s", (cmd_option == 1) ? "-c" : ""); inFp = popen(command, "r"); #endif #endif } - else - { - T2Debug("%s ++in the savad temp log %s is available \n", __FUNCTION__, filename); -#ifdef LIBSYSWRAPPER_BUILD - inFp = v_secure_popen("r", "cat %s |grep -i '%s'", TOPTEMP, pInfo->processName); -#else - snprintf(command, sizeof(command), "cat %s |grep -i '%s'", filename, pInfo->processName); - inFp = popen(command, "r"); -#endif - normalize = TOPITERATION; - - } if(!(inFp)) { - T2Debug("failed in open v_scure_popen pipe! ret %d\n", pclose_ret); + T2Debug("failed in open v_secure_popen pipe! ret %d\n", pclose_ret); return 0; } @@ -608,6 +588,15 @@ int getCPUInfo(procMemCpuInfo *pInfo, char* filename) #ifdef INTEL while(fgets(top_op, 2048, inFp) != NULL) { + /* match by process name (case-insensitive), PID as fallback */ + if(strcasestr(top_op, pInfo->processName) == NULL) + { + int line_pid = 0; + if(pInfo->pid == NULL || sscanf(top_op, "%d", &line_pid) != 1 || line_pid != pInfo->pid[0]) + { + continue; + } + } if(sscanf(top_op, "%s %s %s %s %s %s %s %s", var1, var2, var3, var4, var5, var6, var7, var8) == 8) { total_cpu_usage += atof(var7); @@ -618,6 +607,15 @@ int getCPUInfo(procMemCpuInfo *pInfo, char* filename) #else while(fgets(top_op, 2048, inFp) != NULL) { + /* match by process name (case-insensitive), PID as fallback */ + if(strcasestr(top_op, pInfo->processName) == NULL) + { + int line_pid = 0; + if(pInfo->pid == NULL || sscanf(top_op, "%d", &line_pid) != 1 || line_pid != pInfo->pid[0]) + { + continue; + } + } if(sscanf(top_op, "%16s %16s %16s %16s %16s %16s %16s %512s %512s %512s", var1, var2, var3, var4, var5, var6, var7, var8, var9, var10) == 10) { total_cpu_usage += atof(var9); @@ -628,19 +626,24 @@ int getCPUInfo(procMemCpuInfo *pInfo, char* filename) snprintf(pInfo->cpuUse, sizeof(pInfo->cpuUse), "%.1lf", (float)(total_cpu_usage / normalize)); T2Debug("calculated CPU total value : %f Normalized value : %.1lf\n", total_cpu_usage, (float)(total_cpu_usage / normalize)); + if(read_from_file) + { + fclose(inFp); + } + else + { #ifdef LIBSYSWRAPPER_BUILD - pclose_ret = v_secure_pclose(inFp); + pclose_ret = v_secure_pclose(inFp); #else - pclose_ret = pclose(inFp); + pclose_ret = pclose(inFp); #endif - - if(pclose_ret != 0) - { - T2Debug("failed in closing pipe! ret %d\n", pclose_ret); + if(pclose_ret != 0) + { + T2Debug("failed in closing pipe! ret %d\n", pclose_ret); + } } - return ret; T2Debug("--out %s", __FUNCTION__); - + return ret; } #else //ENABLE_RDKC_SUPPORT & ENABLE_RDKB_SUPPORT diff --git a/source/dcautil/dcautil.h b/source/dcautil/dcautil.h index 1d25bf25..73487672 100644 --- a/source/dcautil/dcautil.h +++ b/source/dcautil/dcautil.h @@ -53,7 +53,7 @@ void freeGResult(void *data); T2ERROR saveGrepConfig(char *name, Vector* grepMarkerList); T2ERROR getGrepResults(GrepSeekProfile **GSP, Vector *markerList, bool isClearSeekMap, bool check_rotated, char *customLogPath); #define PREFIX_SIZE 5 -#define BUF_LEN 16 +#define BUF_LEN 64 typedef struct proc_info { diff --git a/source/protocol/rbusMethod/rbusmethodinterface.c b/source/protocol/rbusMethod/rbusmethodinterface.c index 5a393299..15ce064b 100644 --- a/source/protocol/rbusMethod/rbusmethodinterface.c +++ b/source/protocol/rbusMethod/rbusmethodinterface.c @@ -26,6 +26,8 @@ #include #include #include +#include +#include #include "reportgen.h" #include "rbusInterface.h" @@ -36,12 +38,48 @@ static pthread_once_t rbusMethodMutexOnce = PTHREAD_ONCE_INIT; static pthread_mutex_t rbusMethodMutex; - +static pthread_cond_t rbusMethodCond; +static clockid_t rbusMethodCondClock = CLOCK_REALTIME; +static bool rbusMethodCallbackDone = false; static bool isRbusMethod = false ; static void sendOverRBUSMethodInit() { + pthread_condattr_t condattr; + int ret; + pthread_mutex_init(&rbusMethodMutex, NULL); + /* Use CLOCK_MONOTONIC so NTP/time-sync adjustments don't cause + * the timed wait to expire too early or block too long. + * Track the actual clock so the timed-wait side always matches. */ + ret = pthread_condattr_init(&condattr); + if (ret != 0) + { + T2Error("pthread_condattr_init failed: %s\n", strerror(ret)); + pthread_cond_init(&rbusMethodCond, NULL); + rbusMethodCondClock = CLOCK_REALTIME; + return; + } + ret = pthread_condattr_setclock(&condattr, CLOCK_MONOTONIC); + if (ret != 0) + { + T2Error("pthread_condattr_setclock failed: %s\n", strerror(ret)); + rbusMethodCondClock = CLOCK_REALTIME; + } + else + { + rbusMethodCondClock = CLOCK_MONOTONIC; + } + if (pthread_cond_init(&rbusMethodCond, &condattr) != 0) + { + T2Error("pthread_cond_init failed, retrying with defaults\n"); + pthread_cond_init(&rbusMethodCond, NULL); + rbusMethodCondClock = CLOCK_REALTIME; + } + if (pthread_condattr_destroy(&condattr) != 0) + { + T2Error("pthread_condattr_destroy failed\n"); + } } static void asyncMethodHandler(rbusHandle_t handle, char const* methodName, rbusError_t retStatus, rbusObject_t params) @@ -50,10 +88,11 @@ static void asyncMethodHandler(rbusHandle_t handle, char const* methodName, rbus (void) params; T2Info("T2 asyncMethodHandler called: %s with return error code = %s \n", methodName, rbusError_ToString(retStatus)); - //Note: The mutex synchronization is handled by the caller (sendReportsOverRBUSMethod) - //which locks the mutex before calling rbusMethodCaller and uses pthread_mutex_trylock - //to wait for this async callback to complete. This callback updates isRbusMethod - //without additional locking as the caller manages the synchronization. + + /* Lock the mutex, update result, signal the waiting caller, then unlock. + * This ensures no cross-thread unlock (which is UB for default mutexes) + * and provides proper memory visibility for isRbusMethod. */ + pthread_mutex_lock(&rbusMethodMutex); if(retStatus == RBUS_ERROR_SUCCESS) { isRbusMethod = true ; @@ -63,6 +102,8 @@ static void asyncMethodHandler(rbusHandle_t handle, char const* methodName, rbus T2Error("Unable to send data over method %s \n", methodName) ; isRbusMethod = false ; } + rbusMethodCallbackDone = true; + pthread_cond_signal(&rbusMethodCond); pthread_mutex_unlock(&rbusMethodMutex); } @@ -109,48 +150,64 @@ T2ERROR sendReportsOverRBUSMethod(char *methodName, Vector* inputParams, char* p rbusValue_Release(value); pthread_once(&rbusMethodMutexOnce, sendOverRBUSMethodInit); + + /* Hold rbusMethodMutex across the entire async-call + wait sequence. + * This serializes concurrent callers so shared globals (isRbusMethod, + * rbusMethodCallbackDone) cannot be clobbered by a second caller while + * an async invocation is in flight. The callback can still acquire the + * mutex because pthread_cond_timedwait atomically releases it. */ pthread_mutex_lock(&rbusMethodMutex); isRbusMethod = false ; + rbusMethodCallbackDone = false; + if ( T2ERROR_SUCCESS == rbusMethodCaller(methodName, &inParams, payload, &asyncMethodHandler)) { - int retry_count = 0; - while (retry_count < MAX_RETRY_ATTEMPTS) + struct timespec ts; + if (clock_gettime(rbusMethodCondClock, &ts) != 0) + { + T2Error("clock_gettime failed: %s (errno=%d)\n", strerror(errno), errno); + pthread_mutex_unlock(&rbusMethodMutex); + rbusObject_Release(inParams); + T2Debug("%s --out\n", __FUNCTION__); + return T2ERROR_FAILURE; + } + ts.tv_sec += MAX_RETRY_ATTEMPTS * 2; /* Total timeout: 10 seconds */ + + while (!rbusMethodCallbackDone) { - if(!(pthread_mutex_trylock(&rbusMethodMutex))) // locking the rbusMethodMutex, in asyncMethodHandler mutex unlock is present + int rc = pthread_cond_timedwait(&rbusMethodCond, &rbusMethodMutex, &ts); + if(rc == ETIMEDOUT) { - if (isRbusMethod) - { - T2Info("Return status of send via rbusMethod is success \n " ); - ret = T2ERROR_SUCCESS ; - } - else - { - T2Info("Return status of send via rbusMethod is failure \n " ); - ret = T2ERROR_NO_RBUS_METHOD_PROVIDER; - } + T2Error("Timed out waiting for rbus method callback. Giving up\n"); break; } - else + else if(rc != 0) { - retry_count++; - /* - * NOTE: CID=190391 is a false positive. - * The rbusMethodMutex is already unlocked within asyncMethodHandler. - * This code path handles the scenario where locking fails again, - * so there is no need to address or fix this warning. - */ - sleep(2); // giving 2 seconds sleep for 5 times which will be equal to waiting for RBUS_METHOD_TIMEOUT value // Removing the pthread_mutex_unlock before sleep as rbusmethodmutex will be unlocked by asyncMethodHandler. - if(retry_count == 5) - { - T2Error("Max attempts reached for rbusmethodlock. Unlocking it\n"); - ret = T2ERROR_NO_RBUS_METHOD_PROVIDER; - } + T2Error("pthread_cond_timedwait failed: %s (rc=%d)\n", strerror(rc), rc); + break; } } + if (rbusMethodCallbackDone && isRbusMethod) + { + T2Info("Return status of send via rbusMethod is success\n"); + ret = T2ERROR_SUCCESS ; + } + else if (rbusMethodCallbackDone) + { + T2Info("Return status of send via rbusMethod is failure\n"); + ret = T2ERROR_NO_RBUS_METHOD_PROVIDER; + } + else + { + /* Timed out — callback never fired */ + ret = T2ERROR_NO_RBUS_METHOD_PROVIDER; + } + pthread_mutex_unlock(&rbusMethodMutex); + } + else + { + pthread_mutex_unlock(&rbusMethodMutex); } - /*rbusMethodMutex is locked before rbusMethodCaller if rbusMethodCaller function returns failure rbusMethodMutex will be unlocked here and in second case if rbusMethodCaller returns success then in asyncMethodHandler already mutex unlock is done which unlocks the rbusMethodMutex - lock done above so adding another lock inside rbusMethodCaller if loop for rbusMethodMutex which later gets unlocked by the below mutex unlock.*/ - pthread_mutex_unlock(&rbusMethodMutex); rbusObject_Release(inParams); T2Debug("%s --out\n", __FUNCTION__); diff --git a/source/scheduler/scheduler.c b/source/scheduler/scheduler.c index 2a8930f3..daeb0bd8 100644 --- a/source/scheduler/scheduler.c +++ b/source/scheduler/scheduler.c @@ -263,7 +263,31 @@ void* TimeoutThread(void *arg) T2Info("Waiting for %d sec for next TIMEOUT for profile as reporting interval is taken - %s\n", tProfile->timeOutDuration, tProfile->name); } } - notifySchedulerstartcb(tProfile->name, true); + /* Release tMutex before calling notifySchedulerstartcb to avoid + * lock-order-inversion: this thread holds tMutex and the callback + * acquires profileListLock(wr), while other threads hold + * profileListLock(rd) and then try to acquire tMutex. + * Copy name to a local buffer first — tProfile->name could be freed + * by freeSchedulerProfile on another thread while tMutex is released. */ + char *profileNameCopy = strdup(tProfile->name); + if(pthread_mutex_unlock(&tProfile->tMutex) != 0) + { + T2Error("tProfile Mutex unlock failed before notifySchedulerstartcb\n"); + } + if(profileNameCopy) + { + notifySchedulerstartcb(profileNameCopy, true); + free(profileNameCopy); + } + else + { + T2Error("Failed to allocate profileNameCopy for notifySchedulerstartcb\n"); + } + if(pthread_mutex_lock(&tProfile->tMutex) != 0) + { + T2Error("tProfile Mutex lock failed after notifySchedulerstartcb\n"); + return NULL; + } //When first reporting interval is given waiting for first report int vale if(tProfile->firstreportint > 0 && tProfile->firstexecution == true ) { diff --git a/source/telemetry2_0.c b/source/telemetry2_0.c index bb92e5a6..5c915abf 100644 --- a/source/telemetry2_0.c +++ b/source/telemetry2_0.c @@ -67,6 +67,13 @@ static bool isDebugEnabled = true; static int initcomplete = 0; static pid_t DAEMONPID; //static varible store the Main Pid +/* Signal-safe flags: sig_handler sets these; main loop dispatches. + * volatile sig_atomic_t is the only type guaranteed safe from signal context. */ +static volatile sig_atomic_t g_sig_shutdown = 0; +static volatile sig_atomic_t g_sig_log_upload = 0; +static volatile sig_atomic_t g_sig_log_upload_ondemand = 0; +static volatile sig_atomic_t g_sig_reload = 0; + T2ERROR initTelemetry() { T2ERROR ret = T2ERROR_FAILURE; @@ -133,114 +140,54 @@ static void terminate() } } -static void _print_stack_backtrace(void) -{ -#ifdef __GNUC__ -#ifndef _BUILD_ANDROID -#ifdef __GLIBC__ - void* tracePtrs[100]; - char** funcNames = NULL; - int i, count = 0; - - count = backtrace( tracePtrs, 100 ); - backtrace_symbols_fd( tracePtrs, count, 2 ); - funcNames = backtrace_symbols( tracePtrs, count ); +void sig_handler(int sig, siginfo_t* info, void* uc) +{ + int saved_errno = errno; + (void)info; + (void)uc; - if ( funcNames ) + if(DAEMONPID != getpid()) { - // Print the stack trace - for( i = 0; i < count; i++ ) + /* Child process: terminate immediately for fatal signals */ + if(!(sig == SIGUSR1 || sig == LOG_UPLOAD || sig == LOG_UPLOAD_ONDEMAND || sig == SIGIO || sig == SIGCHLD || sig == SIGPIPE || sig == SIGUSR2 || sig == EXEC_RELOAD || sig == SIGALRM )) { - printf("%s\n", funcNames[i] ); + _exit(1); } - - // Free the string pointers - free( funcNames ); + errno = saved_errno; + return; } -#endif -#endif -#endif -} -void sig_handler(int sig, siginfo_t* info, void* uc) -{ - if(DAEMONPID == getpid()) + /* POSIX async-signal-safe: only set flags here. + * The main loop polls these flags and performs the actual work. */ + if ( sig == SIGINT || sig == SIGTERM ) { - int fd; - const char *path = "/tmp/telemetry_logupload"; - if ( sig == SIGINT ) - { - T2Info(("SIGINT received!\n")); -#ifndef DEVICE_EXTENDER - uninitXConfClient(); -#endif - ReportProfiles_uninit(); - exit(0); - } - else if ( sig == SIGUSR1 || sig == LOG_UPLOAD ) - { - T2Info(("LOG_UPLOAD received!\n")); - set_retainseekmap(false); - ReportProfiles_Interrupt(); - } - else if (sig == LOG_UPLOAD_ONDEMAND || sig == SIGIO) - { - T2Info(("LOG_UPLOAD_ONDEMAND received!\n")); - ReportProfiles_Interrupt(); - } - else if(sig == SIGUSR2 || sig == EXEC_RELOAD) - { - T2Info(("EXEC_RELOAD received!\n")); - fd = open(path, O_RDONLY | O_CREAT, 0400); - if(fd == -1) - { - T2Warning("Failed to open the file\n"); - } - else - { - T2Debug("File is created\n"); - close(fd); - } -#ifndef DEVICE_EXTENDER - stopXConfClient(); - if(T2ERROR_SUCCESS == startXConfClient()) - { - T2Info("XCONF config reload - SUCCESS \n"); - } - else - { - T2Info("XCONF config reload - IN PROGRESS ... Ignore current reload request \n"); - } -#endif - - } - else if ( sig == SIGTERM || sig == SIGKILL ) - { - terminate(); - exit(0); - } - else + if (g_sig_shutdown) { - /* get stack trace first */ - _print_stack_backtrace(); - terminate(); - T2Warning("[%s][%u] Signal number: %d\n", __FUNCTION__, __LINE__, info->si_signo); - T2Warning("[%s][%u] Signal error: %d\n", __FUNCTION__, __LINE__, info->si_errno); - T2Warning("[%s][%u] Signal code: %d\n", __FUNCTION__, __LINE__, info->si_code); - T2Warning("[%s][%u] Signal value: %d\n", __FUNCTION__, __LINE__, info->si_value.sival_int); - (void)uc; - exit(0); + /* Repeated shutdown signal while terminate() is still running. + * Force immediate exit. _exit() is async-signal-safe. */ + _exit(1); } + g_sig_shutdown = 1; + } + else if ( sig == SIGUSR1 || sig == LOG_UPLOAD ) + { + g_sig_log_upload = 1; + } + else if ( sig == LOG_UPLOAD_ONDEMAND || sig == SIGIO ) + { + g_sig_log_upload_ondemand = 1; + } + else if ( sig == SIGUSR2 || sig == EXEC_RELOAD ) + { + g_sig_reload = 1; } else { - // This logic is added to terminate child imediately if we get terminate signals eg:SIGTERM / SIGKILL ... - if(!(sig == SIGUSR1 || sig == LOG_UPLOAD || sig == LOG_UPLOAD_ONDEMAND || sig == SIGIO || sig == SIGCHLD || sig == SIGPIPE || sig == SIGUSR2 || sig == EXEC_RELOAD || sig == SIGALRM )) - { - exit(0); - } + /* Unknown fatal signal — request shutdown */ + g_sig_shutdown = 1; } + errno = saved_errno; } static void t2DaemonMainModeInit( ) @@ -265,6 +212,7 @@ static void t2DaemonMainModeInit( ) act.sa_sigaction = sig_handler; act.sa_flags = SA_ONSTACK | SA_SIGINFO ; + sigemptyset(&blocking_signal); sigaddset(&blocking_signal, SIGUSR2); sigaddset(&blocking_signal, SIGUSR1); sigaddset(&blocking_signal, LOG_UPLOAD); @@ -277,6 +225,7 @@ static void t2DaemonMainModeInit( ) DAEMONPID = getpid(); // save the pid of the deamon T2Debug("Telemetry 2.0 Process PID %d\n", (int)DAEMONPID); //Debug line + sigaction(SIGINT, &act, NULL); sigaction(SIGTERM, &act, NULL); sigaction(SIGUSR1, &act, NULL); sigaction(LOG_UPLOAD, &act, NULL); @@ -293,9 +242,61 @@ static void t2DaemonMainModeInit( ) T2Info("Telemetry 2.0 Component Init Success\n"); + /* Main dispatch loop: poll signal flags set by sig_handler. + * All signal-unsafe work (uninit, logging, mutex ops) happens here + * in normal thread context, not in signal handler context. */ while(1) { - sleep(30); + if(g_sig_shutdown) + { + T2Info("Shutdown signal received, terminating\n"); + terminate(); + exit(0); + } + if(g_sig_log_upload) + { + g_sig_log_upload = 0; + T2Info("LOG_UPLOAD received!\n"); + set_retainseekmap(false); + ReportProfiles_Interrupt(); + } + if(g_sig_log_upload_ondemand) + { + g_sig_log_upload_ondemand = 0; + T2Info("LOG_UPLOAD_ONDEMAND received!\n"); + ReportProfiles_Interrupt(); + } + if(g_sig_reload) + { + g_sig_reload = 0; + T2Info("EXEC_RELOAD signal received\n"); + { + int fd; + const char *path = "/tmp/telemetry_logupload"; + fd = open(path, O_RDONLY | O_CREAT, 0400); + if(fd == -1) + { + T2Warning("Failed to open the file\n"); + } + else + { + T2Debug("File is created\n"); + close(fd); + } + } +#ifndef DEVICE_EXTENDER + stopXConfClient(); + if(T2ERROR_SUCCESS == startXConfClient()) + { + T2Info("XCONF config reload - SUCCESS \n"); + } + else + { + T2Info("XCONF config reload - IN PROGRESS ... Ignore current reload request \n"); + } +#endif + } + sleep(1); } T2Info("Telemetry 2.0 Process Terminated\n"); } diff --git a/source/test/dcautils/dcautilTest.cpp b/source/test/dcautils/dcautilTest.cpp index 71964594..222a3628 100644 --- a/source/test/dcautils/dcautilTest.cpp +++ b/source/test/dcautils/dcautilTest.cpp @@ -371,7 +371,7 @@ TEST(GETPROCINFO, PMINFO_NULL) memset(&pInfo, '\0', sizeof(procMemCpuInfo)); memcpy(pInfo.processName, processName, strlen(processName) + 1); pInfo.total_instance = 0; - EXPECT_EQ(0,getProcInfo(&pInfo, NULL)); + EXPECT_NE(0,getProcInfo(&pInfo, NULL)); free(filename); } @@ -1085,9 +1085,7 @@ TEST_F(dcaTestFixture, processTopPattern2) //saveTopoutput EXPECT_CALL(*g_systemMock, access(_,_)) - .Times(2) - .WillOnce(Return(0)) - .WillOnce(Return(0)); + .WillRepeatedly(Return(0)); #ifdef LIBSYSWRAPPER_BUILD EXPECT_CALL(*g_systemMock, v_secure_system(_)) .Times(4) @@ -1104,33 +1102,38 @@ TEST_F(dcaTestFixture, processTopPattern2) .WillOnce(Return(0)); #endif FILE* fp = (FILE*)0xFFFFFFFF; - //getProcUsage + //getProcUsage - pidof #ifdef LIBSYSWRAPPER_BUILD EXPECT_CALL(*g_fileIOMock, v_secure_popen(_,_)) - .Times(3) - .WillOnce(Return(fp)) + .Times(2) .WillOnce(Return(fp)) .WillOnce(Return(fp)); #else EXPECT_CALL(*g_fileIOMock, popen(_,_)) - .Times(2) - .WillOnce(Return(fp)) + .Times(1) .WillOnce(Return(fp)); #endif #ifdef LIBSYSWRAPPER_BUILD EXPECT_CALL(*g_fileIOMock, v_secure_pclose(_)) - .Times(3) - .WillOnce(Return(0)) + .Times(2) .WillOnce(Return(0)) .WillOnce(Return(0)); #else EXPECT_CALL(*g_fileIOMock, pclose(_)) - .Times(2) - .WillOnce(Return(0)) + .Times(1) .WillOnce(Return(0)); #endif + // getCPUInfo reads TOPTEMP file directly via fopen + FILE* topfp = (FILE*)0xEEEEEEEE; + EXPECT_CALL(*g_fileIOMock, fopen(_,_)) + .Times(1) + .WillOnce(Return(topfp)); + EXPECT_CALL(*g_fileIOMock, fclose(_)) + .Times(1) + .WillOnce(Return(0)); + //getMEMinfo EXPECT_CALL(*g_fileIOMock, read(_,_,_)) .WillOnce([](int fd, void* buf, size_t count) { diff --git a/source/test/mocks/FileioMock.cpp b/source/test/mocks/FileioMock.cpp index 59f33257..215397be 100644 --- a/source/test/mocks/FileioMock.cpp +++ b/source/test/mocks/FileioMock.cpp @@ -140,9 +140,15 @@ extern "C" int fscanf(FILE *stream, const char *format, ...) va_list args; va_start(args, format); static int call_count = 0; + static FileMock *last_mock = nullptr; if (g_fileIOMock == nullptr){ return fscanf_func(stream, format, args); } + /* Reset call_count when a new mock instance is active (i.e. new test started) */ + if (g_fileIOMock != last_mock) { + call_count = 0; + last_mock = g_fileIOMock; + } if (strcmp(format, "%d") == 0) { int *out_ptr = va_arg(args, int *); if (call_count == 0) { diff --git a/source/xconf-client/xconfclient.c b/source/xconf-client/xconfclient.c index 788ce80e..600f03af 100644 --- a/source/xconf-client/xconfclient.c +++ b/source/xconf-client/xconfclient.c @@ -1099,14 +1099,18 @@ void uninitXConfClient() pthread_cond_signal(&xcThreadCond); pthread_mutex_unlock(&xcThreadMutex); pthread_join(xcrThread, NULL); - pthread_mutex_destroy(&xcMutex); - pthread_mutex_destroy(&xcThreadMutex); - pthread_cond_destroy(&xcCond); - pthread_cond_destroy(&xcThreadCond); #ifdef DCMAGENT bexitDCMThread = false; pthread_join(dcmThread, NULL); #endif + /* Destroy synchronization primitives only after ALL threads using + * them have been joined. Previously these were destroyed between + * the xcrThread join and dcmThread join, creating a window where + * the exiting thread could still reference destroyed mutexes. */ + pthread_mutex_destroy(&xcMutex); + pthread_mutex_destroy(&xcThreadMutex); + pthread_cond_destroy(&xcCond); + pthread_cond_destroy(&xcThreadCond); } else { diff --git a/test/run_l2.sh b/test/run_l2.sh index 00bcdc97..7dbd6410 100755 --- a/test/run_l2.sh +++ b/test/run_l2.sh @@ -23,6 +23,9 @@ export top_srcdir=`pwd` RESULT_DIR="/tmp/l2_test_report" mkdir -p "$RESULT_DIR" +# ThreadSanitizer support (disabled by default, adds ~5-15x runtime overhead) +ENABLE_TSAN=false + if ! grep -q "LOG_PATH=/opt/logs/" /etc/include.properties; then echo "LOG_PATH=/opt/logs/" >> /etc/include.properties fi @@ -46,4 +49,70 @@ else echo "All tests passed successfully." fi +# ThreadSanitizer: rebuild with TSan and re-run the tests to collect race reports. +# Tests above ran against the normal (fast) binary for correctness (final_result). +# TSan adds ~5-15x slowdown which may break timing-sensitive tests, so TSan test +# pass/fail is ignored — we only care about the race detection output. +if [ "$ENABLE_TSAN" = true ]; then + echo "" + echo "=== Rebuilding with ThreadSanitizer for race detection ===" + autoreconf --install + ./configure --enable-tsan + make clean + make + make install + echo "TSan-instrumented build complete." + + export TSAN_OPTIONS="halt_on_error=0 second_deadlock_stack=1 history_size=4 log_path=$RESULT_DIR/tsan.log" + + # Re-run a subset of tests against the TSan-instrumented binary to exercise + # threaded code paths (profile processing, signal handling, report generation). + # Pass/fail is ignored; only TSan log output matters. + # + # We skip test_xconf_communications (~10min without TSan, up to 2.5h with TSan) + # as it primarily exercises HTTP retry logic with less thread coverage. + # To run TSan against ALL suites, uncomment the xconf line below: + # timeout $TSAN_TEST_TIMEOUT pytest -v test/functional-tests/tests/test_xconf_communications.py || true + # + # Timeout guards against potential deadlocks from signal-unsafe calls + # (sig_handler -> T2Log -> malloc can deadlock under TSan interceptors). + # Longest remaining suite is multiprofile_msgpacket (~6min without TSan, + # up to ~90min with TSan overhead). 1800s (30min) allows ample headroom. + # If enabling xconf_communications (~10min normal), increase to 3600. + TSAN_TEST_TIMEOUT=1800 + echo "" + echo "=== Running tests with TSan-instrumented binary (pass/fail ignored, ${TSAN_TEST_TIMEOUT}s timeout per suite) ===" + timeout $TSAN_TEST_TIMEOUT pytest -v test/functional-tests/tests/test_runs_as_daemon.py || true + timeout $TSAN_TEST_TIMEOUT pytest -v test/functional-tests/tests/test_bootup_sequence.py || true + timeout $TSAN_TEST_TIMEOUT pytest -v test/functional-tests/tests/test_multiprofile_msgpacket.py || true + # Allow TSan log files to flush + sleep 2 + + # Report TSan findings (only telemetry source files, not external dependencies) + tsan_files=$(find "$RESULT_DIR" -name "tsan.log*" -size +0c 2>/dev/null) + if [ -n "$tsan_files" ]; then + echo "" + echo "=== ThreadSanitizer Report (telemetry source only) ===" + total_warnings=0 + for f in $tsan_files; do + # Filter to only show warnings involving telemetry source files + telemetry_warnings=$(grep -A 20 "WARNING: ThreadSanitizer:" "$f" 2>/dev/null | grep -c "L2_CONTAINER_SHARED_VOLUME/source/" 2>/dev/null) || telemetry_warnings=0 + if [ "$telemetry_warnings" -gt 0 ]; then + echo "" + echo "--- $f ---" + # Print only SUMMARY lines that reference telemetry source + grep "SUMMARY: ThreadSanitizer:" "$f" 2>/dev/null | grep "L2_CONTAINER_SHARED_VOLUME/source/" | sort -u + count=$(grep "SUMMARY: ThreadSanitizer:" "$f" 2>/dev/null | grep -c "L2_CONTAINER_SHARED_VOLUME/source/" 2>/dev/null) || count=0 + total_warnings=$((total_warnings + count)) + fi + done + echo "" + echo "=== TSan Summary: $total_warnings telemetry-specific warning(s) ===" + echo "Full logs (including external deps): $RESULT_DIR/tsan.log*" + else + echo "" + echo "=== ThreadSanitizer: No data races detected ===" + fi +fi + exit $final_result