feat: Improves meilisearch configuration step #38384

Open
farhaanbukhsh wants to merge 5 commits into openedx:master from open-craft:farhaan/improve-meilisearch-configuration

Member

@farhaanbukhsh farhaanbukhsh commented Apr 20, 2026

Description

This PR adds a drift calculator for the Meilisearch index to help us configure Meilisearch on a fresh installation or an upgrade. The mechanism triggers on every run of migrate, so it covers both new installations and upgrades. It ensures that we assess the state of the Meilisearch Studio index and have a plan to correct any drift.

Useful information to include:

We calculate how far the live index has drifted from the configuration defined in the codebase and try to bring it back in line. A change in the primary key is the exception: there is not much we can do, so we have to drop the index, then create and configure a new one.

This happens whenever migrate runs: each time ./manage.py cms migrate runs, this command follows it. The diff is calculated first, so action is taken only when it is needed.
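
A minimal sketch of the "calculate the diff, act only when needed" idea described above. The helper name `diff_settings`, the example setting values, and the choice to compare some list-valued settings without regard to order are assumptions for illustration, not the code in this PR.

```python
# Hypothetical helper: names and values are illustrative, not the PR's code.
# Assumption: order carries no meaning for these two settings.
UNORDERED = {"filterableAttributes", "sortableAttributes"}


def diff_settings(expected: dict, actual: dict) -> list:
    """Return the names of settings whose live value differs from the expected one."""
    drifted = []
    for name, want in expected.items():
        have = actual.get(name)
        if name in UNORDERED and isinstance(want, list) and isinstance(have, list):
            # Compare as sets so a mere reordering is not reported as drift.
            same = set(want) == set(have)
        else:
            same = want == have
        if not same:
            drifted.append(name)
    return drifted


expected = {
    "sortableAttributes": ["display_name", "created", "modified"],  # made-up values
    "rankingRules": ["sort", "words", "typo"],
}
actual = {
    "sortableAttributes": ["display_name"],  # some sortable attributes are missing
    "rankingRules": ["sort", "words", "typo"],
}
print(diff_settings(expected, actual))  # → ['sortableAttributes']
```

An empty result would correspond to the "No action needed" case, so nothing is touched when the live index already matches the codebase.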

Supporting information

  1. Change to this branch in edx-platform
  2. Stop all the containers: tutor dev stop
  3. Build the openedx image: tutor images build openedx-dev
  4. Start the containers: tutor dev start -d
  5. Run tutor dev status | rg meilisearch to check that Meilisearch is running.
  6. Drop into the shell: tutor dev exec -it cms -- /bin/bash
  7. Run ./manage.py cms migrate
  8. The log should contain the line: api.py:580 - Index is populated and correctly configured. No action needed.

Let's run a few more tests:

  1. From the bash shell, open the Django shell with ./manage.py cms shell and run:
     from openedx.core.djangoapps.content.search.api import (
         _get_meilisearch_client,
         _wait_for_meili_task,
         STUDIO_INDEX_NAME,
     )

     client = _get_meilisearch_client()
     index = client.get_index(STUDIO_INDEX_NAME)

     # Break a setting to simulate drift
     _wait_for_meili_task(index.update_sortable_attributes(["display_name"]))
     print("Introduced drift: removed some sortable attributes")
  2. This introduces an anomaly, and we should see whether the code fixes it.
  3. Exit the Django shell after running the above code and run ./manage.py cms migrate again; you will see the script fixing the changes.
  4. I used the script below to check the status of the index while developing:
     from openedx.core.djangoapps.content.search.api import (
         _get_meilisearch_client,
         _detect_index_drift,
         STUDIO_INDEX_NAME,
     )

     client = _get_meilisearch_client()
     drift = _detect_index_drift(STUDIO_INDEX_NAME)

     print(f"Index: {STUDIO_INDEX_NAME}")
     print(f"  exists:                      {drift.exists}")
     print(f"  is_empty:                    {drift.is_empty}")
     print(f"  primary_key_correct:         {drift.primary_key_correct}")
     print(f"  distinct_attribute_match:    {drift.distinct_attribute_match}")
     print(f"  filterable_attributes_match: {drift.filterable_attributes_match}")
     print(f"  searchable_attributes_match: {drift.searchable_attributes_match}")
     print(f"  sortable_attributes_match:   {drift.sortable_attributes_match}")
     print(f"  ranking_rules_match:         {drift.ranking_rules_match}")
     print(f"  ---")
     print(f"  is_settings_drifted:         {drift.is_settings_drifted}")

Deadline

ASAP

Other information

TBD
Private Ref: BB-10767

@openedx-webhooks openedx-webhooks added the open-source-contribution and core contributor labels Apr 20, 2026
@openedx-webhooks

Thanks for the pull request, @farhaanbukhsh!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

…up with a migration plan

and configuration plan depending on the state. This introduces a mechanism, or drift engine, which drills down into the Meilisearch configuration
and figures out what has changed:

- settings
- primary key

Depending on the change, we follow a strategy of either migrating the data or recreating the index.

Signed-off-by: Farhaan Bukhsh <farhaan@opencraft.com>
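
A minimal sketch of the strategy choice the commit message describes, assuming the decision depends only on whether the index exists, whether its primary key is correct, and whether its settings match. `Action` and `plan` are hypothetical names, not the PR's API.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create and configure a new index"
    RECREATE = "drop the index, then create and configure a new one"
    APPLY_SETTINGS = "re-apply settings on the existing index"
    NONE = "no action needed"


def plan(exists: bool, primary_key_correct: bool, settings_match: bool) -> Action:
    """Pick a reconciliation strategy from the detected state (illustrative only)."""
    if not exists:
        return Action.CREATE
    if not primary_key_correct:
        # Per the PR description, a primary key change means dropping the
        # index and creating and configuring a new one.
        return Action.RECREATE
    if not settings_match:
        return Action.APPLY_SETTINGS
    return Action.NONE


print(plan(exists=True, primary_key_correct=True, settings_match=False).name)
# → APPLY_SETTINGS
```

Checking the primary key before the settings matters here: recreating the index also re-applies the settings, so the cheaper in-place path is only taken when the key is already correct.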
@farhaanbukhsh farhaanbukhsh force-pushed the farhaan/improve-meilisearch-configuration branch from e37d5a1 to 71169b2 Compare April 20, 2026 12:54
…index.

Signed-off-by: Farhaan Bukhsh <farhaan@opencraft.com>
Contributor

@bradenmacdonald bradenmacdonald left a comment

This is looking great!

class Command(BaseCommand):
    """
    Build or re-build the Meilisearch search index for courses and libraries in Studio.
    Queue incremental population of the Meilisearch search index for Studio.
Contributor

Suggested change
- Queue incremental population of the Meilisearch search index for Studio.
+ Add all course and library content to the Studio search index.

I think "queue incremental population" sounds a bit weird because of how "population" is being used. So I just suggest re-wording this a bit.

Member Author

alright :)

"""

# TODO: improve this - see https://github.com/openedx/edx-platform/issues/36868
help = "Queue incremental population of the Studio Meilisearch search index via Celery."
Contributor

Suggested change
- help = "Queue incremental population of the Studio Meilisearch search index via Celery."
+ help = "Add all course and library content to the Studio search index."

Comment on lines +72 to +75
raise CommandError(
    "The --incremental flag has been removed. "
    "Incremental population is now the default behavior of this command."
)
Contributor

You could print this as a warning but still run the command? What the user is asking for is the same as what you're about to do.
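
A rough sketch of that suggestion: accept the old flag, warn, and carry on. Here `argparse` stands in for Django's `BaseCommand` so the example runs without Django; `build_parser`, `run`, `queue_reindex` and `warn` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="reindex_studio")
    # Kept only for backwards compatibility; it no longer changes behaviour.
    parser.add_argument("--incremental", action="store_true", help="Deprecated no-op.")
    return parser


def run(argv, queue_reindex, warn=print) -> None:
    """Warn about the removed flag, then do the same work either way."""
    options = build_parser().parse_args(argv)
    if options.incremental:
        warn(
            "The --incremental flag has been removed. "
            "Incremental population is now the default behavior of this command."
        )
    queue_reindex()


# The old invocation still works; it just prints the warning first.
run(["--incremental"], queue_reindex=lambda: print("reindex queued"))
```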

"""

exists: bool
is_empty: bool | None = field(default=None) # None if index doesn't exist
Contributor

Suggested change
- is_empty: bool | None = field(default=None) # None if index doesn't exist
+ is_empty: bool | None = None # None if index doesn't exist

Nit: For all of these attrs fields, I think you can just use = None to set the default value. You only really need to use field(...) to specify a default with a factory function, like field(factory=list) for empty list defaults.
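
To illustrate the distinction, here is a small sketch using the standard library's `dataclasses` so it runs without attrs installed. This is a stand-in: with attrs the factory form is `field(factory=list)` as mentioned above, while `dataclasses` spells it `field(default_factory=list)`. The class and its `notes` field are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DriftExample:
    exists: bool
    # Immutable default: a plain "= None" is enough.
    is_empty: Optional[bool] = None  # None if index doesn't exist
    # Mutable default: needs a factory so each instance gets its own list.
    notes: list = field(default_factory=list)


a = DriftExample(exists=True)
b = DriftExample(exists=True)
a.notes.append("sortable attributes differ")
print(a.is_empty, b.notes)  # → None []
```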

Member Author

I looked at the attrs documentation and it was using field() to set default values, hence I made that call. :)

Contributor

They use both in the documentation :)

[Screenshot, 2026-04-21 at 12:20:49 PM]

Member Author

Ahh! thank you for pointing this out :)

@property
def is_drifted(self) -> bool:
    """True if settings are drifted OR primary key is incorrect."""
    return self.is_settings_drifted or self.primary_key_correct is False
Contributor

Nit: would be good to have a comment here explaining why we have is_settings_drifted separate from is_drifted (Because changing the primary key requires re-creating the index?)
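
Something along these lines, as a sketch only. It uses a stdlib dataclass with a reduced set of fields rather than the PR's attrs class, and the comments encode the assumption that the split exists because a primary key change cannot be repaired in place.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexDriftSketch:
    """Reduced, illustrative version of the drift result (not the PR's class)."""

    exists: bool
    primary_key_correct: Optional[bool] = None  # None if index doesn't exist
    sortable_attributes_match: Optional[bool] = None
    ranking_rules_match: Optional[bool] = None

    @property
    def is_settings_drifted(self) -> bool:
        # Settings drift can be repaired in place by re-applying the settings.
        return (
            self.sortable_attributes_match is False
            or self.ranking_rules_match is False
        )

    @property
    def is_drifted(self) -> bool:
        # Kept separate from is_settings_drifted: a wrong primary key cannot
        # be fixed in place and needs the index to be dropped and recreated.
        return self.is_settings_drifted or self.primary_key_correct is False


drift = IndexDriftSketch(
    exists=True,
    primary_key_correct=False,
    sortable_attributes_match=True,
    ranking_rules_match=True,
)
print(drift.is_settings_drifted, drift.is_drifted)  # → False True
```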

Member Author

is_drifted and is_configured both helped me a lot in writing proper tests. I am just wondering about removing these.

Contributor

OK, feel free to leave them both in.

Member Author

I removed them and changed the tests a bit. I think it's good to remove code so that there is less confusion when we are maintaining it.

Comment on lines +473 to +475
Unlike _configure_index() which fires-and-forgets, this function waits for
confirmation that each setting has been applied. This is used during reconciliation
where we need a definitive state before returning.
Contributor

What do you think about combining these into one function with a wait=True/False parameter?
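
Roughly what I have in mind, as a sketch: one function that submits each settings update and only blocks on the resulting task when asked to. `apply_settings` and its parameters are hypothetical; in the real code the callables would wrap calls such as index.update_sortable_attributes(...) and the waiter would be _wait_for_meili_task.

```python
from typing import Callable, Iterable


def apply_settings(
    updates: Iterable[Callable[[], object]],
    wait_for_task: Callable[[object], None],
    wait: bool = False,
) -> int:
    """Submit every settings update; block on each task only when wait=True."""
    submitted = 0
    for submit in updates:
        task = submit()  # enqueue the settings change and get its task back
        if wait:
            wait_for_task(task)  # confirm it was applied before moving on
        submitted += 1
    return submitted


# Fake updates and a fake waiter, just to show the two modes.
waited = []
fake_updates = [lambda: "task-1", lambda: "task-2"]
apply_settings(fake_updates, waited.append, wait=False)  # fire-and-forget
apply_settings(fake_updates, waited.append, wait=True)   # reconciliation path
print(waited)  # → ['task-1', 'task-2']
```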

Member Author

I agree with it, it makes sense to do it.

if not drift.exists:
    status_cb("Studio search index not found. Creating and configuring...")
    reset_index(status_cb)
    status_cb("Index created. Run ./manage.py cms reindex_studio to populate.")
Contributor

Suggested change
- status_cb("Index created. Run ./manage.py cms reindex_studio to populate.")
+ status_cb("Index created. Run './manage.py cms reindex_studio' to populate.")

return

reset_index(status_cb)
# CASE: Index populated
Contributor

Suggested change
- # CASE: Index populated
+ # CASE: Index populated, but configuration is outdated

"""
Initialize the Meilisearch index, creating it and configuring it if it doesn't exist.

This is a compatibility wrapper around reconcile_index().
Contributor

Should we mark this API method as deprecated?

Member Author

How do we do that? Log a warning that it is going to be deprecated?

Contributor

Yeah, just mention in the docstring that it is deprecated as of Verawood. I don't think the formal DEPR process is necessary just for one API method like this that isn't super well known nor widely used, when there's a new alternative.
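
For example, something like the sketch below. The wrapper name `init_index` and the body of `reconcile_index` are assumptions for illustration, and the runtime DeprecationWarning is an optional extra beyond the docstring note.

```python
import warnings


def reconcile_index(status_cb=print) -> str:
    """Placeholder standing in for the new entry point; the body is made up."""
    status_cb("Reconciling Studio search index...")
    return "reconciled"


def init_index(status_cb=print) -> str:
    """
    Initialize the Meilisearch index, creating it and configuring it if it doesn't exist.

    Deprecated as of Verawood: this is a compatibility wrapper around
    reconcile_index(); call that instead.
    """
    # Optional: also surface the deprecation at runtime for existing callers.
    warnings.warn(
        "init_index() is deprecated; use reconcile_index() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return reconcile_index(status_cb)
```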

_apply_settings_with_waits(STUDIO_INDEX_NAME, status_cb)
warn_cb(
    "Settings applied. Meilisearch will re-index documents in the background. "
    "Consider running ./manage.py cms reindex_studio for a full rebuild "
Contributor

Suggested change
- "Consider running ./manage.py cms reindex_studio for a full rebuild "
+ "Consider running './manage.py cms reindex_studio' for a full rebuild "


Labels

core contributor: PR author is a Core Contributor (who may or may not have write access to this repo).
open-source-contribution: PR author is not from Axim or 2U

Projects

Status: Needs Triage


4 participants