
Error handling for Grobid when not responding #35

Open
Sanakhamassi wants to merge 6 commits into ScienciaLAB:main from Sanakhamassi:fix/Display-an-error-message-when-grobid-is-not-responding


Sanakhamassi (Contributor) commented Apr 23, 2026

Related to issue #11


Copilot AI left a comment


Pull request overview

Adds explicit error handling for Grobid failures so the Streamlit UI can surface a clear “please try later” message instead of failing ambiguously (issue #11).

Changes:

  • Introduce GrobidServiceError and raise it when Grobid errors or returns non-200.
  • Catch GrobidServiceError in the Streamlit upload/embedding flow and display an error message.
  • Add a (currently redundant) guard in DocumentQAEngine for missing Grobid output.
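A minimal sketch of what such an exception could look like, assuming it carries an optional HTTP status (the status_code keyword appears in the grobid_processors.py diff reviewed in this thread). The helper check_status is hypothetical and not part of the PR; it only shows how a non-200 reply becomes an explicit error instead of a silent None.

```python
from typing import Optional


class GrobidServiceError(Exception):
    """Raised when Grobid fails to respond or answers with a non-200 status."""

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        # None means the service did not answer at all (e.g. connection refused)
        self.status_code = status_code


def check_status(status: int) -> None:
    """Hypothetical helper: turn a non-200 HTTP status into an explicit error."""
    if status != 200:
        raise GrobidServiceError(
            f"Grobid service returned status {status}.", status_code=status
        )


try:
    check_status(503)
except GrobidServiceError as err:
    print(err, err.status_code)  # Grobid service returned status 503. 503
```

Keeping status_code optional lets the UI tell "server unreachable" (None) apart from "server answered with an error" (an integer) without parsing the message.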

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Files reviewed:
  • streamlit_app.py: Catches Grobid failures during embedding creation and shows a user-facing error.
  • document_qa/grobid_processors.py: Defines GrobidServiceError and raises it from Grobid processing on failure/non-200.
  • document_qa/document_qa_engine.py: Imports/raises GrobidServiceError when Grobid structure is missing.


lfoppiano (Collaborator) left a comment


There is a missing space; the rest looks fine. I did not test it, so please make sure you have tested it before merging/squashing.


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment on lines +319 to +327
tmp_file = NamedTemporaryFile()
tmp_file.write(bytearray(binary))
st.session_state['binary'] = binary

st.session_state['doc_id'] = hash = st.session_state['rqa'][model].create_memory_embeddings(
tmp_file.name,
chunk_size=chunk_size,
perc_overlap=0.1
)
Collaborator


Yes, @Sanakhamassi, here you need to either move the temporary file into the with (...) block or make sure it is closed and cleaned up some other way.
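One way to follow this suggestion, sketched with a hypothetical helper with_temp_pdf (not part of the PR). Two details matter. The flush() call is needed because writes are buffered, so a reader that reopens tmp_file.name may otherwise see an empty or truncated file. The with block deletes the file even when the processing step raises, for example GrobidServiceError. This assumes a POSIX system: on Windows an open NamedTemporaryFile cannot be reopened by name unless it is created with delete=False (or delete_on_close=False on Python 3.12+).

```python
import os
from tempfile import NamedTemporaryFile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_pdf(binary: bytes, process: Callable[[str], T]) -> T:
    """Write the uploaded bytes to a temporary file, run `process` on its path,
    and delete the file afterwards, even if `process` raises."""
    with NamedTemporaryFile(suffix=".pdf") as tmp_file:
        tmp_file.write(binary)
        # Flush so a reader that re-opens tmp_file.name sees the full content
        tmp_file.flush()
        return process(tmp_file.name)


seen = []


def read_size(path: str) -> int:
    seen.append(path)
    with open(path, "rb") as fh:
        return len(fh.read())


size = with_temp_pdf(b"%PDF-1.4 dummy", read_size)
print(size, os.path.exists(seen[0]))  # 14 False
```

In the app this would wrap the embedding call, e.g. with_temp_pdf(binary, lambda path: st.session_state['rqa'][model].create_memory_embeddings(path, chunk_size=chunk_size, perc_overlap=0.1)).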

st.session_state['doc_id'] = None
st.session_state['loaded_embeddings'] = False
st.session_state['uploaded'] = False
st.error(f"{message} Please try later.")
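A Streamlit-free sketch of the error path above, assuming the session state is a dict-like mapping and the error display is any callable taking a string. In the app, st.session_state and st.error would be passed in. create_embeddings_safely is a hypothetical name, and GrobidServiceError here is a local stand-in for the PR's class.

```python
from typing import Callable, MutableMapping, Optional


class GrobidServiceError(Exception):
    """Stand-in for the exception added by this PR."""


def create_embeddings_safely(
    session_state: MutableMapping[str, object],
    create_embeddings: Callable[[], str],
    show_error: Callable[[str], None],
) -> Optional[str]:
    """Run the embedding step; on a Grobid failure, reset the upload flags
    and show a user-facing message instead of crashing the app."""
    try:
        doc_id = create_embeddings()
    except GrobidServiceError as err:
        # Reset state so the UI does not act as if a document were loaded
        session_state["doc_id"] = None
        session_state["loaded_embeddings"] = False
        session_state["uploaded"] = False
        # Keep a space between the error text and the hint
        show_error(f"{err} Please try later.")
        return None
    session_state["doc_id"] = doc_id
    return doc_id


def grobid_down() -> str:
    raise GrobidServiceError("Grobid service did not respond.")


state: dict = {"uploaded": True}
messages: list = []
create_embeddings_safely(state, grobid_down, messages.append)
print(messages[0])  # Grobid service did not respond. Please try later.
```

Injecting the error display as a callable keeps the reset logic testable without a running Streamlit session.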
Comment on lines 221 to 223
if grobid_url:
self.grobid_processor = GrobidProcessor(grobid_url)
self.grobid_processor = GrobidProcessor(grobid_url, ping_server=False)

Collaborator


@Sanakhamassi, why was this removed?
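For context on what a startup ping buys: it makes a misconfigured or unreachable Grobid fail when the processor is created rather than at the first upload. The following is a hypothetical standalone check, not what GrobidProcessor's ping_server option does internally (that is not shown here). It assumes Grobid's standard liveness endpoint /api/isalive and uses only the standard library.

```python
import urllib.error
import urllib.request


class GrobidServiceError(Exception):
    """Stand-in for the exception added by this PR."""


def ping_grobid(grobid_url: str, timeout: float = 5.0) -> None:
    """Fail fast if the Grobid server is unreachable or unhealthy.

    Assumes Grobid's standard liveness endpoint, /api/isalive.
    """
    url = grobid_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # Server answered, but with an error status (4xx/5xx)
        raise GrobidServiceError(
            f"Grobid health check returned status {exc.code}."
        ) from exc
    except OSError as exc:
        # URLError, connection refused and timeouts are all OSError subclasses
        raise GrobidServiceError("Grobid service did not respond.") from exc
    if status != 200:
        raise GrobidServiceError(f"Grobid health check returned status {status}.")
```

Usage would be a single call before building the processor, e.g. ping_grobid("http://localhost:8070") for a default local Grobid. The trade-off is a few seconds of startup delay (bounded by timeout) against an early, explicit error.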

Comment on lines +108 to +125
try:
pdf_file, status, text = self.grobid_client.process_pdf("processFulltextDocument",
input_path,
consolidate_header=True,
consolidate_citations=False,
segment_sentences=False,
tei_coordinates=coordinates,
include_raw_citations=False,
include_raw_affiliations=False,
generateIDs=True)
except Exception as exc:
raise GrobidServiceError("Grobid service did not respond.") from exc

if status != 200:
return
raise GrobidServiceError(
f"Grobid service returned status {status}.",
status_code=status
)
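A self-contained sketch of the pattern in this diff, with a hypothetical client interface (PdfClient) standing in for the real Grobid client. Replacing the bare return with a raise means callers can no longer mistake a failure for empty output. raise ... from exc keeps the original exception as __cause__. Keeping only the client call inside try prevents the status-check error from being re-wrapped by the broad except.

```python
from typing import Any, Optional, Protocol


class GrobidServiceError(Exception):
    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code


class PdfClient(Protocol):
    # Hypothetical interface mirroring the call in the diff: (path, status, text)
    def process_pdf(self, service: str, input_path: str, **options: Any) -> tuple: ...


def process_fulltext(client: PdfClient, input_path: str) -> str:
    try:
        _, status, text = client.process_pdf(
            "processFulltextDocument", input_path,
            consolidate_header=True, generateIDs=True,
        )
    except Exception as exc:
        # Network errors, timeouts, etc.: the service did not answer
        raise GrobidServiceError("Grobid service did not respond.") from exc
    if status != 200:
        raise GrobidServiceError(
            f"Grobid service returned status {status}.", status_code=status
        )
    return text


class BusyClient:
    # Fake client that always answers 503 Service Unavailable
    def process_pdf(self, service, input_path, **options):
        return input_path, 503, ""


try:
    process_fulltext(BusyClient(), "paper.pdf")
except GrobidServiceError as err:
    print(err.status_code)  # 503
```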
lfoppiano (Collaborator) left a comment


Better than before; however, there are a few changes that need to be done.



3 participants