Medium · Retrieval · Python 3

RAG Chunk Selector

Rank retrieval chunks while deduplicating repeated text and limiting per-document dominance.

30m · 1 sample test · 1 hidden test

Select retrieval chunks while skipping duplicate text and keeping any single document from dominating the results.

Requirements

  • Define select_chunks(chunks, k, max_per_doc=2).
  • Each chunk is a dict with id, doc_id, text, and score.
  • Return the selected chunk IDs as a list, best first, with at most k entries.
  • Higher score wins.
  • Ties sort by id alphabetically.
  • Skip chunks whose normalized text was already selected.
  • Normalized text is lowercase with whitespace collapsed.
  • Select at most max_per_doc chunks per document.
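
The requirements above can be read as a single greedy pass over the chunks in ranked order. The following is a minimal sketch of that reading, not a reference solution. It assumes that k is an upper bound on the number of returned IDs, that collapsing whitespace also strips leading and trailing whitespace, and that a chunk skipped because of the per-document cap does not count as "already selected" text. The names ranked, seen_texts, and per_doc are illustrative only.

```python
def select_chunks(chunks, k, max_per_doc=2):
    """Return up to k chunk IDs, highest score first."""
    # sorted() builds a new list, so the caller's input is not mutated.
    # Negating the score ranks higher scores first; id breaks ties.
    ranked = sorted(chunks, key=lambda c: (-c["score"], c["id"]))

    seen_texts = set()  # normalized texts that were already selected
    per_doc = {}        # doc_id -> number of chunks selected so far
    selected = []

    for chunk in ranked:
        if len(selected) >= k:
            break
        # Assumption: "collapsed" whitespace also drops leading/trailing runs.
        normalized = " ".join(chunk["text"].lower().split())
        if normalized in seen_texts:
            continue  # duplicate text
        if per_doc.get(chunk["doc_id"], 0) >= max_per_doc:
            continue  # this document already has its quota
        seen_texts.add(normalized)
        per_doc[chunk["doc_id"]] = per_doc.get(chunk["doc_id"], 0) + 1
        selected.append(chunk["id"])
    return selected
```

Sorting once up front keeps the ordering deterministic, and the two skip checks are independent, so a chunk is selected only if it passes both.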

Example

python
chunks = [
    {"id": "a", "doc_id": "d1", "text": "Hello world", "score": 0.9},
    {"id": "b", "doc_id": "d2", "text": "hello world", "score": 0.8},
]
assert select_chunks(chunks, 2) == ["a"]

Constraints

  • Do not mutate input chunks.
  • Use deterministic ordering.
  • Empty or whitespace-only text can still be selected once.
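
The last constraint follows from the normalization rule: empty and whitespace-only strings all collapse to the same normalized key, so only the first such chunk can be selected. A small sketch of that, using a hypothetical helper named normalize (not part of the required interface) and assuming that collapsing also strips leading and trailing whitespace:

```python
def normalize(text):
    """Lowercase the text and collapse each whitespace run to one space."""
    # str.split() with no arguments splits on any whitespace and ignores
    # leading/trailing runs, so whitespace-only input becomes "".
    return " ".join(text.lower().split())


print(repr(normalize("")))          # ''
print(repr(normalize(" \t\n ")))    # ''
print(normalize("Hello   World"))   # hello world
```

Because the helper returns a new string, it also leaves the input chunk dicts untouched.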
