Medium · Retrieval · Python 3
RAG Chunk Selector
Rank retrieval chunks while deduplicating repeated text and limiting per-document dominance.
30m · 1 sample test · 1 hidden test
Select retrieval chunks while avoiding duplicate text and keeping any single document from dominating the results.
Requirements
- Define `select_chunks(chunks, k, max_per_doc=2)`.
- Each chunk is a dict with `id`, `doc_id`, `text`, and `score`.
- Return selected chunk IDs.
- Higher `score` wins.
- Ties sort by `id` alphabetically.
- Skip chunks whose normalized text was already selected.
- Normalized text is lowercase with whitespace collapsed.
- Select at most `max_per_doc` chunks per document.
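The normalization and ordering rules above can be sketched as two small helpers. The names `normalize_text` and `rank_key` are hypothetical (they are not part of the problem statement), and "alphabetically" is assumed here to mean Python's default string comparison.

```python
def normalize_text(text):
    """Lowercase the text and collapse every whitespace run to one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # leading/trailing runs, so whitespace-only text normalizes to "".
    return " ".join(text.lower().split())


def rank_key(chunk):
    """Sort key: higher score first, then id ascending to break ties."""
    # Assumption: "alphabetically" means default string comparison.
    return (-chunk["score"], chunk["id"])


chunks = [
    {"id": "b", "doc_id": "d1", "text": "Hello   World", "score": 0.5},
    {"id": "a", "doc_id": "d2", "text": " hello\nworld ", "score": 0.5},
    {"id": "c", "doc_id": "d3", "text": "other", "score": 0.9},
]

# sorted() returns a new list, so the input list is left untouched.
ranked_ids = [c["id"] for c in sorted(chunks, key=rank_key)]
```

Negating the score in the key keeps the whole ordering in a single ascending sort, which makes the tie-break on `id` deterministic.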
Example
```python
chunks = [
    {"id": "a", "doc_id": "d1", "text": "Hello world", "score": 0.9},
    {"id": "b", "doc_id": "d2", "text": "hello world", "score": 0.8},
]
assert select_chunks(chunks, 2) == ["a"]
```
Constraints
- Do not mutate input chunks.
- Use deterministic ordering.
- Empty or whitespace-only text can still be selected once.
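One possible way to satisfy the requirements and constraints together is sketched below. It is not a reference solution and rests on two assumptions the statement does not spell out: `k` is treated as an upper bound on the number of returned IDs, and only chunks that are actually selected count toward duplicate detection and the per-document limit.

```python
def select_chunks(chunks, k, max_per_doc=2):
    """Return up to k chunk IDs, best score first (illustrative sketch)."""
    # sorted() builds a new list, so the caller's list and dicts are not mutated.
    ranked = sorted(chunks, key=lambda c: (-c["score"], c["id"]))

    selected_ids = []
    seen_texts = set()  # normalized texts of chunks already selected
    per_doc = {}        # doc_id -> number of chunks selected so far

    for chunk in ranked:
        # Assumption: k caps the number of selected chunks.
        if len(selected_ids) >= k:
            break
        # Lowercase and collapse whitespace; whitespace-only text becomes "".
        normalized = " ".join(chunk["text"].lower().split())
        if normalized in seen_texts:
            continue  # duplicate of an already selected chunk
        if per_doc.get(chunk["doc_id"], 0) >= max_per_doc:
            continue  # this document already has max_per_doc chunks
        seen_texts.add(normalized)
        per_doc[chunk["doc_id"]] = per_doc.get(chunk["doc_id"], 0) + 1
        selected_ids.append(chunk["id"])

    return selected_ids
```

Because empty and whitespace-only text both normalize to the empty string, the first such chunk is selected and later ones are skipped as duplicates.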