Never let an agent evaluate its own work. Agents exhibit optimistic bias when self-grading. Use sprint contracts as evaluation interface, communicate via files not shared context. Key insight: harness components encode model-limitation assumptions that go stale — periodically test if each component still adds value.