- existing repo (brownfield)
- extra level of research
- choice (two projects we can pick: healthcare or finance)
- simple evals (LangSmith eval)
- how to run locally? read the instructions, pull them down, and go with coding agents (breaking down frameworks and patterns; less code, simpler, cleaner)
- memory system
- when to use tools, when not to?
- check before returning responses (vetted to some level; output formatter with citations and an attached confidence level; see the sketch after this list)
- required tools (no overlap; enough to do meaningful work)
- eval framework (which things to verify? which strategies to use?)
- datasets we want to run against (difficulty levels, regressions, test cases)
- observability (this is 95% of how to put it together; scaling?)
- verifications (guardrails)
- performance targets
- release to open source (commits and PRs)
- video record myself (so I have an early reference)
- add voice? build AI access
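A minimal sketch of that output formatter, assuming a shape for citations and a simple confidence heuristic; every name here (Citation, VettedResponse, formatResponse) is illustrative, not an existing API.

```ts
// Hypothetical sketch: attach citations and a confidence level to every
// answer before it is returned. A real system would derive confidence from
// retrieval scores or a verifier model, not just citation count.
interface Citation {
  sourceId: string; // document or URL identifier
  excerpt: string;  // the supporting passage
}

type Confidence = "high" | "medium" | "low";

interface VettedResponse {
  answer: string;
  citations: Citation[];
  confidence: Confidence;
}

function formatResponse(answer: string, citations: Citation[]): VettedResponse {
  // Simple heuristic: confidence degrades as support thins out.
  const confidence: Confidence =
    citations.length >= 2 ? "high" : citations.length === 1 ? "medium" : "low";
  return { answer, citations, confidence };
}

// Usage: in a regulated domain, refuse to return unsupported answers.
const response = formatResponse("Claim X holds.", []);
if (response.confidence === "low") {
  console.warn("Unsupported answer; route to human review or retry retrieval.");
}
```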
Gauntlet Fellowship — Cohort G4 (Operating Notes)
Context
- Government/regulated companies will be hiring → optimize for reliability, auditability, security posture, and clear decision rationale.
- No emojis in generated files; emojis are acceptable only in runtime output and when testing.
- No negations.
- We have access to Google models via max.petrusenko@gfachallenger.gauntletai.com (Gemini Pro, Nano Banana Pro, and other Google models).
- The stack must be justified in the docs.
Required Documentation (Keep Updated)
Reality check: client/project requirements can override this. Always re-anchor on the provided requirements.md.
Tasks.md (mandatory)
- Ticket list + status
- Each feature: link to tests + PR/commit
- We also use the Linear CLI/MCP; check what's available.
Engineering Standards
- We are making system decisions → prioritize correctness under constraints.
- E2E TDD:
  - Use for backend/system flows.
  - Avoid forcing E2E TDD for frontend UI polish.
- Frontend expectations:
  - Components + types (if React, use v17+).
  - Do not rewrite tests just to make them pass.
  - Tests run only before pushing to GitHub, when asked by the user, or during RGR.
- Code quality:
  - Must scale and perform reasonably.
  - Indexing + query design matters (especially Firestore / SQL); see the sketch at the end of this section.
  - Lint and build should run after each implemented feature or feature set.
- Before writing code: write it right the first time so it passes the logic tests.
- Rewrite code in a clean, elegant, modular way.
- Each file: max ~500 LOC.
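A minimal sketch of query-first index design with the Firestore v9 modular SDK; the collection, field names, and project config are hypothetical placeholders.

```ts
// Sketch: design the composite index together with the query, not after it.
import { initializeApp } from "firebase/app";
import {
  getFirestore, collection, query, where, orderBy, limit, getDocs,
} from "firebase/firestore";

initializeApp({ projectId: "demo-project" }); // placeholder config

async function listRecentOpenTickets() {
  const db = getFirestore();
  // Filtering on one field while ordering by another requires a composite
  // index (status ASC, createdAt DESC); Firestore rejects the query until
  // that index exists, so indexes are part of query design from the start.
  const q = query(
    collection(db, "tickets"),
    where("status", "==", "open"),
    orderBy("createdAt", "desc"),
    limit(20),
  );
  const snapshot = await getDocs(q);
  snapshot.forEach((doc) => console.log(doc.id, doc.data()));
}
```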
Research Workflow
- Always run Pre-Search first.
- Use multi-model triangulation:
  - Create the Pre-Search doc once.
  - "Throw it" into multiple AIs → compare responses.
  - Prefer Google Deep Research; if unavailable, use Perplexity.
Hosting & System Design Focus
Key questions we must answer early (and revisit when requirements change):
- What’s the main focus right now? (may change later)
- Data storage model
- Security model
- File structure + naming conventions
- Legacy constraints (if any)
- Testing strategy
- Refactoring strategy
- Maintenance cost
System design checklist:
- Time to ship?
- Requirements clarity?
- Scaling/load profile?
- Budget?
- Team size/roles?
- Authentication?
- Failure modes?
Docs & Tests Workflow
- If not already done: generate PRD + MVP from requirements.md.
- Walk through documentation every time it changes:
- PRD
- MVP
- Patterns
- Duplication / inconsistencies
- project-level skill + symlink
- Tests:
- Build tests for every new feature.
- References:
- https://github.com/steipete/CodexBar/tree/main/Tests
- (E2E TDD styles referenced by Jeffrey Emanuel / Steve Yegge)
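A minimal sketch of the deterministic, regression-friendly test style above, using Node's built-in test runner; createTicket is a hypothetical stand-in for a real backend flow. Run with node --test (via tsx or after compiling).

```ts
// Fixed inputs and exact assertions keep the test deterministic, so it can
// double as a regression case later.
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical stand-in for the real system flow under test.
async function createTicket(
  title: string,
): Promise<{ id: string; title: string; status: string }> {
  return { id: "ticket-1", title, status: "open" };
}

test("creating a ticket yields an open ticket with the given title", async () => {
  const ticket = await createTicket("Fix login redirect");
  assert.equal(ticket.status, "open");
  assert.equal(ticket.title, "Fix login redirect");
});
```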
Project Management
- Use Linear for tickets.
- After implementing a new feature:
  - Update Tasks.md
  - Update tests
  - Add/refresh docs/adr/ entries
  - Update
  - Track maintenance cost implications.
Tasks (Draft)
- Can I download all transcripts and save them from Google to Gauntlet Notion (curriculum)?
- Define “1 hour deliverables” and hard deadlines per week.
- Find a good resource for system design:
- Search top-rated + most-forked repos (Meta, OpenAI, Anthropic patterns).
- IP implications if selecting a hiring partner.
- Hand this plan to OpenClaw (as operating context).
- Reminder: use Aqua + Whisper for talking to AI instead of typing.
Submission Requirements (Must Include)
- Deployed app(s)
- Demo video
- Pre-search doc
- AI development log (1 page)
- LinkedIn or X post: what I did in 1 week
- AI cost analysis
- Document submission as PDF
- Add a GitHub PAT if repo access requires it
AI Development Log (Required Template)
Submit a 1-page document covering:
- Tools & Workflow: which AI coding tools were used and how integrated
- MCP Usage: which MCPs were used (if any) and what they enabled
- Effective Prompts: 3–5 prompts that worked well (include actual prompts)
- Code Analysis: rough % AI-generated vs hand-written
- Strengths & Limitations: where AI excelled and struggled
- Key Learnings: insights about working with coding agents
AI Cost Analysis (Required)
Track development and testing costs:
- LLM API costs (OpenAI, Anthropic, etc.)
- Total tokens consumed (input/output breakdown)
- Number of API calls
- Other AI-related costs (embeddings, hosting)
Production cost projections must include:
- 100 users: $___/month
- 1,000 users: $___/month
- 10,000 users: $___/month
- 100,000 users: $___/month
Include assumptions:
- average AI commands per user per session
- average sessions per user per month
- token counts per command type
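A worked sketch of the projection arithmetic; all usage and pricing numbers are placeholders, not measured data.

```ts
// Placeholder assumptions; replace with measured values from the cost log.
const commandsPerSession = 10;    // avg AI commands per user per session
const sessionsPerMonth = 8;       // avg sessions per user per month
const tokensPerCommand = 1_500;   // avg input + output tokens per command
const costPerMillionTokens = 3.0; // USD, blended input/output rate

function monthlyCost(users: number): number {
  const tokens =
    users * sessionsPerMonth * commandsPerSession * tokensPerCommand;
  return (tokens / 1_000_000) * costPerMillionTokens;
}

for (const users of [100, 1_000, 10_000, 100_000]) {
  console.log(`${users} users: $${monthlyCost(users).toFixed(2)}/month`);
}
```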
Technical Stack (Possible Paths)
- Backend:
- Firebase (Firestore, Realtime DB, Auth)
- Supabase
- AWS (DynamoDB, Lambda, WebSockets)
- Custom WebSocket server
- Frontend:
- React / Vue / Svelte + Konva.js / Fabric.js / PixiJS / Canvas
- Vanilla JS (if fastest)
- AI integration (see the sketch after this section):
- OpenAI (function calling)
- Anthropic Claude (tool use / function calling)
- Deployment:
- Vercel
- Firebase Hosting
- Render
Rule: choose whichever ships fastest, after completing Pre-Search to justify the decision.
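A minimal sketch of the function-calling path with the OpenAI Node SDK; the model name and the move_shape tool schema are illustrative assumptions, not part of any spec.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function planCommand(userText: string) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model choice
    messages: [{ role: "user", content: userText }],
    tools: [{
      type: "function",
      function: {
        name: "move_shape", // hypothetical canvas tool
        description: "Move a canvas shape by a pixel offset",
        parameters: {
          type: "object",
          properties: {
            shapeId: { type: "string" },
            dx: { type: "number" },
            dy: { type: "number" },
          },
          required: ["shapeId", "dx", "dy"],
        },
      },
    }],
  });

  // The model returns a structured tool call instead of free text,
  // which the app can validate before executing.
  const call = completion.choices[0].message.tool_calls?.[0];
  if (call) {
    console.log(call.function.name, JSON.parse(call.function.arguments));
  }
}
```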
Critical Guidance
- Build vertically: finish one layer before the next.
- When creating a new feature, or when asked by the user: review old tests, create new tests if we test differently, and make tests more deterministic.
- Refactors require before/after benchmarks (latency, cost, failure rate) and updated regression tests; log deltas in CHANGELOG.md.
- Remove duplication and stale logic; document architectural shifts in ADRs (docs/adr/).
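A minimal ADR template for docs/adr/ entries, following one common Nygard-style convention; the section names are a suggestion, not a mandate.

```md
# ADR-0001: <decision title>

- Status: accepted | superseded by ADR-NNNN
- Date: YYYY-MM-DD

## Context
The constraint or requirement that forced a decision.

## Decision
The choice made, stated in one or two sentences.

## Consequences
Costs accepted, risks taken, and what becomes easier or harder.
```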
Deadline & Deliverables
- Deadline: Sunday 10:59 PM CT
- GitHub repo must include:
- setup guide
- architecture overview
- deployed link
- Demo video (3–5 min):
- realtime collaboration
- AI commands
- architecture explanation
- Pre-Search document:
- completed checklist (Phase 1–3)
- AI Development Log:
- 1-page breakdown using required template
- AI Cost Analysis:
- dev spend + projections for 100/1K/10K/100K users
- Deployed app:
- publicly accessible
- supports 5+ users with auth
Resources
System Design: search top-rated/most-forked repos (Meta, OpenAI, Anthropic)
Test Examples: CodexBar Tests
Claude Code/Codex — Execution Protocol
Philosophy
You are a staff engineer: autonomous, accountable, scope-disciplined. The user's time is the constraint. Do less, log the rest. Correct > fast > clever.
Planning
- Any task with 3+ steps or architectural risk: write tasks/tasks.md before touching code. No exceptions.
- If you're wrong mid-task: stop, re-plan. Never compound a bad direction.
- Ambiguity threshold: if reverting a decision takes >30min (migrations, destructive ops, external side effects), surface it first. Otherwise proceed at 80% clarity and flag your assumption inline.
- Verification is part of the plan. A plan without success criteria is incomplete.
- Before architectural changes: check docs/adr/ for relevant decisions; cite the ADR in proposed changes.
Context Window
- Summarize and compress completed phases before moving forward.
- Extract only what you need from subagent outputs — don't inline full results.
- If a session accumulates 5+ major phases, consider a clean handoff doc and fresh session.
Subagents
- One task per subagent. Define input + expected output format before spawning.
- Parallelize independent tasks; don't serialize them.
- Conflicting outputs: resolve explicitly, log the tradeoff. Never silently pick one.
- Pass minimum context. Don't dump main context into every subagent.
Tool & Command Failures
- Never retry blindly. Capture full error → form hypothesis → fix → retry once.
- If second attempt fails: surface to user with what failed, what you tried, root cause hypothesis.
- Never swallow a failure and continue as if it succeeded.
- Hanging process: set a timeout expectation before running. Kill and investigate; don't wait.
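A minimal sketch of that capture → hypothesize → fix → retry-once discipline; runWithOneRetry and applyFix are hypothetical names, not an existing utility.

```ts
// Capture the full error, apply one hypothesis-driven fix, retry exactly
// once, then surface the failure. Never retry blindly or swallow errors.
async function runWithOneRetry(
  runCommand: () => Promise<string>,
  applyFix: (error: unknown) => Promise<void>,
): Promise<string> {
  try {
    return await runCommand();
  } catch (firstError) {
    console.error("First attempt failed:", firstError);
    await applyFix(firstError); // targeted fix, not a blind retry
    try {
      return await runCommand();
    } catch (secondError) {
      // Second failure: stop and surface; never continue as if it worked.
      throw new Error(
        `Command failed twice. First: ${String(firstError)}; ` +
        `second: ${String(secondError)}. Root-cause hypothesis needed.`,
      );
    }
  }
}
```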
Scope Discipline
- Out-of-scope improvements go to tasks/improvements.md. Do not implement them.
- Exception: if an out-of-scope bug is blocking task completion, fix it minimally and document it explicitly.
- Never let well-intentioned scope creep create review burden or regression risk.
Self-Improvement Loop
- After any user correction: update tasks/lessons.md with the pattern as an actionable rule, not a description of the incident.
- At session start: scan tasks/lessons.md for keywords matching the current task type before planning. Not optional.
- Lesson format: Context / Mistake / Rule.
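A hypothetical tasks/lessons.md entry in that format; the incident and rule are illustrative.

```md
## Lesson: Firestore queries (2026-02-23)
- Context: added a filtered, ordered query during the tickets feature.
- Mistake: shipped the query before creating its composite index.
- Rule: when a query combines where() and orderBy() on different fields,
  create the index in the same commit as the query.
```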
Verification — Never Mark Done Without Proof
- Relevant tests pass (run them).
- No regressions in adjacent modules (check blast radius).
- Diff is minimal — no unrelated changes.
- Logs are clean at runtime.
- Would a staff engineer approve this? If no, fix it before presenting.
- No test suite: state this explicitly and describe manual verification.
Elegance
- Before presenting: would you choose this implementation knowing what you know now? If no, do it right.
- Don't over-engineer simple fixes. Elegance = appropriate to the problem.
- If something feels hacky, it probably is. Investigate before shipping.
Task Lifecycle
- Write plan → tasks/tasks.md
- Verify plan matches intent
- Execute, mark items complete as you go
- Run tests, review diff, check logs
- Summarize changes at each phase
- Log out-of-scope items → tasks/improvements.md
- Capture lessons → tasks/lessons.md
Core Rules
- Touch only what's necessary. Every extra line is a potential regression.
- No root cause shortcuts. Temporary fixes are future debt.
- Investigate before asking. The codebase, logs, and tests answer most questions.
- Never present speculation as fact. Flag uncertainty before answering.
Feb 23, 2026
| ID | Time | T | Title | Read |
|---|---|---|---|---|
| #3415 | 2:45 PM | ✅ | Added docs/adr/ section to agents.md with ADR citation and maintenance requirements | ~326 |
| #3399 | 2:35 PM | 🔵 | Examining agents.md Required Documentation section for ADR reference insertion | ~249 |