- Existing repo (brownfield).
- Extra level of research.
- Choice (2 projects we can pick from: healthcare or finance).
- Simple evals (LangSmith evals).
- How to run locally? Read the instructions, pull them down, and go with coding agents (breaking things down: frameworks, patterns, less code, simpler, cleaner).
- Memory system.
- When to use tools, when not?
- Check before returning responses (vetted to some level; output formatter with citations, confidence level attached). A rough sketch appears after the system design checklist below.
- Required tools (no overlap, enough to do meaningful work).
- Eval framework (which things to verify? which strategies to use?).
- Datasets we want to run against (difficulty levels, regressions, test cases).
- Observability (this is 95% of how to put it together; scaling?).
- Verifications (guardrails).
- Performance targets.
- Release to open source (commits and PRs).
- Video record myself (so I can have an early reference).
- Add voice? Build AI to access it.

---

# Gauntlet Fellowship — Cohort G4 (Operating Notes)

## Context

- Government/regulated companies will be hiring → optimize for **reliability, auditability, security posture, and clear decision rationale**.
- No emojis in any generated files; emojis are fine only in output and when testing.
- No negations.
- We have access to Google models via `max.petrusenko@gfachallenger.gauntletai.com` (Gemini Pro, Nano Banana Pro, and other Google models).
- The stack must be justified in the docs.

## Required Documentation (Keep Updated)

> Reality check: client/project requirements can override this. Always re-anchor on the provided `requirements.md`.

### `docs/adr/` (Architecture Decision Records - mandatory for architectural changes)

- Check before any structural/architectural changes.
- Cite the relevant ADR in proposed changes.
- Update the ADR after refactors (prevents drift).
- Template: Context, Options (with rejected reasons), Decision, Trade-offs, What would change my mind.

### `Tasks.md` (mandatory)

- Ticket list + status.
- Each feature: link to tests + PR/commit.
- We also use the Linear CLI/MCP; check what is available.

## Engineering Standards

- We are making **system decisions** → prioritize correctness under constraints.
- **E2E TDD**:
  - Use for backend/system flows.
  - Avoid forcing E2E TDD for frontend UI polish.
- Frontend expectations:
  - Components + types (if React, use **v17+**).
  - **Do not rewrite tests just to pass.**
  - Tests run only before pushing to GitHub, when asked by the user, or during RGR (red-green-refactor).
- Code quality:
  - Must scale and perform reasonably.
  - Indexing + query design matters (especially Firestore / SQL); see the query/index sketch after the system design checklist below.
  - Lint and build should run after each implemented feature / feature set.
  1. Write it right the first time so it passes the logic tests.
  2. Rewrite the code in a clean, elegant, modular way.
  3. Each file: max ~500 LOC.

---

## Research Workflow

- Always run **Presearch** first.
- Use **multi-model triangulation**:
  - Create the Presearch doc once.
  - “Throw it” into multiple AIs → compare responses.
  - Prefer Google Deep Research; if unavailable, use Perplexity.

---

## Hosting & System Design Focus

Key questions we must answer early (and revisit when requirements change):

- What’s the main focus *right now*? (may change later)
- Data storage model
- Security model
- File structure + naming conventions
- Legacy constraints (if any)
- Testing strategy
- Refactoring strategy
- Maintenance cost

System design checklist:

- Time to ship?
- Requirements clarity?
- Scaling/load profile?
- Budget?
- Team size/roles?
- Authentication?
- Failure modes?
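Referenced from the Engineering Standards note on indexing + query design: a minimal sketch, assuming the Firebase Web SDK (v9 modular API) and a hypothetical `boards` collection with `ownerId` and `updatedAt` fields; the names and page size are placeholders, not decisions.

```ts
// Sketch only: assumes firebase v9+ modular SDK and a hypothetical "boards" collection.
import { initializeApp } from "firebase/app";
import {
  getFirestore,
  collection,
  query,
  where,
  orderBy,
  limit,
  getDocs,
} from "firebase/firestore";

const app = initializeApp({ projectId: "demo-project" }); // real config goes here
const db = getFirestore(app);

// An equality filter plus an order-by on a different field needs a composite index,
// declared in firestore.indexes.json (ownerId ASC, updatedAt DESC).
export async function recentBoardsFor(uid: string) {
  const q = query(
    collection(db, "boards"),
    where("ownerId", "==", uid),
    orderBy("updatedAt", "desc"),
    limit(20) // page instead of scanning the whole collection
  );
  const snap = await getDocs(q);
  return snap.docs.map((d) => ({ id: d.id, ...d.data() }));
}
```

Declaring the index up front keeps query shape and index design reviewed together instead of discovered at runtime.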
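Referenced from the scratch notes at the top ("check before returning responses ... output formatter with citations, confidence level"): a rough sketch of one possible response shape; the field names and the 0.5 threshold are assumptions, not a spec.

```ts
// Sketch only: assumed shape for a "vet before returning" response with citations + confidence.
interface Citation {
  sourceId: string; // e.g. document or chunk id from retrieval
  quote: string;    // the supporting span shown to the user
}

interface VettedAnswer {
  text: string;
  citations: Citation[];
  confidence: number; // 0..1, produced by the verifier/guardrail step
}

export function formatAnswer(a: VettedAnswer): string {
  // Refuse to present uncited or low-confidence output as fact (threshold is a placeholder).
  if (a.citations.length === 0 || a.confidence < 0.5) {
    return `Low confidence (${a.confidence.toFixed(2)}): needs review before sending.`;
  }
  const refs = a.citations.map((c, i) => `[${i + 1}] ${c.sourceId}`).join("\n");
  return `${a.text}\n\nConfidence: ${a.confidence.toFixed(2)}\nCitations:\n${refs}`;
}
```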
---

## Docs & Tests Workflow

- If not already done: generate **PRD + MVP** from `requirements.md`.
- Walk through documentation *every time it changes*:
  - PRD
  - MVP
  - Patterns
  - Duplication / inconsistencies
  - Project-level skill + symlink
- Tests:
  - Build tests for every new feature.
  - References:
    - https://github.com/steipete/CodexBar/tree/main/Tests
    - (E2E TDD styles referenced by Jeffrey Emanuel / Steve Yegge)

---

## Project Management

- Use **Linear** for tickets.
- After implementing a new feature:
  - Update `Tasks.md`.
  - Update tests.
  - Create or update an ADR in `docs/adr/` (for architectural changes).
  - Track maintenance cost implications.

---

## Tasks (Draft)

1. Can I download all transcripts from Google and save them to the Gauntlet Notion (curriculum)?
2. Define "1 hour deliverables" and hard deadlines per week.
3. Find a good resource for system design:
   - Search top-rated + most-forked repos (Meta, OpenAI, Anthropic patterns).
4. IP implications if selecting a hiring partner.
5. Hand this plan to OpenClaw (as operating context).
6. Reminder: use Aqua + Whisper for talking to AI instead of typing.

---

## Submission Requirements (Must Include)

- Deployed app(s)
- Demo video
- Pre-search doc
- AI development log (1 page)
- LinkedIn or X post: what I did in 1 week
- AI cost analysis
- Document submission as **PDF**
- Add a **PAT token** if GitHub repo access needs it

---

## AI Development Log (Required Template)

Submit a 1-page document covering:

- Tools & Workflow: which AI coding tools were used and how they were integrated
- MCP Usage: which MCPs were used (if any) and what they enabled
- Effective Prompts: 3–5 prompts that worked well (include the actual prompts)
- Code Analysis: rough % AI-generated vs hand-written
- Strengths & Limitations: where AI excelled and where it struggled
- Key Learnings: insights about working with coding agents

---

## AI Cost Analysis (Required)

Track development and testing costs:

- LLM API costs (OpenAI, Anthropic, etc.)
- Total tokens consumed (input/output breakdown)
- Number of API calls
- Other AI-related costs (embeddings, hosting)

Production cost projections must include:

- 100 users: $___/month
- 1,000 users: $___/month
- 10,000 users: $___/month
- 100,000 users: $___/month

Include assumptions (a worked projection sketch follows the Build Strategy list below):

- average AI commands per user per session
- average sessions per user per month
- token counts per command type

---

## Technical Stack (Possible Paths)

- Backend:
  - Firebase (Firestore, Realtime DB, Auth)
  - Supabase
  - AWS (DynamoDB, Lambda, WebSockets)
  - Custom WebSocket server
- Frontend:
  - React / Vue / Svelte + Konva.js / Fabric.js / PixiJS / Canvas
  - Vanilla JS (if fastest)
- AI integration:
  - OpenAI (function calling)
  - Anthropic Claude (tool use / function calling)
- Deployment:
  - Vercel
  - Firebase Hosting
  - Render

> Rule: choose whichever ships fastest **after** completing Pre-Search to justify decisions.

---

## Build Strategy (Priority Order)

1. Cursor sync — two cursors moving across browsers (a minimal sketch follows this list)
2. Object sync — sticky notes appear for all users
3. Conflict handling — simultaneous edits
4. State persistence — survive refresh + reconnect
5. Board features — shapes, frames, connectors, transforms
6. AI commands (basic) — single-step creation/manipulation (a function-calling sketch follows this list)
7. AI commands (complex) — multi-step template generation
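A minimal sketch for step 1 (cursor sync), assuming the "custom WebSocket server" path with the `ws` package; the port, message shape, and broadcast policy are placeholders, not decisions.

```ts
// Sketch only: relay cursor positions to every other connected client (ws npm package assumed).
import WebSocket, { WebSocketServer } from "ws";

type CursorMsg = { type: "cursor"; userId: string; x: number; y: number };

const wss = new WebSocketServer({ port: 8080 }); // placeholder port

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    let msg: CursorMsg;
    try {
      msg = JSON.parse(raw.toString());
    } catch {
      return; // ignore malformed frames
    }
    if (msg.type !== "cursor") return;
    // Broadcast to everyone except the sender; clients render the remote cursor locally.
    for (const client of wss.clients) {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(JSON.stringify(msg));
      }
    }
  });
});
```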
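For steps 6–7 (AI commands), a hedged sketch of the OpenAI function-calling option from the Technical Stack list; the model name, tool name, and schema are illustrative placeholders only.

```ts
// Sketch only: one board command exposed as an OpenAI tool (openai Node SDK assumed).
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function runBoardCommand(userPrompt: string) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model
    messages: [{ role: "user", content: userPrompt }],
    tools: [
      {
        type: "function",
        function: {
          name: "create_sticky_note", // hypothetical board tool
          description: "Create a sticky note on the shared board",
          parameters: {
            type: "object",
            properties: {
              text: { type: "string" },
              color: { type: "string", enum: ["yellow", "pink", "blue"] },
            },
            required: ["text"],
          },
        },
      },
    ],
  });

  const call = response.choices[0].message.tool_calls?.[0];
  if (!call || call.type !== "function") return null; // model answered in plain text instead
  // Arguments arrive as a JSON string; validate before mutating shared board state.
  return JSON.parse(call.function.arguments);
}
```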
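And a worked sketch for the AI Cost Analysis assumptions above; every number is an illustrative placeholder meant to show the arithmetic (users × sessions × commands × tokens × price), not a measured figure.

```ts
// Sketch only: projection = users * sessions/month * commands/session * token cost per command.
// All inputs below are placeholder assumptions to be replaced with measured values.
const assumptions = {
  sessionsPerUserPerMonth: 8,
  commandsPerSession: 5,
  inputTokensPerCommand: 1_200,
  outputTokensPerCommand: 400,
  usdPerMillionInputTokens: 2.5,   // placeholder pricing
  usdPerMillionOutputTokens: 10.0, // placeholder pricing
};

function monthlyCost(users: number): number {
  const a = assumptions;
  const commands = users * a.sessionsPerUserPerMonth * a.commandsPerSession;
  const inputCost = (commands * a.inputTokensPerCommand / 1_000_000) * a.usdPerMillionInputTokens;
  const outputCost = (commands * a.outputTokensPerCommand / 1_000_000) * a.usdPerMillionOutputTokens;
  return inputCost + outputCost;
}

// Fill the 100 / 1K / 10K / 100K projection table from the same assumptions.
for (const users of [100, 1_000, 10_000, 100_000]) {
  console.log(`${users} users: ~$${monthlyCost(users).toFixed(2)}/month`);
}
```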
---

## Critical Guidance

- Test simultaneous AI commands from multiple users.
- When creating a new feature, or when asked by the user: review old tests, create new tests if we now test differently, and make tests more deterministic.
- Refactors require before/after benchmarks (latency, cost, failure rate) and updated regression tests; log deltas in CHANGELOG.md.
- Remove duplication and stale logic; document architectural shifts in ADRs (`docs/adr/`).

---

## Deadline & Deliverables

- Deadline: Sunday 10:59 PM CT
- GitHub repo must include:
  - setup guide
  - architecture overview
  - deployed link
- Demo video (3–5 min):
  - realtime collaboration
  - AI commands
  - architecture explanation
- Pre-Search document:
  - completed checklist (Phase 1–3)
- AI Development Log:
  - 1-page breakdown using the required template
- AI Cost Analysis:
  - dev spend + projections for 100/1K/10K/100K users
- Deployed app:
  - publicly accessible
  - supports 5+ users with auth

## Resources

**System Design**: Search top-rated/most-forked repos (Meta, OpenAI, Anthropic)
**Test Examples**: [CodexBar Tests](https://github.com/steipete/CodexBar/tree/main/Tests)

# Claude Code/Codex — Execution Protocol

## Philosophy

You are a staff engineer: autonomous, accountable, scope-disciplined. The user's time is the constraint. Do less, log the rest. Correct > fast > clever.

---

## Planning

- Any task with 3+ steps or architectural risk: write `tasks/tasks.md` before touching code. No exceptions.
- If you're wrong mid-task: stop, re-plan. Never compound a bad direction.
- Ambiguity threshold: if reverting a decision takes >30 min (migrations, destructive ops, external side effects), surface it first. Otherwise proceed at 80% clarity and flag your assumption inline.
- Verification is part of the plan. A plan without success criteria is incomplete.

## Context Window

- Summarize and compress completed phases before moving forward.
- Extract only what you need from subagent outputs — don't inline full results.
- If a session accumulates 5+ major phases, consider a clean handoff doc and a fresh session.

## Subagents

- One task per subagent. Define the input + expected output format before spawning.
- Parallelize independent tasks; don't serialize them.
- Conflicting outputs: resolve explicitly, log the tradeoff. Never silently pick one.
- Pass minimum context. Don't dump the main context into every subagent.

## Tool & Command Failures

- Never retry blindly. Capture the full error → form a hypothesis → fix → retry once.
- If the second attempt fails: surface to the user with what failed, what you tried, and a root-cause hypothesis.
- Never swallow a failure and continue as if it succeeded.
- Hanging process: set a timeout expectation before running. Kill and investigate; don't wait.

## Scope Discipline

- Out-of-scope improvements go to `tasks/improvements.md`. Do not implement them.
- Exception: if an out-of-scope bug is blocking task completion, fix it minimally and document it explicitly.
- Never let well-intentioned scope creep create review burden or regression risk.

## Self-Improvement Loop

- After any user correction: update `tasks/lessons.md` with the pattern as an actionable rule, not a description of the incident.
- At session start: scan `tasks/lessons.md` for keywords matching the current task type before planning. Not optional.
- Lesson format: `Context / Mistake / Rule`.

## Verification — Never Mark Done Without Proof

- Relevant tests pass (run them).
- No regressions in adjacent modules (check blast radius).
- Diff is minimal — no unrelated changes.
- Logs are clean at runtime.
- Would a staff engineer approve this? If no, fix it before presenting.
- No test suite: state this explicitly and describe manual verification.

## Elegance

- Before presenting: would you choose this implementation knowing what you know now? If no, do it right.
- Don't over-engineer simple fixes. Elegance = appropriate to the problem.
- If something feels hacky, it probably is. Investigate before shipping.

## Task Lifecycle

1. Write plan → `tasks/tasks.md`
2. Verify the plan matches intent
3. Execute, mark items complete as you go
4. Run tests, review diff, check logs
5. Summarize changes at each phase
6. Log out-of-scope items → `tasks/improvements.md`
7. Capture lessons → `tasks/lessons.md`

## Core Rules

- Touch only what's necessary. Every extra line is a potential regression.
- No root-cause shortcuts. Temporary fixes are future debt.
- Investigate before asking. The codebase, logs, and tests answer most questions.
- Never present speculation as fact. Flag uncertainty before answering.