Loom workflow system

This transcript is generated by parley.nvim.

Question

Loom: create a subsystem that uses coding agents to manage durable projects

I’ve realized that coding agents like claude and codex are a pretty good way to manage long-running projects: the project’s internal state is kept in md files in a folder, with todos and the times they should be triggered or are due encoded in some format in that folder. I guess that’s essentially open crawl? But 1/ repo-based; 2/ markdown files make state visible for human inspection and iteration; 3/ it uses the stock coding agent’s capability for multi-step reasoning and externalizes its mental state in md, which seems to fit knowledge workers better; 4/ otherwise, coding agents are also used to construct scripts for steps that are highly deterministic.

In an analogy, Parley already showed the start of this pattern:

1/ parley chat, a brainstorming tool. It keeps state globally in chat files, but more recently we allowed moving a chat tree into other directories; the whole point was to move chats into the repo’s design/ directory as the output of brainstorming. By the way, I will improve this so parley chat is even more straightforward to use as a project-specific brainstorming tool.

2/ issue system: it keeps related things in issues/, then uses a hybrid of scripts (created by claude) and claude itself (during coding sessions) to manipulate the state and form a workflow.

3/ vision, the new roadmapping tool I’m tinkering with: it stores state (yaml) in vision/ and currently uses scripts (generated by claude), not free-form text, to manipulate it. I can’t pinpoint why I made this choice, but I feel this area needs more structure.

Now:

4/ loom, to support a wider variety of tasks, but at its core leveraging coding agents to manage workflows, keeping state in the repo.

Together, these form the new “operating system” of the modern startup.

Thoughts?

🤖:[Claude-Code-Opus]

🧠: This is a fascinating architectural vision. The user is describing an evolution from chat → issues → roadmapping → general workflow orchestration, all repo-based with markdown/yaml state files, leveraging coding agents as the “runtime.” They’re asking for my thoughts on this “loom” concept and the broader vision of repo-based, human-inspectable, agent-managed workflow systems as a “startup OS.” Let me think deeply about this.

The key insight: coding agents (Claude, Codex) are already good at multi-step reasoning, file manipulation, and maintaining context. By externalizing state into markdown/yaml files in a repo, you get: version control for free, human inspectability, collaborative editing (human + AI), and the agent’s reasoning capabilities applied to workflow management rather than just code generation.

The name “loom” is evocative — weaving threads together into fabric. Let me think about what this actually means architecturally and where the real leverage is.

I should engage seriously with the architectural implications, identify what’s novel vs. what exists, point out potential challenges, and help sharpen the vision. This is a design conversation, not a “build me X” request.

This is a genuinely compelling vision. Let me engage with it structurally.

The Pattern You’re Identifying

What you’re describing is essentially the repo as a runtime environment, where:

State = files (md, yaml) in well-known directories
Runtime = coding agents (Claude, Codex) invoked on-demand or triggered
Transitions = agent reads state → reasons → mutates files → commits
Observability = git log, human-readable files, diffs
Collaboration = humans and agents edit the same files, review each other’s work

The progression is clear:

| Layer | Tool | State Location | What It Manages |
| --- | --- | --- | --- |
| Brainstorming | parley chat | `design/` | Ideas, conversations, exploration |
| Planning | issues | `issues/` | Work items, dependencies, status |
| Strategy | vision | `vision/` | Roadmap, priorities, timelines |
| Orchestration | loom | `loom/` | Arbitrary durable workflows |

What Makes Loom Different from 1–3

Parley, issues, and vision are each domain-specific — they have a fixed schema and a known set of state transitions. Loom is the generalization: it’s a workflow engine where the agent defines the schema, the transitions, and the triggers based on the task at hand.

This is the leap from “tool” to “substrate.” And it’s where things get both powerful and dangerous.

The Core Architectural Question

The critical design tension is: how much structure does loom impose vs. leave to the agent?

Too much structure → you’re reimplementing Jira/Asana in markdown, and the agent is just a fancy CLI. Too little structure → the agent creates bespoke state formats every time, and humans can’t navigate without asking the agent to explain its own files.

I think the sweet spot is something like:

loom/
  loom.yaml              # registry of active "threads" (projects/workflows)
  threads/
    quarterly-planning/
      thread.yaml         # metadata: status, created, triggers, due
      state.md            # human-readable current state
      log.md              # append-only reasoning log
      artifacts/          # any outputs (reports, drafts, etc.)
    hiring-pipeline/
      thread.yaml
      state.md
      log.md
      artifacts/

Where thread.yaml has a minimal common schema:

id: quarterly-planning
status: active          # active | paused | completed | blocked
created: 2026-04-01
last_touched: 2026-04-05
trigger:
  schedule: "weekly/monday"   # or: on_change, manual, event
  condition: "when vision/roadmap.yaml changes"
owner: human            # or: agent, collaborative
summary: "Synthesize team updates into quarterly plan"
depends_on: []

And state.md is freeform but agent-maintained — this is where the agent externalizes its “mental state” about the workflow. The key insight: the agent writes state.md for its future self (and for humans), the same way a good engineer writes design docs.
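To make the "minimal common schema" concrete, here is a rough sketch of how the thread.yaml fields above could be checked once loaded. It is an illustration under stated assumptions, not an existing loom API: `ThreadMeta`, `problems`, and the specific checks are hypothetical, and YAML parsing is left out because it would need a third-party parser.

```python
from dataclasses import dataclass, field
from datetime import date

# Allowed values taken from the comments in the thread.yaml example.
VALID_STATUSES = {"active", "paused", "completed", "blocked"}
VALID_OWNERS = {"human", "agent", "collaborative"}


@dataclass
class ThreadMeta:
    """Hypothetical in-memory form of the thread.yaml fields."""
    id: str
    status: str
    created: date
    last_touched: date
    owner: str
    summary: str
    trigger: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable validation problems (empty means valid)."""
        found = []
        if self.status not in VALID_STATUSES:
            found.append(f"unknown status: {self.status!r}")
        if self.owner not in VALID_OWNERS:
            found.append(f"unknown owner: {self.owner!r}")
        if self.last_touched < self.created:
            found.append("last_touched is earlier than created")
        if self.id in self.depends_on:
            found.append("thread depends on itself")
        return found
```

A check like this could run before every agent invocation, so that a malformed thread.yaml is caught by plain code and not silently "interpreted" by the agent.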

Why This Actually Works (and Why Now)

Three things converge:

Coding agents are file-native. They already think in terms of reading files, reasoning, writing files. You’re not fighting the grain — you’re aligning the workflow engine with the agent’s natural modality.
Git gives you durability + auditability for free. Every state transition is a commit. You can revert. You can branch. You can diff. No database needed.
Markdown is the universal interface. Humans read it, agents read it, GitHub renders it, grep finds it. It’s the lowest-common-denominator that actually works.

The “Operating System” Analogy

I think the analogy is apt but worth being precise about. What you’re building isn’t an OS in the process-scheduling sense. It’s more like a file-system-based workflow kernel, where:

Processes = threads (active workflows)
System calls = agent invocations (read state, reason, write state)
Scheduler = trigger system (cron-like, event-driven, manual)
IPC = file-based (one thread reads another thread’s state.md)
Shell = the human, who can cat, edit, or kick off any thread

The missing piece — and the hardest piece — is the scheduler/trigger layer. Right now you’re describing manual invocation: human says “hey Claude, go work on the quarterly-planning thread.” The real power comes when threads can:

Self-schedule (“check back on this Monday”)
React to events (“when a new issue is created in issues/, triage it”)
Chain (“when hiring-pipeline reaches ‘offer-sent’, trigger onboarding thread”)

This is where you’d need a lightweight daemon or cron job that scans loom/threads/*/thread.yaml for due triggers and invokes the agent. That’s the “init system” of your OS.
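A minimal sketch of that scan, assuming the thread metadata has already been loaded into plain dicts. The function names, the flattened `schedule` and `last_run` keys, and the decision to handle only the "weekly/<weekday>" form from the example schema are all assumptions for illustration.

```python
from datetime import date
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def is_due(schedule: str, last_run: Optional[date], today: date) -> bool:
    """Decide whether a "weekly/<weekday>" schedule is due today.

    Any other schedule value (manual, on_change, event) is never due here.
    """
    kind, _, day = schedule.partition("/")
    if kind != "weekly" or day not in WEEKDAYS:
        return False
    if today.weekday() != WEEKDAYS.index(day):
        return False
    # Do not run the same thread twice on one day.
    return last_run is None or last_run < today


def due_threads(threads: dict, today: date) -> list:
    """Return the ids of active threads that are due, in sorted order."""
    return sorted(
        thread_id
        for thread_id, meta in threads.items()
        if meta.get("status") == "active"
        and is_due(meta.get("schedule", "manual"), meta.get("last_run"), today)
    )
```

The point of the sketch is the division of labor: deciding "when" is a few lines of deterministic code, and the agent is only invoked for the threads this returns.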

Honest Concerns

State corruption. When the agent rewrites state.md, it can lose information, hallucinate progress, or drift from reality. Mitigation: the append-only log.md creates an audit trail, and humans periodically review. But this requires discipline.
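One way to make "append-only" more than a convention is to route every log write through a helper that can only append. This is a sketch under assumptions: `append_log` and the `## timestamp` entry format are invented for illustration, and nothing here stops a person or agent from editing log.md directly.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_log(log_file: Path, entry: str, now: Optional[datetime] = None) -> None:
    """Append one timestamped entry to log.md without touching earlier ones."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M")
    log_file.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(f"## {stamp}\n{entry.rstrip()}\n\n")
```

Because earlier entries are never rewritten, a `git diff` on log.md should only ever show added lines, which makes an unexpected deletion easy to spot in review.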

Context window limits. A long-running thread accumulates context. Eventually the agent can’t hold the full history. You’ll need a summarization/compaction strategy — maybe the agent periodically rewrites state.md as a condensed “current understanding” while log.md retains the full history (but is only consulted when needed).

Agent reliability for scheduling. Agents are good at reasoning but unreliable as schedulers. The trigger/cron layer should probably be a simple script, not agent-driven. Let dumb code handle “when,” let the agent handle “what.”

What I’d Build First

If I were you, I’d start with:

The thread schema (thread.yaml + state.md + log.md) — nail the minimal structure
A loom CLI or set of scripts — loom new <thread>, loom list, loom run <thread> (which invokes Claude with the thread’s context)
One real workflow — something you actually do weekly, like “synthesize team standup notes into a status update.” Prove the pattern works before generalizing.
The trigger layer — a cron job that scans for due threads and invokes loom run
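As a rough sketch of the CLI surface in the second item, the three subcommands could be declared with Python's standard argparse module. Only the command names come from the list above; `build_parser` and the help strings are assumptions, and the handlers that would actually create, list, or run threads are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the hypothetical `loom new|list|run` command surface."""
    parser = argparse.ArgumentParser(prog="loom")
    sub = parser.add_subparsers(dest="command", required=True)

    new = sub.add_parser("new", help="create a thread directory")
    new.add_argument("thread")

    sub.add_parser("list", help="list known threads")

    run = sub.add_parser("run", help="invoke the agent on one thread")
    run.add_argument("thread")
    return parser
```

For example, `build_parser().parse_args(["run", "quarterly-planning"])` yields a namespace with `command` set to `run` and `thread` set to `quarterly-planning`, which a dispatcher could then map to the matching handler.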

Skip building a UI. The repo is the UI. That’s the whole point.

The Bigger Picture

What you’re describing — parley + issues + vision + loom — is essentially externalizing the startup’s cognitive processes into version-controlled, agent-manipulable files. The startup’s “brain” becomes inspectable, forkable, and collaborative between humans and agents.

This is philosophically closer to a Zettelkasten for organizations than to project management software. And the fact that it’s built on git means it inherits all of git’s collaboration primitives (branches, PRs, merges) for free.

The real question isn’t whether this pattern works — I think it clearly does for a certain class of knowledge worker. The question is whether it scales beyond the founder/power-user who set it up. Can a new engineer join and navigate loom/threads/ without a 30-minute walkthrough? That’s the test.

📝: You asked about “loom” — a generalized repo-based workflow system using coding agents to manage durable projects via markdown/yaml state files, and the broader vision of parley+issues+vision+loom as a startup OS. I answered with architectural analysis: the thread-based file structure (thread.yaml + state.md + log.md), the key design tension of imposed vs. emergent structure, why the pattern works now (agents are file-native, git gives durability, markdown is universal), the OS analogy mapped precisely (threads=processes, agent=syscalls, triggers=scheduler), honest concerns (state corruption, context limits, scheduling reliability), and a concrete build-first plan starting with schema + CLI + one real workflow + cron triggers.

Question

I’m less sure about totally free-form state for the agent.

In the end, or at least what I’m comfortable with at the beginning, is really to repeat the pattern of the other tools I created as I need them in my workflow, e.g. issues for coding complex tasks and keeping track of ideas I want to build; vision to align bigger-scoped items. You can imagine that in the future some issues could be promoted if they’re sizable, maybe with just a keystroke, or even automatically by an agent (human prompt: find all issues that are big enough), or by codifying that prompt into code.

So the next type of tool I think I want is for repeating tasks, e.g. checking the system health dashboard every morning while I sip coffee, or reviewing last week’s oncall report. We can use this as the first type of thing loom should support.

I don’t know whether some common pattern or building block will emerge, but I feel I have gone through this several times, so it’s worth thinking about generalization now.

The other aspect: I’m highly confident that this is going to be very useful for my own personal productivity, for organizing thoughts and execution. Whether we can lift this into some operating system for a new generation of companies is TBD; but it’s probably good enough to sell to venture capital?

🤖:[Claude-Code-Opus]

🧠: The user is narrowing the scope from my overly general “freeform state” proposal to something more concrete and practical. They want loom to start with a specific use case: recurring/scheduled tasks (morning dashboard checks, weekly oncall reports). They’re also reflecting on the meta-pattern across parley, issues, vision, and loom — wondering if common building blocks emerge. And they’re being honest about scope: personal productivity first, company OS maybe later, VC pitch possibly. Let me engage with: 1) the concrete first use case (recurring tasks/routines), 2) the emerging common patterns across their tools, 3) honest take on the VC angle. I should not repeat the architecture I already laid out — they’re course-correcting me toward something more grounded.

You’re right to push back on freeform. Let me recalibrate.

The Correction

What you’re describing is closer to typed tools that share a common substrate than a generic workflow engine. Each tool (issues, vision, loom) has its own schema, its own semantics, its own interaction patterns. The generalization isn’t “anything goes” — it’s that they all follow the same construction pattern:

state in repo files + scripts/agent to manipulate state + human oversight

That’s the building block. Not the state format.

Loom as “Routines”

The first concrete loom use case — recurring tasks with agent execution — is actually a very clean, well-bounded problem:

# loom/routines/morning-health-check.yaml
name: Morning System Health Check
schedule: "weekday 8:00am"
prompt: |
  Check the following dashboards and summarize:
  - AWS CloudWatch: error rates, latency p99
  - PagerDuty: open incidents from last 12h
  - Datadog: database connection pool utilization
  Output a concise status to loom/routines/morning-health-check/latest.md
  Flag anything that needs human attention at the top.
sources:
  - url: https://app.datadoghq.com/dashboard/xxx
  - command: "aws cloudwatch get-metric-statistics ..."
history: loom/routines/morning-health-check/history/

And the output is just:

<!-- loom/routines/morning-health-check/latest.md -->
# 2026-04-05 Morning Health Check

### 🔴 Needs Attention
- Aurora replica lag spiked to 800ms at 3:22am, recovered at 4:01am
  - Worth checking if the nightly ETL job timing shifted

### 🟢 All Clear
- Error rates nominal (0.02%)
- No open PagerDuty incidents
- Connection pools at 34% utilization

You sip coffee, open latest.md in Neovim, done. Or it shows up as a parley chat you can interrogate further: “tell me more about that replica lag.”
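How latest.md and the `history:` directory from the routine definition might fit together can be sketched as below. The date-named archive files and the `publish_report` helper are assumptions, since the routine definition only names the directory.

```python
from datetime import date
from pathlib import Path


def publish_report(routine_dir: Path, report: str, day: date) -> Path:
    """Archive the report under history/ and refresh latest.md.

    Assumes one archived file per day, named YYYY-MM-DD.md.
    Returns the path of the archived copy.
    """
    history_dir = routine_dir / "history"
    history_dir.mkdir(parents=True, exist_ok=True)
    archived = history_dir / f"{day.isoformat()}.md"
    archived.write_text(report, encoding="utf-8")
    # latest.md always mirrors the most recent run.
    (routine_dir / "latest.md").write_text(report, encoding="utf-8")
    return archived
```

With this layout, "what changed since yesterday" is a plain diff between two files in history/, which either you or the agent can read.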

The Common Pattern Across Your Tools

Now, stepping back — here’s what I see emerging:

| Concern | Issues | Vision | Loom (Routines) |
| --- | --- | --- | --- |
| Schema | issue yaml (status, priority, tasks) | roadmap yaml (items, horizons, status) | routine yaml (schedule, prompt, sources) |
| State mutation | scripts + agent during coding | scripts + agent | scripts + agent on schedule |
| Human interaction | read/edit files, kick off agent | read/edit yaml, run commands | read output, maybe follow up |
| Trigger | manual (human decides to work on issue) | manual (planning sessions) | scheduled (this is new) |

The scheduled trigger is the genuinely new capability loom introduces. Everything else — yaml state, agent manipulation, human-readable output — you’ve already built. The delta is small, which is a good sign. It means you’re not inventing a new paradigm; you’re extending one that works.

The Minimal Implementation

Honestly, loom v0 might be:

A loom/routines/ directory with yaml definitions
A cron job (or GitHub Action, or a simple systemd timer) that scans for due routines
For each due routine: invoke claude with the prompt, write output to latest.md, git commit
A loom list script that shows what’s active and when each last ran

That’s maybe 100 lines of bash/python. The agent does the hard part (reasoning about dashboards, summarizing oncall reports). Your code just handles the boring scheduling plumbing.
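The core of that plumbing might look like the sketch below. It is deliberately hedged: `run_routines` is a hypothetical name, and `run_agent` is an injected callable standing in for however you invoke the agent, so no particular CLI or flag is assumed. The git commit step is omitted to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable


def run_routines(due: dict, out_root: Path, run_agent: Callable[[str], str]) -> dict:
    """Run each due routine's prompt through the agent and save latest.md.

    `due` maps routine name to prompt text. One failing routine does not
    stop the others. Returns a status string per routine.
    """
    results = {}
    for name, prompt in sorted(due.items()):
        try:
            report = run_agent(prompt)
        except Exception as exc:
            # Record the failure and move on to the next routine.
            results[name] = f"failed: {exc}"
            continue
        target = out_root / name / "latest.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(report, encoding="utf-8")
        results[name] = "ok"
    return results
```

Injecting `run_agent` also makes the loop testable with a fake agent, which matters because the scheduling code is exactly the part you want to be boring and verifiable.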

On the Emerging Abstraction

You’ve now applied the same pattern several times (parley, issues, vision), and loom would be the next. The common bones:

A directory convention (issues/, vision/, loom/)
Typed YAML for structured state (machine-readable, agent-readable)
Markdown for narrative state (human-readable, agent-writable)
Scripts for state transitions (generated by agent, run by human or cron)
Agent as reasoning engine (reads state, applies judgment, writes state)

If you wanted to generalize, you’d extract something like a tool template — a way to scaffold a new tool with its directory, schema, scripts, and agent prompts. But I’d resist doing that until you have 4-5 tools and the pattern is so obvious it’s annoying not to have it. You’re at 3-4 now. Maybe after loom ships, the abstraction crystallizes.

On the VC Angle

Honest take: the personal productivity story is strong but not fundable on its own. VCs want to see a wedge into a market, not a power user’s bespoke setup.

But the reframe that could work: “What if every startup’s operational state was version-controlled, agent-manipulable, and human-readable?” That’s a real pitch. It’s the anti-Notion, anti-Jira argument: those tools are databases with UIs that agents can’t reason over. Your approach is files that both humans and agents natively understand.

The challenge is the same one you identified: does it scale beyond the power user? The honest answer today is no. But the honest answer for most dev tools at seed stage is “the founder uses it and it’s transformative for them.” That’s enough for a certain class of investor, especially if you can show the pattern repeating (which you can — four tools, same substrate, compounding value).

The stronger VC angle might actually be: you’re not selling loom, you’re selling the pattern. The “repo-native operational OS” as a framework, with loom/issues/vision/parley as reference implementations. Think Rails, not Basecamp — though Basecamp funded Rails just fine.

What I’d Do Next

Build loom/routines/ for yourself this week. One routine: morning health check. Keep it dead simple. Once you’re using it daily and it’s saving you 15 minutes, you’ll know what the second routine should be. The tool will tell you what it wants to become.

📝: You asked about narrowing loom’s scope to recurring/scheduled tasks as the first use case and whether common patterns emerge across your tools. I answered with a concrete routine schema, identified the common substrate pattern (typed yaml + markdown + scripts + agent), noted that scheduled triggers are the genuinely new capability loom adds, and gave an honest assessment of the VC angle (the pattern is the product, not the individual tools).

Question

Interestingly, I did have Basecamp in mind, but more for their other aspect, how they define company culture and how they work, and less so for Rails.

🤖:[Claude-Code-Opus]

🧠: The user is clarifying that their Basecamp reference is about Basecamp’s organizational philosophy — Shape Up, calm company, async-first, small teams with autonomy, “how we work” as a product — not about extracting a framework (Rails) from a product. This is a meaningful distinction. They’re thinking about the operating system for a company angle more literally: how a company runs, makes decisions, communicates, and coordinates. Basecamp sold their way of working (books like Rework, Shape Up, It Doesn’t Have to Be Crazy at Work) as much as they sold the software. The user sees a parallel: the repo-native toolchain isn’t just tooling, it’s a philosophy of how a modern startup should operate. I should engage with this distinction seriously and not just repeat what I already said.

That’s a much more interesting reference point.

Basecamp as “Company Philosophy, Productized”

Basecamp’s real innovation wasn’t the project management tool or Rails — it was codifying an opinionated way of working (Shape Up, six-week cycles, small teams, no backlogs, calm company) and then building software that enforced those opinions. The tool and the philosophy were inseparable. You couldn’t use Basecamp “wrong” because the constraints were baked in.

What you’re doing rhymes with that. The toolchain encodes a way of working:

State lives in the repo, not in someone’s head or a SaaS database
Agents are collaborators, not automation bots
Everything is inspectable, diffable, reviewable
Async by default — you write state, agent processes it later (or vice versa)
Small, composable tools over monolithic platforms

That’s not just a technical choice. It’s an organizational philosophy: the company’s operational state should have the same properties as its code — version-controlled, peer-reviewed, transparent, and forkable.

Where This Gets Interesting

Basecamp wrote Shape Up after they’d been working that way for years. The methodology emerged from practice, then got codified, then got evangelized. You’re on a similar trajectory — you’re in the “practicing it and discovering what works” phase.

The question is whether the philosophy that emerges is prescriptive enough to be compelling. Basecamp succeeded because their opinions were strong: no backlogs, period. Six-week cycles, period. If your philosophy is “use files in a repo,” that’s too weak. If it’s something like:

“Your company’s decisions, priorities, and operational state should be committed to git, manipulable by agents, and never locked in a tool you don’t control”

…that’s a stance people can rally around or argue against. Both are good.

The Honest Gap

Basecamp’s philosophy was legible to non-technical people — PMs, designers, founders who don’t code. The repo-native approach currently requires comfort with git, yaml, terminal, and agents. That’s a smaller audience.

But maybe that’s fine. Basecamp targeted “small teams who reject enterprise bloat.” You’d be targeting “technical founders and engineering-led teams who want their operational layer to be as hackable as their codebase.” That’s a real audience. It’s just not everyone — and it doesn’t need to be.

📝: You clarified that the Basecamp inspiration is about their company philosophy/way-of-working (Shape Up, calm company) rather than the Rails-from-product extraction pattern. I engaged with how your toolchain similarly encodes an opinionated organizational philosophy — operational state as code, agent-native, inspectable — and noted the gap is legibility to non-technical users, but that a technical-founder audience is viable.