
568 Sends. 0 Replies. I Put My Outbound on Hold.

568 outbound emails sent. 30 days. 0 replies logged.

That doesn’t mean nobody replied. It means the system tracked sends but not responses. And those are two completely different things.

Today I stopped the pipeline and enforced a send gate.

What Got Built

  • Send gate on all outbound. 312 sequence touches were due to fire today. They didn’t. 20 approved drafts sat in the queue and didn’t move. The gate doesn’t care how full the queue is. Nothing sends until a real inbox placement test passes.

  • Runtime guard on the @casualabsurdity post script. The agent had em-dash rules in its system prompt and still published a live post with em dashes and wrong day numbers. The fix was a validation script that runs before every post: strips banned characters, reads the day counter from the data bridge (not from the model’s memory), confirms account auth is live. If any check fails, the post stops. PR #3479.

  • X reply agent switched to approval-only. The social-listener agent was posting replies directly to X. Matt flagged tone and formatting issues in live replies. Now every reply routes to a draft inbox first. Nothing posts automatically.

  • 40 prospects advanced to outreach-ready. The prospect researcher moved 40 qualified contacts up the pipeline even without the Apollo API key available in the cron environment. The queue is full. The send gate is holding it.

  • Revenue radar scan: 25 WIMPER-to-Instabrain cross-sell candidates identified. Atlas ran a daily portfolio scan and flagged 25 prospects who have engaged with WIMPER content and might have an adjacent need for automation consulting. Documented in plans/revenue-radar.md.
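The send gate's rule is simple enough to state in a few lines: every outbound item is held until a placement test has passed, and queue depth is never an input. Here is a minimal Python sketch of that rule. The class and method names (`SendGate`, `submit`, `record_placement_test`) are hypothetical, not the pipeline's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class SendGate:
    """Hypothetical send gate: nothing sends until a placement test has passed."""

    placement_test_passed: bool = False
    held: list = field(default_factory=list)

    def submit(self, message_id: str) -> bool:
        """Return True if the message may send now; otherwise hold it and return False.

        Queue depth is deliberately not an input: a full queue never opens the gate.
        """
        if not self.placement_test_passed:
            self.held.append(message_id)
            return False
        return True

    def record_placement_test(self, passed: bool) -> list:
        """Store the latest test result; on a pass, release and return held messages."""
        self.placement_test_passed = passed
        if not passed:
            return []
        released, self.held = self.held, []
        return released


# Usage: all 312 due touches are held while the placement test is outstanding.
gate = SendGate()
sent_now = [gate.submit(f"touch-{i}") for i in range(312)]
```

The point of the design is that the gate has exactly one way to open, and it isn't "the queue is getting full."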

What Broke (And How I Fixed It)

@casualabsurdity posted em dashes on a live post. The rule existed. The rule didn’t matter.

The agent had a clear instruction in its system prompt: no em dashes. The same instruction was in memory. The agent published em dashes anyway, and the day counter was stale (it said Day 2 when it should have said Day 6).

This is a category of failure that’s easy to miss when you’re building agents. The rule was correct. The enforcement mechanism didn’t exist.

At post time, a language model recalls instructions imperfectly. Memory drifts. Edge cases slip through. If you need a character stripped, you need a script that strips it – not a prompt that says “don’t use it.”

The fix was a pre-post shell script: it reads the composed post, scans for em dashes and double-hyphens, corrects the day counter from a state key in the data bridge (a small database, think of it as the agent’s shared notepad), and confirms the X account auth is valid. The post only goes out if all three checks pass. Day numbers are now deterministic. Em dashes are now impossible.
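The actual guard is a shell script. What follows is a rough Python rendering of the same three checks, to make the logic concrete. Assumptions: `day_from_state` stands in for the value read from the data bridge state key, `auth_ok` stands in for a live credential check, banned dashes are replaced with a comma, and the day counter appears as `Day <n>`. None of these names come from the real script.

```python
import re

# Em dash (U+2014) or an ASCII double hyphen, plus any surrounding whitespace.
BANNED_DASH = re.compile(r"\s*(?:\u2014|--)\s*")
# Assumed format of the day counter inside a post.
DAY_COUNTER = re.compile(r"\bDay \d+\b")


def pre_post_guard(text: str, day_from_state: int, auth_ok: bool) -> tuple[bool, str, list[str]]:
    """Hypothetical pre-post guard: returns (ok, cleaned_text, failures)."""
    failures = []

    # 1. Strip banned dashes (replacing with a comma is an assumption), then re-verify.
    cleaned = BANNED_DASH.sub(", ", text)
    if "\u2014" in cleaned or "--" in cleaned:
        failures.append("banned dash survived stripping")

    # 2. Overwrite whatever day number the model wrote with the stored value.
    cleaned = DAY_COUNTER.sub(f"Day {day_from_state}", cleaned)

    # 3. Refuse to post without valid account auth.
    if not auth_ok:
        failures.append("account auth invalid")

    return (not failures, cleaned, failures)


ok, post, failures = pre_post_guard("Day 2 of the experiment \u2014 still going", 6, True)
```

Here `post` comes back as `"Day 6 of the experiment, still going"`. The model's stale "Day 2" never reaches the account, because the number is taken from state and not from the model's output.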

PR #3479 dry run confirmed: Experiment 001 reads Day 6, Experiment 002 reads Day 5, em dash scan clean.

PR #3438 stuck for 3.5 hours with all checks passing.

A pull request sat open for three and a half hours with every check green, and auto-merge never fired. Nothing escalated. Atlas eventually caught it and merged it manually after pipeline-health flagged the delay.

Low severity. Fully resolved. But there’s a gap between when pipeline-health notices a stuck PR and when the retry-merge workflow actually acts on it. Worth monitoring.
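One way to close that gap is to have the health check produce a list of stuck PRs that retry-merge can act on right away. Below is a minimal Python sketch. The dict keys (`number`, `merged`, `checks_green_since`) and the 30-minute threshold are assumptions for illustration, not how pipeline-health actually stores PR state.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_prs(prs, now, threshold=timedelta(minutes=30)):
    """Return numbers of unmerged PRs that have been all-green longer than threshold.

    Each PR is a dict with hypothetical keys: number, merged, checks_green_since.
    """
    stuck = []
    for pr in prs:
        green_since = pr.get("checks_green_since")
        if pr.get("merged") or green_since is None:
            continue  # already merged, or checks not green yet
        if now - green_since > threshold:
            stuck.append(pr["number"])
    return stuck


now = datetime.now(timezone.utc)
open_prs = [
    # Mirrors PR #3438: green for 3.5 hours, never merged.
    {"number": 3438, "merged": False, "checks_green_since": now - timedelta(hours=3.5)},
    # A hypothetical PR that went green five minutes ago.
    {"number": 1, "merged": False, "checks_green_since": now - timedelta(minutes=5)},
]
```

Feeding the output of `find_stuck_prs` straight into the retry step means detection and action happen in the same run, with no waiting for someone to notice the flag.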

Content-review blocked for a second consecutive day. Still not fixed.

The strategic orchestrator ran its content-review job again today. Hit the same missing database endpoint it hit yesterday. Same failure. Same log entry. No Telegram ping.

The content queue has drafts in it. Agents can see the count. But the part of the system that lets agents read individual records from the queue was never built. So content-review files an accurate report of the blockage and stops. That report sits on the event bus. Nobody gets notified.
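The missing piece on the notification side is a rule that turns "same failure again" into a ping. Here is a minimal Python sketch, assuming each day's run reduces to a failure signature (or `None` on success) and that `notify` wraps whatever sends the Telegram message. Both are hypothetical stand-ins, and the two-day threshold is an assumption.

```python
from typing import Callable, List, Optional


def escalate_repeats(
    failures_by_day: List[Optional[str]],
    notify: Callable[[str], None],
    min_repeats: int = 2,
) -> bool:
    """Ping a human when the newest failure has repeated min_repeats days running.

    failures_by_day is oldest-first; each entry is a failure signature or None.
    """
    if not failures_by_day or failures_by_day[-1] is None:
        return False
    latest = failures_by_day[-1]
    streak = 0
    for signature in reversed(failures_by_day):
        if signature != latest:
            break
        streak += 1
    if streak >= min_repeats:
        notify(f"{latest}: failed {streak} days in a row")
        return True
    return False


pings = []
escalated = escalate_repeats(["content-review: missing read endpoint"] * 2, pings.append)
```

On day one the report stays on the bus. On day two, the same signature produces a message to a human.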

This is the one unresolved infrastructure gap from this week. The fix is a read endpoint on the dashboard API bridge. It’s queued. Until it ships, the job keeps running and failing in exactly the same way.

The Lesson

Prompts define taste. Guards enforce correctness.

If a post can have an em dash in it, and you don’t want em dashes, you need something that removes em dashes. Telling a model “don’t do this” is a preference expressed in language. A shell script that strips a character is a constraint expressed in code. Those two things are not equivalent.

Here’s what I’d tell someone building autonomous social agents: for any defect you can check deterministically, write the check. Don’t rely on the model to remember the rule in every session, under every condition, with every prompt variation. The model is the writer. The guard is the editor. Both are required. You can’t skip the editor and then be surprised when copy goes out wrong.

A pipeline that tracks outbound but not replies is logging half a conversation.

568 sends with no reply tracking isn’t data. It’s a one-sided ledger. The system knew it was sending. It didn’t know whether any of those sends resulted in a real inbox placement, an open, or a reply. It was measuring activity, not outcomes.

When I realized the reply counter was empty, I didn’t know whether nobody had replied or reply tracking was broken. Either answer is bad. The gate stays up until I can run a real placement test and confirm whether the sends are landing.

Here’s what I’d tell someone doing cold outreach with autonomous agents: track replies with the same rigor you track sends. If your reply count is zero after 30 days, that’s the first thing you investigate. Not a signal that the strategy isn’t working – a signal that your measurement is incomplete.
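"Same rigor" can be taken literally: replies get written to the same ledger as sends, keyed to the message they answer. Zero replies at volume then reads as a measurement alarm before anyone starts judging the strategy. The Python sketch below assumes a message-ID join. The class name and the volume and time thresholds are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OutreachLedger:
    """Hypothetical ledger that records replies as rigorously as sends."""

    sends: set = field(default_factory=set)
    replies: set = field(default_factory=set)

    def record_send(self, message_id: str) -> None:
        self.sends.add(message_id)

    def record_reply(self, message_id: str) -> None:
        # A reply that can't be matched to a logged send means the join itself is broken.
        if message_id not in self.sends:
            raise ValueError(f"reply to unknown send: {message_id}")
        self.replies.add(message_id)

    def status(self, days_elapsed: int, min_sends: int = 100, min_days: int = 14) -> str:
        """Thresholds are illustrative assumptions, not tuned values."""
        if len(self.sends) < min_sends or days_elapsed < min_days:
            return "too-early"
        if not self.replies:
            # Zero replies at volume: check the measurement before the strategy.
            return "investigate-tracking"
        return "ok"


ledger = OutreachLedger()
for i in range(568):
    ledger.record_send(f"msg-{i}")
```

With 568 sends, 30 days, and no replies recorded, `ledger.status(days_elapsed=30)` returns `"investigate-tracking"`, not a verdict on the campaign.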

The Numbers

  • Commits: 100 total (100 agent, 0 Matt)
  • Agent jobs run: 65
  • Prospects added: 11
  • Emails sent: 0
  • Social posts: 2
  • Content published: 2

100 commits. None from me. The system built all day without a single manual code change.

The number I’m focused on isn’t the commits. It’s the 0 in the emails-sent row. Sending isn’t automatically good, and it isn’t good at all when you don’t know the sends are landing. That 0 is the result of a deliberate decision to put the gate up, and keeping it there until the deliverability test passes is the right call.

The queue has 40 outreach-ready prospects and 312 due sequence touches waiting. When the gate clears, the pipeline is ready to move.

What’s Next

Run a deliverability test. Send a small tracked batch through an inbox placement tool and confirm the sends are landing before the gate opens. Then add the content-queue read endpoint to the dashboard bridge so content-review can do its job.
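Reading a placement test comes down to a ratio: of the seed addresses in the tracked batch, how many landed in the inbox? Below is a minimal Python sketch, assuming the placement tool reports each seed as `"inbox"`, `"spam"`, or `"missing"`. The 90% pass threshold is an assumption, not a figure from the log or any specific tool.

```python
from collections import Counter


def placement_passes(seed_results, min_inbox_rate=0.9):
    """Return (passed, inbox_rate) for a batch of seed-address placements.

    seed_results holds one label per seed address (assumed labels: inbox/spam/missing).
    An empty batch never passes.
    """
    counts = Counter(seed_results)
    total = sum(counts.values())
    if total == 0:
        return False, 0.0
    inbox_rate = counts["inbox"] / total
    return inbox_rate >= min_inbox_rate, inbox_rate
```

The boolean from a check like this is the only input the send gate should take. An empty or missing test result counts as a failure, so the gate can't open by accident.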
