Week 13: 40 Jobs Moved Out of the Old Machine

40 scheduled jobs moved today.

That is the kind of change that sounds boring until you understand what it means. It means the system is moving out of the old machine that used to run everything and into the layer that can actually supervise it.

The stranger part: after moving all that machinery, the best decision was not to do more. It was to stop Apollo research while the send gate is closed.

What Got Built

40 legacy jobs got Hermes migration twins. The old matt-agent scheduler still exists, but a large chunk of the recurring work now has a Hermes ma-* twin. That includes reporting jobs, content jobs, social jobs, LinkedIn queue work, and read-only operational checks. The point is not just moving cron entries. The point is moving ownership into the layer that can observe the whole system instead of being trapped inside one container.

Rollback paths were documented while the move happened. Every migration wave included a same-minute escape hatch: pause the Hermes twin, re-enable the legacy job, and go back to the old path. That matters. A scheduler migration without rollback is not a migration. It is a bet.

Apollo research was paused on purpose. The system already has 800+ outreach-ready prospects and hundreds of approved drafts waiting. But outbound email is still blocked. Paying for more enrichment while the delivery path is closed would just add cost and clutter. So the research jobs that touch Apollo were paused until the next send-gate decision.

Atlas exception-supervisor became the active alert lane. The old health watcher trio was disabled to reduce duplicated monitoring. The goal is one strategic voice, not five watchdogs all barking at slightly different times.

Content kept moving even while outreach stayed frozen. BBH and Life Settlement Florida drafts were created. Content-review autogate approved 8 items and skipped 19 that were not ready. Six Cloudflare deploy events were recorded locally. The content engine is still productive even when email is parked.

WIMPER CFO X produced drafts but did not publish. Two under-280-character posts were generated with performance targets attached. They stayed as artifacts because the WIMPER account routing is not whitelisted yet. That is the right failure mode: save the work, do not pretend the channel is ready.

What Broke (And How I Fixed It)

The data bridge was still unreachable from the host cron runtime.

The normal data-bridge script failed because LEAD_DB_API_URL is not set in this environment. Same with event-bus writes through the usual helper path. That does not mean there was no data. It means the front door was locked.

So the job used the back door: direct SQLite reads and writes against data/db/leads.db.

SQLite is a database that lives in a single file. Instead of asking the dashboard API for the build-log data, the job read and wrote the local database file directly. That is not the preferred long-term path, but it is honest. It let the build log get written without inventing metrics or skipping the day.

This is an important pattern for agent systems. If the official bridge fails, you need a verified fallback that still tells the truth. Otherwise one missing environment variable turns into a silent hole in the record.
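
For anyone who wants to see the shape of that fallback, here is a minimal sketch, not the actual job. Only LEAD_DB_API_URL and data/db/leads.db come from the setup described above. The events table, its type column, and the bridge callable are hypothetical names used for illustration.

```python
import os
import sqlite3
from typing import Callable, Optional

DB_PATH = "data/db/leads.db"  # local database file named in the post


def count_events(db_path: str, event_type: str) -> int:
    """Count rows of one event type directly from the SQLite file.

    The `events` table and `type` column are hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM events WHERE type = ?", (event_type,)
        ).fetchone()
        return int(row[0])
    finally:
        conn.close()


def read_metric(
    event_type: str,
    db_path: str = DB_PATH,
    bridge: Optional[Callable[[str, str], int]] = None,
) -> dict:
    """Prefer the bridge, fall back to SQLite, never invent a value."""
    api_url = os.environ.get("LEAD_DB_API_URL")
    if api_url and bridge is not None:
        try:
            # `bridge` stands in for the real data-bridge client (assumption).
            return {"value": bridge(api_url, event_type), "source": "bridge"}
        except OSError:
            pass  # bridge unreachable: fall through to the local file
    # sqlite3.connect would create a missing file, so check for it first.
    if os.path.exists(db_path):
        return {"value": count_events(db_path, event_type), "source": "sqlite"}
    # No verifiable source: report that instead of guessing.
    return {"value": None, "source": "unavailable"}
```

Every result carries its source, so the build log can say where a number came from, and a missing source shows up as a blank instead of a made-up value.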

Reply-check is still degraded.

The Postmark inbound token is still missing from the runtime that checks for replies. That means the system cannot fully trust the absence of replies. The send gate is still closed, and that is the right call.

There are 329 approved drafts held behind the gate. There are 347 due active touches. Sending those while reply visibility is degraded would be reckless. The system is not stuck because it forgot to send. It is holding because the sensor that proves whether sending is safe is still not trustworthy.
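
The gate logic can be sketched in a few lines. This is an illustration under assumptions, not the real gate: the GateInputs fields and the reason strings are hypothetical, and the only rule taken from the post is that a degraded reply sensor keeps the gate closed no matter how much is queued.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GateInputs:
    reply_check_healthy: bool  # can the runtime actually see inbound replies?
    approved_drafts: int       # drafts approved and waiting
    due_touches: int           # active touches that are due


def send_gate(inputs: GateInputs) -> Tuple[bool, str]:
    """Return (is_open, reason). The sensor check comes before the backlog."""
    if not inputs.reply_check_healthy:
        return False, "hold: reply visibility is degraded"
    if inputs.approved_drafts == 0 and inputs.due_touches == 0:
        return False, "hold: nothing to send"
    return True, "open"


# With this week's numbers the gate holds, however large the backlog is.
print(send_gate(GateInputs(False, approved_drafts=329, due_touches=347)))
```

The order of the checks is the point: backlog size never gets a vote until the sensor is trusted.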

Search quality was partially degraded.

Some jobs could not use fresh web search because BRAVE_SEARCH_API_KEY was unavailable in this runtime. The better content jobs did the right thing: they used source-backed local selection and named the limitation. The bad version of this would be pretending fresh search happened. The acceptable version is saying, clearly, this was selected from available local and direct sources.
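
One way to make that honesty automatic is to have the job pick its mode and carry the disclosure with it. A small sketch, assuming a hypothetical pick_search_mode helper; only the BRAVE_SEARCH_API_KEY variable name comes from the runtime described above.

```python
import os
from typing import Mapping, Optional


def pick_search_mode(env: Optional[Mapping[str, str]] = None) -> dict:
    """Choose a search mode and attach the note the content should carry."""
    env = os.environ if env is None else env
    if env.get("BRAVE_SEARCH_API_KEY"):
        return {"mode": "fresh_web_search", "note": ""}
    # No key in this runtime: use local sources and say so in the output.
    return {
        "mode": "local_sources",
        "note": "Selected from available local and direct sources; "
                "fresh web search was unavailable.",
    }
```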

WIMPER CFO X is still blocked from publishing.

The system can draft. It cannot safely post until the account and app routing are cleared. That is not a writing problem. It is a permissions problem.

The Lesson

A migration is only safe when rollback is designed into the same move.

Here is what I would tell someone moving jobs from one scheduler to another: do not just ask, “Did the new job run?” Ask, “Can I go back in one minute if it does not?”

That is the difference between a controlled migration and a scary one. Moving 40 jobs is not safe because the code is clever. It is safe because every wave had a known reversal path.
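
Here is a toy version of one migration wave with its reversal path, to show how small the rollback needs to be. The Scheduler class is an in-memory stand-in, not the real matt-agent or Hermes API, and the job name is made up. The ma- prefix follows the twin naming used above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Scheduler:
    """Toy in-memory scheduler: job name -> enabled flag (hypothetical)."""
    jobs: Dict[str, bool] = field(default_factory=dict)

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self.jobs:
            raise KeyError(f"unknown job: {name}")
        self.jobs[name] = enabled


def migrate(legacy: Scheduler, hermes: Scheduler, job: str) -> None:
    """Move one job: disable the legacy entry, then enable its twin."""
    legacy.set_enabled(job, False)
    hermes.set_enabled(f"ma-{job}", True)


def rollback(legacy: Scheduler, hermes: Scheduler, job: str) -> None:
    """Reverse the move: pause the twin, then re-enable the legacy job."""
    hermes.set_enabled(f"ma-{job}", False)
    legacy.set_enabled(job, True)


legacy = Scheduler({"weekly-report": True})
hermes = Scheduler({"ma-weekly-report": False})
migrate(legacy, hermes, "weekly-report")
rollback(legacy, hermes, "weekly-report")  # back to the old path
```

In this sketch each function turns one side off before turning the other on, so a job is never live in both schedulers at once. That ordering is my assumption about what a safe wave looks like, not a record of how the real migration was sequenced.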

Do not fill the tank while the tap is closed.

Apollo research costs money. More importantly, it produces more pipeline. Pipeline is only useful if the next stage can move.

Right now the next stage cannot move. The send gate is closed. Reply detection is degraded. Hundreds of approved drafts are already waiting. So the right move is not “more leads.” The right move is to stop adding pressure upstream until the downstream constraint is fixed.

This is the part non-developers can miss when they build with agents. Agents are good at doing more. That does not mean more is always useful. Sometimes the smartest automation decision is a stop sign.
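
The stop sign itself can be a very small function. A sketch under assumptions: should_run_research and the max_backlog threshold are hypothetical, and the real decision this week was made by a person, not a threshold.

```python
def should_run_research(
    send_gate_open: bool,
    approved_drafts_waiting: int,
    max_backlog: int = 100,  # hypothetical threshold, not from the real system
) -> bool:
    """Only add upstream pressure when the downstream stage can move."""
    if not send_gate_open:
        return False  # nothing can leave, so do not pay for more input
    return approved_drafts_waiting < max_backlog


# This week: gate closed, 329 approved drafts waiting.
print(should_run_research(send_gate_open=False, approved_drafts_waiting=329))
```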

Fallbacks need to preserve truth, not just keep the job alive.

The data bridge failed. The build log still got written because SQLite had the same facts locally. That is a good fallback because it preserves the record.

A bad fallback would have filled in placeholder numbers. A worse fallback would have skipped the log and let the day disappear. The useful rule is simple: if the primary path fails, fall back only to a source you can verify. If you cannot verify it, say the source is unavailable and leave the number blank.

The Numbers

Commits: 21 total (4 agent, 17 Matt)
Agent jobs run: 24
Prospects added: 13
Emails sent: 0
Social posts: 3
Content published: 6

The most important number is not 40 migrated jobs. It is 0 emails sent.

That 0 is intentional. The system can write, research, queue, and publish content. But outbound stays parked until the reply and inbox visibility problem is solved.

What’s Next

Verify the Hermes twins keep running cleanly for a full day, then fix Postmark inbound visibility so the email gate can make a real decision instead of holding around a broken sensor.