What We Learned Managing 140 Leads With an AI Agent

TL;DR

We ran an AI SDR in production across 140 leads, two languages, and two channels. 82 qualified, 15 with meetings on the calendar
The agent's drafts were rejected 40 times. One message took 10 rounds of revision. Voice quality is earned through feedback, not configured through prompts
Of 38 handoffs to the human manager, 23 were the agent recognizing a prospect went quiet. 10 were because the manager was too slow to review. The agent had better discipline than the human
Over 40% of agentic AI projects will be canceled by 2027 due to escalating costs. The difference between surviving and not is operational depth

The Numbers

These are real production numbers from our own system, not projections.

140 leads under active management
82 qualified (59%), 49 in progress, 7 disqualified
15 meetings on the calendar
680 outbound messages sent across LinkedIn and email
478 inbound messages received - prospects actually replied
40 rejected drafts, with revisions reaching round 10 on individual messages
Two languages running simultaneously (English and Hebrew)
Three continents - North America, Europe, and Israel This is a small, closely monitored operation. One human manager overseeing one AI agent. And that's exactly what made the patterns visible.

Lesson 1: Voice Is Earned, Not Configured

Going in, the assumption was that voice calibration would take a few rounds of feedback and then stabilize. Reality: about 1 in 12 drafts got sent back.

Some rejections were obvious. Tone too formal. Wrong angle for the prospect's industry. Missing context from an earlier exchange. But the harder ones were subtle. The agent would write a message that was technically correct and reasonably personalized, but it didn't sound like the person it was supposed to represent. It sounded like a competent stranger.

73% of B2B buyers actively avoid suppliers who send irrelevant outreach. And "irrelevant" doesn't just mean wrong topic. It means wrong voice. A message that's clearly AI-generated from someone you know personally is worse than silence.

The fix wasn't a better prompt. It was a feedback loop. Every rejection carried a reason. The agent learned from the correction and redrafted. One lead required ten rounds before the message was right. Two drafts got caught fabricating experience the manager never had - the system flagged and rewrote them before they went out.

That's not a failure. That's the system working as designed. Voice is earned, not configured.

Lesson 2: The Agent Had Better Discipline Than the Manager

Here's the number that surprised us most. Of 38 times the agent handed control back to a human, look at the breakdown:

23 times - the prospect went quiet after 3 follow-ups. The agent recognized the silence, stopped reaching out, and flagged it. No desperate fourth message. No "just circling back" email that annoys everyone.
10 times - the manager was too slow to review a draft. The agent had a message ready, but the human didn't approve it within the SLA window. The bottleneck was the person, not the machine.
4 times - the manager looked at the lead and decided it wasn't worth pursuing.
1 time - manual removal.

Read that again. The majority of handoffs weren't the agent failing. They were the agent showing restraint the human didn't always match.

48% of salespeople never follow up even once. 80% of deals close after the 5th touch. The agent followed up every time, on schedule, for every lead. And when the follow-ups didn't land, it knew when to stop. Most humans either don't follow up at all or don't know when to quit.

The 10 review timeouts tell a different story. The agent was ready. The manager had client work, meetings, other priorities. The draft sat there. This is exactly the problem the system is supposed to solve - and even with the system in place, the human was still the constraint on 26% of handoffs.

Lesson 3: Qualification Is a Conversation, Not a Score

Most sales tools assign a lead score based on firmographic data: company size, industry, title, funding stage. You get a number, and you decide whether to pursue.

We started there. It wasn't enough.

Progressive qualification across multiple conversation turns surfaced information that no database could provide. A CEO at a 50-person firm who "handles sales himself" is a different prospect than a CEO at the same firm who "has two SDRs but they can't keep up". Both score the same on paper. Only the conversation reveals the gap.

Of 140 leads, 82 qualified through this approach - 59%. Another 49 are still in progress, with qualification evolving as the conversation continues. Only 7 were disqualified outright.

That ratio isn't because we loaded the pipeline with perfect leads. It's because multi-turn qualification finds angles that single-pass scoring misses. A lead that looks marginal on paper might reveal a strong buying signal three messages in. A lead that looks perfect might reveal they're already locked into a competitor during the second exchange.

77 of the 134 leads we contacted replied - a 57% reply rate. Each reply made the qualification sharper.

Lesson 4: Two Languages, Three Continents

Operating in English and Hebrew across North America, Europe, and Israel wasn't a translation challenge. It was a communication-style challenge.

Hebrew business communication is more direct, less padded with pleasantries, and follows different norms around formality. A message that reads as professional in English can read as stiff in Hebrew. A message that's appropriately casual in Hebrew can read as unprofessional in English.

The agent needed separate voice models, not just separate vocabularies. McKinsey's analysis of agentic AI deployments found that efforts focused on fundamentally reimagining workflows deliver the strongest results. Language-specific communication is a workflow issue, not a feature toggle.

25 of our 140 leads operated in the Israel market. Without dedicated attention to how communication norms differ, those 25 leads would have received messages that felt off - even if they were technically accurate.

And this isn't unique to Hebrew. Any business selling across markets faces the same gap. The vocabulary translates. The communication style doesn't. That difference is where leads go cold without anyone understanding why.

Lesson 5: The Service Layer Is Harder Than the Agent

Building the AI model that drafts messages was the easy part. Building everything around it took months.

Enrichment from LinkedIn profiles, company websites, web searches, and existing conversation history. Follow-up cadence management with rules for timing, channel selection, and when to hand off. Calendar integration with timezone handling across three continents. State management for every lead across every stage of the pipeline. 139 active interactions tracked across LinkedIn and email simultaneously.

Nearly eight in ten organizations report no significant bottom-line gains from their AI investments, largely because they automate fragments rather than rethinking the workflow end to end. Our experience confirms this. The agent that drafts messages is maybe 20% of the system. The other 80% is data operations, qualification logic, compliance checks, and the human-in-the-loop approval workflow that keeps everything on track.

This is why Gartner predicts over 40% of agentic AI projects will be canceled by end of 2027, citing escalating costs and unclear value. The costs aren't in the model. They're in the operational infrastructure that makes the model useful.

What We'd Do Differently

Start with edge cases, not happy paths. The first week of testing should involve the hardest scenarios: rescheduling, opt-outs, multi-language conversations, leads who go silent and come back. If the system handles those, the normal flow takes care of itself.

Invest in rejection quality. A "rejected" draft with no feedback is a wasted signal. Every rejection should carry a reason. That reason becomes training data - not for the model, but for the operational rules that guide the model.

Treat voice calibration as ongoing. Even after 680 approved messages, the voice still gets refined. New prospect types, new industries, new objection patterns. Voice quality isn't a milestone. It's a practice.

Plan for the human bottleneck. 10 of our 38 handoffs were caused by the manager being too slow. Build your SLAs around realistic human availability, not ideal scenarios. The agent will be ready. Make sure the human can keep up.