Every major AI lab — OpenAI, Anthropic, Google, Meta — depends on tens of thousands of contractors to make their models work well. The work is real, the pay is real, and the entry barrier is lower than you'd think. Here's what's actually going on.
Who pays for this work
AI labs don't usually hire annotators directly. They contract with vendor platforms — Scale AI, Surge AI, Mercor, Invisible, and a long tail of smaller players — who recruit, vet, and pay the contributors. You apply to the platform, not the lab.
The lab buys hours, evaluations, or completed tasks from the platform. The platform keeps a cut and pays you a per-hour or per-task rate. Payouts are typically weekly or bi-weekly via PayPal.
What you actually do
Tasks fall into a few buckets: rating model outputs against a rubric, writing reference responses for the model to learn from, comparing two model outputs and picking the better one, and adversarially probing for failure modes.
For coding work you might write a Python script and have it graded against a model attempt. For medical work you might evaluate a differential diagnosis. For translation you might rate fluency. The domain varies; the routine of reading a prompt, applying a rubric, and submitting a judgment stays much the same.
How much it pays
General tasks (text rating, simple labeling) pay $15–25/hr in most markets. Specialist work scales aggressively: senior engineers, MDs, lawyers, and PhDs routinely clear $50–100/hr. Niche language pairs and rare expertise can go higher.
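To turn those hourly figures into something concrete, here is a rough back-of-the-envelope sketch. The rate ranges are the ones quoted above; the 10 hours per week is a hypothetical input, and the result is gross pay before taxes or any fees.

```python
# Hourly ranges quoted in this article, in USD.
RATES = {
    "general": (15, 25),
    "specialist": (50, 100),
}

def weekly_gross(tier: str, hours_per_week: float) -> tuple[float, float]:
    """Return the (low, high) gross weekly pay for a tier, before taxes."""
    low, high = RATES[tier]
    return low * hours_per_week, high * hours_per_week

# hours_per_week=10 is an assumed side-gig workload, not a typical figure.
for tier in RATES:
    low, high = weekly_gross(tier, hours_per_week=10)
    print(f"{tier}: ${low:.0f}-${high:.0f} per week at 10 hrs")
```

Swap in your own hours to see where you would land; task availability fluctuates, so treat the output as a ceiling on a good week, not a guarantee.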
Where to start
Start at DataAnnotation.tech if you want a low-friction trial. Apply to Outlier and Surge AI in parallel; both are larger and pay more for verified expertise. If you have credentials (MD, JD, PhD), apply to Mercor — they place specialists at the top of the market.