Whether working in computer vision or natural language processing, data is king.
It’s better to train a bad model on good data than a good model on bad. By focusing on data quality, not only will your model perform better, but you’ll be able to reach your desired accuracy with fewer labeled examples.
You don’t need big data to build accurate models, but when training data is limited, its quality matters even more.
Recent advances in transfer learning mean you can do more with less: for example, ULMFiT can perform as well as previous state-of-the-art models with 1/10th or even 1/100th of the data. But when working with smaller datasets (hundreds to thousands of labels), bad data can negate these improvements.
You won’t know if your hypothesis is valid until you test it. This entails exploring the data, labeling a small number of examples, validating your hypothesis, and then scaling your labeling process.
Markers is a workflow tool that integrates with every stage of this process: from exploration in a Jupyter notebook, to scaling work out to your team of labeling experts, to connecting live model predictions for human verification.
Explore and label your documents with the annotation tool. Work within your workflow: in a Jupyter notebook widget, or inside our application.
Supported task types:
Track your project over time. Catch errors early and correct them before your team has spent hours labeling incorrectly. Easily detect label bias and take corrective measures.
Markers allows multiple labeling cycles per project, and new cycles can be created with distinct rules: for example, you don’t need to create a new project to add validation tasks for labels from a previous cycle.
Compare one worker’s labels against another’s. Or establish a gold standard (aka ground truth) and evaluate all workers against that.
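A standard way to quantify this kind of agreement is Cohen’s kappa, which corrects raw agreement for chance. This is an illustrative sketch of the metric itself, not Markers’ internal implementation:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators (or one annotator vs. a gold
    standard), corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

gold = ["pos", "neg", "pos", "pos", "neg", "neg"]
worker = ["pos", "neg", "neg", "pos", "neg", "pos"]
print(round(cohens_kappa(gold, worker), 2))  # → 0.33
```

A kappa near 1.0 means near-perfect agreement; values near 0 mean the worker agrees with the gold standard no more often than chance would predict.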
In addition to the annotation widget, use the public API, Python client, or webhooks to fully integrate with your existing process.
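Webhook payloads are commonly authenticated with an HMAC signature so your endpoint can reject forged events. The secret, header, and event fields below are hypothetical placeholders; Markers’ actual webhook scheme may differ, so treat this as a pattern sketch rather than the product API:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; the real value would come from your
# Markers project settings (assumed, not documented here).
WEBHOOK_SECRET = b"replace-with-your-secret"

def verify_signature(raw_body: bytes, signature: str) -> bool:
    """Check an HMAC-SHA256 hex signature before trusting a payload."""
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, signature)

def handle_event(raw_body: bytes, signature: str) -> dict:
    """Parse a verified webhook payload; field names are illustrative."""
    if not verify_signature(raw_body, signature):
        raise ValueError("invalid webhook signature")
    return json.loads(raw_body)
```

In a real integration, `handle_event` would sit behind your web framework’s request handler and dispatch on the event type to kick off retraining or validation steps.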
All plans include all features (except SSO and on-prem). The difference is in capacity: larger plans allow for more data, more labels, and larger teams.
A data scientist seat is required to manage projects, upload documents, and use the Jupyter widget. Annotator seats can only use the web app to label documents.
Indie and team plans start with a 7-day free trial. Annual plans get 2 months free.
We’re launching soon (August 2019). Sign up for early access and launch-related news!