Insights
- Good overall taxonomy coverage of Data Operations roles in Israel (92%)
- Taxonomy titles are usually not adequate as standalone labels for describing listings:
  - 77% of jobs had multiple labels
  - Multiple labels were often used to capture variations in the skills and specialty requirements detected in the job description.
- The taxonomy is possibly too granular:
  - 42% of all labels used a `parent_title`, indicating that parent nodes are often better-suited labels.
- Listing titles as advertised by employers are mostly relevant, but there is still room for improvement:
  - 66% of raw titles overlapped, exactly or partially, with the true taxonomy-based labels.
<aside>
💡
The evaluation set is very small (27 valid listings, with labels covering only about 22% of the taxonomy's canonical titles), so these insights should be validated on larger datasets.
</aside>
Dataset details
- 28 listings in total, of which 1 is not valid (27 valid listings)
- Randomly collected over roughly one year (between 02.2024 and 03.2025)
- Source: Data Quality and Data Operations online job listings in Israel (or Israel remote)
- Labels: taxonomy canonical titles, when applicable
- Labeling methodology: labelled by 1 annotator based on guidelines, then sent to Gemini for critique and revised accordingly
Full Statistics
- Taxonomy coverage on dataset:
  - 25 listings labeled only with taxonomy titles / 27 listings in total = ~92%
- Total missing taxonomy entries:
  - missing canonical titles: 3
  - missing aliases: 8
- Jobs mistakenly included in dataset (not Data Operations): 1/28
- Dataset labels coverage of the taxonomy: ~22%
  - 24 unique v1 canonical title labels were used
- Percent of jobs labelled with multiple labels: 77%
- Percent of exact or partial raw title match overlaps with true taxonomy labels: 66%
- Percent of labels that were a parent title (mapped to children titles): 42%
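
The statistics above can be reproduced from a labeled dataset with a few ratios. The following is a minimal sketch, not the method actually used for this report: the `Listing` structure, the `missing_labels` field, the `PARENT_TITLES` set, and the four toy listings are all hypothetical and exist only to show how each percentage is defined.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    raw_title: str
    labels: list[str]  # taxonomy canonical titles assigned by the annotator
    # Roles the annotator needed but the taxonomy lacks (hypothetical field).
    missing_labels: list[str] = field(default_factory=list)


def taxonomy_coverage(listings):
    """Share of listings labeled only with existing taxonomy titles."""
    covered = sum(1 for item in listings if item.labels and not item.missing_labels)
    return covered / len(listings)


def multi_label_share(listings):
    """Share of listings that received more than one label."""
    return sum(1 for item in listings if len(item.labels) > 1) / len(listings)


def parent_label_share(listings, parent_titles):
    """Share of all assigned labels that are parent nodes in the taxonomy."""
    all_labels = [label for item in listings for label in item.labels]
    return sum(1 for label in all_labels if label in parent_titles) / len(all_labels)


# Toy data for illustration only; these are NOT the listings from the evaluation set.
PARENT_TITLES = {"Data Annotation Specialist"}  # assumed parent node, hypothetical
toy_listings = [
    Listing("Data Manager", ["Quality Data Analyst", "ML Data Analyst"]),
    Listing("Data Annotation Specialist", ["Data Annotation Specialist"]),
    Listing("NLP Data Analyst", ["NLP Data Analyst"]),
    Listing("Data Expert", ["Data Curation Specialist"],
            missing_labels=["Some Missing Title"]),
]

print(f"coverage:     {taxonomy_coverage(toy_listings):.0%}")
print(f"multi-label:  {multi_label_share(toy_listings):.0%}")
print(f"parent share: {parent_label_share(toy_listings, PARENT_TITLES):.0%}")

# The reported coverage figure is the same ratio applied to the real counts.
reported_coverage = 25 / 27
```

Note that coverage and the multi-label share are computed per listing, while the parent-title share is computed per label, so the denominators differ.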
Detailed Analysis
- Suggested areas for improving job listing titles:
  - vague titles (Data Manager, Data Expert, Data Specialist)
    - “Data Manager” at Revuze was labeled as Quality Data Analyst | ML Data Analyst
    - “Data Manager” at CytoReason was labeled as Clinical Data Analyst | Data Curation Specialist
  - ambiguous titles that are also used outside of Data Operations (Senior Data Analyst)
  - generic titles whose range of responsibilities and seniority level vary greatly from one company to another (Data Operations Specialist, Data Quality Specialist)
- Examples of precise listing titles (raw title matches a taxonomy title):
  - Localization Project Manager
  - Data Annotation Specialist
  - NLP Data Analyst
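
The distinction between precise, partially matching, and vague raw titles can be sketched as a simple comparison rule. The report does not define how an "exact or partial" match was decided, so the rule below is an assumption: an exact match is a case-insensitive title match, and a partial match means the raw title shares at least one non-generic word with a label. The `GENERIC_TOKENS` set and the function names are hypothetical.

```python
import re

# Words assumed too common in this domain to count as evidence of a match (hypothetical).
GENERIC_TOKENS = {"data", "senior", "junior"}


def tokens(title: str) -> list[str]:
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def match_type(raw_title: str, labels: list[str]) -> str:
    """Classify a raw listing title against its taxonomy labels."""
    raw_tokens = tokens(raw_title)
    # Exact: same words in the same order, ignoring case and punctuation.
    if any(raw_tokens == tokens(label) for label in labels):
        return "exact"
    # Partial: at least one shared word that is not a generic token.
    raw_specific = set(raw_tokens) - GENERIC_TOKENS
    for label in labels:
        if raw_specific & (set(tokens(label)) - GENERIC_TOKENS):
            return "partial"
    return "none"


def match_rate(pairs: list[tuple[str, list[str]]]) -> float:
    """Share of (raw_title, labels) pairs with an exact or partial match."""
    matched = sum(1 for raw, labels in pairs if match_type(raw, labels) != "none")
    return matched / len(pairs)


# The first two pairs come from the examples above; the third pairing is invented
# for illustration and is not taken from the evaluation set.
examples = [
    ("NLP Data Analyst", ["NLP Data Analyst"]),
    ("Data Manager", ["Quality Data Analyst", "ML Data Analyst"]),
    ("Data Operations Specialist", ["Data Curation Specialist"]),
]
for raw, labels in examples:
    print(raw, "->", match_type(raw, labels))
```

Under this rule a vague title such as "Data Manager" yields no match because its only shared word is the generic "data". A different tokenization or stop-word list would shift the match rate, so any reported percentage should state the rule used.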