This position is offered through the ANU Computing Internship courses (COMP4820 / COMP8830).
Semester 2, 2026 applications open on Monday 18th May 2026 and close on Sunday 31st May 2026.
Company
Clairva.ai is a Singapore-incorporated startup building licensed video and audio data infrastructure for multimodal AI training. We source content from global media libraries and commission specialised video data collection, then produce annotated datasets that meet Tier-1 ingestion standards for foundation-model labs and dataset platforms. We prioritise rights-cleared provenance and high-quality annotation pipelines.
Project
Dataset provenance and consent tracking is a frontier topic in AI data governance. MLCommons released Croissant 1.1 in February 2026. The new version adds machine-actionable provenance and structured consent and licensing policy fields. About 700,000 datasets on HuggingFace, Kaggle, and OpenML now use it.
In this project, the student will first review the Croissant 1.1 standard, then build a small reference implementation of license provenance for video datasets, and finally write a structured analysis of consent workflows under EU AI Act expectations.
- The student will design a Croissant 1.1 compatible JSON Schema for license provenance.
- They will build a Python tool that produces a structured audit report from a manifest.
- They will write 20 synthetic test fixtures. Ten fixtures are compliant. Ten have deliberate gaps.
- They will write a consent-workflow specification template, flowchart, and checklist.
- The final output is a written research report and a 30-minute presentation.
All work uses synthetic example contracts and consent forms. No real customer or licensor data is used.
Required technical skills
Required skills are Python 3.10+, JSON Schema 2020-12, and Git. The student should be comfortable generating PDF reports from Python, for example with reportlab or weasyprint.
Preferred skills are prior coursework in software engineering or data engineering. Familiarity with schema.org or JSON-LD is a plus. Awareness of GDPR, EU AI Act, or data-governance terminology also helps.
Required/preferred professional and other skills
Required skills are attention to compliance detail and plain-English technical writing. Careful schema design is needed. The student should be able to read regulatory and standards documents.
Preferred skills are prior experience with privacy or compliance projects. The ability to design clear flowcharts and templates for non-engineering users is also useful.
Delivery Mode
Hybrid
Student location
Project’s Special Requirements/Conditions
None
Type of internship
Unpaid
How to apply
Eligible students are invited to apply for the Computing Internship courses COMP4820 or COMP8830. Eligibility details for COMP4820 / COMP8830 and further information about the Computing Internship can be found on the Computing Internship page.
Eligible students can apply through the Computing Internship application form, which will be available via the Computing Internship page between Monday 18th May 2026 and Sunday 31st May 2026.
You can nominate multiple preferred Internship projects and host organisations through the one online application form.
Your eligibility and the room available in your degree to undertake COMP4820/COMP8830 will be assessed at the time of application. If you do not meet the eligibility criteria or do not have room in your degree to fit COMP4820/COMP8830, your application will not be progressed.
Your application will require you to upload the following documents:
- an updated copy of your resume, and
- an expression of interest (limit 350 words) for each organisation you wish to apply to (for an organisation with multiple projects, submit only one expression of interest and state clearly which project/s you wish to be considered for).