Conversational Interaction Specialist

New York, NY

Job Description

We are looking for a talented Conversational Interaction Specialist to deliver cutting-edge voice-driven interactions on Fauna’s robots. This role sits at the intersection of machine learning, human-computer interaction, and software engineering, requiring both deep technical expertise in speech technologies and a strong sense of user experience design. Building world-class speech interactions for robots demands both rapid prototyping and a mastery of state-of-the-art techniques.

As part of this role, you will work with automatic speech recognition (ASR), text-to-speech (TTS), and conversational AI systems to create seamless, expressive, and intelligent voice-driven interfaces on our robotic platform. The ideal candidate has deep experience with raw audio signals and speech models, has prototyped interactive voice applications, and is eager to push the boundaries of conversational AI.

If you are passionate about voice as a primary mode of interaction, thrive in multidisciplinary teams, and want to shape the future of human-machine communication, we’d love to hear from you!

A portfolio of academic research or industry projects demonstrating work on voice-driven technology is highly encouraged.

Key Responsibilities

Design and build voice-driven experiences that enable natural, engaging, and intuitive human-machine interactions.
Prototype speech-first interactive experiences, integrating ASR, TTS, and conversation management systems.
Research and implement state-of-the-art voice and conversational AI techniques, working at the intersection of machine learning and human-computer interaction.
Collaborate with engineers, designers, and researchers to improve speech UX, including latency and accuracy.
Evaluate and improve the usability and performance of speech-driven systems through user testing and iterative development.

Required Skills & Qualifications

Work Experience: 4+ years of professional software development experience, or PhD-level research experience.
Education: Bachelor’s, Master’s, or PhD in Computer Science, Computational Linguistics, or a related field – or equivalent practical experience.
Technical Expertise:
- Expertise in developing and tuning ASR and TTS models for real-time applications.
- Ability to develop and characterize processing techniques for audio signals, such as noise reduction or echo cancellation.
- Deep understanding of design factors for conversational systems, including turn-taking, prosody, and intent recognition.
- Strong programming skills in Python, C++, or similar, with experience in signal processing and/or machine learning.

Nice-to-have Skills

Expertise in training end-to-end ML models for speech.
Familiarity with large language models (LLMs) for conversational AI.
Experience with robotics, ROS/ROS2, IoT, or other physical computing platforms (microcomputers, microcontrollers).
Experience conducting research evaluations with human participants.

What We Offer

The opportunity to work on groundbreaking robotics technology, enabling the next generation of humanoid robots to interact dynamically with their environments.
A collaborative and innovative environment that fosters creativity and exploration.
Equity ownership in the company.
Health benefits (medical, dental, and vision).

Compensation

$100k - $200k/yr, plus equity