The Start node is the entry point of every workflow. It defines the foundational behavior of the agent — including instructions, default models, runtime inputs, and timing rules — and provides the four connectors required to begin building the call flow. The Start node is created automatically with every new workflow and cannot be deleted.

Connectors

The Start node exposes four required connectors, each representing one part of the conversational stack:
  • Hear — Select the transcriber model
  • Think — Select the chat model
  • Speak — Select the voice model
  • Agent / Path — The first node in the workflow’s logic
All four connectors must be configured for the workflow to run.
Agent nodes inherit the Start node’s Hear/Think/Speak settings unless explicitly overridden.

Main Settings

Opening the Start node shows the configuration panel where you set the defaults for the entire workflow.

External Inputs

External Inputs allow you to pass runtime parameters into the workflow. These are commonly used for:
  • CRM or backend identifiers (customer_id, phone_number)
  • Preloaded context (appointment_time, plan_type)
  • Dynamic variables needed by later nodes
Each input includes:
  • Parameter Name
    A variable name (e.g., customer_name, case_id).
  • Input Type
    The value's data type. Allowed types:
    • String
    • Number
    • Integer
    • Boolean
  • Required
    When enabled, the workflow cannot run unless the parameter is provided.
    For outbound calls:
    • Calls triggered inside Breez → values must be supplied when Testing or Calling
    • API-triggered calls → values must be included in the request payload
    Inbound calls cannot run if any required external inputs are defined here.
Users may add multiple inputs.
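The validation rule above can be sketched in a few lines. This is a minimal illustration, not Breez's actual API: the schema shape and function name are assumptions, but the behavior matches the rule that a Required input must be present before the workflow can run.

```python
# Sketch: checking required external inputs before triggering a call.
# The schema shape and function name are illustrative, not Breez's API.

def missing_required_inputs(schema, payload):
    """Return the names of required parameters absent from the payload."""
    return [
        spec["name"]
        for spec in schema
        if spec.get("required") and spec["name"] not in payload
    ]

schema = [
    {"name": "customer_id", "type": "String", "required": True},
    {"name": "plan_type", "type": "String", "required": False},
]

missing_required_inputs(schema, {})                        # ["customer_id"]
missing_required_inputs(schema, {"customer_id": "C-1042"}) # [] -> safe to run
```

An API-triggered outbound call would fail the same way if `customer_id` were missing from the request payload.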

Master Instruction

The Master Instruction is the top-level system directive that governs the agent’s behavior across the entire workflow.
Important: this instruction is automatically appended to the end of every Agent node’s instruction, regardless of the Agent’s individual settings.
This ensures:
  • Consistent personality
  • Shared rules across the call
  • A unified global prompt
The editor supports markdown-style formatting.
The default instruction included with a new workflow is simply a template; users can modify it freely.

Fast Turn Delay (Semantic/Provider)

Fast Turn Delay determines how quickly the agent responds when smart, semantic turn detection is available.
In this mode, the agent can tell whether the user is mid-sentence or actually finished, so it can react after very short pauses.
Lower values (e.g., 0.05s)
  • Responds almost immediately
  • Fast, snappy interactions
  • Risk: may interrupt if the user pauses briefly mid-sentence
Higher values (e.g., 0.2–2s)
  • Waits a bit longer before replying
  • Feels more natural and less interruptive
  • Trade-off: responses start slightly slower

Slow Turn Delay (VAD Only)

Slow Turn Delay is used when only basic Voice Activity Detection (VAD) is available.
VAD can detect sound vs. silence but cannot interpret meaning, so the agent needs a longer pause to avoid interrupting the user.
Lower values (e.g., 0.1s)
  • Responds quickly once silence is detected
  • Suitable for fast exchanges
  • Risk: may cut in when users pause to think or breathe
Higher values (e.g., 0.8–3s)
  • Gives users more room to pause mid-sentence
  • Feels smoother on calls or with slower speech
  • Trade-off: responses feel slower

Key Difference

  • Fast Turn Delay → Used when the agent can understand meaning and determine if the user is finished.
  • Slow Turn Delay → Used when the agent only knows that the user stopped making sound.
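The key difference above boils down to a single decision: which delay applies depends on whether semantic turn detection is available. The sketch below is an illustrative simplification (the function name and default values are assumptions, chosen from the example ranges in this section), not the platform's internal logic.

```python
# Sketch: selecting the reply delay based on which turn detector is active.
# Defaults are taken from the example ranges above; names are illustrative.

def reply_delay(semantic_detection_available: bool,
                fast_turn_delay: float = 0.1,
                slow_turn_delay: float = 0.8) -> float:
    """Semantic detection tolerates short pauses; VAD-only needs longer ones."""
    if semantic_detection_available:
        return fast_turn_delay   # detector knows the caller finished
    return slow_turn_delay       # only silence is known, so wait longer

reply_delay(True)    # 0.1 -> snappy response
reply_delay(False)   # 0.8 -> cautious response
```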

Advanced Timing (Optional)

Enabling Show Advanced Timing reveals fine-grained control of turn-taking behavior, interruptions, and silence handling. These settings allow you to tune how natural, fast, or cautious the agent feels in conversation. Below is a reference for each setting.

Maximum Semantic Delay

Sets the maximum time the agent will wait before responding when the semantic turn detector is unsure whether the caller has finished speaking. Most interactions are handled confidently by the turn detector. However, in ambiguous moments, such as trailing speech, mumbling, or background noise, the detector may hesitate. Maximum Semantic Delay acts as a safety cap for these uncertain cases.
  • Higher values: the agent waits longer, reducing the chance of interrupting the caller but increasing response latency.
  • Lower values: the agent responds faster, but with a higher risk of speaking over the caller.
In short: it’s the “when in doubt, wait no more than this long” timer.

Minimum Interruption Duration

Defines how long (in seconds) the caller must speak continuously before the agent accepts it as a valid interruption and stops talking, also known as a barge-in. This prevents the agent from pausing its response due to tiny noises or accidental sounds.
  • Higher values: the caller must speak for longer before interrupting the agent.
  • Lower values: even very short utterances can interrupt the agent.
In short: it filters out brief noises so only real attempts to speak interrupt the agent.

Minimum Interruption Words

Specifies how many words the caller must speak before the agent treats it as a valid interruption (barge-in). This works alongside Minimum Interruption Duration to ensure the agent only stops talking when the caller clearly intends to speak — not due to background noise or accidental sounds.
  • Higher values: the caller must say more words before interrupting.
  • Lower values: even very short phrases (e.g., “wait”) can interrupt the agent.
In short: it prevents the agent from stopping mid-sentence unless it hears a meaningful amount of speech.
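Since the two interruption settings work together, a barge-in is only accepted when both thresholds are met. The sketch below illustrates that combined check; the function signature and default values are assumptions for illustration, not platform internals.

```python
# Sketch: a barge-in is accepted only when BOTH thresholds are satisfied.
# Parameter names and defaults are illustrative, not the platform's API.

def is_valid_interruption(speech_seconds: float, word_count: int,
                          min_interruption_duration: float = 0.5,
                          min_interruption_words: int = 2) -> bool:
    """Filter out brief noises: require enough duration AND enough words."""
    return (speech_seconds >= min_interruption_duration
            and word_count >= min_interruption_words)

is_valid_interruption(0.2, 1)   # a quick "uh" -> False, agent keeps talking
is_valid_interruption(0.8, 3)   # "wait, hold on" -> True, agent stops
```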

VAD Activation Threshold

Controls how sensitive the local voice-activity detector (VAD) is when deciding whether the caller is actually speaking. VAD listens to the audio signal and tries to distinguish real speech from background noise. This setting adjusts how strong the speech signal must be before VAD treats it as intentional speech.
  • Lower values (more sensitive):
    The system treats quieter or softer sounds as speech. Useful in quiet environments, but may trigger false positives in noisy ones.
  • Higher values (less sensitive):
    The caller must speak more clearly or with stronger volume for VAD to recognize it. Helps reduce accidental triggers in noisy conditions.
In short: it defines how “loud” or “clear” speech must be before the system decides the caller is talking.

VAD Minimum Speech Duration

Defines how long the caller must speak continuously before VAD treats the sound as real speech rather than background noise. This helps the system avoid reacting to coughs, mic pops, keyboard clicks, or other brief sounds.
  • Higher values: the caller must speak for longer before being recognized as actively talking.
  • Lower values: shorter bursts of speech are accepted, making the agent more responsive but more prone to accidental triggers.
In short: this setting tells VAD, “Don’t count it as speech unless it lasts at least this long.”
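The two VAD settings above can be pictured with a toy energy-based detector. Real VADs use trained models rather than a raw energy check, so this is only a sketch under that assumption, but the two knobs play exactly the roles described: a threshold for how strong the signal must be, and a minimum duration before sound counts as speech.

```python
# Sketch: a toy energy-based VAD. Real detectors use trained models,
# but the activation threshold and minimum duration behave the same way.

def detect_speech(frame_energies, activation_threshold=0.5,
                  min_speech_frames=3):
    """Return True once enough consecutive frames exceed the threshold."""
    consecutive = 0
    for energy in frame_energies:
        consecutive = consecutive + 1 if energy >= activation_threshold else 0
        if consecutive >= min_speech_frames:
            return True          # sustained speech detected
    return False                 # nothing but silence or brief noise

detect_speech([0.1, 0.9, 0.1])        # a single mic pop -> False
detect_speech([0.6, 0.7, 0.8, 0.6])   # sustained speech -> True
```

Lowering `activation_threshold` makes quiet sounds count; raising `min_speech_frames` makes the detector ignore short bursts, mirroring the trade-offs described above.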

Prefix Padding

Specifies how much audio to retain from just before VAD detects speech, which helps capture the very beginning of what the caller says. Without prefix padding, the system might clip off the first syllable or two when the caller interrupts (barge-in). Adding padding ensures a more complete transcription.
  • Higher values: more audio is included before the detected speech; safer for capturing soft or quick interruptions.
  • Lower values: less pre-roll audio; tighter and faster barge-in handling, but may miss the start of a sentence.
In short: it adds a small “audio buffer” before detected speech so interruptions aren’t missing their first words.
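Prefix padding is essentially a small rolling buffer of recent audio that gets prepended once speech is detected. The sketch below shows that mechanism with a fixed-size deque; the frame size and padding values are illustrative assumptions, not platform defaults.

```python
# Sketch: a rolling pre-roll buffer so the first syllables of a barge-in
# survive. Frame size and padding values are illustrative assumptions.

from collections import deque

def make_preroll_buffer(prefix_padding_s=0.3, frame_s=0.02):
    """A deque holding roughly prefix_padding_s worth of recent frames."""
    return deque(maxlen=round(prefix_padding_s / frame_s))

buf = make_preroll_buffer()     # holds the last 15 frames (~0.3 s)
for frame in range(100):        # audio keeps flowing; old frames drop off
    buf.append(frame)
# When VAD fires, prepend list(buf) to the captured speech so the
# transcription includes the moments just before detection.
```

A larger `prefix_padding_s` keeps more pre-roll audio (safer for soft interruptions); a smaller one keeps barge-in handling tighter, at the risk of clipping the first word.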

Allow Interruptions

Controls whether the caller is allowed to interrupt the agent while it is speaking (a behavior known as barge-in).
  • Enabled: the caller can speak at any time, and the agent will stop talking as soon as an interruption is detected. This creates a more natural, conversational experience.
  • Disabled: the agent completes its full message before listening again, preventing mid-sentence interruptions.
In short: turn this on for natural dialogue; turn it off for structured or compliance-driven scripts.

Resume After Brief Sounds

Determines whether the agent should continue speaking after short, non-meaningful sounds from the caller — such as coughs, “mm-hmm,” throat clears, or other quick noises. When enabled, the agent will pause momentarily but then resume its message automatically unless the caller continues speaking long enough to trigger a full interruption.
  • Enabled: brief noises do not interrupt the agent. It resumes speaking after a short delay.
  • Disabled: any detected sound is treated as speech, making interruptions more sensitive.
In short: this setting helps prevent accidental interruptions caused by small background or acknowledgment sounds.

Resume Timeout

Determines how long the agent waits before continuing its response after a non-interrupting sound (such as a cough, “mm-hmm,” or short vocalization). This setting only applies when:
  1. The user makes a brief noise that does not meet the interruption thresholds
    (e.g., Minimum Interruption Duration or Minimum Interruption Words), and
  2. Resume After Brief Sounds is enabled.
In these cases, the agent pauses briefly, then resumes speaking once the timeout expires.
  • Shorter values: the agent continues speaking sooner after minor noises.
  • Longer values: the agent waits longer before resuming, which can feel more natural for callers who make short filler sounds.
In short: this setting controls when the agent should resume talking after determining that the user did not actually interrupt.
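The interplay between barge-in thresholds and Resume After Brief Sounds reduces to a three-way decision: stop for a real interruption, pause-then-resume for a brief noise, or stop for any sound when the setting is disabled. The sketch below illustrates that decision; the names and defaults are assumptions for illustration.

```python
# Sketch: how the agent reacts to caller sound while it is speaking.
# Threshold names mirror the settings above; this is an illustrative
# simplification, not the platform's internal logic.

def react_to_sound(speech_seconds, word_count,
                   min_duration=0.5, min_words=2,
                   resume_after_brief_sounds=True):
    if speech_seconds >= min_duration and word_count >= min_words:
        return "stop"                  # valid barge-in: agent yields the turn
    if resume_after_brief_sounds:
        return "resume_after_timeout"  # pause, then continue once the
                                       # Resume Timeout expires
    return "stop"                      # setting disabled: any sound stops it

react_to_sound(0.2, 1)   # "mm-hmm" -> "resume_after_timeout"
react_to_sound(1.0, 4)   # real interruption -> "stop"
```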

End Call on Silence

Enables the workflow to automatically end the call if the caller remains silent for too long.
This prevents the agent from staying on the line indefinitely when a caller has hung up, walked away, or abandoned the session.
When enabled:
  • The system begins monitoring for extended silence.
  • If no meaningful speech is detected within the configured timeout window, the call ends gracefully.
This setting is especially useful for:
  • Outbound calls where recipients often pick up but say nothing.
  • Workflows with long pauses that need guardrails to avoid unnecessary usage charges.
  • Any scenario where abandoned calls should resolve without agent intervention.
In short: this protects your minutes and ensures workflows cleanly terminate when a caller is silent for an extended period.

Silence Timeout (s)

Defines how long the system should wait (in seconds) before ending the call due to sustained silence.
This value is only used when End Call on Silence is enabled.
  • Lower values (10–30s): end abandoned calls quickly and reduce wasted minutes.
  • Higher values (60–120s): allow more time for callers who pause frequently or require a longer thinking period.
The allowed range is 10–120 seconds.
In short: this setting controls the exact silence duration that triggers automatic call termination.
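The two settings combine into a simple watchdog: nothing happens unless End Call on Silence is enabled, and the timeout must fall within the allowed 10–120 second range. The sketch below is an illustrative model using plain timestamps in seconds; the function name is an assumption, not a platform internal.

```python
# Sketch: a silence watchdog combining End Call on Silence and
# Silence Timeout. Timestamps are plain floats (seconds); the function
# name is illustrative.

def should_end_call(last_speech_time, now, silence_timeout=30.0,
                    end_call_on_silence=True):
    """End the call once sustained silence exceeds the configured timeout."""
    if not end_call_on_silence:
        return False                     # monitoring is off entirely
    if not 10.0 <= silence_timeout <= 120.0:
        raise ValueError("Silence Timeout must be 10-120 seconds")
    return (now - last_speech_time) >= silence_timeout

should_end_call(last_speech_time=0.0, now=25.0)   # False: still waiting
should_end_call(last_speech_time=0.0, now=31.0)   # True: end gracefully
```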

Best Practices

  • Keep Master Instructions concise and universal — use Agent nodes for case-specific instructions.
  • Use External Inputs only for data you actually need; avoid cluttering the interface.
  • Adjust timing carefully — overly aggressive interruption settings may feel unnatural.
  • Always verify test runs after changing timing or transcriber/voice models.

Summary

The Start node establishes the baseline configuration for the workflow:
  • Default models (Hear/Think/Speak)
  • Master instruction
  • Runtime inputs
  • Timing & interruption behavior
  • The starting point of the node graph
Every workflow execution begins here, and all agent behavior stems from these foundational settings.