Why doesn't the model generate |DSML|special tokens?

#29
by Zchu - opened

I manually encoded the messages with the encode_messages method from encoding_dsv32.py in the repository and sent requests to the completion API of a model deployed with vLLM.
I noticed that the tool-call content generated by the model does not include the |DSML| special tokens, which causes parse_message_from_completion_text to fail when parsing the tool calls.

Here's my code snippet:

messages = json.load(open("./encoding/test_input_search_wo_date.json"))["messages"][0:1]
encoded = encode_messages(messages, thinking_mode="thinking", context=None, drop_thinking=True, add_default_bos_token=True)
response = client.completions.create(
    model="deepseek-v3.2",
    prompt=encoded,
    max_tokens=4096,
    temperature=0.0,
)
response_text = response.choices[0].text
response_text:
...thinking content is parsed correctly.</think>

<function_calls>
<invoke name="search">
<parameter name="query" string="true">rescinding previous administration’s plan to open most of the 22 million acres of the National Petroleum Reserve in Alaska to oil and gas drilling</parameter>
<parameter name="topn" string="false">10</parameter>
<parameter name="source" string="true">web</parameter>
</invoke>
</function_calls>

whereas the correct function-call format should be:

<|DSML|function_calls>
<|DSML|invoke name="search">....
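Since the only difference between the two formats is the <|DSML| prefix on the tags, you can detect which variant the model emitted with a substring check before choosing how to parse (a minimal sketch; the helper name is illustrative):

```python
def uses_dsml_format(completion_text: str) -> bool:
    # The DSML variant prefixes every tag with <|DSML|,
    # e.g. <|DSML|function_calls> instead of <function_calls>.
    return "<|DSML|function_calls>" in completion_text
```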

The same issue is reported here:
https://github.com/sgl-project/sglang/issues/14695

I may have identified the cause of the error. The current DeepSeek-V3.2 model emits tool calls in two formats: an old one without the DSML prefix and a new one with it.

The current encoding_v32 parser requires the DSML tokens, so the tool_call_parser frequently fails.

A simple workaround is to encode the messages manually with encoding_v32, call vLLM/SGLang through the completion endpoint, and then rewrite the function-call tags in the returned text:

    self.dsml_token = "|DSML|"
    self.eos_token = "<|end▁of▁sentence|>"
    self.special_token_map = {
        "<function_calls>": "<{dsml_token}function_calls>".format(dsml_token=self.dsml_token),
        "</function_calls>": "</{dsml_token}function_calls>".format(dsml_token=self.dsml_token),
        "<invoke name=": "<{dsml_token}invoke name=".format(dsml_token=self.dsml_token),
        "</invoke>": "</{dsml_token}invoke>".format(dsml_token=self.dsml_token),
        "<parameter name=": "<{dsml_token}parameter name=".format(dsml_token=self.dsml_token),
        "</parameter>": "</{dsml_token}parameter>".format(dsml_token=self.dsml_token),
    }

    def _fix_completion_text(self, completion_text: str) -> str:
        # Append the EOS token when the completion stops at a tool call
        # without one, so the parser sees a terminated message.
        if completion_text.endswith("function_calls>") and not completion_text.endswith(self.eos_token):
            completion_text += self.eos_token

        # Rewrite the plain tags into their DSML-prefixed forms.
        for token, replacement in self.special_token_map.items():
            completion_text = completion_text.replace(token, replacement)
        return completion_text

## encode and parse
encoded_messages = encode_messages(messages)
# use completion endpoint
response = await async_completion(encoded_messages, self.config["llm"]["local"]["model_name"], self.llm_endpoint)
response_text = self._fix_completion_text(response.text)
response_msg = parse_message_from_completion_text(response_text, thinking_mode="thinking")
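Putting the pieces above together, here is a self-contained sketch of the tag-rewriting workaround (token strings copied from the snippet above; the fixed-up text can then be handed to parse_message_from_completion_text):

```python
DSML_TOKEN = "|DSML|"
EOS_TOKEN = "<|end▁of▁sentence|>"

# Map the plain tags the model sometimes emits to their DSML-prefixed forms.
SPECIAL_TOKEN_MAP = {
    "<function_calls>": f"<{DSML_TOKEN}function_calls>",
    "</function_calls>": f"</{DSML_TOKEN}function_calls>",
    "<invoke name=": f"<{DSML_TOKEN}invoke name=",
    "</invoke>": f"</{DSML_TOKEN}invoke>",
    "<parameter name=": f"<{DSML_TOKEN}parameter name=",
    "</parameter>": f"</{DSML_TOKEN}parameter>",
}

def fix_completion_text(completion_text: str) -> str:
    # Append the EOS token when the completion stops at a tool call
    # without one, so the parser sees a terminated message.
    if completion_text.endswith("function_calls>") and not completion_text.endswith(EOS_TOKEN):
        completion_text += EOS_TOKEN
    # Rewrite the plain tags into their DSML-prefixed forms.
    for token, replacement in SPECIAL_TOKEN_MAP.items():
        completion_text = completion_text.replace(token, replacement)
    return completion_text
```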
