Why doesn't the model generate |DSML| special tokens?
I manually encoded the messages with the encode_messages method from encoding_dsv32.py in the repository and sent requests through the completion API of a vLLM deployment.
I noticed that the tool-call content generated by the model does not include the |DSML| special tokens, which causes parse_message_from_completion_text to fail when parsing the tool calls.
Here's my code snippet:
messages = json.load(open("./encoding/test_input_search_wo_date.json"))["messages"][0:1]
encoded = encode_messages(messages, thinking_mode="thinking", context=None, drop_thinking=True, add_default_bos_token=True)
response = client.completions.create(
    model="deepseek-v3.2",
    prompt=encoded,
    max_tokens=4096,
    temperature=0.0,
)
response_text = response.choices[0].text
response_text:
...thinking content is parsed correctly.</think>
<function_calls>
<invoke name="search">
<parameter name="query" string="true">rescinding previous administration’s plan to open most of the 22 million acres of the National Petroleum Reserve in Alaska to oil and gas drilling</parameter>
<parameter name="topn" string="false">10</parameter>
<parameter name="source" string="true">web</parameter>
</invoke>
</function_calls>
where the correct function call content format should be:
<|DSML|function_calls>
<|DSML|invoke name="search">....
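A quick way to tell which of the two formats a given completion uses is a regex that matches the opening tag in either form (a hypothetical helper, not part of the repository code):

```python
import re

# Opening tag of a tool-call block, with or without the |DSML| marker.
OPEN_TAG = re.compile(r"<(\|DSML\|)?function_calls>")

def has_dsml(text: str) -> bool:
    """Return True only if the tool-call opening tag carries the |DSML| marker."""
    m = OPEN_TAG.search(text)
    return bool(m and m.group(1))
```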
The same issue is reported here:
https://github.com/sgl-project/sglang/issues/14695
I may have identified the cause of the error. The current DeepSeek-V3.2 model emits tool calls in two formats: an old one without the DSML token and a new one with it.
The current encoding_v32 parser requires the DSML tokens, which frequently leads to tool_call_parser errors.
A simple workaround is to encode the messages manually with encoding_v32, call vLLM/SGLang through the completion endpoint, and then rewrite the function-call tags in the returned text:
self.dsml_token = "|DSML|"
self.eos_token = "<|end▁of▁sentence|>"
self.special_token_map = {
    "<function_calls>": f"<{self.dsml_token}function_calls>",
    "</function_calls>": f"</{self.dsml_token}function_calls>",
    "<invoke name=": f"<{self.dsml_token}invoke name=",
    "</invoke>": f"</{self.dsml_token}invoke>",
    "<parameter name=": f"<{self.dsml_token}parameter name=",
    "</parameter>": f"</{self.dsml_token}parameter>",
}
def _fix_completion_text(self, completion_text: str) -> str:
    # The parser expects the completion to end with the EOS token; a
    # generation that stops at </function_calls> does not include it.
    # (Checking only for the EOS token also avoids appending a duplicate
    # when the completion already ends with it.)
    if not completion_text.endswith(self.eos_token):
        completion_text += self.eos_token
    # Rewrite old-style tags into their |DSML| counterparts.
    for token, replacement in self.special_token_map.items():
        completion_text = completion_text.replace(token, replacement)
    return completion_text
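As a sanity check, the intended replacement logic can be exercised standalone (a sketch with the class attributes inlined as module-level constants; EXAMPLE is an illustrative old-format output like the one shown earlier in this issue, not a real completion):

```python
DSML_TOKEN = "|DSML|"
EOS_TOKEN = "<|end▁of▁sentence|>"
SPECIAL_TOKEN_MAP = {
    "<function_calls>": f"<{DSML_TOKEN}function_calls>",
    "</function_calls>": f"</{DSML_TOKEN}function_calls>",
    "<invoke name=": f"<{DSML_TOKEN}invoke name=",
    "</invoke>": f"</{DSML_TOKEN}invoke>",
    "<parameter name=": f"<{DSML_TOKEN}parameter name=",
    "</parameter>": f"</{DSML_TOKEN}parameter>",
}

def fix_completion_text(completion_text: str) -> str:
    # Append the EOS token only when the completion stopped without it.
    if not completion_text.endswith(EOS_TOKEN):
        completion_text += EOS_TOKEN
    # Rewrite old-style tags into their |DSML| counterparts.
    for token, replacement in SPECIAL_TOKEN_MAP.items():
        completion_text = completion_text.replace(token, replacement)
    return completion_text

# Illustrative old-format output.
EXAMPLE = (
    '<function_calls>\n'
    '<invoke name="search">\n'
    '<parameter name="topn" string="false">10</parameter>\n'
    '</invoke>\n'
    '</function_calls>'
)
fixed = fix_completion_text(EXAMPLE)
```

Note that none of the old-style keys is a substring of another's replacement target (the `/` in the closing tags prevents overlap), so the replacement order in the dict does not matter.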
## encode and parse
encoded_messages = encode_messages(messages)
# use completion endpoint
response = await async_completion(encoded_messages, self.config["llm"]["local"]["model_name"], self.llm_endpoint)
response_text = self._fix_completion_text(response.text)
response_msg = parse_message_from_completion_text(response_text, thinking_mode="thinking")