SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the code-search-net/code_search_net dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-mpnet-base-v2
Maximum Sequence Length: 384 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- code-search-net/code_search_net
Language: code

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
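
The Pooling and Normalize modules above can be illustrated with a small dependency-free sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, which is what the printed configuration indicates (pooling_mode_mean_tokens: True, then Normalize()). The helper names mean_pool and l2_normalize and the toy 2-dimensional vectors are illustrative only; they are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vec):
    """Scale a vector to unit length, as a normalization layer would."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: 3 token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # [2.0, 3.0]
embedding = l2_normalize(pooled)   # unit-length sentence embedding
```

Because the final embedding has unit length, cosine similarity and dot product give the same score for any pair of embeddings.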

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BoghdadyJR/al-MiniLM-L6-v2")
# Run inference
sentences = [
    'Keypoint.copy',
    'def copy(self, x=None, y=None):\n        """\n        Create a shallow copy of the Keypoint object.\n\n        Parameters\n        ----------\n        x : None or number, optional\n            Coordinate of the keypoint on the x axis.\n            If ``None``, the instance\'s value will be copied.\n\n        y : None or number, optional\n            Coordinate of the keypoint on the y axis.\n            If ``None``, the instance\'s value will be copied.\n\n        Returns\n        -------\n        imgaug.Keypoint\n            Shallow copy.\n\n        """\n        return self.deepcopy(x=x, y=y)',
    'def build_words_dataset(words=None, vocabulary_size=50000, printable=True, unk_key=\'UNK\'):\n    """Build the words dictionary and replace rare words with \'UNK\' token.\n    The most common word has the smallest integer id.\n\n    Parameters\n    ----------\n    words : list of str or byte\n        The context in list format. You may need to do preprocessing on the words, such as lower case, remove marks etc.\n    vocabulary_size : int\n        The maximum vocabulary size, limiting the vocabulary size. Then the script replaces rare words with \'UNK\' token.\n    printable : boolean\n        Whether to print the read vocabulary size of the given words.\n    unk_key : str\n        Represent the unknown words.\n\n    Returns\n    --------\n    data : list of int\n        The context in a list of ID.\n    count : list of tuple and list\n        Pair words and IDs.\n            - count[0] is a list : the number of rare words\n            - count[1:] are tuples : the number of occurrence of each word\n            - e.g. [[\'UNK\', 418391], (b\'the\', 1061396), (b\'of\', 593677), (b\'and\', 416629), (b\'one\', 411764)]\n    dictionary : dictionary\n        It is `word_to_id` that maps word to ID.\n    reverse_dictionary : a dictionary\n        It is `id_to_word` that maps ID to word.\n\n    Examples\n    --------\n    >>> words = tl.files.load_matt_mahoney_text8_dataset()\n    >>> vocabulary_size = 50000\n    >>> data, count, dictionary, reverse_dictionary = tl.nlp.build_words_dataset(words, vocabulary_size)\n\n    References\n    -----------------\n    - `tensorflow/examples/tutorials/word2vec/word2vec_basic.py <https://github.com/tensorflow/tensorflow/blob/r0.7/tensorflow/examples/tutorials/word2vec/word2vec_basic.py>`__\n\n    """\n    if words is None:\n        raise Exception("words : list of str or byte")\n\n    count = [[unk_key, -1]]\n    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))\n    dictionary = dict()\n    for word, _ in count:\n        dictionary[word] = len(dictionary)\n    data = list()\n    unk_count = 0\n    for word in words:\n        if word in dictionary:\n            index = dictionary[word]\n        else:\n            index = 0  # dictionary[\'UNK\']\n            unk_count += 1\n        data.append(index)\n    count[0][1] = unk_count\n    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n    if printable:\n        tl.logging.info(\'Real vocabulary size    %d\' % len(collections.Counter(words).keys()))\n        tl.logging.info(\'Limited vocabulary size {}\'.format(vocabulary_size))\n    if len(collections.Counter(words).keys()) < vocabulary_size:\n        raise Exception(\n            "len(collections.Counter(words).keys()) >= vocabulary_size , the limited vocabulary_size must be less than or equal to the read vocabulary_size"\n        )\n    return data, count, dictionary, reverse_dictionary',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
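
For code search, the similarity scores can be turned into a ranking of candidate functions for a given query. The following is a minimal sketch under stated assumptions: the toy 3-dimensional vectors stand in for the outputs of model.encode(...), and rank_by_cosine is a hypothetical helper, not a library function.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for embeddings of a query and three code snippets.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_by_cosine(query, candidates)
best_index = ranking[0][0]  # index of the closest candidate
```

With real embeddings, the query would typically be a function name or short description and the candidates would be function bodies, mirroring the func_name and whole_func_string pairs used for training.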

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.8806
spearman_cosine	0.881
pearson_manhattan	0.8781
spearman_manhattan	0.8798
pearson_euclidean	0.8794
spearman_euclidean	0.881
pearson_dot	0.8806
spearman_dot	0.881
pearson_max	0.8806
spearman_max	0.881
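
The Pearson and Spearman rows differ in what they correlate: Pearson compares the raw similarity scores with the gold labels, while Spearman compares their ranks. The sketch below is a dependency-free illustration of that difference, not the evaluator's actual implementation; the gold and predicted values are made-up toy numbers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Toy data: gold similarity labels and predicted cosine scores.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.40, 0.70, 0.95]

pearson_score = pearson(gold, predicted)
spearman_score = spearman(gold, predicted)
```

In this toy case the predictions are perfectly monotonic in the gold labels, so the Spearman score is 1 while the Pearson score is slightly lower. The identical cosine and dot rows in the table above are consistent with the model emitting unit-length embeddings.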

Training Details

Training Dataset

code-search-net/code_search_net

Dataset: code-search-net/code_search_net
Size: 20,000 training samples
Columns: func_name and whole_func_string
Approximate statistics based on the first 1000 samples:
	func_name	whole_func_string
type	string	string
details	min: 3 tokens mean: 8.18 tokens max: 21 tokens	min: 38 tokens mean: 192.0 tokens max: 384 tokens

Samples:

func_name	whole_func_string
`ImageGraphCut.__msgc_step3_discontinuity_localization`	def __msgc_step3_discontinuity_localization(self): """ Estimate discontinuity in basis of low resolution image segmentation. :return: discontinuity in low resolution """ import scipy start = self._start_time seg = 1 - self.segmentation.astype(np.int8) self.stats["low level object voxels"] = np.sum(seg) self.stats["low level image voxels"] = np.prod(seg.shape) # in seg is now stored low resolution segmentation # back to normal parameters # step 2: discontinuity localization # self.segparams = sparams_hi seg_border = scipy.ndimage.filters.laplace(seg, mode="constant") logger.debug("seg_border: %s", scipy.stats.describe(seg_border, axis=None)) # logger.debug(str(np.max(seg_border))) # logger.debug(str(np.min(seg_border))) seg_border[seg_border != 0] = 1 logger.debug("seg_border: %s", scipy.stats.describe(seg_border, axis=None)) # scipy.ndimage.morphology.distance_transform_edt boundary_dilatation_distance = self.segparams["boundary_dilatation_distance"] seg = scipy.ndimage.morphology.binary_dilation( seg_border, # seg, np.ones( [ (boundary_dilatation_distance * 2) + 1, (boundary_dilatation_distance * 2) + 1, (boundary_dilatation_distance * 2) + 1, ] ), ) if self.keep_temp_properties: self.temp_msgc_lowres_discontinuity = seg else: self.temp_msgc_lowres_discontinuity = None if self.debug_images: import sed3 pd = sed3.sed3(seg_border) # ), contour=seg) pd.show() pd = sed3.sed3(seg) # ), contour=seg) pd.show() # segzoom = scipy.ndimage.interpolation.zoom(seg.astype('float'), zoom, # order=0).astype('int8') self.stats["t3"] = time.time() - start return seg
`ImageGraphCut.__multiscale_gc_lo2hi_run`	def __multiscale_gc_lo2hi_run(self): # , pyed): """ Run Graph-Cut segmentation with refinement of low resolution multiscale graph. In first step is performed normal GC on low resolution data Second step construct finer grid on edges of segmentation from first step. There is no option for use without use_boundary_penalties """ # from PyQt4.QtCore import pyqtRemoveInputHook # pyqtRemoveInputHook() self._msgc_lo2hi_resize_init() self.__msgc_step0_init() hard_constraints = self.__msgc_step12_low_resolution_segmentation() # ===== high resolution data processing seg = self.__msgc_step3_discontinuity_localization() self.stats["t3.1"] = (time.time() - self._start_time) graph = Graph( seg, voxelsize=self.voxelsize, nsplit=self.segparams["block_size"], edge_weight_table=self._msgc_npenalty_table, compute_low_nodes_index=True, ) # graph.run() = graph.generate_base_grid() + graph.split_voxels() # graph.run() graph.generate_base_grid() self.stats["t3.2"] = (time.time() - self._start_time) graph.split_voxels() self.stats["t3.3"] = (time.time() - self._start_time) self.stats.update(graph.stats) self.stats["t4"] = (time.time() - self._start_time) mul_mask, mul_val = self.__msgc_tlinks_area_weight_from_low_segmentation(seg) area_weight = 1 unariesalt = self.__create_tlinks( self.img, self.voxelsize, self.seeds, area_weight=area_weight, hard_constraints=hard_constraints, mul_mask=None, mul_val=None, ) # N-links prepared self.stats["t5"] = (time.time() - self._start_time) un, ind = np.unique(graph.msinds, return_index=True) self.stats["t6"] = (time.time() - self._start_time) self.stats["t7"] = (time.time() - self._start_time) unariesalt2_lo2hi = np.hstack( [unariesalt[ind, 0, 0].reshape(-1, 1), unariesalt[ind, 0, 1].reshape(-1, 1)] ) nlinks_lo2hi = np.hstack([graph.edges, graph.edges_weights.reshape(-1, 1)]) if self.debug_images: import sed3 ed = sed3.sed3(unariesalt[:, :, 0].reshape(self.img.shape)) ed.show() import sed3 ed = sed3.sed3(unariesalt[:, :, 1].reshape(self.img.shape)) ed.show() # ed = sed3.sed3(seg) # ed.show() # import sed3 # ed = sed3.sed3(graph.data) # ed.show() # import sed3 # ed = sed3.sed3(graph.msinds) # ed.show() # nlinks, unariesalt2, msinds = self.__msgc_step45678_construct_graph(area_weight, hard_constraints, seg) # self.__msgc_step9_finish_perform_gc_and_reshape(nlinks, unariesalt2, msinds) self.__msgc_step9_finish_perform_gc_and_reshape( nlinks_lo2hi, unariesalt2_lo2hi, graph.msinds ) self._msgc_lo2hi_resize_clean_finish()
`ImageGraphCut.__multiscale_gc_hi2lo_run`	def __multiscale_gc_hi2lo_run(self): # , pyed): """ Run Graph-Cut segmentation with simplifiyng of high resolution multiscale graph. In first step is performed normal GC on low resolution data Second step construct finer grid on edges of segmentation from first step. There is no option for use without use_boundary_penalties """ # from PyQt4.QtCore import pyqtRemoveInputHook # pyqtRemoveInputHook() self.__msgc_step0_init() hard_constraints = self.__msgc_step12_low_resolution_segmentation() # ===== high resolution data processing seg = self.__msgc_step3_discontinuity_localization() nlinks, unariesalt2, msinds = self.__msgc_step45678_hi2lo_construct_graph( hard_constraints, seg ) self.__msgc_step9_finish_perform_gc_and_reshape(nlinks, unariesalt2, msinds)

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
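
The loss parameters above can be read as follows: each anchor (here, a func_name) is scored against every positive in the batch (the whole_func_string entries) with cosine similarity, the scores are multiplied by the scale of 20.0, and cross-entropy is applied with the matching positive as the target, so the other in-batch positives act as negatives. The sketch below is a dependency-free illustration of that computation on toy 2-dimensional vectors; it is not the library's implementation, and the function names are illustrative.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over the batch of candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each positive matches its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each positive matches the other anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
```

The aligned batch yields a loss close to zero, while the swapped batch yields a loss close to the scale value, which shows how the scale sharpens the penalty for ranking a wrong candidate first.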

Evaluation Dataset

code-search-net/code_search_net

Dataset: code-search-net/code_search_net
Size: 15,000 evaluation samples
Columns: func_name and whole_func_string
Approximate statistics based on the first 1000 samples:
	func_name	whole_func_string
type	string	string
details	min: 3 tokens mean: 9.23 tokens max: 24 tokens	min: 50 tokens mean: 276.31 tokens max: 384 tokens

Samples:

func_name	whole_func_string
`learn`	def learn(env, network, seed=None, lr=5e-4, total_timesteps=100000, buffer_size=50000, exploration_fraction=0.1, exploration_final_eps=0.02, train_freq=1, batch_size=32, print_freq=100, checkpoint_freq=10000, checkpoint_path=None, learning_starts=1000, gamma=1.0, target_network_update_freq=500, prioritized_replay=False, prioritized_replay_alpha=0.6, prioritized_replay_beta0=0.4, prioritized_replay_beta_iters=None, prioritized_replay_eps=1e-6, param_noise=False, callback=None, load_path=None, **network_kwargs ): """Train a deepq model. Parameters ------- env: gym.Env environment to train on network: string or a function neural network to use as a q function approximator. If string, has to be one of the names of registered models in baselines.common.models (mlp, cnn, conv_only). If a function, should take an observation tensor and return a latent variable tensor, which will be mapped to the Q function heads (see build_q_func in baselines.deepq.models for details on that) seed: int or None prng seed. The runs with the same seed "should" give the same results. If None, no seeding is used. lr: float learning rate for adam optimizer total_timesteps: int number of env steps to optimizer for buffer_size: int size of the replay buffer exploration_fraction: float fraction of entire training period over which the exploration rate is annealed exploration_final_eps: float final value of random action probability train_freq: int update the model every train_freq steps. set to None to disable printing batch_size: int size of a batched sampled from replay buffer for training print_freq: int how often to print out training progress set to None to disable printing checkpoint_freq: int how often to save the model. This is so that the best version is restored at the end of the training. If you do not wish to restore the best version at the end of the training set this variable to None. learning_starts: int how many steps of the model to collect transitions for before learning starts gamma: float discount factor target_network_update_freq: int update the target network every target_network_update_freq steps. prioritized_replay: True if True prioritized replay buffer will be used. prioritized_replay_alpha: float alpha parameter for prioritized replay buffer prioritized_replay_beta0: float initial value of beta for prioritized replay buffer prioritized_replay_beta_iters: int number of iterations over which beta will be annealed from initial value to 1.0. If set to None equals to total_timesteps. prioritized_replay_eps: float epsilon to add to the TD errors when updating priorities. param_noise: bool whether or not to use parameter space noise (https://arxiv.org/abs/1706.01905) callback: (locals, globals) -> None function called at every steps with state of the algorithm. If callback returns true training stops. load_path: str path to load the model from. (default: None) network_kwargs additional keyword arguments to pass to the network builder. Returns ------- act: ActWrapper Wrapper over act function. Adds ability to save it and load it. See header of baselines/deepq/categorical.py for details on the act function. """ # Create all the functions necessary to train the model sess = get_session() set_global_seeds(seed) q_func = build_q_func(network, **network_kwargs) # capture the shape outside the closure so that the env object is not serialized # by cloudpickle when serializing make_obs_ph observation_space = env.observation_space def make_obs_ph(name): return ObservationInput(observation_space, name=name) act, train, update_target, debug = deepq.build_train( make_obs_ph=make_obs_ph, q_func=q_func, num_actions=env.action_space.n, optimizer=tf.train.AdamOptimizer(learning_rate=lr), gamma=gamma, grad_norm_clipping=10, param_noise=param_noise ) act_params = { 'make_obs_ph': make_obs_ph, 'q_func': q_func, 'num_actions': env.action_space.n, } act = ActWrapper(act, act_params) # Create the replay buffer if prioritized_replay: replay_buffer = PrioritizedReplayBuffer(buffer_size, alpha=prioritized_replay_alpha) if prioritized_replay_beta_iters is None: prioritized_replay_beta_iters = total_timesteps beta_schedule = LinearSchedule(prioritized_replay_beta_iters, initial_p=prioritized_replay_beta0, final_p=1.0) else: replay_buffer = ReplayBuffer(buffer_size) beta_schedule = None # Create the schedule for exploration starting from 1. exploration = LinearSchedule(schedule_timesteps=int(exploration_fraction * total_timesteps), initial_p=1.0, final_p=exploration_final_eps) # Initialize the parameters and copy them to the target network. U.initialize() update_target() episode_rewards = [0.0] saved_mean_reward = None obs = env.reset() reset = True with tempfile.TemporaryDirectory() as td: td = checkpoint_path or td model_file = os.path.join(td, "model") model_saved = False if tf.train.latest_checkpoint(td) is not None: load_variables(model_file) logger.log('Loaded model from {}'.format(model_file)) model_saved = True elif load_path is not None: load_variables(load_path) logger.log('Loaded model from {}'.format(load_path)) for t in range(total_timesteps): if callback is not None: if callback(locals(), globals()): break # Take action and update exploration to the newest value kwargs = {} if not param_noise: update_eps = exploration.value(t) update_param_noise_threshold = 0. else: update_eps = 0. # Compute the threshold such that the KL divergence between perturbed and non-perturbed # policy is comparable to eps-greedy exploration with eps = exploration.value(t). # See Appendix C.1 in Parameter Space Noise for Exploration, Plappert et al., 2017 # for detailed explanation. update_param_noise_threshold = -np.log(1. - exploration.value(t) + exploration.value(t) / float(env.action_space.n)) kwargs['reset'] = reset kwargs['update_param_noise_threshold'] = update_param_noise_threshold kwargs['update_param_noise_scale'] = True action = act(np.array(obs)[None], update_eps=update_eps, **kwargs)[0] env_action = action reset = False new_obs, rew, done, _ = env.step(env_action) # Store transition in the replay buffer. replay_buffer.add(obs, action, rew, new_obs, float(done)) obs = new_obs episode_rewards[-1] += rew if done: obs = env.reset() episode_rewards.append(0.0) reset = True if t > learning_starts and t % train_freq == 0: # Minimize the error in Bellman's equation on a batch sampled from replay buffer. if prioritized_replay: experience = replay_buffer.sample(batch_size, beta=beta_schedule.value(t)) (obses_t, actions, rewards, obses_tp1, dones, weights, batch_idxes) = experience else: obses_t, actions, rewards, obses_tp1, dones = replay_buffer.sample(batch_size) weights, batch_idxes = np.ones_like(rewards), None td_errors = train(obses_t, actions, rewards, obses_tp1, dones, weights) if prioritized_replay: new_priorities = np.abs(td_errors) + prioritized_replay_eps replay_buffer.update_priorities(batch_idxes, new_priorities) if t > learning_starts and t % target_network_update_freq == 0: # Update target network periodically. update_target() mean_100ep_reward = round(np.mean(episode_rewards[-101:-1]), 1) num_episodes = len(episode_rewards) if done and print_freq is not None and len(episode_rewards) % print_freq == 0: logger.record_tabular("steps", t) logger.record_tabular("episodes", num_episodes) logger.record_tabular("mean 100 episode reward", mean_100ep_reward) logger.record_tabular("% time spent exploring", int(100 * exploration.value(t))) logger.dump_tabular() if (checkpoint_freq is not None and t > learning_starts and num_episodes > 100 and t % checkpoint_freq == 0): if saved_mean_reward is None or mean_100ep_reward > saved_mean_reward: if print_freq is not None: logger.log("Saving model due to mean reward increase: {} -> {}".format( saved_mean_reward, mean_100ep_reward)) save_variables(model_file) model_saved = True saved_mean_reward = mean_100ep_reward if model_saved: if print_freq is not None: logger.log("Restored model with mean reward: {}".format(saved_mean_reward)) load_variables(model_file) return act
`ActWrapper.save_act`	def save_act(self, path=None): """Save model to a pickle located at path""" if path is None: path = os.path.join(logger.get_dir(), "model.pkl") with tempfile.TemporaryDirectory() as td: save_variables(os.path.join(td, "model")) arc_name = os.path.join(td, "packed.zip") with zipfile.ZipFile(arc_name, 'w') as zipf: for root, dirs, files in os.walk(td): for fname in files: file_path = os.path.join(root, fname) if file_path != arc_name: zipf.write(file_path, os.path.relpath(file_path, td)) with open(arc_name, "rb") as f: model_data = f.read() with open(path, "wb") as f: cloudpickle.dump((model_data, self._act_params), f)
`nature_cnn`	def nature_cnn(unscaled_images, **conv_kwargs): """ CNN from Nature paper. """ scaled_images = tf.cast(unscaled_images, tf.float32) / 255. activ = tf.nn.relu h = activ(conv(scaled_images, 'c1', nf=32, rf=8, stride=4, init_scale=np.sqrt(2), **conv_kwargs)) h2 = activ(conv(h, 'c2', nf=64, rf=4, stride=2, init_scale=np.sqrt(2), **conv_kwargs)) h3 = activ(conv(h2, 'c3', nf=64, rf=3, stride=1, init_scale=np.sqrt(2), **conv_kwargs)) h3 = conv_to_fc(h3) return activ(fc(h3, 'fc1', nh=512, init_scale=np.sqrt(2)))

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates
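
To make the interaction of learning_rate, warmup_ratio, and the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes a single device and no gradient accumulation, so 20,000 training samples at batch size 16 give 1,250 steps for the one epoch, and it assumes the common linear-warmup-then-linear-decay shape. The function name linear_warmup_decay is illustrative, not a library API.

```python
import math

def linear_warmup_decay(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: one device, gradient_accumulation_steps = 1.
total_steps = math.ceil(20_000 / 16)   # 1250 optimizer steps in one epoch

lr_start = linear_warmup_decay(0, total_steps)      # start of warmup
lr_peak = linear_warmup_decay(125, total_steps)     # end of warmup
lr_mid = linear_warmup_decay(700, total_steps)      # during decay
lr_end = linear_warmup_decay(1250, total_steps)     # end of training
```

Under these assumptions the rate peaks at 2e-05 after 125 steps and then falls linearly to zero at step 1,250.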

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	sts-dev_spearman_cosine
0	0	-	-	0.8810
0.08	100	0.4124	0.2191	-
0.16	200	0.108	0.0993	-
0.24	300	0.127	0.0756	-
0.32	400	0.0728	-	-
0.08	100	0.0662	0.0683	-
0.16	200	0.0321	0.0660	-
0.24	300	0.0815	0.0584	-
0.32	400	0.049	0.0591	-
0.4	500	0.0636	0.0612	-
0.48	600	0.0929	0.0577	-
0.56	700	0.0342	0.0568	-
0.64	800	0.0265	0.0572	-
0.72	900	0.0406	0.0551	-
0.8	1000	0.039	0.0549	-
0.88	1100	0.0376	0.0551	-
0.96	1200	0.0823	0.0556	-

Framework Versions

Python: 3.10.13
Sentence Transformers: 3.0.1
Transformers: 4.42.3
PyTorch: 2.1.2
Accelerate: 0.32.1
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

Model size: 0.1B params
Tensor type: F32