Revised for training

ldy
2025-07-23 14:54:46 +08:00
parent 6dbd2f3281
commit 229f6bb027
32 changed files with 59884 additions and 11081 deletions

@@ -118,28 +118,34 @@ python scripts/comprehensive_validation.py \
 **Purpose**: Head-to-head performance comparisons with baseline models.
-#### Retriever Comparison (`compare_retriever.py`)
+#### Unified Model Comparison (`compare_models.py`)
 ```bash
-python scripts/compare_retriever.py \
---finetuned_model_path ./output/bge-m3-enhanced/final_model \
---baseline_model_path BAAI/bge-m3 \
+# Compare retriever
+python scripts/compare_models.py \
+--model_type retriever \
+--finetuned_model ./output/bge-m3-enhanced/final_model \
+--baseline_model BAAI/bge-m3 \
 --data_path data/datasets/examples/embedding_data.jsonl \
 --batch_size 16 \
---max_samples 1000 \
---output ./retriever_comparison.txt
-```
+--max_samples 1000
-#### Reranker Comparison (`compare_reranker.py`)
-```bash
-python scripts/compare_reranker.py \
---finetuned_model_path ./output/bge-reranker/final_model \
---baseline_model_path BAAI/bge-reranker-base \
+# Compare reranker
+python scripts/compare_models.py \
+--model_type reranker \
+--finetuned_model ./output/bge-reranker/final_model \
+--baseline_model BAAI/bge-reranker-base \
 --data_path data/datasets/examples/reranker_data.jsonl \
 --batch_size 16 \
---max_samples 1000 \
---output ./reranker_comparison.txt
+--max_samples 1000
+# Compare both at once
+python scripts/compare_models.py \
+--model_type both \
+--finetuned_retriever ./output/bge-m3-enhanced/final_model \
+--finetuned_reranker ./output/bge-reranker/final_model \
+--retriever_data data/datasets/examples/embedding_data.jsonl \
+--reranker_data data/datasets/examples/reranker_data.jsonl
 ```
 **Output**: Tab-separated comparison tables with metrics and performance deltas.
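
Reviewer note: the diff does not show what the tab-separated output actually looks like, so the following is only a minimal sketch of how such a table might be consumed downstream. The column names (`metric`, `baseline`, `finetuned`) and the sample values are assumptions for illustration, not the real format written by `compare_models.py`.

```python
import csv
import io

# Hypothetical sample table; the real columns and metrics produced by
# compare_models.py are not shown in this commit.
SAMPLE_TSV = (
    "metric\tbaseline\tfinetuned\n"
    "recall@10\t0.50\t0.75\n"
    "mrr\t0.25\t0.50\n"
)


def load_deltas(tsv_text):
    """Return {metric: finetuned - baseline} parsed from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["metric"]: float(row["finetuned"]) - float(row["baseline"])
        for row in reader
    }


deltas = load_deltas(SAMPLE_TSV)
for metric, delta in deltas.items():
    # Signed formatting makes regressions (negative deltas) easy to spot.
    print(f"{metric}\t{delta:+.4f}")
```

If the real output uses different headers, only the dictionary keys in `load_deltas` would need to change.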

@@ -0,0 +1 @@