---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.8484
      name: Accuracy
    - type: f1
      value: 0.8488
      name: F1 Score
    - type: precision
      value: 0.8536
      name: Precision
    - type: recall
      value: 0.8484
      name: Recall
---

# Chinese Financial Sentiment Analysis Model (Crypto Focus)

## Model Description

This model is fine-tuned over several iterations from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.

The training data was reviewed entry by entry by Claude AI, which corrected annotation errors to improve label quality.

## Training Data

- **Size**: 2,208 Chinese financial news items, each reviewed and annotated
- **Source**: Cryptocurrency-related news and tweets
- **Annotation**: Model prediction, followed by entry-by-entry review and correction by Claude AI
- **Distribution**:
  - Positive: 734 (33.2%)
  - Neutral: 899 (40.7%)
  - Negative: 575 (26.0%)
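
The class shares listed above follow directly from the counts. A minimal sketch that recomputes them (the variable names are illustrative, not taken from the training code):

```python
# Class counts as listed in the distribution above.
counts = {"positive": 734, "neutral": 899, "negative": 575}

total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # 2208
print(shares)  # {'positive': 33.2, 'neutral': 40.7, 'negative': 26.0}
```

The rounded shares sum to 99.9% rather than 100% because each one is rounded independently.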

## Performance Metrics

Performance on the 442-sample test set (80/20 stratified split). Precision, recall, and F1 are weighted averages across the three classes:

| Metric | Value |
|--------|-------|
| Accuracy | 84.84% |
| F1 Score | 84.88% |
| Precision | 85.36% |
| Recall | 84.84% |

### Per-class Metrics

| Class | Precision | Recall | F1 |
|-------|-----------|--------|----|
| negative | 0.938 | 0.791 | 0.858 |
| neutral | 0.806 | 0.878 | 0.840 |
| positive | 0.846 | 0.857 | 0.851 |
| **weighted avg** | **0.854** | **0.848** | **0.849** |
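
The weighted-average row can be approximately reproduced from the per-class rows. The sketch below assumes that each class contributes 20% of its total count to the test set (rounded); the card does not list the actual per-class supports, so treat those numbers as an assumption.

```python
# Per-class metrics copied from the table above.
per_class = {
    "negative": {"precision": 0.938, "recall": 0.791, "f1": 0.858},
    "neutral": {"precision": 0.806, "recall": 0.878, "f1": 0.840},
    "positive": {"precision": 0.846, "recall": 0.857, "f1": 0.851},
}

# ASSUMPTION: test-set support per class is 20% of the full class count,
# rounded to the nearest integer. The real split may differ slightly.
class_counts = {"negative": 575, "neutral": 899, "positive": 734}
support = {label: round(0.2 * n) for label, n in class_counts.items()}
total = sum(support.values())


def weighted_avg(metric: str) -> float:
    """Support-weighted mean of one per-class metric."""
    return sum(per_class[c][metric] * support[c] for c in per_class) / total


for metric in ("precision", "recall", "f1"):
    print(f"{metric}: {weighted_avg(metric):.4f}")
```

With these assumed supports (115, 180, and 147, summing to 442), the results land within rounding distance of the reported values. For single-label classification, support-weighted recall equals accuracy, which is why both are reported as 84.84%.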

### Performance History

| Version | Training Data (samples) | F1 Score | Accuracy |
|---------|-------------------------|----------|----------|
| v1.0 | 500 | 61.65% | n/a |
| v2.0 | 1000 | 63.65% | 64.50% |
| v3.5 | 1500 | 67.16% | 68.33% |
| v4.0 | 1700 | 70.91% | 72.06% |
| v5.0 | 2008 | 76.88% | 77.36% |
| **v6.0** | **2208** | **84.88%** | **84.84%** |

## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Text to analyze ("Bitcoin breaks $100,000, a new all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label (index order as given in this model card)
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```
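
The post-processing step in the snippet above (softmax, argmax, label lookup) can be illustrated without loading the model. This is a minimal sketch in plain Python; the logits are hypothetical values chosen for illustration, not real model output, and `decode` is a made-up helper name.

```python
import math

# Label order as used in the Quick Start snippet.
LABELS = ["positive", "neutral", "negative"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, confidence) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# HYPOTHETICAL logits, not produced by the real model.
label, confidence = decode([2.0, 0.5, -1.0])
print(label, round(confidence, 4))
```

The confidence is simply the softmax probability of the winning class, so it reflects how far that logit sits above the others rather than a calibrated probability of being correct.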

### Batch Processing

```python
# Reuses `torch`, `tokenizer`, and `model` from the Quick Start snippet.
texts = [
    "币安获得阿布扎比监管授权",    # Binance obtains regulatory authorization in Abu Dhabi
    "以太坊完成Fusaka升级",        # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元",  # An exchange is attacked and loses $1 million
]

# Pad to the longest text in the batch so the inputs form one tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")
```

## Training Configuration

- **Base Model**: yiyanghkust/finbert-tone-chinese (fine-tuned over several iterations)
- **Epochs**: 5 (early stopping with patience=3; epoch 5 gave the best result)
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 128
- **Device**: NVIDIA GeForce RTX 5080 Laptop GPU (16GB)
- **Mixed Precision**: FP16
- **Best Model Selection**: metric_for_best_model='f1'
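
To show how patience-based early stopping interacts with best-model selection on F1, here is a simplified sketch. It is not the training script used for this model: the dictionary keys are illustrative names, `select_best_epoch` is a hypothetical helper, and the per-epoch F1 values are made up.

```python
# Hyperparameters as listed above; the key names are illustrative.
config = {
    "num_train_epochs": 5,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "max_length": 128,
    "fp16": True,
    "metric_for_best_model": "f1",
    "early_stopping_patience": 3,
}


def select_best_epoch(scores, patience):
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    Training stops once `patience` consecutive evaluations fail to beat
    the best score seen so far, or when the scores run out.
    """
    best_epoch, best_score, stale = 0, float("-inf"), 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# HYPOTHETICAL per-epoch validation F1 values, for illustration only.
f1_per_epoch = [0.78, 0.81, 0.80, 0.83, 0.85]
print(select_best_epoch(f1_per_epoch, config["early_stopping_patience"]))
```

In this made-up run the score is still improving at epoch 5, so early stopping never triggers and the last epoch is also the best one, which matches the situation described in the list above.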

## Use Cases

- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media sentiment monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supplementary reference for investment decisions
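
## Annotation Principles

- Cryptocurrency is treated as a **risk asset** (like US equities), not a safe-haven asset (like gold)
- War, geopolitical conflict, tariffs → **negative** (bearish for risk assets)
- A platform lists a new coin or launches a feature → **neutral** (routine operations, not bullish news)
- Personal opinions and analyst forecasts → **neutral** (subjective opinion)
- Clearly bullish events (ETF approval, large purchases, policy support) → **positive**
- Clearly bearish events (liquidations, price crashes, fraud, regulatory crackdowns) → **negative**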
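
The principles above amount to a mapping from event type to label. A minimal sketch of that mapping as a lookup table follows; the category names and the `needs_review` fallback are hypothetical, since the card defines no formal taxonomy, and real annotation still requires reading the text.

```python
# Event categories paraphrase the principles above.
# HYPOTHETICAL names: the model card does not define such a taxonomy.
ANNOTATION_RULES = {
    "war": "negative",
    "geopolitical_conflict": "negative",
    "tariffs": "negative",
    "new_listing_or_feature": "neutral",
    "opinion_or_forecast": "neutral",
    "etf_approval": "positive",
    "large_purchase": "positive",
    "policy_support": "positive",
    "liquidation": "negative",
    "price_crash": "negative",
    "fraud": "negative",
    "regulatory_crackdown": "negative",
}


def guideline_label(event_category: str) -> str:
    """Look up the guideline label; unknown categories need human review."""
    return ANNOTATION_RULES.get(event_category, "needs_review")


print(guideline_label("war"))           # negative
print(guideline_label("etf_approval"))  # positive
```

Note that war and geopolitical conflict map to negative because the guidelines treat cryptocurrency as a risk asset rather than a safe haven.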
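
## Limitations

- ⚠️ Targets financial news in the cryptocurrency domain; performance on other financial domains may be worse
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model does not replace human judgment; its output is for reference only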
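
One way to act on the short-text limitation is to flag such inputs before trusting the prediction. The helper below is a hypothetical sketch, assuming that "characters" means Unicode code points excluding whitespace; the card does not specify how length is counted.

```python
MIN_CHARS = 10  # threshold taken from the limitation above


def is_too_short(text: str, min_chars: int = MIN_CHARS) -> bool:
    """True if the text has fewer than `min_chars` non-whitespace characters."""
    return len("".join(text.split())) < min_chars


print(is_too_short("比特币大涨"))                    # True (5 characters)
print(is_too_short("比特币突破10万美元创历史新高"))  # False (15 characters)
```

Flagged texts could be routed to manual review or reported with a lower-trust marker instead of being dropped.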

## License

Apache-2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{watchtower-sentiment-2026,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2026},
  howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-sentiment}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese, 2208 samples, F1=84.88\%}
}
```

## Base Model

This model is fine-tuned from:
- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!
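
## Changelog

### v6.0 (2026-02-28)
- ✅ Expanded training data to 2,208 samples (+200 samples reviewed by Claude)
- ✅ F1 score improved substantially (76.88% → 84.88%, +8.00 pp)
- ✅ Large-scale correction of geopolitical/war news labels (97 items changed positive→negative, fixing the systematic error on "US-Israel strikes on Iran" news)
- ✅ Negative recall improved markedly (67.0% → 79.1%, +12.1 pp)
- ✅ Targeted geopolitical validation: 13 of 14 test items correct (92.9%); 8 war news items classified as negative with confidence 1.00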
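
### v5.0 (2026-02-28)
- ✅ Expanded training data to 2,008 samples (+308 samples reviewed by Claude)
- ✅ F1 score improved substantially (70.91% → 76.88%, +5.97 pp)
- ✅ Corrected systematic model errors (such as positive→neutral over-prediction)
- ✅ Improved class balance: negative samples increased from 362 to 431

### v4.0 (2026-02-28)
- ✅ Expanded training data to 1,700 samples
- ✅ F1 score improved (67.16% → 70.91%, +3.75 pp)
- ✅ Introduced the entry-by-entry Claude AI review and annotation workflow

### v3.5 (2026-02-27)
- ✅ Expanded training data to 1,500 samples
- ✅ F1 score improved (63.65% → 67.16%, +3.51 pp)
- ✅ Extensively corrected the systematic error of labeling war/geopolitical conflict news as positive

### v2.0 (2025-12-09)
- ✅ Expanded training data to 1,000 samples
- ✅ Corrected annotation errors and improved data quality
- ✅ F1 score improved (61.65% → 63.65%, +2.00 pp)

### v1.0 (Initial Release)
- Initial version based on 500 annotated samples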

## Contact

For questions or suggestions, feel free to open an issue or a PR.

---

**Maintainer**: Onefly
**Last Updated**: 2026-02-28