---
tags:
- computer_vision
- pose_estimation
- animal_pose_estimation
- deeplabcut
pipeline_tag: keypoint-detection
---

# MODEL CARD

## Model Details

• The SuperAnimal-Quadruped models were developed by the [M.W. Mathis Lab](http://www.mackenziemathislab.org/) in 2023 and are trained to predict quadruped pose from images.
Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.

• There are three main models (and several auxiliary model checkpoints):
- `pose_model.pth` is an HRNet-w32 compatible with the DeepLabCut (DLC) 3.0+ PyTorch code, trained on our Quadruped-80K dataset.
- `detector.pt` is a Faster R-CNN object detector for top-down pose estimation, in which each animal is first detected and its pose is then estimated within its bounding box.
- `hrnet_w32_quadruped80k.pth` is an HRNet-w32 trained with MMPose on our Quadruped-80K dataset.
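
The top-down setup pairs `detector.pt` with a pose model: boxes from the detector are cropped, each crop is passed to the pose model, and the predicted keypoints are mapped back to full-image coordinates. The sketch below illustrates only that generic loop and its coordinate bookkeeping. `detect` and `estimate_pose` are hypothetical callables standing in for the Faster R-CNN and HRNet-w32 checkpoints, and the 256×256 crop size and 0.5 score cutoff are assumptions, not documented settings of these models.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# (x_min, y_min, width, height) in full-image pixel coordinates
BBox = Tuple[float, float, float, float]


def crop_to_image_coords(kpts_crop: np.ndarray, bbox: BBox,
                         crop_size: Tuple[int, int]) -> np.ndarray:
    """Map (K, 2) keypoints predicted on a resized crop back to image pixels."""
    x_min, y_min, w, h = bbox
    crop_w, crop_h = crop_size
    scale = np.array([w / crop_w, h / crop_h])
    return np.asarray(kpts_crop, dtype=float) * scale + np.array([x_min, y_min])


def top_down_pose(
    image: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[Tuple[BBox, float]]],
    estimate_pose: Callable[[np.ndarray, Tuple[int, int]], np.ndarray],
    crop_size: Tuple[int, int] = (256, 256),
    min_score: float = 0.5,
) -> List[np.ndarray]:
    """Generic top-down loop: detect animals, then estimate pose per box.

    detect(image) -> [(bbox, score), ...]           (hypothetical detector wrapper)
    estimate_pose(crop, crop_size) -> (K, 2) array   (hypothetical pose wrapper that
        resizes the crop to crop_size and returns keypoints in that frame)
    """
    poses = []
    for bbox, score in detect(image):
        if score < min_score:
            continue  # drop low-confidence detections
        x0, y0, w, h = (int(round(v)) for v in bbox)
        # Clip the box to the image so the crop matches the box used for mapping
        x1, y1 = min(x0 + w, image.shape[1]), min(y0 + h, image.shape[0])
        x0, y0 = max(x0, 0), max(y0, 0)
        if x1 <= x0 or y1 <= y0:
            continue
        crop = image[y0:y1, x0:x1]
        kpts_crop = estimate_pose(crop, crop_size)
        poses.append(crop_to_image_coords(kpts_crop, (x0, y0, x1 - x0, y1 - y0), crop_size))
    return poses
```

Because each crop is resized independently, the x and y scale factors generally differ; keeping them separate avoids distorting keypoints for wide or tall boxes.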

• Full training details can be found in Ye et al. 2023.
You can download the pose model and detector with our lightweight loading package, [DLCLibrary](https://github.com/DeepLabCut/DLClibrary).
Here is an example:

```python
from pathlib import Path
from dlclibrary import download_huggingface_model

# Create the folder (if it does not already exist) and download the model into it
model_dir = Path("./superanimal_quadruped_model_pytorch")
model_dir.mkdir(parents=True, exist_ok=True)
download_huggingface_model("superanimal_quadruped_pytorch", model_dir)
```

## Intended Use
• Intended for pose estimation of quadrupeds in images taken from a side view. The model serves as a better starting
point than ImageNet weights for downstream datasets such as AP-10K.

• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience
and ecology.

• Not suitable as a zero-shot model for applications that require high keypoint precision, but it can be fine-tuned with
minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those
shown in the paper.

## Factors

• Given the known robustness issues of neural networks, relevant factors include the lighting, contrast, and
resolution of the video frames. The presence of other objects may also cause false detections and erroneous keypoints.
When two or more animals are extremely close together, the top-down detector may detect only one of them,
unless the model is further fine-tuned or combined with a method such as BUCTD (Zhou et al. 2023, ICCV).

## Metrics
• Mean Average Precision (mAP)
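
For keypoints, mAP is commonly computed COCO-style: each predicted pose is scored against its ground-truth pose with Object Keypoint Similarity (OKS), and average precision is averaged over OKS thresholds 0.50 to 0.95. The sketch below assumes that convention (it is not the evaluation code from Ye et al. 2023), assumes predictions are already paired one-to-one with ground-truth instances, and uses hypothetical per-keypoint `sigmas`.

```python
import numpy as np

# COCO-style OKS thresholds: 0.50, 0.55, ..., 0.95
OKS_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) arrays of (x, y) pixel coordinates
    visible:  (K,) boolean mask of labeled ground-truth keypoints
    area:     object area in pixels^2 (the s^2 scale term)
    sigmas:   (K,) per-keypoint falloff constants (hypothetical values)
    """
    if not np.any(visible):
        return 0.0
    d2 = np.sum((np.asarray(pred) - np.asarray(gt)) ** 2, axis=1)
    k2 = (2.0 * np.asarray(sigmas)) ** 2
    e = d2 / (2.0 * k2 * (area + np.spacing(1)))
    return float(np.mean(np.exp(-e[np.asarray(visible)])))


def average_precision(scores, oks_values, n_gt, threshold):
    """101-point interpolated AP at a single OKS threshold.

    Assumes each prediction is already paired with at most one ground-truth
    instance; oks_values[i] is that pair's OKS (0.0 if unmatched).
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="mergesort")
    tp = (np.asarray(oks_values, dtype=float)[order] >= threshold).astype(float)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Interpolate: precision at recall r is the max precision at any recall >= r
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_points = np.linspace(0.0, 1.0, 101)
    idx = np.searchsorted(recall, recall_points, side="left")
    sampled = [precision[i] if i < len(precision) else 0.0 for i in idx]
    return float(np.mean(sampled))


def mean_average_precision(scores, oks_values, n_gt):
    """Average the AP over the ten COCO OKS thresholds."""
    return float(np.mean([average_precision(scores, oks_values, n_gt, t)
                          for t in OKS_THRESHOLDS]))
```

Two predictions that both match their ground truth exactly give an mAP of 1.0. The full COCO evaluator also performs greedy matching between predictions and ground-truth instances, which this sketch deliberately leaves out.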

## Evaluation Data
• In the paper we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here,
we provide the model that has been trained on all datasets (see below); it should therefore be considered "fine-tuned"
on all of the animal training data listed below. This model is meant for production use and evaluation in downstream
scientific applications.
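
The difference between the paper's benchmarks and this release can be made concrete. Assuming "leave-one-out" means that, for each benchmark, the dataset it is drawn from is removed from the training mix (our reading of the text above, not a description of the authors' code), the splits compare as follows. The dataset names are taken from the Training Data list in this card.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Training datasets as listed in this card.
ALL_DATASETS: Tuple[str, ...] = (
    "AwA-Pose", "AnimalPose", "AcinoSet", "Horse-30",
    "StanfordDogs", "AP-10K", "iRodent", "APT-36K",
)
# Benchmark -> dataset it is drawn from (Horse-10 is built on Horse-30).
BENCHMARKS: Dict[str, str] = {
    "AP-10K": "AP-10K",
    "AnimalPose": "AnimalPose",
    "Horse-10": "Horse-30",
    "iRodent": "iRodent",
}


def leave_one_out_splits(
    datasets: Sequence[str] = ALL_DATASETS,
    benchmarks: Dict[str, str] = BENCHMARKS,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (benchmark, training datasets) with the benchmark's source held out."""
    for benchmark, source in benchmarks.items():
        yield benchmark, [d for d in datasets if d != source]


def released_training_set(datasets: Sequence[str] = ALL_DATASETS) -> List[str]:
    """The released checkpoint is trained on every dataset; nothing is held out."""
    return list(datasets)
```

Because every benchmark source is part of the released model's training set, scores of this checkpoint on those benchmarks are not zero-shot results and should not be compared directly with the leave-one-out numbers in the paper.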

## Training Data

The model was trained jointly on the following datasets:

- **AwA-Pose** Quadruped dataset; see full details at (1).
- **AnimalPose** See full details at (2).
- **AcinoSet** See full details at (3).
- **Horse-30** The benchmark task built on this dataset is called Horse-10; see full details at (4).
- **StanfordDogs** See full details at (5, 6).
- **AP-10K** See full details at (7).
- **iRodent** We used the iNaturalist API functions to scrape observations
with the taxon ID of the suborder Myomorpha (8). These functions allowed us to filter the large number of observations down to
those with photos under the CC BY-NC Creative Commons license. The most common types of rodents among the collected observations are
Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid
Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse
(Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then
generated segmentation masks over the target animals by processing the media through an algorithm we designed that
uses a Mask Region-based Convolutional Neural Network (Mask R-CNN) (9) with a ResNet-50-FPN backbone (10),
pretrained on the COCO dataset (11). The 443 processed images were then manually labeled with both pose annotations and
segmentation masks. The iRodent data is banked at https://zenodo.org/record/8250392.
- **APT-36K** See full details at (12).

Here is an image with the keypoint guide:
<p align="center">
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" width="95%">
</p>

Please note that each dataset was labeled by separate labs and separate individuals. Therefore, while we map names
to a unified pose vocabulary (found here: https://github.com/AdaptiveMotorControlLab/modelzoo-figures), there will be annotator bias in keypoint placement (see the Supplementary Note on annotator bias in Ye et al. 2023).
The dataset is also highly diverse across species, but collectively it has more representation of domesticated animals such as dogs, cats, horses, and cattle.
If performance is not as good as you need, we recommend first trying video adaptation (see Ye et al. 2023)
or fine-tuning these weights with your own labeled data.
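
Mapping each dataset to the unified vocabulary amounts to reordering its keypoints into one shared list and marking keypoints that a dataset does not label as missing. The following is a minimal sketch in which `UNIFIED`, the example keypoint names, and the `mapping` dict are hypothetical placeholders; the actual vocabulary is in the modelzoo-figures repository linked above.

```python
from typing import Dict, List, Sequence

import numpy as np

# Hypothetical excerpt of a unified keypoint vocabulary (not the real list).
UNIFIED: List[str] = ["nose", "left_eye", "right_eye", "tail_base"]


def to_unified(keypoints: np.ndarray, names: Sequence[str],
               mapping: Dict[str, str], unified: Sequence[str] = UNIFIED) -> np.ndarray:
    """Reorder one annotation's (K, 2) keypoints into the unified vocabulary.

    names:   the dataset's own keypoint names, in the same order as keypoints
    mapping: dataset name -> unified name (hypothetical; one dict per dataset)
    Unified keypoints that the dataset does not label stay NaN, so downstream
    code can treat them as missing rather than as (0, 0).
    """
    index = {name: i for i, name in enumerate(unified)}
    out = np.full((len(unified), 2), np.nan)
    for xy, name in zip(np.asarray(keypoints, dtype=float), names):
        target = mapping.get(name)
        if target in index:
            out[index[target]] = xy
    return out
```

This only aligns names. It cannot remove the annotator bias described above, because two labs may place the same named keypoint at slightly different anatomical locations.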

## Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited.

## Caveats and Recommendations

• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal
characteristics not well represented in the training data.

## License

Modified MIT.

Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software:

This software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the MODEL is a research tool.
THE MODEL IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL
OR THE USE OR OTHER DEALINGS IN THE MODEL.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
(mackenzie@post.harvard.edu) and/or the TTO office at EPFL (tto@epfl.ch) for a commercial use license.

Please cite **Ye et al.** if you use this model in your work: https://arxiv.org/abs/2203.07436v2.

## References

1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. arXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. AcinoSet: A 3D pose estimation dataset and baseline models for cheetahs in the wild. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W. Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. AP-10K: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
8. iNaturalist. GBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.
12. Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. APT-36K: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022.