Orchid2024: A Cultivar-Level Dataset and Methodology for Fine-Grained Classification of Chinese Cymbidium Orchids
Orchid2024 is a fine-grained classification dataset specifically designed for Chinese Cymbidium orchid cultivars. It includes data collected from 20 cities across 12 provincial administrative regions in China and encompasses 1,269 cultivars from 8 Chinese Cymbidium orchid species and 6 additional categories, totaling 156,630 images. The dataset covers nearly all common Chinese Cymbidium cultivars currently found in China; its fine granularity and real-world focus make it a unique and practical resource for researchers and practitioners. Our work introduces various visual parameter-efficient fine-tuning methods to expedite model development, achieving a best top-1 accuracy of 86.14% and top-5 accuracy of 95.44%. The use of this dataset is permitted for non-commercial research and educational activities only.
For a preview of the dataset structure, please refer to: Orchid2024-dataset-structure. The dataset can be downloaded via the following links:
- Compressed version (max side length 512 pixels): Download Link (~3.8 GB)
- Full version: Download Link (~10.4 GB)
The dataset comprises training images, training information files, and data description files. The training images are divided into training, validation, and test sets in a ratio of 6:2:2, with the image directory structure outlined in datasets/orchid2024. The training information files train.json, val.json, and test.json are generated by datasets/create_json.py and used for model training. The datasets/orchid2024/classes.csv file contains the index and name of each cultivar in the Orchid2024 dataset.
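As a quick sanity check from the project root, the class list and split files can be inspected as follows (the exact paths assume the default layout described above and the output location of datasets/create_json.py):
ls datasets/orchid2024/                    # image folders, classes.csv and the generated train/val/test JSON files
head -n 5 datasets/orchid2024/classes.csv  # index and name of each cultivar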
Research home page: https://pengyingshu.github.io/orchid/Orchid2024.
- configs: configuration file storage directory
- datasets: dataset storage directory
- src/configs: handles config parameters for the experiments
- src/configs/config.py: main config setups for experiments and explanations for each of them
- src/data: loading and setup of input datasets
- src/engine: main training and eval actions
- src/models: handles backbone architectures and heads for the different fine-tuning protocols
- src/models/vit_prompt: a folder containing the same backbones as the vit_backbones folder, specified for VPT; it should contain the same file names as those in vit_backbones
- src/models/vit_models.py: main model for transformer-based models
- src/models/build_model.py: main entry point that uses the config to build the model for training / eval
- src/solver: optimization, losses, and learning rate schedules
- src/utils: helper functions for io, logging, training, and visualization
- run.py: call this one for training and evaluating a model with a specified transfer type
Please refer to the env_setup.sh file or execute the command "pip install -r requirements.txt" to install all necessary dependencies.
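For example, a minimal setup in a fresh virtual environment could look like the following (the environment name is illustrative, and env_setup.sh may already cover these steps):
python3 -m venv orchid-env          # create an isolated environment (name is arbitrary)
source orchid-env/bin/activate      # activate it
pip install -r requirements.txt     # install the project dependencies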
System Requirements:
- Supported operating systems: Ubuntu 18.04 LTS to 22.04 LTS
- Supported Python versions: 3.8 to 3.10
- Windows compatibility notice: due to potential library reference issues, the Transformer model loading code may require adjustments to run correctly on Windows. One known issue is the Transformer KeyError; for further details and potential solutions, please refer to this GitHub issue: jeonsworld/ViT-pytorch#11.
The Orchid dataset is stored in datasets/Orchid2024. For both the Orchid dataset and custom datasets, corresponding training information files must be generated. Run the following command from the project root directory to generate them in the Orchid2024 dataset directory:
python datasets/create_json.py
Then, modify the DATA.DATAPATH and DATA.NUMBER_CLASSES items in the configuration file. For information on how to modify the configuration file, please refer to the Config file description.
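Both items can also be overridden at runtime instead of editing the yaml file; the dataset path and class count below are examples, not fixed values:
python run.py --config-file configs/linear/orchid_linear.yaml \
    DATA.DATAPATH "datasets/Orchid2024" \
    DATA.NUMBER_CLASSES <number-of-classes>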
By default, the project loads a pre-trained model. The configuration file specifies the pre-trained model name via DATA.FEATURE; based on this setting, the pre-trained model should be downloaded to the directory specified by MODEL.MODEL_ROOT under the corresponding name.
All the model links are obtained directly from official download repositories like ViT, MAE and Swin.
Please note that for models downloaded from the "link" column, DATA.FEATURE must be set to the corresponding value in the "name" column and the downloaded checkpoint file renamed accordingly. Details as follows:
ViT (Vision Transformer)
By default, the DATA.FEATURE configuration is set to "sup_vitb16_imagenet21k", indicating the use of a Vision Transformer (ViT) B/16 model pre-trained on the ImageNet-21k dataset. Download the pre-trained model to the location specified by MODEL.MODEL_ROOT (typically model/pretrain). For the "sup_vitb16_imagenet21k" setting, the downloaded checkpoint file should be renamed from ViT-B_16.npz to imagenet21k_ViT-B_16.npz so that it is identified correctly (see the example after the table below).
ImageNet-21k pre-training
| name | link |
| --- | --- |
| sup_vitb8_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_8.npz |
| sup_vitb16_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz |
| sup_vitb32_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_32.npz |
| sup_vitl16_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-L_16.npz |
| sup_vitl32_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-L_32.npz |
| sup_vith14_imagenet21k | https://storage.googleapis.com/vit_models/imagenet21k/ViT-H_14.npz |
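For example, the default ViT-B/16 checkpoint can be downloaded and renamed in one step (model/pretrain follows the typical MODEL.MODEL_ROOT location mentioned above):
mkdir -p model/pretrain
wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz \
    -O model/pretrain/imagenet21k_ViT-B_16.npz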
ImageNet-21k pre-training / ImageNet-1k fine-tuning
MAE (Masked Autoencoder)
ImageNet-1k pre-training
Swin (Swin Transformer)
ImageNet-1k pre-training or ImageNet-21k (also called ImageNet-22k) pre-training
After preparing the dataset and pre-trained model, execute one of the following commands (single GPU only):
training
Run the code directly using the settings in a yaml file:
python run.py --config-file configs/prompt/orchid_prompt_deep.yaml
Or specify yaml configuration parameters directly in the command:
python run.py --config-file configs/linear/orchid_linear.yaml \
MODEL.TYPE "vit" \
DATA.FEATURE "sup_vitb16_imagenet21k"
eval
You can find the best evaluation model here: Download Link (~332 MB).
python run.py --eval --eval-dataset 'test' --config-file \
configs/finetune/orchid_lora.yaml \
MODEL.LORA.TUNE_KEY True MODEL.LORA.TUNE_OUT True \
MODEL.WEIGHT_PATH <path>/best_model.pth
Build diverse models by adjusting various parameters in the configuration file. All of the following parameters can be modified within the respective yaml file or specified as runtime parameters when executing the training code (see the example command after this list).
- Base configuration
- _BASE_: path to the base configuration file
- OUTPUT_DIR: output directory
- SEED: random seed
- Data configuration
- DATA.NAME: name of the Python dataclass used to read image data, rather than the dataset name
- DATA.DATAPATH: directory location of the datasets
- DATA.NUMBER_CLASSES: number of dataset categories
- DATA.FEATURE: specify which representation to use
- DATA.CROPSIZE: input crop size, 224 by default (or 384)
- DATA.BATCH_SIZE: batch size
- Model configuration
- MODEL.TRANSFER_TYPE: fine-tuning method specification
- MODEL.TYPE: the general backbone type, e.g., "vit" or "ssl-vit"
- MODEL.MODEL_ROOT: folder with pre-trained model checkpoints
- MODEL.SAVE_CKPT: if set to True, will save model ckpts and final output of both val and test set
- MODEL.WEIGHT_PATH: path to the model weight file, usually used to test model accuracy
- Solver configuration
- SOLVER.BASE_LR: initial learning rate for the experiment
- SOLVER.WARMUP_EPOCH: number of warm-up epochs
- SOLVER.WEIGHT_DECAY: weight decay value for the experiment
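For example, several of these items can be overridden at runtime without editing the yaml file (the values shown are illustrative rather than recommended settings):
python run.py --config-file configs/linear/orchid_linear.yaml \
    SEED 42 \
    DATA.BATCH_SIZE 64 \
    SOLVER.BASE_LR 0.001 \
    SOLVER.WARMUP_EPOCH 5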
Full-parameter fine-tuning (end2end)
configs/finetune/orchid_finetune.yaml
update all backbone and classification head parameters
- MODEL.TRANSFER_TYPE "end2end"
Linear
configs/linear/orchid_linear.yaml
employ a linear layer as the classification head and solely update its parameters
- MODEL.TRANSFER_TYPE "linear"
Partial-k
configs/finetune/orchid_partial1.yaml configs/finetune/orchid_partial2.yaml
fine-tune the last k layers of the backbone while freezing the others
- MODEL.TRANSFER_TYPE "partial-1"/"partial-2"/"partial-4"
Prompt
configs/prompt/orchid_prompt_deep.yaml configs/prompt/orchid_prompt_shallow.yaml
visual prompt tuning
- MODEL.TRANSFER_TYPE "prompt"
- MODEL.PROMPT.DEEP: deep or shallow prompt
- MODEL.PROMPT.NUM_TOKENS: prompt length
- MODEL.PROMPT.DROPOUT: the dropout rate to be used in the prompt
MLP-k
employ a multilayer perceptron (MLP) with k layers as the classification head, in place of a linear layer.
- MODEL.TRANSFER_TYPE "linear"
- MODEL.MLP_NUM: number of layers in the MLP excluding the classification head
Side tuning(side)
configs/linear/orchid_side.yaml
Train a "side" network(e.g., AlexNet) and linearly interpolate between pre-trained features and side features to generate a new vector which is fed into the classification head.
- MODEL.TRANSFER_TYPE "side"
Bias tuning (tinytl-bias)
configs/finetune/orchid_tinytl-bias.yaml
fine-tune only the bias modules of the backbone
- MODEL.TRANSFER_TYPE "tinytl-bias"
Adapter
configs/finetune/orchid_adapter.yaml
insert new MLP modules with residual connection inside Transformer layers
- MODEL.TRANSFER_TYPE "adapter"
- MODEL.ADAPTER.REDUCATION_FACTOR: defines the ratio between a model’s layer hidden dimension and the bottleneck dimension
LoRA
configs/finetune/orchid_lora.yaml
introduce low-rank adaptation into each attention module of the Vision Transformer (see the example command after this list)
- MODEL.TRANSFER_TYPE "lora"
- MODEL.LORA.RANK: rank of the LoRA decomposition
- MODEL.LORA.TUNE_QUERY: whether to add the query weight to LoRA
- MODEL.LORA.TUNE_VALUE: whether to add the value weight to LoRA
- MODEL.LORA.TUNE_KEY: whether to add the key weight to LoRA
- MODEL.LORA.TUNE_OUT: whether to add the output projection weight to LoRA
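For example, a LoRA training run that also adapts the key and output-projection weights (mirroring the evaluation command shown earlier) could be launched as:
python run.py --config-file configs/finetune/orchid_lora.yaml \
    MODEL.TRANSFER_TYPE "lora" \
    MODEL.LORA.TUNE_KEY True MODEL.LORA.TUNE_OUT True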
We can select the desired model architecture by specifying MODEL.TYPE and choose a particular variant of that architecture with DATA.FEATURE (see the example command after this list).
ViT-B
- MODEL.TYPE "vit"
- DATA.FEATURE "sup_vitb16_imagenet21k"
MAE
- MODEL.TYPE "ssl-vit"
- DATA.FEATURE "mae_vitb16"
Swin-B
- MODEL.TYPE "swin"
- DATA.FEATURE "swinb_imagenet22k_224"
ResNet50
- MODEL.TYPE "resnet"
- DATA.FEATURE "imagenet_sup_rn50"
- MODEL.PROMPT.LOCATION "pad"
- MODEL.PROMPT.NUM_TOKENS "5"
We utilize code derived from VPT and LoRA-ViT, and extend our gratitude to the authors for sharing their work.
If you find the Orchid2024 dataset valuable for your research, please cite it as follows:
@article{Peng2024Orchid,
title={Orchid2024: A cultivar-level dataset and methodology for fine-grained classification of Chinese Cymbidium Orchids},
author={Peng, Yingshu and Zhou, Yuxia and Zhang, Li and Fu, Hongyan and Tang, Guimei and Huang, Guolin and Li, Weidong},
journal={Plant Methods},
volume={20},
number={1},
pages={124},
year={2024},
}