Generating Pre-Training Data for TAPAS #174

DominikKowieski · 2023-05-15T07:21:54Z

Hello,

I am trying to reproduce the whole training process with German data.
I have already collected data for fine-tuning, but I am struggling to understand how the pre-training data is obtained.
Based on PRETRAIN_DATA.md (https://github.com/google-research/tapas/blob/9f2163958d1a6ffa15b9ac346eebe0a140460fb9/PRETRAIN_DATA.md), I understand that one has to extract the data in the proto text format and then convert it into TF examples with the "tapas/create_pretrain_examples_main.py" script.
What I don't understand is how this data was obtained in the first place, and in particular how to fill in the values for the question keys.
Am I missing something? Thanks in advance.
