
# How to Ingest

> A simple guide to ingesting Parquet files of Farcaster data

<Info>
  ### Ingestion code is available in this [GitHub repo](https://github.com/neynarxyz/neynar_parquet_importer). Clone it onto a server with a large disk and you should be importing in no time.

  Reach out to us for credentials to try it out.
</Info>

1. Install Homebrew:

<CodeGroup>
  ```bash cURL theme={"system"}
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```
</CodeGroup>

1. Install Amazon’s command line tool and the Parquet CLI:

<CodeGroup>
  ```bash cURL theme={"system"}
  brew install awscli parquet-cli
  ```
</CodeGroup>

1. Configure Amazon’s command line tool:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws configure --profile neynar_parquet_exports
  ```
</CodeGroup>

Enter the following values when prompted:

<CodeGroup>
  ```bash cURL theme={"system"}
  AWS Access Key ID [None]: the username from your 1Password entry
  AWS Secret Access Key [None]: the password from your 1Password entry
  Default region name [None]: us-east-1
  Default output format [None]: json
  ```
</CodeGroup>

1. Set this new profile as the default for your current shell session (or pass `--profile neynar_parquet_exports` to each `aws` command):

<CodeGroup>
  ```bash cURL theme={"system"}
  export AWS_PROFILE=neynar_parquet_exports
  ```
</CodeGroup>

1. List all the full (archive) exports:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/
  ```
</CodeGroup>

The output will look something like this (your timestamps will likely differ):

<CodeGroup>
  ```bash cURL theme={"system"}
  2024-03-28 16:20:05          0
  2024-03-29 14:34:06 1877462159 farcaster-casts-0-1711678800.parquet
  2024-03-29 14:39:11   21672633 farcaster-fids-0-1711678800.parquet
  2024-03-29 14:40:07   15824832 farcaster-fnames-0-1711678800.parquet
  2024-03-29 14:50:44 2823873376 farcaster-links-0-1711678800.parquet
  2024-03-29 14:35:42 2851749377 farcaster-reactions-0-1711678800.parquet
  2024-03-29 14:35:54   22202796 farcaster-signers-0-1711678800.parquet
  2024-03-29 14:35:55   12937057 farcaster-storage-0-1711678800.parquet
  2024-03-29 14:35:57   67192450 farcaster-user_data-0-1711678800.parquet
  2024-03-29 14:35:59   72782965 farcaster-verifications-0-1711678800.parquet
  ```
</CodeGroup>

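The filename format is `${DATABASE}-${TABLE}-${START_TIME}-${END_TIME}.parquet`. `START_TIME` and `END_TIME` are Unix timestamps in seconds, and together they bound the `updated_at` values of the rows in the file. Full exports start at `0`.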
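
As a rough sketch of how this naming scheme can be taken apart in a script, the helper below splits a filename into its four fields with POSIX parameter expansion. `parse_export_name` is a hypothetical name, not part of any Neynar tooling, and the sketch assumes table names never contain a hyphen, which holds for the files listed above.

```bash
# Hypothetical helper: split an export filename into its fields.
# Prints "DATABASE TABLE START_TIME END_TIME" on one line.
# The body runs in a subshell, so its variables do not leak into your session.
parse_export_name() (
  name=${1##*/}            # drop any leading directory or S3 prefix
  name=${name%.parquet}    # drop the extension
  end_time=${name##*-}     # last hyphen-separated field
  name=${name%-*}
  start_time=${name##*-}   # second-to-last field
  name=${name%-*}
  database=${name%%-*}     # first field
  table=${name#*-}         # the remainder (assumes no hyphen in table names)
  printf '%s %s %s %s\n' "$database" "$table" "$start_time" "$end_time"
)

parse_export_name farcaster-user_data-0-1711678800.parquet
```

Subtracting the third field from the fourth gives the length of the window a file covers, in seconds.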

The first time you build your database, you probably want to fetch the latest full export of each table.
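
One way to pick those files out of a listing is sketched below. `latest_full_exports` is a hypothetical helper that reads `aws s3 ls` output on stdin and keeps, for each table, the file with the largest `END_TIME`. It assumes the four-column listing format shown above and hyphen-free table names.

```bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print the newest
# full export (largest END_TIME) for each table, one filename per line.
latest_full_exports() {
  awk '
    $4 ~ /\.parquet$/ {
      n = split($4, part, "-")      # DATABASE-TABLE-START_TIME-END_TIME.parquet
      table = part[2]
      end_time = part[n] + 0        # "1711678800.parquet" converts to 1711678800
      if (!(table in best) || end_time > best[table]) {
        best[table] = end_time
        file[table] = $4
      }
    }
    END { for (t in file) print file[t] }
  ' | sort
}

# Usage (needs the credentials configured earlier):
#   aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/ | latest_full_exports
```

The listing's first row has no filename, so the `.parquet` match skips it.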

1. List all the incremental exports:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/
  ```
</CodeGroup>

<CodeGroup>
  ```bash cURL theme={"system"}
  2024-03-28 16:20:05          0
  2024-04-09 11:14:29    1011988 farcaster-casts-1712685900-1712686200.parquet
  2024-04-09 11:14:25     200515 farcaster-fids-1712685900-1712686200.parquet
  2024-04-09 11:14:25     231552 farcaster-fnames-1712685900-1712686200.parquet
  2024-04-09 11:14:30     827338 farcaster-links-1712685900-1712686200.parquet
  2024-03-29 14:35:42   51749377 farcaster-reactions-1712685900-1712686200.parquet
  2024-04-09 11:14:26       8778 farcaster-signers-1712685900-1712686200.parquet
  2024-04-09 11:14:26       6960 farcaster-storage-1712685900-1712686200.parquet
  2024-04-09 11:14:30    1012332 farcaster-user_data-1712685900-1712686200.parquet
  2024-04-09 11:14:30      10909 farcaster-verifications-1712685900-1712686200.parquet
  ```
</CodeGroup>
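
Each file in this sample covers `1712686200 - 1712685900 = 300` seconds. After loading a full export, you will likely want only the incremental files that begin at or after that export's `END_TIME`. That catch-up rule is an assumption made for illustration, not documented behavior, and `incrementals_since` is a hypothetical helper name. A minimal sketch:

```bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print the
# incremental files whose START_TIME is >= the cutoff, oldest first.
# Usage: incrementals_since CUTOFF_UNIX_SECONDS
incrementals_since() {
  awk -v cutoff="$1" '
    $4 ~ /\.parquet$/ {
      n = split($4, part, "-")        # START_TIME is the second-to-last field
      start_time = part[n - 1] + 0
      if (start_time >= cutoff + 0) print start_time, $4
    }
  ' | sort -n | cut -d' ' -f2
}

# Usage (needs the credentials configured earlier):
#   aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ \
#     | incrementals_since 1711678800
```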

1. List all the files for a specific time range:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ | grep "\-1712685900\-1712686200"
  ```
</CodeGroup>

1. Download a specific file:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws s3 cp \
  	s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/farcaster-fids-1712685900-1712686200.parquet \
  	~/Downloads/farcaster-fids-1712685900-1712686200.parquet
  ```
</CodeGroup>

1. Download all the tables for a specific time range:

<CodeGroup>
  ```bash cURL theme={"system"}
  aws s3 cp s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ ~/Downloads/ \
      --recursive \
      --exclude "*" \
      --include "*-1712685900-1712686200.parquet"
  ```
</CodeGroup>
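
If you run this often, the command can be wrapped in a small function. The sketch below is one possible shape: `download_range` is a hypothetical name, and the default destination of `~/Downloads/` simply mirrors the example above.

```bash
# Hypothetical wrapper around the `aws s3 cp` command shown above.
# Usage: download_range START_TIME END_TIME [DEST_DIR]
download_range() {
  dr_start=$1
  dr_end=$2
  dr_dest=${3:-"$HOME/Downloads/"}
  # Accept only plain integers, so a typo cannot widen the --include pattern.
  for dr_value in "$dr_start" "$dr_end"; do
    case $dr_value in
      ''|*[!0-9]*)
        echo "usage: download_range START_TIME END_TIME [DEST_DIR]" >&2
        return 2
        ;;
    esac
  done
  aws s3 cp s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ "$dr_dest" \
    --recursive \
    --exclude "*" \
    --include "*-${dr_start}-${dr_end}.parquet"
}

# Usage:
#   download_range 1712685900 1712686200
```

Adding `--dryrun` to the `aws s3 cp` call prints what would be copied without downloading anything, which is a cheap way to check a pattern first.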

1. View the Parquet CLI’s available commands:

<CodeGroup>
  ```bash cURL theme={"system"}
  parquet --help
  ```
</CodeGroup>

1. Preview the first rows of a file:

<CodeGroup>
  ```bash cURL theme={"system"}
  parquet head ~/Downloads/farcaster-fids-1712685900-1712686200.parquet
  ```
</CodeGroup>

<CodeGroup>
  ```bash cURL theme={"system"}
  {"created_at": 1711832371491883, "updated_at": 1713814200213000, "custody_address": "F\u009Aè\u0091¾Vc\u0094Ô\u008Aô\u009F\ní\u0017\u0090\u009Bd\u0093«", "fid": 421819}
  {"created_at": 1711832359411772, "updated_at": 1713814200246000, "custody_address": "\u0098ªÜvÌí½Í\fiî\\\u00919\u0011S\u001Ba\u0099\u009E", "fid": 421818}
  {"created_at": 1711832371493221, "updated_at": 1713814200271000, "custody_address": "=Ï\u0099fÅ\u0084\u007FLð\b\"u\u0005\u0093\u000B\u000B\u0099µ}ã", "fid": 421820}
  {"created_at": 1711832391626517, "updated_at": 1713814200357000, "custody_address": "\u0014é\u0089PO©ÉþÓòM\u0083Ü.\u0016H\u008CMef", "fid": 421821}
  {"created_at": 1711832399774843, "updated_at": 1713814200426000, "custody_address": "o^MoÎÔÎÄêMjwÌÒlïXC\u0096°", "fid": 421822}
  {"created_at": 1711832399778591, "updated_at": 1713814200463000, "custody_address": "­D¼ãñå\u0080ÿi\u0092Z­Ì\u0093¢´\u001E¡¦$", "fid": 421823}
  {"created_at": 1711832431907945, "updated_at": 1713814200502000, "custody_address": "\u0015\u0091þ!1c\n\u008E\u0092>V\u0006ä!\u0014E\"\u0017ÄÐ", "fid": 421824}
  {"created_at": 1711832431907986, "updated_at": 1713814200608000, "custody_address": "óic\u0006!p\u0004Ý\u0005e\u001CÙ½1\u009CU¤\u0091*2", "fid": 421825}
  {"created_at": 1711832456106275, "updated_at": 1713814200903000, "custody_address": "\u00186ê¨ Âé·Ì-\u0092\u0092t¨\u0006a\u0099`\u0005\u0084", "fid": 421826}
  {"created_at": 1711832480265145, "updated_at": 1713814201318000, "custody_address": "(SÞ\u008EÏ\u009Cbû4ÛÙn\u0014+?èÑb\u0089¡", "fid": 421827}
  ```
</CodeGroup>
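
The `created_at` and `updated_at` values in this sample are 16-digit integers, which suggests microseconds since the Unix epoch. That is an inference from the sample, not a documented guarantee. The `custody_address` strings look garbled because the column appears to hold raw bytes that `parquet head` prints as text. As a small sketch under the microsecond assumption, the hypothetical helper `micros_to_utc` turns one of these values into a readable UTC time:

```bash
# Hypothetical helper: convert a microsecond epoch value to an ISO 8601 UTC time.
# Assumes created_at / updated_at are microseconds since the Unix epoch.
micros_to_utc() (
  seconds=$(( $1 / 1000000 ))   # integer division drops the sub-second part
  # GNU date accepts "-d @SECONDS"; BSD/macOS date accepts "-r SECONDS".
  date -u -d "@$seconds" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$seconds" +%Y-%m-%dT%H:%M:%SZ
)

micros_to_utc 1711832371491883   # created_at of fid 421819 in the sample above
```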
