How to ingest

A simple guide to ingesting Parquet files of Farcaster data

Ingestion code is available in this GitHub repo. Clone the repo onto a server with a large disk and you should be importing in no time.

Reach out to us for credentials to try it out.

  1. Install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  1. Install Amazon’s command line tool and the Parquet CLI:
brew install awscli parquet-cli
  1. Configure Amazon’s command line tool:
aws configure --profile neynar_parquet_exports
AWS Access Key ID [None]: the username from your 1Password entry
AWS Secret Access Key [None]: the password from your 1Password entry
Default region name [None]: us-east-1
Default output format [None]: json
  1. Set this new profile to be the default (or you can use --profile ... on all of your aws commands):
export AWS_PROFILE=neynar_parquet_exports
  1. List all the full (archive) exports:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/

The output will look something like this (your timestamps will likely differ):

2024-03-28 16:20:05          0 
2024-03-29 14:34:06 1877462159 farcaster-casts-0-1711678800.parquet
2024-03-29 14:39:11   21672633 farcaster-fids-0-1711678800.parquet
2024-03-29 14:40:07   15824832 farcaster-fnames-0-1711678800.parquet
2024-03-29 14:50:44 2823873376 farcaster-links-0-1711678800.parquet
2024-03-29 14:35:42 2851749377 farcaster-reactions-0-1711678800.parquet
2024-03-29 14:35:54   22202796 farcaster-signers-0-1711678800.parquet
2024-03-29 14:35:55   12937057 farcaster-storage-0-1711678800.parquet
2024-03-29 14:35:57   67192450 farcaster-user_data-0-1711678800.parquet
2024-03-29 14:35:59   72782965 farcaster-verifications-0-1711678800.parquet

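The filename format is ${DATABASE}-${TABLE}-${START_TIME}-${END_TIME}.parquet. START_TIME and END_TIME are Unix timestamps in seconds that bound the updated_at column of the rows in the file. Full exports start at 0, as in the listing above.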
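To make the naming scheme concrete, here is a minimal sketch that splits a filename into its four parts. `parse_export_name` is a hypothetical helper (it is not part of the ingestion repo), and it assumes the database name contains no hyphen, which holds for the files listed above.

```shell
# parse_export_name is a hypothetical helper for illustration only.
# It assumes DATABASE has no hyphen; TABLE may contain underscores.
parse_export_name() {
  name=$(basename "$1" .parquet)   # drop any directory and the extension
  end=${name##*-}                  # text after the last hyphen
  rest=${name%-*}
  start=${rest##*-}                # next-to-last hyphen-separated field
  rest=${rest%-*}
  table=${rest#*-}                 # text after the first hyphen
  database=${rest%%-*}             # text before the first hyphen
  echo "$database $table $start $end"
}

parse_export_name farcaster-user_data-0-1711678800.parquet
# prints: farcaster user_data 0 1711678800
```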

The first time you build your database, you will probably want to fetch the latest version of each table.
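One way to pick those files is to filter the listing itself. The sketch below assumes that "latest" means the largest END_TIME per table. `latest_full_exports` is a hypothetical helper that reads `aws s3 ls` output on stdin, so it can be tried on saved output without credentials.

```shell
# latest_full_exports is a hypothetical helper for illustration only.
# For each table it prints the filename with the largest END_TIME.
latest_full_exports() {
  awk '
    $4 ~ /\.parquet$/ {
      name = $4
      sub(/\.parquet$/, "", name)
      n = split(name, parts, "-")
      end_ts = parts[n] + 0
      # The table name is everything between DATABASE and the two timestamps.
      table = parts[2]
      for (i = 3; i <= n - 2; i++) table = table "-" parts[i]
      if (!(table in best_end) || end_ts > best_end[table]) {
        best_end[table] = end_ts
        best_file[table] = $4
      }
    }
    END { for (t in best_file) print best_file[t] }
  ' | sort
}

# Usage (needs the AWS profile configured earlier):
# aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/ | latest_full_exports
```

Each printed filename can then be passed to `aws s3 cp` with the same `full/` prefix.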

  1. List all the incremental exports:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/
2024-03-28 16:20:05          0 
2024-04-09 11:14:29    1011988 farcaster-casts-1712685900-1712686200.parquet
2024-04-09 11:14:25     200515 farcaster-fids-1712685900-1712686200.parquet
2024-04-09 11:14:25     231552 farcaster-fnames-1712685900-1712686200.parquet
2024-04-09 11:14:30     827338 farcaster-links-1712685900-1712686200.parquet
2024-03-29 14:35:42   51749377 farcaster-reactions-1712685900-1712686200.parquet
2024-04-09 11:14:26       8778 farcaster-signers-1712685900-1712686200.parquet
2024-04-09 11:14:26       6960 farcaster-storage-1712685900-1712686200.parquet
2024-04-09 11:14:30    1012332 farcaster-user_data-1712685900-1712686200.parquet
2024-04-09 11:14:30      10909 farcaster-verifications-1712685900-1712686200.parquet
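In this sample listing each incremental file spans 300 seconds (1712686200 - 1712685900), and the start is a multiple of 300. Assuming that pattern holds in general, which the listing alone does not guarantee, the window containing a given Unix timestamp can be computed as in this sketch. `incremental_window` and `WINDOW_SECONDS` are hypothetical names.

```shell
# incremental_window is a hypothetical helper for illustration only.
# Assumption: incremental exports cover fixed windows aligned to multiples
# of WINDOW_SECONDS. Adjust the value if your listing shows another interval.
WINDOW_SECONDS=300

incremental_window() {
  ts=$1
  start=$(( ts - ts % WINDOW_SECONDS ))   # round down to the window start
  echo "${start}-$(( start + WINDOW_SECONDS ))"
}

incremental_window 1712686000
# prints: 1712685900-1712686200
```

The printed `START-END` string is the suffix to look for in the incremental filenames.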
  1. List all the files for a specific time range:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ | grep -F -- "-1712685900-1712686200.parquet"
  1. Download a specific file:
aws s3 cp \
	s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/farcaster-fids-1712685900-1712686200.parquet \
	~/Downloads/farcaster-fids-1712685900-1712686200.parquet
  1. Download all the tables for a specific time range:
aws s3 cp s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ ~/Downloads/ \
    --recursive \
    --exclude "*" \
    --include "*-1712685900-1712686200.parquet"
  1. Use the parquet cli:
parquet --help
  1. Preview the first rows of a downloaded file (here, a full export of the fids table):
parquet head ~/Downloads/farcaster-fids-0-1711678800.parquet
{"created_at": 1711832371491883, "updated_at": 1713814200213000, "custody_address": "F\u009Aè\u0091¾Vc\u0094Ô\u008Aô\u009F\ní\u0017\u0090\u009Bd\u0093«", "fid": 421819}
{"created_at": 1711832359411772, "updated_at": 1713814200246000, "custody_address": "\u0098ªÜvÌí½Í\fiî\\\u00919\u0011S\u001Ba\u0099\u009E", "fid": 421818}
{"created_at": 1711832371493221, "updated_at": 1713814200271000, "custody_address": "=Ï\u0099fÅ\u0084\u007FLð\b\"u\u0005\u0093\u000B\u000B\u0099µ}ã", "fid": 421820}
{"created_at": 1711832391626517, "updated_at": 1713814200357000, "custody_address": "\u0014é\u0089PO©ÉþÓòM\u0083Ü.\u0016H\u008CMef", "fid": 421821}
{"created_at": 1711832399774843, "updated_at": 1713814200426000, "custody_address": "o^MoÎÔÎÄêMjwÌÒlïXC\u0096°", "fid": 421822}
{"created_at": 1711832399778591, "updated_at": 1713814200463000, "custody_address": "­D¼ãñå\u0080ÿi\u0092Z­Ì\u0093¢´\u001E¡¦$", "fid": 421823}
{"created_at": 1711832431907945, "updated_at": 1713814200502000, "custody_address": "\u0015\u0091þ!1c\n\u008E\u0092>V\u0006ä!\u0014E\"\u0017ÄÐ", "fid": 421824}
{"created_at": 1711832431907986, "updated_at": 1713814200608000, "custody_address": "óic\u0006!p\u0004Ý\u0005e\u001CÙ½1\u009CU¤\u0091*2", "fid": 421825}
{"created_at": 1711832456106275, "updated_at": 1713814200903000, "custody_address": "\u00186ê¨ Âé·Ì-\u0092\u0092t¨\u0006a\u0099`\u0005\u0084", "fid": 421826}
{"created_at": 1711832480265145, "updated_at": 1713814201318000, "custody_address": "(SÞ\u008EÏ\u009Cbû4ÛÙn\u0014+?èÑb\u0089¡", "fid": 421827}
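In the output above, created_at and updated_at are 16-digit integers, which suggests microseconds since the Unix epoch, while the filename bounds are in seconds. The custody_address values appear to be raw bytes printed as characters rather than hex strings. As a rough sketch, a microsecond value can be converted to a readable UTC time like this. `us_to_utc` is a hypothetical helper, and the microsecond interpretation is an assumption based on the sample rows.

```shell
# us_to_utc is a hypothetical helper for illustration only.
# Assumption: the input is microseconds since the Unix epoch.
us_to_utc() {
  secs=$(( $1 / 1000000 ))   # drop the microsecond part
  # GNU date takes -d @SECONDS; BSD/macOS date takes -r SECONDS.
  date -u -d "@$secs" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$secs" +%Y-%m-%dT%H:%M:%SZ
}

us_to_utc 1711832371491883
# prints: 2024-03-30T20:59:31Z
```

Dividing updated_at by 1,000,000 also gives a value that can be compared directly with the START_TIME and END_TIME in a filename.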