How to ingest
A simple guide to ingesting the Parquet files for Farcaster data.
Ingestion code is available in this GitHub repo. Clone the repo onto a server with a large disk and you should be importing in no time.
Reach out to us for credentials to try it out.
- Install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Amazon’s command line tool and the Parquet CLI:
brew install awscli parquet-cli
- Configure Amazon’s command line tool:
aws configure --profile neynar_parquet_exports
AWS Access Key ID [None]: the username from your 1Password entry
AWS Secret Access Key [None]: the password from your 1Password entry
Default region name [None]: us-east-1
Default output format [None]: json
- Set this new profile to be the default (or you can use --profile ... on all of your aws commands):
export AWS_PROFILE=neynar_parquet_exports
- List all the archive exports:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/
You’ll see some output that will look something like this (the timestamps will likely be different):
2024-03-28 16:20:05 0
2024-03-29 14:34:06 1877462159 farcaster-casts-0-1711678800.parquet
2024-03-29 14:39:11 21672633 farcaster-fids-0-1711678800.parquet
2024-03-29 14:40:07 15824832 farcaster-fnames-0-1711678800.parquet
2024-03-29 14:50:44 2823873376 farcaster-links-0-1711678800.parquet
2024-03-29 14:35:42 2851749377 farcaster-reactions-0-1711678800.parquet
2024-03-29 14:35:54 22202796 farcaster-signers-0-1711678800.parquet
2024-03-29 14:35:55 12937057 farcaster-storage-0-1711678800.parquet
2024-03-29 14:35:57 67192450 farcaster-user_data-0-1711678800.parquet
2024-03-29 14:35:59 72782965 farcaster-verifications-0-1711678800.parquet
The filename format is ${DATABASE}-${TABLE}-${START_TIME}-${END_TIME}.parquet. START_TIME and END_TIME are Unix timestamps in seconds, and they bound the updated_at column of the rows in the file. Full exports start at 0.
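As an illustration of the filename format, the sketch below splits a filename into its four parts using shell parameter expansion. The helper name parse_export_name is hypothetical and not part of any tool, and the sketch assumes the database name itself contains no hyphens.

```shell
#!/usr/bin/env bash
# Split ${DATABASE}-${TABLE}-${START_TIME}-${END_TIME}.parquet into its parts.
# parse_export_name is a hypothetical helper; it assumes DATABASE has no hyphens.
parse_export_name() {
  local name="${1##*/}"         # drop any directory or S3 prefix
  name="${name%.parquet}"       # drop the extension
  local end_ts="${name##*-}"    # last field: END_TIME
  name="${name%-*}"
  local start_ts="${name##*-}"  # next field from the right: START_TIME
  name="${name%-*}"
  local database="${name%%-*}"  # first field: DATABASE
  local table="${name#*-}"      # whatever remains: TABLE
  printf '%s %s %s %s\n' "$database" "$table" "$start_ts" "$end_ts"
}

parse_export_name farcaster-user_data-0-1711678800.parquet
# prints: farcaster user_data 0 1711678800
```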
You probably want to fetch the latest versions of each table the first time you build your database.
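One way to find the latest full export of each table is to filter the listing by END_TIME. The sketch below reads aws s3 ls output on stdin and keeps, per table, the filename with the largest END_TIME. The helper name latest_full_exports is hypothetical, and the sketch assumes the four-column listing layout shown above.

```shell
#!/usr/bin/env bash
# Print, for each database/table pair, the filename with the largest END_TIME.
# Reads `aws s3 ls` output on stdin. latest_full_exports is a hypothetical helper.
latest_full_exports() {
  awk '
    $4 ~ /\.parquet$/ {
      name = $4
      sub(/\.parquet$/, "", name)
      n = split(name, parts, "-")
      end_ts = parts[n] + 0
      # Everything before the last two fields identifies the database and table.
      key = parts[1]
      for (i = 2; i <= n - 2; i++) key = key "-" parts[i]
      if (!(key in best) || end_ts > best[key]) { best[key] = end_ts; file[key] = $4 }
    }
    END { for (key in file) print file[key] }
  ' | sort
}

# Usage (needs the credentials configured earlier):
#   aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/full/ | latest_full_exports
```

Each printed filename can then be passed to aws s3 cp as in the download steps.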
- List all the incremental exports:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/
2024-03-28 16:20:05 0
2024-04-09 11:14:29 1011988 farcaster-casts-1712685900-1712686200.parquet
2024-04-09 11:14:25 200515 farcaster-fids-1712685900-1712686200.parquet
2024-04-09 11:14:25 231552 farcaster-fnames-1712685900-1712686200.parquet
2024-04-09 11:14:30 827338 farcaster-links-1712685900-1712686200.parquet
2024-03-29 14:35:42 51749377 farcaster-reactions-1712685900-1712686200.parquet
2024-04-09 11:14:26 8778 farcaster-signers-1712685900-1712686200.parquet
2024-04-09 11:14:26 6960 farcaster-storage-1712685900-1712686200.parquet
2024-04-09 11:14:30 1012332 farcaster-user_data-1712685900-1712686200.parquet
2024-04-09 11:14:30 10909 farcaster-verifications-1712685900-1712686200.parquet
- List all the files for a specific time range:
aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ | grep "\-1712685900\-1712686200"
- Download a specific file:
aws s3 cp \
s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/farcaster-fids-1712685900-1712686200.parquet \
~/Downloads/farcaster-fids-1712685900-1712686200.parquet
- Download all the tables for a specific time range:
aws s3 cp s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ ~/Downloads/ \
--recursive \
--exclude "*" \
--include "*-1712685900-1712686200.parquet"
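To catch up after loading a full export, you will likely want every incremental file from some point onward rather than a single time range. The sketch below filters aws s3 ls output to files whose START_TIME is at or after a given epoch second, ordered by START_TIME. The helper name incrementals_since is hypothetical. Using the full export's END_TIME as the starting point is an assumption about how the exports line up, so verify it against your data.

```shell
#!/usr/bin/env bash
# Print incremental filenames whose START_TIME is >= the given epoch second,
# ordered by START_TIME. Reads `aws s3 ls` output on stdin.
# incrementals_since is a hypothetical helper.
incrementals_since() {
  local since="$1"
  awk -v since="$since" '
    $4 ~ /\.parquet$/ {
      name = $4
      sub(/\.parquet$/, "", name)
      n = split(name, parts, "-")
      # parts[n-1] is START_TIME and parts[n] is END_TIME.
      if (parts[n - 1] + 0 >= since + 0) print parts[n - 1], $4
    }
  ' | sort -n | cut -d' ' -f2
}

# Usage (needs the credentials configured earlier):
#   aws s3 ls s3://tf-premium-parquet/public-postgres/farcaster/v2/incremental/ \
#     | incrementals_since 1711678800
```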
- Use the parquet cli:
parquet --help
- Check some data:
parquet head ~/Downloads/farcaster-fids-0-1711678800.parquet
{"created_at": 1711832371491883, "updated_at": 1713814200213000, "custody_address": "F\u009Aè\u0091¾Vc\u0094Ô\u008Aô\u009F\ní\u0017\u0090\u009Bd\u0093«", "fid": 421819}
{"created_at": 1711832359411772, "updated_at": 1713814200246000, "custody_address": "\u0098ªÜvÌí½Í\fiî\\\u00919\u0011S\u001Ba\u0099\u009E", "fid": 421818}
{"created_at": 1711832371493221, "updated_at": 1713814200271000, "custody_address": "=Ï\u0099fÅ\u0084\u007FLð\b\"u\u0005\u0093\u000B\u000B\u0099µ}ã", "fid": 421820}
{"created_at": 1711832391626517, "updated_at": 1713814200357000, "custody_address": "\u0014é\u0089PO©ÉþÓòM\u0083Ü.\u0016H\u008CMef", "fid": 421821}
{"created_at": 1711832399774843, "updated_at": 1713814200426000, "custody_address": "o^MoÎÔÎÄêMjwÌÒlïXC\u0096°", "fid": 421822}
{"created_at": 1711832399778591, "updated_at": 1713814200463000, "custody_address": "D¼ãñå\u0080ÿi\u0092ZÌ\u0093¢´\u001E¡¦$", "fid": 421823}
{"created_at": 1711832431907945, "updated_at": 1713814200502000, "custody_address": "\u0015\u0091þ!1c\n\u008E\u0092>V\u0006ä!\u0014E\"\u0017ÄÐ", "fid": 421824}
{"created_at": 1711832431907986, "updated_at": 1713814200608000, "custody_address": "óic\u0006!p\u0004Ý\u0005e\u001CÙ½1\u009CU¤\u0091*2", "fid": 421825}
{"created_at": 1711832456106275, "updated_at": 1713814200903000, "custody_address": "\u00186ê¨ Âé·Ì-\u0092\u0092t¨\u0006a\u0099`\u0005\u0084", "fid": 421826}
{"created_at": 1711832480265145, "updated_at": 1713814201318000, "custody_address": "(SÞ\u008EÏ\u009Cbû4ÛÙn\u0014+?èÑb\u0089¡", "fid": 421827}
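The created_at and updated_at values in the output above are 16-digit integers, which is consistent with microseconds since the Unix epoch. That reading is an inference from the sample, not something stated by the export. Under that assumption, a value can be turned into a readable UTC time as sketched below. The helper names are hypothetical, and the date call tries the GNU form first and falls back to the BSD/macOS form.

```shell
#!/usr/bin/env bash
# Convert a microsecond epoch timestamp to whole seconds.
# Assumes the value is in microseconds (an inference from the sample output).
micros_to_seconds() {
  echo $(( $1 / 1000000 ))
}

# Format epoch seconds as a UTC timestamp.
# Tries GNU date (-d @N) first, then BSD/macOS date (-r N).
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -r "$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# created_at of the first row in the sample output
epoch_to_utc "$(micros_to_seconds 1711832371491883)"
```

The custody_address column appears as unreadable characters because parquet head prints the raw bytes as text.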