
S3 Storage
PROD
Feature List
✓ Metadata
Requirements
OpenMetadata 1.0 or later
To deploy OpenMetadata, check the Deployment guides.
S3 Permissions
For all the buckets that we want to ingest, we need to provide the following:
s3:ListBucket
s3:GetObject
s3:GetBucketLocation
s3:ListAllMyBuckets
Note that the Resources should be all the buckets that you’d like to scan. A possible policy could be:
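A minimal sketch of such a policy, built in Python so that the bucket list is easy to change. The function name `build_s3_policy` and the bucket name `my-data-bucket` are hypothetical. Splitting the statements by bucket-level and object-level resources is an assumption about how one might scope the policy, not necessarily the layout the OpenMetadata docs use.

```python
import json


def build_s3_policy(bucket_names):
    """Return an IAM policy dict granting the four S3 permissions listed above.

    bucket_names: names of the buckets to scan (hypothetical values).
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are granted on the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                # Object reads are granted on the keys inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
            {
                # Listing all buckets cannot be scoped to a single bucket.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_policy(["my-data-bucket"]), indent=2))
```

The printed JSON can be pasted into the IAM console or attached with your usual tooling. Review the resources against your own account before using it.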
CloudWatch Permissions
CloudWatch is used to fetch the total size in bytes of a bucket and its total number of files. It requires:
cloudwatch:GetMetricData
cloudwatch:ListMetrics
The policy would look like:
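A minimal sketch of a matching policy, again generated from Python. The function name `build_cloudwatch_policy` is hypothetical, and the wildcard resource reflects the assumption that these two CloudWatch actions are not scoped to individual resources.

```python
import json

# The two actions named in this section.
CLOUDWATCH_ACTIONS = ["cloudwatch:GetMetricData", "cloudwatch:ListMetrics"]


def build_cloudwatch_policy():
    """Return an IAM policy dict allowing the CloudWatch reads listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(CLOUDWATCH_ACTIONS),
                # Assumption: these actions are granted on all resources.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cloudwatch_policy(), indent=2))
```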
Python Requirements
To run the Athena ingestion, you will need to install:
OpenMetadata Manifest
In any other connector, extracting metadata happens automatically. In this case, we will be able to extract high-level metadata from buckets, but in order to understand their internal structure we need users to provide an openmetadata.json file at the bucket root.
Supported File Formats: [ "csv", "tsv", "avro", "parquet", "json", "json.gz", "json.zip" ]
You can learn more about this here. Keep reading for an example of the shape of the manifest file.
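As a quick pre-check before writing a manifest, the supported formats above can be matched against object keys. This is only a sketch: `detect_format` is a hypothetical helper, and it assumes the format can be guessed from the file name, whereas the ingestion itself most likely relies on the format declared in the manifest.

```python
# Copied from the Supported File Formats list above.
SUPPORTED_FORMATS = ["csv", "tsv", "avro", "parquet", "json", "json.gz", "json.zip"]


def detect_format(key):
    """Return the supported format that the object key ends with, or None."""
    lowered = key.lower()
    # Try the longest suffixes first so compound formats such as
    # "json.gz" are reported in full.
    for fmt in sorted(SUPPORTED_FORMATS, key=len, reverse=True):
        if lowered.endswith("." + fmt):
            return fmt
    return None


if __name__ == "__main__":
    for key in ["sales/2023/data.parquet", "logs/events.json.gz", "notes.txt"]:
        print(key, detect_format(key))
```

Keys that return `None` point to files the connector would not be able to read structure from, so they are candidates to leave out of the manifest.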
OpenMetadata Manifest
Our manifest file is defined as a JSON Schema, and can look like this:
Global Manifest
You can also manage a single manifest file to centralize the ingestion process for any container, named openmetadata_storage_manifest.json.
You can also keep a local openmetadata.json manifest in each container, but if possible, we will always try to pick up the global manifest during the ingestion.
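The difference between the two manifest flavours, and the preference for the global one, can be sketched as follows. The field names (`entries`, `dataPath`, `structureFormat`, `isPartitioned`, `containerName`), the bucket name `my-data-bucket`, and the helper `entries_for_container` are assumptions made for illustration and should be checked against the manifest JSON Schema. Falling back to the local manifest when the global one has no entry for a container is likewise an assumption.

```python
import json

# Hypothetical local manifest, stored as openmetadata.json at the bucket root.
local_manifest = {
    "entries": [
        {"dataPath": "transactions", "structureFormat": "csv", "isPartitioned": False},
    ]
}

# Hypothetical global manifest (openmetadata_storage_manifest.json); each
# entry is assumed to name the container it describes.
global_manifest = {
    "entries": [
        {
            "containerName": "my-data-bucket",
            "dataPath": "cities",
            "structureFormat": "parquet",
            "isPartitioned": True,
        },
    ]
}


def entries_for_container(container, global_manifest=None, local_manifest=None):
    """Prefer global manifest entries for `container`, else use the local ones."""
    if global_manifest is not None:
        matched = [
            entry
            for entry in global_manifest.get("entries", [])
            if entry.get("containerName") == container
        ]
        if matched:
            return matched
    if local_manifest is not None:
        return list(local_manifest.get("entries", []))
    return []


if __name__ == "__main__":
    chosen = entries_for_container("my-data-bucket", global_manifest, local_manifest)
    print(json.dumps(chosen, indent=2))
```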