One of the key challenges that organizations face when adopting the open data lakehouse is deciding on the optimal format for their data. Among the available options, Linux Foundation Delta Lake, Apache Iceberg, and Apache Hudi are all excellent storage formats that enable data democratization and interoperability. Any of these formats is better than putting your data into a proprietary format. However, choosing a single storage format to standardize on can be a daunting task, which can result in decision fatigue and fear of irreversible consequences.
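Delta UniForm (short for Delta Lake Universal Format) offers a simple, easy to implement, seamless unification of table formats without creating additional data copies or silos. In this blog, we'll cover how UniForm serves multiple formats from a single copy of data, how quickly it can be set up, how to read UniForm tables as Iceberg, and the impact it has had so far.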
Multiple formats, single copy of data
Delta UniForm takes advantage of the fact that Delta Lake, Iceberg, and Hudi are all built on Apache Parquet data files. The main difference among the formats is in the metadata layer, and even then, the differences are subtle. The metadata for all three formats serves the same purpose and contains overlapping sets of information.
Prior to the release of Delta UniForm, the ways to switch between open table formats were copy- or conversion-based and only provided a point-in-time view of the data. In contrast, Delta UniForm solves interoperability needs more elegantly by providing a live view of the data for all readers, regardless of format.
Under the hood, Delta UniForm works by automatically generating the metadata for Iceberg and Hudi alongside Delta Lake, all against a single copy of the Parquet data. As a result, teams can use the most suitable tool for each data workload and all operate on a single data source, with perfect interoperability across the three different ecosystems.

Fast setup, minimal overhead
Delta UniForm is extremely easy to set up, and once it's enabled it works seamlessly and automatically.
To start, let's create a Delta UniForm table to generate Iceberg metadata:
CREATE TABLE main.default.UniForm_demo_table (msg STRING)
TBLPROPERTIES('delta.universalFormat.enabledFormats' = 'iceberg');
With Delta UniForm tables, the metadata for the additional formats is automatically created upon table creation and updated whenever the table is modified. This means there is no need for manual refresh commands or running unnecessary compute to translate table formats. For example, let's write a row to this table:
INSERT INTO main.default.UniForm_demo_table (msg) VALUES ('hello UniForm!');
This command triggers a Delta Lake commit, which then automatically and asynchronously generates the Iceberg metadata for this table. By doing this, Delta UniForm ensures data pipelines are uninterrupted, enabling seamless access to the most up-to-date information for all readers.
Delta UniForm has negligible performance and resource overhead, ensuring optimal utilization of computational resources. Even for petabyte-scale tables, the metadata is typically a tiny fraction of the data file size. In addition, Delta UniForm is able to incrementally generate metadata scoped to only the changes since the previous commit.
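To make the idea of incremental metadata generation concrete, here is a toy Python model. It is only an illustration under simplifying assumptions, not how UniForm is implemented: the class name `ToyTable`, its fields, and the shape of the log entries are all hypothetical. It shows one shared list of data files, a primary commit log, and a derived log that is updated by translating only the commits added since the last sync.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model (hypothetical): one copy of the data, two metadata logs."""
    data_files: list = field(default_factory=list)   # the single copy of the data
    delta_log: list = field(default_factory=list)    # primary commit log
    iceberg_log: list = field(default_factory=list)  # derived metadata entries
    synced_version: int = 0                          # commits already translated

    def commit(self, new_files):
        """Write the data once and record the commit in the primary log."""
        self.data_files.extend(new_files)
        self.delta_log.append(list(new_files))

    def sync_iceberg(self):
        """Translate only the commits added since the previous sync."""
        pending = self.delta_log[self.synced_version:]
        for files in pending:
            self.iceberg_log.append({"added": list(files)})
        self.synced_version = len(self.delta_log)
        return len(pending)


table = ToyTable()
table.commit(["part-0.parquet"])
table.commit(["part-1.parquet", "part-2.parquet"])
first = table.sync_iceberg()    # translates both existing commits
table.commit(["part-3.parquet"])
second = table.sync_iceberg()   # translates only the newest commit
print(first, second, len(table.data_files))  # 2 1 4
```

The point of the sketch is that the second sync touches one commit rather than the whole history, and that no data file is ever copied: both logs refer to the same four files.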

Reading Delta UniForm as Iceberg
Delta UniForm generates Iceberg metadata in accordance with the Apache Iceberg specification, which means when data is written to a Delta UniForm table, the table can be read as Iceberg by any client in the Iceberg ecosystem that adheres to the open source Iceberg specification.
Per the Iceberg specification, reader clients must determine which Iceberg metadata represents the latest, most current version of the Iceberg table. Within the Iceberg ecosystem, we've seen clients take two different approaches to this, both of which are supported by UniForm. We'll explain the differences here and then provide examples in the next section.
Some Iceberg readers require users to provide the path to a metadata file representing the latest snapshot of the Iceberg table. This approach can be cumbersome for customers as it requires users to provide updated metadata file paths every time the table changes.
As an alternative, the Iceberg community recommends using the REST catalog API. The client talks to the catalog to get the latest state of the table, allowing users to read the latest state of an Iceberg table without manual refreshes or worrying about metadata paths.
Unity Catalog now implements the open Iceberg Catalog REST API in accordance with the Apache Iceberg specification. This is aligned with Unity Catalog's commitment to supporting open APIs, and builds on the momentum of Unity Catalog's HMS API support. The Unity Catalog Iceberg REST API offers open access to UniForm tables in the Iceberg format without any charges for Databricks compute, while allowing interoperability and auto-refresh support for accessing the latest data. As a byproduct, this should enable other catalogs to federate to Unity Catalog and support Delta UniForm tables.

The Apache Iceberg client libraries come prepackaged with the ability to interface with the Iceberg REST API Catalog, meaning that any client that fully implements the Apache Iceberg standard and has support for configuring catalog endpoints should be able to easily access the Unity Catalog Iceberg REST API Catalog and retrieve the latest metadata for their tables. This eliminates the task of managing table metadata.
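For readers curious what "talking to the catalog" looks like on the wire, the following Python sketch builds the path of the load-table route defined by the open Iceberg REST catalog specification and reads the `metadata-location` field from a response. The helper names and the sample response are hypothetical, and splitting the namespace into the levels `main` and `default` is an assumption for illustration; a given catalog may lay out its namespaces differently. In practice an Iceberg client library does this for you.

```python
import json
from urllib.parse import quote

# The REST spec joins multi-level namespaces with the unit separator (0x1F).
NAMESPACE_SEPARATOR = "\x1f"


def load_table_path(namespace_levels, table, prefix=None):
    """Build the path of the REST catalog's load-table route."""
    namespace = quote(NAMESPACE_SEPARATOR.join(namespace_levels), safe="")
    parts = ["v1"]
    if prefix:
        parts.append(prefix)  # optional catalog-specific prefix
    parts += ["namespaces", namespace, "tables", quote(table, safe="")]
    return "/" + "/".join(parts)


def current_metadata_location(load_table_response):
    """Return the metadata file that the catalog reports as current."""
    return json.loads(load_table_response)["metadata-location"]


# Hypothetical response body, trimmed to the one field used here.
sample_response = json.dumps(
    {"metadata-location": "s3://bucket/tbl/metadata/00002-abc.metadata.json"}
)

print(load_table_path(["main", "default"], "UniForm_demo_table"))
print(current_metadata_location(sample_response))
```

Because the catalog answers this request with the current metadata location on every call, the reader never has to be told when the table changes.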
In the next section, we'll walk through examples of Delta UniForm's support for both the metadata path and Iceberg REST Catalog API approaches.
Example: Read Delta Lake as Iceberg in BigQuery by supplying metadata location
When reading Iceberg in an existing catalog, BigQuery requires you to provide a pointer to the JSON file representing the latest Iceberg snapshot (BigQuery documentation), like the following:
In BigQuery:
CREATE EXTERNAL TABLE myexternal-table
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
format = 'ICEBERG',
uris = ["gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"]
)
Delta UniForm with Unity Catalog makes it easy for you to find the required Iceberg metadata file path. Unity Catalog exposes a number of Delta Lake table properties, including this path. You can retrieve the metadata location for your Delta UniForm table via UI or API.
Retrieving Delta UniForm Iceberg metadata path via UI:
Navigate to your Delta UniForm table in the Databricks Data Explorer, then click on the Details tab. Here, you will find the Delta UniForm Iceberg row containing the metadata path.

Retrieving Delta UniForm Iceberg metadata location via API:
From a tool of your choosing, submit the following GET request to retrieve your Delta UniForm table's Iceberg metadata location.
GET api/2.1/unity-catalog/tables/<catalog-name>.<schema-name>.<table-name>
The delta_uniform_iceberg.metadata_location field in the response contains the metadata location for the latest Iceberg snapshot.
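As a rough illustration, the same request can be issued from Python using only the standard library. This is a sketch under assumptions: the workspace host and token are placeholders you would supply, bearer-token authentication is assumed, and the function names are made up for this example. The endpoint path and the response field come from the request described above.

```python
import json
import urllib.request


def build_table_request(host, full_table_name, token):
    """Prepare the GET request for a table's Unity Catalog entry."""
    url = f"{host}/api/2.1/unity-catalog/tables/{full_table_name}"
    # Assumption: the workspace accepts a bearer token in this header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def iceberg_metadata_location(table_info):
    """Extract delta_uniform_iceberg.metadata_location from the parsed response."""
    return table_info["delta_uniform_iceberg"]["metadata_location"]


def fetch_iceberg_metadata_location(host, full_table_name, token):
    """Send the request and return the latest Iceberg metadata location."""
    request = build_table_request(host, full_table_name, token)
    with urllib.request.urlopen(request) as response:
        return iceberg_metadata_location(json.load(response))


# Offline demonstration on a hypothetical, trimmed response body.
sample = {
    "delta_uniform_iceberg": {
        "metadata_location": "gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"
    }
}
print(iceberg_metadata_location(sample))
```

Calling `fetch_iceberg_metadata_location("https://<your-workspace-host>", "main.default.UniForm_demo_table", token)` would perform the live request; the demonstration above deliberately avoids the network.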
Simply paste the location from either the UI or API methods outlined above into the aforementioned BigQuery command, and BigQuery will read the snapshot as Iceberg.
If your table gets updated, you will need to provide BigQuery with the updated metadata location to read the latest data. For production use cases, you should add a step in your ingestion pipeline that updates BigQuery with the latest Iceberg metadata path(s) every time you write to the Delta UniForm table. Note that the need for metadata path updates is a general limitation with this approach, and is not specific to UniForm.
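One possible shape for such a pipeline step is sketched below in Python. It only renders a BigQuery DDL statement that re-points the external table at a new metadata file; actually submitting the statement (for example through a BigQuery client) is left out. The function name and the table name `mydataset.myexternal_table` are hypothetical, and using CREATE OR REPLACE EXTERNAL TABLE for the refresh is one option among several rather than a recommendation from this post.

```python
def bigquery_refresh_ddl(table, connection, metadata_uri):
    """Render DDL that re-points a BigQuery external table at new Iceberg metadata."""
    # Guard against characters that would break out of the quoted literals.
    for value in (table, connection, metadata_uri):
        if "'" in value or "`" in value:
            raise ValueError(f"unsupported character in {value!r}")
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        f"  uris = ['{metadata_uri}']\n"
        ")"
    )


# Hypothetical names; the URI would be the location retrieved after each write.
ddl = bigquery_refresh_ddl(
    "mydataset.myexternal_table",
    "myproject.us.myconnection",
    "gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json",
)
print(ddl)
```

Running this step after every write to the Delta UniForm table keeps the BigQuery view in line with the latest snapshot.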
Example: Read Delta Lake as Iceberg in Trino via REST Catalog API
Let's now read the same Delta UniForm table we created earlier through Trino using Unity Catalog's Iceberg REST Catalog API.
Note: UniForm is not necessary for reading Delta tables with Trino, as Trino directly supports Delta tables. This is just to illustrate how UniForm further expands interoperability in the open source ecosystem.
After setting up Trino, you can adjust Iceberg properties by updating the etc/catalog/iceberg.properties file to configure Trino to use Unity Catalog's Iceberg REST API Catalog endpoint:
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri={UNITY_CATALOG_ICEBERG_URL}
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token={PERSONAL_ACCESS_TOKEN}
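Where {UNITY_CATALOG_ICEBERG_URL} is the URL of your Unity Catalog Iceberg REST API endpoint and {PERSONAL_ACCESS_TOKEN} is the personal access token used to authenticate.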
Once your properties file is configured, you can run the Trino CLI and issue an Iceberg query to the Delta UniForm table:
SELECT * FROM iceberg."main.default".UniForm_demo_table
Since Trino implements the Apache Iceberg REST Catalog API, we didn't create any external table, nor did we need to supply the path to the latest Iceberg metadata files. Trino automatically fetches the latest Iceberg metadata from Unity Catalog and then reads the latest data in the Delta UniForm table.
It is important to note that, from Trino's perspective, there is nothing Delta UniForm-specific happening here. It is reading an Iceberg table, whose metadata has been generated to spec, and retrieving that metadata with a standard REST API call to an Iceberg catalog.
That is the simplicity of Delta UniForm. To Delta Lake writers and readers, the Delta UniForm table is a Delta Lake table. To Iceberg readers, the Delta UniForm table is an Iceberg table, all on a single set of data files without unnecessary copies of data and tables.
Delta UniForm Impact
Throughout its Preview, we've already helped many customers accelerate toward open data lakehouse interoperability with Delta UniForm. Organizations can write once to Delta Lake, and then access this data any way, achieving optimal performance, cost-effectiveness, and data flexibility across various workloads such as ETL, BI, and AI, all without the burden of costly and complex migrations.
“At Instacart, our vision is to have an open data lakehouse with a single copy of data that is interoperable with all compute platforms. Delta UniForm is instrumental to that goal. With Delta UniForm, we can quickly and easily generate tables that can be read as either Delta Lake or Iceberg, unlocking interoperability with all the tools in our ecosystem.”
Doug Hyde, a Sr. Staff Software Engineer at Instacart, shared his experience with Delta UniForm
Databricks' mission is to help data teams solve the world's toughest problems, and that starts with being able to use the right tool for the right job without having to make copies of your data. We are excited about the improvements in interoperability that Delta UniForm brings and will continue to invest in this area for years to come.
Delta UniForm is available as part of the preview release candidate for Delta Lake 3.0. Databricks customers can also preview Delta UniForm with Databricks Runtime version 13.2 or the Databricks SQL 2023.35 preview channel.