Wednesday, May 14, 2025

Aaand the New NiFi Champion is…


On May 3, 2023, Cloudera kicked off a contest called “Best in Flow” for NiFi developers to compete to build the best data pipelines. This blog is to congratulate our winner and review the top submissions.

On the verge of the release of NiFi 2.0, Cloudera VP of Engineering and NiFi founder Joe Witt, joined by principal committers Mark Payne and Matt Gilman, addressed the global community via a virtual event dubbed “Meet the Committers.” The team discussed NiFi’s origins and the journey to NiFi 2.0 as well as significant features in the upcoming release, and surveyed the community about the dev/ops challenges of managing their own nodes. As part of the event, Cloudera kicked off the “Best in Flow” contest. The contest challenged developers to build data pipelines that represent their business use cases using Cloudera DataFlow. DataFlow is a cloud-native data service powered by Apache NiFi with a streamlined user experience for development and deployment, enabling true universal data distribution. For the contest, Cloudera made a sandbox environment available for developers to use DataFlow Public Cloud. We had more than 40 developers active in the environment and many high-quality contest submissions. But in the end there could only be one winner.

Best in Flow champion

So without any further ado, our winner and the new Best in Flow Champion is:

Vince Lombardo! Vince is a Senior Infrastructure Engineer at Wells Fargo, and he developed a cybersecurity pipeline to efficiently collect, process, and make data from an asset polling tool available for database ingestion. Cybersecurity is a common domain for DataFlow deployments because of the need for timely access to data across systems, tools, and protocols. What’s interesting about Vince’s flow is that it cleverly uses “pagination” functionality to continuously distribute up-to-the-minute results from a tool that doesn’t always return a full set of results immediately. For more detail on the winning flow, check out Vince’s GitHub page here.

Vince’s winning flow

Vince began by funneling data from six API endpoints of an asset polling tool containing cybersecurity and tech ops data into two discrete data topics. The flow he built differentiates between a test and a real API call before initiating a secure login. The smart part comes next. Because the polling tool can take time to return queries, Vince added a processor that loops, returning the query status each time, until the query is complete. Completeness is estimated by comparing a test result with the “estimated total.” When a near match is detected, the data pull is triggered and then checked again for completeness before being transformed into rows and columns and merged into a batch for database ingestion.

Figure 1: The part of the flow that loops until the Tanium query has completed
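The loop-until-complete and pagination logic described above can be sketched in plain Python. This is only an illustration of the pattern, not Vince's actual NiFi flow: the `get_status` and `fetch_page` callables, the 0.99 tolerance, and the page size are all assumptions made up for the example, and no real polling-tool API is used.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def wait_until_complete(
    get_status: Callable[[], Tuple[int, int]],
    tolerance: float = 0.99,      # hypothetical "near match" threshold
    poll_interval: float = 0.0,   # seconds to wait between status checks
    max_polls: int = 100,
) -> bool:
    """Poll a query's status until the tested count nearly matches the estimated total.

    `get_status` is a hypothetical callable returning (tested, estimated_total).
    Returns True once the query looks complete, False if max_polls is exhausted.
    """
    for _ in range(max_polls):
        tested, estimated_total = get_status()
        if estimated_total > 0 and tested >= tolerance * estimated_total:
            return True
        time.sleep(poll_interval)
    return False


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = 2,
) -> List[Row]:
    """Pull every page of results and merge them into one batch of rows.

    `fetch_page(offset, limit)` is a hypothetical callable; a page shorter
    than `page_size` signals that the result set is exhausted.
    """
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


if __name__ == "__main__":
    # Simulated status responses: the query fills in over three polls.
    statuses = iter([(10, 100), (60, 100), (99, 100)])
    data = [{"asset_id": i} for i in range(5)]

    if wait_until_complete(lambda: next(statuses)):
        batch = fetch_all_pages(lambda off, lim: data[off:off + lim])
        print(f"merged {len(batch)} rows into one batch")
```

Capping the number of polls keeps a stalled query from looping forever, which is one reasonable way to handle a source with highly variable execution times.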

Vince’s flow met all of our criteria and was the clear contest winner. This flow is complete and adheres to NiFi best practices, being both efficient and highly secure. By employing pagination, this dataflow ensures a complete result set is available from a data source with highly variable query execution times. It’s deployable, has clear business value, and serves as a great example of universal data distribution in action. Congratulations Vince!

Runner up

Ramakrishna Sanikommu was our runner up. His submission post can be found here. RK built some simple flows to pull streaming data into Google Cloud Storage and Snowflake. Many developers use DataFlow to filter/enrich streams and ingest into cloud data lakes and warehouses, where the ability to process and route anywhere makes DataFlow very effective. RK built several flows quickly, first pulling multiple data sources from a Google Pub/Sub topic and merging them into a file for ingestion into GCS. He then built a second flow to execute a Python script and load the data into Snowflake. His flows adhered to best practices and demonstrated some light transformations. RK also properly used the DataViewer to view the contents of a queue.

Figure 2: Ramakrishna’s first flow consuming data from Google PubSub and ingesting it into Google Cloud Storage
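The "merge many messages into one file" step of a flow like this can be illustrated with a few lines of Python. This is a hedged sketch of the general batching idea only: the `merge_into_batches` helper, the newline delimiter, and the record limit are assumptions for illustration and do not reflect how RK's flow or any NiFi processor is actually configured.

```python
from typing import Iterable, Iterator, List


def merge_into_batches(messages: Iterable[bytes], max_records: int = 3) -> Iterator[bytes]:
    """Group incoming messages into newline-delimited files of up to `max_records` each.

    Hypothetical helper: each yielded bytes object stands in for one merged
    file that would then be written to object storage.
    """
    batch: List[bytes] = []
    for message in messages:
        batch.append(message)
        if len(batch) >= max_records:
            yield b"\n".join(batch) + b"\n"
            batch = []
    if batch:
        # Flush whatever is left so no trailing messages are dropped.
        yield b"\n".join(batch) + b"\n"


if __name__ == "__main__":
    incoming = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}', b'{"id": 4}']
    for i, merged in enumerate(merge_into_batches(incoming)):
        print(f"file {i}: {len(merged)} bytes")
```

Writing fewer, larger files is generally friendlier to object stores and warehouses than writing one object per message.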

 

Figure 3: Ramakrishna’s second flow reading data from Google Cloud Storage and ingesting it into Snowflake

Summary and looking ahead

In less than 10 years since its inception, NiFi has achieved absolutely massive scale both in terms of popularity and the size of deployments. NiFi’s origins, however, were quite simple: for any two systems to work together, there are quite a few things that have to agree. They must not only speak some common data language but account for myriad things like relevance, security, priority, authorization, and so on. NiFi was built as a kind of Swiss Army Knife to quickly connect different systems and coordinate dataflows from one to another using an intuitive no-code development canvas.

Since acquiring the company primarily responsible for maintaining the NiFi code base in 2015, Cloudera has continued to pour resources into the open source project, which now boasts more than 500 contributors across the globe and thousands of active community members in Slack. NiFi has evolved considerably, staying ahead of security vulnerabilities and adding connectors with releases every quarter. The “Best in Flow” contest was a lot of fun, and demonstrated the appetite for community around Apache NiFi. Here at Cloudera we’re excited to host future events for NiFi developers, so stay tuned to find out what’s next. To test drive Cloudera DataFlow yourself, click here to request a trial of Cloudera Data Platform in the Public Cloud: https://www.cloudera.com/campaign/try-cdp-public-cloud.html
