There is growing recognition across industry verticals that the strategic application of AI workloads can significantly enhance organizational competitiveness, particularly in unlocking untapped value from unstructured data. IDC Research has highlighted a growing focus on advancing AI within the high-performance computing (HPC) landscape, underscoring the importance of this intersection. Although HPC and large hyperscale environments are well suited to the intense performance AI demands, a problem looms: the most significant growth over the next 12-24 months is expected in Fortune 1000 companies whose traditional enterprise IT systems were never designed for such demanding performance requirements.

In traditional enterprises, the lifecycle of unstructured data is hierarchical: new data is created and gradually decreases in significance until it is archived. Enterprise storage systems are specifically designed to accommodate this process. However, the landscape is shifting with emerging demand for AI inference, agentic AI, digital twins, and other workloads that require rapid, high-performance access to all data within an organization, regardless of its age.

The dilemma for enterprise data managers is how to accommodate these changes without retooling their entire IT environments or purchasing new, proprietary storage repositories that duplicate existing data environments, adding unnecessary complexity and costs. 

It has become increasingly clear that an alternative approach is needed, one that enables organizations to break through proprietary storage vendor silos and activate their unstructured data, leveraging the infrastructure they already own. 

Meet the Bridge Builder: Connecting Legacy Infrastructure with Next-Gen AI Solutions

It is increasingly evident that open and flexible standards-based architectures can bridge the gap between current infrastructures and rapidly evolving, data-intensive new technologies. A key challenge is expanding resources to enable efficient AI processing without heavy investments in specialized storage infrastructures.

One of the key tools for transforming current infrastructure into a scalable, high-performance storage system is Parallel NFS (pNFS) v4.2. This standardized, open technology avoids proprietary lock-in and the need for costly specialized hardware.

Although the basic Network File System (NFS) is included in all standard Linux distributions, it was never suitable for high-performance computing because of its limitations in handling computationally intensive workloads. These limitations drove the creation of specialized and proprietary storage systems. However, whether parallel file systems like Lustre or proprietary scale-out NAS solutions, these alternatives required proprietary clients, adding cost and complexity. They also often integrate poorly with existing storage systems, creating still more proprietary, isolated pockets of data.

Successive development of pNFS has brought significant technical improvements, and the latest version, pNFS v4.2 with Flex Files, elevates the technology to new heights with its strong focus on openness, standardization, scalability, and performance. Significantly, no proprietary client software is needed: the pNFS v4.2 client ships with all standard Linux distributions and is therefore already present on virtually every Linux server in data centers worldwide.

Breaking Through the Limits: How pNFS v4.2 and Flex Files Address Performance Challenges

Traditional NAS systems have scalability and performance limitations due to their fundamental architecture, which combines data and metadata along the same path. Such systems also route data through proprietary controller nodes, adding latency and further restricting scalability, especially in performance-intensive environments. pNFS addresses these challenges by separating the metadata path from the data path. This logical separation lets the metadata server give applications a layout, a direct path for accessing data on any storage node, significantly reducing latency and enabling linear scalability across heterogeneous environments. Because the pNFS v4.2 client is already included in the Linux kernel on servers from virtually every vendor, no added steps or redirections are needed to access data on the underlying storage.
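The control flow described above can be sketched conceptually: the client first asks the metadata server for a layout, then uses that layout to read directly from the data servers. This is a simplified illustration of the idea, not the NFS wire protocol; all class names, server names, and structures below are hypothetical.

```python
# Conceptual sketch of pNFS's separation of metadata and data paths.
# Names and structures are illustrative only, not the actual NFS protocol.

class MetadataServer:
    """Hands out layouts (maps of file stripes to data servers);
    it never sits in the data path itself."""
    def __init__(self, data_servers, stripe_unit):
        self.data_servers = data_servers
        self.stripe_unit = stripe_unit

    def get_layout(self, path):
        # A layout tells the client where each stripe of the file lives.
        return {"path": path,
                "stripe_unit": self.stripe_unit,
                "data_servers": self.data_servers}

def read_direct(layout, offset):
    """Client side: use the layout to find the data server holding a
    given byte offset, then read from it directly (no controller hop)."""
    stripe = offset // layout["stripe_unit"]
    server = layout["data_servers"][stripe % len(layout["data_servers"])]
    return f"read {layout['path']} offset {offset} from {server}"

mds = MetadataServer(["ds1", "ds2", "ds3"], stripe_unit=1 << 20)
layout = mds.get_layout("/export/model.bin")
print(read_direct(layout, 0))        # served by ds1
print(read_direct(layout, 2 << 20))  # served by ds3
```

Once the client holds the layout, every subsequent read bypasses the metadata server entirely, which is what allows throughput to scale with the number of data nodes.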

AI and machine learning workloads further stress legacy storage systems by generating millions of small file operations. These operations overwhelm traditional metadata-intensive protocols, especially when the metadata and data paths are combined. As a true parallel file protocol, pNFS v4.2 not only parallelizes I/O but also incorporates advanced metadata management techniques, including client-side caching, which dramatically reduces metadata traffic and delivers a tangible boost to I/O performance and overall system responsiveness.

Another common constraint in conventional networked storage is reliance on a single TCP connection per mount, which restricts simultaneous data transfer. pNFS v4.2 addresses this limitation with nconnect, a Linux NFS client mount option that opens multiple TCP connections per mount point. Although not widely known outside storage circles, nconnect is gaining recognition for its ability to maximize bandwidth utilization and improve both throughput and resilience.
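Enabling this is a one-line change on the client. A minimal sketch, assuming an NFS v4.2 export; the server name and paths below are placeholders (nconnect requires Linux kernel 5.3 or later and accepts up to 16 connections):

```shell
# /etc/fstab entry: NFS v4.2 mount with 8 TCP connections (nconnect).
# Hostname and paths are placeholders for your environment.
nfs.example.com:/export/ai-data  /mnt/ai-data  nfs  vers=4.2,nconnect=8  0  0

# Equivalent one-off mount command:
# sudo mount -t nfs -o vers=4.2,nconnect=8 nfs.example.com:/export/ai-data /mnt/ai-data
```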

Flexibility is also a concern, particularly when proprietary file systems require specialized backend infrastructure, which can limit an organization’s ability to adapt or protect its existing investments. In contrast, pNFS v4.2 is compatible with any storage that supports standard NFSv3, allowing seamless integration into existing NAS deployments without architectural changes.

Finally, as AI pipelines grow more complex and dynamic, fixed data layouts fall short. Flex Files, an extension of pNFS, provides the needed adaptability: dynamic layout capabilities and support for advanced distribution models such as striping and mirroring. With pNFS v4.2 and Flex Files, existing traditional IT infrastructure can be adapted to deliver the performance and scalability these workloads require.
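Striping and mirroring can be illustrated with a small placement sketch: striping spreads a file's stripes round-robin across data servers, while mirroring keeps N copies of each stripe for resilience. This is a hypothetical helper for intuition, not the actual Flex Files layout format.

```python
# Toy model of Flex Files-style distribution: round-robin striping
# plus N-way mirroring. Hypothetical helper, not the real layout format.

def stripe_placement(size, stripe_unit, servers, mirrors=2):
    """Return, for each stripe of a file, the list of data servers
    holding a copy of it."""
    placement = []
    n = len(servers)
    stripes = -(-size // stripe_unit)  # ceiling division
    for stripe in range(stripes):
        copies = [servers[(stripe + m) % n] for m in range(mirrors)]
        placement.append(copies)
    return placement

# A 3 MiB file, 1 MiB stripes, mirrored twice across three servers:
layout = stripe_placement(3 << 20, 1 << 20, ["ds1", "ds2", "ds3"])
print(layout)  # [['ds1', 'ds2'], ['ds2', 'ds3'], ['ds3', 'ds1']]
```

Because the layout is just data handed to the client, the server can hand out a different placement per file or per workload, which is the agility the paragraph above describes.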

This isn’t just a theoretical concept: Meta has implemented this architecture for its Llama 2, 3, and 4 LLMs in both on-premises and cloud-based data centers, using standard Linux and the pNFS v4.2 client included in its distributions. Without installing any proprietary client software on its application servers, and without altering its existing storage, Meta linearly scaled out its AI Supercluster to extreme scales, feeding tens of thousands of GPUs from more than 1,000 storage nodes.

A Strategic Perspective: Leveraging pNFS to Drive Innovation Through Data 

For IT decision-makers seeking to leverage future-proof, efficient, and scalable solutions for running AI and deep learning workloads, pNFS v4.2 offers new strategic possibilities. This protocol provides a powerful trifecta of high performance, openness, and cost-effectiveness, making it a critical technology for driving data-intensive innovation – particularly in situations where traditional storage solutions fall short.