The Data Problems Undermining Midmarket AI Projects In 2026
Weak data foundations are slowing midmarket AI projects as they move from pilots to production.
AI systems are only as good as the data powering them. That point was reinforced at Nvidia’s GTC 2026 conference in San Jose, Calif., where founder, President and CEO Jensen Huang told attendees that “structured data is the foundation of trustworthy AI.”
For many midmarket organizations, that foundation is still not in place—shaped by data environments that are fragmented, inconsistently defined and difficult to govern.
Research firm Gartner predicts that by 2027, 60 percent of AI projects will be abandoned because organizations lack AI-ready data.
MES Computing spoke with several data and AI leaders who pinpointed where the data foundation is falling short and what needs to change for midmarket organizations to successfully scale AI.
Why Data Issues Stay Hidden Until AI Hits Production
The early stages of an AI initiative can mask data problems, said Sayali Patil, AI infrastructure reliability engineer at Cisco and Splunk, in an email conversation. Before a pilot demo, she noted, a small team typically spends weeks reconciling metrics that two systems define differently, filling gaps and excluding edge cases. “The pilot often works because someone cleaned the data before the demo, but once leadership approves the production rollout, that manual cleanup does not follow, and the data is on its own,” Patil said.
The next set of issues tends to come from the data itself, according to Jackie Swanson, managing partner at Gartner Consulting.
“Inconsistent data definitions across business units, siloed systems that have never been required to talk to each other, and historical data captured for reporting rather than model training are some of the factors that affect data quality for AI projects,” Swanson said in an emailed statement.
For Josh Piper, product manager at Datactics, another obstacle to AI-ready data is that midmarket companies rarely reconcile the multiple systems that hold their data.
“Midmarket companies typically have three to five core systems that have never been properly reconciled, for example, a CRM, an ERP, and maybe a legacy platform. In a pilot, you can paper over duplicates manually, but in production, the model is making decisions based on fragmented or duplicated entity representations, and the outputs degrade in ways that are hard to diagnose because the system isn’t expecting this,” Piper said.
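As a rough illustration of the duplicated entity problem Piper describes, the sketch below flags records that appear in more than one system. Everything in it is an assumption made for illustration: the system names, the record fields (`id`, `name`, `email`) and the matching rule (normalized email, falling back to normalized name) are hypothetical. Real entity resolution usually needs fuzzier matching than this.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude matching key: normalized email, else normalized name."""
    email = (record.get("email") or "").strip().lower()
    if email:
        return email
    # Collapse whitespace and case so "Acme  Corp" and "acme corp" match.
    return " ".join((record.get("name") or "").lower().split())


def find_cross_system_duplicates(systems):
    """Return {match_key: [(system, record_id), ...]} for keys seen more than once.

    `systems` maps a system name (for example "crm") to a list of record dicts.
    """
    groups = defaultdict(list)
    for system_name, records in systems.items():
        for record in records:
            key = match_key(record)
            if key:  # skip records with neither email nor name
                groups[key].append((system_name, record["id"]))
    return {key: refs for key, refs in groups.items() if len(refs) > 1}


if __name__ == "__main__":
    systems = {
        "crm": [{"id": "C1", "name": "Acme Corp", "email": "Ops@Acme.com "}],
        "erp": [{"id": "E9", "name": "ACME Corporation", "email": "ops@acme.com"}],
        "legacy": [{"id": "L3", "name": "Globex", "email": None}],
    }
    for key, refs in find_cross_system_duplicates(systems).items():
        print(key, "->", refs)
```

Running a check like this before a production rollout gives a team a count of unreconciled entities, in place of the manual cleanup that made the pilot work.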
Poor Data Visibility Is Slowing Midmarket AI At Scale
Even when IT leaders recognize that their data is messy, many struggle to address the problem because they lack visibility into how their data is structured, moved and maintained across systems, Swanson noted. She traced this to long-standing gaps in data management. Many organizations, she argued, cannot tell you where a data point originated, how it was transformed, or whether it is still accurate. “That is a data management debt that AI often makes visible,” she noted.
This lack of visibility, she said, becomes even more pronounced as AI initiatives go into production. At this stage, data flows across multiple systems, pipelines evolve, and small changes upstream can affect downstream outputs without any clear signal. Without a reliable view of lineage and data movement, teams are left working with incomplete context.
Patil echoed that view, pointing to the same visibility gap at the pipeline level. She explained that most teams watch model accuracy metrics but not the data moving toward the model, so by the time an issue appears in the output, it has already been building upstream. The impact, she continued, is not always immediate failure, because systems continue to run. Instead, consistency becomes harder to maintain, which over time leads to varied results, outputs that are difficult to explain, and a gradual loss of confidence in the system.
Cost-Conscious Steps Midmarket IT Leaders Can Take To Fix AI Data Readiness
Data experts MES Computing spoke with offered several cost-conscious steps that midmarket IT leaders can take to improve data readiness for their AI deployments.
Stress-Test Data Pipelines Under Real Production Conditions
Patil suggested taking the most important data source feeding the AI initiative and stress-testing it deliberately. “Throttle it to 60 percent of its normal throughput. Add a 15-minute delay to the event stream. Corrupt a portion of a lookup table that the model joins against in a controlled environment. Then observe exactly what the system does.” This approach replaces assumptions with evidence. Rather than relying on model accuracy alone, teams can see how resilient their data pipelines are when conditions change.
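The three perturbations Patil lists can be rehearsed on a copy of the data before touching real infrastructure. The sketch below is a minimal simulation under stated assumptions: the event fields (`customer_id`, `created_at`, `arrived_at`), the five-minute staleness limit and the 20 percent corruption rate are all hypothetical values chosen for illustration, not figures from Patil.

```python
import random
from datetime import datetime, timedelta


def throttle(events, keep_percent=60):
    """Keep keep_percent of events, evenly spread, to mimic reduced throughput."""
    return [
        e for i, e in enumerate(events)
        if (i + 1) * keep_percent // 100 > i * keep_percent // 100
    ]


def delay(events, minutes=15):
    """Return copies of the events with their arrival time pushed back."""
    lag = timedelta(minutes=minutes)
    return [{**e, "arrived_at": e["arrived_at"] + lag} for e in events]


def corrupt_lookup(lookup, fraction=0.2, seed=0):
    """Return a copy of the lookup table with a fraction of values blanked."""
    rng = random.Random(seed)
    damaged = dict(lookup)
    for key in rng.sample(sorted(damaged), int(len(damaged) * fraction)):
        damaged[key] = None
    return damaged


def observe(events, lookup, max_lag=timedelta(minutes=5)):
    """Join events against the lookup and count what the model would see."""
    report = {"received": len(events), "stale": 0, "unmatched": 0}
    for e in events:
        if e["arrived_at"] - e["created_at"] > max_lag:
            report["stale"] += 1
        if lookup.get(e["customer_id"]) is None:
            report["unmatched"] += 1
    return report


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 9, 0)
    events = [
        {"customer_id": i // 10,
         "created_at": start + timedelta(seconds=i),
         "arrived_at": start + timedelta(seconds=i)}
        for i in range(100)
    ]
    lookup = {cid: f"segment-{cid % 3}" for cid in range(10)}
    print("baseline:", observe(events, lookup))
    print("stressed:", observe(delay(throttle(events)), corrupt_lookup(lookup)))
```

Comparing the baseline and stressed reports shows how many records arrive late or fail the join, which is the kind of evidence that model accuracy alone does not surface.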
Establish A Shared Data Quality Scorecard Across Teams
Launch a data quality scorecard that is visible to both the data team and the AI project teams, Piper recommended. Turning data quality into a shared metric means it stops being a “data team problem” and becomes a collective priority. This level of transparency, he maintained, keeps everyone aligned on data health and makes it much easier to catch issues before they degrade the model’s performance.
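A scorecard of this kind can start very small. The sketch below assumes two metrics, completeness (rows with every required field filled) and uniqueness (distinct keys divided by rows). Piper did not specify which metrics to track, so these choices, along with the dataset and field names, are illustrative only.

```python
def score_dataset(rows, key_field, required_fields):
    """Score one dataset on completeness and key uniqueness (both 0.0 to 1.0)."""
    total = len(rows)
    if total == 0:
        return {"rows": 0, "completeness": 0.0, "uniqueness": 0.0}
    complete = sum(
        1 for r in rows
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    unique_keys = len({r.get(key_field) for r in rows})
    return {
        "rows": total,
        "completeness": complete / total,
        "uniqueness": unique_keys / total,
    }


def build_scorecard(datasets):
    """`datasets` maps a name to a (rows, key_field, required_fields) tuple."""
    return {name: score_dataset(*spec) for name, spec in datasets.items()}


def format_scorecard(scorecard):
    """Render the scorecard as a plain-text table that any team can read."""
    lines = [f"{'dataset':<12}{'rows':>6}{'complete':>10}{'unique':>8}"]
    for name, s in sorted(scorecard.items()):
        lines.append(
            f"{name:<12}{s['rows']:>6}"
            f"{s['completeness']:>10.0%}{s['uniqueness']:>8.0%}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    customers = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 3, "email": "c@example.com"},
        {"id": 3, "email": "c@example.com"},
    ]
    card = build_scorecard({"customers": (customers, "id", ["id", "email"])})
    print(format_scorecard(card))
```

Publishing the same table to both the data team and the AI project teams is what makes the metric shared. The specific thresholds each team agrees on matter more than the tooling.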
Treat Data Readiness As An Active Part Of The AI Initiative
For Swanson, the starting point is reframing how data work is positioned. Data readiness, she noted, should sit inside the AI initiative rather than be treated as something to complete beforehand. “Auditing data quality, mapping sources and identifying gaps gives leadership something concrete while buying time to do it properly,” she said.
Focus Governance On The Data That Matters First
Patil suggested that the bigger risk for midmarket leadership is trying to fix all of their data issues at once, adding that it’s a common source of wasted effort. “The place midmarket companies waste budget is attempting to run data catalog implementation, lineage tracking, access governance and observability buildout simultaneously with a live AI launch,” she said. A narrower approach tends to produce better results. “They identify the two or three data assets the model’s most consequential outputs depend on, define specific measurable SLAs for those assets, enforce them automatically in the pipeline, and treat everything else as the next phase.”
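One way to read “enforce them automatically in the pipeline” is as a gate that fails the run when a critical asset misses its SLA. The sketch below assumes two hypothetical SLA dimensions, maximum null rate and maximum data age, and made-up asset names. Patil did not say which measures to use, so treat every name and threshold here as a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AssetSla:
    name: str
    max_null_rate: float   # allowed share of null values, 0.0 to 1.0
    max_age: timedelta     # allowed time since the last refresh


@dataclass(frozen=True)
class AssetStats:
    null_rate: float
    last_updated: datetime


class SlaViolation(Exception):
    """Raised to stop the pipeline when a critical asset misses its SLA."""


def check_sla(sla, stats, now):
    """Return a list of human-readable problems for one asset (empty if OK)."""
    problems = []
    if stats.null_rate > sla.max_null_rate:
        problems.append(
            f"{sla.name}: null rate {stats.null_rate:.1%} "
            f"exceeds {sla.max_null_rate:.1%}"
        )
    age = now - stats.last_updated
    if age > sla.max_age:
        problems.append(f"{sla.name}: data is {age} old, limit is {sla.max_age}")
    return problems


def enforce(slas, observed, now):
    """Check every SLA and raise SlaViolation if any asset fails or is missing."""
    problems = []
    for sla in slas:
        stats = observed.get(sla.name)
        if stats is None:
            problems.append(f"{sla.name}: no stats reported")
            continue
        problems.extend(check_sla(sla, stats, now))
    if problems:
        raise SlaViolation("; ".join(problems))


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    slas = [AssetSla("customer_master", 0.02, timedelta(hours=24))]
    observed = {
        "customer_master": AssetStats(0.05, now - timedelta(hours=2)),
    }
    try:
        enforce(slas, observed, now)
    except SlaViolation as exc:
        print("pipeline stopped:", exc)
```

Keeping the list of SLAs to two or three assets, as Patil recommends, keeps a gate like this cheap to maintain while the rest of the governance work waits for a later phase.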
Assign Clear Ownership For Data Quality
Accountability is another area that often gets overlooked. Patil emphasized the importance of naming a single owner for effective data governance. “Name one person accountable for data quality on the AI initiative, with the authority to stop deployment if thresholds are not met,” she said. “When data quality belongs to everyone, it is prioritized by no one.”