CIO.works – Page 31

HAL Reliability Evaluation

AI Agent Reliability Tracker: Evaluates 14 AI agents on 2 benchmarks, finding only slight reliability improvements despite accuracy growth. Key issues include inconsistent performance, low resource consistency, and variability across models. Recommendations for better evaluation include multi-run testing, targeted optimization for reliability, and differentiated standards based on use case.

https://hal.cs.princeton.edu/reliability/

Any Advice for a New CIO?

By CIO / Blog / leadership, CIO

A new CIO seeks advice, nervous about having limited infrastructure and security experience and about succeeding a hands-on CIO. Commenters advise focusing on strategic leadership and team support rather than technical execution: let technical experts handle infrastructure and security while the CIO sets priorities and removes obstacles. They emphasize understanding knowledge gaps, documenting critical systems, aligning IT with business goals, building rapport with staff and other executives, and joining peer networks. The role is described as business-oriented, with an emphasis on governance, communication, and direction rather than deep technical mastery.

https://www.reddit.com/r/CIO/comments/1relgn3/any_advice_for_a_new_cio/

Bullshit Benchmark Explorer

By CIO / Blog / AI

BullshitBench evaluates model responses to nonsensical questions, assessing their ability to identify and challenge invalid assumptions. A leaderboard ranks models by how often they push back, with Claude Sonnet 4.6 (Anthropic) scoring highest at 94.5% for clear pushback, indicating a strong capacity for detecting nonsense. Models from other organizations follow, showing how much this ability varies. One example contrasts a model that correctly states that screw type has no impact on food flavor with another that incorrectly attributes culinary changes to a switch in screws.

https://petergpt.github.io/bullshit-benchmark/viewer/index.html

Top CIO Conferences, According to the CIOs Who Attend Them

By CIO / Blog / networking, CIO

CIOs prioritize various conferences for networking and learning, including Gartner events, Dreamforce, AWS, Microsoft Ignite, the AI Summit, and industry-specific gatherings. Preferences vary by role and strategy, with some favoring smaller forums while others attend large vendor showcases. Key insights from attendees highlight the importance of peer learning and specific events tailored to their technology partnerships and industries.

https://www.techtarget.com/searchcio/feature/Top-CIO-conferences-according-to-the-CIOs-who-attend-them

Threat Modeling AI Applications

By CIO / Blog / security controls, threats, cybersecurity, AI, risk management

The post explains how to adapt threat modeling for AI systems, which differ from traditional software in that they produce probabilistic outputs, follow instructions, and have expanded attack surfaces. It recommends explicitly defining what assets the system must protect, understanding real usage patterns, and identifying risks such as prompt injection, misuse of tools, data integrity failures, and harmful outputs. It concludes that AI threat modeling requires structured analysis early in design to assess likelihood and impact and inform architectural mitigations.

https://www.microsoft.com/en-us/security/blog/2026/02/26/threat-modeling-ai-applications/

2026 State of Software Security: Risky Debt Is Rising, But Your Strategy Starts Here

By CIO / Blog / coding, report, software development, AI

2026 State of Software Security Report: Rising security debt affects 82% of organizations, with critical vulnerabilities increasing significantly. A three-step strategy—Prioritize, Protect, Prove—addresses these risks: focus on critical flaws, integrate security in development, and provide evidence of compliance. Organizations must shift from reactive to proactive security management. Download the full report for detailed insights.

https://www.veracode.com/blog/2026-state-of-software-security-report-risky-security-debt/

How Not to Measure the ROI From AI in Your Software Organization

By CIO / Blog / productivity, AI, software development

Extreme TLDR: Measuring AI ROI in software requires understanding user diversity and context. Avoid assuming that usage or effects are uniform, and avoid focusing solely on individual performance. Emphasize collective outcomes, account for changes over time, and prioritize thoughtful measurements grounded in evidence and a learning culture.

https://www.fightforthehuman.com/how-not-to-measure-the-roi-from-ai-in-your-software-organization/

Ask Marcia: How Great Leaders (like You) Communicate

By CIO / Blog / leadership, communication

Effective leadership communication is built on respect, which fosters learning, alignment, and better decisions. Leaders must practice deep listening, clarity of thought, audience awareness, precision with language, emotional intelligence, inquiry and dialogue, and courage and candor. These skills are essential for navigating uncertainty, complexity, and rapid change in today’s leadership landscape.

https://www.bizjournals.com/bizwomen/news/mentoring-matters/2026/02/ask-marcia-how-great-leaders-you-communicate.html

Why Exposure Quantification Is the New Mandate for CISOs

By CIO / Blog / cybersecurity, risk management, leadership

CISOs must prioritize exposure quantification as the cybersecurity landscape evolves. The view of breaches as mere IT issues is outdated; breaches now affect governance and require measurable evidence for compliance. Traditional methods fail in dynamic IT environments, which call for continuous risk assessment. Regulators demand quantifiable security maturity, and incidents that expose critical vulnerabilities highlight the need for better visibility. Effective exposure quantification hinges on integrating data, understanding attack paths, and communicating risks in terms of business objectives. Ultimately, embedding this practice into governance will enhance trust and strategic decision-making.

https://www.frontier-enterprise.com/why-exposure-quantification-is-the-new-mandate-for-cisos/

Measuring AI Agent Autonomy in Practice (Anthropic)

By CIO / Blog / AI, coding, AI agent, trends, research

TLDR: This research examines AI agent autonomy, focusing on Claude Code's interactions and user behavior. It finds that Claude Code is working autonomously for longer stretches without interruption, and that users auto-approve its actions more often as they gain experience. However, experienced users also interrupt more, indicating active oversight. Most agent tasks are low-risk and concentrated in software engineering, with limited high-risk applications. Recommendations include enhancing post-deployment monitoring, training AI to recognize uncertainty, and designing for effective user oversight. Overall, autonomy levels are rising as agent applications evolve.

https://www.anthropic.com/research/measuring-agent-autonomy