Jai Jaisimha and Rob Eleveld are co-founders of the Transparency Coalition. (GeekWire Photo / Todd Bishop)

Artificial intelligence is a powerful technology that promises to reshape the future, but it also poses significant challenges and risks. Among the most pressing is the lack of regulation and oversight of the data used to train AI models. A new Seattle-based nonprofit, the Transparency Coalition, aims to address this issue.

The co-founders of the group, veteran startup founders and technology leaders Rob Eleveld and Jai Jaisimha, join us on this episode of the GeekWire Podcast to discuss why they started the organization and how they hope to shape emerging legislation and public policy in this area.

Listen below, and continue reading for notes on the conversation.

Subscribe to GeekWire in Apple Podcasts, Spotify, or wherever you listen.

Origins and mission: The Transparency Coalition started with a literal fireside chat. Jaisimha and Eleveld discussed their concerns about AI transparency and unconstrained training data while camping on Whidbey Island.

They decided to found the Transparency Coalition as a nonprofit organization to address these problems through policy advocacy and public education. Their goal is to promote more ethical and responsible AI development by increasing transparency into how models are trained and what data goes into them.

Both have extensive experience as technology and startup leaders:

  • Eleveld, a former U.S. Navy submarine officer, was CEO of Ekata, an identity verification company acquired by Mastercard in 2021, after earlier leadership roles at companies including Whitepages, Optify and Shiftboard.
  • Jaisimha, who earned his PhD in electrical and computer engineering from the University of Washington, is a UW affiliate professor who worked at companies including RealNetworks, Amazon, Microsoft and Medio. He founded and led the startup Appnique, which applies machine learning to mobile advertising campaigns.

“I’ve always been a fan of applying AI to constrained problems, well-thought-through data sets,” Jaisimha explained. “And I’d just become concerned about the sloppy nature of data collection practices, and overblown promises about what these algorithms could do. … The heart of it all was the inputs of the AI.”

Their focus right now is twofold:

  1. Influencing state-level policy and legislation through advocacy, testimony, and education of policymakers. They have been actively engaging with legislators in Washington and California.
  2. Broader educational efforts to raise awareness and understanding of AI issues among policymakers, business leaders, and members of the public concerned with these topics.

Potential implications: Requiring transparency around training data and how models are used could significantly change the scope of AI models. If companies need to disclose what data they use and obtain consent, datasets would likely need to be more focused and constrained to avoid including copyrighted or private content without permission.

One effect would be to narrow the scope of AI applications to specific problems. Transparency could also make outputs more predictable and accountable, since their relationship to the training data would be clear.

“If you have to license training data, it becomes part of your cost of goods,” Eleveld said. “So the projects get narrower and smaller and more focused on detecting Stage 3 pancreatic cancer [for example], as opposed to trying to answer every question ever posed by humanity. We think narrower and more focused generative AI is much better for society. It’s much more controlled. You can trace the outputs … to what the inputs or the training data was.”

Potential legislation could include:

  • Standard definitions of key terms like AI, training data, and transparency.
  • Requirements for transparency into what data is used to train models.
  • An audit mechanism to verify the data used to train the models.
  • Ensuring use of personal data and copyrighted content is opt-in rather than opt-out.

Funding: Eleveld said he and his wife are providing the initial seed funding for the Transparency Coalition. It is a 501(c)(4) nonprofit organization, which allows more flexibility in lobbying and policy advocacy than a typical 501(c)(3) charity. Because donations to a 501(c)(4) are not tax-deductible, as they would be for a 501(c)(3), the group is seeking grants from foundations, family offices, and others interested in influencing policy.

Partnerships and next steps: They are collaborating with AI research organizations like the Responsible AI Systems and Experiences group at the University of Washington to help bring forward best practices from researchers. Part of the idea is to connect policymakers with AI thinkers to help address key issues and identify solutions.

“This isn’t just some magic box,” Eleveld said about AI models. “There are inputs and outputs to it like any other system, and it should be broken down and understood at a baseline level. And if it is understood, then people start asking the right kinds of questions, and hopefully coming to some better policy positions.”

Audio editing and production by Curt Milton.

Listen above, or subscribe to GeekWire in Apple Podcasts, Spotify, or wherever you listen.
