Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here's a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn't cover on their own.
By the way: TechCrunch plans to launch an AI newsletter soon. Stay tuned.
On the AI front, eight prominent U.S. newspapers owned by investment giant Alden Global Capital, including the New York Daily News, the Chicago Tribune, and the Orlando Sentinel, this week sued OpenAI and Microsoft for copyright infringement over the companies' use of generative AI technology. They, like The New York Times in its ongoing lawsuit against OpenAI, accuse OpenAI and Microsoft of scraping their IP without permission or compensation to build and commercialize generative models such as GPT-4.
“We've spent billions of dollars gathering information and reporting news at our publications, and we can't allow OpenAI and Microsoft to expand the Big Tech playbook of stealing our work to build their own businesses at our expense,” Frank Pine, the executive editor overseeing Alden's newspapers, said in a statement.
Given OpenAI's existing partnerships with publishers and its evident reluctance to stake its whole business model on a fair use defense, the lawsuit seems likely to end in a settlement and a licensing deal. But what about all the other content creators whose work is being fed into model training without payment?
OpenAI seems to be thinking about that.
A recently published research paper co-authored by Boaz Barak, a scientist on OpenAI's Superalignment team, proposes a framework for compensating copyright owners “in proportion to their contribution to the creation of AI-generated content.” How? Through cooperative game theory.
The framework uses a game theory concept known as the Shapley value to evaluate how much each piece of content in a training data set (text, images, or other data) influences what a model generates. Based on that evaluation, content owners' “fair share” (i.e., compensation) is determined.
Suppose you have an image generation model trained on the artwork of four artists: John, Jacob, Jack, and Jebediah. You ask the model to draw a flower in Jack's style. With the framework, you can determine the influence each artist's work had on the art the model produces, and therefore the compensation each artist should receive.
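To make that concrete, here's a minimal sketch of an exact Shapley computation in Python. Everything in it is illustrative rather than taken from the paper: the utility function v, the scores it hands out, and the idea that you can score the model's output for each subset of artists are stand-ins for whatever evaluation the framework would actually run.

```python
from itertools import combinations
from math import factorial

ARTISTS = ["John", "Jacob", "Jack", "Jebediah"]

def v(coalition: frozenset) -> float:
    """Hypothetical quality score of the generated flower when the model
    is trained only on the artists in `coalition`. The numbers are made
    up: Jack dominates because the prompt asked for his style."""
    score = 0.6 if "Jack" in coalition else 0.0
    score += 0.1 * len(coalition - {"Jack"})  # everyone else adds a little
    return score

def shapley(player: str) -> float:
    """Exact Shapley value: the weighted average of `player`'s marginal
    contribution v(S + player) - v(S) over all coalitions S of the rest."""
    rest = [a for a in ARTISTS if a != player]
    n = len(ARTISTS)
    total = 0.0
    for k in range(len(rest) + 1):
        for subset in combinations(rest, k):
            S = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(S | {player}) - v(S))
    return total

for artist in ARTISTS:
    print(f"{artist}: {shapley(artist):.2f}")
# Jack: 0.60; John, Jacob, Jebediah: 0.10 each (shares sum to v of everyone).
```

Jack ends up with the largest share because every coalition he joins improves by the same amount. Notice, though, that the computation sums over every subset of contributors, so its cost grows exponentially with their number.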
The framework has a drawback, though: it's computationally expensive. The researchers' workaround relies on estimates of compensation rather than exact calculations. Would that satisfy content creators? I'm not so sure. If OpenAI ever puts the framework into practice, we'll certainly find out.
Here are some other notable AI stories from the past few days.
Microsoft reaffirms facial recognition ban: Microsoft added language to the terms of service for Azure OpenAI Service, its fully managed wrapper around OpenAI technology, more clearly prohibiting integrations from being used “by or for” police departments for facial recognition in the U.S.
The nature of AI-native startups: AI startups face a different set of challenges than your typical software-as-a-service company. That was the message from Rudina Seseri, founder and managing partner of Glasswing Ventures, at the TechCrunch Early Stage event in Boston last week; Ron has the full story.
Anthropic launches a business plan: AI startup Anthropic is launching a new paid plan for businesses along with a new iOS app. The business plan, called Team, gives customers priority access to Anthropic's Claude 3 family of generative AI models, plus additional admin and user management controls.
CodeWhisperer deprecated: Amazon CodeWhisperer is now Q Developer, part of Amazon's Q family of business-oriented generative AI chatbots. Available through AWS, Q Developer, like CodeWhisperer before it, helps with some of the tasks developers do in their day-to-day work, such as debugging and upgrading apps.
Just walk out of Sam's Club: Walmart-owned Sam's Club says it's using AI to speed up its “exit technology.” Instead of requiring store staff to check members' purchases against their receipts as they leave, customers who pay either at a register or through the Scan & Go mobile app can now leave certain store locations without having their purchases double-checked.
Fish harvesting, automated: Harvesting fish is inherently messy work. Shinkei is working to improve it with an automated system that dispatches fish more humanely and reliably, which could add up to a whole different seafood economy, Devin reports.
Yelp's AI assistant: Yelp this week announced a new AI-powered chatbot for consumers, powered by OpenAI models, that helps them connect with relevant businesses for their tasks (installing lighting fixtures, upgrading outdoor spaces, and so on). The company is rolling out the AI assistant within the “Projects” tab of its iOS app, with plans to expand it to Android later this year.
More machine learning
This winter, Argonne National Laboratory convened quite the party, bringing together 100 AI and energy experts to discuss how the rapidly evolving technology could benefit the country's infrastructure and R&D in that area. The resulting report is more or less what you'd expect from that crowd: a lot of pie in the sky, but informative nonetheless.
Covering nuclear power, the grid, carbon management, energy storage, and materials, the themes that emerged from the gathering were, first, that researchers need access to high-powered computing tools and resources; second, that they need to learn to spot the weak points of simulations and predictions (including those enabled by that computing power); and third, that there's a need for AI tools that can integrate and make accessible data from multiple sources and in many formats. We've seen all of these things happening across the industry in various ways, so it's no big surprise, but nothing gets done at the federal level without a few experts putting out a paper, so it's good to have it on the record.
Georgia Tech and Meta are working on part of that with a big new database called OpenDAC, a trove of reactions, materials, and calculations intended to help scientists design carbon capture processes more easily. It focuses on metal-organic frameworks, a promising and popular material type for carbon capture, but one with thousands of variations that haven't been exhaustively tested.
The Georgia Tech team worked with Oak Ridge National Laboratory and Meta to simulate the quantum chemistry interactions of these materials, burning through some 400 million compute hours in the process, far more than a university can easily muster. Hopefully it's helpful to the climate researchers working in this field. It's all documented here.
We hear a lot about AI applications in medicine, though most are in what you might call an advisory role, helping experts notice things they might not otherwise have seen, or spotting patterns that would have taken a technician hours to find. Part of the reason is that these machine learning models only find connections between statistics, without understanding what caused what. Researchers at the University of Cambridge and Ludwig Maximilian University of Munich are working on that, since moving beyond basic correlative relationships could be hugely helpful in planning treatments.
The research, led by LMU professor Stefan Feuerriegel, aims to build models that can identify causal mechanisms, not just correlations. “We give the machine rules for recognizing the causal structure of a problem and for formalizing the problem correctly. Then the machine has to learn to recognize the effects of interventions and understand, so to speak, how real-life consequences are mirrored in the data fed into the computers,” he said. It's still early days for the team, and they're aware of that, but they believe their work is part of an important decade-scale development period.
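Why mere correlation can sabotage a treatment plan is easy to show with a toy simulation. The sketch below is a generic illustration in plain NumPy, not the LMU team's method, and every number in it is invented: a confounder (illness severity) drives both who gets treated and how patients fare, making a genuinely helpful treatment look harmful until the model accounts for the causal structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented scenario: sicker patients are more likely to be treated
# AND more likely to have poor outcomes (severity is a confounder).
severity = rng.normal(size=n)
treated = (severity + rng.normal(size=n) > 0).astype(float)
outcome = 1.0 * treated - 2.0 * severity + rng.normal(size=n)  # true effect: +1.0

# Correlation-only view: compare average outcomes of treated vs. untreated.
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Causal view: adjust for the confounder with a simple linear regression.
X = np.column_stack([treated, severity, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print(f"naive difference:  {naive:+.2f}")   # about -1.3: treatment 'looks' harmful
print(f"adjusted estimate: {beta[0]:+.2f}") # about +1.0: treatment actually helps
```

A correlation-only model would read the raw comparison and advise against the treatment; only by encoding the causal structure (here, trivially, by adjusting for severity) does the true benefit surface. Real clinical settings involve far messier structures, which is what makes the Cambridge and LMU work hard.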
At the University of Pennsylvania, graduate student Ro Encarnación is working on a new angle in the field of “algorithmic justice,” which has been pioneered over the past seven or eight years (primarily by women and people of color). Her work focuses on the users rather than the platforms, documenting what she calls “emergent audits.”
What do users do when TikTok or Instagram puts out a filter that's slightly racist, or an image generator that does something eye-popping? Complain, sure, but they also keep using it, and they learn how to circumvent or even exacerbate the problems encoded in it. It may not be a “solution” in the way we usually think of one, but it demonstrates the diversity and resilience on the user side of the equation; users aren't as fragile or passive as you might think.