Pin the Tail on the Model: Blindfolded Repair of User-Flagged Failures in Text-to-Image Services

Gefei Tan (Northwestern University), Ali Shahin Shamsabadi (Brave Software), Ellen Kolesnikova (Decatur High School), Hamed Haddadi (Brave Software and Imperial College London), Xiao Wang (Northwestern University) | Alignment, MPC

Diffusion models are increasingly deployed in real-world text-to-image services. These models, however, encode implicit assumptions about the world based on the web-scraped image-caption pairs used during training. Over time, such assumptions may become outdated, incorrect, or socially biased, leading to failures where the generated images misalign with users’ expectations or evolving societal norms. Identifying and fixing such failures is challenging, and the ability to do so is therefore a valuable asset for service providers: failures often emerge post-deployment and demand specialized expertise and resources to resolve. In this work, we introduce SURE, the first end-to-end framework that SecUrely REpairs failures flagged by users of diffusion-based services. SURE enables the service provider to securely collaborate with an external third party specialized in model repair (i.e., a Model Repair Institute) without compromising the confidentiality of user feedback, the service provider’s proprietary model, or the Model Repair Institute’s proprietary repair knowledge. To achieve the best possible efficiency, we propose a co-design of a model editing algorithm with a customized two-party cryptographic protocol. Our experiments show that SURE is highly practical: it securely and effectively repairs all 32 layers of Stable Diffusion v1.4 in under 17 seconds (four orders of magnitude more efficient than a general baseline). Our results demonstrate that practical, secure model repair is attainable for large-scale, modern diffusion services.
