Gemini is being distilled into smaller models for iPhone

Apple’s artificial intelligence work with Google has entered a new phase. Rather than porting Google’s large Gemini model directly to the iPhone, the company has reportedly used distillation to produce smaller Apple Intelligence models that can run on the device. New details of the Apple-Google agreement have emerged, and they put Google Gemini in a more central position in Apple’s plans for the new Siri and Apple Intelligence.

According to Ars Technica, citing a report from The Information, Apple is using Google’s large Gemini model to develop smaller models that can run on the iPhone. Distillation is a method for transferring the knowledge in a large artificial intelligence model to a smaller one. In this process, Gemini serves as the large teacher model, and the smaller models developed by Apple are trained to imitate its output on certain tasks.

The large model produces high-quality answers, intermediate outputs and decision patterns for a given task. The small model is trained on these outputs and becomes capable of performing the same task with less processing power. This is a different technical process from porting Gemini to the iPhone as is. According to the report, Apple trains its own small models by examining some parts of Gemini or its behavior on certain tasks.
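The teacher-student idea can be illustrated with a minimal sketch. It assumes the classic soft-label form of distillation, in which the student is penalized for diverging from the teacher’s temperature-softened output distribution. The report does not say which training objective Apple uses, so the function names, the temperature value and the example scores below are illustrative assumptions only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Illustrative objective only (an assumption, not Apple's documented method).
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # already imitates the teacher well
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

Training would adjust the student’s weights to push this loss down, so the student whose preferences already resemble the teacher’s starts with the smaller loss.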

These small models can be used in Apple Intelligence features that run on the device. Running artificial intelligence features locally on iPhone, iPad and Mac offers lower latency, less server load and more controlled data processing. Apple’s agreement with Google first came up in connection with the major overhaul of Siri. According to Bloomberg, Apple will use Google’s 1.2 trillion-parameter Gemini model for the new Siri.

The company will reportedly pay Google approximately $1 billion a year, and the arrangement is meant as a temporary solution until Apple has its own systems ready. The same report, as cited by Reuters, also said that the agreement will not bring Google AI Search to Apple’s operating systems. The new information shows that the agreement is not limited to adding a cloud-based Gemini layer to Siri.

Apple can use Gemini as a source model in its own Apple Foundation Models work. This includes developing smaller local models that benefit from the capacity of Google’s model but reach the end user under Apple’s name. Apple’s current Apple Intelligence architecture already works in two layers. Simpler and more personal tasks are handled by small models on the device. For more complex requests, Private Cloud Compute comes into play.

Apple’s 2025 technical report states that the on-device model has approximately 3 billion parameters and that larger, scalable models are used on the server side. Distillation from Gemini could extend this on-device approach. For Apple, the on-device model is a key part of its privacy strategy. Locally running models can process the user’s request without sending it to a server. This structure can be used for text editing, notification summarization, short reply generation, text comprehension, in-app commands and tasks based on personal context.

For heavier operations, Apple silicon servers on the Private Cloud Compute side come into play. Google’s large Gemini model is not directly suitable for iPhone hardware. Very large models require large amounts of memory, intensive processing power and strong data center infrastructure. Distilling Gemini into small task-specific models is therefore a more technically feasible route to artificial intelligence features that can run within the limits of the iPhone’s Neural Engine, CPU, GPU and memory.
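A rough calculation shows the scale of the gap. The sketch below counts only the memory needed to hold model weights, using the two parameter counts cited in this article. The bit widths are assumptions chosen for illustration, since neither company’s deployed precision is given here, and the figures ignore activations, caches and everything else a running model needs.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

GEMINI_PARAMS = 1_200_000_000_000  # 1.2 trillion, the figure reported by Bloomberg
ON_DEVICE_PARAMS = 3_000_000_000   # about 3 billion, per Apple's 2025 technical report

# Assumed precisions, for illustration only.
for bits in (16, 8, 4):
    large = weight_memory_gb(GEMINI_PARAMS, bits)
    small = weight_memory_gb(ON_DEVICE_PARAMS, bits)
    print(f"{bits:>2}-bit weights: large model {large:,.1f} GB, on-device model {small:,.1f} GB")
```

Under these assumptions the large model’s weights alone come to about 2,400 GB at 16 bits and 600 GB at 4 bits, while a 3 billion-parameter model needs about 6 GB at 16 bits and 1.5 GB at 4 bits, a size a phone can plausibly hold in memory.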

Another noteworthy point in the report is the claim that Apple is not using these small models directly to build a large Gemini competitor. According to The Information, Apple’s work is currently focused on producing mini models derived from Gemini. The purpose of these models is to perform specific tasks on the device. Gemini remains on the cloud or server side for a larger, general-purpose Siri model.

The Siri overhaul is also at the center of this picture. Apple is working on a new version of Siri that understands more personal context, can take action across apps and responds based on information on the user’s device. The company had previously announced these features as part of Apple Intelligence but postponed the advanced Siri features. In the new Siri, expected to arrive in 2026, Google Gemini-backed infrastructure and Apple’s own models will work together.

This shared structure can be divided into different processing layers depending on the user’s request. For simple and narrow tasks, the small Apple model on the device can do the work. For operations that require more complex language understanding, long-context reasoning or larger model capacity, Private Cloud Compute or a Google Gemini-based system can be used. Which tasks Apple will process entirely on the device and which on the server side will become clear with the final product announcement.
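The layered split described here can be pictured as a simple routing rule. The sketch below is purely hypothetical: the task names, the context limit and the two-tier outcome are assumptions made for illustration, and Apple has not announced how requests will actually be divided.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    ON_DEVICE = "on_device"  # small local model
    SERVER = "server"        # Private Cloud Compute or a Gemini-based system

@dataclass
class Request:
    task: str
    context_tokens: int

# Hypothetical values, not taken from any Apple documentation.
ON_DEVICE_TASKS = {"summarize_notification", "rewrite_text", "short_reply"}
MAX_ON_DEVICE_CONTEXT = 4096

def route(request: Request) -> Tier:
    """Keep narrow, short-context tasks local and send everything else to a server."""
    is_narrow = request.task in ON_DEVICE_TASKS
    fits_locally = request.context_tokens <= MAX_ON_DEVICE_CONTEXT
    return Tier.ON_DEVICE if is_narrow and fits_locally else Tier.SERVER

print(route(Request("short_reply", 200)).value)
print(route(Request("multi_step_planning", 200)).value)
```

A real system would likely weigh more signals, such as privacy sensitivity, network availability and battery state, but the principle of falling back to the server only when the local model’s limits are exceeded stays the same.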

The support Apple receives from Google is also tied to the time pressure the company faces in the artificial intelligence race. Siri has long been criticized for falling behind systems such as Google Assistant, Alexa and ChatGPT on complex and multi-step tasks. Apple changed its artificial intelligence leadership, and Mike Rockwell from the Vision Pro team was appointed head of Siri.

The company’s use of Google Gemini while developing its own models is seen as a technical transition plan that speeds up this process. The branding side of the agreement is still unclear. It is not known whether Apple will show the Gemini name to users or offer the system entirely under the Apple Intelligence umbrella. In its earlier ChatGPT integration, Apple asked for user consent and named the service openly.

The new Gemini-backed Siri will reportedly run as part of Apple’s own infrastructure. Integrating Google AI Search into iOS is not within the scope of this agreement. For iPhone users, the direct result of the distillation process may be artificial intelligence features that work faster and depend less on an internet connection. However, it does not currently seem realistic for every advanced AI process to run entirely on the device.

Large language models still require large amounts of memory and processing capacity. Apple’s own Private Cloud Compute system and the Google Gemini connection will come into play where the limits of the on-device model are exceeded. What most sets Apple apart in this area is that it designs artificial intelligence processes together with the hardware, operating system and privacy architecture. The company presents the on-device models for Apple Intelligence, Apple silicon servers and the Private Cloud Compute security layer as parts of the same structure.

How the small models derived from Gemini are added to this architecture will be decisive for the performance of the new Siri and the everyday artificial intelligence experience on the iPhone.