Framework Overview
CoMEM Framework: Our continuous memory system encodes each GUI trajectory into a fixed-length sequence of embeddings using a Q-Former encoder. Relevant memories are retrieved via FAISS and plugged directly into the VLM's input layer, sharply reducing context cost while preserving fine-grained visual information. An auto-scaling data flywheel continuously discovers new environments and synthesizes training data to grow the memory bank.
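The core idea above — compress a variable-length trajectory into a fixed number of continuous embeddings and prepend them to the task's input embeddings — can be sketched as follows. This is a minimal NumPy mock (the real encoder is a Q-Former on top of the VLM itself; the dimensions and the mean of the attention mechanism here are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions, not the paper's actual sizes).
HIDDEN = 64    # VLM hidden size
N_QUERY = 8    # fixed number of learnable query tokens per trajectory

def encode_trajectory(frame_feats: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Mock Q-Former: cross-attend N_QUERY learnable queries over a
    variable-length trajectory, yielding a fixed-length continuous memory."""
    # (n_query, n_frames) attention scores via scaled dot product
    scores = queries @ frame_feats.T / np.sqrt(HIDDEN)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ frame_feats                     # (n_query, HIDDEN)

# A trajectory of 20 screenshot/step features; any length works.
trajectory = rng.standard_normal((20, HIDDEN))
queries = rng.standard_normal((N_QUERY, HIDDEN))
memory = encode_trajectory(trajectory, queries)      # fixed (8, 64)

# Injection: prepend the continuous memory to the current task's input
# embeddings instead of serializing the history into text tokens.
task_embeds = rng.standard_normal((50, HIDDEN))
augmented = np.concatenate([memory, task_embeds], axis=0)
print(augmented.shape)  # (58, 64): 8 memory slots vs. hundreds of text tokens
```

Whatever the trajectory length, the memory contributes a constant `N_QUERY` slots to the context, which is why retrieval depth can grow without the prompt ballooning.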
Abstract
We study how to endow GUI agents with scalable memory that helps them generalize across unfamiliar interfaces and long-horizon tasks. Prior GUI agents compress past trajectories into text tokens, which balloons context length and misses decisive visual cues (e.g., exact widget size and position). We propose a continuous memory that encodes each GUI trajectory into a fixed-length sequence of continuous embeddings using the VLM itself as an encoder; these embeddings are plugged directly into the backbone's input layer, sharply reducing context cost while preserving fine-grained visual information. As memory size and retrieval depth increase, performance improves monotonically, unlike text memories that degrade with long prompts. To grow memory at low cost, we introduce an auto-scaling data flywheel that (i) discovers new environments via search, (ii) synthesizes tasks with an open-source VLM, (iii) rolls out trajectories with the agent, and (iv) verifies success with the same VLM. Using this pipeline, we collect 100k+ trajectories for about $4k and fine-tune only the memory encoder (LoRA on a Q-Former, 1.2% of parameters) with 1,500 samples. On real-world GUI benchmarks, our memory-augmented agent consistently improves success rates under long horizons and distribution shifts. Notably, Qwen2.5-VL-7B + continuous memory achieves performance comparable to state-of-the-art closed-source models (e.g., GPT-4o, Claude-4).
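The four-stage data flywheel (i–iv) can be sketched as a simple collection loop. Every helper below is a hypothetical stand-in: a real pipeline would call a web search API for discovery, an open-source VLM for task synthesis and verification, and the GUI agent itself for rollouts.

```python
# Sketch of the auto-scaling data flywheel; all helpers are hypothetical stubs.

def discover_environments(query: str) -> list[str]:
    """(i) Discover new environments via search (stubbed)."""
    return [f"https://example-{i}.com" for i in range(3)]

def synthesize_task(url: str) -> str:
    """(ii) An open-source VLM proposes a task for the site (stubbed)."""
    return f"Find the pricing page on {url}"

def rollout(task: str) -> list[str]:
    """(iii) The agent executes the task, producing a trajectory (stubbed)."""
    return [f"step {i}: click" for i in range(4)]

def verify(task: str, trajectory: list[str]) -> bool:
    """(iv) The same VLM judges whether the rollout succeeded (stubbed)."""
    return len(trajectory) > 0

memory_bank = []
for url in discover_environments("shopping sites"):
    task = synthesize_task(url)
    traj = rollout(task)
    if verify(task, traj):  # only verified successes enter the memory bank
        memory_bank.append({"task": task, "trajectory": traj})

print(len(memory_bank))  # 3 verified trajectories added this round
```

Because every stage is automated and the verifier reuses the same open-source VLM, the loop runs at near-zero marginal human cost, which is what makes the 100k+ trajectory scale affordable.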
Case Studies
Amazon Shopping Task (CoMEM): Our memory-augmented agent successfully completes a complex e-commerce task by retrieving relevant past shopping trajectories. The continuous memory provides visual context about product pages, navigation patterns, and purchase workflows, enabling efficient task completion on unfamiliar shopping interfaces.
Amazon Shopping Task (Base Model): The baseline model without continuous memory struggles with complex shopping tasks. Without access to past trajectories and their visual context, the agent fails to plan effectively and stops at the wrong product page.
Wikipedia Information Seeking (CoMEM): The agent leverages continuous memory to navigate Wikipedia's information architecture. By recalling similar information-seeking trajectories, it efficiently locates relevant articles, follows contextual links, and extracts target information across multiple pages.
Wikipedia Information Seeking (Base Model): The baseline model without memory navigates Wikipedia's complex information structure inefficiently. Unable to recall similar navigation patterns, it retrieves information less efficiently across multiple pages.
Memory Trajectories
Examples of GUI interaction trajectories stored in our continuous memory bank, showing screenshots and agent actions at each step.
Our trajectories are collected through an auto-scaling data flywheel at zero human-annotation cost. The resulting memory bank contains over 100k trajectories spanning more than 10k websites, enabling rapid expansion across diverse web environments. The complete dataset is open-sourced on HuggingFace for the research community: CoMEM-agent-memory-trajectories
GoodDog Service: "Find a breeder offering female German Shepherd puppies in the 92101 zip code"
4 steps • Success: ✓
Amazon Fresh Shopping: "Search for a specific Amazon Fresh product and check its availability"
4 steps • Success: ✓
Metroman Travel: "Compare the number of stations in all hot cities listed"
4 steps • Success: ✓
Healthcare: "Find a dentist in Jackson, MI"
5 steps • Success: ✓
Education: "Evaluate the extracurricular planning services offered by AdmissionSight"
5 steps • Success: ✓
TikTok Entertainment: "Determine which song has the longest duration among the top 10 songs"
7 steps • Success: ✓
Govfacts: "Find an article discussing the impact of climate change on agriculture"
7 steps • Success: ✓
Engineersedge: "Compare the thermal conductivity of two different rock samples listed in the table."
7 steps • Success: ✓
BestPetFect: "Identify the section detailing the exercise requirements for Golden Retrievers based on size"
7 steps • Success: ✓
Dealerrater: "Find the most recent car review posted on the site"
7 steps • Success: ✓
Memory Bank Statistics: 100k+ trajectories • 10k+ websites • $4k collection cost
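At this scale, retrieval from the bank reduces to a nearest-neighbor search over one embedding key per stored trajectory. A minimal NumPy sketch of that lookup is below; the exact inner-product search shown is what FAISS's `IndexFlatIP` computes, which the real system uses to index the 100k+ keys (bank size, dimensions, and the random keys here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy memory bank: one embedding key per stored trajectory. Normalizing the
# keys turns inner-product search into cosine similarity.
DIM = 32
bank_keys = rng.standard_normal((1000, DIM)).astype(np.float32)
bank_keys /= np.linalg.norm(bank_keys, axis=1, keepdims=True)

def retrieve(query: np.ndarray, k: int = 4) -> np.ndarray:
    """Return indices of the top-k most similar stored trajectories."""
    q = query / np.linalg.norm(query)
    scores = bank_keys @ q           # inner-product similarity to every key
    return np.argsort(-scores)[:k]   # top-k trajectory ids

task_embedding = rng.standard_normal(DIM).astype(np.float32)
top_ids = retrieve(task_embedding, k=4)
print(top_ids)  # ids of the 4 trajectories whose memories get injected
```

Because each retrieved trajectory contributes only a fixed-length block of continuous embeddings, increasing `k` deepens retrieval without the prompt growth that text-based memories suffer.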
Performance Comparison Leaderboard
Bold indicates the best performance, underlined denotes the second-best. Results from closed-source base models are for reference only and are excluded from ranking.
| Model | MMInA Wiki | MMInA Shop | Mind2Web Shop | Mind2Web Travel | Mind2Web Info | Mind2Web Service | WebVoyager | Avg. |
|---|---|---|---|---|---|---|---|---|
| **Closed Source** (reference only) | | | | | | | | |
| GPT-4o | 51.3% | 37.0% | 15.4% | 14.3% | 22.6% | 29.4% | 31.8% | 27.8% |
| Gemini-Pro-Vision | 52.3% | 41.6% | 12.5% | 25.0% | 20.8% | 22.8% | 47.7% | 30.4% |
| Claude-4 | 50.0% | 40.0% | 10.5% | 22.2% | 19.8% | 26.7% | 40.9% | 28.8% |
| **Open Source** | | | | | | | | |
| Qwen2-VL-7B | 7.8% | 0.0% | 0.0% | 2.2% | 8.3% | 14.0% | 31.8% | 8.8% |
| Qwen2.5-VL-7B | 36.7% | 15.5% | 2.6% | 9.5% | 9.6% | 17.3% | 40.0% | 14.4% |
| GLM-4.1V-9B | 34.7% | 20.3% | 13.3% | 11.1% | 13.6% | **33.3%** | 40.0% | 23.0% |
| Qwen2.5-VL-32B | <u>43.3%</u> | <u>37.6%</u> | 8.0% | 12.2% | 7.6% | 13.0% | 40.9% | 21.6% |
| **Specialized Finetuned** | | | | | | | | |
| UI-TARS-1.5 | 36.4% | 1.0% | 0.0% | 14.3% | 5.6% | 6.5% | 34.8% | 13.2% |
| CogAgent | 20.5% | 7.0% | 10.7% | **20.0%** | 12.4% | <u>20.6%</u> | - | 15.3% |
| Websight | 12.0% | 9.5% | 8.3% | 6.7% | 13.3% | 17.6% | <u>47.7%</u> | 15.8% |
| **Memory-Augmented** | | | | | | | | |
| UI-TARS-1.5-7B + Text-based Memory | 16.0% | 1.0% | 0.0% | 11.0% | 3.6% | 8.6% | 34.0% | 10.0% |
| UI-TARS-1.5-7B + CoMEM | 41.3% | 17.9% | <u>14.3%</u> | 18.2% | <u>23.3%</u> | 18.9% | 38.0% | <u>23.8%</u> |
| Qwen2.5-VL-7B + Text-based Memory | 34.2% | 31.4% | 7.1% | 17.8% | 12.7% | 16.6% | 44.0% | 22.2% |
| Qwen2.5-VL-7B + CoMEM | **47.4%** | **45.0%** | **22.2%** | <u>18.8%</u> | **26.5%** | 17.7% | **54.5%** | **31.7%** |
Benchmark Results: Our continuous memory-augmented agent consistently outperforms baseline approaches across three real-world GUI benchmarks (MMInA, Mind2Web, and WebVoyager). Notably, Qwen2.5-VL-7B with continuous memory achieves performance comparable to GPT-4o and Claude-4.
BibTeX
@article{wu2025comemagent,
title={Auto-Scaling Continuous Memory For GUI Agent},
author={Wenyi Wu and Kun Zhou and Ruoxin Yuan and Vivian Yu and Stephen Wang and Zhiting Hu and Biwei Huang},
journal={arXiv preprint arXiv:2510.09038},
year={2025},
url={https://github.com/WenyiWU0111/CoMEM}
}