View Layout
Add the following views in a view controller:
Label
View A, with a subview of the same size: MTKView A
View B, with a subview of the same size: MTKView B
Refresh Rates of Each View
The label view refreshes at 60fps (driven by CADisplayLink).
MTKView A and B refresh at 15fps.
MTKView Implementation Details
The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering.
The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame.
self.metalView.enableSetNeedsDisplay = NO;
self.metalView.paused = YES;
A new high-priority queue is created for drawing, instead of handling it on the main queue.
MTKView Latency Tracking
The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer.
The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView.
Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that.
I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism.
Observation from Instruments
From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details.
Questions
According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer?
The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens?
Explanation of the Reasoning Behind Some MTKView Code Details
Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering.
Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
Delve into the world of graphics and game development. Discuss creating stunning visuals, optimizing game mechanics, and share resources for game developers.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi Apple & devs,
I'm trying to test various Windows .exe files using the Game Porting Toolkit (GPTK), but I’m hitting a wall: no matter what .exe I try, the command returns instantly with no output — no error, no logs, nothing.
Here's what I'm doing:
I'm using macOS Sequioa 15.5 on M1 macbook pro.
I installed gameportingtoolkt GPTK 2.1 through brew from gcenx:
brew install gcenx/wine/game-porting-toolkit
When I run any .exe using GPTK's wine64, like this, e.g. with steam
user@JMacBook-Pro / % WINEPREFIX=~/wine_prefix /usr/local/bin/gameportingtoolkit 'C:\SteamSetup.exe' --verbose
user@JMacBook-Pro / %
Immediate exit without any return code, output, nor errors.
No output, no crash, no logs. Same result with simple test apps
Running with WINEDEBUG=+all (still no output)
Even running wine64 does the same thing.
I’ve tried:
Removing and reinstalling GPTK
Creating a fresh WINEPREFIX
Checking /tmp and ~/Library/Logs for logs — nothing
Has anyone else experienced this or have any idea how to debug it?
Is there ANY Apple support for this??
Thanks in advance.
I have been trying to run an open source Windows executable that I would like to help porting to macOS using the Game Porting Toolkit but I stumbled on an issue quite early in the application lifecycle.
It looks like the funtion GetThreadDpiHostingBehavior is missing in USER32.dll
Has anyone any idea how to solve that?
During the startup, it fails with the following error:
TiXL crashed. We're really sorry.
The last backup was saved Unknown time to...
C:\users\crossover\AppData\Roaming\TiXL\Backup
Please refer to Help > Using Backups on what to do next.
System.EntryPointNotFoundException: Unable to find an entry point named 'GetThreadDpiHostingBehavior' in DLL 'USER32.dll'.
at System.Windows.Forms.ScaleHelper.DpiAwarenessScope..ctor(DPI_AWARENESS_CONTEXT context, DPI_HOSTING_BEHAVIOR behavior)
at System.Windows.Forms.ScaleHelper.EnterDpiAwarenessScope(DPI_AWARENESS_CONTEXT awareness, DPI_HOSTING_BEHAVIOR dpiHosting)
at System.Windows.Forms.NativeWindow.CreateHandle(CreateParams cp)
at System.Windows.Forms.Control.CreateHandle()
at System.Windows.Forms.Application.ThreadContext.get_MarshallingControl()
at System.Windows.Forms.WindowsFormsSynchronizationContext..ctor()
at System.Windows.Forms.WindowsFormsSynchronizationContext.InstallIfNeeded()
at System.Windows.Forms.Control..ctor(Boolean autoInstallSyncContext)
at System.Windows.Forms.ScrollableControl..ctor()
at System.Windows.Forms.ContainerControl..ctor()
at System.Windows.Forms.Form..ctor()
at T3.Editor.SplashScreen.SplashScreen.SplashForm..ctor()
at T3.Editor.SplashScreen.SplashScreen.Show(String imagePath) in C:\Users\pixtur\dev\tooll\tixl\Editor\SplashScreen\SplashScreen.cs:line 25
at T3.Editor.Program.Main(String[] args) in C:\Users\pixtur\dev\tooll\tixl\Editor\Program.cs:line 111
Hello. In the iOS app i'm working on we are very tight on memory budget and I was looking at ways to reduce our texture memory usage. However I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with astc12x12 for most textures there is over a 100mb difference in astc size on disk versus when loaded, so I would love to be able to recover even a portion of that memory.
Here is some test code with some measurements i've taken using an iphone 11:
for(int i = 0; i < 11; i++) {
MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
int dim = 12;
int n = 2 << i;
int mips = i+1;
texDesc.width = n;
texDesc.height = n;
texDesc.mipmapLevelCount = mips;
texDesc.resourceOptions = MTLResourceStorageModeShared;
texDesc.usage = MTLTextureUsageShaderRead;
// Calculate the equivalent astc texture size
int blocks = 0;
if(mips == 1) {
blocks = n/dim + (n%dim>0? 1 : 0);
blocks *= blocks;
} else {
for(int j = 0; j < mips; j++) {
int a = 2 << j;
int cur = a/dim + (a%dim>0? 1 : 0);
blocks += cur*cur;
}
}
auto tex = [objCObj newTextureWithDescriptor:texDesc];
printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
}
MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144
128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456
MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448
128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144
I also tried using MTLHeaps (placement and automatic) hoping they might be better, but saw nearly the same numbers.
Is there any way to have metal allocate these textures in a more compact way to save on memory?
Since macOS 15.3.2, we have observed that when another window is moved near the App Store's install button, the button disappears.
We have attached a related video in the Feedback submission here https://feedbackassistant.apple.com/feedback/20444423
Our application overlays a transparent, watermark-window on top of the system window, which causes the install button in the App Store to be hidden when a user attempts to install an application.Could you advise on how to avoid this issue?
Hello!
I'm a developer working on a plugin for the Elgato Stream Deck, called GPU Metrics. The plugin currently only works on Windows but I'd like to bring it to macOS. However, based on forum posts I've read (and StackOverflow) there isn't a very clear path to query GPU metrics like usage, temperature, used GPU memory, and power consumption. There are some tools out there that do similar things, but I wanted to see what would be the recommendation from Apple's engineering team to get this data via a public API.
Requirements:
Access GPU utilization, temperature, memory usage, power usage
C/C++ based API for querying the metrics so I can expose the data to JavaScript via Node Addon
No need to compatibile with Intel-based Macs, as Apple silicon will be fine for now
Plugin GitHub
Thank you!
Noah
Hello experts, I'm trying to implement a material with custom shader code, but I saw that visionOS doesn't allow you to inject custom Metal functions or use CustomMaterial like iOS/macOS, nor can you directly write Metal Shading Language (.metal) and use it through ShaderGraphMaterial. So my question is, if i want to implement your own shader code, how should i do it?
Matchmaking rules
https://developer.apple.com/documentation/gamekit/matchmaking-rules?language=objc
AppStoreConnectApi rules
https://developer.apple.com/documentation/appstoreconnectapi/rules?language=objc
・Environment
Unity 6000.2.2f1
XCode 16.1
iOS 26
3 iPhones
・AppStoreConnectApi rules
"type": "gameCenterMatchmakingRuleSets",
"id": "f6a88caf-85db-42bf-xxxxxxxxxxxxxxxxxxxx",
"attributes": {
"referenceName": "co.mygame.RuleSets.GvERandom34",
"ruleLanguageVersion": 1,
"minPlayers": 3,
"maxPlayers": 4
},
"type": "gameCenterMatchmakingRules",
"id": "6afa68ce-4d2c-496f-xxxxxxxxxxxxxxxxxxxx",
"attributes": {
"referenceName": "GameVersion",
"description": "Check Game Version. GvERandom34",
"type": "COMPATIBLE",
"expression": "requests[0].properties.gameVersion == requests[1].properties.gameVersion",
"weight": null
},
"type": "gameCenterMatchmakingQueues",
"id": "7fb645ef-4eca-4510-xxxxxxxxxxxxxxxxxxxx",
"attributes": {
"referenceName": "co.mygame.que.GvERandom34",
"classicMatchmakingBundleIds": []
},
・Objective-C Execution code
queueName = "co.mygame.que.GvERandom34"
keyStr = "gameVersion "
valueStr = "1.0"
- (void)MatchQueueParamStr1Start:(NSString*)queueName keyStr:(NSString*)keyStr valueStr:(NSString*)valueStr
{
if (@available(iOS 17.2, tvOS 17.2, macOS 14.2, visionOS 1.1, *) == NO)
{
DBGLOG(@"MatchQueueParamStr1Start Not support.");
return;
}
self->_matchMakingFlag = YES;
self->_matchFinishFlag = NO;
self->_myMatch = nil;
GKMatchRequest *req = [[GKMatchRequest alloc] init];
if (@available(iOS 17.2, tvOS 17.2, macOS 14.2, visionOS 1.1, *))
{
req.queueName = queueName;
req.properties = @{keyStr: valueStr};
}
[[GKMatchmaker sharedMatchmaker] findMatchForRequest:req withCompletionHandler: ^(GKMatch *match, NSError *error)
{
if (error)
{
[self SetupErrorInfo:error descriptionText:@"findMatchForRequest"];
}
else if(match)
{
self->_myMatch = match;
self->_myMatch.delegate = self;
}
self->_matchMakingFlag = NO;
self->_matchFinishFlag = YES;
}];
}
・
I'm trying to match with three devices.
Matching doesn't work.
5 minutes later times out.
What's the problem?
When running on my iPhone SE3 under IOS 18.4.1, achievement banners show as expected. The same code running on my iPad Air2 under IOS 15.8.4, achievement banners do not show, but they are accepted (as shown in the GameCenterViewController). The banners also don't show when running the simulator under iPhone 16 Pro Max under IOS 18.2 or simulator under iPhone SE3 under IOS 18.3. I haven't tried others. [Note that I clear the achievements each run during test so that I can duplicate this]
Topic:
Graphics & Games
SubTopic:
GameKit
Updated my app to include turn-based matches. Beta testing through FlightTest and all was well between iOS 18.x and 26.2 devices. One beta tester upgraded to 26.2 during beta testing and now when the MatchMaker VC is opened, it does not show existing matches. Worse, he can create new matches and play his turn, but the new match won't even show up in MMVC, even after opponent takes turn.
My app has been reviewed and is ready for release, but I'd like to know how to solve this before I release. He has tried re-installing the app, including an updated FlightTest version that is the same as the about-to-be-released reviewed version.
Topic:
Graphics & Games
SubTopic:
GameKit
Hello, I'm tracking down a bug where useResource doesn't seem to apply proper synchronization when a resource is produced by the render pass then consumed by the compute pass, but when I use MTLFence between the to signal and wait between the render/compute encoders, the artifact goes away.
The resource is created with MTLHazardTrackingModeTracked and useResource is called on the compute encoder after the render pass. Metal API Validation doesn't report any warnings/errors.
Am I misunderstanding the difference between the two APIs? I dug through the Metal documentation and it looks like useResource should handle synchronization given the resource has MTLHazardTrackingModeTracked but on the other hand, MTLFence should be used to ensure proper synchronization between command encoders. Can someone can clarify the difference between the two APIs and when to use them.
Context
I’m deploying large language models on iPhone using llama.cpp. A new iPhone Air (12 GB RAM) reports a Metal MTLDevice.recommendedMaxWorkingSetSize of 8,192 MB, and my attempt to load Llama-2-13B Q4_K (~7.32 GB weights) fails during model initialization.
Environment
Device: iPhone Air (12 GB RAM)
iOS: 26
Xcode: 26.0.1
Build: Metal backend enabled llama.cpp
App runs on device (not Simulator)
What I’m seeing
MTLCreateSystemDefaultDevice().recommendedMaxWorkingSetSize == 8192 MiB
Loading Llama-2-13B Q4_K (7.32 GB) fails to complete. Logs indicate memory pressure / allocation issues consistent with the 8 GB working-set guidance.
Smaller models (e.g., 7B/8B with similar quantization) load and run (8B Q4_K provide around 9 tokens/second decoding speed).
Questions
Is 8,192 MB an expected recommendedMaxWorkingSetSize on a 12 GB iPhone?
What values should I expect on other 2025 devices including iPhone 17 (8 GB RAM) and iPhone 17 Pro (12 GB RAM)
Is it strictly enforced by Metal allocations (heaps/buffers), or advisory for best performance/eviction behavior?
Can a process practically exceed this for long-lived buffers without immediate Jetsam risk?
Any guidance for LLM scenarios near the limit?
Hi Apple,
In VisionOS, for real-time streaming of large 3D scenes, I plan to create Metal buffers and textures in multiple threads and then use a compute shader on the main thread to copy the Metal resources into RealityKit, minimizing main thread usage. Given that most of RealityKit's default APIs require execution on the main actor (main thread), it is not ideal for streaming data. Is this approach the best way to handle streaming data and real-time rendering?
Thank you very much.
I'm currently learning Metal. While reading the reference, I came across a strange description.
Page 78 in Version 4 Reference (2025-10-25) says:
It is legal to call the following set_indices functions to set the indices if the position in the index buffer is valid and if the position in the index buffer is a multiple of 2 (uchar2 overload) or 2 (uchar4 overload). The index I needs to be in the range [0, max_indices).
void set_indices(uint I, uchar2 v);
void set_indices(uint I, uchar4 v);
However, it seems that the uchar4 overload should be multiple of 4.
Furthermore, there is no explanation of what these methods actually do. I believe it involves setting two to four consecutive indices at once, but there is no mention of that here.
I would like to know if the above understanding is correct.
Hi everyone,
I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture.
While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges.
Current Core Implementations:
GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders.
Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering.
Temporal Stability: Custom TAA with history rejection and Motion Blur resolve.
Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy.
The Architectural Challenge:
I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers.
&& m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty();
if (canBuildHzb) {
MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder();
hzbInit->setComputePipelineState(m_hzbInitPipeline);
hzbInit->setTexture(m_depthTexture, 0);
hzbInit->setTexture(m_hzbMipViews[0], 1);
if (m_pointClampSampler) {
hzbInit->setSamplerState(m_pointClampSampler, 0);
} else if (m_linearClampSampler) {
hzbInit->setSamplerState(m_linearClampSampler, 0);
}
const uint32_t hzbWidth = m_hzbMipViews[0]->width();
const uint32_t hzbHeight = m_hzbMipViews[0]->height();
const uint32_t threads = 8;
MTL::Size tgSize = MTL::Size(threads, threads, 1);
MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads,
(hzbHeight + threads - 1) / threads * threads,
1);
hzbInit->dispatchThreads(gridSize, tgSize);
hzbInit->endEncoding();
for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) {
MTL::Texture* src = m_hzbMipViews[mip - 1];
MTL::Texture* dst = m_hzbMipViews[mip];
if (!src || !dst) {
continue;
}
MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder();
downEncoder->setComputePipelineState(m_hzbDownsamplePipeline);
downEncoder->setTexture(src, 0);
downEncoder->setTexture(dst, 1);
const uint32_t mipWidth = dst->width();
const uint32_t mipHeight = dst->height();
MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads,
(mipHeight + threads - 1) / threads * threads,
1);
downEncoder->dispatchThreads(downGrid, tgSize);
downEncoder->endEncoding();
}
if (m_instanceCullHzbPipeline) {
dispatchInstanceCulling(m_instanceCullHzbPipeline, true);
}
}
My Questions:
Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR?
visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency.
Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process?
I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
When trying to play with friends Krazy Krownz doesn’t allow me to click multiplayer even though my Apple Game Center connected and my friends Apple game center connected as well. I even tried sending an invite from Apple Game Center to friends and Krazy Krownz doesn’t even show up on the list of available multiplayer games.
I’ve signed out and back in the same issue remain.
I’ve try to contact the game developer, but the website doesn’t work.
Topic:
Graphics & Games
SubTopic:
GameKit
I'm developing a game that supports GameKit turn based matches. What I don't understand is this:
Is tapping on the Game Center notification push messages the only way for the GKTurnBasedEventListener to trigger? What if someone misses the push message (swiping it away by accident or something like that) but still wants to join? Is there some inbox somewhere where the pending messages can be seen or fetched?
Also it was mentioned in a very old WWDC video (from 2013, I think that's the latest with information about turn based matches) that the notification also includes a badge for the icon. However, I do not understand how to implement that. Is there any documentation for that?
I'm developing a turn based game. When I present the GKTurnBasedMatchmakerViewController players can opt in for automatch instead of selecting a specific friend as opponent.
How exactly does the matching work if a player doesn't specify anything explicitly?
Does Game Center send push notifications in a round robin fashion to all friends and the first one to accept is then matched as opponent? Is this documented somewhere?
Hello, when I'm looking to customize the icons of my phone, the applications that are in the grouping genres without replacing with all-black images, I don't know what happens by changing the color of the applications in group of change no color throws just listen not the black stuff
Topic:
Graphics & Games
SubTopic:
General
I am puzzled by the setAddress(_:attributeStride:index:) of MTL4ArgumentTable. Can anyone please explain what the attributeStride parameter is for? The doc says that it is "The stride between attributes in the buffer." but why?
Who uses this for what? On the C++ side in the shaders the stride is determined by the C++ type, as far as I know. What am I missing here?
Thanks!