Push To Talk framework doesn't activate audio session in background

We are trying to extend our app with Push To Talk functionality by integrating the Push To Talk framework. We are extensively testing what happens if the app is running in the foreground, in the background or not running at all.

When the app is in the foreground and the user has joined a channel, we maintain an open connection to our server. When a remote participant starts streaming audio, we immediately call setActiveRemoteParticipant on our PTChannelManager instance. The PTT system will then call our delegate's channelManager:didActivate audioSession method and we can successfully play the incoming audio.
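A minimal sketch of that foreground call, assuming the channel manager and the channel UUID already exist. The helper name `announceRemoteSpeaker` and its parameters are hypothetical; only `PTParticipant` and `setActiveRemoteParticipant(_:channelUUID:)` come from the framework.

```swift
import Foundation
import PushToTalk

/// Tells the PTT system that a remote participant has started talking.
/// `announceRemoteSpeaker` is a hypothetical helper, not part of the framework.
func announceRemoteSpeaker(
    named speakerName: String,
    on channelManager: PTChannelManager,
    channelUUID: UUID
) async {
    let participant = PTParticipant(name: speakerName, image: nil)
    do {
        // On success the system is expected to activate the audio session and
        // call channelManager(_:didActivate:) on the delegate.
        try await channelManager.setActiveRemoteParticipant(participant, channelUUID: channelUUID)
    } catch {
        NSLog("Failed to set active remote participant: \(error)")
    }
}
```

Playback would then start from the didActivate callback rather than right after this call returns.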

When the app is not running at all, there is of course no active connection initially. When another participant starts talking, we send a push notification. The PTT system will start our app in the background and call the incomingPushResult method on our delegate. After we return the remote participant, the PTT framework calls the channelManager:didJoin delegate method, which we use to re-establish the server connection. The PTT framework then calls our channelManager:didActivate audioSession delegate method and we can successfully play audio.
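For the push-launched path, the value returned from incomingPushResult could be built roughly like this. This is only a sketch: the "activeSpeaker" payload key and both function names are assumptions about the app and server, not something stated in this thread.

```swift
import Foundation

/// Extracts the speaker's name from a PTT push payload.
/// The "activeSpeaker" key is an assumption about the server's payload format.
func speakerName(fromPushPayload payload: [String: Any]) -> String? {
    guard let name = payload["activeSpeaker"] as? String, !name.isEmpty else {
        return nil
    }
    return name
}

#if canImport(PushToTalk)
import PushToTalk

/// Builds the value to return from the delegate's
/// incomingPushResult(channelManager:channelUUID:pushPayload:) method.
func pushResult(forPushPayload payload: [String: Any]) -> PTPushResult {
    guard let name = speakerName(fromPushPayload: payload) else {
        // No usable speaker in the payload: ask the system to leave the channel.
        return .leaveChannel
    }
    return .activeRemoteParticipant(PTParticipant(name: name, image: nil))
}
#endif
```

Keeping the payload parsing separate from the framework call makes it possible to unit test the parsing without a device.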

Now the problem. When the application was initially in the foreground and has an established server connection, we initially keep the server connection active when the app enters the background state, until a certain timeout or the system decides our app needs to be killed / removed from memory. This allows us to finish an incoming audio stream, quickly react on incoming responses etc. When we then receive an incoming audio stream after a certain delay (for example 5 seconds) we call the channelManager.setRemoteParticipant method (using try await syntax). This finishes successfully, without any error, however the channelManager:didActivate audioSession delegate method is never called. Manually setting up an audio session is not allowed either and returns an error.

Our current workaround for this issue is to disconnect the server connection as soon as the app goes into the background. This makes sure our server sends a push notification, which is successful in activating the audio session, after which we can play audio. However, this means we need to re-establish the connection, which introduces an unnecessary delay before we can start playback (and currently means we lose some audio). It also means we need to do extra checks when going to the background to make sure there is no active incoming stream. After each incoming stream we have to check again whether we are in the background and disconnect immediately to make sure we get a push notification next time. This can of course also lead to race conditions in an active conversation, where we might need to disconnect between incoming streams, and if we don't do this in time we might never get an activated audio session.

Now this might be by design, as Apple might not want us to keep the server connection active when the application enters the background state. But if that's the case I would expect the channelManager.setRemoteParticipant method to throw an error, but it doesn't. It returns successfully after which we would expect the audio session to get activated as well. So maybe we are not setting the capabilities of our project correctly (we might need other background permissions as well, although we already experimented with that), or we need to do something else to make this work?

I've created an example app to demonstrate my problem. It can be found here: https://github.com/egeniq/ptt-audio-activation-test-ios

To reproduce, do the following:

  • Change team selection in Xcode.
  • Run the app.
  • Choose join.
  • Tap on the "Set remote participant NOW" button.
  • See in the log output that the audio session gets activated.
  • Tap on the "Clear remote participant" button.
  • See in the log output that the audio session gets deactivated.
  • Tap on the "Set remote participant after 5s" button.
  • Immediately go back to your phone's homescreen.
  • After 5s see in the log output that the remote participant is successfully set.
  • However also note that no audio session activation occurs.

Accepted Answer

Now the problem. When the application was initially in the foreground and has an established server connection, we initially keep the server connection active when the app enters the background state, until a certain timeout or the system decides our app needs to be killed/removed from memory. This allows us to finish an incoming audio stream, quickly react on incoming responses, etc. When we then receive an incoming audio stream after a certain delay (for example, 5 seconds), we call the channelManager.setRemoteParticipant method (using try await syntax).

So, the short summary is that this should "just work". More specifically, all PTT apps are allowed to initiate playback at any time by calling setRemoteParticipant(), even if they're in the background.

In particular, what you're describing here:

we initially keep the server connection active when the app enters the background state, until a certain timeout or the system decides our app needs to be killed/removed from memory.

...is actually pretty common to most PTT apps, as it helps keep the conversation stream live/current compared to relying entirely on PTT pushes.

Similarly, just to be clear:

Now this might be by design, as Apple might not want us to keep the server connection active when the application enters the background state.

...no, this is not something we expect/require. You can do so if you choose, but that's not something we're particularly trying to "require".

That leads to here:

I've created an example app to demonstrate my problem. It can be found here: https://github.com/egeniq/ptt-audio-activation-test-ios

Looking at your sample, the main thing I noticed is that you're not configuring the audio session or requesting record access. After modifying your start() to include that functionality:

func start() {
	Task {
		NSLog("start()")
		do {
			channelManager = try await PTChannelManager.channelManager(delegate: self, restorationDelegate: self)
		} catch let error as PTInstantiationError {
			NSLog("Failed to create channel manager: \(error)")
		}
		// Request microphone access so the PTT session is allowed to record.
		AVAudioSession.sharedInstance().requestRecordPermission { granted in
			NSLog("Audio session record permission granted: \(granted)")
		}
		// Configure the session category only; activation is left to the PTT framework.
		try AVAudioSession.sharedInstance().setCategory(
			.playAndRecord,
			mode: .default,
			options: [.allowBluetoothHFP, .allowBluetoothA2DP, .defaultToSpeaker]
		)
	}
}

...activation started happening in the background as expected:

Set remote participant after 5s fired
Setting active participant: DELAYED
Audio session activated

I don't know how that translates to your real app (which, I assume, configures its audio session), but that's the problem with your test app.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I indeed used a simplified example for demonstration purposes where I didn't add the microphone permission and audio session setup. In my real application I do.

However, your answer did trigger me to check that code again, and I noticed that I accidentally added .mixWithOthers to the options. And it seems that causes the audio session not to get activated when running in the background. So I removed that option and now everything is running fine!

Also great to hear that it is ok to keep things running in the background when there is an active PTT session. It indeed improves the conversation flow a lot.

Thanks!
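For anyone landing here with the same symptom, a sketch of a session configuration without .mixWithOthers might look like the following. The helper name `configurePTTAudioSession` is hypothetical, and the option set is only an example; the snippet in the accepted answer also includes .allowBluetoothHFP.

```swift
import AVFoundation

/// Configures (but does not activate) the shared audio session for PTT use.
/// `configurePTTAudioSession` is a hypothetical helper name.
func configurePTTAudioSession() throws {
    // Deliberately no .mixWithOthers: in this thread, including it prevented
    // the audio session from being activated while the app was in the background.
    let options: AVAudioSession.CategoryOptions = [.allowBluetoothA2DP, .defaultToSpeaker]
    try AVAudioSession.sharedInstance().setCategory(.playAndRecord, mode: .default, options: options)
}
```

The session is only configured here; activation stays with the PTT framework, which reports it through the didActivate delegate callback.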

However, your answer did trigger me to check that code again, and I noticed that I accidentally added .mixWithOthers to the options. And it seems that causes the audio session not to get activated when running in the background.

Ahh... That makes sense.

So, as a bit of history, before late-iOS 12/13 the way PTT apps worked was that they used a mixable (non-mixable activation never worked) PlayAndRecord session. It's expected that mixable sessions are allowed to activate in the background (for example, it's how things like turn-by-turn directions work), but that probably shouldn't have been allowed for recording sessions, since it basically allows an app to start recording whenever it wants.

To fix that, we disabled all background recording activation in late-iOS 12, so PTT apps were then moved to a CallKit-based workaround (iOS 12->iOS 15) and then later the PTT framework (iOS 16+). The CallKit workaround (as the PTT framework today) came with other major benefits[1], so the "mixable" session configuration quickly went away as existing PTT apps reconfigured their code to comply with CallKit.

In any case, I suspect there is still some code deep in the audio system from the original iOS 12 change, and that's what's causing activation to fail.

[1] Notably, using a standard (non-CallKit) audio session meant that PTT apps operated under the "standard" audio session priority system, which meant that ANY incoming call would IMMEDIATELY trigger an audio interruption. That both cut off any existing audio and also meant that the app was immediately forced to suspend, leaving the app without any good way to recover. Ironically, this was the EXACT same problem standard voip apps had… which is what had originally led to the creation of CallKit in iOS 10. The CallKit workaround is one of the few cases I can think of where a bug workaround was actually a BETTER solution than the original solution.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Wow, thank you for the extensive background information!

Wow, thank you for the extensive background information!

You're very welcome.

Kevin Elliott
DTS Engineer, CoreOS/Hardware

We are running into an issue updating the PTT service status when our app is in the background.

In one of your earlier responses you mentioned that keeping the PTT connection alive in the background is common:

"...is actually pretty common to most PTT apps, as it helps keep the conversation stream live/current compared to relying entirely on PTT pushes."

However, what we are currently observing is different.

When the debugger is not attached, it appears that all tasks are immediately suspended once the app goes to the background. When the app returns to the foreground, the URLSessionWebSocketTask fails with:

Domain: NSPOSIXErrorDomain
Code: 53

This effectively means we must immediately fall back to push notifications whenever the app backgrounds. While this works, it is not ideal.

The larger issue is that this failure is only detected after the app resumes, which means we cannot update the service status on PTChannelManager while the app is still in the background.

This leads to two different UI behaviors in the PTT interface:

Case 1: Connection active when entering background

  • The PTT UI still allows pressing the Talk button.
  • Pressing Talk triggers a reconnect, which works.

Case 2: Network lost before entering background and later restored

  • The PTT UI remains disabled (e.g. the Talk button stays disabled).
  • The service status is only corrected after reopening the app.

From your earlier response, I understood that some background activity to keep the conversation stream current should be possible.

If background work is not allowed in this situation, I would expect the app to be woken when the PTT Dynamic Island UI is opened so it can refresh the service status and update the Talk button state.

Are we missing something in how background modes should be configured for PTT apps, or is this the expected behavior for WebSocket connections used with Push-to-Talk?

Thanks!

Hmm. Maybe the problem is that I use .unavailable while disconnected, maybe I should use .connecting, even though I might not be truly reconnecting at that time. It does seem the framework does use that to activate the app when I show the PTT UI.

Scratch that. Setting the state to .connecting doesn't trigger anything in the system UI either it seems.

So the only workaround I can think of is setting the state always to .ready when the app enters the background. Which seems fishy.

When the debugger is not attached, it appears that all tasks are immediately suspended once the app goes to the background.

Yes, that's the system’s normal behavior. The debugger disables app suspension as well as the system’s normal watchdog timers because that's basically the only way it can work.

This effectively means we must immediately fall back to push notifications whenever the app backgrounds. While this works, it is not ideal.

So, there are actually two points I'd make here:

(1)
Your app can use UIApplication.beginBackgroundTaskWithName to keep itself awake in the background for a short period of time (~30s). This is actually a larger and much more complicated topic, so I'd recommend looking through the resources in this post, particularly "UIApplication Background Task Notes". However, the main thing I'd emphasize here is that you should think of your app’s background execution time as something your app manages, NOT the system. For example, the FIRST thing your app should do any time it's woken in the background is start a background task so that YOU now have some control over when your app will suspend. Lots of minor issues boil down to "I didn't realize I was relying on the system keeping me awake and now the system changed and I'm broken". That ONLY happens because the app didn't have its own background task in place.
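As a rough sketch of point (1), a small holder object can own one background task so the app decides when it is done. The class name `BackgroundTaskHolder` is hypothetical, and where to call `begin` and `end` depends on the app.

```swift
import UIKit

/// Holds a single UIKit background task so the app, not the system,
/// decides when it is finished. `BackgroundTaskHolder` is a hypothetical name.
@MainActor
final class BackgroundTaskHolder {
    private var taskID: UIBackgroundTaskIdentifier = .invalid

    /// Starts a background task if none is running.
    func begin(name: String) {
        guard taskID == .invalid else { return }
        taskID = UIApplication.shared.beginBackgroundTask(withName: name) { [weak self] in
            // Background time is about to run out: end the task so the
            // system can suspend the app cleanly.
            self?.end()
        }
    }

    /// Ends the background task, if one is running.
    func end() {
        guard taskID != .invalid else { return }
        UIApplication.shared.endBackgroundTask(taskID)
        taskID = .invalid
    }
}
```

One way to use it would be to call `begin(name:)` first thing whenever the app is woken in the background (for example from the push or didJoin callbacks) and `end()` once playback and any follow-up network work have finished.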

(2)
My actual recommendation here is don't bother having your server try and track application state but instead just send a push "every" time. The nature of networking means that there isn't actually any good way for your server to "know" what your app’s state actually is. That is, the only way to find out there is any failure is when an expected message times out, which means you'll either be forced to maintain a highly active "heartbeat" connection or live with large windows where a disconnect has occurred which your server is unaware of.

Push latency does mean that your app will regularly receive the push "after" (generally ~4-10s) the server has already "told" it whatever it needed to know; however, that's generally less disruptive than it might seem. More specifically:

  • If you're already "active" and current, return the PTPushResult of the current participant (the one that's already active). The system will detect that configuration and nothing visible to the user will happen.

  • If you've already finished handling the transfer and are "done" (so you don't want to play anything), then you can return a valid participant and then call setActiveRemoteParticipant() to stop that playback.

Depending on timing, that second path may be visible to the user, but that's generally less disruptive than the network level issues would be.
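The two cases above could be modeled as a small decision function. This is a sketch under assumptions: the `streamID` used to recognize an already finished stream, and all type and function names here, are hypothetical and would have to come from the app's own protocol.

```swift
import Foundation

/// What the app is doing when a (possibly late) PTT push arrives.
enum PlaybackState: Equatable {
    case idle                        // nothing playing, nothing recently handled
    case playing(speaker: String)    // already playing audio received over the live connection
    case finished(streamID: String)  // this stream was already played to the end
}

/// What to do in response to the push.
enum PushHandling: Equatable {
    /// Return .activeRemoteParticipant for this speaker and carry on.
    case report(speaker: String)
    /// Return .activeRemoteParticipant for this speaker, then clear the
    /// active participant because there is nothing left to play.
    case reportThenClear(speaker: String)
}

func handling(forPushFrom speaker: String, streamID: String, state: PlaybackState) -> PushHandling {
    switch state {
    case .playing(let current):
        // Already active and current: report the participant that is already playing.
        return .report(speaker: current)
    case .finished(let finishedID) where finishedID == streamID:
        // The push is for a stream that has already been handled.
        return .reportThenClear(speaker: speaker)
    case .idle, .finished:
        // A genuinely new stream: treat the push as the normal path.
        return .report(speaker: speaker)
    }
}
```

In the delegate, `.report` would map to returning `PTPushResult.activeRemoteParticipant(...)`, and `.reportThenClear` to returning the same result and afterwards calling `setActiveRemoteParticipant(nil, channelUUID:)`, which is my reading of how the active participant is cleared.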

Stating that another way, I think you need to reverse your thinking here:

This effectively means we must immediately fall back to push notifications whenever the app backgrounds.

The term "fall back" here implies that the "normal" case is your app having an active connection and push is the odd/edge case. The problem here is that, in fact, the exact OPPOSITE is true. Most of the time your app is going to be in the background asleep, so most of the time push is how your app is going to find out about messages.

One suggestion I'd actually make here is to modify your app’s logic so that it AGGRESSIVELY drops its network connection, even in the foreground. That forces "everything" to go over push, which makes it easier to debug and test that code path. Once that path is working well, you can then start staying connected to your server and use that to improve your foreground and extended conversation handling.

That leads to here:

Hmm. Maybe the problem is that I use .unavailable while disconnected, maybe I should use .connecting, even though I might not be truly reconnecting at that time. It does seem the framework does use that to activate the app when I show the PTT UI.

We should update the documentation to cover this better, but PTServiceStatus is a largely cosmetic/UI feature. It was originally added to support enterprise use cases where the server was using incomingServiceUpdatePushForChannelManager to update the client device side. What I'd emphasize here is that those APIs are basically optional additions which many PTT apps don't use at all.

They don't actually change how pushes are handled, which means, yes, this:

So the only workaround I can think of is setting the state always to .ready when the app enters the background.

...is the state your app should spend most, if not ALL, of its time "at". Ready just means "I expect my app to be able to function normally". You should not be trying to update your service status to match your network status.
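In code, that amounts to setting the status once (for example after joining) and then leaving it alone. A minimal sketch, where `markChannelReady` is a hypothetical helper name:

```swift
import Foundation
import PushToTalk

/// Marks the channel's service status as ready.
/// `markChannelReady` is a hypothetical helper, not part of the framework.
func markChannelReady(on channelManager: PTChannelManager, channelUUID: UUID) async {
    do {
        try await channelManager.setServiceStatus(.ready, channelUUID: channelUUID)
    } catch {
        NSLog("Failed to set service status: \(error)")
    }
}
```

The point of the sketch is what it leaves out: there is no code mirroring the WebSocket state into the service status.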

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks again for the extensive answer!

To be clear, we got things covered regarding not being able to continue doing work in the background.

What was unexpected to us however was the behaviour of the setServiceStatus method, but now I understand its use-case is different from what we thought it was. Especially because one of the values of the enum is connecting we thought we should use it to track the connection state from the client side of things. And then you run into trouble as soon as the application is moved to the background because you will lose the ability to update the state, so the last state will stick. And as the PTT UI actually is only accessible when your app is in the background, this didn't make sense to us.

Especially because one of the values of the enum is connecting, we thought we should use it to track the connection state from the client side of things.

I believe this was mostly added to be used with NEAppPushProvider. The extension point only runs on isolated networks, but it maintains its own ongoing connection so the state can be updated dynamically.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
