FileManager.replaceItemAt(_:withItemAt:) fails sporadically on ubiquitous items

Question

Created 1w

Replies 9

Participants 2

I’m encountering a strange, sporadic error in FileManager.replaceItemAt(_:withItemAt:) when trying to update files that happen to be stored in cloud containers such as iCloud Drive or Dropbox. Here’s my setup:

I have an NSDocument-based app which uses a zip file format (although the error can be reproduced using any kind of file).
In my NSDocument.writeToURL: implementation, I do the following:

1. Create a temp folder using FileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: fileURL, create: true).
2. Copy the original zip file into the temp directory.
3. Update the zip file in the temp directory.
4. Move the updated zip file into place by moving it from the temp directory to the original location using FileManager.replaceItemAt(_:withItemAt:).

This all works perfectly - most of the time. However, very occasionally I receive a save error caused by replaceItemAt(_:withItemAt:) failing. Saving can work fine hundreds of times, but then, once in a while, I’ll receive an “operation not permitted” error in replaceItemAt.

I have narrowed the issue down and found that it only occurs when the original file is in a cloud container - when FileManager.isUbiquitousItem(at:) returns true for the original fileURL I am trying to replace. (e.g. Because the user has placed the file in iCloud Drive.) Although strangely, the permissions issue seems to be with the temp file rather than with the original (if I try copying or deleting the temp file after this error occurs, I’m not allowed; I am allowed to delete the original though - not that I’d want to of course).

Here’s an example of the error thrown by replaceItemAt:

Error Domain=NSCocoaErrorDomain Code=513 "You don’t have permission to save the file “test-file.txt” in the folder “Dropbox”." UserInfo={NSFileBackupItemLeftBehindLocationKey=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSFileOriginalItemLocationKey=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSURL=file:///Users/username/Library/CloudStorage/Dropbox/test-file.txt, NSFileNewItemLocationKey=file:///Users/username/Library/CloudStorage/Dropbox/test-file.txt, NSUnderlyingError=0xb1e22ff90 {Error Domain=NSCocoaErrorDomain Code=513 "You don’t have permission to save the file “test-file.txt” in the folder “NSIRD_TempFolderBug_y3UvzP”." UserInfo={NSURL=file:///var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSFilePath=/var/folders/mt/0snrr8fx7270rm0b14ll5k500000gn/T/TemporaryItems/NSIRD_TempFolderBug_y3UvzP/test-file.txt, NSUnderlyingError=0xb1e22ffc0 {Error Domain=NSPOSIXErrorDomain Code=1 "Operation not permitted"}}}}

And here’s some very simple sample code that reproduces the issue in a test app:

    // Ask user to choose this via a save panel.
    var savingURL: URL? {
        didSet {
            setUpSpamSave()
        }
    }
    
    var spamSaveTimer: Timer?
    
    // Set up a timer to save the file every 0.2 seconds so that we can see the sporadic save problem quickly.
    func setUpSpamSave() {
        spamSaveTimer?.invalidate()
        let timer = Timer(fire: Date(), interval: 0.2, repeats: true) { [weak self] _ in
            self?.spamSave()
        }
        spamSaveTimer = timer
        RunLoop.main.add(timer, forMode: .default)
    }
    
    func spamSave() {
        guard let savingURL else { return }
        
        let fileManager = FileManager.default
        
        // Create a new file in a temp folder.
        guard let replacementDirURL = try? fileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: savingURL, create: true) else {
            return
        }
        let tempURL = replacementDirURL.appendingPathComponent(savingURL.lastPathComponent)
        guard (try? "Dummy text".write(to: tempURL, atomically: false, encoding: .utf8)) != nil else {
            return
        }
        
        do {
            // Use replaceItemAt to safely move the new file into place.
            _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)
            print("save succeeded!")
            
            try? fileManager.removeItem(at: replacementDirURL) // Clean up.
            
        } catch {
            print("save failed with error: \(error)")
            // Note: if we try to remove replacementDirURL here or do anything with tempURL we will be refused permission.
            NSAlert(error: error).runModal()
        }
    }

If you run this code and set savingURL to a location in a non-cloud container such as your ~/Documents directory, it will run forever, resaving the file over and over again without any problems.

But if you run the code and set savingURL to a location in a cloud container, such as in an iCloud Drive folder, it will work fine for a while, but after a few minutes - after maybe 100 saves, maybe 500 - it will throw a permissions error in replaceItemAt.

(Note that my real app has all the save code wrapped in file coordination via NSDocument methods, so I don’t believe file coordination to be the problem.)

What am I doing wrong here? How do I avoid this error? Thanks in advance for any suggestions.

Answered by DTS Engineer in 878034022

Answer 1

KeithB OP

1w

An update on this weird behaviour:

I have discovered that when replaceItem fails in this circumstance, the temp file has in fact been moved into place correctly and has replaced the original file. But when I get the error, the old original file has taken the place of the old temp file and it's that which cannot be removed.

I have tested this by checking both the content and the fileResourceIdentifier of the original file and the temp file, and logging them before and after the error. After the error they are swapped.

Answer 2

DTS Engineer

Apple

1w

I’m encountering a strange, sporadic error in FileManager.replaceItemAt(_:withItemAt:) when trying to update files that happen to be stored in cloud containers such as iCloud Drive or Dropbox.

Have you filed a bug on this and, if so, what's the bug number? As part of that bug, I'd suggest installing the "iCloud Drive" profile, reproducing the issue a few times, then uploading a sysdiagnose of the failure. See the profile installation instructions for the full details of that process.

And here’s some very simple sample code that reproduces the issue in a test app:

Thank you for that. I got your code up and running in a test app and was able to replicate the problem fairly easily. As to WHY it's happening, that's unclear. From the console log, it appears that the entire replace sequence worked fine but the sandbox then rejected access to the temporary file as the kernel was trying to clean up post-swap. Weirdly, it doesn't appear to be blocking actual access to the file (continuing after the failure worked fine), so I think the issue is at least partially tied to the very specific circumstances the swap creates.

What am I doing wrong here?

I'm not sure you're doing anything wrong, as I think this is a bug.

How do I avoid this error?

Have you tried retrying the save? That appears to work in my testing, though it may not be a workable solution in your case. Beyond that, I'd need a better understanding of exactly how you're interacting with the files and what your full requirements are.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 3

KeithB OP

1w

Many thanks for the reply and information, Kevin, much appreciated.

Have you filed a bug on this and, if so, what's the bug number? As part of that bug, I'd suggest installing the "iCloud Drive" profile, reproducing the issue a few times, then uploading a sysdiagnose of the failure. See the profile installation instructions for the full details of that process.

I hadn’t filed a bug report yet because I had assumed it was something I was doing wrong given that using replaceItem and a temporary folder is presumably a common pattern. I’ll file a report tomorrow - I’m following the iCloud Drive profile instructions you linked to and am now waiting the 24 hours they say I need to wait before I can get the sysdiagnose. Once I have that I’ll file the report along with a sample project.

Have you tried retrying the save? That appears to work in my testing, though it may not be a workable solution in your case. Beyond that, I'd need a better understanding of exactly how you're interacting with the files and what your full requirements are.

With a bit of refactoring I probably could retry the save. In my app this is all done inside my NSDocument’s writeToURL method. I use my own drop-in replacement for FileWrapper (you helped me with some of the finer points of FileWrapper a few years ago) that incrementally writes changes to a zip file using Libzip, which supports incremental saves on copy-on-write systems such as APFS.

A potential problem with the re-save approach is that my save usually works by copying the zip file at the original location to a temporary location, updating it there, and then moving it into place using replaceItemAt. After this particular replaceItemAt error, however, the original file has in fact been updated despite the error (the error being on the old version of the file which is now in the temporary directory). So if I re-save by making a copy of that and try updating again, I could potentially mess up the file by trying to save into it stuff that has actually already been done. (However, I do keep a snapshot of the older archive around in case of problems, so I might be able to work around this problem using that.)

I wonder, though - given that the original file has in fact been replaced by the temp file despite the error, can I not just check for this and ignore the error if the file seems to have been replaced after all? E.g.:

1. Before replacement, record the file resource ID of the temp file.
2. Use replaceItemAt(originalURL, withItemAt: tempURL).
3. If there’s an error, get the file resource ID for the file at the intended saving location and compare it against the ID I recorded in (1). If they are the same, I know the replacement has succeeded despite the error. In this case, I can just try to delete the temporary folder and move on.
4. If the file IDs of the current user file and the temp file from before replace don’t match or couldn’t be got, attempt a re-save.

Is there something wrong with this approach? (I’ve attached some sample code below demonstrating how this might work.)

Many thanks, Keith

// Get a temporary folder appropriate for creating the new file in.
let replacementDirURL = try fileManager.url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor: savingURL, create: true)

// Create the new file at the temporary location.
let tempURL = replacementDirURL.appendingPathComponent(savingURL.lastPathComponent)
try createNewContentAt(url: tempURL)

// Record the file resource ID of the temp file we created.
let tempFileID = (try? tempURL.resourceValues(forKeys: [.fileResourceIdentifierKey]))?.fileResourceIdentifier

// Now try to move the file into place.
do {
    // Use replaceItemAt to safely replace the original file with the updated file we created at the temp location.
    _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)

    // Clean up.
    try? fileManager.removeItem(at: replacementDirURL)
    
} catch {
    // Check to see if the original file was in fact replaced despite the error.
    if let tempFileID,
       let savingFileID = (try? savingURL.resourceValues(forKeys: [.fileResourceIdentifierKey]))?.fileResourceIdentifier,
       tempFileID.isEqual(savingFileID) {
        
        // If so, just try to remove the temp dir and move on.
        try? fileManager.removeItem(at: replacementDirURL)
                
    } else {
        // If we got here, replace really did fail and we need to handle it.
                
        // We should do some more work and try to resave here before throwing an error.
                
        throw error
    }
}

Answer 4

KeithB OP

4d

Just to add that I have now filed the bug as #FB22107069.

Answer 5

DTS Engineer

Apple

4d

Recommended

I hadn’t filed a bug report yet because I had assumed it was something I was doing wrong, given that using replaceItem and a temporary folder is presumably a common pattern. I’ll file a report tomorrow - I’m following the iCloud Drive profile instructions you linked to and am now waiting the 24 hours they say I need to wait before I can get the sysdiagnose. Once I have that, I’ll file the report along with a sample project.

Perfect, thank you.

...using replaceItem and a temporary folder is presumably a common pattern

I didn't get into it above, but there very likely are some nuances/details involved that are a contributing factor. For example, I suspect this doesn't happen if you start with a security-scoped bookmark, which you resolve to a URL before each save. I think NSDocument's default implementation also writes out a new file before each save, which means it's always starting with a "new" URL. That doesn't mean there's anything "wrong" with what you're doing, but that's probably why this isn't more widespread.

With a bit of refactoring, I probably could retry the save. In my app, this is all done inside my NSDocument’s writeToURL method. I use my own drop-in replacement for FileWrapper (you helped me with some of the finer points of FileWrapper a few years ago).

Fabulous! Always good to hear when my ideas have worked out!

A potential problem with the re-save approach is that my save usually works by copying the zip file at the original location to a temporary location, updating it there, and then moving it into place using replaceItemAt.

Just to clarify, are you:

a) Copying the file once, modifying it over time, then copying that file back for each save.

b) Copying the file prior to each save operation.

I suspect you're doing "a" (and it's probably what I would do), but if you're doing "b", then that changes things a bit.

Assuming you're starting with "a", then my intuition would be to:

1. Commit your change to your temp file.
2. Clone that file into a new temp file.
3. Use that new temp file as the source for your save.

There are a few advantages to this:

It may avoid the immediate issue here, since you'll always be replacing with a "new" file object.
If anything goes wrong, you can retry the save by restarting with a clean clone.
It can be a useful architecture to build on for other edge cases.
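
The clone-then-replace steps above could be sketched roughly as follows. This is only a sketch under stated assumptions, not code from either poster: `saveByReplacingWithClone(of:at:)`, `workingURL`, and `destinationURL` are hypothetical names, and it assumes that `FileManager.copyItem(at:to:)` produces a cheap clone on APFS (on other file systems it is a full copy).

```swift
import Foundation

/// Hypothetical helper: clones the long-lived working file into a fresh
/// item-replacement directory and uses the clone as the source of the replace,
/// so the working file itself is never consumed by replaceItemAt.
func saveByReplacingWithClone(of workingURL: URL, at destinationURL: URL) throws {
    let fileManager = FileManager.default

    // A fresh temp directory on the destination's volume for every save.
    let replacementDirURL = try fileManager.url(for: .itemReplacementDirectory,
                                                in: .userDomainMask,
                                                appropriateFor: destinationURL,
                                                create: true)
    // Best-effort cleanup; in the failure discussed in this thread the removal
    // may itself be refused, so the result is deliberately ignored.
    defer { try? fileManager.removeItem(at: replacementDirURL) }

    // Assumption: on APFS this copy is a clone and therefore near-instant.
    let cloneURL = replacementDirURL.appendingPathComponent(destinationURL.lastPathComponent)
    try fileManager.copyItem(at: workingURL, to: cloneURL)

    // Always replace with a brand-new file object.
    _ = try fileManager.replaceItemAt(destinationURL, withItemAt: cloneURL)
}
```

Because the working file survives a failed replace untouched, a retry can simply call the helper again and start from a clean clone.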

Expanding on that last point, one of the issues you can run into is cases where the files involved are large and the final save destination is VERY slow (like an SMB drive). Putting that in concrete terms, let’s say you want to autosave every 1s, but the save destination is going to take 5s-10s to complete the save. Here is one way to handle that:

1. Your app copies from the destination to your local storage. This becomes your "working" copy that you modify.
2. Your app autosaves to local storage every ~1s.
3. Your app pushes that initial save data to the final target.
4. Every time the final save finishes, it starts a new save using the most recent save.

In other words, your app can rely on its "standard" set of 1s autosaves, but you're actually only saving to the final target every 5-10s (as the previous save finishes).
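
One way to picture that scheme is a small coordinator that keeps at most one push to the slow target in flight and remembers only the newest local save that arrived in the meantime. The sketch below is purely illustrative: `LatestOnlyPusher` and its `push` closure are hypothetical names, and it assumes every call (including the completion handler) happens on a single thread or serial queue.

```swift
import Foundation

/// Hypothetical coordinator for slow save targets. Not thread-safe:
/// call submit(_:) and the completion handler from one serial context.
final class LatestOnlyPusher<Snapshot> {
    /// Starts a (slow) push and calls the completion handler when it finishes.
    typealias Push = (Snapshot, @escaping () -> Void) -> Void

    private let push: Push
    private var pending: Snapshot?   // newest snapshot not yet pushed
    private var isPushing = false    // true while a push is in flight

    init(push: @escaping Push) {
        self.push = push
    }

    /// Call after every local autosave; older pending snapshots are dropped.
    func submit(_ snapshot: Snapshot) {
        pending = snapshot
        startNextIfIdle()
    }

    private func startNextIfIdle() {
        guard !isPushing, let snapshot = pending else { return }
        pending = nil
        isPushing = true
        push(snapshot) { [weak self] in
            // When the slow save finishes, start again with the newest snapshot.
            self?.isPushing = false
            self?.startNextIfIdle()
        }
    }
}
```

With a 1s autosave and a 5-10s push, most snapshots are superseded before they are ever sent, and the target only ever receives the most recent state available when the previous push completed.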

One final point here: if you're working with packages and large file counts, directory cloning may provide a significant performance benefit. The man pages warn against cloning directories, but this forum post explains what the actual risks are and when it's a reasonable option.

I wonder, though— given that the original file has in fact been replaced by the temp file despite the error, can I not just check for this and ignore the error if the file seems to have been replaced after all? E.g.: ... Is there something wrong with this approach?

That is a REALLY tough call. The problem here is that your visibility into the exact cause of the error is limited, so while it's certainly safe in the particular case, it's hard to be sure that you're ACTUALLY dealing with this exact case. Even worse, my concern here would be the proliferation of edge cases, both in terms of what's out there "today" and in terms of future configurations/changes.

My own instincts would be to redo the entire save, but if you want to do this, I would do two things:

1. Look at the NSError object you're getting back "in detail" so you can identify as "specific" a failure as possible. Notably, I think you can use NSUnderlyingErrorKey to pull an NSError object for the lower-level error, so I'd look at that object (and possibly any underlying NSError), not just catching the "fail".
2. ...then an ID check to confirm you're "right" about the failure. I'd even check things like file size and possibly times so that you're "sure" everything looks the way you'd expect. Most of that metadata is collected with a single syscall, so it gets you a little extra safety without actually making things slower.

The goal is to fingerprint a particular failure you consider "safe", not just trusting the ID change. Having said that, I'd also be tempted to expand the check in #2 and run it against all files, not just the cases where you got an error. Done properly, there’s minimal performance cost, and there are worse things an app can do than double-checking its saves.
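
As a rough illustration of the first step, the chain of underlying errors can be flattened into a list and compared against the exact shape seen in the error dump in the original question (Cocoa 513, Cocoa 513, POSIX 1). This is a sketch only: `ErrorLink`, `errorChain(_:)`, and `matchesReportedFailure(_:)` are hypothetical names, and matching domains and codes alone is not a sufficient fingerprint without the file ID and metadata checks from the second step.

```swift
import Foundation

/// One (domain, code) pair in a chain of nested errors.
struct ErrorLink: Equatable {
    let domain: String
    let code: Int
}

/// Hypothetical helper: walks NSUnderlyingErrorKey from the outermost error
/// down to the root cause and returns each link in order.
func errorChain(_ error: Error) -> [ErrorLink] {
    var chain: [ErrorLink] = []
    var current: NSError? = error as NSError
    while let nsError = current {
        chain.append(ErrorLink(domain: nsError.domain, code: nsError.code))
        current = nsError.userInfo[NSUnderlyingErrorKey] as? NSError
    }
    return chain
}

/// True when the chain has exactly the shape of the failure quoted in the question.
func matchesReportedFailure(_ error: Error) -> Bool {
    let expected = [
        ErrorLink(domain: NSCocoaErrorDomain, code: 513), // NSFileWriteNoPermissionError
        ErrorLink(domain: NSCocoaErrorDomain, code: 513),
        ErrorLink(domain: NSPOSIXErrorDomain, code: 1)    // EPERM, "Operation not permitted"
    ]
    return errorChain(error) == expected
}
```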

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 6

KeithB OP

3d

Thanks again for the reply, and especially for such a thorough and helpful one!

For example, I suspect this doesn't happen if you start with a security-scoped bookmark, which you resolve to a URL before each save.

Out of curiosity I just tested this, and I still see the bug. To see it yourself, just use the code from my first post but change the savingURL accessor to use a security-scoped bookmark, as follows:

private var bookmarkData: Data?
    
// Ask user to choose this via a save panel.
var savingURL: URL? {
    get {
        var isStale = false
        if let bookmarkData,
           let url = try? URL(resolvingBookmarkData: bookmarkData, options: .withSecurityScope, relativeTo: nil, bookmarkDataIsStale: &isStale) {
            if isStale {
                // Should really update the bookmark data here...
            }
            return url
        } else {
            return nil
        }
    }
    set(newURL) {
        bookmarkData = try? newURL?.bookmarkData(options: .withSecurityScope)
        setUpSpamSave()
    }
}

Then add the following to the top of spamSave() after checking savingURL is non-nil:

let didAccess = savingURL.startAccessingSecurityScopedResource()
if didAccess == false {
    print("Failed to start accessing scoped URL.")
}
defer {
    if didAccess {
        savingURL.stopAccessingSecurityScopedResource()
    }
}

The save will fail with the same permissions error every now and then.

Just to clarify, are you:

a) Copying the file once, modifying it over time, then copying that file back for each save.

b) Copying the file prior to each save operation.

I suspect you're doing "a" (and it's probably what I would do), but if you're doing "b", then that changes things a bit.

I’m actually doing (b), since this is very fast on copy-on-write volumes such as APFS even for large files. (Copy-to-temp file is almost instant; updating the zip file is super-fast too thanks to LibZip’s support for copy-on-write, meaning it doesn’t recreate the entire zip file; then it's just a matter of moving the updated file back into place using replaceItemAt(_:withItemAt:).)

For slower volumes, much like Pages.app, we offer a second, package-based version of our file format which supports in-place saving. (Zip-based is the default since it works everywhere and package-based files don’t work with cloud-based services other than iCloud Drive on iOS. But if saves get particularly slow, we prompt users to consider the package-based option.)

As I mentioned, I keep a snapshot of the zip file from the previous save around (for cases where the user has accidentally deleted the underlying file between saves), and I have done some testing and I can indeed re-save successfully using that. That is slow, though, since it has to recreate the entire archive. On APFS it’s faster to make a backup copy of the temp file before using replaceItemAt and then to try with the copy if it fails - that seems to work well.

Based on your suggestions, though, I’m going to do a bit of refactoring. Keeping the zip file around in the temp folder and making copies from it for replaceItemAt sounds like a great solution with multiple advantages, and since my custom file wrapper already keeps a reference to the previous snapshot, it wouldn’t be difficult to have it keep a reference to a temp file URL too.

Your app copies from the destination to your local storage…. Your app pushes that initial save data to the final target.

This is a little off-topic, but in this case - where you are doing the intensive work on your local storage and then pushing back to the slower volume when done - what is the safest way of replacing the original file? The point of FileManager.url(for: .itemReplacementDirectory…) is to return a temp folder on the same volume as the passed-in URL, since replaceItemAt(_:withItemAt:) won’t work if the original and new URLs are on different volumes. The only other way I can think of risks data loss:

1. Delete the original file from the destination.
2. Move the updated file from the local storage to the destination.

We could make a temp copy of the original file before (1), but if the volume is slow, that adds back in some of the slowness we’re avoiding by doing work on another volume.

My own instincts would be to redo the entire save, but if you want to do this, I would do two things:

I’m going to focus on retrying the save. I’m curious, though, as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error. I can’t get this to happen in testing, but given that the error seems random, I wonder whether it would happen if I ran the test for long enough - a day, say. In that case, I wonder if this approach would work:

1. Save - encounter some sort of save error.
2. Retry the save no matter what the error was.
3. If we get another error, examine the error and if it was caused by the bug, just try deleting the temp file and move on.

I’ve attached some code at the end of this post that scrutinises the error to check it matches the one triggered by this bug. (Although I wonder if I should use .fileContentIdentifier instead of .fileResourceIdentifier.)

Anyway, thanks again, as I’m very close to a solution now.

Answer 7

KeithB OP

3d

Error scrutiny:

struct FileInfo: Equatable {

    init?(url: URL) {
        guard
            let resourceVals = try? url.resourceValues(forKeys: [.fileResourceIdentifierKey, .fileSizeKey]),
            let fileID = resourceVals.fileResourceIdentifier,
            let fileSize = resourceVals.fileSize
        else {
            return nil
        }
        self.fileID = fileID
        self.fileSize = fileSize
    }
        
    private let fileID: (any NSCopying & NSSecureCoding & NSObjectProtocol)
    private let fileSize: Int
        
    // Refer to FileInfo directly so the snippet compiles regardless of which type it is nested in.
    static func == (lhs: FileInfo, rhs: FileInfo) -> Bool {
        return lhs.fileSize == rhs.fileSize && lhs.fileID.isEqual(rhs.fileID)
    }
}

func isSafeReplaceError(_ error: Error, fileURL: URL, tempURL: URL, oldFileInfo: FileInfo?, oldTempFileInfo: FileInfo?) -> Bool {
    // Using the file resource IDs and file size, ensure that the temp file and original file have been swapped.
    guard
        let oldFileInfo,
        let oldTempFileInfo,
        let fileInfo = FileInfo(url: fileURL),
        let tempFileInfo = FileInfo(url: tempURL),
        oldFileInfo == tempFileInfo,
        oldTempFileInfo == fileInfo,
        tempFileInfo != fileInfo
    else {
        return false
    }
        
    let nsError = error as NSError
        
    guard
        // Check this is a permissions error in the Cocoa error domain.
        nsError.domain == NSCocoaErrorDomain,
        nsError.code == NSFileWriteNoPermissionError,
        // Check "NSURL" and "NSFileNewItemLocationKey" keys both point to the file we tried to replace.
        let errorURL = nsError.userInfo[NSURLErrorKey] as? URL,
        let newItemURL = nsError.userInfo["NSFileNewItemLocationKey"] as? URL,
        errorURL.path(percentEncoded: false) == newItemURL.path(percentEncoded: false),
        newItemURL.path(percentEncoded: false) == fileURL.path(percentEncoded: false),
        // Check "NSFileOriginalItemLocationKey" and "NSFileBackupItemLeftBehindLocationKey" both point to the temp file.
        let originalURL = nsError.userInfo["NSFileOriginalItemLocationKey"] as? URL,
        let leftBehindURL = nsError.userInfo["NSFileBackupItemLeftBehindLocationKey"] as? URL,
        originalURL.path(percentEncoded: false) == leftBehindURL.path(percentEncoded: false),
        originalURL.path(percentEncoded: false) == tempURL.path(percentEncoded: false),
        // Ensure there is only a single underlying error.
        nsError.underlyingErrors.count == 1
    else {
        return false
    }
        
    // Now get the underlying error.
    let underlyingError = nsError.underlyingErrors[0] as NSError
    guard
        // Check the underlying error is also a permissions error in the Cocoa domain.
        underlyingError.domain == NSCocoaErrorDomain,
        underlyingError.code == NSFileWriteNoPermissionError,
        // And ensure the error is with the temp file.
        let underlyingErrorURL = underlyingError.userInfo[NSURLErrorKey] as? URL,
        underlyingErrorURL.path(percentEncoded: false) == tempURL.path(percentEncoded: false),
        // Ensure the underlying error also has a single underlying error.
        underlyingError.underlyingErrors.count == 1
    else {
        return false
    }
        
    // Now get the underlying error for the underlying error. This should be a POSIX error with error code 1 ("Operation not permitted").
    let rootError = underlyingError.underlyingErrors[0] as NSError
    return rootError.domain == NSPOSIXErrorDomain && rootError.code == 1
}

Answer 8

DTS Engineer

Apple

2d

Out of curiosity, I just tested this, and I still see the bug. To see it yourself, just use the code from my first post but change the savingURL accessor to use a security-scoped bookmark, as follows:

To be honest, that was basically a blind (well, slightly educated...) guess. The whole combination of factors is fairly odd (timing is random, failure self-corrects, etc.).

One thing to pass along: I just did a bit of testing with retrying the copy, and "clearing" the error seems to be tied to time, not retry count. If you decide to go the retry route, you may want to delay the save for a second or so instead of just retrying over and over.

I’m actually doing (b), since this is very fast on copy-on-write volumes such as APFS even for large files. (Copy-to-temp file is almost instant.)

I think "very fast" actually understates how significant the performance difference is. As an "industry", I'm not sure we've really processed how constant-time copying should change file management.

For slower volumes, much like Pages.app, we offer a second, package-based version of our file format which supports in-place saving.

I don't know if anyone has ever shipped a solution that worked like this, but given the performance benefit, it might be worth thinking about using DiskImages as a "file format". You can mount the disk image outside of the user’s "view", then use it as your working storage. That won't work for all cases, but it could be useful in some situations.

This is a little off-topic, but in this case— where you are doing the intensive work on your local storage and then pushing back to the slower volume when done— what is the safest way of replacing the original file?

The "replaceItem(at:...)" documentation actually answers this: copy the item to the destination volume, then use "replaceItem(at:...)" to finish the transfer. The reason "backupItemName" exists is that if anything prevents the replace from completing, the item at backupItemName contains the original file. This is how we cover all the file systems where atomic file replacement doesn't exist.

Also note this is in the same reference:

"If an error occurs and the original item is not in the original location or a temporary location, the resulting error object contains a user info dictionary with the key "NSFileOriginalItemLocationKey". The value assigned to that key is an NSURL object with the location of the item."

I’m going to focus on retrying the save. I’m curious though as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error.

Yes, it will, at least in my testing. More specifically, I modified your test project to this:

var finishedSave = false
var failCount = 0
while(!finishedSave) {
	do {
		// Use replaceItemAt to safely move the new file into place.
		_ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)
		try? fileManager.removeItem(at: replacementDirURL) // Clean up.

		finishedSave = true
	} catch {
		failCount += 1
		if(failCount == 1){
			NSLog("First Fail on \(count)")
		}
		sleep(1)
	}
}
if(failCount > 0) {
	NSLog("\(count) cleared after \(failCount) retries")
}

It took anywhere from 1 to 10 retries for the replace to succeed. Note that the issue does seem to be tied to time, not try count: I tried this first without the sleep, and all that changed was that I generated a lot more calls to "replaceItemAt". That said, I don't know how this would translate to more "real" save logic.

(Although I wonder if I should use .fileContentIdentifier instead of .fileResourceIdentifier.)

Mostly, you'll want .fileResourceIdentifier. fileContentIdentifier is an APFS specific[1] identifier that allows you to identify related clones. I think it actually implies identical contents, so two files with the same fileContentIdentifier have the same physical content, not just relationship, but either way it's not really useful for what you're doing. fileResourceIdentifier is what you want, as it's basically "the inode number plus other data to deal with all the edge cases".
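
As a minimal sketch of how .fileResourceIdentifier might be used here: the helper below checks whether two URLs currently point at the same file-system object, which could help confirm which file ended up at the destination after a failed replace. The name refersToSameFile is hypothetical (it does not appear in the thread), and the sketch assumes both URLs are on the same volume, since the identifiers are only meaningful for comparison within a file system and are not persistent across restarts.

```swift
import Foundation

/// Hypothetical helper: true when both URLs currently resolve to the same
/// file-system object, judged by `.fileResourceIdentifierKey`.
func refersToSameFile(_ first: URL, _ second: URL) throws -> Bool {
    // Work on copies so any cached resource values can be discarded first.
    var a = first
    var b = second
    a.removeAllCachedResourceValues()
    b.removeAllCachedResourceValues()

    let valuesA = try a.resourceValues(forKeys: [.fileResourceIdentifierKey])
    let valuesB = try b.resourceValues(forKeys: [.fileResourceIdentifierKey])

    // The identifier is optional; treat a missing value as "cannot tell".
    guard let idA = valuesA.fileResourceIdentifier,
          let idB = valuesB.fileResourceIdentifier else {
        return false
    }
    // The identifiers are opaque objects that are compared with isEqual.
    return idA.isEqual(idB)
}
```

Recording the identifier of the temp file before calling replaceItemAt, then comparing it with the file at the destination afterwards, is one way this could be applied.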

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 9

KeithB OP

2d

Thanks again!

I think "very fast" actually understates how significant the performance difference is.

Ha, true. In practice it seems “instant”, to the extent that on APFS, updating huge zip files is not much slower than in-place saving into a package.

I don't know if anyone has ever shipped a solution that worked like this, but... it might be worth thinking about using DiskImages as a "file format".

Interesting! Although cross-platform compatibility might be an issue here.

The "replaceItem(at:...)" documentation actually answers this…

Sorry, I should have been more clear, although thinking about it I have been tying myself up in knots and the solution was indeed here all along. I was referring to the circumstances we were discussing before, where we don’t want to do the temp work on the same volume as the destination because the destination volume is slow.

In other words, we have deliberately created the temp folder for updating our file on another volume (e.g. one that supports APFS), because the one created using url(for: .itemReplacementDirectory…) would be too slow, and now we need to move that temp file into place on the destination volume.

From your answer I realise I was overlooking the obvious: after doing the work in the fast temp directory, I then need to create a second temp directory on the slower destination volume using url(for: .itemReplacementDirectory…), copy the file across, and then use replaceItemAt from there.
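
The two-stage approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name publish and its parameters are hypothetical, and it assumes the finished file already exists on the fast volume. The staging directory is deliberately left in place on failure, because the error's NSFileOriginalItemLocationKey may point into it.

```swift
import Foundation

/// Hypothetical helper: stages `finishedURL` (built on a fast local volume)
/// into an item-replacement directory appropriate for the destination, then
/// swaps it into place with replaceItemAt.
func publish(_ finishedURL: URL,
             to destinationURL: URL,
             using fileManager: FileManager = .default) throws {
    // Second temp directory, this time on the (slower) destination volume.
    let stagingDir = try fileManager.url(for: .itemReplacementDirectory,
                                         in: .userDomainMask,
                                         appropriateFor: destinationURL,
                                         create: true)

    // The slow, cross-volume copy happens here, before the original is touched.
    let stagedURL = stagingDir.appendingPathComponent(destinationURL.lastPathComponent)
    try fileManager.copyItem(at: finishedURL, to: stagedURL)

    // The final swap stays within the destination volume.
    _ = try fileManager.replaceItemAt(destinationURL, withItemAt: stagedURL)

    // Clean up only after success; on failure the directory may hold the original.
    try? fileManager.removeItem(at: stagingDir)
}
```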

Yes, it will, at least in my testing. More specifically, I modified your test project to this:

while(!finishedSave) {
    _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempURL)

This approach wouldn’t work anyway. The nature of this specific error means that you cannot retry replaceItemAt on the same URLs like this, because after the error, savingURL and tempURL have swapped places. So in your sample code, if the second replaceItemAt succeeds, you’ve just replaced the newer version with the older version again, so that the save has effectively done nothing. We’ll only get the result we want when failCount % 2 == 0.

You can test this by logging the expected and actual final content of the file (i.e. log the content of tempURL before the loop, and the content of savingURL after it). Whenever failCount % 2 == 1, you’ll end up with old content at the destination, because of the alternate swapping of the original and new files.
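
A small sketch of that check, assuming the file is small enough to read into memory; the helper name destinationMatches is hypothetical:

```swift
import Foundation

/// Hypothetical helper: true when the file at `destinationURL` holds exactly
/// the bytes in `expected`. An unreadable file counts as a mismatch.
func destinationMatches(_ expected: Data, at destinationURL: URL) -> Bool {
    guard let actual = try? Data(contentsOf: destinationURL) else { return false }
    return actual == expected
}
```

Capturing `let expected = try Data(contentsOf: tempURL)` before the loop and logging `destinationMatches(expected, at: savingURL)` after it would show whether the retry left stale content in place.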

The other problem with retrying replaceItemAt on the same URLs is that, as you note, tempURL (which after the initial replaceItemAt error contains the older file that was previously in the ubiquitous storage) still has the lock (?) on it which caused the permissions error. So any attempts to use that will continue to fail until the kernel (?) has finished with it.

For these reasons, we were previously talking about making a fresh copy of the updated temp file before trying replace, and calling replaceItemAt on that, so that we keep around a valid copy of the new file with which we can try again. (E.g. Have a working copy in the temp dir, update that, clone it, try replace using the clone, if that fails, try again with a fresh clone of the working copy.)

To update your code using this sort of approach:

var tempCopyURL = tempURL.deletingLastPathComponent().appending(path: UUID().uuidString)
var finishedSave = false
var failCount = 0
while (!finishedSave) {
    do {
        // Create a clone of our new file for replace.
        try fileManager.copyItem(at: tempURL, to: tempCopyURL)
        // Try to replace using the clone.
        _ = try fileManager.replaceItemAt(savingURL, withItemAt: tempCopyURL)
        try? fileManager.removeItem(at: replacementDirURL) // Clean up.
        finishedSave = true
    } catch {
        failCount += 1
        if(failCount == 1) {
            NSLog("First Fail on \(count-1)")
        }
        // Try again on the next pass with a fresh clone.
        tempCopyURL = tempURL.deletingLastPathComponent().appending(path: UUID().uuidString)
    }
}
if(failCount > 0) {
    NSLog("\(count-1) cleared after \(failCount) retries")
}

For me, this succeeds on the first retry every time, because we’re working with a fresh temp file, not the one that we’re denied access to. Out of 50,000 saves, I hit the error 150 times and each time it resolved on first retry. (It also ensures we end up with the correct version of the file being moved into place.) The disadvantage of course is that you’re adding in an extra copy of the temp file, which adds overhead on non-APFS/copy-on-write volumes.

To return to my original question:

I’m curious though as to whether the bug could occur twice in immediate succession, so that the resave also triggers the error.

Here I was wondering whether we could, on rare occasions, encounter the error twice in immediate succession even with the approach of using a fresh clone of the temp file for each attempt. My suspicion is that this shouldn’t happen, because here’s my wild (and completely uneducated!) guess as to what is happening:

Given that this weird error only happens for ubiquitous files, I’m guessing that the problem occurs when the kernel is intermittently doing something cloud-related with the original file, putting some sort of lock on it that prevents us from deleting it - but not from moving it for some reason.
replaceItemAt successfully swaps out the original ubiquitous file for the replacement, but the kernel still has a lock on the original file (which is now in the temp folder) and so won’t allow it to be deleted, so replaceItemAt throws an error.
So if at this point we immediately retry replaceItemAt with a fresh clone, all should be good because the kernel shouldn’t be doing anything yet with the file that was, in the same run loop, just swapped into the destination URL. (At this point in fact the file at the destination URL and the fresh clone we’re replacing it with are identical.)

Does that sound reasonable?

Mostly, you'll want .fileResourceIdentifier. fileContentIdentifier is an APFS specific[1] identifier

Thank you. I realised my mistake on this late yesterday while testing.

So, given all of the above, I think my approach should be:

Make a working copy in a temp dir (if destination doesn’t support cloning but local storage does, make the working copy on the local storage): workingCopyURL.
On save, update the working copy.
Copy the working copy to a folder created using url(for: .itemReplacementDirectory…): tempURL.
Use replaceItemAt, replacing destinationURL with tempURL.
If replaceItemAt fails, AND isUbiquitousItem(at:) returns true for destinationURL, create a fresh copy of the working copy, and try replaceItemAt again with that. (If the file wasn’t ubiquitous, just throw the error.)
If replaceItemAt fails the second time, examine the error to check for this very specific bug, and if it all checks out, treat the save as successful and move on.
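
The last four steps of that list might look something like the sketch below. It is an outline under assumptions rather than a tested implementation of the fix: the function name save and its parameters are hypothetical, and isKnownReplaceBug stands in for the error-inspection function posted earlier in the thread (its signature here is an assumption).

```swift
import Foundation

/// Hypothetical sketch: copy the working copy into a replacement directory,
/// replace the destination, and retry once with a fresh copy if the
/// destination is ubiquitous.
func save(workingCopyURL: URL,
          to destinationURL: URL,
          fileManager: FileManager = .default,
          isKnownReplaceBug: (_ error: Error, _ destinationURL: URL, _ tempURL: URL) -> Bool
) throws {
    var lastTempURL: URL?

    // Copy the working copy into a fresh item-replacement directory,
    // then swap that copy into place.
    func attemptReplace() throws {
        let dir = try fileManager.url(for: .itemReplacementDirectory,
                                      in: .userDomainMask,
                                      appropriateFor: destinationURL,
                                      create: true)
        let tempURL = dir.appendingPathComponent(destinationURL.lastPathComponent)
        lastTempURL = tempURL
        try fileManager.copyItem(at: workingCopyURL, to: tempURL)
        _ = try fileManager.replaceItemAt(destinationURL, withItemAt: tempURL)
        try? fileManager.removeItem(at: dir) // Clean up only after success.
    }

    do {
        try attemptReplace()
        return
    } catch {
        // Only ubiquitous destinations get a second attempt.
        guard fileManager.isUbiquitousItem(at: destinationURL) else { throw error }
    }

    do {
        try attemptReplace() // Second attempt uses a fresh copy.
    } catch {
        // Rethrow unless the error matches the known bug, in which case the
        // new file is assumed to already be at the destination.
        guard let tempURL = lastTempURL,
              isKnownReplaceBug(error, destinationURL, tempURL) else { throw error }
    }
}
```

Because each attempt creates its own replacement directory and copy, a failed attempt never consumes the working copy, so the retry always starts from the newest content.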
