There was a lot of discussion on this in #mercurial on freenode but I'll do my best to summarize it. I don't expect this issue to be widespread as based on the commit I bisected too, it sounds like the patch just wasn't ready for some of the crazy stuff that has happened in the pidgin repository as it went from cvs -> svn -> mtn -> hg with some manual patching in between some of those steps as well. I was bootstrapping a Google Summer of Code student today and noticed that there were more modified files in the repository than there should have been. 26 files in fact that are very old and haven't been touched in a long time. I had created the repo from them using `hg init ssh://keep.imfreedom.org/prateek/pidgin` and then pushed a fresh clone of https://keep.imfreedom.org/pidgin/pidgin into it. keep.imfreedom.org is running my hgkeeper servering solution for mercurial. While I did all of this, keep was running hg 5.8. Anyways, I just did a commit by specifying the files we changed and then started debugging where these files came from. (This is important for later). I started trying to clone from other machines thinking it was something local at first. Interestingly enough, our mac build agent running hg 5.3.2 with python 2.7.17 never reproduced the issue, but a linux machine running hg 4.8.2 with python 2.7.18 did. I had a few others try with various combinations and all of them reproduced the issue as well, 26 modified files after a clone. However, Zash in #mercurial only saw 25 modified files. I ended up creating a tarball of the main pidgin repo which is currently available at https://data.imfreedom.org/pidgin/pidgin-raw.tar.xz. Using the raw repo I did a local directory clone `hg clone pidgin-raw pidgin-local` and verified the issue was not present. I then served the repo with `hg --cwd pidgin-raw serve` and cloned it with `hg clone http://localhost:8000 pidgin-local` and verified the issue was present. With that data point in hand I downgraded my local hg to 5.7.1 and performed the same test. This time, both clone did not express the issue. I then downgraded hg on keep to 5.7.1 and have been unable to reproduce for any non-poisoned repo there. I say non-poisoned because the repo I created for the GSoC student still shows the modified files on a fresh clone even with 5.7.1. That repo is https://keep.imfreedom.org/prateek/pidgin. I then bisected starting with tag:5.7.1 marked as good and tag:5.8 marked as bad. During each step, I ran `pip install -U --user .` in the hg source directory. After installation, I served a clean repo with `hg serve` and clone it with `hg clone http://localhost:8000 local` until bisect identified revision https://www.mercurial-scm.org/repo/hg/rev/49fd21f32695. The description seems like a likely candidated, but mharbison in #mercurial hasn't been able to reproduce on that revision. So as of right now, I have hg 5.7.1 running on keep.imfreedom.org to stop the spread of the poison repos. The poisoning isn't horrible, but it's not great either. So a remote init and push from a poisoned repo results in a poisoned repo. That is `hg init ssh://foo; hg push ssh://foo`. However, a local init and push does not poison. That is `hg init ../foo; hg push ../foo`. And as you may expect, a local clone of a poisoned repo poisons the new repo. That is `hg clone . ../foo`.
AFAIU it was confirmed in the IRC, and is related to storing copy/move metadata for files and the order of parents (kudos to nbjoerg for investigating).
This is actively corrupting revision on the fly. So we need: 1) an immediate fix or backout of the faulty code 2) a script to audit an existing repository and fix up the parent information.
Fixed by https://mercurial-scm.org/repo/hg/rev/bd0a2a919bf8 Pierre-Yves David <pierre-yves.david@octobus.net> corruption: add a test for issue6528 The initial reproduction script was provided by Charles Chamberlain from Jane Street. Differential Revision: https://phab.mercurial-scm.org/D10996 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/411dc27fd9fd Pierre-Yves David <pierre-yves.david@octobus.net> corruption: backout changeset 49fd21f32695 (issue6528) inverting the parent is masking copy information leading to bad content being fetched and bad status result. Since 49fd21f32695, exchange can actively swap these parent corrupting existing changesets and triggering the corruption. Data corruption are considered critical so backing this out and doing and unscheduled release seems in order. Differential Revision: https://phab.mercurial-scm.org/D10995 (please test the fix)
I am discussing the next steps here: https://bz.mercurial-scm.org/show_bug.cgi?id=6538
Bug was set to TESTING for 7 days, resolving
This bug affect version Mercurial 5.8 only and is fixed in Mercurial 5.8.1 Corruption that happened using 5.8 will however remains and keep propagating with the associated changes.
Fixed by https://mercurial-scm.org/repo/hg/rev/b30a53ffbf9b Raphaël Gomès <rgomes@octobus.net> debugcommands: introduce a debug command to repair repos affected by issue6528 This command is quite basic and slow, it will loop over the entirety of the filelogs in the repository and check each revision for corruption, then fixes the affected filelogs. It takes under 25 minutes for Mozilla-Central on my not-top-of-the-line laptop, using the `--to-report` and `--from-report` options will make this pretty tolerable to use, I think. This change also introduces a test for the fix. Differential Revision: https://phab.mercurial-scm.org/D11239 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/32e21ac3adb1 Raphaël Gomès <rgomes@octobus.net> repair: improve performance of detection of revisions affected by issue6528 Explanations inside the patch. I've tested this on Mozilla-Central and it's 5 times faster than the naive approach on my laptop. Differential Revision: https://phab.mercurial-scm.org/D11262 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/5b046c2e3000 Pierre-Yves David <pierre-yves.david@octobus.net> issue6528: implement _is_revision_affected using callback The delta comming from a bundle/stream does not exists in the revlog yet, so we will need other way to retrieve the same information. To prepare for this we split the function to use callbacks in the core logic. Differential Revision: https://phab.mercurial-scm.org/D11267 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/c02ce6def30c Pierre-Yves David <pierre-yves.david@octobus.net> issue6528: implement _is_revision_affected_fast using callback The delta comming from a bundle/stream does not exists in the revlog yet, so we will need other way to retrieve the same information. To prepare for this we split the function to use callbacks in the core logic. Differential Revision: https://phab.mercurial-scm.org/D11268 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/c30ca163b45e Pierre-Yves David <pierre-yves.david@octobus.net> issue6528: also filter delta on the fly when applying a changegroup This ensure that corrupted clone does not spread corruption to "fixed" version. This might come at a performance cost, we will had a config option to control this behavior in the next changesets. Differential Revision: https://phab.mercurial-scm.org/D11270 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/2813d406b036 Pierre-Yves David <pierre-yves.david@octobus.net> issue6528: add a config option to control the fixing on the fly This will allow people who know to be safe to avoid any performance overhead (and other potential issue). Differential Revision: https://phab.mercurial-scm.org/D11271 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/b669e40fbbd6 Raphaël Gomès <rgomes@octobus.net> help: update help text for debug-repair-issue6528 The changegroup fix was put in 5.9.1, this is now out of date. Alson, this can maybe encourage people to upgrade? Differential Revision: https://phab.mercurial-scm.org/D11392 (please test the fix)
*** Bug 6603 has been marked as a duplicate of this bug. ***
Fixed by https://mercurial-scm.org/repo/hg/rev/3eb3aef6d3bf Simon Sapin <simon.sapin@octobus.net> rhg: Mark it as expected that the issue6528 bug is not reproduced Differential Revision: https://phab.mercurial-scm.org/D11909 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/5b65721a75eb Joerg Sonnenberger <joerg@bec.de> revlog: recommit 49fd21f32695 with a fix for issue6528 `filelog.size` currently special cases two forms of metadata encoding: - copy data via the parent order as flag bit - censor data by peaking into the raw delta All other forms of metadata encoding including the empty metadata block are mishandled. In `basefilectx.cmp` the empty metadata block is explicitly checked to compensate for this. Restore 49fd21f32695, but disable it for filelog, so that the original flag bit use contines to work. Document all this mess for now in preparation of a proper rework. Differential Revision: https://phab.mercurial-scm.org/D11203 (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/e92de86cf4f8 Pierre-Yves David <pierre-yves.david@octobus.net> emitrevision: consider ancestors revision to emit as available base This should make more delta base valid. This notably affects: * case where we skipped some parent with empty delta to directly delta against an ancestors * case where an intermediate snapshots is stored. This change means we could sent largish intermediate snapshots over the wire. However this is actually a sub goal here. Sending snapshots over the wire means the client have a high odd of simply storing the pre-computed delta instead of doing a lengthy process that will… end up doing the same intermediate snapshot. In addition the overall size of snapshot (or any level) is "only" some or the overall delta size. (0.17% for my mercurial clone, 20% for my clone of Mozilla try). So Sending them other the wire is unlikely to change large impact on the bandwidth used. If we decide that minimising the bandwidth is an explicit goal, we should introduce new logic to filter-out snapshot as delta. The current code has no notion explicite of snapshot so far, they just tended to fall into the wobbly filtering options. In some cases, this patch can yield large improvement to the bundling time: ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.name = perf-bundle # benchmark.variants.revs = last-100000 before: 68.787066 seconds after: 47.552677 seconds (-30.87%) That translate to large improvement to the pull time : ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.name = pull # benchmark.variants.issue6528 = disabled # benchmark.variants.revs = last-100000 before: 142.186625 seconds after: 75.897745 seconds (-46.62%) No significant negative impact have been observed. (please test the fix)
Fixed by https://mercurial-scm.org/repo/hg/rev/f463eb675e85 Pierre-Yves David <pierre-yves.david@octobus.net> emitrevision: consider ancestors revision to emit as available base This should make more delta base valid. This notably affects: * case where we skipped some parent with empty delta to directly delta against an ancestors * case where an intermediate snapshots is stored. This change means we could sent largish intermediate snapshots over the wire. However this is actually a sub goal here. Sending snapshots over the wire means the client have a high odd of simply storing the pre-computed delta instead of doing a lengthy process that will… end up doing the same intermediate snapshot. In addition the overall size of snapshot (or any level) is "only" some or the overall delta size. (0.17% for my mercurial clone, 20% for my clone of Mozilla try). So Sending them other the wire is unlikely to change large impact on the bandwidth used. If we decide that minimising the bandwidth is an explicit goal, we should introduce new logic to filter-out snapshot as delta. The current code has no notion explicite of snapshot so far, they just tended to fall into the wobbly filtering options. In some cases, this patch can yield large improvement to the bundling time: ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.name = perf-bundle # benchmark.variants.revs = last-100000 before: 68.787066 seconds after: 47.552677 seconds (-30.87%) That translate to large improvement to the pull time : ### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog # benchmark.name = pull # benchmark.variants.issue6528 = disabled # benchmark.variants.revs = last-100000 before: 142.186625 seconds after: 75.897745 seconds (-46.62%) No significant negative impact have been observed. (please test the fix)