Archiving particular site kills networking on host machine #703

Open
vitorio opened this issue Oct 15, 2024 · 0 comments

I feel like this is a 500-mile-email or OpenOffice-won't-print-on-Tuesdays situation, but nevertheless: I have a URL that eventually causes the browsertrix-crawler Docker image to kill networking on the host machine. I'm using the latest image, and this happens every time with this URL on both WSL2 and Ubuntu 20.04.
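
For context, the invocation is nothing unusual; it's along the lines of the standard example from the browsertrix-crawler README, with the URL swapped in (redacted here, and the collection name is just a placeholder, so the exact flags may not match my run verbatim):

    # approximate invocation; URL redacted, collection name is a placeholder
    docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url <REDACTED-URL> --generateWACZ --collection test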

The crawl log eventually reports ~500 "Frame check timed out" messages, then ~7000 "continueResponse failed" messages, followed by ~1000 "Protocol error" exceptions. Once the crawl finally returns, neither subsequent runs of the Docker image nor the host machine itself can resolve any hostnames, whether in a browser, with ping in a terminal window, or anywhere else. It looks like DNS resolution is what gets killed rather than all networking, but I didn't test extensively (WSL2 could be recovered by restarting networking on the host machine; Ubuntu required a full reboot).

Opening the URL directly in Firefox and Chrome, the page never seems to finish loading. It's just this particular URL, a private bookmarking-site capture of a page from 2014. The nearest Wayback Machine capture of the original page doesn't exhibit this issue, and neither does a wget of the capture. Still, it doesn't seem like a crash in browsertrix-crawler should bring down DNS (or more) on the host machine.

I'm not sure of the best way to triage this further: the crawl log is 3.5 MB, and I'd prefer not to share a URL into a private account in a public issue queue if possible.

Here's a link to a redacted copy of the crawl log. I can re-run the capture attempt on a non-critical machine if there are other logs that would be useful; just let me know what they are. If there's a private way to provide the URL, I'm happy to do that as well.

Thanks!
