I feel like this is a 500-mile-email or OpenOffice-won't-print-on-Tuesdays situation, but nevertheless: I have a URL which will eventually cause the browsertrix-crawler Docker image to kill networking on the host machine. I'm using the latest tag, and this happens every time with this URL on both WSL2 and Ubuntu 20.04.
The crawl log eventually reports ~500 "Frame check timed out" messages, then ~7000 "continueResponse failed" messages, followed by ~1000 "Protocol error" exceptions. When the crawl eventually returns, neither subsequent runs of the Docker image nor the host machine can resolve anything further, whether in browsers or with ping in a terminal window. It seems like DNS resolution is killed rather than all networking, but I didn't do extensive tests (WSL2 could be recovered by restarting networking on the host machine; Ubuntu required a full reboot).
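To separate "DNS is dead" from "all networking is dead" more rigorously than I did, something like the following could be run on the host after the crawl returns. This is only a rough sketch: the hostname example.com and the address 1.1.1.1 on TCP port 53 are placeholders I picked for illustration, not anything browsertrix-crawler uses, and any reachable name and literal IP would do.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can look up the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(ip: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to a literal IP succeeds (no DNS lookup needed)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok: bool, ip_ok: bool) -> str:
    """Turn the two probe results into a one-line diagnosis."""
    if dns_ok and ip_ok:
        return "DNS and IP connectivity both working"
    if ip_ok:
        return "DNS resolution broken, IP connectivity intact"
    if dns_ok:
        return "name resolved (possibly cached or from hosts file), IP connectivity failed"
    return "no IP connectivity, DNS cannot be judged separately"


if __name__ == "__main__":
    # Placeholder targets (assumptions): swap in whatever is reachable from your network.
    dns_ok = can_resolve("example.com")
    ip_ok = can_connect("1.1.1.1", 53)
    print(classify(dns_ok, ip_ok))
```

If this prints the "DNS resolution broken, IP connectivity intact" line after a failed crawl, that would support the DNS-only theory; the last line would suggest the damage is broader.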
Testing the URL itself, it seems to never stop loading in both Firefox and Chrome. It's just this particular URL, a private bookmarking site capture of a page from 2014. The nearest Wayback Machine capture of the original page doesn't exhibit this issue. A wget of the capture doesn't exhibit this issue. Still, it doesn't seem like a crash in browsertrix-crawler should bring down DNS (or more) on the host machine.
I'm not sure of the best way to triage this further: the crawl log is 3.5 MB, and I'd rather not share a URL from a private account in a public issue queue if possible.
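Since the full log is unwieldy, a tally of the three messages mentioned above may be easier to share than the log itself. The sketch below just counts lines containing each string; it assumes nothing about the log format beyond one message per line, and the log path is whatever file you pass on the command line.

```python
import sys
from collections import Counter
from typing import Iterable

# The three message strings quoted from the crawl log in this report.
PATTERNS = ("Frame check timed out", "continueResponse failed", "Protocol error")


def tally(lines: Iterable[str], patterns=PATTERNS) -> Counter:
    """Count how many lines contain each pattern (plain substring match)."""
    counts = Counter({p: 0 for p in patterns})
    for line in lines:
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_log.py path/to/crawl.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for pattern, n in tally(fh).items():
            print(f"{n:6d}  {pattern}")
```

Because it streams the file line by line, the size of the log should not matter much.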
Here's a link to a redacted copy of the crawl log. I can re-run the capture attempt on a non-critical machine if there are other logs that would be useful, just let me know what they are. If there's a private way to provide the URL, happy to do that also.
Thanks!